When binlog is disabled, WSREP will not behave correctly when
SAVEPOINT ROLLBACK is executed since we don't register handlers for such case.
Fixed by registering WSREP handlerton for SAVEPOINT related commands.
If a table is altered using the MDEV-11369/MDEV-15562/MDEV-13134
ALGORITHM=INSTANT, it can force the table to use a non-canonical
format:
* A hidden metadata record at the start of the clustered index
is used to store each column's DEFAULT value. This makes it possible
to add new columns that have default values without rebuilding the table.
* Starting with MDEV-15562 in MariaDB Server 10.4, a BLOB in the
hidden metadata record is used to store column mappings. This makes
it possible to drop or reorder columns without rebuilding the table.
This also makes it possible to add columns to any position or drop
columns from any position in the table without rebuilding the table.
If a column is dropped without rebuilding the table, old records
will contain garbage in that column's former position, and new records
will be written with NULL values, empty strings, or dummy values.
This is generally not a problem. However, there may be cases where
users may want to avoid putting a table into this format.
For example, users may want to ensure that future UPDATE operations
after an ADD COLUMN will be performed in-place, to reduce write
amplification. (Instantly added columns are essentially always
variable-length.) Users might also want to avoid bugs similar to
MDEV-19916, or they may want to be able to export tables to
older versions of the server.
We will introduce the option innodb_instant_alter_column_allowed,
with the following values:
* never (0): Do not allow instant add/drop/reorder,
to maintain format compatibility with MariaDB 10.x and MySQL 5.x.
If the table (or partition) is not in the canonical format, then
any ALTER TABLE (even one that does not involve instant column
operations) will force a table rebuild.
* add_last (1, default in 10.3): Store a hidden metadata record that
allows columns to be appended to the table instantly (MDEV-11369).
In 10.4 or later, if the table (or partition) is not in this format,
then any ALTER TABLE (even one that does not involve column changes)
will force a table rebuild.
Starting with 10.4:
* add_drop_reorder (2, default): Like 'add_last', but allow the
metadata record to store a column map, to support instant
add/drop/reorder of columns (MDEV-15562).
- multi_range_read_info_const now uses the new records_in_range interface
- Added handler::avg_io_cost()
- Don't calculate avg_io_cost() in get_sweep_read_cost if avg_io_cost is
not 1.0. In this case we trust the avg_io_cost() from the handler.
- Changed test_quick_select to use TIME_FOR_COMPARE instead of
TIME_FOR_COMPARE_IDX to align this with the rest of the code.
- Fixed bug when using test_if_cheaper_ordering where we didn't use
keyread if index was changed
- Fixed a bug where we didn't use index only read when using order-by-index
- Added keyread_time() to HEAP.
The default keyread_time() was optimized for blocks and not suitable for
HEAP. The effect was the HEAP prefered table scans over ranges for btree
indexes.
- Fixed get_sweep_read_cost() for HEAP tables
- Ensure that range and ref have same cost for simple ranges
Added a small cost (MULTI_RANGE_READ_SETUP_COST) to ranges to ensure
we favior ref for range for simple queries.
- Fixed that matching_candidates_in_table() uses same number of records
as the rest of the optimizer
- Added avg_io_cost() to JT_EQ_REF cost. This helps calculate the cost for
HEAP and temporary tables better. A few tests changed because of this.
- heap::read_time() and heap::keyread_time() adjusted to not add +1.
This was to ensure that handler::keyread_time() doesn't give
higher cost for heap tables than for normal tables. One effect of
this is that heap and derived tables stored in heap will prefer
key access as this is now regarded as cheap.
- Changed cost for index read in sql_select.cc to match
multi_range_read_info_const(). All index cost calculation is now
done trough one function.
- 'ref' will now use quick_cost for keys if it exists. This is done
so that for '=' ranges, 'ref' is prefered over 'range'.
- scan_time() now takes avg_io_costs() into account
- get_delayed_table_estimates() uses block_size and avg_io_cost()
- Removed default argument to test_if_order_by_key(); simplifies code
page_cur_insert_rec_low(): Check the array bounds before comparing.
We used to read one byte beyond the end of the 'rec' payload.
The incorrect logic was originally introduced in
commit 7ae21b18a6
and modified in commit 138cbec5f2.
Backport to 10.4:
- Don't try to push down SELECTs that have a side effect
- In case the storage engine did support pushdown of SELECT with an INTO
clause, write the rows we've got from it into select->join->result,
and not thd->protocol. This way, SELECT ... INTO ... FROM
smart_engine_table will put the result into where instructed, and
NOT send it to the client.
The cause was an uninitalized variable on the slave when reading a dummy
event that can only be generated by the test.
Fixed by ensuring that flag2 is always initialized.
Fixed also some indentation issues and improved comments.
FOREIGN_KEY_CHECKS is disabled
- dict_foreign_find_index() can return NULL if InnoDB already dropped
the foreign index when FOREIGN_KEY_CHECKS is disabled.
MDEV-21605 Clean up and speed up interfaces for binary row logging
MDEV-21617 Bug fix for previous version of this code
The intention is to have as few 'if' as possible in ha_write() and
related functions. This is done by pre-calculating once per statement the
row_logging state for all tables.
Benefits are simpler and faster code both when binary logging is disabled
and when it's enabled.
Changes:
- Added handler->row_logging to make it easy to check it table should be
row logged. This also made it easier to disabling row logging for system,
internal and temporary tables.
- The tables row_logging capabilities are checked once per "statements
that updates tables" in THD::binlog_prepare_for_row_logging() which
is called when needed from THD::decide_logging_format().
- Removed most usage of tmp_disable_binlog(), reenable_binlog() and
temporary saving and setting of thd->variables.option_bits.
- Moved checks that can't change during a statement from
check_table_binlog_row_based() to check_table_binlog_row_based_internal()
- Removed flag row_already_logged (used by sequence engine)
- Moved binlog_log_row() to a handler::
- Moved write_locked_table_maps() to THD::binlog_write_table_maps() as
most other related binlog functions are in THD.
- Removed binlog_write_table_map() and binlog_log_row_internal() as
they are now obsolete as 'has_transactions()' is pre-calculated in
prepare_for_row_logging().
- Remove 'is_transactional' argument from binlog_write_table_map() as this
can now be read from handler.
- Changed order of 'if's in handler::external_lock() and wsrep_mysqld.h
to first evaluate fast and likely cases before more complex ones.
- Added error checking in ha_write_row() and related functions if
binlog_log_row() failed.
- Don't clear check_table_binlog_row_based_result in
clear_cached_table_binlog_row_based_flag() as it's not needed.
- THD::clear_binlog_table_maps() has been replaced with
THD::reset_binlog_for_next_statement()
- Added 'MYSQL_OPEN_IGNORE_LOGGING_FORMAT' flag to open_and_lock_tables()
to avoid calculating of binary log format for internal opens. This flag
is also used to avoid reading statistics tables for internal tables.
- Added OPTION_BINLOG_LOG_OFF as a simple way to turn of binlog temporary
for create (instead of using THD::sql_log_bin_off.
- Removed flag THD::sql_log_bin_off (not needed anymore)
- Speed up THD::decide_logging_format() by remembering if blackhole engine
is used and avoid a loop over all tables if it's not used
(the common case).
- THD::decide_logging_format() is not called anymore if no tables are used
for the statement. This will speed up pure stored procedure code with
about 5%+ according to some simple tests.
- We now get annotated events on slave if a CREATE ... SELECT statement
is transformed on the slave from statement to row logging.
- In the original code, the master could come into a state where row
logging is enforced for all future events if statement could be used.
This is now partly fixed.
Other changes:
- Ensure that all tables used by a statement has query_id set.
- Had to restore the row_logging flag for not used tables in
THD::binlog_write_table_maps (not normal scenario)
- Removed injector::transaction::use_table(server_id_type sid, table tbl)
as it's not used.
- Cleaned up set_slave_thread_options()
- Some more DBUG_ENTER/DBUG_RETURN, code comments and minor indentation
changes.
- Ensure we only call THD::decide_logging_format_low() once in
mysql_insert() (inefficiency).
- Don't annotate INSERT DELAYED
- Removed zeroing pos_in_table_list in THD::open_temporary_table() as it's
already 0
MDEV-21606 Improve update handler (long unique keys on blobs)
MDEV-21470 MyISAM and Aria start_bulk_insert doesn't work with long unique
MDEV-21606 Bug fix for previous version of this code
MDEV-21819 2 Assertion `inited == NONE || update_handler != this'
- Move update_handler from TABLE to handler
- Move out initialization of update handler from ha_write_row() to
prepare_for_insert()
- Fixed that INSERT DELAYED works with update handler
- Give an error if using long unique with an autoincrement column
- Added handler function to check if table has long unique hash indexes
- Disable write cache in MyISAM and Aria when using update_handler as
if cache is used, the row will not be inserted until end of statement
and update_handler would not find conflicting rows.
- Removed not used handler argument from
check_duplicate_long_entries_update()
- Syntax cleanups
- Indentation fixes
- Don't use single character indentifiers for arguments
MDEV-19964 S3 replication support
Added new configure options:
s3_slave_ignore_updates
"If the slave has shares same S3 storage as the master"
s3_replicate_alter_as_create_select
"When converting S3 table to local table, log all rows in binary log"
This allows on to configure slaves to have the S3 storage shared or
independent from the master.
Other thing:
Added new session variable '@@sql_if_exists' to force IF_EXIST to DDL's.
redo log during recovery
- InnoDB unnecessarily reads the page even though it has fully initialized
buffered redo log records. Allow the page initialization redo log to
apply for the page in buf_page_get_gen() during recovery.
- Renamed buf_page_get_gen() to buf_page_get_low()
- Newly added buf_page_get_gen() will check for buffered redo log for
the particular page id during recovery
- Added new function buf_page_mtr_lock() which basically latches the page
for the given latch type.
- recv_recovery_create_page() is inline function which creates a page
if it has page initialization redo log records.
when selecting from perfschema, filter out statements
used by the test istself in wait_condition.inc, because they,
by design, can be repeated unpredictable number of times.
* Remove dead code
* MDEV-21675 Data inconsistency after multirow insert rollback
This patch fixes data inconsistencies that happen after rollback of
multirow inserts, with binlog disabled.
For example, statements such as `INSERT INTO t1 VALUES (1,'a'),(1,'b')`
that fail with duplicate key error. In such cases the whole statement
is rolled back. However, with wsrep_emulate_binlog in effect, the
IO_CACHE would not be truncated, and the pending rows events would be
replicated to the rest of the cluster. In the above example, it would
result in row (1,'a') being replicated, whereas locally the statement
is rolled back entirely. Making the cluster inconsistent.
The patch changes the code so that prior to statement rollback,
pending rows event are removed and the stmt cache reset.
That patch also introduces MTR tests that excercise multirow insert
statements for regular, and streaming replication.
Remove CREATE/DROP database.
Remove some unnecessary suppressions, replacements, and
SQL statements.
Populate tables via have_sequence.inc to avoid the creation of
explicit InnoDB record locks in INSERT...SELECT. This will remove
some gaps in AUTO_INCREMENT values.
Forcing wait on nodes 2 and 3, to turn wsrep_ready to 'ON' before querying wsrep status variables.
This guarantees that status reads don't come too early on these nodes
A large INSERT would occasionally fail with a deadlock error. This
is due to a limitation in galera library: large writesets that are
internally fragmented (> 32KB), may fail to replicate during a
configuration change. Even if the configuration change is from primary
to primary.
Changed the test to reduce writeset sizes from ~65K to ~16K.
fil_delete_tablespace(): Remove the unused parameter drop_ahi,
and add the parameter if_exists=false. We want to suppress
error messages if we know that the tablespace has been discarded.
dict_table_rename_in_cache(): Pass the new parameter to
fil_delete_tablespace(), that is, do not complain about
missing tablespace if the tablespace has been discarded.
row_make_new_pathname(): Declare as static.
row_drop_table_for_mysql(): Tolerate !table->data_dir_path
when the tablespace has been discarded.
row_rename_table_for_mysql(): Skip part of the RENAME TABLE
when fil_space_get_first_path() returns NULL.
buf_pool_resize(): Simplify the fault injection
for innodb.buf_pool_resize_oom.
innodb.buf_pool_resize_oom: Use a small buffer pool.
innodb.innodb_buffer_pool_load_now: Make use of the sequence engine,
to avoid creating explicit InnoDB record locks. Clean up the
accesses to information_schema.innodb_buffer_page_lru.
Adapt from 10.2: git cherry-pick bfb5e1c3f0
buf_pool_t::chunk_t::create(), buf_pool_t::resize():
Restore or simplify the debug instrumentation.
buf_resize_callback(): Add DBUG_ENTER/DBUG_VOID_RETURN so that
the DBUG_EXECUTE_IF in buf_pool_t::resize() can be triggered.