If a table is altered using the MDEV-11369/MDEV-15562/MDEV-13134
ALGORITHM=INSTANT, it can force the table to use a non-canonical
format:
* A hidden metadata record at the start of the clustered index
is used to store each column's DEFAULT value. This makes it possible
to add new columns that have default values without rebuilding the table.
* Starting with MDEV-15562 in MariaDB Server 10.4, a BLOB in the
hidden metadata record is used to store column mappings. This makes
it possible to drop or reorder columns without rebuilding the table.
This also makes it possible to add columns to any position or drop
columns from any position in the table without rebuilding the table.
If a column is dropped without rebuilding the table, old records
will contain garbage in that column's former position, and new records
will be written with NULL values, empty strings, or dummy values.
This is generally not a problem. However, there may be cases where
users may want to avoid putting a table into this format.
For example, users may want to ensure that future UPDATE operations
after an ADD COLUMN will be performed in-place, to reduce write
amplification. (Instantly added columns are essentially always
variable-length.) Users might also want to avoid bugs similar to
MDEV-19916, or they may want to be able to export tables to
older versions of the server.
We will introduce the option innodb_instant_alter_column_allowed,
with the following values:
* never (0): Do not allow instant add/drop/reorder,
to maintain format compatibility with MariaDB 10.x and MySQL 5.x.
If the table (or partition) is not in the canonical format, then
any ALTER TABLE (even one that does not involve instant column
operations) will force a table rebuild.
* add_last (1, default in 10.3): Store a hidden metadata record that
allows columns to be appended to the table instantly (MDEV-11369).
In 10.4 or later, if the table (or partition) is not in this format,
then any ALTER TABLE (even one that does not involve column changes)
will force a table rebuild.
Starting with 10.4:
* add_drop_reorder (2, default): Like 'add_last', but allow the
metadata record to store a column map, to support instant
add/drop/reorder of columns (MDEV-15562).
Backport to 10.4:
- Don't try to push down SELECTs that have a side effect
- In case the storage engine did support pushdown of SELECT with an INTO
clause, write the rows we've got from it into select->join->result,
and not thd->protocol. This way, SELECT ... INTO ... FROM
smart_engine_table will put the result into where instructed, and
NOT send it to the client.
redo log during recovery
- InnoDB unnecessarily reads the page even though it has fully initialized
buffered redo log records. Allow the page initialization redo log to
apply for the page in buf_page_get_gen() during recovery.
- Renamed buf_page_get_gen() to buf_page_get_low()
- Newly added buf_page_get_gen() will check for buffered redo log for
the particular page id during recovery
- Added new function buf_page_mtr_lock() which basically latches the page
for the given latch type.
- recv_recovery_create_page() is inline function which creates a page
if it has page initialization redo log records.
* Remove dead code
* MDEV-21675 Data inconsistency after multirow insert rollback
This patch fixes data inconsistencies that happen after rollback of
multirow inserts, with binlog disabled.
For example, statements such as `INSERT INTO t1 VALUES (1,'a'),(1,'b')`
that fail with duplicate key error. In such cases the whole statement
is rolled back. However, with wsrep_emulate_binlog in effect, the
IO_CACHE would not be truncated, and the pending rows events would be
replicated to the rest of the cluster. In the above example, it would
result in row (1,'a') being replicated, whereas locally the statement
is rolled back entirely. Making the cluster inconsistent.
The patch changes the code so that prior to statement rollback,
pending rows event are removed and the stmt cache reset.
That patch also introduces MTR tests that excercise multirow insert
statements for regular, and streaming replication.
Remove CREATE/DROP database.
Remove some unnecessary suppressions, replacements, and
SQL statements.
Populate tables via have_sequence.inc to avoid the creation of
explicit InnoDB record locks in INSERT...SELECT. This will remove
some gaps in AUTO_INCREMENT values.
Forcing wait on nodes 2 and 3, to turn wsrep_ready to 'ON' before querying wsrep status variables.
This guarantees that status reads don't come too early on these nodes
fil_delete_tablespace(): Remove the unused parameter drop_ahi,
and add the parameter if_exists=false. We want to suppress
error messages if we know that the tablespace has been discarded.
dict_table_rename_in_cache(): Pass the new parameter to
fil_delete_tablespace(), that is, do not complain about
missing tablespace if the tablespace has been discarded.
row_make_new_pathname(): Declare as static.
row_drop_table_for_mysql(): Tolerate !table->data_dir_path
when the tablespace has been discarded.
row_rename_table_for_mysql(): Skip part of the RENAME TABLE
when fil_space_get_first_path() returns NULL.
buf_pool_resize(): Simplify the fault injection
for innodb.buf_pool_resize_oom.
innodb.buf_pool_resize_oom: Use a small buffer pool.
innodb.innodb_buffer_pool_load_now: Make use of the sequence engine,
to avoid creating explicit InnoDB record locks. Clean up the
accesses to information_schema.innodb_buffer_page_lru.
Temporary tables are typically short-lived, and temporary tables
are assumed to be accessed only by the thread that is handling
the owning connection. Hence, they must not be subject to
defragmenting.
ha_innobase::optimize(): Do not add temporary tables to
the defragment_table() queue.
AUTO_INCREMENT values are nondeterministic after crash recovery.
While MDEV-6076 guarantees that the AUTO_INCREMENT values of committed
transactions will not roll back, it is possible that the AUTO_INCREMENT
values will be durably incremented for incomplete transactions. So
changing the test case to avoid showing the result of AUTO_INCREMENT value.
This was to remove a performance regression between 10.3 and 10.4
In 10.5 we will have a better implementation of records_in_range
that will enable us to get more statistics.
This change was not done in 10.4 because the 10.5 will be part of
a larger change that is not suitable for the GA 10.4 version
Other things:
- Changed default handler block_size to 8192 to fix things statistics
for engines that doesn't set the block size.
- Fixed a bug in spider when using multiple part const ranges
(Patch from Kentoku)
The default keyread_time() was optimized for blocks and not suitable for
HEAP. The effect was the HEAP prefered table scans over ranges for btree
indexes.
Fixed also get_sweep_read_cost() for HEAP tables.
When connections go to same node and deadlock happens, BF abort should
not happen for victim thread. Fixed by guarding
`wsrep_handle_SR_rollback()` so that is called only for SR transactions.
Co-authored-by: Seppo Jaakola <seppo.jaakola@iki.fi>
Co-authored-by: Daniele Sciascia <daniele.sciascia@galeracluster.com>
- Flag ALTER_STORED_COLUMN_TYPE set while doing varchar extension
for partition table. Basically all partition supports
can_be_converted_by_engine() then it should be set to
ALTER_COLUMN_TYPE_CHANGE_BY_ENGINE.
If async replication slave thread conflicts with cluster replication,
then the async slave transaction should be BF aborted, and depending on the
state of async slave transaction execution, potentially also replayed.
There were problems in such BF abort implementation and the replaying was not
started.
This pull request contains fixes which make sure that if async slave thread is
marked to abort and replay, it will complete carry out the rollback and
release all locks and resources before starting the replaying. After replaying,
async slave transactions is treated as successful, so the slave thread will
continue as usual, handling next replication event.
There is also new mtr test: galera.galera_slave_replay, which stresses both a
certification failure for async slave thread and a successful BF abort
followed by replaying.
Do not rebuild index when it's key part converted from utf8mb3 to utf8mb4
but key part stays the same.
dict_index_add_to_cache(): assert that prefix_len is divided by mbmaxlen
ha_innobase::compare_key_parts(): compare key part lenght in symbols instead
of bytes.
* Remove those tests that will not be supported on that release.
* Make sure that correct tests are disabled and have MDEVs
* Sort test names
This should not be merged upwards.