This is a backport of the following fix from MySQL 5.7.23.
Some code refactoring has been omitted, and the test case has
been adapted to MariaDB.
commit 7a689acaa65e9d602575f7aa53fe36a64a07460f
Author: Krzysztof Kapuścik <krzysztof.kapuscik@oracle.com>
Date: Tue Mar 13 12:34:03 2018 +0100
Bug#27082268 Invalid FTS sync synchronization
The fix closes two issues:
Bug #27082268 - INNODB: FAILING ASSERTION: SYM_NODE->TABLE != NULL DURING FTS SYNC
Bug #27095935 - DEADLOCK BETWEEN FTS_DROP_INDEX AND FTS_OPTIMIZE_SYNC_TABLE
Both issues were related to an FTS cache sync being done during
operations that performed DDL actions on internal FTS tables
(ALTER TABLE, TRUNCATE). In some cases the FTS tables and/or
internal cache structures could get removed while still being
used to perform FTS synchronization, leading to crashes. In other
cases the sync operation could not finish because it was waiting for
the dict lock, which was held by a thread waiting for the background
sync to finish.
The changes include:
- Stopping background operations during ALTER TABLE and TRUNCATE.
- Removal of unused code in FTS.
- Cleanup of FTS sync related code to make it more readable and
easier to maintain.
RB#18262
In Galera, BF (brute force) transactions may not wait for lock requests,
and normally a BF transaction would select the transaction holding the
conflicting locks as a victim for rollback. However, the background
statistics calculation transaction is an InnoDB internal transaction and
has no thd, i.e. it cannot be selected as a victim. If the background
statistics calculation transaction holds conflicting locks on the
statistics tables, it will cause the BF lock wait long error message. The
correct way to handle background statistics calculation is to acquire a
thd for the transaction, but that change is too big for GA releases, and
there are other reported problems with background statistics calculation.
This fix avoids adding a table to background statistics calculation if
dict0dict.cc
buf_LRU_drop_page_hash_for_tablespace(): Return whether any adaptive
hash index entries existed. If yes, the caller should keep retrying to
drop the adaptive hash index.
row_import_for_mysql(), row_truncate_table_for_mysql(),
row_drop_table_for_mysql(): Ensure that the adaptive hash index was
entirely dropped for the table.
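
The retry pattern described above can be sketched as follows. This is a
toy model only: ToyPage, the batch limit and both toy_ functions are
hypothetical stand-ins, not the InnoDB buffer pool code. It shows a drop
function that reports whether it found any adaptive hash index entries,
and a caller that keeps retrying until a pass finds none.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a buffer pool page; not the real buf_page_t.
struct ToyPage {
    unsigned space_id;
    bool has_ahi_entries;
};

// Drops hash entries for at most `batch` pages of the tablespace per call
// and returns whether any entries were found during this pass.
bool toy_drop_page_hash_for_tablespace(std::vector<ToyPage>& pool,
                                       unsigned space_id,
                                       std::size_t batch)
{
    bool found = false;
    std::size_t dropped = 0;
    for (ToyPage& page : pool) {
        if (page.space_id != space_id || !page.has_ahi_entries) {
            continue;
        }
        found = true;
        if (dropped == batch) {
            break;  // leave the remaining entries for the next call
        }
        page.has_ahi_entries = false;
        ++dropped;
    }
    return found;
}

// Caller-side pattern: retry until a pass reports that nothing existed.
// Returns the number of passes that were needed.
unsigned toy_drop_all_ahi(std::vector<ToyPage>& pool, unsigned space_id)
{
    unsigned passes = 0;
    do {
        ++passes;
    } while (toy_drop_page_hash_for_tablespace(pool, space_id, 2));
    return passes;
}
```

The point of the boolean return value in this sketch is that the caller
does not need a timeout or a reference count to decide when it is done;
it stops only after a pass that found no entries.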
Also fixes MDEV-14727, MDEV-14491
InnoDB: Error: Waited for 5 secs for hash index ref_count (1) to drop to 0
by replacing the flawed wait logic in dict_index_remove_from_cache_low().
On DISCARD TABLESPACE, there is no need to drop the adaptive hash index.
We must drop it on IMPORT TABLESPACE, and eventually on DROP TABLE or
DROP INDEX. As long as the dict_index_t object remains in the cache
and the table remains inaccessible, the adaptive hash index entries
to orphaned pages would not do any harm. They would be dropped when
buffer pool pages are reused for something else.
btr_search_drop_page_hash_when_freed(), buf_LRU_drop_page_hash_batch():
Remove the parameter zip_size, and pass 0 to buf_page_get_gen().
buf_page_get_gen(): Ignore zip_size if mode==BUF_PEEK_IF_IN_POOL.
buf_LRU_drop_page_hash_for_tablespace(): Drop the adaptive hash index
even if the tablespace is inaccessible.
buf_LRU_drop_page_hash_for_tablespace(): New global function, to drop
the adaptive hash index.
buf_LRU_flush_or_remove_pages(), fil_delete_tablespace():
Remove the parameter drop_ahi.
dict_index_remove_from_cache_low(): Actively drop the adaptive hash index
if entries exist. This should prevent InnoDB hangs on DROP TABLE or
DROP INDEX.
row_import_for_mysql(): Drop any adaptive hash index entries for the table.
row_drop_table_for_mysql(): Drop any adaptive hash index for the table,
except if the table resides in the system tablespace. (DISCARD TABLESPACE
does not apply to the system tablespace, and we do not want to drop the
adaptive hash index for other tables than the one that is being dropped.)
row_truncate_table_for_mysql(): Drop any adaptive hash index entries for
the table, except if the table resides in the system tablespace.
InnoDB does not allow FOREIGN KEY constraints to exist for TEMPORARY TABLE.
InnoDB introduced a dedicated tablespace for temporary tables, and actually
stopped creating persistent metadata and data for temporary tables.
row_table_add_foreign_constraints(): Do not create a persistent
transaction.
dict_create_foreign_constraints_low(): Add the persistent transaction to
update the foreign key relation in the dictionary.
dict_create_foreign_constraints_low(): Remove a duplicated check for
partitioned tables.
The predicate dict_table_is_discarded() checks whether
ALTER TABLE…DISCARD TABLESPACE has been executed.
Replace most occurrences of dict_table_is_discarded() with
checks of dict_table_t::space. A few checks for the flag
DICT_TF2_DISCARDED are necessary; write them inline.
Because !is_readable() implies !space, some checks for
dict_table_is_discarded() were redundant.
MDEV-12266 changed dict_table_t::space to a pointer.
Displaying pointer values in error messages would be even more
meaningless than displaying numeric tablespace identifiers.
row_create_table_for_mysql(): Do not display table->space when
deleting the file fails. We cannot dereference table->space here,
because fil_delete_tablespace() would have freed the object.
fil_wait_crypt_bg_threads(): Do not display table->space. We could
display table->space_id here, but it should not really add any value,
because the table reference-counts have no direct connection to files
or tablespaces.
InnoDB always keeps all tablespaces in the fil_system cache.
The fil_system.LRU is only for closing file handles; the
fil_space_t and fil_node_t for all data files will remain
in main memory. Between startup and shutdown, they can only be
created and removed by DDL statements. Therefore, we can
let dict_table_t::space point directly to the fil_space_t.
dict_table_t::space_id: A numeric tablespace ID for the corner cases
where we do not have a tablespace. The most prominent examples are
ALTER TABLE...DISCARD TABLESPACE or a missing or corrupted file.
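
A minimal sketch of the pointer-plus-ID arrangement, using toy types.
The toy_ structs and both helper functions are hypothetical; the real
fil_space_t and dict_table_t carry many more members. It only shows a
table that points directly at its tablespace object and falls back to
the numeric ID when no tablespace is attached.

```cpp
#include <cassert>
#include <string>

// Toy stand-ins for the real structures.
struct toy_fil_space_t {
    unsigned id;
    std::string name;
};

struct toy_dict_table_t {
    toy_fil_space_t* space;  // nullptr when no tablespace is attached
    unsigned space_id;       // numeric ID, kept for the corner cases
};

// Hypothetical helper: describe the tablespace by name when the pointer
// is set, and by numeric ID otherwise.
std::string toy_describe_tablespace(const toy_dict_table_t& table)
{
    if (table.space != nullptr) {
        return table.space->name;
    }
    return "tablespace ID " + std::to_string(table.space_id);
}

// Hypothetical model of DISCARD TABLESPACE: detach the pointer, keep the ID.
void toy_discard(toy_dict_table_t& table)
{
    table.space = nullptr;
}
```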
There are a few functional differences; most notably:
(1) DROP TABLE will delete matching .ibd and .cfg files,
even if they were not attached to the data dictionary.
(2) Some error messages will report file names instead of numeric IDs.
There still are many functions that use numeric tablespace IDs instead
of fil_space_t*, and many functions could be converted to fil_space_t
member functions. Also, Tablespace and Datafile should be merged with
fil_space_t and fil_node_t. page_id_t and buf_page_get_gen() could use
fil_space_t& instead of a numeric ID, and after moving to a single
buffer pool (MDEV-15058), buf_pool_t::page_hash could be moved to
fil_space_t::page_hash.
FilSpace: Remove. Only a few calls to fil_space_acquire() will remain,
and gradually they should be removed.
mtr_t::set_named_space_id(ulint): Renamed from set_named_space(),
to prevent accidental calls to this slower function. Very few
callers remain.
fseg_create(), fsp_reserve_free_extents(): Take fil_space_t*
as a parameter instead of a space_id.
fil_space_t::rename(): Wrapper for fil_rename_tablespace_check(),
fil_name_write_rename(), fil_rename_tablespace(). Mariabackup
passes the parameter log=false; InnoDB passes log=true.
dict_mem_table_create(): Take fil_space_t* instead of space_id
as parameter.
dict_process_sys_tables_rec_and_mtr_commit(): Replace the parameter
'status' with 'bool cached'.
dict_get_and_save_data_dir_path(): Avoid copying the fil_node_t::name.
fil_ibd_open(): Return the tablespace.
fil_space_t::set_imported(): Replaces fil_space_set_imported().
truncate_t: Change many member function parameters to fil_space_t*,
and remove page_size parameters.
row_truncate_prepare(): Merge to its only caller.
row_drop_table_from_cache(): Assert that the table is persistent.
dict_create_sys_indexes_tuple(): Write SYS_INDEXES.SPACE=FIL_NULL
if the tablespace has been discarded.
row_import_update_discarded_flag(): Remove a constant parameter.
We can rely on the dict_table_t::space. All indexes of a table object
are always in the same tablespace. (For fulltext indexes, the data is
located in auxiliary tables, and these will continue to have their own
table objects, separate from the main table.)
It does not hurt to delete non-existing records from SYS_TABLESPACES
and SYS_DATAFILES. Because MariaDB does not support CREATE TABLESPACE,
only the system tablespace (space_id=0) can contain multiple tables.
But, there are no entries for the system tablespace in these tables
(which actually are stored inside the system tablespace).
Make system-versioned foreign tables work in CASCADE UPDATE/SET NULL.
In that case, essentially a row update is performed. This patch makes it
insert a historical row as well.
row_update_versioned_insert(): Restore the btr_pcur_t, read the row from
it, make the row historical, and insert it into the table.
row_ins_check_foreign_constraint(): Disable the constraint check for
historical rows, because it makes no sense. The check would also always
fail, because the referenced table has already been updated at that point.
row_update_cascade_for_mysql(): Insert a historical row for
system-versioned tables before updating the current row.
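
The order of operations described above can be sketched with a toy
in-memory child table. ToyRow, TOY_ROW_END_MAX and toy_cascade_update()
are hypothetical names for illustration; the sketch assumes the usual
convention that a current row has the maximum row_end and a historical
row has row_end set to the time of the change.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

constexpr std::uint64_t TOY_ROW_END_MAX =
    std::numeric_limits<std::uint64_t>::max();

struct ToyRow {
    int id;
    int parent_id;
    std::uint64_t row_start;
    std::uint64_t row_end;
    bool is_current() const { return row_end == TOY_ROW_END_MAX; }
};

// Cascades a parent key change to the current child rows. For each
// affected row, a historical copy is inserted first, and only then is
// the current row updated. Returns the number of current rows updated.
std::size_t toy_cascade_update(std::vector<ToyRow>& child, int old_parent,
                               int new_parent, std::uint64_t now)
{
    std::size_t updated = 0;
    const std::size_t n = child.size();  // skip rows appended in this pass
    for (std::size_t i = 0; i < n; ++i) {
        if (!child[i].is_current() || child[i].parent_id != old_parent) {
            continue;
        }
        ToyRow historical = child[i];
        historical.row_end = now;       // close the old version
        child.push_back(historical);    // may reallocate; index by i below
        child[i].parent_id = new_parent;
        child[i].row_start = now;       // the new version starts now
        ++updated;
    }
    return updated;
}
```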
revert DATA_VERSIONED -> DATA_UNVERSIONED
Revert the dead code for MySQL 5.7 multi-master replication (GCS),
also known as
WL#6835: InnoDB: GCS Replication: Deterministic Deadlock Handling
(High Prio Transactions in InnoDB).
Also, make innodb_lock_schedule_algorithm=vats skip SPATIAL INDEX,
because the code does not seem to be compatible with them.
Add FIXME comments to some SPATIAL INDEX locking code. It looks
like Galera write-set replication might not work with SPATIAL INDEX.
Rollback attempted to dereference DB_ROLL_PTR=0, which cannot possibly
be a valid undo log pointer. A safe canonical value would be
roll_ptr_t(1) << ROLL_PTR_INSERT_FLAG_POS
which is what was chosen in MDEV-12288.
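
To make the difference between the two values concrete, here is a small
decoding sketch. The bit positions are assumptions based on the usual
InnoDB roll pointer layout (1 bit insert flag, 7 bits rollback segment
ID, 32 bits page number, 16 bits offset); DecodedRollPtr and
decode_roll_ptr() are hypothetical helpers, not InnoDB functions.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint64_t roll_ptr_t;

// Assumed bit positions within the 56-bit roll pointer.
constexpr unsigned ROLL_PTR_INSERT_FLAG_POS = 55;
constexpr unsigned ROLL_PTR_RSEG_ID_POS = 48;
constexpr unsigned ROLL_PTR_PAGE_POS = 16;

struct DecodedRollPtr {
    bool is_insert;
    unsigned rseg_id;
    std::uint32_t page_no;
    std::uint16_t offset;
};

// Splits a roll pointer into its fields (hypothetical helper).
DecodedRollPtr decode_roll_ptr(roll_ptr_t p)
{
    DecodedRollPtr d;
    d.is_insert = ((p >> ROLL_PTR_INSERT_FLAG_POS) & 1) != 0;
    d.rseg_id = static_cast<unsigned>((p >> ROLL_PTR_RSEG_ID_POS) & 0x7F);
    d.page_no = static_cast<std::uint32_t>(
        (p >> ROLL_PTR_PAGE_POS) & 0xFFFFFFFFu);
    d.offset = static_cast<std::uint16_t>(p & 0xFFFF);
    return d;
}

// The canonical value named in the text.
constexpr roll_ptr_t SAFE_ROLL_PTR =
    roll_ptr_t(1) << ROLL_PTR_INSERT_FLAG_POS;
```

Under these assumptions, 0 decodes as a non-insert pointer to offset 0
of page 0, whereas the canonical value has only the insert flag set.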
This bug was reproduced in 10.3 only. Potentially, the problem could
have been introduced by MDEV-11415, which suppresses undo logging for
ALGORITHM=COPY operations. In those operations, we should actually
have written the safe value of DB_ROLL_PTR instead of writing 0.
However, the test in commit 5421e3aee7
demonstrates that access to the rebuilt table by earlier-started
transactions should actually have been refused with ER_TABLE_DEF_CHANGED.
btr_cur_ins_lock_and_undo(): When undo logging is disabled, use the
safe value of DB_ROLL_PTR.
btr_cur_optimistic_insert(): Validate the DB_TRX_ID,DB_ROLL_PTR before
inserting into a clustered index leaf page.
ins_node_t::sys_buf[]: Replaces row_id_buf and trx_id_buf and some
heap usage.
row_ins_alloc_sys_fields(): Initialize ins_node_t::sys_buf[].
trx_undo_page_report_modify(): Assert that the DB_ROLL_PTR is not 0.
trx_undo_get_undo_rec_low(): Assert that the roll_ptr is valid before
trying to dereference it.
dict_index_t::is_primary(): Check if the index is the primary key.
Rollback attempted to dereference DB_ROLL_PTR=0, which cannot possibly
be a valid undo log pointer. A safer canonical value would be
roll_ptr_t(1) << ROLL_PTR_INSERT_FLAG_POS
which is what was chosen in MDEV-12288, corresponding to reset_trx_id.
No deterministic test case for the bug was found. The simplest test
cases may be related to MDEV-11415, which suppresses undo logging for
ALGORITHM=COPY operations. In those operations, in the spirit of
MDEV-12288, we should actually have written reset_trx_id instead of
using the transaction identifier of the current transaction
(and a bogus value of DB_ROLL_PTR=0). However, thanks to MySQL Bug#28432
which I had fixed in MySQL 5.6.8 as part of WL#6255, access to the
rebuilt table by earlier-started transactions should actually have been
refused with ER_TABLE_DEF_CHANGED.
reset_trx_id: Move the definition to data0type.cc and the declaration
to data0type.h.
btr_cur_ins_lock_and_undo(): When undo logging is disabled, use the
safe value that corresponds to reset_trx_id.
btr_cur_optimistic_insert(): Validate the DB_TRX_ID,DB_ROLL_PTR before
inserting into a clustered index leaf page.
ins_node_t::sys_buf[]: Replaces row_id_buf and trx_id_buf and some
heap usage.
row_ins_alloc_sys_fields(): Init ins_node_t::sys_buf[] to reset_trx_id.
row_ins_buf(): Only if undo logging is enabled, copy trx->id
to node->sys_buf. Otherwise, rely on the initialization in
row_ins_alloc_sys_fields().
row_purge_reset_trx_id(): Invoke mlog_write_string() with reset_trx_id
directly. (No functional change.)
trx_undo_page_report_modify(): Assert that the DB_ROLL_PTR is not 0.
trx_undo_get_undo_rec_low(): Assert that the roll_ptr is valid before
trying to dereference it.
dict_index_t::is_primary(): Check if the index is the primary key.
PageConverter::adjust_cluster_record(): Fix
MDEV-15249 Crash in MVCC read after IMPORT TABLESPACE
by resetting the system fields to reset_trx_id instead of writing
the current transaction ID (which will be committed at the
end of the IMPORT TABLESPACE) and DB_ROLL_PTR=0.
This can partially be viewed as a follow-up fix of MDEV-12288,
because IMPORT should already then have written
DB_TRX_ID=0 and DB_ROLL_PTR=1<<55 to prevent unnecessary
DB_TRX_ID lookups in subsequent accesses to the table.
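
As an illustration of the byte image that DB_TRX_ID=0 and
DB_ROLL_PTR=1<<55 would produce, the following sketch serializes both
fields back to back. It assumes a 6-byte DB_TRX_ID, a 7-byte DB_ROLL_PTR
and most-significant-byte-first storage; the TOY_ constants, write_be()
and make_reset_fields() are hypothetical names, not InnoDB code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t TOY_TRX_ID_LEN = 6;    // assumed field width
constexpr std::size_t TOY_ROLL_PTR_LEN = 7;  // assumed field width
constexpr std::size_t TOY_SYS_LEN = TOY_TRX_ID_LEN + TOY_ROLL_PTR_LEN;

// Writes the low `len` bytes of `value`, most significant byte first.
void write_be(unsigned char* dst, std::size_t len, std::uint64_t value)
{
    for (std::size_t i = 0; i < len; ++i) {
        dst[len - 1 - i] = static_cast<unsigned char>(value >> (8 * i));
    }
}

// Builds DB_TRX_ID=0 followed by DB_ROLL_PTR=1<<55.
std::array<unsigned char, TOY_SYS_LEN> make_reset_fields()
{
    std::array<unsigned char, TOY_SYS_LEN> buf{};
    write_be(buf.data(), TOY_TRX_ID_LEN, 0);
    write_be(buf.data() + TOY_TRX_ID_LEN, TOY_ROLL_PTR_LEN,
             std::uint64_t(1) << 55);
    return buf;
}
```

With these widths, the only nonzero byte is the first byte of the
DB_ROLL_PTR field, because bit 55 is the top bit of a 7-byte value.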
MDEV-14222 Unnecessary 'cascade' memory allocation for every updated row
when there is no FOREIGN KEY
This reverts the MySQL 5.7.2 change
377774689b
which introduced these problems. MariaDB 10.2.2 inherited these problems
in commit 2e814d4702.
The FOREIGN KEY CASCADE and SET NULL operations implemented as
procedural recursion are consuming more than 8 kilobytes of stack
(9 stack frames) per iteration in a non-debug GNU/Linux AMD64 build.
This is why we need to limit the maximum recursion depth to 15 steps
instead of the 255 that it used to be in MySQL 5.7 and MariaDB 10.2.
A corresponding change was made in MySQL 5.7.21 in
7b26dc98a6
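
A rough sketch of a depth-limited recursion, with the stack arithmetic
from the text. The TOY_ names and toy_cascade() are hypothetical, the
8 KiB figure is the lower bound quoted above, and how exactly the real
code counts steps at the boundary is not specified here.

```cpp
#include <cassert>
#include <cstddef>

constexpr unsigned TOY_MAX_CASCADE_DEPTH = 15;
constexpr std::size_t TOY_STACK_PER_STEP = 8 * 1024;  // lower bound, bytes

enum ToyErr { TOY_SUCCESS, TOY_TOO_DEEP };

// Each call models one cascade step; `remaining` is the number of further
// steps that the current step would trigger.
ToyErr toy_cascade(unsigned remaining, unsigned depth = 1)
{
    if (depth > TOY_MAX_CASCADE_DEPTH) {
        return TOY_TOO_DEEP;  // refuse instead of exhausting the stack
    }
    if (remaining == 0) {
        return TOY_SUCCESS;
    }
    return toy_cascade(remaining - 1, depth + 1);
}

// Lower bound on the stack consumed at a given recursion depth.
constexpr std::size_t toy_stack_lower_bound(unsigned depth)
{
    return depth * TOY_STACK_PER_STEP;
}
```

At 8 KiB per step, 255 steps would need at least 2088960 bytes (about
2 MiB) of stack, while 15 steps need at least 122880 bytes (120 KiB).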
MDEV-11415 Remove excessive undo logging during ALTER TABLE…ALGORITHM=COPY
Move a test from innodb.rename_table_debug to innodb.alter_copy.
ha_innobase::extra(HA_EXTRA_BEGIN_ALTER_COPY): Register id-versioned
tables so that mysql.transaction_registry will be updated, even for
empty tables that are subjected to ALTER TABLE…ALGORITHM=COPY.