This is motivated by Oracle MySQL Bug #27542720 SCHEMA MISMATCH
- TABLE FLAGS DON'T MATCH, BUT FLAGS ARE NUMBERS
but using a different approach.
row_import::match_schema(): In case of a mismatch, display the
ROW_FORMAT and optionally KEY_BLOCK_SIZE of the .cfg file.
dict0dict.cc
buf_LRU_drop_page_hash_for_tablespace(): Return whether any adaptive
hash index entries existed. If yes, the caller should keep retrying to
drop the adaptive hash index.
row_import_for_mysql(), row_truncate_table_for_mysql(),
row_drop_table_for_mysql(): Ensure that the adaptive hash index was
entirely dropped for the table.
fil_page_decompress(): Replaces fil_decompress_page().
Allow the caller detect errors. Remove
duplicated code. Use the "safe" instead of "fast" variants of
decompression routines.
fil_page_compress(): Replaces fil_compress_page().
The length of the input buffer always was srv_page_size (innodb_page_size).
Remove printouts, and remove the fil_space_t* parameter.
buf_tmp_buffer_t::reserved: Make private; the accessors acquire()
and release() will use atomic memory access.
buf_pool_reserve_tmp_slot(): Make static. Remove the second parameter.
Do not acquire any mutex. Remove the allocation of the buffers.
buf_tmp_reserve_crypt_buf(), buf_tmp_reserve_compression_buf():
Refactored away from buf_pool_reserve_tmp_slot().
buf_page_decrypt_after_read(): Make static, and simplify the logic.
Use the encryption buffer also for decompressing.
buf_page_io_complete(), buf_dblwr_process(): Check more failures.
fil_space_encrypt(): Simplify the debug checks.
fil_space_t::printed_compression_failure: Remove.
fil_get_compression_alg_name(): Remove.
fil_iterate(): Allocate a buffer for compression and decompression
only once, instead of allocating and freeing it for every page
that uses compression, during IMPORT TABLESPACE. Also, validate the
page checksum before decryption, and reduce the scope of some variables.
fil_page_is_index_page(), fil_page_is_lzo_compressed(): Remove (unused).
AbstractCallback::operator()(): Remove the parameter 'offset'.
The check for it in FetchIndexRootPages::operator() was basically
redundant and dead code since the previous refactoring.
fil_page_decompress(): Replaces fil_decompress_page().
Allow the caller detect errors. Remove
duplicated code. Use the "safe" instead of "fast" variants of
decompression routines.
fil_page_compress(): Replaces fil_compress_page().
The length of the input buffer always was srv_page_size (innodb_page_size).
Remove printouts, and remove the fil_space_t* parameter.
buf_tmp_buffer_t::reserved: Make private; the accessors acquire()
and release() will use atomic memory access.
buf_pool_reserve_tmp_slot(): Make static. Remove the second parameter.
Do not acquire any mutex. Remove the allocation of the buffers.
buf_tmp_reserve_crypt_buf(), buf_tmp_reserve_compression_buf():
Refactored away from buf_pool_reserve_tmp_slot().
buf_page_decrypt_after_read(): Make static, and simplify the logic.
Use the encryption buffer also for decompressing.
buf_page_io_complete(), buf_dblwr_process(): Check more failures.
fil_space_encrypt(): Simplify the debug checks.
fil_space_t::printed_compression_failure: Remove.
fil_get_compression_alg_name(): Remove.
fil_iterate(): Allocate a buffer for compression and decompression
only once, instead of allocating and freeing it for every page
that uses compression, during IMPORT TABLESPACE.
fil_node_get_space_id(), fil_page_is_index_page(),
fil_page_is_lzo_compressed(): Remove (unused code).
fil_iterate(): Invoke fil_encrypt_buf() correctly when
a ROW_FORMAT=COMPRESSED table with a physical page size of
innodb_page_size is being imported. Also, validate the page checksum
before decryption, and reduce the scope of some variables.
AbstractCallback::operator()(): Remove the parameter 'offset'.
The check for it in FetchIndexRootPages::operator() was basically
redundant and dead code since the previous refactoring.
Also fixes MDEV-14727, MDEV-14491
InnoDB: Error: Waited for 5 secs for hash index ref_count (1) to drop to 0
by replacing the flawed wait logic in dict_index_remove_from_cache_low().
On DISCARD TABLESPACE, there is no need to drop the adaptive hash index.
We must drop it on IMPORT TABLESPACE, and eventually on DROP TABLE or
DROP INDEX. As long as the dict_index_t object remains in the cache
and the table remains inaccessible, the adaptive hash index entries
to orphaned pages would not do any harm. They would be dropped when
buffer pool pages are reused for something else.
btr_search_drop_page_hash_when_freed(), buf_LRU_drop_page_hash_batch():
Remove the parameter zip_size, and pass 0 to buf_page_get_gen().
buf_page_get_gen(): Ignore zip_size if mode==BUF_PEEK_IF_IN_POOL.
buf_LRU_drop_page_hash_for_tablespace(): Drop the adaptive hash index
even if the tablespace is inaccessible.
buf_LRU_drop_page_hash_for_tablespace(): New global function, to drop
the adaptive hash index.
buf_LRU_flush_or_remove_pages(), fil_delete_tablespace():
Remove the parameter drop_ahi.
dict_index_remove_from_cache_low(): Actively drop the adaptive hash index
if entries exist. This should prevent InnoDB hangs on DROP TABLE or
DROP INDEX.
row_import_for_mysql(): Drop any adaptive hash index entries for the table.
row_drop_table_for_mysql(): Drop any adaptive hash index for the table,
except if the table resides in the system tablespace. (DISCARD TABLESPACE
does not apply to the system tablespace, and we do no want to drop the
adaptive hash index for other tables than the one that is being dropped.)
row_truncate_table_for_mysql(): Drop any adaptive hash index entries for
the table, except if the table resides in the system tablespace.
When Oracle fixed MDEV-13899 in their own way, they moved the
condition to the only caller of PageConverter::update_records().
Thus, the merge of 5.6.40 into MariaDB added a redundant condition.
PageConverter::update_records(): Move the page_is_leaf() condition
to the only caller, PageConverter::update_index_page().
Remove unused InnoDB function parameters and functions.
i_s_sys_virtual_fill_table(): Do not allocate heap memory.
mtr_is_block_fix(): Replace with mtr_memo_contains().
mtr_is_page_fix(): Replace with mtr_memo_contains_page().
The trx_t::undo_mutex covered both some main-memory data structures
(trx_undo_t) and access to undo pages. The trx_undo_t is only
accessed by the thread that is associated with a running transaction.
Likewise, each transaction has its private set of undo pages.
The thread that is associated with an active transaction may
lock multiple undo pages concurrently, but no other thread may
lock multiple pages of a foreign transaction.
Concurrent access to the undo logs of an active transaction is possible,
but trx_undo_get_undo_rec_low() only locks one undo page at a time,
without ever holding any undo_mutex.
It seems that the trx_t::undo_mutex would have been necessary if
multi-threaded execution or rollback of a single transaction
had been implemented in InnoDB.
Problem was that if tablespace was encrypted we try to copy
also page 0 from read buffer to write buffer that are in
that case the same memory area.
fil_iterate
When tablespace is encrypted or compressed its
first page (i.e. page 0) is not encrypted or
compressed and there is no need to copy buffer.
Problem was that if tablespace was encrypted we try to copy
also page 0 from read buffer to write buffer that are in
that case the same memory area.
fil_iterate
When tablespace is encrypted or compressed its
first page (i.e. page 0) is not encrypted or
compressed and there is no need to copy buffer.
Problem was that if tablespace was encrypted we try to copy
also page 0 from read buffer to write buffer that are in
that case the same memory area.
fil_iterate
When tablespace is encrypted or compressed its
first page (i.e. page 0) is not encrypted or
compressed and there is no need to copy buffer.
row_ins_sec_index_entry(): Compare a pointer to fil_system.sys_space,
not to a numeric constant. This code was recently changed in MDEV-13637,
and the condition was essentially disabled, potentially causing the
change buffer to grow uncontrollably when something is inserted into
a table that has secondary indexes and resides in the system tablespace.
Thanks to Daniel Black for pointing out that clang 7 flagged a warning
for the comparison of a pointer to an integer.
row_import_for_mysql(): Fix a possible compiler warning.
InnoDB always keeps all tablespaces in the fil_system cache.
The fil_system.LRU is only for closing file handles; the
fil_space_t and fil_node_t for all data files will remain
in main memory. Between startup to shutdown, they can only be
created and removed by DDL statements. Therefore, we can
let dict_table_t::space point directly to the fil_space_t.
dict_table_t::space_id: A numeric tablespace ID for the corner cases
where we do not have a tablespace. The most prominent examples are
ALTER TABLE...DISCARD TABLESPACE or a missing or corrupted file.
There are a few functional differences; most notably:
(1) DROP TABLE will delete matching .ibd and .cfg files,
even if they were not attached to the data dictionary.
(2) Some error messages will report file names instead of numeric IDs.
There still are many functions that use numeric tablespace IDs instead
of fil_space_t*, and many functions could be converted to fil_space_t
member functions. Also, Tablespace and Datafile should be merged with
fil_space_t and fil_node_t. page_id_t and buf_page_get_gen() could use
fil_space_t& instead of a numeric ID, and after moving to a single
buffer pool (MDEV-15058), buf_pool_t::page_hash could be moved to
fil_space_t::page_hash.
FilSpace: Remove. Only few calls to fil_space_acquire() will remain,
and gradually they should be removed.
mtr_t::set_named_space_id(ulint): Renamed from set_named_space(),
to prevent accidental calls to this slower function. Very few
callers remain.
fseg_create(), fsp_reserve_free_extents(): Take fil_space_t*
as a parameter instead of a space_id.
fil_space_t::rename(): Wrapper for fil_rename_tablespace_check(),
fil_name_write_rename(), fil_rename_tablespace(). Mariabackup
passes the parameter log=false; InnoDB passes log=true.
dict_mem_table_create(): Take fil_space_t* instead of space_id
as parameter.
dict_process_sys_tables_rec_and_mtr_commit(): Replace the parameter
'status' with 'bool cached'.
dict_get_and_save_data_dir_path(): Avoid copying the fil_node_t::name.
fil_ibd_open(): Return the tablespace.
fil_space_t::set_imported(): Replaces fil_space_set_imported().
truncate_t: Change many member function parameters to fil_space_t*,
and remove page_size parameters.
row_truncate_prepare(): Merge to its only caller.
row_drop_table_from_cache(): Assert that the table is persistent.
dict_create_sys_indexes_tuple(): Write SYS_INDEXES.SPACE=FIL_NULL
if the tablespace has been discarded.
row_import_update_discarded_flag(): Remove a constant parameter.
We can rely on the dict_table_t::space. All indexes of a table object
are always in the same tablespace. (For fulltext indexes, the data is
located in auxiliary tables, and these will continue to have their own
table objects, separate from the main table.)
fil_iterate(), fil_tablespace_iterate(): Replace os_file_read()
with os_file_read_no_error_handling().
os_file_read_func(), os_file_read_no_error_handling_func():
Do not retry partial reads. There used to be an infinite amount
of retries. Because InnoDB extends both data and log files upfront,
partial reads should be impossible during normal operation.
Initialize block.page.zip only once.
PageConverter::update(): Initialize m_page_zip_ptr
as late as possible.
(We should really remove it at some point.)
PageConverter::operator(): Refer to block->page.zip instead of
m_page_zip_ptr.
AbstractCallback::get_frame(): Define static. Refer
to block->page.zip.data directly.
fil_iterate(): Refer to block->page.zip.data directly.
fil_tablespace_iterate(): Initialize block.page.zip.data as soon
as possible.
Reduce unnecessary inter-module calls for IMPORT TABLESPACE.
Move some IMPORT-related code from fil0fil.cc to row0import.cc.
PageCallback: Remove. Make AbstractCallback the base class.
PageConverter: Define some member functions inline.
PageConverter::adjust_cluster_record(): Instead of writing
the invalid value DB_ROLL_PTR=0, write a value that indicates
a fresh insert, that is, prevents the DB_ROLL_PTR from being
dereferenced in any circumstances.
It can be argued that IMPORT TABLESPACE should actually
update the dict_index_t::trx_id to prevent older transactions
from accessing the table, similar to what I did on table
rebuild in MySQL 5.6.6 in
03f81a55f2
Rollback attempted to dereference DB_ROLL_PTR=0, which cannot possibly
be a valid undo log pointer. A safer canonical value would be
roll_ptr_t(1) << ROLL_PTR_INSERT_FLAG_POS
which is what was chosen in MDEV-12288, corresponding to reset_trx_id.
No deterministic test case for the bug was found. The simplest test
cases may be related to MDEV-11415, which suppresses undo logging for
ALGORITHM=COPY operations. In those operations, in the spirit of
MDEV-12288, we should actually have written reset_trx_id instead of
using the transaction identifier of the current transaction
(and a bogus value of DB_ROLL_PTR=0). However, thanks to MySQL Bug#28432
which I had fixed in MySQL 5.6.8 as part of WL#6255, access to the
rebuilt table by earlier-started transactions should actually have been
refused with ER_TABLE_DEF_CHANGED.
reset_trx_id: Move the definition to data0type.cc and the declaration
to data0type.h.
btr_cur_ins_lock_and_undo(): When undo logging is disabled, use the
safe value that corresponds to reset_trx_id.
btr_cur_optimistic_insert(): Validate the DB_TRX_ID,DB_ROLL_PTR before
inserting into a clustered index leaf page.
ins_node_t::sys_buf[]: Replaces row_id_buf and trx_id_buf and some
heap usage.
row_ins_alloc_sys_fields(): Init ins_node_t::sys_buf[] to reset_trx_id.
row_ins_buf(): Only if undo logging is enabled, copy trx->id
to node->sys_buf. Otherwise, rely on the initialization in
row_ins_alloc_sys_fields().
row_purge_reset_trx_id(): Invoke mlog_write_string() with reset_trx_id
directly. (No functional change.)
trx_undo_page_report_modify(): Assert that the DB_ROLL_PTR is not 0.
trx_undo_get_undo_rec_low(): Assert that the roll_ptr is valid before
trying to dereference it.
dict_index_t::is_primary(): Check if the index is the primary key.
PageConverter::adjust_cluster_record(): Fix
MDEV-15249 Crash in MVCC read after IMPORT TABLESPACE
by resetting the system fields to reset_trx_id instead of writing
the current transaction ID (which will be committed at the
end of the IMPORT TABLESPACE) and DB_ROLL_PTR=0.
This can partially be viewed as a follow-up fix of MDEV-12288,
because IMPORT should already then have written
DB_TRX_ID=0 and DB_ROLL_PTR=1<<55 to prevent unnecessary
DB_TRX_ID lookups in subsequent accesses to the table.