fil_page_decompress(): Replaces fil_decompress_page().
Allow the caller detect errors. Remove
duplicated code. Use the "safe" instead of "fast" variants of
decompression routines.
fil_page_compress(): Replaces fil_compress_page().
The length of the input buffer always was srv_page_size (innodb_page_size).
Remove printouts, and remove the fil_space_t* parameter.
buf_tmp_buffer_t::reserved: Make private; the accessors acquire()
and release() will use atomic memory access.
buf_pool_reserve_tmp_slot(): Make static. Remove the second parameter.
Do not acquire any mutex. Remove the allocation of the buffers.
buf_tmp_reserve_crypt_buf(), buf_tmp_reserve_compression_buf():
Refactored away from buf_pool_reserve_tmp_slot().
buf_page_decrypt_after_read(): Make static, and simplify the logic.
Use the encryption buffer also for decompressing.
buf_page_io_complete(), buf_dblwr_process(): Check more failures.
fil_space_encrypt(): Simplify the debug checks.
fil_space_t::printed_compression_failure: Remove.
fil_get_compression_alg_name(): Remove.
fil_iterate(): Allocate a buffer for compression and decompression
only once, instead of allocating and freeing it for every page
that uses compression, during IMPORT TABLESPACE. Also, validate the
page checksum before decryption, and reduce the scope of some variables.
fil_page_is_index_page(), fil_page_is_lzo_compressed(): Remove (unused).
AbstractCallback::operator()(): Remove the parameter 'offset'.
The check for it in FetchIndexRootPages::operator() was basically
redundant and dead code since the previous refactoring.
fil_page_decompress(): Replaces fil_decompress_page().
Allow the caller detect errors. Remove
duplicated code. Use the "safe" instead of "fast" variants of
decompression routines.
fil_page_compress(): Replaces fil_compress_page().
The length of the input buffer always was srv_page_size (innodb_page_size).
Remove printouts, and remove the fil_space_t* parameter.
buf_tmp_buffer_t::reserved: Make private; the accessors acquire()
and release() will use atomic memory access.
buf_pool_reserve_tmp_slot(): Make static. Remove the second parameter.
Do not acquire any mutex. Remove the allocation of the buffers.
buf_tmp_reserve_crypt_buf(), buf_tmp_reserve_compression_buf():
Refactored away from buf_pool_reserve_tmp_slot().
buf_page_decrypt_after_read(): Make static, and simplify the logic.
Use the encryption buffer also for decompressing.
buf_page_io_complete(), buf_dblwr_process(): Check more failures.
fil_space_encrypt(): Simplify the debug checks.
fil_space_t::printed_compression_failure: Remove.
fil_get_compression_alg_name(): Remove.
fil_iterate(): Allocate a buffer for compression and decompression
only once, instead of allocating and freeing it for every page
that uses compression, during IMPORT TABLESPACE.
fil_node_get_space_id(), fil_page_is_index_page(),
fil_page_is_lzo_compressed(): Remove (unused code).
Also fixes MDEV-14727, MDEV-14491
InnoDB: Error: Waited for 5 secs for hash index ref_count (1) to drop to 0
by replacing the flawed wait logic in dict_index_remove_from_cache_low().
On DISCARD TABLESPACE, there is no need to drop the adaptive hash index.
We must drop it on IMPORT TABLESPACE, and eventually on DROP TABLE or
DROP INDEX. As long as the dict_index_t object remains in the cache
and the table remains inaccessible, the adaptive hash index entries
to orphaned pages would not do any harm. They would be dropped when
buffer pool pages are reused for something else.
btr_search_drop_page_hash_when_freed(), buf_LRU_drop_page_hash_batch():
Remove the parameter zip_size, and pass 0 to buf_page_get_gen().
buf_page_get_gen(): Ignore zip_size if mode==BUF_PEEK_IF_IN_POOL.
buf_LRU_drop_page_hash_for_tablespace(): Drop the adaptive hash index
even if the tablespace is inaccessible.
buf_LRU_drop_page_hash_for_tablespace(): New global function, to drop
the adaptive hash index.
buf_LRU_flush_or_remove_pages(), fil_delete_tablespace():
Remove the parameter drop_ahi.
dict_index_remove_from_cache_low(): Actively drop the adaptive hash index
if entries exist. This should prevent InnoDB hangs on DROP TABLE or
DROP INDEX.
row_import_for_mysql(): Drop any adaptive hash index entries for the table.
row_drop_table_for_mysql(): Drop any adaptive hash index for the table,
except if the table resides in the system tablespace. (DISCARD TABLESPACE
does not apply to the system tablespace, and we do no want to drop the
adaptive hash index for other tables than the one that is being dropped.)
row_truncate_table_for_mysql(): Drop any adaptive hash index entries for
the table, except if the table resides in the system tablespace.
Problem:
Fix for Bug #21348684 (#Rb9581) introduced a conditional debug execute
'buf_pool_resize_chunk_null', which causes new chunks memory for 2nd
buffer pool instance is freed.
Buffer pool resize function removes all old chunks entry from
'buf_chunk_map_reg' and add new chunks entry into it. But when
'buf_pool_resize_chunk_null' is set true, 2nd buffer pool
instance's chunk entries are not added into 'buf_chunk_map_reg'.
When purge thread tries to access that buffer chunk, it leads to
debug assertion.
Fix:
Added old chunk entries into 'buf_chunk_map_reg' for 2nd buffer pool
instance when 'buf_pool_resize_chunk_null' debug condition is set to true.
Reviewed by: Jimmy <Jimmy.Yang@oracle.com>
RB: 18664
If the tablespace is dropped or truncated after the
space->is_stopping() check in fil_crypt_get_page_throttle_func(),
we would proceed to request the page, and eventually report a fatal
error.
buf_page_get_gen(): Do not retry reading if mode==BUF_GET_POSSIBLY_FREED.
lock_rec_block_validate(): Be prepared for a NULL return value when
invoking buf_page_get_gen() with mode=BUF_GET_POSSIBLY_FREED.
- Removed test if HA_FT_WTYPE == HA_KEYTYPE_FLOAT as this never worked
(HA_KEYTYPE_FLOAT is an enum)
- Define HA_FT_MAXLEN to 126 (was tested before but never defined)
There is only one redo log subsystem in InnoDB. Allocate the object
statically, to avoid unnecessary dereferencing of the pointer.
log_t::create(): Renamed from log_sys_init().
log_t::close(): Renamed from log_shutdown().
log_t::checkpoint_buf_ptr: Remove. Allocate log_t::checkpoint_buf
statically.
Bind more InnoDB parameters directly to MYSQL_SYSVAR and
remove "shadow variables".
innodb_change_buffering: Declare as ENUM, not STRING.
innodb_flush_method: Declare as ENUM, not STRING.
innodb_log_buffer_size: Bind directly to srv_log_buffer_size,
without rounding it to a multiple of innodb_page_size.
LOG_BUFFER_SIZE: Remove.
SysTablespace::normalize_size(): Renamed from normalize().
innodb_init_params(): A new function to initialize and validate
InnoDB startup parameters.
innodb_init(): Renamed from innobase_init(). Invoke innodb_init_params()
before actually trying to start up InnoDB.
srv_start(bool): Renamed from innobase_start_or_create_for_mysql().
Added the input parameter create_new_db.
SRV_ALL_O_DIRECT_FSYNC: Define only for _WIN32.
xb_normalize_init_values(): Merge to innodb_init_param().
fil_space_t::n_pending_ops, n_pending_ios: Use a combination of
fil_system.mutex and atomic memory access for protection.
fil_space_t::release(): Replaces fil_space_release().
Does not acquire fil_system.mutex.
fil_space_t::release_for_io(): Replaces fil_space_release_for_io().
Does not acquire fil_system.mutex.
InnoDB always keeps all tablespaces in the fil_system cache.
The fil_system.LRU is only for closing file handles; the
fil_space_t and fil_node_t for all data files will remain
in main memory. Between startup to shutdown, they can only be
created and removed by DDL statements. Therefore, we can
let dict_table_t::space point directly to the fil_space_t.
dict_table_t::space_id: A numeric tablespace ID for the corner cases
where we do not have a tablespace. The most prominent examples are
ALTER TABLE...DISCARD TABLESPACE or a missing or corrupted file.
There are a few functional differences; most notably:
(1) DROP TABLE will delete matching .ibd and .cfg files,
even if they were not attached to the data dictionary.
(2) Some error messages will report file names instead of numeric IDs.
There still are many functions that use numeric tablespace IDs instead
of fil_space_t*, and many functions could be converted to fil_space_t
member functions. Also, Tablespace and Datafile should be merged with
fil_space_t and fil_node_t. page_id_t and buf_page_get_gen() could use
fil_space_t& instead of a numeric ID, and after moving to a single
buffer pool (MDEV-15058), buf_pool_t::page_hash could be moved to
fil_space_t::page_hash.
FilSpace: Remove. Only few calls to fil_space_acquire() will remain,
and gradually they should be removed.
mtr_t::set_named_space_id(ulint): Renamed from set_named_space(),
to prevent accidental calls to this slower function. Very few
callers remain.
fseg_create(), fsp_reserve_free_extents(): Take fil_space_t*
as a parameter instead of a space_id.
fil_space_t::rename(): Wrapper for fil_rename_tablespace_check(),
fil_name_write_rename(), fil_rename_tablespace(). Mariabackup
passes the parameter log=false; InnoDB passes log=true.
dict_mem_table_create(): Take fil_space_t* instead of space_id
as parameter.
dict_process_sys_tables_rec_and_mtr_commit(): Replace the parameter
'status' with 'bool cached'.
dict_get_and_save_data_dir_path(): Avoid copying the fil_node_t::name.
fil_ibd_open(): Return the tablespace.
fil_space_t::set_imported(): Replaces fil_space_set_imported().
truncate_t: Change many member function parameters to fil_space_t*,
and remove page_size parameters.
row_truncate_prepare(): Merge to its only caller.
row_drop_table_from_cache(): Assert that the table is persistent.
dict_create_sys_indexes_tuple(): Write SYS_INDEXES.SPACE=FIL_NULL
if the tablespace has been discarded.
row_import_update_discarded_flag(): Remove a constant parameter.
fil_space_t::atomic_write_supported: Always set this flag for
TEMPORARY TABLESPACE and during IMPORT TABLESPACE. The page
writes during these operations are by definition not crash-safe
because they are not written to the redo log.
fil_space_t::use_doublewrite(): Determine if doublewrite should
be used.
buf_dblwr_update(): Add assertions, and let the caller check whether
doublewrite buffering is desired.
buf_flush_write_block_low(): Disable the doublewrite buffer for
the temporary tablespace and for IMPORT TABLESPACE.
fil_space_set_imported(), fil_node_open_file(), fil_space_create():
Initialize or revise the space->atomic_write_supported flag.
buf_page_io_complete(), buf_flush_write_complete(): Add the parameter
dblwr, to indicate whether doublewrite was used for writes.
buf_dblwr_sync_datafiles(): Remove an unnecessary flush of
persistent tablespaces when flushing temporary tablespaces.
(Move the call to buf_dblwr_flush_buffered_writes().)
There is only one lock_sys. Allocate it statically in order to avoid
dereferencing a pointer whenever accessing it. Also, align some
members to their own cache line in order to avoid false sharing.
lock_sys_t::create(): The deferred constructor.
lock_sys_t::close(): The early destructor.
Note: Linux only
Core dumps of large buffer pool pages take time and space
and pose potential data expose in scenarios where data-at-rest
encryption is deployed.
Here we use madvise(MADV_DONT_DUMP) on large memory allocations
used by the innodb buffer pool, log_sys and recv_sys. The effect
of this system call is that these memory areas will not appear in
a core dump. Data from these buffers is rarely useful in fault
diagnosis.
log_sys and recv_sys structures now use large memory allocations
for their large buffer.
Debug builds don't include the madvise syscall and as such will
include full core dumps.
A function, buf_madvise_do_dump, is added but never called. It
is there to be called from a debugger to re-enable the core
dumping of all of these pages if for some reason the entire
contents of these buffers are needed.
Idea thanks to Hartmut Holzgraefe
Before that line there is call to buf_page_get_gen that could
return block = NULL when decrypting a page fails. However,
we should set error to be != DB_SUCCESS also. In error log
there was error about decompression but in that code there
is one case where error is not set correctly.
Import and adjust the innodb.innodb_buffer_pool_resize tests,
except innodb.innodb_buffer_pool_resize_debug, which would time out.
buf_pool_clear_hash_index(): Adjust assertions.
There is only one transaction system object in InnoDB.
Allocate the storage for it at link time, not at runtime.
lock_rec_fetch_page(): Use the correct fetch mode BUF_GET.
Pages may never be deallocated from a tablespace while
record locks are pointing to them.
trx_sys_t::rw_trx_set is implemented as std::set, which does a few quite
expensive operations under trx_sys_t::mutex protection: e.g. malloc/free
when adding/removing elements. Traversing b-tree is not that cheap either.
This has negative scalability impact, which is especially visible when running
oltp_update_index.lua benchmark on a ramdisk.
To reduce trx_sys_t::mutex contention std::set is replaced with LF_HASH. None
of LF_HASH operations require trx_sys_t::mutex (nor any other global mutex)
protection.
Another interesting issue observed with std::set is reproducible ~2% performance
decline after benchmark is ran for ~60 seconds. With LF_HASH results are stable.
All in all this patch optimises away one of three trx_sys->mutex locks per
oltp_update_index.lua query. The other two critical sections became smaller.
Relevant clean-ups:
Replaced rw_trx_set iteration at startup with local set. The latter is needed
because values inserted to rw_trx_list must be ordered by trx->id.
Removed redundant conditions from trx_reference(): it is (and even was) never
called with transactions that have trx->state == TRX_STATE_COMMITTED_IN_MEMORY.
do_ref_count doesn't (and probably even didn't) make any sense: now it is called
only when reference counter increment is actually requested.
Moved condition out of mutex in trx_erase_lists().
trx_rw_is_active(), trx_rw_is_active_low() and trx_get_rw_trx_by_id() were
greatly simplified and replaced by appropriate trx_rw_hash_t methods.
Compared to rw_trx_set, rw_trx_hash holds transactions only in PREPARED or
ACTIVE states. Transactions in COMMITTED state were required to be found
at InnoDB startup only. They are now looked up in the local set.
Removed unused trx_assert_recovered().
Removed unused innobase_get_trx() declaration.
Removed rather semantically incorrect trx_sys_rw_trx_add().
Moved information printout from trx_sys_init_at_db_start() to
trx_lists_init_at_db_start().
Ever since MDEV-10813 cleaned up InnoDB use of atomic memory operations
and made buf_block_fix() an atomic operation, some conditions around
buf_block_fix() have been unnecessary.
With a big buffer pool that contains many data pages,
DISCARD TABLESPACE took a long time, because it would scan the
entire buffer pool to remove any pages that belong to the tablespace.
With a large buffer pool, this would take a lot of time, especially
when the table-to-discard is empty.
The minimum amount of work that DISCARD TABLESPACE must do is to
remove the pages of the to-be-discarded table from the
buf_pool->flush_list because any writes to the data file must be
prevented before the file is deleted.
If DISCARD TABLESPACE does not evict the pages from the buffer pool,
then IMPORT TABLESPACE must do it, because we must prevent pre-DISCARD,
not-yet-evicted pages from being mistaken for pages of the imported
tablespace.
It would not be a useful fix to simply move the buffer pool scan to
the IMPORT TABLESPACE step. What we can do is to actively evict those
pages that could be mistaken for imported pages. In this way, when
importing a small table into a big buffer pool, the import should
still run relatively fast.
Import is bypassing the buffer pool when reading pages for the
adjustment phase. In the adjustment phase, if a page exists in
the buffer pool, we could replace it with the page from the imported
file. Unfortunately I did not get this to work properly, so instead
we will simply evict any matching page from the buffer pool.
buf_page_get_gen(): Implement BUF_EVICT_IF_IN_POOL, a new mode
where the requested page will be evicted if it is found. There
must be no unwritten changes for the page.
buf_remove_t: Remove. Instead, use trx!=NULL to signify that a write
to file is desired, and use a separate parameter bool drop_ahi.
buf_LRU_flush_or_remove_pages(), fil_delete_tablespace():
Replace buf_remove_t.
buf_LRU_remove_pages(), buf_LRU_remove_all_pages(): Remove.
PageConverter::m_mtr: A dummy mini-transaction buffer
PageConverter::PageConverter(): Complete the member initialization list.
PageConverter::operator()(): Evict any 'shadow' pages from the
buffer pool so that pre-existing (garbage) pages cannot be mistaken
for pages that exist in the being-imported file.
row_discard_tablespace(): Remove a bogus comment that seems to
refer to IMPORT TABLESPACE, not DISCARD TABLESPACE.
ibuf_check_bitmap_on_import(): Only access the pages that
are below FSP_FREE_LIMIT. It is possible that especially with
ROW_FORMAT=COMPRESSED, the FSP_SIZE will be much bigger than
the FSP_FREE_LIMIT, and the bitmap pages (page_size*N, 1+page_size*N)
are filled with zero bytes.
buf_page_is_corrupted(), buf_page_io_complete(): Make the
fault injection compatible with MariaDB 10.2.
Backport the IMPORT tests from 10.2.