- In 10.6, trx_rseg_t mutex was ported to use latch. As part of this porting
profiling of the patch was removed. This patch reenables it given that
the said latch continues to occupy the top-slots in the contention list.
A prominent bottleneck in mtr_t::commit() is log_sys.mutex between
log_sys.append_prepare() and log_close().
User-visible change: The minimum innodb_log_file_size will be
increased from 1MiB to 4MiB so that some conditions can be
trivially satisfied.
log_sys.latch (log_latch): Replaces log_sys.mutex and
log_sys.flush_order_mutex. Copying mtr_t::m_log to
log_sys.buf is protected by a shared log_sys.latch.
Writes from log_sys.buf to the file system will be protected
by an exclusive log_sys.latch.
log_sys.lsn_lock: Protects the allocation of log buffer
in log_sys.append_prepare().
sspin_lock: A simple spin lock, for log_sys.lsn_lock.
Thanks to Vladislav Vaintroub for suggesting this idea, and for
reviewing these changes.
mariadb-backup: Replace some use of log_sys.mutex with recv_sys.mutex.
buf_pool_t::insert_into_flush_list(): Implement sorting of flush_list
because ordering is otherwise no longer guaranteed. Ordering by LSN
is needed for the proper operation of redo log checkpoints.
log_sys.append_prepare(): Advance log_sys.lsn and log_sys.buf_free by
the length, and return the old values. Also increment write_to_buf,
which was previously done in log_close().
mtr_t::finish_write(): Obtain the buffer pointer from
log_sys.append_prepare().
log_sys.buf_free: Make the field Atomic_relaxed,
to simplify log_flush_margin(). Use only loads and stores
to avoid costly read-modify-write atomic operations.
buf_pool.flush_list_requests: Replaces
export_vars.innodb_buffer_pool_write_requests
and srv_stats.buf_pool_write_requests.
Protected by buf_pool.flush_list_mutex.
buf_pool_t::insert_into_flush_list(): Do not invoke page_cleaner_wakeup().
Let the caller do that after a batch of calls.
recv_recover_page(): Invoke a minimal part of
buf_pool.insert_into_flush_list().
ReleaseBlocks::modified: A number of pages added to buf_pool.flush_list.
ReleaseBlocks::operator(): Merge buf_flush_note_modification() here.
log_t::set_capacity(): Renamed from log_set_capacity().
In the parent commit, dict_sys.latch could theoretically have been
replaced with a mutex. But, we can do better and merge dict_sys.mutex
into dict_sys.latch. Generally, every occurrence of dict_sys.mutex_lock()
will be replaced with dict_sys.lock().
The PERFORMANCE_SCHEMA instrumentation for dict_sys_mutex
will be removed along with dict_sys.mutex. The dict_sys.latch
will remain instrumented as dict_operation_lock.
Some use of dict_sys.lock() will be replaced with dict_sys.freeze(),
which we will reintroduce for the new shared mode. Most notably,
concurrent table lookups are possible as long as the tables are present
in the dict_sys cache. In particular, this will allow more concurrency
among InnoDB purge workers.
Because dict_sys.mutex will no longer 'throttle' the threads that purge
InnoDB transaction history, a performance degradation may be observed
unless innodb_purge_threads=1.
The table cache eviction policy will become FIFO-like,
similar to what happened to fil_system.LRU
in commit 45ed9dd957.
The name of the list dict_sys.table_LRU will become somewhat misleading;
that list contains tables that may be evicted, even though the
eviction policy no longer is least-recently-used but first-in-first-out.
(Note: Tables can never be evicted as long as locks exist on them or
the tables are in use by some thread.)
As demonstrated by the test perfschema.sxlock_func, there
will be less contention on dict_sys.latch, because some previous
use of exclusive latches will be replaced with shared latches.
fts_parse_sql_no_dict_lock(): Replaced with pars_sql().
fts_get_table_name_prefix(): Merged to fts_optimize_create().
dict_stats_update_transient_for_index(): Deduplicated some code.
ha_innobase::info_low(), dict_stats_stop_bg(): Use a combination
of dict_sys.latch and table->stats_mutex_lock() to cover the
changes of BG_STAT_SHOULD_QUIT, because the flag is being read
in dict_stats_update_persistent() while not holding dict_sys.latch.
row_discard_tablespace_for_mysql(): Protect stats_bg_flag by
exclusive dict_sys.latch, like most other code does.
row_quiesce_table_has_fts_index(): Remove unnecessary mutex
acquisition. FLUSH TABLES...FOR EXPORT is protected by MDL.
row_import::set_root_by_heuristic(): Remove unnecessary mutex
acquisition. ALTER TABLE...IMPORT TABLESPACE is protected by MDL.
row_ins_sec_index_entry_low(): Replace a call
to dict_set_corrupted_index_cache_only(). Reads of index->type
were not really protected by dict_sys.mutex, and writes
(flagging an index corrupted) should be extremely rare.
dict_stats_process_entry_from_defrag_pool(): Only freeze the dictionary,
do not lock it exclusively.
dict_stats_wait_bg_to_stop_using_table(), DICT_BG_YIELD: Remove trx.
We can simply invoke dict_sys.unlock() and dict_sys.lock() directly.
dict_acquire_mdl_shared()<trylock=false>: Assert that dict_sys.latch is
only held in shared more, not exclusive mode. Only acquire it in
exclusive mode if the table needs to be loaded to the cache.
dict_sys_t::acquire(): Remove. Relocating elements in dict_sys.table_LRU
would require holding an exclusive latch, which we want to avoid
for performance reasons.
dict_sys_t::allow_eviction(): Add the table first to dict_sys.table_LRU,
to compensate for the removal of dict_sys_t::acquire(). This function
is only invoked by INFORMATION_SCHEMA.INNODB_SYS_TABLESTATS.
dict_table_open_on_id(), dict_table_open_on_name(): If dict_locked=false,
try to acquire dict_sys.latch in shared mode. Only acquire the latch in
exclusive mode if the table is not found in the cache.
Reviewed by: Thirunarayanan Balathandayuthapani
For now, we will acquire the lock_sys.latch only in exclusive mode,
that is, use it as a mutex.
This is preparation for the next commit where we will introduce
a less intrusive alternative, combining a shared lock_sys.latch
with dict_table_t::lock_mutex or a mutex embedded in
lock_sys.rec_hash, lock_sys.prdt_hash, or lock_sys.prdt_page_hash.
InnoDB buffer pool block and index tree latches depend on a
special kind of read-update-write lock that allows reentrant
(recursive) acquisition of the 'update' and 'write' locks
as well as an upgrade from 'update' lock to 'write' lock.
The 'update' lock allows any number of reader locks from
other threads, but no concurrent 'update' or 'write' lock.
If there were no requirement to support an upgrade from 'update'
to 'write', we could compose the lock out of two srw_lock
(implemented as any type of native rw-lock, such as SRWLOCK on
Microsoft Windows). Removing this requirement is very difficult,
so in commit f7e7f487d4b06695f91f6fbeb0396b9d87fc7bbf we
implemented an 'update' mode to our srw_lock.
Re-entrant or recursive locking is mostly needed when writing or
freeing BLOB pages, but also in crash recovery or when merging
buffered changes to an index page. The re-entrancy allows us to
attach a previously acquired page to a sub-mini-transaction that
will be committed before whatever else is holding the page latch.
The SUX lock supports Shared ('read'), Update, and eXclusive ('write')
locking modes. The S latches are not re-entrant, but a single S latch
may be acquired even if the thread already holds an U latch.
The idea of the U latch is to allow a write of something that concurrent
readers do not care about (such as the contents of BTR_SEG_LEAF,
BTR_SEG_TOP and other page allocation metadata structures, or
the MDEV-6076 PAGE_ROOT_AUTO_INC). (The PAGE_ROOT_AUTO_INC field
is only updated when a dict_table_t for the table exists, and only
read when a dict_table_t for the table is being added to dict_sys.)
block_lock::u_lock_try(bool for_io=true) is used in buf_flush_page()
to allow concurrent readers but no concurrent modifications while the
page is being written to the data file. That latch will be released
by buf_page_write_complete() in a different thread. Hence, we use
the special lock owner value FOR_IO.
The index_lock::u_lock() improves concurrency on operations that
involve non-leaf index pages.
The interface has been cleaned up a little. We will use
x_lock_recursive() instead of x_lock() when we know that a
lock is already held by the current thread. Similarly,
a lock upgrade from U to X is only allowed via u_x_upgrade()
or x_lock_upgraded() but not via x_lock().
We will disable the LatchDebug and sync_array interfaces to
InnoDB rw-locks.
The SEMAPHORES section of SHOW ENGINE INNODB STATUS output
will no longer include any information about InnoDB rw-locks,
only TTASEventMutex (cmake -DMUTEXTYPE=event) waits.
This will make a part of the 'innotop' script dead code.
The block_lock buf_block_t::lock will not be covered by any
PERFORMANCE_SCHEMA instrumentation.
SHOW ENGINE INNODB MUTEX and INFORMATION_SCHEMA.INNODB_MUTEXES
will no longer output source code file names or line numbers.
The dict_index_t::lock will be identified by index and table names,
which should be much more useful. PERFORMANCE_SCHEMA is lumping
information about all dict_index_t::lock together as
event_name='wait/synch/sxlock/innodb/index_tree_rw_lock'.
buf_page_free(): Remove the file,line parameters. The sux_lock will
not store such diagnostic information.
buf_block_dbg_add_level(): Define as empty macro, to be removed
in a subsequent commit.
Unless the build was configured with cmake -DPLUGIN_PERFSCHEMA=NO
the index_lock dict_index_t::lock will be instrumented via
PERFORMANCE_SCHEMA. Similar to
commit 1669c8890c
we will distinguish lock waits by registering shared_lock,exclusive_lock
events instead of try_shared_lock,try_exclusive_lock.
Actual 'try' operations will not be instrumented at all.
rw_lock_list: Remove. After MDEV-24167, this only covered
buf_block_t::lock and dict_index_t::lock. We will output their
information by traversing buf_pool or dict_sys.
The extension of the test perfschema.sxlock_func in
commit 1669c8890c
turned out to be unstable.
Let us filter out purge_sys.latch (trx_purge_latch) from the output,
because it might happen that the purge tasks will not be executed
during the test execution.
Let us try to avoid code bloat for the common case that
performance_schema is disabled at runtime, and use
ATTRIBUTE_NOINLINE member functions for instrumented latch acquisition.
Also, let us distinguish lock waits from non-contended lock requests
by using write_lock,read_lock for the requests that lead to waits,
and try_write_lock,try_read_lock for the wait-free lock acquisitions.
Actual 'try' operations are not being instrumented at all.
We must avoid acquiring a latch while we are already holding one.
The tablespace latch was being acquired recursively in some
operations that allocate or free pages.
fts_cache_t::init_lock: Replace with mutex. This was only acquired
in exclusive mode.
fts_cache_t:🔒 Replace with mutex. The only read-lock user was
i_s_fts_index_cache_fill() for producing content for the view
INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE.
Many InnoDB rw-locks unnecessarily depend on the complex
InnoDB rw_lock_t implementation that support the SX lock mode
as well as recursive acquisition of X or SX locks.
One of them is the bunch of adaptive hash index search latches,
instrumented as btr_search_latch in PERFORMANCE_SCHEMA.
Let us introduce a simpler lock for those in order to
reduce overhead.
srw_lock: A simple read-write lock that does not support recursion.
On Microsoft Windows, this wraps SRWLOCK, only adding
runtime overhead if PERFORMANCE_SCHEMA is enabled.
On Linux (all architectures), this is implemented with
std::atomic<uint32_t> and the futex system call.
On other platforms, we will wrap mysql_rwlock_t with
zero runtime overhead.
The PERFORMANCE_SCHEMA instrumentation differs
from InnoDB rw_lock_t in that we will only invoke
PSI_RWLOCK_CALL(start_rwlock_wrwait) or
PSI_RWLOCK_CALL(start_rwlock_rdwait)
if there is an actual conflict.
We always defined PFS_SKIP_BUFFER_MUTEX_RWLOCK, that is,
the latches of the buffer pool blocks were never instrumented
in PERFORMANCE_SCHEMA.
For some reason, the debug_latch (which enforce proper usage of
buffer-fixing in debug builds) was instrumented.
The rw_lock_s_lock() calls for the buf_pool.page_hash became a
clear bottleneck after MDEV-15053 reduced the contention on
buf_pool.mutex. We will replace that use of rw_lock_t with a
special implementation that is optimized for memory bus traffic.
The hash_table_locks instrumentation will be removed.
buf_pool_t::page_hash: Use a special implementation whose API is
compatible with hash_table_t, and store the custom rw-locks
directly in buf_pool.page_hash.array, intentionally sharing
cache lines with the hash table pointers.
rw_lock: A low-level rw-lock implementation based on std::atomic<uint32_t>
where read_trylock() becomes a simple fetch_add(1).
buf_pool_t::page_hash_latch: The special of rw_lock for the page_hash.
buf_pool_t::page_hash_latch::read_lock(): Assert that buf_pool.mutex
is not being held by the caller.
buf_pool_t::page_hash_latch::write_lock() may be called while not holding
buf_pool.mutex. buf_pool_t::watch_set() is such a caller.
buf_pool_t::page_hash_latch::read_lock_wait(),
page_hash_latch::write_lock_wait(): The spin loops.
These will obey the global parameters innodb_sync_spin_loops and
innodb_sync_spin_wait_delay.
buf_pool_t::freed_page_hash: A singly linked list of copies of
buf_pool.page_hash that ever existed. The fact that we never
free any buf_pool.page_hash.array guarantees that all
page_hash_latch that ever existed will remain valid until shutdown.
buf_pool_t::resize_hash(): Replaces buf_pool_resize_hash().
Prepend a shallow copy of the old page_hash to freed_page_hash.
buf_pool_t::page_hash_table::n_cells: Declare as Atomic_relaxed.
buf_pool_t::page_hash_table::lock(): Explain what prevents a
race condition with buf_pool_t::resize_hash().