In commit ad6171b91c (MDEV-22456)
we removed the acquisition of the adaptive hash index latch
from the caller of btr_search_update_hash_ref().
The tests innodb.innodb_buffer_pool_resize_with_chunks
and innodb.innodb_buffer_pool_resize
would occasionally fail starting with 10.3,
due to MDEV-12288 causing more purge activity during the test.
btr_search_update_hash_ref(): After acquiring the adaptive hash index
latch, check that the adaptive hash index is still enabled on the page.
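A minimal sketch of the resulting pattern, using standard C++ stand-ins
(Block, SearchSys and update_hash_ref are hypothetical simplifications,
not the actual InnoDB types):

    #include <mutex>

    // Hypothetical stand-ins for the InnoDB structures involved.
    struct Block { bool ahi_enabled = true; /* ... hash fields ... */ };
    struct SearchSys { std::mutex latch; };

    // The latch is now acquired inside the function rather than by the
    // caller, so the AHI state must be re-validated after the latch is
    // taken: another thread may have disabled it while we waited.
    void update_hash_ref(SearchSys& ahi, Block& block)
    {
        std::lock_guard<std::mutex> guard(ahi.latch);
        if (!block.ahi_enabled)
            return;  // the adaptive hash index was dropped meanwhile
        // ... safe to update the hash reference here ...
    }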
Problem:
=======
- During an ALTER TABLE rebuild, the documents read from the old table
are tokenized in parallel by innodb_ft_sort_pll_degree threads and
stored in their respective merge files. While doing the parallel merge,
InnoDB wrongly skips the root-level selection of the merge buffer records.
This leads to merge records being inserted in non-ascending order.
Solution:
==========
Build the selection tree for the root level as well, so that the root of
the selection tree always yields the records in sorted order; a sketch of
the idea follows.
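A minimal sketch, using a std::priority_queue in place of InnoDB's
selection tree (merge_runs is a hypothetical name; the real code merges
tokenized records from the per-thread merge files):

    #include <cstddef>
    #include <functional>
    #include <queue>
    #include <utility>
    #include <vector>

    // k-way merge of per-thread sorted runs. The root of the selection
    // (tournament) tree always holds the global minimum; skipping the
    // selection at the root level would emit records in per-run order
    // instead of globally sorted order.
    std::vector<int> merge_runs(const std::vector<std::vector<int>>& runs)
    {
        using Entry = std::pair<int, std::size_t>;  // (value, run index)
        std::priority_queue<Entry, std::vector<Entry>,
                            std::greater<Entry>> tree;
        std::vector<std::size_t> pos(runs.size(), 0);
        for (std::size_t i = 0; i < runs.size(); i++)
            if (!runs[i].empty()) tree.push({runs[i][0], i});
        std::vector<int> out;
        while (!tree.empty()) {
            auto [v, i] = tree.top();  // root = current global minimum
            tree.pop();
            out.push_back(v);
            if (++pos[i] < runs[i].size())
                tree.push({runs[i][pos[i]], i});
        }
        return out;
    }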
The rw_lock_stats were incorrectly updated.
While the global statistics have limited usefulness, we cannot
remove them from a GA version. This contribution slightly
improves performance in write workloads.
If the InnoDB buffer pool contains many pages for a table or index
that is being dropped or rebuilt, and if many of such pages are
pointed to by the adaptive hash index, dropping the adaptive hash index
may consume a lot of time.
The time-consuming operation of dropping the adaptive hash index entries
is being executed while the InnoDB data dictionary cache dict_sys is
exclusively locked.
It is not actually necessary to drop all adaptive hash index entries
at the time a table or index is being dropped or rebuilt. We can let
the LRU replacement policy of the buffer pool take care of this gradually.
For this to work, we must detach the dict_table_t and dict_index_t
objects from the main dict_sys cache, and once the last
adaptive hash index entry for the detached table is removed
(when the garbage page is evicted from the buffer pool) we can free
the dict_table_t and dict_index_t object.
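A simplified model of the lazy-free scheme (hypothetical names; the real
code performs this under the appropriate latches):

    #include <atomic>
    #include <cstdint>

    struct Index {
        std::atomic<uint32_t> n_ahi_pages{0};  // AHI pages referencing this index
        uint32_t page = 0;                     // root page; 1 == "freed" sentinel
        void set_freed() { page = 1; }
        bool freed() const { return page == 1; }
    };

    // Called when a garbage page is evicted from the buffer pool: once the
    // last AHI reference to a detached (freed) index is gone, the object
    // itself can finally be deleted.
    void on_ahi_page_evicted(Index* index)
    {
        if (index->n_ahi_pages.fetch_sub(1) == 1 && index->freed())
            delete index;
    }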
Related to this, in MDEV-16283, we made ALTER TABLE...DISCARD TABLESPACE
skip both the buffer pool eviction and the drop of the adaptive hash index.
We shifted the burden to ALTER TABLE...IMPORT TABLESPACE or DROP TABLE.
We can remove the eviction from DROP TABLE. We must retain the eviction
in the ALTER TABLE...IMPORT TABLESPACE code path, so that in case the
discarded table is being re-imported with the same tablespace identifier,
the fresh data from the imported tablespace will replace any stale pages
in the buffer pool.
rpl.rpl_failed_drop_tbl_binlog: Remove the test. DROP TABLE can
no longer be interrupted inside InnoDB.
fseg_free_page(), fseg_free_step(), fseg_free_step_not_header(),
fseg_free_page_low(), fseg_free_extent(): Remove the parameter
that specifies whether the adaptive hash index should be dropped.
btr_search_lazy_free(): Lazily free an index when the last
reference to it is dropped from the adaptive hash index.
buf_pool_clear_hash_index(): Declare static, and move to the
same compilation unit with the bulk of the adaptive hash index
code.
dict_index_t::clone(), dict_index_t::clone_if_needed():
Clone an index that is being rebuilt while adaptive hash index
entries exist. The original index will be inserted into
dict_table_t::freed_indexes and dict_index_t::set_freed()
will be called.
dict_index_t::set_freed(), dict_index_t::freed(): Note that the index
has been freed, or check whether it has been. We will use the
impossible page number 1 to denote this condition.
dict_index_t::n_ahi_pages(): Replaces btr_search_info_get_ref_count().
dict_index_t::detach_columns(): Move the assignment n_fields=0
to ha_innobase_inplace_ctx::clear_added_indexes().
We must have access to the columns when freeing the
adaptive hash index. Note: dict_table_t::v_cols[] will remain
valid. If virtual columns are dropped or added, the table
definition will be reloaded in ha_innobase::commit_inplace_alter_table().
buf_page_mtr_lock(): Drop a stale adaptive hash index if needed.
We will also reduce the number of btr_get_search_latch() calls
and enclose some more code inside #ifdef BTR_CUR_HASH_ADAPT
in order to benefit cmake -DWITH_INNODB_AHI=OFF.
The problem was that the trx->lock.was_chosen_as_wsrep_victim flag was
not reset to false after it had been set to true.
wsrep_thd_bf_abort
Add assertions for correct mutex status and take necessary
mutexes before calling thd->awake_no_mutex().
innobase_rollback_trx()
Reset trx->lock.was_chosen_as_wsrep_victim
wsrep_abort_slave_trx()
Removed unused function.
wsrep_innobase_kill_one_trx()
Added function comment, removed unnecessary parameters
and added debug assertions to enforce correct usage. Added
more debug output to help out on error analysis.
wsrep_abort_transaction()
Added debug assertions and removed unused variables.
trx0trx.h
Removed assert_trx_is_free macro and replaced it with
assert_freed() member function.
trx_create()
Use the above assert_freed() and initialize the wsrep variables.
trx_free()
Use assert_freed()
trx_t::commit_in_memory()
Reset lock.was_chosen_as_wsrep_victim
trx_rollback_for_mysql()
Reset trx->lock.was_chosen_as_wsrep_victim
Added test case galera_bf_kill.
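A sketch of the invariant these resets restore (Trx is a hypothetical
stand-in): every path that ends the transaction clears the flag, so a
pooled trx object is never reused with it still set.

    struct Trx { bool was_chosen_as_wsrep_victim = false; };

    void commit_in_memory(Trx& trx)   { /* ... */ trx.was_chosen_as_wsrep_victim = false; }
    void rollback_for_mysql(Trx& trx) { /* ... */ trx.was_chosen_as_wsrep_victim = false; }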
On a checksum failure of a ROW_FORMAT=COMPRESSED page,
buf_LRU_free_one_page() would invoke buf_LRU_block_remove_hashed()
which will read the uncompressed page frame, although it would not
be initialized. With bad enough luck, fil_page_get_type(page)
could return an unrecognized value and cause the server to abort.
buf_page_io_complete(): On the corruption of a ROW_FORMAT=COMPRESSED
page, zerofill the uncompressed page frame.
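A minimal sketch of the fix (hypothetical function name): zero-fill the
never-decompressed frame so that later readers such as
fil_page_get_type() see well-defined bytes instead of garbage.

    #include <cstddef>
    #include <cstring>

    void on_compressed_page_corrupted(unsigned char* frame, std::size_t page_size)
    {
        std::memset(frame, 0, page_size);  // frame was never initialized
    }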
This essentially reverts commit b393e2cb0c.
The leak might have been fixed, but because the
DEBUG_SYNC instrumentation for InnoDB purge threads was reverted
in 10.5 commit 5e62b6a5e0
as part of introducing a thread pool, it is easiest to revert
the entire change.
- There are multiple inconsistencies and incorrect ways in which the
rw-lock stats are calculated.
- shared rw-lock stats:
"rounds" counter is incremented only once for N rounds done
in spin-cycle.
- all rw-lock stats:
If the spin cycle is short-circuited, then the attempts are counted again.
[If a spin cycle is interrupted before it completes
srv_n_spin_wait_rounds (default 30) rounds, spin_count is incremented
to account for it. If the thread resumes the spin cycle (because the
locks are still unavailable) and is again interrupted, or completes,
spin_count is again incremented by the total count, failing to adjust
for the increment already made for the previous attempt.]
- s/x rw-lock stats:
The spin_loop counter is not incremented at all; instead, it is reported
as 0 (in the SHOW ENGINE output), and the division that calculates
spin rounds per spin loop is adjusted to compensate.
As per the original semantics, the spin_loop counter should be
incremented once per spin-loop execution.
- sx rw-lock stats:
SX locks increment the spin_loop counter, but instead of incrementing it
once per spin-loop invocation, they do so multiple times, based on how
many times the spin-loop flow is repeated for the same instance after an
OS wait. (A sketch of the intended counter semantics follows this list.)
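A sketch of the intended semantics (hypothetical names; the real code
uses InnoDB's rw_lock_stats counters): spin_loop is incremented exactly
once per entry into the spin cycle, and rounds once per iteration
actually spun, even when the cycle is short-circuited.

    #include <atomic>
    #include <cstdint>

    struct SpinStats {
        std::atomic<uint64_t> spin_loop{0};  // spin cycles entered
        std::atomic<uint64_t> rounds{0};     // total iterations spun
    };

    bool spin_wait(SpinStats& stats, std::atomic<bool>& lock_free,
                   unsigned max_rounds /* srv_n_spin_wait_rounds */)
    {
        stats.spin_loop.fetch_add(1, std::memory_order_relaxed);
        for (unsigned i = 0; i < max_rounds; i++) {
            if (lock_free.load(std::memory_order_relaxed))
                return true;                 // short-circuit: lock available
            stats.rounds.fetch_add(1, std::memory_order_relaxed);
        }
        return false;                        // fall through to the OS wait
    }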
innobase_get_charset(), innobase_get_stmt_safe(): Remove.
It is more efficient and readable to invoke thd_charset()
and thd_query_safe() directly, without a non-inlined wrapper function.
As part of the SPATIAL INDEX implementation in InnoDB,
dict_index_t was expanded by a rtr_ssn_t field. There are only
3 operations for this field, all protected by rtr_ssn_t::mutex:
* btr_cur_search_to_nth_level() stores the least significant 32 bits
of the 64-bit value that is stored in the index root page.
(This would better be done when the table is opened for the
very first time.)
* rtr_get_new_ssn_id() increments the value by 1.
* rtr_get_current_ssn_id() reads the current value.
All these operations can be implemented equally safely by using
atomic memory access operations.
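A sketch of the replacement, assuming a plain 64-bit atomic (RtrSsn is a
hypothetical stand-in for the field in dict_index_t):

    #include <atomic>
    #include <cstdint>

    struct RtrSsn {
        std::atomic<uint64_t> seq{0};

        // btr_cur_search_to_nth_level(): publish the value from the root page
        void init(uint64_t v) { seq.store(v, std::memory_order_relaxed); }
        // rtr_get_new_ssn_id(): increment the value by 1
        uint64_t get_new() { return seq.fetch_add(1, std::memory_order_relaxed) + 1; }
        // rtr_get_current_ssn_id(): read the current value
        uint64_t current() const { return seq.load(std::memory_order_relaxed); }
    };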
Let us limit the maximum value of the debug parameter
innodb_data_file_size to 256 MiB. It is only being used
in the test innodb.log_data_file_size, and the size
of the system tablespace should never exceed some 70 MiB
in ./mtr. Thus, 256 MiB should be a reasonable limit.
The fact that negative values that are passed to unsigned parameters
wrap around to the maximum value appears to be a regression due to
commit 18ef02b04d
and has been filed as bug MDEV-22219.
The test was incompatible with ./mtr --repeat=2 until
commit 2d6719d7ee
fixed that.
It turns out that the failing assertion that we disabled in
commit 3db94d2403
is bogus and can fail when the change buffer is emptied
during the last batch of crash recovery. The reason for this
is the condition around the page_create_empty() call in
page_cur_delete_rec(). The condition was removed in MariaDB
Server 10.5 as part of MDEV-12353, in
commit 7ae21b18a6 and
commit f8a9f90667.
The bug that the assertion aimed to catch is MDEV-22497, which
was fixed in commit 26aab96ecf.
The InnoDB insert buffer was upgraded in MySQL 5.5 into a change
buffer that also covers delete-mark and delete (purge) operations.
There is an important constraint for delete operations: a B-tree
leaf page must not become empty unless the entire tree becomes empty,
consisting of an empty root page. Because change buffer merges only
occur on a single leaf page at a time, delete operations must not be
buffered if it is possible that the last record of the page could be
deleted. (In that case, we would refuse to use the change buffer, and
if we really delete the last record, we would shrink the index tree.)
The function ibuf_get_volume_buffered_hash() is part of our insurance
that the page would not become empty. It is supposed to map each
buffered INSERT or DELETE_MARK record payload into a hash value.
We will only count each such record as a distinct key if there is no
hash collision. DELETE operations will always decrement the predicted
number of records in the page.
Due to a bug in the function, we would actually compute the hash value
not only on the record payload, but also on some following bytes,
in case the record contains NULL values. In MySQL Bug #61104, we had
some examples of this dating back to 2012. But back then, we failed to
reproduce the bug, and in commit d84c95579b
we simply demoted the hard assertion to a message printout and a debug
assertion failure.
ibuf_get_volume_buffered_hash(): Correctly compute the hash value
of the payload bytes only. Note: we will consider
('foo','bar'),(NULL,'foobar'),('foob','ar') to be equal, but this
is not a problem, because in case of a hash collision, we could
also consider ('boo','far') to be equal, and underestimate the number
of records in the page, leading to refusing to buffer a DELETE.
ibuf_read_merge_pages(): Request a possibly freed page.
The change buffer is discarded lazily for freed pages either
by this function or when buf_page_create() reuses a page.
buf_page_get_low(): Relax a debug assertion.
Do not attempt change buffer merge on freed pages.
ibuf_merge_or_delete_for_page(): Assert that the page state is NORMAL.
INIT_ON_FLUSH is not possible, because in that case buf_page_create()
should have removed any buffered changes for the page.
buf_page_get_gen(): Apply buffered changes also in the case when
we can avoid reading the page based on buffered redo log records.
This addresses a hard-to-reproduce scenario that was broken in
commit 6697135c6d.
The Rowid Filter check is just like the Index Condition Pushdown check:
before we check the filter, we must check whether we have walked out of
the range we are scanning. (If we have, we should return and not
continue the scan.)
Consequences of this:
- Rowid filtering doesn't work for keys that have partially-covered
blob columns (just like Index Condition Pushdown)
- The rowid filter function has three return values: CHECK_POS (passed),
CHECK_NEG (filtered out), and CHECK_OUT_OF_RANGE.
All of the above is implemented in this patch; a sketch of the resulting
check order follows.
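A sketch of the check order that the patch establishes (hypothetical
helper; the real code lives in the storage engine scan paths):

    enum check_result { CHECK_POS, CHECK_NEG, CHECK_OUT_OF_RANGE };

    check_result check_rowid_filter(bool in_range, bool filter_match)
    {
        if (!in_range)
            return CHECK_OUT_OF_RANGE;  // end of range: stop, do not filter
        return filter_match ? CHECK_POS : CHECK_NEG;
    }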
ha_innobase::check_if_supported_inplace_alter(): Do not allow
ALGORITHM=INSTANT for operations that avoid a table rebuild
but involve dropping (or creating) secondary indexes.
This is a partial backport of
commit 5e7e7153b4 from 10.4.
assert_trx_is_free(): Assert !is_wsrep().
trx_init(): Do not initialize trx->wsrep, because it must have been
initialized already.
trx_commit_in_memory(): Invoke wsrep_commit_ordered(). This call
was being skipped, because the transaction object had already been
freed to the pool.
trx_rollback_for_mysql(), innobase_commit_low(),
innobase_rollback_trx(): Always reset trx->wsrep.