This was the last abuse of trx_sys.mutex, which is now exclusively
protecting trx_sys.trx_list.
This global acquisition was also potential scalability bottleneck for
oltp_read_write benchmark. Although it didn't expose itself as such due
to larger scalability issues.
Replaced trx_sys.mutex based synchronisation between ReadView creator
thread and purge coordinator thread performing latest view clone with
ReadView::m_mutex.
It also allowed to simplify tri-state view m_state down to boolean
m_open flag.
For performance reasons trx->read_view.close() is left as atomic relaxed
store, so that we don't have to waste resources for otherwise meaningless
mutex acquisition.
- During column reorder table rebuild, rollback of insert fails.
Reason is that InnoDB tries to fetch the column position from
new clustered index and it exceeds default column value tuple fields.
So InnoDB should use the table column position while searching for
defaults column value.
ins_node_create_entry_list(): Create dummy empty tuples for
corrupted or incomplete indexes, to avoid dereferencing a NULL
dict_field_t::col pointer in row_build_index_entry_low().
This issue was found by a crash in the test gcol.innodb_virtual_basic
when merging the fix to 10.5.
This is a regression due to the cleanup
commit 12f804acfa.
row_sel_open_pcur(): Remove the unnecessary parameter.
It suffices for us to acquire the adaptive hash index latch
only when btr_search_guess_on_hash() is called by
btr_cur_search_to_nth_level_func(), in
btr_pcur_open_with_no_init().
This code seems to be a relic from the times when there was
only one btr_search_latch, which was held in shared mode
for longer periods of time. Another relic of that era was
removed in commit e5980bf1b1.
This clean-up was missed when the btr_search_latch was split in
mysql/mysql-server/commit@ab17ab91ce18a47bb6c5c49e4dc0505ad488a448
(MySQL 5.7.8).
Problem:
=======
- During alter rebuild, document read from old table is tokenzied
parallelly by innodb_ft_sort_pll_degree threads and stores it
in respective merge files. While doing the parallel merge, InnoDB
wrongly skips the root level selection of merging buffer records.
So it leads to insertion of merge records in non-ascending order.
Solution:
==========
Build selection tree for the root level also. So that root of
selection tree can always contain sorted buffer.
If the InnoDB buffer pool contains many pages for a table or index
that is being dropped or rebuilt, and if many of such pages are
pointed to by the adaptive hash index, dropping the adaptive hash index
may consume a lot of time.
The time-consuming operation of dropping the adaptive hash index entries
is being executed while the InnoDB data dictionary cache dict_sys is
exclusively locked.
It is not actually necessary to drop all adaptive hash index entries
at the time a table or index is being dropped or rebuilt. We can let
the LRU replacement policy of the buffer pool take care of this gradually.
For this to work, we must detach the dict_table_t and dict_index_t
objects from the main dict_sys cache, and once the last
adaptive hash index entry for the detached table is removed
(when the garbage page is evicted from the buffer pool) we can free
the dict_table_t and dict_index_t object.
Related to this, in MDEV-16283, we made ALTER TABLE...DISCARD TABLESPACE
skip both the buffer pool eviction and the drop of the adaptive hash index.
We shifted the burden to ALTER TABLE...IMPORT TABLESPACE or DROP TABLE.
We can remove the eviction from DROP TABLE. We must retain the eviction
in the ALTER TABLE...IMPORT TABLESPACE code path, so that in case the
discarded table is being re-imported with the same tablespace identifier,
the fresh data from the imported tablespace will replace any stale pages
in the buffer pool.
rpl.rpl_failed_drop_tbl_binlog: Remove the test. DROP TABLE can
no longer be interrupted inside InnoDB.
fseg_free_page(), fseg_free_step(), fseg_free_step_not_header(),
fseg_free_page_low(), fseg_free_extent(): Remove the parameter
that specifies whether the adaptive hash index should be dropped.
btr_search_lazy_free(): Lazily free an index when the last
reference to it is dropped from the adaptive hash index.
buf_pool_clear_hash_index(): Declare static, and move to the
same compilation unit with the bulk of the adaptive hash index
code.
dict_index_t::clone(), dict_index_t::clone_if_needed():
Clone an index that is being rebuilt while adaptive hash index
entries exist. The original index will be inserted into
dict_table_t::freed_indexes and dict_index_t::set_freed()
will be called.
dict_index_t::set_freed(), dict_index_t::freed(): Note that
or check whether the index has been freed. We will use the
impossible page number 1 to denote this condition.
dict_index_t::n_ahi_pages(): Replaces btr_search_info_get_ref_count().
dict_index_t::detach_columns(): Move the assignment n_fields=0
to ha_innobase_inplace_ctx::clear_added_indexes().
We must have access to the columns when freeing the
adaptive hash index. Note: dict_table_t::v_cols[] will remain
valid. If virtual columns are dropped or added, the table
definition will be reloaded in ha_innobase::commit_inplace_alter_table().
buf_page_mtr_lock(): Drop a stale adaptive hash index if needed.
We will also reduce the number of btr_get_search_latch() calls
and enclose some more code inside #ifdef BTR_CUR_HASH_ADAPT
in order to benefit cmake -DWITH_INNODB_AHI=OFF.
This essentially reverts commit b393e2cb0c.
The leak might have been fixed, but because the
DEBUG_SYNC instrumentation for InnoDB purge threads was reverted
in 10.5 commit 5e62b6a5e0
as part of introducing a thread pool, it is easiest to revert
the entire change.
Rowid Filter check is just like Index Condition Pushdown check: before
we check the filter, we must check if we have walked out of the range
we are scanning. (If we did, we should return, and not continue the scan).
Consequences of this:
- Rowid filtering doesn't work for keys that have partially-covered
blob columns (just like Index Condition Pushdown)
- The rowid filter function has three return values: CHECK_POS (passed)
CHECK_NEG (filtered out), CHECK_OUT_OF_RANGE.
All of the above is implemented in this patch
que_thr_t::magic_n: Remove. Access to freed data is best caught
by AddressSanitizer.
que_thr_t::start_running(): Replaces que_thr_move_to_run_state_for_mysql()
and que_thr_move_to_run_state(), which were identical non-inline functions.
que_thr_t::stop_no_error(): Replaces que_thr_stop_for_mysql_no_error().
que_fork_t::n_active_thrs, trx_lock_t::n_active_thrs: Make debug-only.
que_fork_t::set_active(bool active): Update n_active_thrs.
trx_t::rollback(): Renamed from trx_rollback_to_savepoint().
trx_t::rollback_low(): Renamed from trx_rollback_to_savepoint_low().
fts_sql_commit(): Defined as an alias of trx_commit_for_mysql().
fts_sql_rollback(): Defined as an alias of trx_t::rollback().
fts_rename_aux_tables_to_hex_format(): Fix the error handling
that likely never worked because we failed to roll back the
first transaction.
During the UPDATE of PRIMARY KEY columns, we may miscalculate the
size of the clustered index record.
row_upd_clust_rec_by_insert(): Pass the total number of off-page columns,
which may include such columns that were inherited from the record
and not created as part of the UPDATE operation.
This is based on
mysql/mysql-server@490c45e8c8
which is a follow-up to
mysql/mysql-server@1fa475b85d
which we filed and fixed as MDEV-21511.
No test case was provided by Oracle.
dict_stats_update_if_needed(): Replace the parameter THD*
with const trx_t& so that trx_t::is_wsrep() can be invoked
instead of the more expensive wsrep_on().
Replace also other occurrences of wsrep_on() with trx_t::is_wsrep().
The function wsrep_on() was being called rather frequently
in InnoDB and XtraDB. Let us cache it in trx_t and invoke
trx_t::is_wsrep() instead.
innobase_trx_init(): Cache trx->wsrep = wsrep_on(thd).
ha_innobase::write_row(): Replace many repeated calls to current_thd,
and test the cheapest condition first.
row_vers_vc_matches_cluster(): Remove the parameter in_purge,
which was always passed as in_purge=true.
This parameter became constant in
mysql/mysql-server@1dec14d346
and it always was constant in MariaDB starting from the
introduction of the function in
commit 2e814d4702 (MariaDB 10.2.2).
row_prebuilt_free(): Do not attempt to drop orphan indexes
that might have been left behind by a failed ADD UNIQUE INDEX.
This avoids the execution of unwanted transactions during shutdown.
Commit 5defdc382b
(which aimed to reduce sizeof(mtr_t) for non-debug builds)
replaced the ternary mtr_t::status with two debug-only bool
data members m_start, m_commit and inadvertently made the
(now debug-only) predicate mtr_t::is_active() wrongly hold
after mtr_t::commit().
mtr_t::is_active(): Evaluate both m_start and m_commit,
to be compatible with the old definition.
row_merge_read_clustered_index(): Correct a debug assertion.
The reason for this is to make all temporary file names similar and
also to be able to figure out from where a #sql-xxx name orginates.
New format is for most cases:
'#sql-name-current_pid-thread_id[-increment]'
Where name is one of subselect, alter, exchange, temptable or backup
The exceptions are:
ALTER PARTITION shadow files:
'#sql-shadow-thread_id-'original_table_name'
Names used with temp pool:
'#sql-name-current_pid-pool_number'
In main.index_merge_myisam we remove the test that was added in
commit a2d24def8c because
it duplicates the test case that was added in
commit 5af12e4635.
After MDEV-12353, the consistency check that I originally added for
commit 1b9fe0bbac
(InnoDB Plugin for MySQL 5.1) started randomly failing.
It turns out that the IMPORT TABLESPACE code was always incorrect:
it did not update the (redundantly stored) tablespace ID
in index tree root pages. It only does that for page headers
and BLOB pointers.
PageConverter::update_index_page(): Update the tablespace ID
in the BTR_SEG_TOP and BTR_SEG_LEAF of index root pages.
This is a backport of commit b8b3edff13.