mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-11-30 05:23:50 +03:00

Author	SHA1	Message	Date
Marko Mäkelä	14685b10df	MDEV-32050: Deprecate&ignore innodb_purge_rseg_truncate_frequency The motivation of introducing the parameter innodb_purge_rseg_truncate_frequency in mysql/mysql-server@28bbd66ea5 and mysql/mysql-server@8fc2120fed seems to have been to avoid stalls due to freeing undo log pages or truncating undo log tablespaces. In MariaDB Server, innodb_undo_log_truncate=ON should be a much lighter operation than in MySQL, because it will not involve any log checkpoint. Another source of performance stalls should be trx_purge_truncate_rseg_history(), which is shrinking the history list by freeing the undo log pages whose undo records have been purged. To alleviate that, we will introduce a purge_truncation_task that will offload this from the purge_coordinator_task. In that way, the next innodb_purge_batch_size pages may be parsed and purged while the pages from the previous batch are being freed and the history list being shrunk. The processing of innodb_undo_log_truncate=ON will still remain the responsibility of the purge_coordinator_task. purge_coordinator_state::count: Remove. We will ignore innodb_purge_rseg_truncate_frequency, and act as if it had been set to 1 (the maximum shrinking frequency). purge_coordinator_state::do_purge(): Invoke an asynchronous task purge_truncation_callback() to free the undo log pages. purge_sys_t::iterator::free_history(): Free those undo log pages that have been processed. This used to be a part of trx_purge_truncate_history(). purge_sys_t::clone_end_view(): Take a new value of purge_sys.head as a parameter, so that it will be updated while holding exclusive purge_sys.latch. This is needed for race-free access to the field in purge_truncation_callback(). Reviewed by: Vladislav Lesin	2023-10-25 09:11:58 +03:00
Marko Mäkelä	80585c9d6f	Merge 10.5 into 10.6	2023-06-08 10:42:56 +03:00
Marko Mäkelä	3e40f9a7f3	MDEV-31355 innodb_undo_log_truncate=ON fails to wait for purge of enough transaction history purge_sys_t::sees(): Wrapper for view.sees(). trx_purge_truncate_history(): Invoke purge_sys.sees() instead of comparing to head.trx_no, to determine if undo pages can be safely freed. The test innodb.cursor-restore-locking was adjusted by Vladislav Lesin, as was the the debug instrumentation in row_purge_del_mark(). Reviewed by: Vladislav Lesin	2023-06-08 09:17:52 +03:00
Marko Mäkelä	3e2ad0e918	Merge 10.5 into 10.6	2023-02-27 13:17:35 +02:00
Marko Mäkelä	0de3be8cfd	MDEV-30671 InnoDB undo log truncation fails to wait for purge of history It is not safe to invoke trx_purge_free_segment() or execute innodb_undo_log_truncate=ON before all undo log records in the rollback segment has been processed. A prominent failure that would occur due to premature freeing of undo log pages is that trx_undo_get_undo_rec() would crash when trying to copy an undo log record to fetch the previous version of a record. If trx_undo_get_undo_rec() was not invoked in the unlucky time frame, then the symptom would be that some committed transaction history is never removed. This would be detected by CHECK TABLE...EXTENDED that was impleented in commit `ab0190101b`. Such a garbage collection leak should be possible even when using innodb_undo_log_truncate=OFF, just involving trx_purge_free_segment(). trx_rseg_t::needs_purge: Change the type from Boolean to a transaction identifier, noting the most recent non-purged transaction, or 0 if everything has been purged. On transaction start, we initialize this to 1 more than the transaction start ID. On recovery, the field may be adjusted to the transaction end ID (TRX_UNDO_TRX_NO) if it is larger. The field TRX_UNDO_NEEDS_PURGE becomes write-only; only some debug assertions that would validate the value. The field reflects the old inaccurate Boolean field trx_rseg_t::needs_purge. trx_undo_mem_create_at_db_start(), trx_undo_lists_init(), trx_rseg_mem_restore(): Remove the parameter max_trx_id. Instead, store the maximum in trx_rseg_t::needs_purge, where trx_rseg_array_init() will find it. trx_purge_free_segment(): Contiguously hold a lock on trx_rseg_t to prevent any concurrent allocation of undo log. trx_purge_truncate_rseg_history(): Only invoke trx_purge_free_segment() if the rollback segment is empty and there are no pending transactions associated with it. trx_purge_truncate_history(): Only proceed with innodb_undo_log_truncate=ON if trx_rseg_t::needs_purge indicates that all history has been purged. Tested by: Matthias Leich	2023-02-24 14:24:44 +02:00
Marko Mäkelä	e441c32a0b	Merge 10.5 into 10.6	2023-01-03 18:13:11 +02:00
Vlad Lesin	3ddc00dc3b	MDEV-30225 RR isolation violation with locking unique search Before the fix next-key lock was requested only if a record was delete-marked for locking unique search in RR isolation level. There can be several delete-marked records for the same unique key, that's why InnoDB scans the records until eighter non-delete-marked record is reached or all delete-marked records with the same unique key are scanned. For range scan next-key locks are used for RR to protect scanned range from inserting new records by other transactions. And this is the reason of why next-key locks are used for delete-marked records for unique searches. If a record is not delete-marked, the requested lock type was "not-gap". When a record is not delete-marked during lock request by trx 1, and some other transaction holds conflicting lock, trx 1 creates waiting not-gap lock on the record and suspends. During trx 1 suspending the record can be delete-marked. And when the lock is granted on conflicting transaction commit or rollback, its type is still "not-gap". So we have "not-gap" lock on delete-marked record for RR. And this let some other transaction to insert some record with the same unique key when trx 1 is not committed, what can cause isolation level violation. The fix is to set next-key locks for both delete-marked and non-delete-marked records for unique search in RR.	2022-12-20 11:31:49 +03:00
Vlad Lesin	8ff1096999	MDEV-29081 trx_t::lock.was_chosen_as_deadlock_victim race in lock_wait_end() The issue is that trx_t::lock.was_chosen_as_deadlock_victim can be reset before the transaction check it and set trx_t::error_state. The fix is to reset trx_t::lock.was_chosen_as_deadlock_victim only in trx_t::commit_in_memory(), which is invoked on full rollback. There is also no need to have separate bit in trx_t::lock.was_chosen_as_deadlock_victim to flag transaction it was chosen as a victim of Galera conflict resolution, the same variable can be used for both cases except debug build. For debug build we need to distinguish deadlock and Galera's abort victims for debug checks. Also there is no need to check for deadlock in lock_table_enqueue_waiting() for Galera as the coresponding check presents in lock_wait(). Local variable "error_state" in lock_wait() was replaced with trx->error_state, because before the replace lock_sys_t::cancel<false>(trx, lock) and lock_sys.deadlock_check() could change trx->error_state, which then could be overwritten with the local "error_state" variable value. The lock_wait_suspend_thread_enter DEBUG_SYNC point name is misleading, because lock_wait_suspend_thread was eliminated in `e71e613`. It was renamed to lock_wait_start. Reviewed by: Marko Mäkelä, Jan Lindström.	2022-08-24 17:06:57 +03:00
Vlad Lesin	a6f258e47f	MDEV-20605 Awaken transaction can miss inserted by other transaction records due to wrong persistent cursor restoration Backported from 10.5 `20e9e804c1` and `5948d7602e`. sel_restore_position_for_mysql() moves forward persistent cursor position after btr_pcur_restore_position() call if cursor relative position is BTR_PCUR_ON and the cursor points to the record with NOT the same field values as in a stored record(and some other not important for this case conditions). It was done because btr_pcur_restore_position() sets page_cur_mode_t mode to PAGE_CUR_LE for cursor->rel_pos == BTR_PCUR_ON before opening cursor. So we are searching for the record less or equal to stored one. And if the found record is not equal to stored one, then it is less and we need to move cursor forward. But there can be a situation when the stored record was purged, but the new one with the same key but different value was inserted while row_search_mvcc() was suspended. In this case, when the thread is awaken, it will invoke sel_restore_position_for_mysql(), which, in turns, invoke btr_pcur_restore_position(), which will return false because found record don't match stored record, and sel_restore_position_for_mysql() will move forward cursor position. The above can lead to the case when awaken row_search_mvcc() do not see records inserted by other transactions while it slept. The mtr test case shows the example how it can be. The fix is to return special value from persistent cursor restoring function which would notify its caller that uniq fields of restored record and stored record are the same, and in this case sel_restore_position_for_mysql() don't move cursor forward. Delete-marked records are correctly processed in row_search_mvcc(). Non-unique secondary indexes are "uniquified" by adding the PK, the index->n_uniq should then be index->n_fields. So there is no need in additional checks in the fix. If transaction's readview can't see the changes made in secondary index record, it requests clustered index record in row_search_mvcc() to check its transaction id and get the correspondent record version. After this row_search_mvcc() commits mtr to preserve clustered index latching order, and starts mtr. Between those mtr commit and start secondary index pages are unlatched, and purge has the ability to remove stored in the cursor record, what causes rows duplication in result set for non-locking reads, as cursor position is restored to the previously visited record. To solve this the changes are just switched off for non-locking reads, it's quite simple solution, besides the changes don't make sense for non-locking reads. The more complex and effective from performance perspective solution is to create mtr savepoint before clustered record requesting and rolling back to that savepoint after that. See MDEV-27557. One more solution is to have per-record transaction id for secondary indexes. See MDEV-17598. If any of those is implemented, just remove select_lock_type argument in sel_restore_position_for_mysql().	2022-02-21 12:49:54 +03:00
Vlad Lesin	20e9e804c1	MDEV-20605 Awaken transaction can miss inserted by other transaction records due to wrong persistent cursor restoration sel_restore_position_for_mysql() moves forward persistent cursor position after btr_pcur_restore_position() call if cursor relative position is BTR_PCUR_ON and the cursor points to the record with NOT the same field values as in a stored record(and some other not important for this case conditions). It was done because btr_pcur_restore_position() sets page_cur_mode_t mode to PAGE_CUR_LE for cursor->rel_pos == BTR_PCUR_ON before opening cursor. So we are searching for the record less or equal to stored one. And if the found record is not equal to stored one, then it is less and we need to move cursor forward. But there can be a situation when the stored record was purged, but the new one with the same key but different value was inserted while row_search_mvcc() was suspended. In this case, when the thread is awaken, it will invoke sel_restore_position_for_mysql(), which, in turns, invoke btr_pcur_restore_position(), which will return false because found record don't match stored record, and sel_restore_position_for_mysql() will move forward cursor position. The above can lead to the case when awaken row_search_mvcc() do not see records inserted by other transactions while it slept. The mtr test case shows the example how it can be. The fix is to return special value from persistent cursor restoring function which would notify its caller that uniq fields of restored record and stored record are the same, and in this case sel_restore_position_for_mysql() don't move cursor forward. Delete-marked records are correctly processed in row_search_mvcc(). Non-unique secondary indexes are "uniquified" by adding the PK, the index->n_uniq should then be index->n_fields. So there is no need in additional checks in the fix. If transaction's readview can't see the changes made in secondary index record, it requests clustered index record in row_search_mvcc() to check its transaction id and get the correspondent record version. After this row_search_mvcc() commits mtr to preserve clustered index latching order, and starts mtr. Between those mtr commit and start secondary index pages are unlatched, and purge has the ability to remove stored in the cursor record, what causes rows duplication in result set for non-locking reads, as cursor position is restored to the previously visited record. To solve this the changes are just switched off for non-locking reads, it's quite simple solution, besides the changes don't make sense for non-locking reads. The more complex and effective from performance perspective solution is to create mtr savepoint before clustered record requesting and rolling back to that savepoint after that. See MDEV-27557. One more solution is to have per-record transaction id for secondary indexes. See MDEV-17598. If any of those is implemented, just remove select_lock_type argument in sel_restore_position_for_mysql().	2022-02-14 17:35:04 +03:00

10 Commits