mariadb

database/mariadb

Fork 0

mirror of https://github.com/MariaDB/server.git synced 2025-08-07 00:04:31 +03:00

Commit Graph

Author SHA1 Message Date

Author	SHA1	Message	Date
Marko Mäkelä	ddd7d5d8e3	MDEV-24035 Failing assertion: UT_LIST_GET_LEN(lock.trx_locks) == 0 causing disruption and replication failure Under unknown circumstances, the SQL layer may wrongly disregard an invocation of thd_mark_transaction_to_rollback() when an InnoDB transaction had been aborted (rolled back) due to one of the following errors: * HA_ERR_LOCK_DEADLOCK * HA_ERR_RECORD_CHANGED (if innodb_snapshot_isolation=ON) * HA_ERR_LOCK_WAIT_TIMEOUT (if innodb_rollback_on_timeout=ON) Such an error used to cause a crash of InnoDB during transaction commit. These changes aim to catch and report the error earlier, so that not only this crash can be avoided but also the original root cause be found and fixed more easily later. The idea of this fix is from Michael 'Monty' Widenius. HA_ERR_ROLLBACK: A new error code that will be translated into ER_ROLLBACK_ONLY, signalling that the current transaction has been aborted and the only allowed action is ROLLBACK. trx_t::state: Add TRX_STATE_ABORTED that is like TRX_STATE_NOT_STARTED, but noting that the transaction had been rolled back and aborted. trx_t::is_started(): Replaces trx_is_started(). ha_innobase: Check the transaction state in various places. Simplify the logic around SAVEPOINT. ha_innobase::is_valid_trx(): Replaces ha_innobase::is_read_only(). The InnoDB logic around transaction savepoints, commit, and rollback was unnecessarily complex and might have contributed to this inconsistency. So, we are simplifying that logic as well. trx_savept_t: Replace with const undo_no_t*. When we rollback to a savepoint, all we need to know is the number of undo log records that must survive. trx_named_savept_t, DB_NO_SAVEPOINT: Remove. We can store undo_no_t directly in the space allocated at innobase_hton->savepoint_offset. fts_trx_create(): Do not copy previous savepoints. fts_savepoint_rollback(): If a savepoint was not found, roll back everything after the default savepoint of fts_trx_create(). The test innodb_fts.savepoint is extended to cover this code. Reviewed by: Vladislav Lesin Tested by: Matthias Leich	2024-12-12 18:02:00 +02:00
Vlad Lesin	78a04a4c22	MDEV-29869 mtr failure: innodb.deadlock_wait_thr_race 1. The merge `aeccbbd926` has overwritten lock0lock.cc, and the changes of MDEV-29622 and MDEV-29635 were partially lost, this commit restores the changes. 2. innodb.deadlock_wait_thr_race test: The following hang was found during testing. There is deadlock_report_before_lock_releasing sync point in Deadlock::report(), which is waiting for sel_cont signal under lock_sys_t lock. The signal must be issued after "UPDATE t SET b = 100" rollback, and that rollback is executing undo record, which is blocked on dict_sys latch request. dict_sys is locked by the thread of statistics update(dict_stats_save()), and during that update lock_sys lock is requested, and can't be acquired as Deadlock::report() holds it. We have to disable statistics update to make the test stable. But even if statistics update is disabled, and transaction with consistent snapshot is started at the very beginning of the test to prevent purging, the purge can still be invoked for system tables, and it tries to open system table by id, what causes dict_sys.freeze() call and dict_sys latching. What, in combination with lock_sys::xx_lock() causes the same deadlock as described above. We need to disable purging globally for the test as well. All the above is applicable to innodb.deadlock_wait_lock_race test also.	2022-10-26 12:15:40 +03:00
Vlad Lesin	acebe35719	MDEV-29635 race on trx->lock.wait_lock in deadlock resolution Returning DB_SUCCESS unconditionally if !trx->lock.wait_lock in lock_trx_handle_wait() is wrong. Because even if trx->lock.was_chosen_as_deadlock_victim was not set before the first check in lock_trx_handle_wait(), it can be set after the check, and trx->lock.wait_lock can be reset by another thread from lock_reset_lock_and_trx_wait() if the transaction was chosen as deadlock victim. In this case lock_trx_handle_wait() will return DB_SUCCESS even the transaction was marked as deadlock victim, and continue execution instead of rolling back. The fix is to check trx->lock.was_chosen_as_deadlock_victim once more if trx->lock.wait_lock is reset, as trx->lock.wait_lock can be reset only after trx->lock.was_chosen_as_deadlock_victim was set if the transaction was chosen as deadlock victim.	2022-10-21 10:55:18 +03:00

Marko Mäkelä

ddd7d5d8e3

MDEV-24035 Failing assertion: UT_LIST_GET_LEN(lock.trx_locks) == 0 causing disruption and replication failure

Under unknown circumstances, the SQL layer may wrongly disregard an
invocation of thd_mark_transaction_to_rollback() when an InnoDB
transaction had been aborted (rolled back) due to one of the following errors:
* HA_ERR_LOCK_DEADLOCK
* HA_ERR_RECORD_CHANGED (if innodb_snapshot_isolation=ON)
* HA_ERR_LOCK_WAIT_TIMEOUT (if innodb_rollback_on_timeout=ON)

Such an error used to cause a crash of InnoDB during transaction commit.
These changes aim to catch and report the error earlier, so that not only
this crash can be avoided but also the original root cause be found and
fixed more easily later.

The idea of this fix is from Michael 'Monty' Widenius.

HA_ERR_ROLLBACK: A new error code that will be translated into
ER_ROLLBACK_ONLY, signalling that the current transaction
has been aborted and the only allowed action is ROLLBACK.

trx_t::state: Add TRX_STATE_ABORTED that is like
TRX_STATE_NOT_STARTED, but noting that the transaction had been
rolled back and aborted.

trx_t::is_started(): Replaces trx_is_started().

ha_innobase: Check the transaction state in various places.
Simplify the logic around SAVEPOINT.

ha_innobase::is_valid_trx(): Replaces ha_innobase::is_read_only().

The InnoDB logic around transaction savepoints, commit, and rollback
was unnecessarily complex and might have contributed to this
inconsistency. So, we are simplifying that logic as well.

trx_savept_t: Replace with const undo_no_t*. When we rollback to
a savepoint, all we need to know is the number of undo log records
that must survive.

trx_named_savept_t, DB_NO_SAVEPOINT: Remove. We can store undo_no_t
directly in the space allocated at innobase_hton->savepoint_offset.

fts_trx_create(): Do not copy previous savepoints.

fts_savepoint_rollback(): If a savepoint was not found, roll back
everything after the default savepoint of fts_trx_create().
The test innodb_fts.savepoint is extended to cover this code.

Reviewed by: Vladislav Lesin
Tested by: Matthias Leich

2024-12-12 18:02:00 +02:00

Vlad Lesin

78a04a4c22

MDEV-29869 mtr failure: innodb.deadlock_wait_thr_race

1. The merge aeccbbd926 has overwritten
lock0lock.cc, and the changes of MDEV-29622 and MDEV-29635 were
partially lost, this commit restores the changes.

2. innodb.deadlock_wait_thr_race test:

The following hang was found during testing.

There is deadlock_report_before_lock_releasing sync point in
Deadlock::report(), which is waiting for sel_cont signal under lock_sys_t
lock. The signal must be issued after "UPDATE t SET b = 100" rollback,
and that rollback is executing undo record, which is blocked
on dict_sys latch request. dict_sys is locked by the thread of statistics
update(dict_stats_save()), and during that update lock_sys lock is
requested, and can't be acquired as Deadlock::report() holds it. We have
to disable statistics update to make the test stable.

But even if statistics update is disabled, and transaction with consistent
snapshot is started at the very beginning of the test to prevent purging,
the purge can still be invoked for system tables, and it tries to open
system table by id, what causes dict_sys.freeze() call and dict_sys
latching. What, in combination with lock_sys::xx_lock() causes the same
deadlock as described above. We need to disable purging globally for the
test as well.

All the above is applicable to innodb.deadlock_wait_lock_race test also.

2022-10-26 12:15:40 +03:00

Vlad Lesin

acebe35719

MDEV-29635 race on trx->lock.wait_lock in deadlock resolution

Returning DB_SUCCESS unconditionally if !trx->lock.wait_lock in
lock_trx_handle_wait() is wrong. Because even if
trx->lock.was_chosen_as_deadlock_victim was not set before the first check
in lock_trx_handle_wait(), it can be set after
the check, and trx->lock.wait_lock can be reset by another thread from
lock_reset_lock_and_trx_wait() if the transaction was chosen as deadlock
victim. In this case lock_trx_handle_wait() will return DB_SUCCESS even
the transaction was marked as deadlock victim, and continue execution
instead of rolling back.

The fix is to check trx->lock.was_chosen_as_deadlock_victim once more if
trx->lock.wait_lock is reset, as trx->lock.wait_lock can be reset only
after trx->lock.was_chosen_as_deadlock_victim was set if the transaction
was chosen as deadlock victim.

2022-10-21 10:55:18 +03:00

3 Commits