mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-08-05 13:16:09 +03:00

Author	SHA1	Message	Date
Sergei Golubchik	bead24b7f3	mariadb-test: wait on disconnect Remove one of the major sources of race condiitons in mariadb-test. Normally, mariadb_close() sends COM_QUIT to the server and immediately disconnects. In mariadb-test it means the test can switch to another connection and sends queries to the server before the server even started parsing the COM_QUIT packet and these queries can see the connection as fully active, as it didn't reach dispatch_command yet. This is a major source of instability in tests and many - but not all, still less than a half - tests employ workarounds. The correct one is a pair count_sessions.inc/wait_until_count_sessions.inc. Also very popular was wait_until_disconnected.inc, which was completely useless, because it verifies that the connection is closed, and after disconnect it always is, it didn't verify whether the server processed COM_QUIT. Sadly the placebo was as widely used as the real thing. Let's fix this by making mariadb-test `disconnect` command _to wait_ for the server to confirm. This makes almost all workarounds redundant. In some cases count_sessions.inc/wait_until_count_sessions.inc is still needed, though, as only `disconnect` command is changed: * after external tools, like `exec $MYSQL` * after failed `connect` command * replication, after `STOP SLAVE` * Federated/CONNECT/SPIDER/etc after `DROP TABLE` and also in some XA tests, because an XA transaction is dissociated from the THD very late, after the server has closed the client connection. Collateral cleanups: fix comments, remove some redundant statements: * DROP IF EXISTS if nothing is known to exist * DROP table/view before DROP DATABASE * REVOKE privileges before DROP USER etc	2025-07-16 09:14:33 +07:00
Marko Mäkelä	ddd7d5d8e3	MDEV-24035 Failing assertion: UT_LIST_GET_LEN(lock.trx_locks) == 0 causing disruption and replication failure Under unknown circumstances, the SQL layer may wrongly disregard an invocation of thd_mark_transaction_to_rollback() when an InnoDB transaction had been aborted (rolled back) due to one of the following errors: * HA_ERR_LOCK_DEADLOCK * HA_ERR_RECORD_CHANGED (if innodb_snapshot_isolation=ON) * HA_ERR_LOCK_WAIT_TIMEOUT (if innodb_rollback_on_timeout=ON) Such an error used to cause a crash of InnoDB during transaction commit. These changes aim to catch and report the error earlier, so that not only this crash can be avoided but also the original root cause be found and fixed more easily later. The idea of this fix is from Michael 'Monty' Widenius. HA_ERR_ROLLBACK: A new error code that will be translated into ER_ROLLBACK_ONLY, signalling that the current transaction has been aborted and the only allowed action is ROLLBACK. trx_t::state: Add TRX_STATE_ABORTED that is like TRX_STATE_NOT_STARTED, but noting that the transaction had been rolled back and aborted. trx_t::is_started(): Replaces trx_is_started(). ha_innobase: Check the transaction state in various places. Simplify the logic around SAVEPOINT. ha_innobase::is_valid_trx(): Replaces ha_innobase::is_read_only(). The InnoDB logic around transaction savepoints, commit, and rollback was unnecessarily complex and might have contributed to this inconsistency. So, we are simplifying that logic as well. trx_savept_t: Replace with const undo_no_t*. When we rollback to a savepoint, all we need to know is the number of undo log records that must survive. trx_named_savept_t, DB_NO_SAVEPOINT: Remove. We can store undo_no_t directly in the space allocated at innobase_hton->savepoint_offset. fts_trx_create(): Do not copy previous savepoints. fts_savepoint_rollback(): If a savepoint was not found, roll back everything after the default savepoint of fts_trx_create(). The test innodb_fts.savepoint is extended to cover this code. Reviewed by: Vladislav Lesin Tested by: Matthias Leich	2024-12-12 18:02:00 +02:00
Iaroslav Babanin	5d49a2add7	MDEV-33935 fix deadlock counter - The deadlock counter was moved from Deadlock::find_cycle into Deadlock::report, because the find_cycle method is called multiple times during deadlock detection flow, which means it shouldn't have such side effects. But report() can, which called only once for a victim transaction. - Also the deadlock_detect.test and *.result test case has been extended to handle the fix.	2024-06-19 20:43:33 +03:00
Marko Mäkelä	c68007d958	MDEV-24738 Improve the InnoDB deadlock checker A new configuration parameter innodb_deadlock_report is introduced: * innodb_deadlock_report=off: Do not report any details of deadlocks. * innodb_deadlock_report=basic: Report transactions and waiting locks. * innodb_deadlock_report=full (default): Report also the blocking locks. The improved deadlock checker will consider all involved transactions in one loop, even if the deadlock loop includes several transactions. The theoretical maximum number of transactions that can be involved in a deadlock is `innodb_page_size` * 8, limited by the persistent data structures. Note: Similar to mysql/mysql-server@3859219875 our deadlock checker will consider at most one blocking transaction for each waiting transaction. The new field trx->lock.wait_trx be nullptr if and only if trx->lock.wait_lock is nullptr. Note that trx->lock.wait_lock->trx == trx (the waiting transaction), while trx->lock.wait_trx points to one of the transactions whose lock is conflicting with trx->lock.wait_lock. Considering only one blocking transaction will greatly simplify our deadlock checker, but it may also make the deadlock checker blind to some deadlocks where the deadlock cycle is 'hidden' by the fact that the registered trx->lock.wait_trx is not actually waiting for any InnoDB lock, but something else. So, instead of deadlocks, sometimes lock wait timeout may be reported. To improve on this, whenever trx->lock.wait_trx is changed, we will register further 'candidate' transactions in Deadlock::to_check(), and check for 'revealed' deadlocks as soon as possible, in lock_release() and innobase_kill_query(). The old DeadlockChecker was holding lock_sys.latch, even though using lock_sys.wait_mutex should be less contended (and thus preferred) in the likely case that no deadlock is present. lock_wait(): Defer the deadlock check to this function, instead of executing it in lock_rec_enqueue_waiting(), lock_table_enqueue_waiting(). DeadlockChecker: Complete rewrite: (1) Explicitly keep track of transactions that are being waited for, in trx->lock.wait_trx, protected by lock_sys.wait_mutex. Previously, we were painstakingly traversing the lock heaps while blocking concurrent registration or removal of any locks (even uncontended ones). (2) Use Brent's cycle-detection algorithm for deadlock detection, traversing each trx->lock.wait_trx edge at most 2 times. (3) If a deadlock is detected, release lock_sys.wait_mutex, acquire LockMutexGuard, re-acquire lock_sys.wait_mutex and re-invoke find_cycle() to find out whether the deadlock is still present. (4) Display information on all transactions that are involved in the deadlock, and choose a victim to be rolled back. lock_sys.deadlocks: Replaces lock_deadlock_found. Protected by wait_mutex. Deadlock::find_cycle(): Quickly find a cycle of trx->lock.wait_trx... using Brent's cycle detection algorithm. Deadlock::report(): Report a deadlock cycle that was found by Deadlock::find_cycle(), and choose a victim with the least weight. Altogether, we may traverse each trx->lock.wait_trx edge up to 5 times (2*find_cycle()+1 time for reporting and choosing the victim). Deadlock::check_and_resolve(): Find and resolve a deadlock. lock_wait_rpl_report(): Report the waits-for information to replication. This used to be executed as part of DeadlockChecker. Replication must know the waits-for relations even if no deadlocks are present in InnoDB. Reviewed by: Vladislav Vaintroub	2021-02-17 12:44:08 +02:00
Marko Mäkelä	3ddb4fddf1	MDEV-24738: Extend the test innodb.deadlock_detect	2021-02-17 12:34:24 +02:00
Jan Lindström	ae7e1b9b13	MDEV-13262: innodb.deadlock_detect failed in buildbot There is a race condition on a test. con1 is the older transaction as it query started wait first. Test continues so that con1 gets lock wait timeout first. There is possibility that default connection gets lock timeout also or as con1 is rolled back it gets the locks it waited and does the update. Fixed by removing query outputs as they could vary and accepting success from default connection query.	2018-01-08 12:25:31 +02:00
Shaohua Wang	d3a2f60e1a	BUG#23477773 OPTION TO TURN OFF/ON DEADLOCK CHECKER Backport WL#9383 INNODB: ADD AN OPTION TO TURN OFF/ON DEADLOCK CHECKER (rb#12873) to 5.7.	2017-04-24 15:09:18 +03:00

7 Commits