mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-09-13 13:47:59 +03:00

Author	SHA1	Message	Date
Oleksandr Byelkin	a8d4642375	Merge branch '10.11' into 11.4	2025-04-26 10:53:02 +02:00
Vlad Lesin	47e687b109	MDEV-36639 innodb_snapshot_isolation=1 gives error for not committed row changes The solution is to check in lock_clust_rec_read_check_and_lock() whether the transaction that modified a record is still active. If yes, then just request a lock. If no, then, depending on whether the current transaction's read view can see the changes, either return DB_RECORD_CHANGED or request a lock. We can do the check in lock_clust_rec_read_check_and_lock() because a transaction tries to set a lock on the record that the cursor points to after the transaction resumes and the cursor position is restored. If the lock already exists, then we don't request the lock again. But for the current commit it's important that lock_clust_rec_read_check_and_lock() will be invoked again for the same record, so that we can repeat the check after the transaction that modified the record has been committed or rolled back. MDEV-33802(`4aa9291`) is partially reverted. If some transaction holds an implicit lock on some record and a transaction with snapshot isolation level requests a conflicting lock on the same record, it should be blocked instead of returning DB_RECORD_CHANGED, so that it can continue execution when the implicit lock owner is rolled back. The construction -------------------------------------------------------------------------- let $wait_condition= select count(*) = 1 from information_schema.processlist where state = 'Updating' and info = 'UPDATE t SET b = 2 WHERE a'; --source include/wait_condition.inc -------------------------------------------------------------------------- is not reliable enough to make sure that the transaction is blocked in the test case; the test failed sporadically with the -------------------------------------------------------------------------- ./mtr --max-test-fail=1 --parallel=96 lock_isolation{,,,,,,,}{,,,}{,,} \ --repeat=500 -------------------------------------------------------------------------- command. That's why it was replaced with debug sync points. 
Reviewed by: Marko Mäkelä	2025-04-22 20:41:43 +03:00
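The decision described in the MDEV-36639 message can be sketched in isolation. This is a minimal sketch, assuming a simplified read view made of an upper transaction-id limit plus the set of ids active at snapshot time. `read_view_sketch`, `check_and_lock()` and `read_check_result` are hypothetical names, not InnoDB code.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical stand-ins; these are not InnoDB's ReadView or lock types.
enum class read_check_result { REQUEST_LOCK, RECORD_CHANGED };

struct read_view_sketch {
  std::uint64_t low_limit_id;         // ids >= this started after the snapshot
  std::set<std::uint64_t> active_ids; // ids that were active at snapshot time

  // True if changes made by trx_id are visible in this snapshot.
  bool changes_visible(std::uint64_t trx_id) const {
    return trx_id < low_limit_id && active_ids.count(trx_id) == 0;
  }
};

// Decision for a snapshot-isolation locking read of a clustered-index record
// whose last modifier is modifier_id.
read_check_result check_and_lock(const read_view_sketch &view,
                                 std::uint64_t modifier_id,
                                 bool modifier_active) {
  if (modifier_active)
    return read_check_result::REQUEST_LOCK;  // block on the implicit lock owner
  if (view.changes_visible(modifier_id))
    return read_check_result::REQUEST_LOCK;  // nothing hidden from the snapshot
  return read_check_result::RECORD_CHANGED;  // committed change not in snapshot
}
```

The first branch is the partial revert of MDEV-33802: an active modifier makes the reader wait instead of failing. The check is repeated when the same record is visited again after the wait.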
Jan Lindström	5f2562291c	MDEV-36509 : Galera test failure on galera_sr.mysql-wsrep-features#165 The problem was that the thread was holding lock_sys.wait_mutex while the streaming replication transaction rollback was handled, and wsrep-lib then requests the THD::LOCK_thd_kill mutex, causing wrong mutex usage (thd->reset_globals()). The fix is to remove streaming replication rollback handling, i.e. the wsrep_handle_SR_rollback call, from Deadlock::report(). The purpose of Deadlock::report() is to find a cycle in the waits-for graph if one exists, report it, mark the victim transaction as a deadlock victim, and release the locks it is waiting for. The actual streaming replication rollback, which can take longer, can be handled later in trx_t::rollback, where lock_sys.wait_mutex is not held. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2025-04-15 02:18:19 +02:00
Marko Mäkelä	f5bd250f5b	Merge 10.11 into 11.4	2025-03-28 13:55:21 +02:00
Marko Mäkelä	d1a6792324	MDEV-36122: Protect table references with a lock dict_table_open_on_id(): Simplify the logic. dict_stats: A helper for acquiring MDL and opening the tables mysql.innodb_table_stats and mysql.innodb_index_stats. innodb_ft_aux_table_validate(): Contiguously hold dict_sys.latch while accessing the table that we open with dict_table_open_on_name(). lock_table_children(): Do not hold a table reference while invoking dict_acquire_mdl_shared<false>(), which may temporarily release and reacquire the shared dict_sys.latch that we are holding. prepare_inplace_alter_table_dict(): If an unexpected reference to the table exists, wait for the purge subsystem to release its table handle, similar to what we would do if a FULLTEXT INDEX existed. This function is supposed to be protected by MDL_EXCLUSIVE on the table name. If purge is going to access the table again later during this ALTER TABLE operation, it will have access to an MDL compatible name for it and therefore should conflict with any MDL_EXCLUSIVE that would cover ha_innobase::commit_inplace_alter_table(commit=true). ha_innobase::rename_table(): Before locking the data dictionary, ensure that the purge subsystem is not holding a reference to the table due to the lack of metadata locking, related to FULLTEXT INDEX or the row-level undo logging of ALTER IGNORE TABLE. ha_innobase::truncate(): Before locking the data dictionary, ensure that the purge subsystem is not holding a reference to the table due to insufficient metadata locking related to an earlier ALTER IGNORE TABLE operation. trx_purge_attach_undo_recs(), purge_sys_t::batch_cleanup(): Clear purge_sys.m_active only after all table handles have been released. With these changes, no caller of dict_acquire_mdl_shared<false> should be holding a table reference. 
All remaining calls to dict_table_open_on_name(dict_locked=false) except the one in fts_lock_table() and possibly in the DDL recovery predicate innodb_check_version() should be protected by MDL, but there currently is no assertion that would enforce this. Reviewed by: Debarun Banerjee	2025-03-26 14:31:44 +02:00
Marko Mäkelä	67caeca284	MDEV-36122: Protect table references with a lock dict_table_open_on_id(): Simplify the logic. dict_stats: A helper for acquiring MDL and opening the tables mysql.innodb_table_stats and mysql.innodb_index_stats. innodb_ft_aux_table_validate(): Contiguously hold dict_sys.latch while accessing the table that we open with dict_table_open_on_name(). lock_table_children(): Do not hold a table reference while invoking dict_acquire_mdl_shared<false>(), which may temporarily release and reacquire the shared dict_sys.latch that we are holding. With these changes, no caller of dict_acquire_mdl_shared<false> should be holding a table reference. All remaining calls to dict_table_open_on_name(dict_locked=false) except the one in fts_lock_table() and possibly in the DDL recovery predicate innodb_check_version() should be protected by MDL, but there currently is no assertion that would enforce this. Reviewed by: Debarun Banerjee	2025-03-26 14:22:58 +02:00
Sergei Golubchik	7d657fda64	Merge branch '10.11' into 11.4	2025-01-30 12:01:11 +01:00
Vlad Lesin	3a6af458e6	MDEV-34877 Port "Bug #11745929 Change lock priority so that the transaction holding S-lock gets X-lock first" fix from MySQL to MariaDB This commit implements mysql/mysql-server@7037a0bdc8 functionality. If some transaction 't' requests a not-gap X-lock 'Xt' on record 'r', and the lock list of record 'r' contains a not-gap granted S-lock 'St' of transaction 't', followed by not-gap waiting locks WB={Wb1, Wb2, ..., Wbn} conflicting with 'Xt', and 'Xt' does not conflict with any other lock located in the list after 'St', then grant 'Xt'. Note that insert-intention locks are also gap locks. If some transaction 't' holds a not-gap lock 'Lt' on record 'r', and some other transactions have a continuous sequence of not-gap waiting locks L(B)={L(b1), L(b2), ..., L(bn)} following L(t) in the list of locks for record 'r', and transaction 't' requests a not-gap X-lock (which also means not insert-intention, as ii-locks are also gap locks) conflicting with any lock in L(B), then grant the X-lock. MySQL's commit contains the following explanation of why insert-intention locks must not overtake waiting ordinary or gap locks: "It is important that this decision rule doesn't allow INSERT_INTENTION locks to overtake WAITING locks on gaps (`S`, `S\|GAP`, `X`, `X\|GAP`), as inserting a record into a gap would split such WAITING lock, violating the invariant that each transaction can have at most single WAITING lock at any time." I would add the following to the explanation. Suppose we have trx 1, which holds an ordinary X-lock on some record. And trx 2 executes "DELETE FROM t" or "SELECT * FOR UPDATE" in RR (see lock_delete_updated.test and MDEV-27992), i.e. it creates a waiting ordinary X-lock on the same record. And then trx 1 wants to insert some record just before the locked record. It requests an insert-intention lock, and if that lock overtakes trx 2's lock, there will be phantom records for trx 2 in RR. 
lock_delete_updated.test shows how "DELETE" allows inserting records into an already scanned gap and misses some records that should be deleted. The current implementation differs from the MySQL implementation. There are two key differences: 1. Lock queue ordering. In MySQL all waiting locks precede all granted locks. A new waiting lock is added to the head of the queue, a new granted lock is added to the end of the queue, and if some waiting lock is granted, it's moved to the end of the queue. In MariaDB any new lock is added to the end of the queue, and a waiting lock does not change its position in the queue when the lock is granted. The rule is that a blocking lock must be located before a blocked lock in the lock queue. We maintain the rule by inserting the bypassing lock just before the bypassed one. 2. The MySQL implementation uses an object (locksys::Trx_locks_cache) which can be passed to consecutive calls to rec_lock_has_to_wait() for the same trx and heap_no to cache the result of checking whether trx has a granted lock which is blocking the waiting lock (see locksys::Trx_locks_cache::has_granted_blocker()). The current implementation does not use such an object, because it looks for such a granted lock on the level of lock_rec_other_has_conflicting() and lock_rec_has_to_wait_in_queue(). I.e. there is no need for an additional lock queue iteration in locksys::Trx_locks_cache::has_granted_blocker(), as we already iterate the queue in lock_rec_other_has_conflicting() and lock_rec_has_to_wait_in_queue(). During testing, the following case was found. Suppose we have a delete-marked record and are going to do an in-place insert into that delete-marked record. Usually we don't create an explicit lock if there are no locks conflicting with a not-gap X-lock (see lock_clust_rec_modify_check_and_lock(), btr_cur_update_in_place()). The implicit lock will be converted to an explicit one on demand. 
That can happen during INSERT: the not-gap S-lock can be acquired when searching for duplicates (see row_ins_duplicate_error_in_clust()), and, if a delete-marked record is found, the in-place insert (see btr_cur_upd_rec_in_place()) modifies the record, which is treated as an implicit lock. But there can be a case when some transaction trx1 holds a not-gap S-lock, another transaction trx2 creates a waiting X-lock, and then trx1 tries to do an in-place insert. Before the fix, the waiting X-lock of trx2 would be a conflicting lock, and trx1 would try to create an explicit X-lock, which would cause a deadlock, and one of the transactions would be rolled back. But after the fix, trx2's waiting X-lock is no longer treated as conflicting with trx1's X-lock, as trx1 already holds an S-lock. If we don't create an explicit lock, then some other transaction trx3 can create it during implicit-to-explicit lock conversion and place it at the end of the queue. So there can be the following lock order in the queue: S1(granted) X2(waiting) X1(granted) The above queue is not valid, because all granted trx1 locks must be placed before the waiting trx2 lock. Besides, lock_rec_release_try() can remove the S(granted, trx1) lock and grant the X lock to trx 2, and there can be two granted X-locks on the same record: X2(granted) X1(granted) Taking into account that lock_rec_release_try() can release the cell and lock_sys latches, leaving some locks unreleased, the queue validation function can fail in any unexpected place. It can be fixed in two ways: 1) Place the explicit X(granted, trx1) lock before the X(waiting, trx2) lock during implicit-to-explicit lock conversion. This option is implemented in MySQL, as a granted lock is always placed at the top of the lock queue, and waiting locks are placed at the bottom of the queue. MariaDB does not do this, and implementing this variant would require a conflicting-lock search before converting an implicit lock to an explicit one, which, in turn, would require acquiring the cell and/or lock_sys latch. 
2) Create the X(granted, trx1) lock and place it before X(waiting, trx2) during the in-place INSERT, i.e. when lock_rec_lock() is invoked from lock_clust_rec_modify_check_and_lock() or lock_sec_rec_modify_check_and_lock(), if X(waiting, trx2) is bypassed. This way we don't need an additional conflicting-lock search, as conflicting locks are searched for anyway in lock_rec_low(). This fix implements the second variant (see the changes around c_lock_info.insert_after in lock_rec_lock). I.e. if some record was delete-marked and we do an in-place insert into such a record, and some lock to bypass was found, create an explicit lock to avoid a conflicting-lock search on each implicit-to-explicit lock conversion. We can remove it if MDEV-35624 is implemented. lock_rec_other_has_conflicting(), lock_rec_has_to_wait_in_queue(): search for locks to bypass in the same loop that searches for conflicting locks. The result is returned in a conflicting_lock_info object. There can be several locks to bypass; only the first one is returned, so that lock_rec_find_similar_on_page() can be limited to the first bypassed lock to preserve the "blocking before blocked" invariant. conflicting_lock_info also contains a pointer to the lock after which we can insert the bypassing lock. This lock precedes the bypassed one. The bypassing lock can be a next-key lock, and the following cases are possible: 1. 
S1(not-gap, granted) II2(granted) X3(waiting for S1). When a new X1(ordinary) lock is acquired, there will be the following lock queue: S1(not-gap, granted) II2(granted) X1(ordinary, granted) X3(waiting for S1) If we had inserted the new X1 lock just after S1, and S1 had been released on transaction commit or rollback, we would have the following sequence in the lock queue: X1(ordinary, granted) II2(granted) X3(waiting for X1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This is not a real issue, as an II lock, once granted, can be ignored, but it could possibly hit some assertion (taking into account that lock_release_try() can release the lock_sys latch, and other threads can acquire the latch and validate the lock queue), as it breaks our design constraint that any granted lock in the queue should not conflict with locks ahead of it in the queue. But lock_rec_queue_validate() does not check the above constraint. We place the new bypassing lock just before the bypassed one, but there still can be the case when a lock bitmap is reused instead of creating a new lock object (see lock_rec_add_to_queue() and lock_rec_find_similar_on_page()), and the lock which owns the bitmap can precede II2(granted). We can either disable the lock_rec_find_similar_on_page() space optimization for bypassing locks or treat the "X1(ordinary, granted) II2(granted)" sequence as valid. As we don't currently have a function which would fail on the above sequence, let us treat it as valid for the case when lock_release() execution is in progress. 2. S1(ordinary, granted) II2(waiting for S1) X3(waiting for S1) When a new X1(ordinary) lock is acquired, there will be the following lock queue: S1(ordinary, granted) II2(waiting for S1) X1(ordinary, granted) X3(waiting for S1). After S1 is released there will be: II2(granted) X1(ordinary, granted) X3(waiting for X1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The above queue is valid because an ordinary lock does not conflict with an II-lock (see lock_rec_has_to_wait()). 
lock_rec_create_low(): if the lock is bypassing, insert the new lock at the position returned by lock_rec_other_has_conflicting() or lock_rec_has_to_wait_in_queue(). lock_rec_find_similar_on_page(): add the ability to limit the similar-lock search to a certain lock, to preserve the "blocking before blocked" invariant for all bypassed locks. lock_rec_add_to_queue(): don't treat bypassed locks as waiting ones, to allow lock bitmap reuse for bypassing locks. lock_rec_lock(): fix the in-place insert case explained above. lock_rec_dequeue_from_page(), lock_rec_rebuild_waiting_queue(): move the bypassing lock to the correct place to preserve the "blocking before blocked" invariant. Reviewed by: Debarun Banerjee, Marko Mäkelä.	2025-01-23 17:38:32 +03:00
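The core bypass rule from MDEV-34877 can be illustrated on a heavily simplified model of one record's lock queue. The model contains only not-gap S and X locks stored in a std::vector. It leaves out gap, next-key and insert-intention locks, bitmaps and lock_rec_find_similar_on_page(). `rec_lock` and `request_x()` are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one record's lock queue with not-gap locks only.
enum class lock_mode { S, X };
struct rec_lock { int trx; lock_mode mode; bool granted; };

// Request a not-gap X-lock for trx. Returns true if it is granted at once.
// An X request conflicts with every lock of another transaction, so the
// only question is whether a conflicting lock may be bypassed.
bool request_x(std::vector<rec_lock> &queue, int trx) {
  bool holds_granted_s = false; // trx's granted S seen so far in the scan
  bool bypass = false;
  std::size_t bypass_pos = 0;   // position of the first bypassed waiting lock
  for (std::size_t i = 0; i < queue.size(); ++i) {
    const rec_lock &l = queue[i];
    if (l.trx == trx) {
      if (l.granted && l.mode == lock_mode::S) holds_granted_s = true;
      continue;
    }
    // A waiting lock queued after our granted S is waiting for us anyway.
    if (!l.granted && holds_granted_s) {
      if (!bypass) { bypass = true; bypass_pos = i; }
      continue;
    }
    // A granted conflicting lock, or a waiting one ahead of our S: wait.
    queue.push_back({trx, lock_mode::X, false});
    return false;
  }
  rec_lock x{trx, lock_mode::X, true};
  if (bypass) // keep "blocking before blocked": insert before the bypassed lock
    queue.insert(queue.begin() + static_cast<std::ptrdiff_t>(bypass_pos), x);
  else
    queue.push_back(x);
  return true;
}
```

Starting from `S1(granted) X2(waiting)`, a request by trx 1 yields `S1(granted) X1(granted) X2(waiting)`. A granted S of a third transaction still forces trx 1 to wait.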
Vlad Lesin	c05e7c4e0e	MDEV-35708 lock_rec_get_prev() returns only the first record lock The function is supposed to return the previous lock set on a record. But if there are several locks set on a record, it returns only the first one. Fix: continue iterating the lock list up to the given lock, even if the requested bit in the lock bitmap has already been found set.	2025-01-20 12:03:50 +03:00
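The MDEV-35708 fix amounts to "remember the last match before the target instead of returning the first one." Below is a minimal sketch that assumes a page's locks are held in a vector in queue order, each with a per-record bitmap. `page_lock` and `prev_lock_on_rec()` are hypothetical and do not mirror InnoDB's hash-chain traversal.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in: a page's record locks in queue order, each with a
// bitmap indexed by heap_no.
struct page_lock { int trx; std::vector<bool> bits; };

// Return the index of the last lock before `target` that has `heap_no` set,
// or -1 if there is none. Returning the first match would be the old bug.
long prev_lock_on_rec(const std::vector<page_lock> &locks, std::size_t target,
                      std::size_t heap_no) {
  long prev = -1;
  for (std::size_t i = 0; i < target && i < locks.size(); ++i)
    if (heap_no < locks[i].bits.size() && locks[i].bits[heap_no])
      prev = static_cast<long>(i); // keep scanning: a later match is closer
  return prev;
}
```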
Marko Mäkelä	0abef37ccd	Minor lock_sys cleanup Let us make some member functions of lock_sys_t non-static to avoid some shuffling of function parameter registers. lock_cancel_waiting_and_release(): Declare static, because there are no external callers. Reviewed by: Debarun Banerjee	2025-01-15 16:55:29 +02:00
Marko Mäkelä	b82abc7163	MDEV-35701 trx_t::autoinc_locks causes unnecessary dynamic memory allocation trx_t::autoinc_locks: Use small_vector<lock_t*,4> in order to avoid any dynamic memory allocation in the most common case (a statement is holding AUTO_INCREMENT locks on at most 4 tables or partitions). lock_cancel_waiting_and_release(): Instead of removing elements from the middle, simply assign nullptr, like lock_table_remove_autoinc_lock(). The added test innodb.auto_increment_lock_mode covers the dynamic memory allocation as well as nondeterministically (occasionally) covers the out-of-order lock release in lock_table_remove_autoinc_lock(). Reviewed by: Debarun Banerjee	2025-01-15 16:55:01 +02:00
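The idea behind MDEV-35701 is inline storage for the common case plus clearing a slot instead of erasing from the middle. It can be sketched as follows. This is a minimal sketch, not MariaDB's `small_vector` template. `small_vec_sketch`, `lock_sketch` and `release_autoinc()` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal small-vector sketch: the first N elements live inline, so the
// common case performs no dynamic allocation. Not MariaDB's small_vector.
template <typename T, std::size_t N>
class small_vec_sketch {
  std::array<T, N> inline_{};
  std::vector<T> overflow_; // only touched once size() exceeds N
  std::size_t size_ = 0;

public:
  void push_back(const T &v) {
    if (size_ < N) inline_[size_] = v;
    else overflow_.push_back(v);
    ++size_;
  }
  T &operator[](std::size_t i) { return i < N ? inline_[i] : overflow_[i - N]; }
  std::size_t size() const { return size_; }
  bool spilled() const { return size_ > N; }
};

struct lock_sketch { int table_id; };

// Out-of-order release: clear the slot instead of erasing from the middle,
// searching from the most recently acquired lock backwards.
bool release_autoinc(small_vec_sketch<lock_sketch *, 4> &locks,
                     const lock_sketch *lock) {
  for (std::size_t i = locks.size(); i-- > 0;)
    if (locks[i] == lock) { locks[i] = nullptr; return true; }
  return false;
}
```

Assigning nullptr keeps the positions of later elements stable, so no elements need to be shifted.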
Sergei Golubchik	f1a7693bc0	Merge branch '10.11' into 11.4	2025-01-14 23:45:41 +01:00
Sergei Golubchik	9929a0a76e	MDEV-32576 increase query length in the InnoDB deadlock output * increase target buffer size to 3072 * remove the parameter, just use the buffer size as a limit	2025-01-09 10:00:36 +01:00
Marko Mäkelä	17f01186f5	Merge 10.11 into 11.4	2025-01-09 07:58:08 +02:00
Marko Mäkelä	ddd7d5d8e3	MDEV-24035 Failing assertion: UT_LIST_GET_LEN(lock.trx_locks) == 0 causing disruption and replication failure Under unknown circumstances, the SQL layer may wrongly disregard an invocation of thd_mark_transaction_to_rollback() when an InnoDB transaction had been aborted (rolled back) due to one of the following errors: * HA_ERR_LOCK_DEADLOCK * HA_ERR_RECORD_CHANGED (if innodb_snapshot_isolation=ON) * HA_ERR_LOCK_WAIT_TIMEOUT (if innodb_rollback_on_timeout=ON) Such an error used to cause a crash of InnoDB during transaction commit. These changes aim to catch and report the error earlier, so that not only this crash can be avoided but also the original root cause be found and fixed more easily later. The idea of this fix is from Michael 'Monty' Widenius. HA_ERR_ROLLBACK: A new error code that will be translated into ER_ROLLBACK_ONLY, signalling that the current transaction has been aborted and the only allowed action is ROLLBACK. trx_t::state: Add TRX_STATE_ABORTED that is like TRX_STATE_NOT_STARTED, but noting that the transaction had been rolled back and aborted. trx_t::is_started(): Replaces trx_is_started(). ha_innobase: Check the transaction state in various places. Simplify the logic around SAVEPOINT. ha_innobase::is_valid_trx(): Replaces ha_innobase::is_read_only(). The InnoDB logic around transaction savepoints, commit, and rollback was unnecessarily complex and might have contributed to this inconsistency. So, we are simplifying that logic as well. trx_savept_t: Replace with const undo_no_t*. When we rollback to a savepoint, all we need to know is the number of undo log records that must survive. trx_named_savept_t, DB_NO_SAVEPOINT: Remove. We can store undo_no_t directly in the space allocated at innobase_hton->savepoint_offset. fts_trx_create(): Do not copy previous savepoints. fts_savepoint_rollback(): If a savepoint was not found, roll back everything after the default savepoint of fts_trx_create(). 
The test innodb_fts.savepoint is extended to cover this code. Reviewed by: Vladislav Lesin Tested by: Matthias Leich	2024-12-12 18:02:00 +02:00
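The MDEV-24035 observation is that a savepoint only needs the number of undo log records that must survive. That can be sketched with a vector standing in for the undo log. `trx_sketch` is hypothetical. The alias mirrors InnoDB's 64-bit `undo_no_t` only by assumption. Treating a null savepoint pointer as "roll back everything" is also an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using undo_no_t = std::uint64_t; // assumption: a count of undo log records

// Hypothetical transaction: one string per undo log record.
struct trx_sketch {
  std::vector<std::string> undo_log;

  // A savepoint is just the number of undo records written so far.
  undo_no_t savepoint() const { return undo_log.size(); }

  // Undo every record written after the savepoint, newest first.
  // In this sketch, a null pointer means "roll back everything".
  void rollback_to(const undo_no_t *savept) {
    const undo_no_t keep = savept ? *savept : 0;
    while (undo_log.size() > keep)
      undo_log.pop_back(); // stand-in for applying one undo record
  }
};
```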
Marko Mäkelä	2719cc4925	Merge 10.11 into 11.4	2024-12-02 11:35:34 +02:00
Daniele Sciascia	e821c9fa7c	MDEV-35281 SR transaction crashes with innodb_snapshot_isolation Ignore snapshot isolation conflicts during fragment removal, before the streaming transaction commits. This happens when a streaming transaction creates a read view that precedes the INSERTion of fragments into the streaming_log table. Fragments are INSERTed using a different transaction. These fragments are then removed as part of the COMMIT of the streaming transaction. This fragment removal operation could fail when the fragments were not part of the transaction's read view, thus violating snapshot isolation.	2024-11-29 08:06:32 +01:00
Marko Mäkelä	895cd553a3	MDEV-32175: Reduce page_align(), page_offset() calls When srv_page_size and innodb_page_size were introduced, the functions page_align() and page_offset() got more expensive. Let us try to replace such calls with simpler pointer arithmetics with respect to the buffer page frame. page_rec_get_next_non_del_marked(): Add a page frame as a parameter, and template<bool comp>. page_rec_next_get(): A more efficient variant of page_rec_get_next(), with template<bool comp> and const page_t* parameters. lock_get_heap_no(): Replaces page_rec_get_heap_no() outside debug checks. fseg_free_step(), fseg_free_step_not_header(): Take the header block as a parameter. Reviewed by: Vladislav Lesin	2024-11-21 11:01:30 +02:00
Marko Mäkelä	3c312d247c	MDEV-35190 HASH_SEARCH duplicates effort before HASH_INSERT or HASH_DELETE The HASH_ macros are unnecessarily obfuscating the logic, so we had better replace them. hash_cell_t::search(): Implement most of the HASH_DELETE logic, for a subsequent insert or remove(). hash_cell_t::remove(): Remove an element. hash_cell_t::find(): Implement the HASH_SEARCH logic. xb_filter_hash_free(): Avoid any hash table lookup; just traverse the hash bucket chains and free each element. xb_register_filter_entry(): Search databases_hash only once. rm_if_not_found(): Make use of find_filter_in_hashtable(). dict_sys_t::acquire_temporary_table(), dict_sys_t::find_table(): Define non-inline to avoid unnecessary code duplication. dict_sys_t::add(dict_table_t *table), dict_table_rename_in_cache(): Look for duplicate while finding the insert position. dict_table_change_id_in_cache(): Merged to the only caller row_discard_tablespace(). hash_insert(): Helper function of dict_sys_t::resize(). fil_space_t::create(): Look for a duplicate (and crash if found) when searching for the insert position. lock_rec_discard(): Take the hash array cell as a parameter to avoid a duplicated lookup. lock_rec_free_all_from_discard_page(): Remove a parameter. Reviewed by: Debarun Banerjee	2024-11-21 08:59:02 +02:00
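The MDEV-35190 change avoids a separate HASH_SEARCH before HASH_INSERT or HASH_DELETE. It does this with one traversal that returns the link to modify. Below is a minimal sketch of that pointer-to-link technique on a single hash bucket. `node` and `hash_cell_sketch` are hypothetical, and the real `hash_cell_t::search()` signature may differ.

```cpp
#include <cassert>

struct node { int key; node *next; };

// Hypothetical single hash bucket (chain of nodes).
struct hash_cell_sketch {
  node *head = nullptr;

  // Return the link that points to the first node satisfying pred, or the
  // terminating null link. The same result serves an insert or a remove.
  template <typename Pred> node **search(Pred pred) {
    node **link = &head;
    while (*link && !pred(*link)) link = &(*link)->next;
    return link;
  }

  // Unlink the node that *link points to.
  void remove(node **link) { *link = (*link)->next; }
};
```

If `*search(...)` is null, the returned link is exactly where a new node can be appended. This is how a single lookup can both detect a duplicate and find the insert position.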
Oleksandr Byelkin	69d033d165	Merge branch '10.11' into 11.2	2024-10-29 16:42:46 +01:00
Vlad Lesin	8c7786e7d5	MDEV-34690 lock_rec_unlock_unmodified() causes deadlock lock_rec_unlock_unmodified() is executed either under lock_sys.wr_lock() or under a combination of lock_sys.rd_lock() + record locks hash table cell latch. It also requests a page latch to check whether the locked records were changed by the current transaction. Usually InnoDB requests a page latch to find a certain record on the page, and then requests lock_sys and/or the record lock hash cell latch to request a record lock. lock_rec_unlock_unmodified() requests the latches in the opposite order, which causes deadlocks. One possible deadlock scenario is the following: thread 1 - lock_rec_unlock_unmodified() is invoked under the lock hash table cell latch, which is acquired; thread 2 - the purge thread acquires the page latch and tries to remove a delete-marked record; it invokes lock_update_delete(), which requests the lock hash table cell latch held by thread 1; thread 1 - requests the page latch held by thread 2. To fix it, we need to release lock_sys.latch and/or the lock hash cell latch, acquire the page latch, and re-acquire the lock_sys related latches. When lock_sys.latch and/or the lock hash cell latch are released in lock_release_on_prepare() and lock_release_on_prepare_try(), the page on which the current lock is held can be merged. In this case the bitmap of the current lock must be cleared, and either the new lock must be added to the end of the trx->lock.trx_locks list, or the bitmap of an already existing lock must be changed. The new field trx_lock_t::set_nth_bit_calls indicates whether new locks (bits in existing lock bitmaps or new lock objects) were created while lock_sys was released in the trx->lock.trx_locks list iteration loop in lock_release_on_prepare() or lock_release_on_prepare_try(). If so, we traverse the list again. The block can be freed during page merging, which causes an assertion failure in buf_page_get_gen(), as btr_block_get() passes BUF_GET as the page get mode to it. 
That's why the page_get_mode parameter was added to btr_block_get(): to pass BUF_GET_POSSIBLY_FREED from lock_release_on_prepare() and lock_release_on_prepare_try() to buf_page_get_gen(). As searching for the id of the trx that modified a secondary index record is quite an expensive operation, restrict its usage for master. A system variable was added to remove the restriction and simplify testing. The variable exists only in debug builds or in builds with the -DINNODB_ENABLE_XAP_UNLOCK_UNMODIFIED_FOR_PRIMARY option, to increase the probability of catching bugs in release builds with RQG. Note that the code which does a primary index lookup to find out which transaction modified a secondary index record is necessary only when there is no primary key and no unique secondary key on a replica with row-based replication, because only in this case can extra X locks on unmodified records be set during the scan phase. Reviewed by Marko Mäkelä.	2024-10-23 12:36:17 +03:00
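The MDEV-34690 fix works like this: drop the lock_sys latch, take the page latch, retake lock_sys, then restart the list traversal if a modification counter moved. Below is a minimal single-threaded sketch of that pattern using std::mutex. `latches` and `with_page_latch()` are hypothetical. `mod_count` only plays the role described for trx_lock_t::set_nth_bit_calls.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-ins for lock_sys.latch and one page latch.
struct latches { std::mutex lock_sys; std::mutex page; };

// Latching order assumed here: page latch before lock_sys. A caller that
// holds lock_sys and needs the page must therefore release lock_sys first.
// If mod_count changed while lock_sys was not held, the caller must
// traverse its lock list again.
template <typename Fn>
bool with_page_latch(latches &l, std::unique_lock<std::mutex> &sys,
                     const unsigned &mod_count, Fn inspect_page) {
  const unsigned before = mod_count;
  sys.unlock();                             // never wait for the page under lock_sys
  std::lock_guard<std::mutex> page(l.page); // 1st: page latch
  sys.lock();                               // 2nd: lock_sys again
  inspect_page();
  return mod_count != before;               // true: restart the list traversal
}
```

Releasing in either order cannot deadlock. Only the acquisition order matters, which is why the page latch is released after returning while lock_sys is still held.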
Vlad Lesin	92180ad513	MDEV-34466 XA prepare don't release unmodified records for some cases There is no need to exclude exclusive non-gap locks from the procedure of releasing locks on XA PREPARE execution in lock_release_on_prepare_try() after commit `17e59ed3aa` (MDEV-33454), because lock_rec_unlock_unmodified() should check whether the record was modified by the XA transaction, and release the lock if it was not. lock_release_on_prepare_try(): don't skip X-locks; let lock_rec_unlock_unmodified() process them. lock_sec_rec_some_has_impl(): add a template parameter to skip acquiring trx_t::mutex when the caller already holds the mutex; don't crash if the lock's bitmap is clear. row_vers_impl_x_locked(), row_vers_impl_x_locked_low(): add a new argument to skip acquiring trx_t::mutex. rw_trx_hash_t::validate_element(): don't acquire trx_t::mutex if the current thread already holds it. Thanks to Andrei Elkin for finding the bug. Reviewed by Marko Mäkelä, Debarun Banerjee.	2024-10-23 12:36:17 +03:00
Jan Lindström	b3be3c2157	MDEV-30653 : With wsrep_mode=REPLICATE_ARIA only part of mixed-engine transactions is replicated Replication of non-transactional engines is experimental and uses TOI. This naturally means that if there is an open transaction with a transactional engine, its changes will be rolled back. Fixed by adding an error message, issued as a warning, if a non-transactional engine is part of a multi-engine transaction. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2024-10-23 04:00:52 +02:00
Marko Mäkelä	12a91b57e2	Merge 10.11 into 11.2	2024-10-03 13:24:43 +03:00
Denis Protivensky	231900e5bb	MDEV-34836: TOI on parent table must BF abort SR in progress on a child Applied SR transaction on the child table was not BF aborted by TOI running on the parent table for several reasons: Although SR correctly collected FK-referenced keys to parent, TOI in Galera disregards common certification index and simply sets itself to depend on the latest certified write set seqno. Since this write set was the fragment of SR transaction, TOI was allowed to run in parallel with SR presuming it would BF abort the latter. At the same time, DML transactions in the server don't grab MDL locks on FK-referenced tables, thus parent table wasn't protected by an MDL lock from SR and it couldn't provoke MDL lock conflict for TOI to BF abort SR transaction. In InnoDB, DDL transactions grab shared MDL locks on child tables, which is not enough to trigger MDL conflict in Galera. InnoDB-level Wsrep patch didn't contain correct conflict resolution logic due to the fact that it was believed MDL locking should always produce conflicts correctly. The fix brings conflict resolution rules similar to MDL-level checks to InnoDB, thus accounting for the problematic case. Apart from that, wsrep_thd_is_SR() is patched to return true only for executing SR transactions. It should be safe as any other SR state is either the same as for any single write set (thus making the two logically equivalent), or it reflects an SR transaction as being aborting or prepared, which is handled separately in BF-aborting logic, and for regular execution path it should not matter at all. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2024-09-24 11:14:01 +02:00
Marko Mäkelä	e91a799458	Merge 10.11 into 11.2	2024-08-29 16:02:57 +03:00
Marko Mäkelä	0e76c1ba94	Merge 10.5 into 10.6	2024-08-28 15:51:36 +03:00
Marko Mäkelä	e7bb9b7c55	MDEV-24923 fixup: Correct a function comment	2024-08-27 18:06:24 +03:00
Oleksandr Byelkin	80abd847da	Merge branch '10.11' into 11.1	2024-08-03 09:32:42 +02:00
Thirunarayanan Balathandayuthapani	3359ac09a4	MDEV-34066 Output of SHOW ENGINE INNODB STATUS uses the nanoseconds suffix for microseconds - This issue was caused by commit `e71e613353` (MDEV-24671). Change the output of the transaction lock wait time to use the microseconds suffix.	2024-07-23 21:36:13 +05:30
Yuchen Pei	f071b7620b	Merge branch '10.5' into 10.6	2024-07-16 15:54:22 +08:00
Thirunarayanan Balathandayuthapani	00d2c7f7f4	MDEV-34542 Assertion `lock_trx_has_sys_table_locks(trx) == __null' failed in void row_mysql_unfreeze_data_dictionary(trx_t*) - During XA PREPARE, InnoDB releases the non-exclusive locks. But it fails to remove the non-exclusive table lock from the transaction's table locks. In the meantime, the main thread evicts the table from the LRU cache. While rolling back the XA transaction, InnoDB iterates through the table locks to check whether it holds a lock on any system table, and wrongly assumes that the evicted table is a system table, since its table id is 0. Fix: === During XA PREPARE, remove the table locks of the transaction while releasing the non-exclusive locks.	2024-07-12 17:42:14 +05:30
Julius Goryavsky	4026f04425	Merge branch 10.5 into 10.6	2024-07-09 11:56:47 +02:00
Denis Protivensky	b7718a1c1c	MDEV-32738: Don't roll back high-prio txn waiting on a lock in InnoDB DML transactions on FK-child tables also get table locks on FK-parent tables. If there is a DML transaction holding such a lock, and a TOI transaction starts, the latter BF-aborts the former and puts itself into a waiting state. If at this moment another DML transaction on FK-child table starts, it doesn't check that the transaction waiting on a parent table lock is TOI, and it erroneously BF-aborts the waiting TOI transaction. The fix: don't roll back high-priority transaction waiting on a lock in InnoDB, instead roll back an incoming DML transaction. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2024-07-08 23:36:21 +02:00
Oleksandr Byelkin	2447dda2c0	Merge branch '10.11' into 11.1	2024-07-08 22:40:16 +02:00
Marko Mäkelä	d1ecf5cc5f	MDEV-32176 Contention in ha_innobase::info_low() During a Sysbench oltp_point_select workload with 1 table and 400 concurrent connections, a bottleneck on dict_table_t::lock_mutex was observed in ha_innobase::info_low(). dict_table_t::lock_latch: Replaces lock_mutex. In ha_innobase::info_low() and several other places, we will acquire a shared dict_table_t::lock_latch or we may elide the latch if hardware memory transactions are available. innobase_build_v_templ(): Remove the parameter "bool locked", and require the caller to hold exclusive dict_table_t::lock_latch (instead of holding an exclusive dict_sys.latch). Tested by: Vladislav Vaintroub Reviewed by: Vladislav Vaintroub	2024-06-28 15:57:07 +03:00
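In MDEV-32176, a mutex that every info_low() caller serialized on was replaced by a latch that readers can share. The difference can be sketched with the standard std::shared_mutex standing in for InnoDB's own dict_table_t::lock_latch. `table_sketch` is hypothetical. The hardware memory-transaction elision mentioned in the commit is not shown.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <shared_mutex>

// Hypothetical table object: statistics readers take the latch in shared
// mode and no longer serialize on a single mutex; writers are exclusive.
struct table_sketch {
  mutable std::shared_mutex lock_latch;
  std::uint64_t stat_n_rows = 0;

  std::uint64_t n_rows() const {
    std::shared_lock<std::shared_mutex> s(lock_latch); // concurrent readers
    return stat_n_rows;
  }
  void set_n_rows(std::uint64_t n) {
    std::unique_lock<std::shared_mutex> x(lock_latch); // exclusive writer
    stat_n_rows = n;
  }
};
```

With a read-mostly workload such as oltp_point_select, shared acquisitions do not exclude each other. That is the contention the commit describes removing.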
Iaroslav Babanin	5d49a2add7	MDEV-33935 fix deadlock counter - The deadlock counter was moved from Deadlock::find_cycle into Deadlock::report, because the find_cycle method is called multiple times during the deadlock detection flow, which means it shouldn't have such side effects. report() can have them, because it is called only once, for the victim transaction. - The deadlock_detect.test and *.result test case have also been extended to cover the fix.	2024-06-19 20:43:33 +03:00
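A sketch of the principle behind MDEV-33935, under simplifying assumptions: each transaction waits for at most one other, so the waits-for graph is a chain. The search is side-effect free and can run any number of times, while the counter is bumped only in the once-per-victim report step. `Trx`, `DeadlockDetector` and the use of Brent's cycle detection here are illustrative choices, not a copy of InnoDB's `Deadlock` class.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical transaction: each one waits for at most one other.
struct Trx {
  Trx *waits_for = nullptr;
};

// Hypothetical detector illustrating where side effects belong.
struct DeadlockDetector {
  std::size_t deadlocks = 0;  // status counter

  // Side-effect free, so it is safe to call any number of times.
  // Brent's cycle detection over the waits-for chain starting at `start`;
  // returns a transaction on the cycle, or nullptr if the chain ends.
  static Trx *find_cycle(Trx *start) {
    assert(start != nullptr);
    Trx *tortoise = start;
    Trx *hare = start;
    std::size_t power = 1, lambda = 1;
    while ((hare = hare->waits_for) != nullptr) {
      if (hare == tortoise) return hare;
      if (power == lambda) {  // teleport the tortoise to the hare
        tortoise = hare;
        power *= 2;
        lambda = 0;
      }
      ++lambda;
    }
    return nullptr;
  }

  // Called once per chosen victim, so this is where the counter changes.
  void report(Trx *victim) {
    assert(victim != nullptr);
    ++deadlocks;
  }
};
```

Running `find_cycle()` three times on the same cycle leaves `deadlocks` at 0; only the single `report()` call raises it to 1.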
Jan Lindström	ee974ca5e0	MDEV-31658 : Deadlock found when trying to get lock during applying. The problem was that there were two non-conflicting local idle transactions in node_1 that both inserted a key into the primary key. Then two transactions from other nodes also inserted keys into the primary key, so that the insert from node_2 conflicted with one of the local transactions in node_1: there would be a duplicate key if both were committed. For this, the insert from the other node tries to acquire an S-lock on this record, and because this insert is a high-priority brute force (BF) transaction, it kills the idle local transaction. Concurrently, a second insert from node_3 conflicts with the second idle insert transaction in node_1. Again, it tries to acquire an S-lock on this record and kills the idle local transaction. At this point we have two non-conflicting high-priority transactions holding S-locks on different records in node_1, for example like this: rec s-lock-node2-rec s-lock-node3-rec rec. Because these high-priority BF transactions do not wait for each other, the insert from node3, which has a later seqno than the insert from node2, can continue. It will try to acquire an insert intention lock on the record it tries to insert (to prevent a duplicate key from being inserted by a local transaction). However, it will notice that there is a conflicting S-lock in the same gap between records. This leads to a deadlock error, because we have defined that BF transactions may not wait for record locks, but we can't kill the conflicting BF transaction because it has a lower seqno and should commit first. BF transactions are executed concurrently because their primary key values are different, i.e. they do not conflict. Galera certification makes sure that inserts from other nodes, i.e. these high-priority BF transactions, can't insert duplicate keys. Local transactions naturally can, but they will be killed when a BF transaction acquires the required record locks. 
Therefore, we can allow a situation where there is a conflicting S-lock and insert intention lock regardless of their seqno order, and let both continue with no wait. This leads to a situation where we need to allow a BF transaction to wait when lock_rec_has_to_wait_in_queue is called, because this function is also called from lock_rec_queue_validate, and because the lock is waiting, the assertion ut_a(lock->is_gap() || lock_rec_has_to_wait_in_queue(cell, lock)); would fail. lock_wait_wsrep_kill: Add debug sync points for BF transactions killing a local transaction. wsrep_assert_no_bf_bf_wait: Also print the requested lock information. lock_rec_has_to_wait: Add function to handle wsrep transaction lock wait cases. lock_rec_has_to_wait_wsrep: New function to handle wsrep transaction lock wait exceptions. lock_rec_has_to_wait_in_queue: Remove the wsrep exception; in this function all conflicting locks need to wait in the queue. Conflicts between BF and local transactions are handled in lock_wait. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2024-06-19 14:09:11 +02:00
Marko Mäkelä	d34289a3e2	Merge 10.11 into 11.1	2024-06-17 09:21:50 +03:00
Marko Mäkelä	27834ebc91	Merge 10.5 into 10.6	2024-06-10 15:22:15 +03:00
Marko Mäkelä	a2bd936c52	MDEV-33161 Function pointer signature mismatch in LF_HASH In cmake -DWITH_UBSAN=ON builds with clang but not with GCC, -fsanitize=undefined will flag several runtime errors on function pointer mismatch related to the lock-free hash table LF_HASH. Let us use matching function signatures and remove function pointer casts in order to avoid potential bugs due to undefined behaviour. These errors could be caught at compilation time by -Wcast-function-type-strict, which is available starting with clang-16, but not available in any version of GCC as of now. The old GCC flag -Wcast-function-type is enabled as part of -Wextra, but it specifically does not catch these errors. Reviewed by: Vladislav Vaintroub	2024-06-10 12:35:33 +03:00
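A small C++ illustration of the MDEV-33161 pattern. Calling a function through a pointer of a different function type is undefined behaviour even when the parameter types look "compatible", so the fix is to give every callback the exact signature the caller expects and do the casts on data pointers inside it. The `walk_action` typedef, `Element` and `walk()` below are hypothetical; they are not the LF_HASH API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical callback type for a type-erased container walk.
typedef int (*walk_action)(void *element, void *arg);

struct Element {
  int value;
};

// Hypothetical walker: stops early if the action returns nonzero.
int walk(Element *elems, std::size_t n, walk_action action, void *arg) {
  assert(action != nullptr);
  for (std::size_t i = 0; i < n; ++i)
    if (int r = action(&elems[i], arg)) return r;
  return 0;
}

// Problematic pattern (not used here): a callback declared as
//   int count_positive(Element *e, int *counter);
// passed as reinterpret_cast<walk_action>(count_positive). Calling through
// the converted pointer is undefined behaviour; clang's UBSan reports it.

// Matching signature: the callback takes exactly (void *, void *) and
// converts the data pointers itself, so no function pointer cast is needed.
int count_positive(void *element, void *arg) {
  const Element *e = static_cast<const Element *>(element);
  int *counter = static_cast<int *>(arg);
  if (e->value > 0) ++*counter;
  return 0;  // keep walking
}
```

Walking `{1, -2, 3}` with `count_positive` leaves the counter at 2, with no function pointer conversion anywhere.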
Yuchen Pei	2d3e2c58b6	Merge branch '10.11' into 11.1	2024-05-31 10:54:31 +10:00
Marko Mäkelä	5ba542e9ee	Merge 10.5 into 10.6	2024-05-30 14:27:07 +03:00
mariadb-DebarunBanerjee	b2944adb76	MDEV-34166 Server could hang with BP < 80M under stress BUF_LRU_MIN_LEN (256) is too high a value for a small buffer pool (BP) size. For example, for a BP smaller than 80M with a 16K page size, the limit is more than 5% of the total BP, and for the smallest BP of 5M, it is 80% of the BP. Non-data objects like explicit locks could occupy part of the BP, reducing the pages available for the LRU. If the LRU reaches the minimum limit and no free pages are available, the server would hang, with the page cleaner unable to free any more pages. Fix: To avoid such a hang, we adjust the LRU limit to be lower than the limit for data objects as checked in buf_LRU_check_size_of_non_data_objects(), i.e. one page less than 5% of the BP.	2024-05-21 14:13:29 +05:30
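The arithmetic in the MDEV-34166 message can be checked with a few lines of C++. `lru_min_percent()` reproduces the 5% and 80% figures. `adjusted_lru_min()` is a hypothetical reading of the fix ("one page less than 5% of BP", never above the old constant), not the actual buf0lru code.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t BUF_LRU_MIN_LEN = 256;  // pages (value from the commit message)

// Share of the buffer pool, in percent, that a fixed LRU minimum represents.
double lru_min_percent(std::size_t pool_bytes, std::size_t page_size) {
  assert(page_size > 0 && pool_bytes >= page_size);
  const std::size_t pool_pages = pool_bytes / page_size;
  return 100.0 * BUF_LRU_MIN_LEN / pool_pages;
}

// Hypothetical adjusted minimum: one page less than 5% of the pool,
// capped at the old BUF_LRU_MIN_LEN for large pools.
std::size_t adjusted_lru_min(std::size_t pool_pages) {
  const std::size_t five_percent = pool_pages / 20;
  const std::size_t cap = five_percent > 0 ? five_percent - 1 : 0;
  return cap < BUF_LRU_MIN_LEN ? cap : BUF_LRU_MIN_LEN;
}
```

An 80 MiB pool of 16 KiB pages has 5120 pages, so 256 pages are exactly 5%. A 5 MiB pool has 320 pages, so 256 pages are 80%, and the hypothetical adjusted minimum drops to 15 pages.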
mariadb-DebarunBanerjee	8047c8bc71	MDEV-28800 SIGABRT due to running out of memory for InnoDB locks This regression was introduced in 10.6 by the following commit: `898dcf93a8` (Cleanup the lock creation). It removed one important optimization for lock bitmap pre-allocation. We pre-allocate about 8 bytes of extra space along with every lock object to accommodate similar locks on newly created records on the same page by the same transaction. When it is exhausted, a new lock object is created with a similar 8-byte pre-allocation. With this optimization removed, we are left with only 1 byte of pre-allocation. When a large number of records are inserted and locked in a single page, we end up creating too many new locks, almost in n^2 order. Fix-1: Bring back LOCK_PAGE_BITMAP_MARGIN for pre-allocation. Fix-2: Use the extra space (40 bytes) for the bitmap in trx->lock.rec_pool.	2024-05-20 21:19:13 +05:30
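A rough C++ model of why the MDEV-28800 margin matters. It assumes a lock bitmap is sized as `1 + (n_heap + margin) / 8` bytes with a 64-bit (8-byte) margin, and that a new lock object is created whenever a freshly inserted record's heap number does not fit in the current bitmap. Heap numbers start at 2, the first user record in InnoDB's numbering (0 and 1 are the infimum and supremum). `AllocStats` and `simulate()` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t LOCK_PAGE_BITMAP_MARGIN = 64;  // bits, i.e. 8 bytes

// Bitmap size in bytes for a page with n_heap records (assumed formula).
std::size_t bitmap_bytes(std::size_t n_heap, std::size_t margin_bits) {
  return 1 + (n_heap + margin_bits) / 8;
}

struct AllocStats {
  std::size_t lock_objects = 0;  // lock objects created
  std::size_t total_bytes = 0;   // total bitmap bytes allocated
};

// Hypothetical model: one transaction inserts and locks n_inserts records on
// a single page, with heap numbers first_heap_no, first_heap_no + 1, ...
// Whenever a heap number does not fit the current bitmap, a new lock object
// is created, with its bitmap sized from the page's current n_heap.
AllocStats simulate(std::size_t first_heap_no, std::size_t n_inserts,
                    std::size_t margin_bits) {
  AllocStats s;
  std::size_t capacity_bits = 0;
  for (std::size_t i = 0; i < n_inserts; ++i) {
    const std::size_t heap_no = first_heap_no + i;
    if (heap_no >= capacity_bits) {
      const std::size_t bytes = bitmap_bytes(heap_no + 1, margin_bits);
      assert(bytes * 8 > heap_no);  // the new bitmap covers this record
      capacity_bits = bytes * 8;
      ++s.lock_objects;
      s.total_bytes += bytes;
    }
  }
  return s;
}
```

For 400 inserts, the no-margin model creates a new lock object every 8 records (51 objects, 1326 bitmap bytes). Each bitmap is as large as the page's record count, so total memory grows roughly quadratically. With the 64-bit margin, the same workload needs 6 objects and 189 bytes.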
Sergei Golubchik	f0a5412037	Merge branch '11.0' into 11.1	2024-05-13 09:52:30 +02:00
Sergei Golubchik	f9807aadef	Merge branch '10.11' into 11.0	2024-05-12 12:18:28 +02:00
Marko Mäkelä	4aa92911c7	MDEV-33802 Weird read view after ROLLBACK of another transaction Even after commit `b8a6719889` there is an anomaly where a locking read could return inconsistent results. If a locking read would have to wait for a record lock, then by the definition of a read view, the modifications made by the current lock holder cannot be visible in the read view. This is because the read view must exclude any transactions that had not been committed at the time when the read view was created. lock_rec_convert_impl_to_expl_for_trx(), lock_rec_convert_impl_to_expl(): Return an unsafe-to-dereference pointer to a transaction that holds or held the lock, or nullptr if the lock was available. lock_clust_rec_modify_check_and_lock(), lock_sec_rec_read_check_and_lock(), lock_clust_rec_read_check_and_lock(): Return DB_RECORD_CHANGED if innodb_strict_isolation=ON and the lock was being held by another transaction. The test case, which is based on a bug report by Zhuang Liu, covers the function lock_sec_rec_read_check_and_lock(). Reviewed by: Vladislav Lesin	2024-04-09 12:50:24 +03:00
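A simplified C++ sketch of the reasoning in the MDEV-33802 message. A read view excludes transactions that were uncommitted when it was created, and a transaction that still holds a conflicting lock is uncommitted. Its changes therefore cannot be visible, and a locking read under the strict setting reports a changed record instead of waiting. `ReadView`, `LockResult` and `locking_read()` are hypothetical names. The real InnoDB read view and lock functions have more state and more cases.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using trx_id_t = std::uint64_t;

// Simplified read view (hypothetical fields): ids at or above low_limit_id
// started after the view was created; ids in active_ids (sorted) were still
// uncommitted when it was created.
struct ReadView {
  trx_id_t low_limit_id;
  std::vector<trx_id_t> active_ids;

  bool changes_visible(trx_id_t id) const {
    assert(std::is_sorted(active_ids.begin(), active_ids.end()));
    if (id >= low_limit_id) return false;
    return !std::binary_search(active_ids.begin(), active_ids.end(), id);
  }
};

enum class LockResult { GRANTED, WAIT, RECORD_CHANGED };

// Hypothetical decision for a locking read; holder == 0 means no other
// transaction holds a conflicting lock on the record.
LockResult locking_read(bool strict_isolation, trx_id_t holder) {
  if (holder == 0) return LockResult::GRANTED;
  // A holder that has not committed was either active when the view was
  // created or started later, so its changes are invisible to the view.
  return strict_isolation ? LockResult::RECORD_CHANGED : LockResult::WAIT;
}
```

With a view created while transactions 5 and 7 were active (next id 10), transaction 6's changes are visible, but those of 7 and 12 are not. A locking read that finds transaction 7 holding the lock returns `RECORD_CHANGED` in strict mode and waits otherwise.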
Marko Mäkelä	683fbced6b	Merge 11.0 into 11.1	2024-03-28 12:15:36 +02:00
Marko Mäkelä	fec2fd6add	Merge 10.11 into 11.0	2024-03-28 10:51:36 +02:00

809 Commits