1
0
mirror of https://github.com/MariaDB/server.git synced 2025-09-13 13:47:59 +03:00
Commit Graph

809 Commits

Author SHA1 Message Date
Oleksandr Byelkin
a8d4642375 Merge branch '10.11' into 11.4 2025-04-26 10:53:02 +02:00
Vlad Lesin
47e687b109 MDEV-36639 innodb_snapshot_isolation=1 gives error for not committed row changes
Set solution is to check if transaction, which modified a record, is
still active in lock_clust_rec_read_check_and_lock(). if yes, then just
request a lock. If no, then, depending on if the current transaction read
view can see the changes, return eighter DB_RECORD_CHANGED or request a
lock.

We can do the check in lock_clust_rec_read_check_and_lock() because
transaction tries to set a lock on the record which cursor points to after
transaction resuming and cursor position restoring. If the lock already
exists, then we don't request the lock again. But for the current commit
it's important that lock_clust_rec_read_check_and_lock() will be invoked
again for the same record, so we can do the check again after
transaction, which modified a record, was committed or rolled back.

MDEV-33802(4aa9291) is partially reverted. If some transaction holds
implicit lock on some record and transaction with snapshot isolation level
requests conflicting lock on the same record, it should be blocked instead
of returning DB_RECORD_CHANGED to have ability to continue execution when
implicit lock owner is rolled back.

The construction
--------------------------------------------------------------------------
let $wait_condition=
  select count(*) = 1 from information_schema.processlist
  where state = 'Updating' and info = 'UPDATE t SET b = 2 WHERE a';
--source include/wait_condition.inc
--------------------------------------------------------------------------

is not reliable enought to make sure transaction is blocked in test
case, the test failed sporadically with
--------------------------------------------------------------------------
./mtr --max-test-fail=1 --parallel=96 lock_isolation{,,,,,,,}{,,,}{,,} \
--repeat=500
--------------------------------------------------------------------------

command. That's why it was replaced with debug sync-points.

Reviewed by: Marko Mäkelä
2025-04-22 20:41:43 +03:00
Jan Lindström
5f2562291c MDEV-36509 : Galera test failure on galera_sr.mysql-wsrep-features#165
Problem was that thread was holding lock_sys.wait_mutex when
streaming replication transaction rollback was handled and
in wsrep-lib requests THD::LOCK_thd_kill mutex causing
wrong mutex usage (thd->reset_globals()).

Fix is to remove streaming replication rollback handling
from Deadlock::report() i.e. wsrep_handle_SR_rollback call.
Purpose of Deadloc::report() is to find a cycle in the
waits-for graph if exists, report it, mark victim transaction
as deadlock victim and release locks it is waiting for.
Actual streaming replication rollback that can take longer
time can be handled later at trx_t::rollback where
lock_sys.wait_mutex is not held.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2025-04-15 02:18:19 +02:00
Marko Mäkelä
f5bd250f5b Merge 10.11 into 11.4 2025-03-28 13:55:21 +02:00
Marko Mäkelä
d1a6792324 MDEV-36122: Protect table references with a lock
dict_table_open_on_id(): Simplify the logic.

dict_stats: A helper for acquiring MDL and opening the tables
mysql.innodb_table_stats and mysql.innodb_index_stats.

innodb_ft_aux_table_validate(): Contiguously hold dict_sys.latch
while accessing the table that we open with dict_table_open_on_name().

lock_table_children(): Do not hold a table reference while invoking
dict_acquire_mdl_shared<false>(), which may temporarily release and
reacquire the shared dict_sys.latch that we are holding.

prepare_inplace_alter_table_dict(): If an unexpected reference to the
table exists, wait for the purge subsystem to release its table handle,
similar to how we would do in case FULLTEXT INDEX existed. This function
is supposed to be protected by MDL_EXCLUSIVE on the table name.
If purge is going to access the table again later during is ALTER TABLE
operation, it will have access to an MDL compatible name for it and
therefore should conflict with any MDL_EXCLUSIVE that would cover
ha_innobase::commit_inplace_alter_table(commit=true).

ha_innobase::rename_table(): Before locking the data dictionary,
ensure that the purge subsystem is not holding a reference to the
table due to the lack of metadata locking, related to FULLTEXT INDEX
or the row-level undo logging of ALTER IGNORE TABLE.

ha_innobase::truncate(): Before locking the data dictionary,
ensure that the purge subsystem is not holding a reference to the
table due to insufficient metadata locking related to an earlier
ALTER IGNORE TABLE operation.

trx_purge_attach_undo_recs(), purge_sys_t::batch_cleanup():
Clear purge_sys.m_active only after all table handles have been released.

With these changes, no caller of dict_acquire_mdl_shared<false>
should be holding a table reference.

All remaining calls to dict_table_open_on_name(dict_locked=false)
except the one in fts_lock_table() and possibly in the DDL recovery
predicate innodb_check_version() should be protected by MDL, but there
currently is no assertion that would enforce this.

Reviewed by: Debarun Banerjee
2025-03-26 14:31:44 +02:00
Marko Mäkelä
67caeca284 MDEV-36122: Protect table references with a lock
dict_table_open_on_id(): Simplify the logic.

dict_stats: A helper for acquiring MDL and opening the tables
mysql.innodb_table_stats and mysql.innodb_index_stats.

innodb_ft_aux_table_validate(): Contiguously hold dict_sys.latch
while accessing the table that we open with dict_table_open_on_name().

lock_table_children(): Do not hold a table reference while invoking
dict_acquire_mdl_shared<false>(), which may temporarily release and
reacquire the shared dict_sys.latch that we are holding.

With these changes, no caller of dict_acquire_mdl_shared<false>
should be holding a table reference.

All remaining calls to dict_table_open_on_name(dict_locked=false)
except the one in fts_lock_table() and possibly in the DDL recovery
predicate innodb_check_version() should be protected by MDL, but there
currently is no assertion that would enforce this.

Reviewed by: Debarun Banerjee
2025-03-26 14:22:58 +02:00
Sergei Golubchik
7d657fda64 Merge branch '10.11 into 11.4 2025-01-30 12:01:11 +01:00
Vlad Lesin
3a6af458e6 MDEV-34877 Port "Bug #11745929 Change lock priority so that the transaction holding S-lock gets X-lock first" fix from MySQL to MariaDB
This commit implements
mysql/mysql-server@7037a0bdc8
functionality.

If some transaction 't' requests not-gap X-lock 'Xt' on record 'r', and
locks list of the record 'r' contains not-gap granted S-lock 'St' of
transaction 't', followed by not-gap waiting locks WB={Wb1,
Wb2, ..., Wbn} conflicting with 'Xt', and 'Xt' does not conflict with any
other lock, located in the list after 'St', then grant 'Xt'. Note that
insert-intention locks are also gap locks.

If some transaction 't' holds not-gap lock 'Lt' on record 'r', and some
other transactions have not-gap continuous waiting locks sequence
L(B)={L(b1), L(b2), ..., L(bn)} following L(t) in
the list of locks for the record 'r', and transaction 't' requests not-gap,
what means also not insert intention, as ii-locks are also gap locks,
X-lock conflicting with any lock in L(B), then grant the.

MySQL's commit contains the following explanation of why insert-intention
locks must not overtake a waiting ordinary or gap locks:

"It is important that this decission rule doesn't allow
INSERT_INTENTION locks to overtake WAITING locks on gaps (`S`, `S|GAP`,
`X`, `X|GAP`), as inserting a record into a gap would split such WAITING
lock, violating the invariant that each transaction can have at most
single WAITING lock at any time."

I would add to the explanation the following. Suppose we have trx 1 which
holds ordinary X-lock on some record. And trx 2 executes "DELETE FROM t"
or "SELECT * FOR UPDATE" in RR(see lock_delete_updated.test and
MDEV-27992), i.e. it creates waiting ordinary X-lock on the same record.
And then trx 1 wants to insert some record just before the locked record.
It requests insert-intention lock, and if the lock overtakes trx 2 lock,
there will be phantom records for trx 2 in RR. lock_delete_updated.test
shows how "DELETE" allows to insert some records in already scanned gap
and misses some records to delete.

The current implementation differs from MySQL implementation. There are
two key differences:

1. Lock queue ordering. In MySQL all waiting locks precede all granted
   locks. A new waiting lock is added to the head of the queue, a new
   granted lock is added to the end of the queue, if some waiting lock
   is granted, it's moved to the end of the queue. In MariaDB any new
   lock is added to the end of the queue and waiting lock does not change
   its position in the queue where the lock is granted. The rule is that
   blocking lock must be located before blocked lock in lock queue. We
   maintain the rule with inserting bypassing lock just before bypassed
   one.

2. MySQL implementation uses some object(locksys::Trx_locks_cache) which
   can be passed to consecutive calls to rec_lock_has_to_wait() for the
   same trx and heap_no to cache the result of checking if trx has a
   granted lock which is blocking the waiting lock(see
   locksys::Trx_locks_cache::has_granted_blocker()). The current
   implementation does not use such object, because it looks for such
   granted lock on the level of lock_rec_other_has_conflicting() and
   lock_rec_has_to_wait_in_queue(). I.e. there is no need in additional
   lock queue iteration in
   locksys::Trx_locks_cache::has_granted_blocker(), as we already iterate
   it in lock_rec_other_has_conflicting() and
   lock_rec_has_to_wait_in_queue().

During the testing the following case was found. Suppose we have
delete-marked record and going to do inplace insert into
that delete-marked record. Usually we don't create explicit lock if
there are no conlicting with not gap X-lock locks(see
lock_clust_rec_modify_check_and_lock(), btr_cur_update_in_place()). The
implicit lock will be converted to explicit one by demand.

That can happen during INSERT, the not-gap S-lock can
be acquired on searching for duplicates(see
row_ins_duplicate_error_in_clust()), and, if delete-marked record is
found, inplace insert(see btr_cur_upd_rec_in_place()) modifies the
record, what is treated as implicit lock.

But there can be a case when some transaction trx1 holds not-gap S-lock,
another transaction trx2 creates waiting X-lock, and then trx2 tries to
do inplace insert. Before the fix the waiting X-lock of trx2 would be
conflicting lock, and trx1 would try to create explicit X-lock, what
would cause deadlock, and one of the transactions whould be rolled back.
But after the fix, trx2 waiting X-lock is not treated as conflicting
with trx1 X-lock anymore, as trx1 already holds S-lock. If we don't create
explicit lock, then some other transaction trx3 can create it during
implicit to explicit lock conversion and place it at the end of the
queue. So there can be the following locks order in the queue:

S1(granted) X2(waiting) X1(granted)

The above queue is not valid, because all granted trx1 locks must be
placed before waiting trx2 lock. Besides, lock_rec_release_try() can
remove S(granted, trx1) lock and grant X lock to trx 2, and there can be
two granted X-locks on the same record:

X2(granted) X1(granted)

Taking into account that lock_rec_release_try() can release cell and
lock_sys latches leaving some locks unreleased, the queue validation
function can fail in any unexpected place.

It can be fixed with two ways:

1) Place explicit X(granted, trx1) lock before X(waiting, trx2) lock
   during implicit to explicit lock conversion. This option is implemented
   in MySQL, as granted lock is always placed at the top of locks queue,
   and waiting locks are placed at the bottom of the queue. MariaDB does
   not do this, and implementing this variant would require conflicting
   locks search before converting implicit to explicit lock, what, in
   turns, would require cell and/or lock_sys latch acquiring.

2) Create and place X(granted, trx1) lock before X(waiting, trx2) during
   inplace INSERT, i.e. when lock_rec_lock() is invoked from
   lock_clust_rec_modify_check_and_lock() or
   lock_sec_rec_modify_check_and_lock(), if X(waiting, trx2) is
   bypassed. Such a way we don't need in additional conflicting locks
   search, as they are searched anyway in lock_rec_low().

This fix implements the second variant(see the changes around
c_lock_info.insert_after in lock_rec_lock). I.e. if some record was
delete-marked and we do inplace insert in such a record, and some lock for
bypass was found, create explicit lock to avoid conflicting lock search on
each implicit to explicit lock conversion. We can remove it if MDEV-35624
is implemented.

lock_rec_other_has_conflicting(), lock_rec_has_to_wait_in_queue():
search locks to bypass along with conflicting locks searching in the
same loop. The result is returned in conflicting_lock_info object.
There can be several locks to bypass, only the first one is returned to
limit lock_rec_find_similar_on_page() with the first bypassed lock to
preserve "blocking before blocked" invariant. conflicting_lock_info also
contains a pointer to the lock, after which we can insert bypassing
lock. This lock precedes bypassed one.

Bypassing lock can be next-key lock, and the following cases are
possible:

1. S1(not-gap, granted) II2(granted) X3(waiting for S1),

   When new X1(ordinary) lock is acquired, there will be the following
   locks queue:

   S1(not-gap, granted) II2(granted) X1(ordinary, granted) X3(waiting for
   S1)

   If we had inserted new X1 lock just after S1, and S1 had been released
   on transaction commit or rollback, we would have the following
   sequence in the locks queue:

   X1(ordinary, granted) II2(granted) X3(waiting for X1)
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   This is not a real issue as II lock once granted can be
   ignored but it could possibly hit some assert(taking into account
   that lock_release_try() can release lock_sys latch, and other threads
   can acquire the latch and validate lock queue) as it breaks our design
   constraint that any granted lock in the queue should not conflict
   with locks ahead in the queue. But lock_rec_queue_validate() does not
   check the above constraint. We place new bypassing lock just before
   bypassed one, but there still can be the case when lock bitmap is used
   instead of creating new lock object(see lock_rec_add_to_queue() and
   lock_rec_find_similar_on_page()), and the lock, which owns the
   bitmap, can precede II2(granted). We can either disable
   lock_rec_find_similar_on_page() space optimization for bypassing locks
   or treat "X1(ordinary, granted) II2(granted)" sequence as valid. As
   we don't currently have the function which would fail on the above
   sequence, let treat it as valid for the case, when lock_release()
   execution is in process.

2. S1(ordinary, granted) II2(waiting for S1) X3(waiting for S1)

   When new X1(ordinary) lock is acquired, there will be the following
   locks queue:

   S1(ordinary, granted) II2(waiting for S1) X1(ordinary, granted)
   X3(waiting for S1).

   After S1 releasing there will be:

   II2(granted) X1(ordinary, granted) X3(waiting for X1)
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

   The above queue is valid because ordinary lock does not conflict with
   II-lock(see lock_rec_has_to_wait()).

lock_rec_create_low(): insert new lock to the position which
lock_rec_other_has_conflicting(), lock_rec_has_to_wait_in_queue()
returned if the lock is bypassing.

lock_rec_find_similar_on_page(): add ability to limit similiar lock search
with the certain lock to preserve "blocking before blocked" invariant for
all bypassed locks.

lock_rec_add_to_queue(): don't treat bypassed locks as waiting ones to
let lock bitmap reusing for bypassing locks.

lock_rec_lock(): fix inplace insert case, explained above.

lock_rec_dequeue_from_page(), lock_rec_rebuild_waiting_queue(): move
bypassing lock to the correct place to preserve "blocking before blocked"
invariant.

Reviewed by: Debarun Banerjee, Marko Mäkelä.
2025-01-23 17:38:32 +03:00
Vlad Lesin
c05e7c4e0e MDEV-35708 lock_rec_get_prev() returns only the first record lock
It's supposed that the function gets the previous lock set on a record.
But if there are several locks set on a record, it will return only the
first one. Continue locks list iteration till the certain lock even if
the certain bit in lock bitmap is set.
2025-01-20 12:03:50 +03:00
Marko Mäkelä
0abef37ccd Minor lock_sys cleanup
Let us make some member functions of lock_sys_t non-static
to avoid some shuffling of function parameter registers.

lock_cancel_waiting_and_release(): Declare static, because there
are no external callers.

Reviewed by: Debarun Banerjee
2025-01-15 16:55:29 +02:00
Marko Mäkelä
b82abc7163 MDEV-35701 trx_t::autoinc_locks causes unnecessary dynamic memory allocation
trx_t::autoinc_locks: Use small_vector<lock_t*,4> in order to avoid any
dynamic memory allocation in the most common case (a statement is
holding AUTO_INCREMENT locks on at most 4 tables or partitions).

lock_cancel_waiting_and_release(): Instead of removing elements from
the middle, simply assign nullptr, like lock_table_remove_autoinc_lock().

The added test innodb.auto_increment_lock_mode covers the dynamic memory
allocation as well as nondeterministically (occasionally) covers
the out-of-order lock release in lock_table_remove_autoinc_lock().

Reviewed by: Debarun Banerjee
2025-01-15 16:55:01 +02:00
Sergei Golubchik
f1a7693bc0 Merge branch '10.11' into 11.4 2025-01-14 23:45:41 +01:00
Sergei Golubchik
9929a0a76e MDEV-32576 increase query length in the InnoDB deadlock output
* increase target buffer size to 3072
* remove the parameter, just use the buffer size as a limit
2025-01-09 10:00:36 +01:00
Marko Mäkelä
17f01186f5 Merge 10.11 into 11.4 2025-01-09 07:58:08 +02:00
Marko Mäkelä
ddd7d5d8e3 MDEV-24035 Failing assertion: UT_LIST_GET_LEN(lock.trx_locks) == 0 causing disruption and replication failure
Under unknown circumstances, the SQL layer may wrongly disregard an
invocation of thd_mark_transaction_to_rollback() when an InnoDB
transaction had been aborted (rolled back) due to one of the following errors:
* HA_ERR_LOCK_DEADLOCK
* HA_ERR_RECORD_CHANGED (if innodb_snapshot_isolation=ON)
* HA_ERR_LOCK_WAIT_TIMEOUT (if innodb_rollback_on_timeout=ON)

Such an error used to cause a crash of InnoDB during transaction commit.
These changes aim to catch and report the error earlier, so that not only
this crash can be avoided but also the original root cause be found and
fixed more easily later.

The idea of this fix is from Michael 'Monty' Widenius.

HA_ERR_ROLLBACK: A new error code that will be translated into
ER_ROLLBACK_ONLY, signalling that the current transaction
has been aborted and the only allowed action is ROLLBACK.

trx_t::state: Add TRX_STATE_ABORTED that is like
TRX_STATE_NOT_STARTED, but noting that the transaction had been
rolled back and aborted.

trx_t::is_started(): Replaces trx_is_started().

ha_innobase: Check the transaction state in various places.
Simplify the logic around SAVEPOINT.

ha_innobase::is_valid_trx(): Replaces ha_innobase::is_read_only().

The InnoDB logic around transaction savepoints, commit, and rollback
was unnecessarily complex and might have contributed to this
inconsistency. So, we are simplifying that logic as well.

trx_savept_t: Replace with const undo_no_t*. When we rollback to
a savepoint, all we need to know is the number of undo log records
that must survive.

trx_named_savept_t, DB_NO_SAVEPOINT: Remove. We can store undo_no_t
directly in the space allocated at innobase_hton->savepoint_offset.

fts_trx_create(): Do not copy previous savepoints.

fts_savepoint_rollback(): If a savepoint was not found, roll back
everything after the default savepoint of fts_trx_create().
The test innodb_fts.savepoint is extended to cover this code.

Reviewed by: Vladislav Lesin
Tested by: Matthias Leich
2024-12-12 18:02:00 +02:00
Marko Mäkelä
2719cc4925 Merge 10.11 into 11.4 2024-12-02 11:35:34 +02:00
Daniele Sciascia
e821c9fa7c MDEV-35281 SR transaction crashes with innodb_snapshot_isolation
Ignore snapshot isolation conflict during fragment removal, before
streaming transaction commits. This happens when a streaming
transaction creates a read view that precedes the INSERTion of
fragments into the streaming_log table. Fragments are INSERTed
using a different transaction. These fragment are then removed
as part of COMMIT of the streaming transaction. This fragment
removal operation could fail when the fragments were not part
the transaction's read view, thus violating snapshot isolation.
2024-11-29 08:06:32 +01:00
Marko Mäkelä
895cd553a3 MDEV-32175: Reduce page_align(), page_offset() calls
When srv_page_size and innodb_page_size were introduced,
the functions page_align() and page_offset() got more expensive.
Let us try to replace such calls with simpler pointer arithmetics
with respect to the buffer page frame.

page_rec_get_next_non_del_marked(): Add a page frame as a parameter,
and template<bool comp>.

page_rec_next_get(): A more efficient variant of page_rec_get_next(),
with template<bool comp> and const page_t* parameters.

lock_get_heap_no(): Replaces page_rec_get_heap_no() outside debug checks.

fseg_free_step(), fseg_free_step_not_header(): Take the header block
as a parameter.

Reviewed by: Vladislav Lesin
2024-11-21 11:01:30 +02:00
Marko Mäkelä
3c312d247c MDEV-35190 HASH_SEARCH duplicates effort before HASH_INSERT or HASH_DELETE
The HASH_ macros are unnecessarily obfuscating the logic,
so we had better replace them.

hash_cell_t::search(): Implement most of the HASH_DELETE logic,
for a subsequent insert or remove().

hash_cell_t::remove(): Remove an element.

hash_cell_t::find(): Implement the HASH_SEARCH logic.

xb_filter_hash_free(): Avoid any hash table lookup;
just traverse the hash bucket chains and free each element.

xb_register_filter_entry(): Search databases_hash only once.

rm_if_not_found(): Make use of find_filter_in_hashtable().

dict_sys_t::acquire_temporary_table(), dict_sys_t::find_table():
Define non-inline to avoid unnecessary code duplication.

dict_sys_t::add(dict_table_t *table), dict_table_rename_in_cache():
Look for duplicate while finding the insert position.

dict_table_change_id_in_cache(): Merged to the only caller
row_discard_tablespace().

hash_insert(): Helper function of dict_sys_t::resize().

fil_space_t::create(): Look for a duplicate (and crash if found)
when searching for the insert position.

lock_rec_discard(): Take the hash array cell as a parameter
to avoid a duplicated lookup.

lock_rec_free_all_from_discard_page(): Remove a parameter.

Reviewed by: Debarun Banerjee
2024-11-21 08:59:02 +02:00
Oleksandr Byelkin
69d033d165 Merge branch '10.11' into 11.2 2024-10-29 16:42:46 +01:00
Vlad Lesin
8c7786e7d5 MDEV-34690 lock_rec_unlock_unmodified() causes deadlock
lock_rec_unlock_unmodified() is executed either under lock_sys.wr_lock()
or under a combination of lock_sys.rd_lock() + record locks hash table
cell latch. It also requests page latch to check if locked records were
changed by the current transaction or not.

Usually InnoDB requests page latch to find the certain record on the
page, and then requests lock_sys and/or record lock hash cell latch to
request record lock. lock_rec_unlock_unmodified() requests the latches
in the opposite order, what causes deadlocks. One of the possible
scenario for the deadlock is the following:

thread 1 - lock_rec_unlock_unmodified() is invoked under locks hash table
           cell latch, the latch is acquired;
thread 2 - purge thread acquires page latch and tries to remove
           delete-marked record, it invokes lock_update_delete(), which
           requests locks hash table cell latch, held by thread 1;
thread 1 - requests page latch, held by thread 2.

To fix it we need to release lock_sys.latch and/or lock hash cell latch,
acquire page latch and re-acquire lock_sys related latches.

When lock_sys.latch and/or lock hash cell latch are released in
lock_release_on_prepare() and lock_release_on_prepare_try(), the page on
which the current lock is held, can be merged. In this case the bitmap
of the current lock must be cleared, and the new lock must be added to
the end of trx->lock.trx_locks list, or bitmap of already existing lock
must be changed.

The new field trx_lock_t::set_nth_bit_calls indicates if new locks
(bits in existing lock bitmaps or new lock objects) were created during
the period when lock_sys was released in trx->lock.trx_locks list
iteration loop in lock_release_on_prepare() or
lock_release_on_prepare_try(). And, if so, we traverse the list again.

The block can be freed during pages merging, what causes assertion
failure in buf_page_get_gen(), as btr_block_get() passes BUF_GET as page
get mode to it. That's why page_get_mode parameter was added to
btr_block_get() to pass BUF_GET_POSSIBLY_FREED from
lock_release_on_prepare() and lock_release_on_prepare_try() to
buf_page_get_gen().

As searching for id of trx, which modified secondary index record, is
quite expensive operation, restrict its usage for master. System variable
was added to remove the restriction for testing simplifying. The
variable exists only either for debug build or for build with
-DINNODB_ENABLE_XAP_UNLOCK_UNMODIFIED_FOR_PRIMARY option to increase the
probability of catching bugs for release build with RQG.

Note that the code, which does primary index lookup to find out what
transaction modified secondary index record, is necessary only when
there is no primary key and no unique secondary key on replica with row
based replication, because only in this case extra X locks on unmodified
records can be set during scan phase.

Reviewed by Marko Mäkelä.
2024-10-23 12:36:17 +03:00
Vlad Lesin
92180ad513 MDEV-34466 XA prepare don't release unmodified records for some cases
There is no need to exclude exclusive non-gap locks from the procedure
of locks releasing on XA PREPARE execution in
lock_release_on_prepare_try() after commit
17e59ed3aa (MDEV-33454), because
lock_rec_unlock_unmodified() should check if the record was modified
with the XA, and release the lock if it was not.

lock_release_on_prepare_try(): don't skip X-locks, let
lock_rec_unlock_unmodified() to process them.

lock_sec_rec_some_has_impl(): add template parameter for not acquiring
trx_t::mutex for the case if a caller already holds the mutex, don't
crash if lock's bitmap is clean.

row_vers_impl_x_locked(), row_vers_impl_x_locked_low(): add new argument
to skip trx_t::mutex acquiring.

rw_trx_hash_t::validate_element(): don't acquire trx_t::mutex if the
current thread already holds it.

Thanks to Andrei Elkin for finding the bug.
Reviewed by Marko Mäkelä, Debarun Banerjee.
2024-10-23 12:36:17 +03:00
Jan Lindström
b3be3c2157 MDEV-30653 : With wsrep_mode=REPLICATE_ARIA only part of mixed-engine transactions is replicated
Replication of non-transactional engines is experimental and
uses TOI. This naturally means that if there is open transaction
with transactional engine it's changes will be rolled back.

Fixed by adding error message if non-transactional engine
is part of multi-engine transaction with warning.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-10-23 04:00:52 +02:00
Marko Mäkelä
12a91b57e2 Merge 10.11 into 11.2 2024-10-03 13:24:43 +03:00
Denis Protivensky
231900e5bb MDEV-34836: TOI on parent table must BF abort SR in progress on a child
Applied SR transaction on the child table was not BF aborted by TOI running
on the parent table for several reasons:

Although SR correctly collected FK-referenced keys to parent, TOI in Galera
disregards common certification index and simply sets itself to depend on
the latest certified write set seqno.

Since this write set was the fragment of SR transaction, TOI was allowed to
run in parallel with SR presuming it would BF abort the latter.

At the same time, DML transactions in the server don't grab MDL locks on
FK-referenced tables, thus parent table wasn't protected by an MDL lock from
SR and it couldn't provoke MDL lock conflict for TOI to BF abort SR transaction.

In InnoDB, DDL transactions grab shared MDL locks on child tables, which is not
enough to trigger MDL conflict in Galera.

InnoDB-level Wsrep patch didn't contain correct conflict resolution logic due to
the fact that it was believed MDL locking should always produce conflicts correctly.

The fix brings conflict resolution rules similar to MDL-level checks to InnoDB,
thus accounting for the problematic case.

Apart from that, wsrep_thd_is_SR() is patched to return true only for executing
SR transactions. It should be safe as any other SR state is either the same as
for any single write set (thus making the two logically equivalent), or it reflects
an SR transaction as being aborting or prepared, which is handled separately in
BF-aborting logic, and for regular execution path it should not matter at all.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-09-24 11:14:01 +02:00
Marko Mäkelä
e91a799458 Merge 10.11 into 11.2 2024-08-29 16:02:57 +03:00
Marko Mäkelä
0e76c1ba94 Merge 10.5 into 10.6 2024-08-28 15:51:36 +03:00
Marko Mäkelä
e7bb9b7c55 MDEV-24923 fixup: Correct a function comment 2024-08-27 18:06:24 +03:00
Oleksandr Byelkin
80abd847da Merge branch '10.11' into 11.1 2024-08-03 09:32:42 +02:00
Thirunarayanan Balathandayuthapani
3359ac09a4 MDEV-34066 Output of SHOW ENGINE INNODB STATUS uses the nanoseconds suffix for microseconds
- This issue is caused by commit e71e613353
(MDEV-24671). Change the output of transaction lock wait
time in microseconds suffix.
2024-07-23 21:36:13 +05:30
Yuchen Pei
f071b7620b Merge branch '10.5' into 10.6 2024-07-16 15:54:22 +08:00
Thirunarayanan Balathandayuthapani
00d2c7f7f4 MDEV-34542 Assertion `lock_trx_has_sys_table_locks(trx) == __null' failed in void row_mysql_unfreeze_data_dictionary(trx_t*)
- During XA PREPARE, InnoDB releases the non-exclusive locks.
But it fails to remove the non-exclusive table lock from the
transaction table locks. In the mean time, main thread evicts
the table from the LRU cache. While rollbacking the XA transaction,
InnoDB iterates through the table locks to check whether it
holds lock on any system tables and wrongly assumes the
evicted table as system table since the table id is 0

Fix:
===
During XA PREPARE, remove the table locks of the transaction while
releasing the non-exclusive locks.
2024-07-12 17:42:14 +05:30
Julius Goryavsky
4026f04425 Merge branch 10.5 into 10.6 2024-07-09 11:56:47 +02:00
Denis Protivensky
b7718a1c1c MDEV-32738: Don't roll back high-prio txn waiting on a lock in InnoDB
DML transactions on FK-child tables also get table locks
on FK-parent tables. If there is a DML transaction holding
such a lock, and a TOI transaction starts, the latter
BF-aborts the former and puts itself into a waiting state.
If at this moment another DML transaction on FK-child table
starts, it doesn't check that the transaction waiting on
a parent table lock is TOI, and it erroneously BF-aborts
the waiting TOI transaction.

The fix: don't roll back high-priority transaction waiting
on a lock in InnoDB, instead roll back an incoming DML
transaction.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-07-08 23:36:21 +02:00
Oleksandr Byelkin
2447dda2c0 Merge branch '10.11' into 11.1 2024-07-08 22:40:16 +02:00
Marko Mäkelä
d1ecf5cc5f MDEV-32176 Contention in ha_innobase::info_low()
During a Sysbench oltp_point_select workload with 1 table and 400
concurrent connections, a bottleneck on dict_table_t::lock_mutex was
observed in ha_innobase::info_low().

dict_table_t::lock_latch: Replaces lock_mutex.

In ha_innobase::info_low() and several other places, we will acquire
a shared dict_table_t::lock_latch or we may elide the latch if
hardware memory transactions are available.

innobase_build_v_templ(): Remove the parameter "bool locked", and
require the caller to hold exclusive dict_table_t::lock_latch
(instead of holding an exclusive dict_sys.latch).

Tested by: Vladislav Vaintroub
Reviewed by: Vladislav Vaintroub
2024-06-28 15:57:07 +03:00
Iaroslav Babanin
5d49a2add7 MDEV-33935 fix deadlock counter
- The deadlock counter was moved from
Deadlock::find_cycle into Deadlock::report, because
the find_cycle method is called multiple times during deadlock
detection flow, which means it shouldn't have such side effects.
But report() can, which called only once for
a victim transaction.
- Also the deadlock_detect.test and *.result test case
has been extended to handle the fix.
2024-06-19 20:43:33 +03:00
Jan Lindström
ee974ca5e0 MDEV-31658 : Deadlock found when trying to get lock during applying
Problem was that there was two non-conflicting local idle
transactions in node_1 that both inserted a key to primary key.
Then two transactions from other nodes inserted also
a key to primary key so that insert from node_2 conflicted
one of the local transactions in node_1 so that there would
be duplicate key if both are committed. For this insert
from other node tries to acquire S-lock for this record
and because this insert is high priority brute force (BF)
transaction it will kill idle local transaction.

Concurrently, second insert from node_3 conflicts the second
idle insert transaction in node_1. Again, it tries to acquire
S-lock for this record and kills idle local transaction.

At this point we have two non-conflicting high priority
transactions holding S-lock on different records in node_1.
For example like this: rec s-lock-node2-rec s-lock-node3-rec rec.

Because these high priority BF-transactions do not wait
each other insert from node3 that has later seqno compared
to insert from node2 can continue. It will try to acquire
insert intention for record it tries to insert (to avoid
duplicate key to be inserted by local transaction). Hower,
it will note that there is conflicting S-lock in same gap
between records. This will lead deadlock error as we have
defined that BF-transactions may not wait for record lock
but we can't kill conflicting BF-transaction because
it has lower seqno and it should commit first.

BF-transactions are executed concurrently because their
values to primary key are different i.e. they do not
conflict.

Galera certification will make sure that inserts from
other nodes i.e these high priority BF-transactions
can't insert duplicate keys. Local transactions naturally
can but they will be killed when BF-transaction
acquires required record locks.

Therefore, we can allow situation where there is conflicting
S-lock and insert intention lock regardless of their seqno
order and let both continue with no wait. This will lead
to situation where we need to allow BF-transaction
to wait when lock_rec_has_to_wait_in_queue is called
because this function is also called from
lock_rec_queue_validate and because lock is waiting
there would be assertion in ut_a(lock->is_gap()
|| lock_rec_has_to_wait_in_queue(cell, lock));

lock_wait_wsrep_kill
  Add debug sync points for BF-transactions killing
  local transaction.

wsrep_assert_no_bf_bf_wait
  Print also requested lock information

lock_rec_has_to_wait
  Add function to handle wsrep transaction lock wait
  cases.

lock_rec_has_to_wait_wsrep
  New function to handle wsrep transaction lock wait
  exceptions.

lock_rec_has_to_wait_in_queue
  Remove wsrep exception, in this function all
  conflicting locks need to wait in queue.
  Conflicts between BF and local transactions
  are handled in lock_wait.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-06-19 14:09:11 +02:00
Marko Mäkelä
d34289a3e2 Merge 10.11 into 11.1 2024-06-17 09:21:50 +03:00
Marko Mäkelä
27834ebc91 Merge 10.5 into 10.6 2024-06-10 15:22:15 +03:00
Marko Mäkelä
a2bd936c52 MDEV-33161 Function pointer signature mismatch in LF_HASH
In cmake -DWITH_UBSAN=ON builds with clang but not with GCC,
-fsanitize=undefined will flag several runtime errors on
function pointer mismatch related to the lock-free hash table LF_HASH.

Let us use matching function signatures and remove function pointer
casts in order to avoid potential bugs due to undefined behaviour.

These errors could be caught at compilation time by
-Wcast-function-type-strict, which is available starting with clang-16,
but not available in any version of GCC as of now. The old GCC flag
-Wcast-function-type is enabled as part of -Wextra, but it specifically
does not catch these errors.

Reviewed by: Vladislav Vaintroub
2024-06-10 12:35:33 +03:00
Yuchen Pei
2d3e2c58b6 Merge branch '10.11' into 11.1 2024-05-31 10:54:31 +10:00
Marko Mäkelä
5ba542e9ee Merge 10.5 into 10.6 2024-05-30 14:27:07 +03:00
mariadb-DebarunBanerjee
b2944adb76 MDEV-34166 Server could hang with BP < 80M under stress
BUF_LRU_MIN_LEN (256) is too high value for low buffer pool(BP) size.
For example, for BP size lower than 80M and 16 K page size, the limit is
more than 5% of total BP and for lowest BP 5M, it is 80% of the BP.
Non-data objects like explicit locks could occupy part of the BP pool
reducing the pages available for LRU. If LRU reaches minimum limit and
if no free pages are available, server would hang with page cleaner not
able to free any more pages.

Fix: To avoid such hang, we adjust the LRU limit lower than the limit
for data objects as checked in buf_LRU_check_size_of_non_data_objects()
i.e. one page less than 5% of BP.
2024-05-21 14:13:29 +05:30
mariadb-DebarunBanerjee
8047c8bc71 MDEV-28800 SIGABRT due to running out of memory for InnoDB locks
This regression is introduced in 10.6 by following commit.
commit 898dcf93a8
(Cleanup the lock creation)

It removed one important optimization for lock bitmap pre-allocation.
We pre-allocate about 8 byte extra space along with every lock object to
adjust for similar locks on newly created records on the same page by
same transaction. When it is exhausted, a new lock object is created
with similar 8 byte pre-allocation. With this optimization removed we
are left with only 1 byte pre-allocation. When large number of records
are inserted and locked in a single page, we end up creating too many
new locks almost in n^2 order.

Fix-1: Bring back LOCK_PAGE_BITMAP_MARGIN for pre-allocation.

Fix-2: Use the extra space (40 bytes) for bitmap in trx->lock.rec_pool.
2024-05-20 21:19:13 +05:30
Sergei Golubchik
f0a5412037 Merge branch '11.0' into 11.1 2024-05-13 09:52:30 +02:00
Sergei Golubchik
f9807aadef Merge branch '10.11' into 11.0 2024-05-12 12:18:28 +02:00
Marko Mäkelä
4aa92911c7 MDEV-33802 Weird read view after ROLLBACK of another transaction
Even after commit b8a6719889 there
is an anomaly where a locking read could return inconsistent results.
If a locking read would have to wait for a record lock, then by the
definition of a read view, the modifications made by the current lock
holder cannot be visible in the read view. This is because the read
view must exclude any transactions that had not been committed at the
time when the read view was created.

lock_rec_convert_impl_to_expl_for_trx(), lock_rec_convert_impl_to_expl():
Return an unsafe-to-dereference pointer to a transaction that holds or
held the lock, or nullptr if the lock was available.

lock_clust_rec_modify_check_and_lock(),
lock_sec_rec_read_check_and_lock(),
lock_clust_rec_read_check_and_lock():
Return DB_RECORD_CHANGED if innodb_strict_isolation=ON and the
lock was being held by another transaction.

The test case, which is based on a bug report by Zhuang Liu,
covers the function lock_sec_rec_read_check_and_lock().

Reviewed by: Vladislav Lesin
2024-04-09 12:50:24 +03:00
Marko Mäkelä
683fbced6b Merge 11.0 into 11.1 2024-03-28 12:15:36 +02:00
Marko Mäkelä
fec2fd6add Merge 10.11 into 11.0 2024-03-28 10:51:36 +02:00