1
0
mirror of https://github.com/MariaDB/server.git synced 2025-08-08 11:22:35 +03:00

MDEV-29621: Replica stopped by locks on sequence

When using binlog_row_image=FULL with sequence table inserts, a
replica can deadlock because it treats full inserts in a sequence as DDL
statements by getting an exclusive lock on the sequence table. It
has been observed that with parallel replication, this exclusive
lock on the sequence table can lead to a deadlock where one
transaction has the exclusive lock and is waiting on a prior
transaction to commit, whereas this prior transaction is waiting on
the MDL lock.

This fix for this is on the master side, to raise FL_DDL
flag on the GTID of a full binlog_row_image write of a sequence table.
This forces the slave to execute the statement serially so a deadlock
cannot happen.

A test verifies the deadlock also to prove it happen on the OLD (pre-fixes)
slave.

OLD (buggy master) -replication-> NEW (fixed slave) is provided.
As the pre-fixes master's full row-image may represent both
SELECT NEXT VALUE and INSERT, the parallel slave pessimistically
waits for the prior transaction to have committed before to take on the
critical part of the second (like INSERT in the test) event execution.
The waiting exploits a parallel slave's retry mechanism which is
controlled by `@@global.slave_transaction_retries`.

Note that in order to avoid any persistent 'Deadlock found' 2013 error
in OLD -> NEW, `slave_transaction_retries` may need to be set to a
higher than the default value.
START-SLAVE is an effective work-around if this still happens.
This commit is contained in:
Andrei
2023-03-18 21:11:07 +02:00
parent 7e75f94ba1
commit 55a53949be
8 changed files with 190 additions and 13 deletions

View File

@@ -0,0 +1,81 @@
--source include/have_innodb.inc
--source include/have_debug.inc
--source include/have_debug_sync.inc
--source include/have_binlog_format_row.inc
--source include/master-slave.inc
--connection slave
--source include/stop_slave.inc
ALTER TABLE mysql.gtid_slave_pos ENGINE=InnoDB;
--echo # MDEV-29621 the sequence engine binlog_row_image-full events
--echo # MDL-deadlock on the parallel slave.
--connection master
CREATE SEQUENCE s1;
SET @@session.binlog_row_image=FULL;
SET @@session.debug_dbug="+d,binlog_force_commit_id";
SET @commit_id=7;
SET @@gtid_seq_no=100;
SELECT NEXT VALUE FOR s1;
INSERT INTO s1 VALUES(2, 1, 10, 1, 2, 1, 1, 0);
SET @@session.debug_dbug="";
--connection slave
--let $slave_parallel_threads=`select @@global.slave_parallel_threads`
--let $slave_parallel_mode=`select @@global.slave_parallel_mode`
SET @@global.slave_parallel_threads=2;
SET @@global.slave_parallel_mode=optimistic;
SET @@global.debug_dbug="+d,hold_worker_on_schedule";
--source include/start_slave.inc
--let $wait_condition= SELECT count(*) = 1 FROM information_schema.processlist WHERE state LIKE "Waiting for prior transaction to start commit before starting%"
--source include/wait_condition.inc
SET DEBUG_SYNC = 'now SIGNAL continue_worker';
--connection master
DROP SEQUENCE s1;
--sync_slave_with_master
--source include/stop_slave.inc
--echo # Simulate buggy 10.3.36 master to prove the parallel applier
--echo # does not deadlock now at replaying the above master load.
--connection master
--let $datadir= `SELECT @@datadir`
--let $rpl_server_number= 1
--source include/rpl_stop_server.inc
--remove_file $datadir/master-bin.000001
--copy_file $MYSQL_TEST_DIR/std_data/rpl/master-bin-seq_10.3.36.000001 $datadir/master-bin.000001
--let $rpl_server_number= 1
--source include/rpl_start_server.inc
--source include/wait_until_connected_again.inc
--save_master_pos
--connection slave
RESET MASTER;
SET @@global.gtid_slave_pos="";
--replace_result $SERVER_MYPORT_1 SERVER_MYPORT_1
eval CHANGE MASTER TO master_host='127.0.0.1', master_port=$SERVER_MYPORT_1, master_user='root', master_use_gtid=slave_pos;
START SLAVE UNTIL MASTER_GTID_POS='0-1-102';
--let $wait_condition= SELECT count(*) = 1 FROM information_schema.processlist WHERE state LIKE "Waiting for prior transaction to commit"
--source include/wait_condition.inc
SET DEBUG_SYNC = 'now SIGNAL continue_worker';
#
# MDEV-29621 clean up.
#
--source include/wait_for_slave_to_stop.inc
SET debug_sync = RESET;
--eval SET @@global.slave_parallel_threads= $slave_parallel_threads
--eval SET @@global.slave_parallel_mode= $slave_parallel_mode
SET @@global.debug_dbug = "";
--source include/start_slave.inc
--source include/rpl_end.inc