Since MDEV-31503 fixes ALTER-SEQUENCE might be able to complete its
work including binlogging before a preceding in binlog-order
transaction. There may not be data dependency between the two
transactions, but there would be
"an attempt was made to binlog GTID D-S-XYZ which would create an
out-of-order sequence number"
error in the gtid_strict_mode = ON.
After the preceding transaction started committing, and does it rather
slow, ALTER-SEQUNCE was able to find a time window to complete because
it temporarily releases its link with the waitee parent transaction.
And while having it released it also erroneously executes the binlogging part.
Fixed with restoring the commit dependency on the parent before
ALTER-SEQUNCE takes on binlogging.
Updated tests: cases with bugs or which cannot be run
with the cursor-protocol were excluded with
"--disable_cursor_protocol"/"--enable_cursor_protocol"
Fix for v.10.5
The crash at running mysqlbinlog on a SEQUENCE containing binlog file
was caused MDEV-29621 fixes that did not check which of the slave
or binlog applier executes a block introduced there.
The block is meaningful only for the parallel slave applier, so
it's safe to fix this bug with identified the actual applier and
skipping the block when it's the mysqlbinlog one.
The assert's reason was in missed FL_DDL flagging of CREATE-or-REPLACE
Query event.
MDEV-27365 fixes covered only the non-pre-existing table execution branch so
did not see a possibility of implicit commit in
the middle of execution in a rollback branch when the being CREATEd
sequence table is actually replaced.
The pre-existing table branch cleared the DDL modification
flag so the query lost FL_DDL in binlog and its parallel execution
on slave may have ended up with the assert to indicate the query
is raced by a following in binlog order event.
Fixed with applying the MDEV-27365 pattern.
An mtr test is added to cover the rollback situation.
The description test [ pass ] with a generous number of mtr parallel
reties.
This patch adds for "--ps-protocol" second execution
of queries "SELECT".
Also in this patch it is added ability to disable/enable
(--disable_ps2_protocol/--enable_ps2_protocol) second
execution for "--ps-prototocol" in testcases.
MDEV-31503 ALTER SEQUENCE ends up in optimistic parallel slave binlog out-of-order
The OOO error still was possible even after MDEV-31077. This time
it occured through open_table() when the sequence table was not in
the table cache *and* the table was created before the last server
restart.
In such context a internal (read-only) transaction is committed
and it was not blocked from doing a wakeup() call to subsequent
transactions.
Fixed with extending suspend_subsequent_commits() effect for the entirety
of Sql_cmd_alter_sequence::execute().
An elaborated MDEV-31077 test proves the fixes of both failure scenarios.
Also the bug condition suggests a workaround to pre-SELECT sequence
tables before START SLAVE.
Reviewed-by: Brandon Nesterenko <brandon.nesterenko@mariadb.com>
1. log_event.cc stuff should go into log_event_server.cc
2. the test's wait condition is textually different in 10.5, fixed.
3. pre-exec 'optimistic' global var value is correct for 10.5 indeed.
When using binlog_row_image=FULL with sequence table inserts, a
replica can deadlock because it treats full inserts in a sequence as DDL
statements by getting an exclusive lock on the sequence table. It
has been observed that with parallel replication, this exclusive
lock on the sequence table can lead to a deadlock where one
transaction has the exclusive lock and is waiting on a prior
transaction to commit, whereas this prior transaction is waiting on
the MDL lock.
This fix for this is on the master side, to raise FL_DDL
flag on the GTID of a full binlog_row_image write of a sequence table.
This forces the slave to execute the statement serially so a deadlock
cannot happen.
A test verifies the deadlock also to prove it happen on the OLD (pre-fixes)
slave.
OLD (buggy master) -replication-> NEW (fixed slave) is provided.
As the pre-fixes master's full row-image may represent both
SELECT NEXT VALUE and INSERT, the parallel slave pessimistically
waits for the prior transaction to have committed before to take on the
critical part of the second (like INSERT in the test) event execution.
The waiting exploits a parallel slave's retry mechanism which is
controlled by `@@global.slave_transaction_retries`.
Note that in order to avoid any persistent 'Deadlock found' 2013 error
in OLD -> NEW, `slave_transaction_retries` may need to be set to a
higher than the default value.
START-SLAVE is an effective work-around if this still happens.