When using binlog_row_image=FULL with sequence table inserts, a
replica can deadlock because it treats full inserts in a sequence as DDL
statements by getting an exclusive lock on the sequence table. It
has been observed that with parallel replication, this exclusive
lock on the sequence table can lead to a deadlock where one
transaction has the exclusive lock and is waiting on a prior
transaction to commit, whereas this prior transaction is waiting on
the MDL lock.
This fix for this is on the master side, to raise FL_DDL
flag on the GTID of a full binlog_row_image write of a sequence table.
This forces the slave to execute the statement serially so a deadlock
cannot happen.
A test verifies the deadlock also to prove it happen on the OLD (pre-fixes)
slave.
OLD (buggy master) -replication-> NEW (fixed slave) is provided.
As the pre-fixes master's full row-image may represent both
SELECT NEXT VALUE and INSERT, the parallel slave pessimistically
waits for the prior transaction to have committed before to take on the
critical part of the second (like INSERT in the test) event execution.
The waiting exploits a parallel slave's retry mechanism which is
controlled by `@@global.slave_transaction_retries`.
Note that in order to avoid any persistent 'Deadlock found' 2013 error
in OLD -> NEW, `slave_transaction_retries` may need to be set to a
higher than the default value.
START-SLAVE is an effective work-around if this still happens.
- Description:
- Before 10.3.8 semisync was a plugin that is built into the server with
MDEV-13073,starting with commit cbc71485e2.
There are still some usage of `rpl_semi_sync_master` in mtr.
Note:
- To recognize the replica in the `dump_thread`, replica is creating
local variable `rpl_semi_sync_slave` (the keyword of plugin) in
function `request_transmit`, that is catched by primary in
`is_semi_sync_slave()`. This is the user variable and as such not
related to the obsolete plugin.
- Found in `sys_vars.all_vars` and `rpl_semi_sync_wait_point` tests,
usage of plugins `rpl_semi_sync_master`, `rpl_semi_sync_slave`.
The former test is disabled by default (`sys_vars/disabled.def`)
and marked as `obsolete`, however this patch will remove the queries.
- Add cosmetic fixes to semisync codebase
Reviewer: <brandon.nesterenko@mariadb.com>
Closes PR #2528, PR #2380
The hang could be seen as show slave status displaying an error like
Last_Error: Could not execute Write_rows_v1
along with
Slave_SQL_Running: Yes
accompanied with one of the replication threads in show-processlist
characteristically having status like
2394 | system user | | NULL | Slave_worker | 50852| closing tables
It turns out that closing tables worker got entrapped in endless looping
in mark_start_commit_inner() across already garbage-collected gco items.
The reclaimed gco links are explained with actually possible
out-of-order groups of events termination due to the Last_Error.
This patch reinforces the correct ordering to perform
finish_event_group's cleanup actions, incl unlinking gco:s
from the active list.
One of the constraints added in the MDEV-29639 patch, is that only
the first event after idling should update last_master_timestamp;
and as long as the replica has more events to execute, the variable
should not be updated. The corresponding test,
rpl_delayed_parallel_slave_sbm.test, aims to verify this; however,
if the IO thread takes too long to queue events, the SQL thread can
appear to catch up too fast.
This fix ensures that the relay log has been fully written before
executing the events.
Note that the underlying cause of this test failure needs to be
addressed as a bug-fix, this is a temporary fix to stop test
failures. To track work on the bug-fix for the underlying issue,
please see MDEV-30619.
ANALYZE was observed to race over a preceding in binlog order DML
in updating the binlog and slave gtid states.
Tagging ANALYZE and other admin class commands in binlog by the fixes
of MDEV-17515 left a flaw allowing such race leading to
the gtid mode out-of-order error.
This is fixed now to observe by ADMIN commands the ordered access to
the slave gtid status variables and binlog.
Problem
========
On a parallel, delayed replica, Seconds_Behind_Master will not be
calculated until after MASTER_DELAY seconds have passed and the
event has finished executing, resulting in potentially very large
values of Seconds_Behind_Master (which could be much larger than the
MASTER_DELAY parameter) for the entire duration the event is
delayed. This contradicts the documented MASTER_DELAY behavior,
which specifies how many seconds to withhold replicated events from
execution.
Solution
========
After a parallel replica idles, the first event after idling should
immediately update last_master_timestamp with the time that it began
execution on the primary.
Reviewed By
===========
Andrei Elkin <andrei.elkin@mariadb.com>
The following tests are disabled when running --valgrding without --big:
- rpl.rpl_ssl
- rpl.rpl_semi_sync_event
- All encryption test (which includes have_file_key_management.inc)
Let us disable Valgrind on tests that would fail because a
server shutdown or a STOP SLAVE command would take longer,
causing the test harness to forcibly and silently kill the server
due to an exceeded timeout.
- Added missing information about database of corresponding table for various types of commands
- Update some typos
- Reviewed by: <vicentiu@mariadb.org>
There are separate flags DBUG_OFF for disabling the DBUG facility
and ENABLED_DEBUG_SYNC for enabling the DEBUG_SYNC facility.
Let us allow debug builds without DEBUG_SYNC.
Note: For CMAKE_BUILD_TYPE=Debug, CMakeLists.txt will continue to
define ENABLED_DEBUG_SYNC.
The rpl_row_img_sequence test can fail on resource
constrained buildbot machines due to its high
space consumption. To reduce this footprint, the
test is split into three parts, one for each value
of the binlog_row_img variable.
Problem:
========
Replication can break while applying a query log event if its
respective command errors on the primary, but is ignored by the
replication filter within Grant_tables on the replica. The bug
reported by MDEV-28530 shows this with REVOKE ALL PRIVILEGES using a
non-existent user. The primary will binlog the REVOKE command with
an error code, and the replica will think the command executed with
success because the replication filter will ignore the command while
accessing the Grant_tables classes. When the replica performs an
error check, it sees the difference between the error codes, and
replication breaks.
Solution:
========
If the replication filter check done by Grant_tables logic ignores
the tables, reset thd->slave_expected_error to 0 so that
Query_log_event::do_apply_event() can be made aware that the
underlying query was ignored when it compares errors.
Note that this bug also effects DROP USER if not all users exist
in the provided list, and the patch fixes and tests this case.
Reviewed By:
============
andrei.elkin@mariadb.com
Problem:
========
When replicating SET DEFAULT ROLE, the pre-update check (i.e. that
in set_var_default_role::check()) tries to validate the existence of
the given rules/user even when the targeted tables are ignored. When
previously issued CREATE USER/ROLE commands are ignored by the
replica because of the replication filtering rules, this results in
an error because the targeted data does not exist.
Solution:
========
Before checking that the given roles/user exist of a SET DEFAULT
ROLE command, first ensure that the mysql.user and
mysql.roles_mapping tables are not excluded by replication filters.
Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
Sergei Golubchik <serg@mariadb.com>
Problem:
=======
This patch addresses two issues:
1. An incident event can be incorrectly reported for transactions
which are rolled back successfully. That is, an incident event
should only be generated for failed “non-transactional transactions”
(i.e., those which modify non-transactional tables) because they
cannot be rolled back.
2. When the mariadb slave (error) stops at receiving the incident
event there's no description of what led to it. Neither in the event
nor in the master's error log.
Solution:
========
Before reporting an incident event for a transaction, first validate
that it is “non-transactional” (i.e. cannot be safely rolled back).
To determine if a transaction is non-transactional,
lex->stmt_accessed_table(LEX::STMT_WRITES_NON_TRANS_TABLE)
is used because it is set previously in
THD::decide_logging_format().
Additionally, when an incident event is written, write an error
message to the server’s error log to indicate the underlying issue.
Reviewed by:
===========
Andrei Elkin <andrei.elkin@mariadb.com>
Problem:
========
When using sequences, the function
sequence_definition::write(TABLE *table, bool all_fields)
is used to save DML/DDL updates to sequence tables (e.g. nextval,
setval, and alter). Prior to this patch, the value all_fields was
always false when invoked via nextval and setval, which forced the
bitmap to only include changed columns.
Solution:
========
Change all_fields when invoked via nextval and setval to be reliant
on binlog_row_image, such that it is false when binlog_row_image is
MINIMAL, and true otherwise.
Reviewed By:
===========
Andrei Elkin <andrei.elkin@mariadb.com>
If a slave received a fake GLLE event after a GTID event
it would terminate the group. This adds a test for the
previous commit which fixed this issue (939672a).
Review by Andrei Elkin <andrei.elkin@mariadb.com>
MDEV-21810 MBR: Unexpected "Unsafe statement" warning for unsafe IODKU
MDEV-17614 fixes to replication unsafety for INSERT ON DUP KEY UPDATE
on two or more unique key table left a flaw. The fixes checked the
safety condition per each inserted record with the idea to catch a user-created
value to an autoincrement column and when that succeeds the autoincrement column
would become the source of unsafety too.
It was not expected that after a duplicate error the next record's
write_set may become different and the unsafe decision for that
specific record will be computed to screw the Query's binlogging
state and when @@binlog_format is MIXED nothing gets bin-logged.
This case has been already fixed in 10.5.2 by 91ab42a823 that
relocated/optimized THD::decide_logging_format_low() out of the record insert
loop. The safety decision is computed once and at the right time.
Pertinent parts of the commit are cherry-picked.
Also a spurious warning about unsafety is removed when MIXED
@@binlog_format; original MDEV-17614 test result corrected.
The original test of MDEV-17614 is extended and made more readable.
Problem:
========
During mysqld initialization, if the number of GTIDs added since
that last purge of the mysql.gtid_slave_pos tables is greater than
or equal to the –-gtid-cleanup-batch-size value, a race condition
can occur. Specifically, the binlog background thread will submit
the bg_gtid_delete_pending job to the mysql handle manager; however,
the mysql handle manager may not be initialized, leading to crashes.
Solution:
========
Force the mysql handle manager to initialize/start before the binlog
background thread is created.
Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
In cases of a faulty master or an incorrect binlog event producer, that slave is working with,
sends an incomplete group of events slave must react with an error to not to log
into the relay-log any new events that do not belong to the incomplete group.
Fixed with extending received event properties check when slave connects to master
in gtid mode.
Specifically for the event that can be a part of a group its relay-logging is
permitted only when its position within the group is validated.
Otherwise slave IO thread stops with ER_SLAVE_RELAY_LOG_WRITE_FAILURE.
Problem:
========
If a primary is shutdown during an active semi-sync connection
during the period when the primary is awaiting an ACK, the primary
hard kills the active communication thread and does not ensure the
transaction was received by a replica. This can lead to an
inconsistent replication state.
Solution:
========
During shutdown, the primary should wait for an ACK or timeout
before hard killing a thread which is awaiting a communication. We
extend the `SHUTDOWN WAIT FOR SLAVES` logic to identify and ignore
any threads waiting for a semi-sync ACK in phase 1. Then, before
stopping the ack receiver thread, the shutdown is delayed until all
waiting semi-sync connections receive an ACK or time out. The
connections are then killed in phase 2.
Notes:
1) There remains an unresolved corner case that affects this
patch. MDEV-28141: Slave crashes with Packets out of order when
connecting to a shutting down master. Specifically, If a slave is
connecting to a master which is actively shutting down, the slave
can crash with a "Packets out of order" assertion error. To get
around this issue in the MTR tests, the primary will wait a small
amount of time before phase 1 killing threads to let the replicas
safely stop (if applicable).
2) This patch also fixes MDEV-28114: Semi-sync Master ACK Receiver
Thread Can Error on COM_QUIT
Reviewed By
============
Andrei Elkin <andrei.elkin@mariadb.com>
rpl.rpl_semi_sync_slave_compressed_protocol.test was manually
re-enabled only in 10.3 but left disabled in 10.4+. The fix went
into 10.3+, but the test was left disabled in later versions. This
commit re-enables the test in 10.4+.