If a slave received a fake GLLE event after a GTID event
it would terminate the group. This adds a test for the
previous commit which fixed this issue (939672a).
Review by Andrei Elkin <andrei.elkin@mariadb.com>
MDEV-21810 MBR: Unexpected "Unsafe statement" warning for unsafe IODKU
MDEV-17614 fixes to replication unsafety for INSERT ON DUP KEY UPDATE
on two or more unique key table left a flaw. The fixes checked the
safety condition per each inserted record with the idea to catch a user-created
value to an autoincrement column and when that succeeds the autoincrement column
would become the source of unsafety too.
It was not expected that after a duplicate error the next record's
write_set may become different and the unsafe decision for that
specific record will be computed to screw the Query's binlogging
state and when @@binlog_format is MIXED nothing gets bin-logged.
This case has been already fixed in 10.5.2 by 91ab42a823 that
relocated/optimized THD::decide_logging_format_low() out of the record insert
loop. The safety decision is computed once and at the right time.
Pertinent parts of the commit are cherry-picked.
Also a spurious warning about unsafety is removed when MIXED
@@binlog_format; original MDEV-17614 test result corrected.
The original test of MDEV-17614 is extended and made more readable.
Problem:
========
During mysqld initialization, if the number of GTIDs added since
that last purge of the mysql.gtid_slave_pos tables is greater than
or equal to the –-gtid-cleanup-batch-size value, a race condition
can occur. Specifically, the binlog background thread will submit
the bg_gtid_delete_pending job to the mysql handle manager; however,
the mysql handle manager may not be initialized, leading to crashes.
Solution:
========
Force the mysql handle manager to initialize/start before the binlog
background thread is created.
Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
In cases of a faulty master or an incorrect binlog event producer, that slave is working with,
sends an incomplete group of events slave must react with an error to not to log
into the relay-log any new events that do not belong to the incomplete group.
Fixed with extending received event properties check when slave connects to master
in gtid mode.
Specifically for the event that can be a part of a group its relay-logging is
permitted only when its position within the group is validated.
Otherwise slave IO thread stops with ER_SLAVE_RELAY_LOG_WRITE_FAILURE.
Problem:
========
If a primary is shutdown during an active semi-sync connection
during the period when the primary is awaiting an ACK, the primary
hard kills the active communication thread and does not ensure the
transaction was received by a replica. This can lead to an
inconsistent replication state.
Solution:
========
During shutdown, the primary should wait for an ACK or timeout
before hard killing a thread which is awaiting a communication. We
extend the `SHUTDOWN WAIT FOR SLAVES` logic to identify and ignore
any threads waiting for a semi-sync ACK in phase 1. Then, before
stopping the ack receiver thread, the shutdown is delayed until all
waiting semi-sync connections receive an ACK or time out. The
connections are then killed in phase 2.
Notes:
1) There remains an unresolved corner case that affects this
patch. MDEV-28141: Slave crashes with Packets out of order when
connecting to a shutting down master. Specifically, If a slave is
connecting to a master which is actively shutting down, the slave
can crash with a "Packets out of order" assertion error. To get
around this issue in the MTR tests, the primary will wait a small
amount of time before phase 1 killing threads to let the replicas
safely stop (if applicable).
2) This patch also fixes MDEV-28114: Semi-sync Master ACK Receiver
Thread Can Error on COM_QUIT
Reviewed By
============
Andrei Elkin <andrei.elkin@mariadb.com>
rpl.rpl_semi_sync_slave_compressed_protocol.test was manually
re-enabled only in 10.3 but left disabled in 10.4+. The fix went
into 10.3+, but the test was left disabled in later versions. This
commit re-enables the test in 10.4+.
Problem: In regular replication, when master binlogged using statement format
slave might not have written an event to its binary log when the Query
event aimed at a temporary table.
Specifically this was observed with LOAD DATA INFILE.
This effect was possible because unlike master slave holds temporary
tables in its pool and the master side check of existence of a
temporary table at the format bin-logging decision did not apply.
Solution: replace THD::has_thd_temporary_tables() with
THD::has_temporary_tables which allows to identify temporary table
presence on either side.
--
Reviewed by Andrei Elkin.
MDEV-21117 had to relax own events acceptance condition for a case
when a former semisync master server recovers after crash as the
semisync slave. That however admitted a possibility for endless event
"orbiting" in the non-strict slave gtid mode of semisync circular
setup.
The same server-id event termination is restored now for
the non-strict gtid mode to follow regular rules (that is it's ignored
unless @@global.replicate_same_server_id allows it in).
To address MDEV-21117 recovery agenda,
in the strict gtid mode and the transaction's gtid ordered strictly
greater than the current slave gtid state, the same server-id
transaction is accepted.
The gtid strict mode is safe to accept transactions even if
the slave state were not set correct by the user, e.g
at the former master.
An added test shows a typical out-of-order error at execution so
no data corruption is guaranteed in such a case.
DEBUG_SYNC signals can get lost in certain tests due to later
DEBUG_SYNC commands overwriting them. This patch addresses
these issues in three tests: main.query_cache_debug,
main.partition_debug_sync, and
rpl.rpl_dump_request_retry_warning.
Additionally, main.partition_debug_sync needed changes to the
result file (the others did not). The synchronization happened
between two commands, one based on ALTER, the other on DROP.
A new thread/connection was needed to synchronize the DEBUG_SYNC
actions between these commands, thereby changing the result file.
Additional comments were added for clarification.
Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
The rpl.rpl_seconds_behind_master_spike test would sometimes
timeout or take a very long time to complete. This happened
because an MTR DEBUG_SYNC signal would be lost due to a
subsequent call to RESET. I.e., the slave SQL thread would
be paused due to the WAIT_FOR signal being lost, resulting in
either a failed test if the `select master_pos_wait` timeout
occurs first, or a very long run-time if the DBUG_SYNC timeout
occurs first.
The fix ensures that the MTR signal is processed by the slave
SQL thread before issuing the call to RESET
Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
This patch addresses two problems with
rpl.rpl_seconds_behind_master_spike
First, --sync_slave_with_master / select master_pos_wait
seems to have a bug where it will hang after all master
events have been executed.
This patch removes the sync_slave_with_master command from
the test, where it not required anyway as it is used to
declare explicit cleanup
Second, the test uses timestamps to ensure that the
Seconds_Behind_Master value does not point to a time too
far in the past. The checks of these timestamps were
too strict, because they could be slightly inconsistent
with the master and the SBM would be counted as invalid
when it was actually correct.
To fix this, a slight buffer was added to the check
to ensure the value is valid but still does not point
too far in the past
Reviewed By:
===========
Andrei Elkin <andrei.elkin@mariadb.com>
The 10.5 version of the patch.
Removing DEFAULT from INFORMATION_SCHEMA columns.
DEFAULT in read-only tables is rather meaningless.
Upgrade should go smoothly.
Also fixes:
MDEV-20254 Problems with EMPTY_STRING_IS_NULL and I_S tables
Removing DEFAULT from INFORMATION_SCHEMA columns.
DEFAULT in read-only tables is rather meaningless.
Upgrade should go smoothly.
Also fixes:
MDEV-20254 Problems with EMPTY_STRING_IS_NULL and I_S tables
Problem:
========
A slave’s relay log format description event is used when
calculating Seconds_Behind_Master (SBM). This forces the SBM
value to spike when processing these events, as their creation
date is set to the timestamp that the IO thread begins.
Solution:
========
When the slave generates a format description event, mark the
event as a relay log event so it does not update the
rli->last_master_timestamp variable.
Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>