1
0
mirror of https://github.com/MariaDB/server.git synced 2025-07-30 16:24:05 +03:00

MDEV-11853: semisync thread can be killed after sync binlog but before ACK in the sync state

Problem:
========
If a primary is shutdown during an active semi-sync connection
during the period when the primary is awaiting an ACK, the primary
hard kills the active communication thread and does not ensure the
transaction was received by a replica. This can lead to an
inconsistent replication state.

Solution:
========
During shutdown, the primary should wait for an ACK or timeout
before hard killing a thread which is awaiting a communication. We
extend the `SHUTDOWN WAIT FOR SLAVES` logic to identify and ignore
any threads waiting for a semi-sync ACK in phase 1. Then, before
stopping the ack receiver thread, the shutdown is delayed until all
waiting semi-sync connections receive an ACK or time out. The
connections are then killed in phase 2.

Notes:
 1) There remains an unresolved corner case that affects this
patch. MDEV-28141: Slave crashes with Packets out of order when
connecting to a shutting down master. Specifically, If a slave is
connecting to a master which is actively shutting down, the slave
can crash with a "Packets out of order" assertion error. To get
around this issue in the MTR tests, the primary will wait a small
amount of time before phase 1 killing threads to let the replicas
safely stop (if applicable).
 2) This patch also fixes MDEV-28114: Semi-sync Master ACK Receiver
Thread Can Error on COM_QUIT

Reviewed By
============
Andrei Elkin <andrei.elkin@mariadb.com>
This commit is contained in:
Brandon Nesterenko
2021-10-20 20:13:45 -06:00
parent 807945f2eb
commit a83c7ab1ea
13 changed files with 1091 additions and 22 deletions

View File

@ -2453,6 +2453,8 @@ int run_plugin_auth(MYSQL *mysql, char *data, uint data_len,
mpvio.db= db;
mpvio.plugin= auth_plugin;
DBUG_EXECUTE_IF("client_delay_run_plugin_auth", my_sleep(400000););
res= auth_plugin->authenticate_user((struct st_plugin_vio *)&mpvio, mysql);
DBUG_PRINT ("info", ("authenticate_user returned %s",
res == CR_OK ? "CR_OK" :