mirror of https://github.com/MariaDB/server.git synced 2025-08-08 11:22:35 +03:00

MDEV-33551: Semi-sync Wait Point AFTER_COMMIT Slow on Workloads with Heavy Concurrency

When using semi-sync replication with
rpl_semi_sync_master_wait_point=AFTER_COMMIT, the performance of the
primary can degrade significantly compared to AFTER_SYNC for
workloads with many concurrent users executing transactions. This is
because all connections on the primary share the same condition
variable/mutex pair, so any time an ACK is received from a replica,
every waiting connection is awoken to check whether the ACK was for
itself, and that check is done in mutual exclusion.

This patch changes this so that each waiting THD uses its own local
condition variable, and the ACK receiver thread signals only those
connections which have been ACKed. Specifically, the
THD::COND_wakeup_ready condition variable is re-used for this
purpose, and the Active_tranx queue nodes are extended to hold the
waiting thread, so it can be signalled once ACKed.
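
As a rough illustration of the wakeup scheme described above (not the
server code itself), the following C++ sketch uses
std::condition_variable in place of the server's own condition
variable type. The names Waiter, TranxNode, AckQueue and
demo_targeted_wakeup are hypothetical stand-ins for THD, the
Active_tranx nodes and the ACK receiver, and positions are plain
integers rather than binlog file/offset pairs.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a connection (THD): each waiter owns its own
// condition variable and ready flag, in the spirit of COND_wakeup_ready.
struct Waiter {
  std::condition_variable cond;
  bool acked = false;
};

// Hypothetical stand-in for an Active_tranx node: the position being
// waited on plus the thread that waits on it.
struct TranxNode {
  std::uint64_t pos;
  Waiter *waiter;
};

class AckQueue {
  std::mutex lock_;               // plays the role of LOCK_binlog
  std::vector<TranxNode> nodes_;  // pending transactions, any order
public:
  // Called by a committing connection: enqueue, then sleep on its own condvar.
  void wait_for_ack(std::uint64_t pos, Waiter &w) {
    std::unique_lock<std::mutex> guard(lock_);
    nodes_.push_back({pos, &w});
    w.cond.wait(guard, [&w] { return w.acked; });
  }

  // Called by the ACK receiver: signal only waiters at or below acked_pos.
  // Returns the number of waiters that were signalled.
  std::size_t on_ack(std::uint64_t acked_pos) {
    std::lock_guard<std::mutex> guard(lock_);
    std::size_t woken = 0;
    for (auto it = nodes_.begin(); it != nodes_.end();) {
      if (it->pos <= acked_pos) {
        it->waiter->acked = true;
        it->waiter->cond.notify_one();  // waiter cannot return before unlock
        it = nodes_.erase(it);
        ++woken;
      } else {
        ++it;
      }
    }
    return woken;
  }

  std::size_t pending() {
    std::lock_guard<std::mutex> guard(lock_);
    return nodes_.size();
  }
};

// Three waiters at positions 10, 20 and 30; an ACK for 20 should signal
// exactly two of them and leave one pending.
std::pair<std::size_t, std::size_t> demo_targeted_wakeup() {
  AckQueue q;
  Waiter w[3];
  std::vector<std::thread> threads;
  for (int i = 0; i < 3; ++i)
    threads.emplace_back([&q, &w, i] { q.wait_for_ack(10 * (i + 1), w[i]); });
  while (q.pending() != 3) std::this_thread::yield();  // let all enqueue
  std::size_t woken = q.on_ack(20);
  std::size_t left = q.pending();
  q.on_ack(30);  // release the last waiter so the threads can be joined
  for (auto &t : threads) t.join();
  assert(q.pending() == 0);
  return {woken, left};
}
```

With a single shared condition variable, the ACK for position 20
would instead have to broadcast to all three waiters, each of which
would then re-check its position under the same mutex.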

Additionally:

 1) Removed the part of the MDEV-11853 additions which allowed
suspended connection threads awaiting their semi-sync ACKs to live
until their ACKs had been received. This part wasn't needed; all
that was needed was for the Ack_thread to survive. So now the
connection threads are killed during phase 1. Consequently,
THD::is_awaiting_semisync_ack and all its related code were removed.

 2) COND_binlog_send is repurposed to be signalled when
Active_tranx is emptied during clear_active_tranx_nodes.

 3) At master shutdown (when waiting for slaves), instead of the
main loop individually waiting for each ACK, await_slave_reply()
(renamed await_all_slave_replies()) just waits once for the
repurposed COND_binlog_send to signal that Active_tranx is empty.

 4) Test rpl_semi_sync_shutdown_await_ack is updated as follows:
   4.1) Added test case (adapted from Kristian Nielsen) to ensure
that if a thread awaiting its ACK is killed while SHUTDOWN WAIT FOR
ALL SLAVES is issued, the primary will still wait for the ACK from
the killed thread.
   4.2) As connections which by-passed phase 1 of thread killing are
no longer delayed for kill until phase 2, we can no longer query
yes/no tx after receiving an ACK/timeout. The check for these
variables is removed.
   4.3) Comments which mentioned that the connection is alive are
updated to refer to the Ack_thread instead.
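
To make items 2) and 3) concrete, here is a minimal C++ sketch of
waiting once on a "queue is empty" condition rather than once per
ACK. It is an illustration under stated assumptions, not the server
implementation: PendingAcks, add, ack_up_to, await_all and the demo
functions are hypothetical names, std::condition_variable stands in
for COND_binlog_send, and the timeout argument is an assumption made
so that the example always terminates.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <set>
#include <thread>

class PendingAcks {
  std::mutex lock_;                     // plays the role of LOCK_binlog
  std::condition_variable empty_cond_;  // plays the role of COND_binlog_send
  std::set<std::uint64_t> pending_;     // positions still awaiting an ACK
public:
  void add(std::uint64_t pos) {
    std::lock_guard<std::mutex> guard(lock_);
    pending_.insert(pos);
  }

  // ACK receiver side: drop every position covered by the ACK and signal
  // the shutdown waiter only when nothing is left.
  void ack_up_to(std::uint64_t pos) {
    std::lock_guard<std::mutex> guard(lock_);
    pending_.erase(pending_.begin(), pending_.upper_bound(pos));
    if (pending_.empty()) empty_cond_.notify_all();
  }

  // Shutdown side: a single wait that ends when the set is empty or the
  // (assumed) timeout expires. Returns true if everything was ACKed.
  bool await_all(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> guard(lock_);
    return empty_cond_.wait_for(guard, timeout,
                                [this] { return pending_.empty(); });
  }
};

// All three positions are eventually ACKed, so the single wait succeeds.
bool demo_shutdown_waits_for_all_acks() {
  PendingAcks acks;
  acks.add(10);
  acks.add(20);
  acks.add(30);
  std::thread replica([&acks] {
    acks.ack_up_to(10);  // set not yet empty: no signal
    acks.ack_up_to(30);  // set becomes empty: waiter is signalled
  });
  bool drained = acks.await_all(std::chrono::seconds(5));
  replica.join();
  // Once drained, a further wait returns immediately.
  assert(!drained || acks.await_all(std::chrono::milliseconds(0)));
  return drained;
}

// No ACK ever arrives, so the wait gives up after the timeout.
bool demo_shutdown_times_out() {
  PendingAcks acks;
  acks.add(10);
  return acks.await_all(std::chrono::milliseconds(50));
}
```

The shutdown path does not need to know how many ACKs are outstanding;
it only observes the transition to empty.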

Reviewed By:
============
Kristian Nielsen <knielsen@knielsen-hq.org>
This commit is contained in:
Brandon Nesterenko
2024-02-27 12:11:06 -07:00
parent b8a6719889
commit 75c7c6dc39
12 changed files with 588 additions and 274 deletions

@@ -5320,8 +5320,18 @@ public:
 Flag, mutex and condition for a thread to wait for a signal from another
 thread.
-Currently used to wait for group commit to complete, can also be used for
-other purposes.
+Currently used to wait for group commit to complete, and COND_wakeup_ready
+is used for threads to wait on semi-sync ACKs (though is protected by
+Repl_semi_sync_master::LOCK_binlog). Note the following relationships
+between these two use-cases when using
+rpl_semi_sync_master_wait_point=AFTER_SYNC during group commit:
+1) Non-leader threads use COND_wakeup_ready to wait for the leader thread
+to complete binlog commit.
+2) The leader thread uses COND_wakeup_ready to await ACKs from the
+replica before signalling the non-leader threads to wake up.
+With wait_point=AFTER_COMMIT, there is no overlap as binlogging has
+finished, so COND_wakeup_ready is safe to re-use.
 */
 bool wakeup_ready;
 mysql_mutex_t LOCK_wakeup_ready;
@@ -5449,14 +5459,6 @@ public:
 bool is_binlog_dump_thread();
 #endif
-/*
-Indicates if this thread is suspended due to awaiting an ACK from a
-replica. True if suspended, false otherwise.
-Note that this variable is protected by Repl_semi_sync_master::LOCK_binlog
-*/
-bool is_awaiting_semisync_ack;
 inline ulong wsrep_binlog_format(ulong binlog_format) const
 {
 #ifdef WITH_WSREP