Mirror of https://github.com/MariaDB/server.git (synced 2025-07-18 23:03:28 +03:00)
MDEV-10653: SHOW SLAVE STATUS Can Deadlock an Errored Slave
AKA rpl.rpl_parallel, binlog_encryption.rpl_parallel fails in buildbot with timeout in include

A replication parallel worker thread can deadlock with another connection running SHOW SLAVE STATUS. If the replication worker thread is in do_gco_wait() and is killed, it already holds LOCK_parallel_entry and, during error reporting, tries to grab err_lock. SHOW SLAVE STATUS, however, takes these locks in the reverse order: it first grabs err_lock and then tries to grab LOCK_parallel_entry. If each thread has taken its first lock, neither can take its second, and the two threads deadlock.

This patch implements the fix proposed in MDEV-31894: the workers_idle() check now decides idleness by comparing queued_count == dequeued_count of the last in-use relay log. Because these counters are updated atomically, workers_idle() no longer needs to take LOCK_parallel_entry.

Huge thanks to Kristian Nielsen for diagnosing the problem!

Reviewed By:
============
Kristian Nielsen <knielsen@knielsen-hq.org>
Andrei Elkin <andrei.elkin@mariadb.com>
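The lock ordering described above can be sketched in a few lines of C++. This is a minimal illustration under stated assumptions, not MariaDB's code: `entry_lock` and `err_lock` are hypothetical stand-ins for LOCK_parallel_entry and err_lock, and `worker_killed_in_gco_wait()` and `show_slave_status()` are hypothetical functions that model only the lock acquisitions of the two code paths. The worker takes entry_lock, then err_lock. Before the fix, the status path took err_lock, then entry_lock (the reverse order). After the fix, it reads two atomic counters instead, so it never waits on entry_lock.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <string>

// Hypothetical stand-ins; names follow the commit message, not real MariaDB types.
std::mutex entry_lock;  // plays the role of LOCK_parallel_entry
std::mutex err_lock;    // plays the role of err_lock
std::atomic<std::int64_t> queued_count{0};    // events queued to workers
std::atomic<std::int64_t> dequeued_count{0};  // events workers have taken
std::string last_error;

// Models the killed worker's error path in do_gco_wait(): it already holds
// entry_lock and then needs err_lock (order: entry_lock, then err_lock).
void worker_killed_in_gco_wait() {
  std::lock_guard<std::mutex> entry(entry_lock);
  std::lock_guard<std::mutex> err(err_lock);
  last_error = "worker killed";
}

// Fixed idleness check: two atomic loads, no mutex taken.
bool workers_idle() {
  return queued_count.load() == dequeued_count.load();
}

// Models SHOW SLAVE STATUS after the fix: it holds only err_lock. Before the
// fix, the idleness check locked entry_lock here (order: err_lock, then
// entry_lock), which is what could deadlock with the worker path.
std::string show_slave_status() {
  std::lock_guard<std::mutex> err(err_lock);
  return std::string(workers_idle() ? "idle" : "busy") + "; last error: " +
         (last_error.empty() ? "none" : last_error);
}
```

Because the status path never holds err_lock while waiting for entry_lock, no cycle between the two locks can form, regardless of how the threads are scheduled.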
@@ -369,9 +369,10 @@ struct rpl_parallel {
  rpl_parallel_entry *find(uint32 domain_id);
  void wait_for_done(THD *thd, Relay_log_info *rli);
  void stop_during_until();
  bool workers_idle();
  int wait_for_workers_idle(THD *thd);
  int do_event(rpl_group_info *serial_rgi, Log_event *ev, ulonglong event_size);

  static bool workers_idle(Relay_log_info *rli);
};
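The hunk declares `static bool workers_idle(Relay_log_info *rli);`, a form that can decide idleness from the relay log info alone, without an rpl_parallel instance or a per-entry mutex. The sketch below shows what such a check could look like. It rests on assumptions: `RelayLogInfoSketch`, `InuseRelaylogSketch`, and `RplParallelSketch` are hypothetical simplified types, and the real Relay_log_info and relay-log bookkeeping in MariaDB differ. Treating "no in-use relay log" as idle is likewise an assumption made for this sketch.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical bookkeeping for the last relay log that workers still use.
struct InuseRelaylogSketch {
  std::atomic<std::int64_t> queued_count{0};    // events handed to workers
  std::atomic<std::int64_t> dequeued_count{0};  // events workers have taken
};

// Hypothetical stand-in for Relay_log_info: only the field this check needs.
struct RelayLogInfoSketch {
  InuseRelaylogSketch *last_inuse_relaylog = nullptr;
};

struct RplParallelSketch {
  // Static: callable without an instance; all state comes from rli.
  static bool workers_idle(RelayLogInfoSketch *rli) {
    InuseRelaylogSketch *ir = rli->last_inuse_relaylog;
    if (ir == nullptr)
      return true;  // assumption for this sketch: nothing in use means idle
    // Idle when every queued event has been dequeued; atomic loads, no lock.
    return ir->queued_count.load() == ir->dequeued_count.load();
  }
};
```

Usage: `RplParallelSketch::workers_idle(&rli)` can be called from a path that already holds an unrelated lock (such as err_lock in SHOW SLAVE STATUS) without adding a new lock-order dependency.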