mirror of https://github.com/MariaDB/server.git synced 2025-07-29 05:21:33 +03:00

MDEV-31448: Killing a replica thread awaiting its GCO can hang/crash a parallel replica

The problem is that when a worker thread is (user) killed in
wait_for_prior_commit, the event group may complete out-of-order since the
wait for prior commit was aborted by the kill.

This fix ensures that event groups will always complete in-order, even
in the error case. This is done in finish_event_group() by doing an
extra wait_for_prior_commit(), if necessary, that ignores kills.
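
As a rough illustration of the idea above (not the server code), the sketch below models a wait that a kill can abort, plus a completion step that repeats the wait with kills ignored. `PriorCommitWaiter`, `wait_for_prior()`, `wakeup()`, `kill()` and `finish_group()` are hypothetical names invented for this example; only the `allow_kill` flag mirrors the patch.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the server's wait_for_commit; all names here
// are illustrative assumptions, not MariaDB identifiers.
struct PriorCommitWaiter {
  std::mutex lock;
  std::condition_variable cond;
  bool prior_done = false;  // set once the prior commit has woken us up
  bool killed = false;      // set by a (user) kill of this worker

  // Returns 0 once the prior commit has completed, 1 if a kill aborted the wait.
  int wait_for_prior(bool allow_kill = true) {
    std::unique_lock<std::mutex> guard(lock);
    // With allow_kill == false the kill flag is not consulted at all.
    while (!prior_done && (!allow_kill || !killed))
      cond.wait(guard);
    return prior_done ? 0 : 1;
  }

  // Called by the prior commit when it has completed.
  void wakeup() {
    {
      std::lock_guard<std::mutex> guard(lock);
      prior_done = true;
    }
    cond.notify_all();
  }

  // Called by whoever kills this worker.
  void kill() {
    {
      std::lock_guard<std::mutex> guard(lock);
      killed = true;
    }
    cond.notify_all();
  }
};

// Model of the completion step: if the normal wait was cut short (err != 0),
// wait again with kills ignored, so this group cannot complete before its
// predecessor. The original error is still reported to the caller.
int finish_group(PriorCommitWaiter &w, int err) {
  if (err != 0)
    w.wait_for_prior(false);
  return err;
}
```

In this model a killed worker still gets an error back from its first wait, but `finish_group()` does not return until `wakeup()` has been called, which is what keeps completion in order.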

This fix supersedes the fix for MDEV-30780, so the earlier fix for
that is reverted in this patch.

Also fix that an error from wait_for_prior_commit() inside
finish_event_group() would not signal the error to
wakeup_subsequent_commits().
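
The error propagation described above can be pictured with a minimal single-threaded model. `Group`, `wait_for_prior()`, `wakeup_next()` and `finish_group()` are hypothetical names for this sketch only; the assumption is simply that whatever error the completion step ends up with, including one returned by its own wait, is the error handed to the next group.

```cpp
// Hypothetical model: each group only remembers the error handed to it by
// its predecessor. None of these names come from the MariaDB source.
struct Group {
  int error_from_prior = 0;
};

// Stand-in for the wait: returns the error the predecessor signalled.
int wait_for_prior(const Group &g) { return g.error_from_prior; }

// Stand-in for waking up the next group with a given error.
void wakeup_next(Group &next, int err) { next.error_from_prior = err; }

// Completion step: an error from the wait is folded into the group's own
// error before the next group is woken, so it is not silently dropped.
int finish_group(Group &self, Group &next, int own_err) {
  int wait_err = wait_for_prior(self);
  int err = own_err ? own_err : wait_err;
  wakeup_next(next, err);
  return err;
}
```

With three groups in a row, an error raised in the first one reaches the third even though the second had no error of its own.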

Based on earlier work by Brandon Nesterenko and Andrei Elkin, with
some changes to simplify the semantics of wait_for_prior_commit() and
make the code more robust to future changes.

Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Kristian Nielsen
2023-06-15 21:35:53 +02:00
parent a8ea6627a4
commit 5d61442c85
3 changed files with 61 additions and 19 deletions


@@ -7635,11 +7635,18 @@ wait_for_commit::register_wait_for_prior_commit(wait_for_commit *waitee)
with register_wait_for_prior_commit(). If the commit already completed,
returns immediately.
+If ALLOW_KILL is set to true (the default), the wait can be aborted by a
+kill. In case of kill, the wait registration is still removed, so another
+call of unregister_wait_for_prior_commit() is needed to later retry the
+wait. If ALLOW_KILL is set to false, then kill will be ignored and this
+function will not return until the prior commit (if any) has called
+wakeup_subsequent_commits().
If thd->backup_commit_lock is set, release it while waiting for other threads
*/
int
-wait_for_commit::wait_for_prior_commit2(THD *thd)
+wait_for_commit::wait_for_prior_commit2(THD *thd, bool allow_kill)
{
PSI_stage_info old_stage;
wait_for_commit *loc_waitee;
@@ -7664,7 +7671,7 @@ wait_for_commit::wait_for_prior_commit2(THD *thd)
&stage_waiting_for_prior_transaction_to_commit,
&old_stage);
while ((loc_waitee= this->waitee.load(std::memory_order_relaxed)) &&
-likely(!thd->check_killed(1)))
+(!allow_kill || likely(!thd->check_killed(1))))
mysql_cond_wait(&COND_wait_commit, &LOCK_wait_commit);
if (!loc_waitee)
{