mirror of https://github.com/MariaDB/server.git synced 2025-08-08 11:22:35 +03:00

MDEV-31448: Killing a replica thread awaiting its GCO can hang/crash a parallel replica

The problem was an incorrect unmark_start_commit() in
signal_error_to_sql_driver_thread(). If an event group gets an error, this
unmark could run after the following GCO had started, and the subsequent
re-marking could then access a de-allocated GCO.

The offending unmark_start_commit() looks obviously incorrect, and the fix
is to just remove it. It was introduced in the MDEV-8302 patch, the commit
message of which suggests it was added there solely to satisfy an assertion
in ha_rollback_trans(). So update this assertion instead to not trigger for
event groups that experienced an error (rgi->worker_error). When an error
occurs in an event group, all following event groups are skipped anyway, so
the unmark should never be needed in this case.

Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Kristian Nielsen
2023-06-11 17:44:58 +02:00
parent 60bec1d54d
commit a8ea6627a4
3 changed files with 15 additions and 7 deletions


@@ -91,6 +91,10 @@ struct group_commit_orderer {
   };
   uint8 flags;
 #ifndef DBUG_OFF
+  /*
+    Flag set when the GCO has been freed and entered the free list, to catch
+    (in debug) errors in the complex lifetime of this object.
+  */
   bool gc_done;
 #endif
 };
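
The debug-only flag documented in this hunk can be illustrated with a small standalone sketch. This is not MariaDB code: `Gco`, `GcoPool`, `is_freed()` and this `mark_start_commit()` are hypothetical stand-ins, and the standard `NDEBUG` macro is assumed in place of `DBUG_OFF`. It only shows the general pattern of setting a flag when an object enters a free list, so that a late access, such as the re-marking described in the commit message, trips an assertion in debug builds instead of silently touching a recycled object.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for group_commit_orderer; only the fields needed here.
struct Gco {
  std::uint8_t flags = 0;
  Gco *next_free = nullptr;
#ifndef NDEBUG
  bool gc_done = false;  // true while the object sits on the free list
#endif
};

// Hypothetical free-list pool (an assumption, not the server's allocator).
class GcoPool {
public:
  GcoPool() = default;
  GcoPool(const GcoPool &) = delete;
  GcoPool &operator=(const GcoPool &) = delete;

  // Frees only the objects currently on the free list.
  ~GcoPool() {
    while (free_list_) {
      Gco *g = free_list_;
      free_list_ = g->next_free;
      delete g;
    }
  }

  // Reuse a freed object if one is available, otherwise allocate.
  Gco *acquire() {
    Gco *g = free_list_;
    if (g)
      free_list_ = g->next_free;
    else
      g = new Gco;
    g->next_free = nullptr;
    g->flags = 0;
#ifndef NDEBUG
    g->gc_done = false;  // object is live again
#endif
    return g;
  }

  // Put the object on the free list and mark it as freed (debug only).
  void release(Gco *g) {
    assert(!is_freed(g));  // catches a double release in debug builds
#ifndef NDEBUG
    g->gc_done = true;
#endif
    g->next_free = free_list_;
    free_list_ = g;
  }

  // Always false in release builds, where the flag does not exist.
  static bool is_freed(const Gco *g) {
#ifndef NDEBUG
    return g->gc_done;
#else
    (void)g;
    return false;
#endif
  }

private:
  Gco *free_list_ = nullptr;
};

// Hypothetical "mark" operation: asserts the object has not been freed.
inline void mark_start_commit(Gco *g) {
  assert(!GcoPool::is_freed(g));
  g->flags |= 1;
}
```

In this sketch, calling `mark_start_commit()` on an object that has already been passed to `release()` aborts in a debug build, while a release build compiles the check away, which is the trade-off the `#ifndef DBUG_OFF` guard in the hunk makes.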