1
0
mirror of https://github.com/MariaDB/server.git synced 2025-08-08 11:22:35 +03:00

MDEV-7888, MDEV-7929: Parallel replication hangs sometimes on ANALYZE TABLE or DDL

The hangs occur when the group_commit_orderer object is freed before the last
mark_start_commit() call on it - this loses the wakeup to other waiting worker
threads, causing them to hang until killed manually.

The object was freed because wakeup_subsequent_commits() was called two early
in two places. For MDEV-7888, during ANALYZE TABLE, and for MDEV-7929 during
record_gtid() after processing a DDL event. The group_commit_orderer object
can be freed when its last transaction has called wait_for_prior_commit().

Fix by implementing a suspend/resume mechanism for wakeup_subsequent_commits()
that can be used in places where a transaction is committed without this being
the commit of the actual replication event group.

Also add a protection mechanism (that asserts in debug builds) which can
prevent the too-early free and hang if other similar bugs should remain in
other parts of the code.
This commit is contained in:
Kristian Nielsen
2015-04-08 11:01:18 +02:00
parent 880f2273fd
commit 3b961347db
7 changed files with 231 additions and 3 deletions

View File

@@ -171,8 +171,24 @@ finish_event_group(rpl_parallel_thread *rpt, uint64 sub_id,
/* Now free any GCOs in which all transactions have committed. */
group_commit_orderer *tmp_gco= rgi->gco;
while (tmp_gco &&
(!tmp_gco->next_gco || tmp_gco->last_sub_id > sub_id))
(!tmp_gco->next_gco || tmp_gco->last_sub_id > sub_id ||
tmp_gco->next_gco->wait_count > entry->count_committing_event_groups))
{
/*
We must not free a GCO before the wait_count of the following GCO has
been reached and wakeup has been sent. Otherwise we will lose the
wakeup and hang (there were several such bugs in the past).
The intention is that this is ensured already since we only free when
the last event group in the GCO has committed
(tmp_gco->last_sub_id <= sub_id). However, if we have a bug, we have
extra check on next_gco->wait_count to hopefully avoid hanging; we
have here an assertion in debug builds that this check does not in
fact trigger.
*/
DBUG_ASSERT(!tmp_gco->next_gco || tmp_gco->last_sub_id > sub_id);
tmp_gco= tmp_gco->prev_gco;
}
while (tmp_gco)
{
group_commit_orderer *prev_gco= tmp_gco->prev_gco;