Mirror of https://github.com/MariaDB/server.git (synced 2025-08-08 11:22:35 +03:00)
MDEV-7888, MDEV-7929: Parallel replication hangs sometimes on ANALYZE TABLE or DDL
The hangs occur when the group_commit_orderer object is freed before the last mark_start_commit() call on it. This loses the wakeup to other waiting worker threads, and they hang until killed manually. The object was freed because wakeup_subsequent_commits() was called too early in two places: for MDEV-7888 during ANALYZE TABLE, and for MDEV-7929 during record_gtid() after processing a DDL event. The group_commit_orderer object can be freed when its last transaction has called wait_for_prior_commit().

Fix by implementing a suspend/resume mechanism for wakeup_subsequent_commits() that can be used in places where a transaction is committed without this being the commit of the actual replication event group. Also add a protection mechanism (which asserts in debug builds) that can prevent the too-early free and hang if other similar bugs remain elsewhere in the code.
@@ -171,8 +171,24 @@ finish_event_group(rpl_parallel_thread *rpt, uint64 sub_id,
   /* Now free any GCOs in which all transactions have committed. */
   group_commit_orderer *tmp_gco= rgi->gco;
   while (tmp_gco &&
-         (!tmp_gco->next_gco || tmp_gco->last_sub_id > sub_id))
+         (!tmp_gco->next_gco || tmp_gco->last_sub_id > sub_id ||
+          tmp_gco->next_gco->wait_count > entry->count_committing_event_groups))
   {
+    /*
+      We must not free a GCO before the wait_count of the following GCO has
+      been reached and wakeup has been sent. Otherwise we will lose the
+      wakeup and hang (there were several such bugs in the past).
+
+      The intention is that this is ensured already since we only free when
+      the last event group in the GCO has committed
+      (tmp_gco->last_sub_id <= sub_id). However, if we have a bug, we have
+      extra check on next_gco->wait_count to hopefully avoid hanging; we
+      have here an assertion in debug builds that this check does not in
+      fact trigger.
+    */
+    DBUG_ASSERT(!tmp_gco->next_gco || tmp_gco->last_sub_id > sub_id);
     tmp_gco= tmp_gco->prev_gco;
   }
   while (tmp_gco)
   {
     group_commit_orderer *prev_gco= tmp_gco->prev_gco;