1
0
mirror of https://github.com/MariaDB/server.git synced 2025-08-01 03:47:19 +03:00

MDEV-24302: RESET MASTER hangs

Starting with MariaDB 10.5, roughly after MDEV-23855 was fixed,
we are observing sporadic hangs during the execution of the
RESET MASTER statement. We are hoping to fix the hangs with these
changes, but due to the rather infrequent occurrence of the hangs
and our inability to reliably reproduce the hangs, we cannot be
sure of this.

What we do know is that innodb_force_recovery=2 (or a larger setting)
will prevent srv_master_callback (the former srv_master_thread) from
running. In that mode, periodic log flushes would never occur and
RESET MASTER could hang indefinitely. That is demonstrated by the new
test case that was developed by Andrei Elkin. We fix this case by
implementing a special case for it.

This also includes some code cleanup and renames of misleadingly
named code. The interface has nothing to do with log checkpoints in
the storage engine; it is only about requesting log writes to be
persistent.

handlerton::commit_checkpoint_request,
commit_checkpoint_notify_ha(): Remove the unused parameter hton.

log_requests.start: Replaces pending_checkpoint_list.
log_requests.end: Replaces pending_checkpoint_list_end.
log_requests.mutex: Replaces pending_checkpoint_mutex.

log_flush_notify_and_unlock(), log_flush_notify(): Replaces
innobase_mysql_log_notify().  The new implementation should be
functionally equivalent to the old one.

innodb_log_flush_request(): Replaces innobase_checkpoint_request().
Implement a fast path for common cases, and reduce the mutex hold time.
POSSIBLE FIX OF THE HANG: We will invoke commit_checkpoint_notify_ha()
for the current request if it is already satisfied, as well as invoke
log_flush_notify_and_unlock() for any satisfied requests.

log_write(): Invoke log_flush_notify() when the write is already durable.
This was missing WITH_PMEM when the log is in persistent memory.

Reviewed by: Vladislav Vaintroub
This commit is contained in:
Marko Mäkelä
2021-03-29 15:16:23 +03:00
parent 8e2d69f7b8
commit e8b7fceb82
9 changed files with 175 additions and 164 deletions

View File

@ -1471,7 +1471,7 @@ struct handlerton
recovery. It uses that to reduce the work needed for any subsequent XA
recovery process.
*/
void (*commit_checkpoint_request)(handlerton *hton, void *cookie);
void (*commit_checkpoint_request)(void *cookie);
/*
"Disable or enable checkpointing internal to the storage engine. This is
used for FLUSH TABLES WITH READ LOCK AND DISABLE CHECKPOINT to ensure that
@ -5211,7 +5211,7 @@ void trans_register_ha(THD *thd, bool all, handlerton *ht,
const char *get_canonical_filename(handler *file, const char *path,
char *tmp_path);
void commit_checkpoint_notify_ha(handlerton *hton, void *cookie);
void commit_checkpoint_notify_ha(void *cookie);
inline const LEX_CSTRING *table_case_name(HA_CREATE_INFO *info, const LEX_CSTRING *name)
{