mirror of https://github.com/MariaDB/server.git synced 2025-07-30 16:24:05 +03:00

MDEV-30260: Slave crashed:reload_acl_and_cache during shutdown

The signal handler thread can use various runtime resources when
processing a SIGHUP (e.g. master-info information) because it calls
into reload_acl_and_cache(). Currently, the shutdown process waits
for the termination of the signal thread only after performing
cleanup. As a result, resources actively used by the signal handler
can be freed while reload_acl_and_cache() is still processing.

The specific resource behind MDEV-30260 is the hostname_cache. There
was a race in which mysqld could delete it in
clean_up()::hostname_cache_free() before the signal handler used it
in reload_acl_and_cache()::hostname_cache_refresh().

A similar resource is the active_mi/master_info_index. There was a
race between its deletion by the main thread in end_slave() and its
usage by the signal handler as a part of
Master_info_index::flush_all_relay_logs.read(active_mi) in
reload_acl_and_cache().

This patch fixes these race conditions by moving the point where
server shutdown waits for the signal handler to die, so that the
wait happens right after server-level threads have been killed
(i.e., as the last step of close_connections()). For the
hostname_cache, active_mi and master_info_index, this ensures that
they cannot be destroyed while the signal handler is still active
and potentially using them.

Additionally:

 1) This requires that Events memory is still in place for SIGHUP
handling's mysql_print_status(). Event deinitialization is therefore
moved into clean_up(), while the event scheduler is still stopped in
close_connections(), at the same spot.

 2) The function kill_server_thread is no longer used, so it is
deleted.

 3) The timeout to wait for the death of the signal thread was not
consistent with the comment. The comment mentioned up to 10 seconds,
whereas it was actually 0.01s. The code has been fixed to wait up to
10 seconds.

 4) A warning has been added if the signal handler thread fails to
exit in time.

 5) Added pthread_join() to the end of
wait_for_signal_thread_to_end() for the case where the thread has
not ended within 10s, together with a warning. Note this also
removes the detached attribute from the signal_thread so that
pthread_join() can be used.
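
As an illustration of items 3) to 5), the following is a minimal
standalone sketch of a bounded wait followed by a join. It is not the
code from this patch: it uses std::thread and std::condition_variable
instead of the server's pthread wrappers, and the names SignalThread,
start_signal_thread and wait_for_thread_to_end are hypothetical. It
also assumes that the join always runs after the wait, which the text
above does not state explicitly.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the signal handler thread and its state.
struct SignalThread {
  std::mutex lock;
  std::condition_variable cond;
  bool finished = false;  // set by the thread just before it returns
  std::thread worker;     // kept joinable (never detached)
};

// Start a thread that "handles signals" for work_ms milliseconds and
// then announces that it is done.
void start_signal_thread(SignalThread &st, int work_ms) {
  st.worker = std::thread([&st, work_ms] {
    std::this_thread::sleep_for(std::chrono::milliseconds(work_ms));
    {
      std::lock_guard<std::mutex> guard(st.lock);
      st.finished = true;
    }
    st.cond.notify_all();
  });
}

// Wait up to `timeout` for the thread to finish, warn if it did not,
// then join it. Returns true if the thread finished within the timeout.
bool wait_for_thread_to_end(SignalThread &st,
                            std::chrono::milliseconds timeout) {
  assert(st.worker.joinable());  // a detached thread could not be joined
  bool in_time;
  {
    std::unique_lock<std::mutex> guard(st.lock);
    in_time = st.cond.wait_for(guard, timeout,
                               [&st] { return st.finished; });
  }
  if (!in_time)
    std::fprintf(stderr, "Warning: signal thread did not exit in time\n");
  st.worker.join();  // blocks until the thread has really ended
  return in_time;
}
```

After wait_for_thread_to_end() returns, the thread is guaranteed to
have ended, so resources it used can be released safely. The return
value only reports whether a warning was needed.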

Reviewed By:
===========
Vladislav Vaintroub <wlad@mariadb.com>
Andrei Elkin <andrei.elkin@mariadb.com>
Brandon Nesterenko
2024-04-08 13:04:59 -06:00
parent 4980fcb990
commit 952ab9a596
4 changed files with 259 additions and 35 deletions


@@ -67,6 +67,15 @@ bool reload_acl_and_cache(THD *thd, unsigned long long options,
  bool result=0;
  select_errors=0; /* Write if more errors */
  int tmp_write_to_binlog= *write_to_binlog= 1;
#ifndef DBUG_OFF
  /*
    When invoked for handling a SIGHUP by rpl_shutdown_sighup.test, we need to
    force the signal handler to wait after REFRESH_TABLES, as that will check
    for a killed server, and we need to call hostname_cache_refresh after
    server cleanup has happened to trigger MDEV-30260.
  */
  int do_dbug_sleep= 0;
#endif
  DBUG_ASSERT(!thd || !thd->in_sub_stmt);
@@ -99,6 +108,15 @@ bool reload_acl_and_cache(THD *thd, unsigned long long options,
      */
      my_error(ER_UNKNOWN_ERROR, MYF(0));
    }
#ifndef DBUG_OFF
    DBUG_EXECUTE_IF("hold_sighup_log_refresh", {
      DBUG_ASSERT(!debug_sync_set_action(
          thd, STRING_WITH_LEN("now SIGNAL in_reload_acl_and_cache "
                               "WAIT_FOR refresh_logs")));
      do_dbug_sleep= 1;
    });
#endif
  }
  opt_noacl= 0;
@@ -351,6 +369,11 @@ bool reload_acl_and_cache(THD *thd, unsigned long long options,
    }
    my_dbopt_cleanup();
  }
#ifndef DBUG_OFF
  if (do_dbug_sleep)
    my_sleep(3000000); // 3s
#endif
  if (options & REFRESH_HOSTS)
    hostname_cache_refresh();
  if (thd && (options & REFRESH_STATUS))