1
0
mirror of https://github.com/MariaDB/server.git synced 2025-07-30 16:24:05 +03:00

MDEV-23328 Server hang due to Galera lock conflict resolution

mutex order violation here.
when wsrep bf thread kills a conflicting trx, the stack is

  wsrep_thd_LOCK()
  wsrep_kill_victim()
  lock_rec_other_has_conflicting()
  lock_clust_rec_read_check_and_lock()
  row_search_mvcc()
  ha_innobase::index_read()
  ha_innobase::rnd_pos()
  handler::ha_rnd_pos()
  handler::rnd_pos_by_record()
  handler::ha_rnd_pos_by_record()
  Rows_log_event::find_row()
  Update_rows_log_event::do_exec_row()
  Rows_log_event::do_apply_event()
  Log_event::apply_event()
  wsrep_apply_events()

and mutexes are taken in the order

  lock_sys->mutex -> victim_trx->mutex -> victim_thread->LOCK_thd_data

When a normal KILL statement is executed, the stack is

  innobase_kill_query()
  kill_handlerton()
  plugin_foreach_with_mask()
  ha_kill_query()
  THD::awake()
  kill_one_thread()

and mutexes are

  victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex

To fix the mutex order violation we kill the victim thd asynchronously,
from the manager thread
This commit is contained in:
Sergei Golubchik
2021-01-20 18:22:38 +01:00
parent 5d1db34585
commit 29bbcac0ee
2 changed files with 100 additions and 73 deletions

View File

@ -2762,9 +2762,7 @@ extern "C" void wsrep_thd_awake(THD *thd, my_bool signal)
{
if (signal)
{
mysql_mutex_lock(&thd->LOCK_thd_data);
thd->awake(KILL_QUERY);
mysql_mutex_unlock(&thd->LOCK_thd_data);
}
else
{