mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-11-27 05:41:41 +03:00

Author	SHA1	Message	Date
sjaakola	ef2dbb8dbc	MDEV-23328 Server hang due to Galera lock conflict resolution Mutex order violation when wsrep bf thread kills a conflicting trx, the stack is wsrep_thd_LOCK() wsrep_kill_victim() lock_rec_other_has_conflicting() lock_clust_rec_read_check_and_lock() row_search_mvcc() ha_innobase::index_read() ha_innobase::rnd_pos() handler::ha_rnd_pos() handler::rnd_pos_by_record() handler::ha_rnd_pos_by_record() Rows_log_event::find_row() Update_rows_log_event::do_exec_row() Rows_log_event::do_apply_event() Log_event::apply_event() wsrep_apply_events() and mutexes are taken in the order lock_sys->mutex -> victim_trx->mutex -> victim_thread->LOCK_thd_data When a normal KILL statement is executed, the stack is innobase_kill_query() kill_handlerton() plugin_foreach_with_mask() ha_kill_query() THD::awake() kill_one_thread() and mutexes are victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex This patch is the plan D variant for fixing potetial mutex locking order exercised by BF aborting and KILL command execution. In this approach, KILL command is replicated as TOI operation. This guarantees total isolation for the KILL command execution in the first node: there is no concurrent replication applying and no concurrent DDL executing. Therefore there is no risk of BF aborting to happen in parallel with KILL command execution either. Potential mutex deadlocks between the different mutex access paths with KILL command execution and BF aborting cannot therefore happen. TOI replication is used, in this approach, purely as means to provide isolated KILL command execution in the first node. KILL command should not (and must not) be applied in secondary nodes. In this patch, we make this sure by skipping KILL execution in secondary nodes, in applying phase, where we bail out if applier thread is trying to execute KILL command. This is effective, but skipping the applying of KILL command could happen much earlier as well. This also fixed unprotected calls to wsrep_thd_abort that will use wsrep_abort_transaction. This is fixed by holding THD::LOCK_thd_data while we abort transaction. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2021-10-29 20:40:35 +02:00
Marko Mäkelä	4eb7217ec3	Merge 10.4 into 10.5	2021-10-06 09:45:12 +03:00
mkaruza	a75813d467	MDEV-22708 Assertion `!mysql_bin_log.is_open() \|\| thd.is_current_stmt_binlog_format_row()' failed in Delayed_insert::handle_inserts and in Diagnostics_area::set_eof_status Function `upgrade_lock_type` should check global binlog_format variable instead of thread one. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2021-10-06 07:20:02 +03:00
Marko Mäkelä	699de65d5e	Merge 10.4 into 10.5	2021-09-17 19:57:13 +03:00
Jan Lindström	d3b35598fc	MDEV-26053 : TRUNCATE on table with Foreign Key Constraint no longer replicated to other nodes Problem was that there was extra condition !thd->lex->no_write_to_binlog before call to begin TOI. It seems that this variable is not initialized. TRUNCATE does not support [NO_WRITE_TO_BINLOG \| LOCAL] keywords, thus we should not check this condition. All this was hidden in a macro, so I decided to remove those macros that were used only a few places with actual function calls.	2021-09-17 07:18:37 +03:00
Marko Mäkelä	87ff4ba7c8	Merge 10.4 into 10.5	2021-08-26 08:46:57 +03:00
Marko Mäkelä	15b691b7bd	After-merge fix `f84e28c119` In a rebase of the merge, two preceding commits were accidentally reverted: commit `112b23969a` (MDEV-26308) commit `ac2857a5fb` (MDEV-25717) Thanks to Daniele Sciascia for noticing this.	2021-08-25 17:35:44 +03:00
Marko Mäkelä	f84e28c119	Merge 10.3 into 10.4	2021-08-18 16:51:52 +03:00
mkaruza	da171182b7	MDEV-26223 Galera cluster node consider old server_id value even after modification of server_id [wsrep_gtid_mode=ON] If cluster is bootstrapped in existing database, we should use provided configuration variables for wsrep_gtid_domain_id and server_id instead of recovered ones. If 'new' combination of wsrep_gtid_domain_id & server_id already existed somewere before in binlog we should continue from last seqno, if combination is new we start from seqno 0. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2021-08-18 12:20:06 +03:00
Leandro Pacheco	112b23969a	MDEV-26308 : Galera test failure on galera.galera_split_brain Contains following fixes: * allow TOI commands to timeout while trying to acquire TOI with override lock_wait_timeout with a LONG_TIMEOUT only after succesfully entering TOI * only ignore lock_wait_timeout on TOI * fix galera_split_brain test as TOI operation now returns ER_LOCK_WAIT_TIMEOUT after lock_wait_timeout * explicitly test for TOI Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2021-08-18 08:57:33 +03:00
Oleksandr Byelkin	ae6bdc6769	Merge branch '10.4' into 10.5	2021-07-31 23:19:51 +02:00
Leandro Pacheco	2b84e1c966	MDEV-23080: desync and pause node on BACKUP STAGE BLOCK_DDL make BACKUP STAGE behave as FTWRL, desyncing and pausing the node to prevent BF threads (appliers) from interfering with blocking stages. This is needed because BF threads don't respect BACKUP MDL locks. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2021-07-27 08:11:41 +03:00
Sergei Golubchik	b34cafe9d9	cleanup: move thread_count to THD_count::value() because the name was misleading, it counts not threads, but THDs, and as THD_count is the only way to increment/decrement it, it could as well be declared inside THD_count.	2021-07-24 12:37:50 +02:00
Marko Mäkelä	c6846757ac	Merge 10.4 into 10.5	2021-05-03 14:34:48 +03:00
mkaruza	206d630ea0	MDEV-22227 Assertion `state_ == s_exec' failed in wsrep::client_state::start_transaction Removed redundant code for BF abort transaction in `thr_lock.cc`. TOI operations will ignore provided lock_wait_timeout and use `LONG_TIMEOUT` until operation is finished. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2021-04-28 11:11:01 +03:00
Marko Mäkelä	80ed136e6d	Merge 10.4 into 10.5	2021-04-21 09:01:01 +03:00
mkaruza	c3b016efde	MDEV-22668: "Flush SSL" command doesn't reload wsrep cert Trigger `socket.ssl_reload` when FLUSH SSL is issued. To triger reloading of certificate, key and CA, files needs to be physically changed. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2021-04-15 08:50:01 +03:00
Marko Mäkelä	7b48da4d7e	Merge 10.4 into 10.5	2021-04-08 07:47:49 +03:00
Daniele Sciascia	915983e1cc	MDEV-25226 Assertion when wsrep_on set OFF with SR transaction This patch makes the following changes around variable wsrep_on: 1) Variable wsrep_on can no longer be updated from a session that has an active transaction running. The original behavior allowed cases like this: BEGIN; INSERT INTO t1 VALUES (1); SET SESSION wsrep_on = OFF; INSERT INTO t1 VALUES (2); COMMIT; With regular transactions this would result in no replication events (not even value 1). With streaming replication it would be unnecessarily complex to achieve the same behavior. In the above example, it would be possible for value 1 to be already replicated if it happened to fill a separate fragment, while value 2 wouldn't. 2) Global variable wsrep_on no longer affects current sessions, only subsequent ones. This is to avoid a similar case to the above, just using just by using global wsrep_on instead session wsrep_on: --connection conn_1 BEGIN; INSERT INTO t1 VALUES(1); --connection conn_2 SET GLOBAL wsrep_on = OFF; --connection conn_1 INSERT INTO t1 VALUES(2); COMMIT; The above example results in the transaction to be replicated, as global wsrep_on will only affect the session wsrep_on of new connections. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2021-04-05 09:10:23 +03:00
Marko Mäkelä	d6d3d9ae2f	Merge 10.2 into 10.3	2021-03-31 08:01:03 +03:00
Jan Lindström	d217a925b2	MDEV-24923 : Port selected Galera conflict resolution changes from 10.6 Add condition on trx->state == TRX_STATE_COMMITTED_IN_MEMORY in order to avoid unnecessary work. If a transaction has already been committed or rolled back, it will release its locks in lock_release() and let the waiting thread(s) continue execution. Let BF wait on lock_rec_has_to_wait and if necessary other BF is replayed. wsrep_trx_order_before If BF is not even replicated yet then they are ordered correctly. bg_wsrep_kill_trx Make sure victim_trx is found and check also its state. If state is TRX_STATE_COMMITTED_IN_MEMORY transaction is already committed or rolled back and will release it locks soon. wsrep_assert_no_bf_bf_wait Transaction requesting new record lock should be TRX_STATE_ACTIVE Conflicting transaction can be in states TRX_STATE_ACTIVE, TRX_STATE_COMMITTED_IN_MEMORY or in TRX_STATE_PREPARED. If conflicting transaction is already committed in memory or prepared we should wait. When transaction is committed in memory we held trx mutex, but not lock_sys->mutex. Therefore, we could end here before transaction has time to do lock_release() that is protected with lock_sys->mutex. lock_rec_has_to_wait We very well can let bf to wait normally as other BF will be replayed in case of conflict. For debug builds we will do additional sanity checks to catch unsupported bf wait if any. wsrep_kill_victim Check is victim already in TRX_STATE_COMMITTED_IN_MEMORY state and if it is we can return. lock_rec_dequeue_from_page lock_rec_unlock Remove unnecessary wsrep_assert_no_bf_bf_wait function calls. We can very well let BF wait here.	2021-03-30 08:58:10 +03:00
Sergei Golubchik	f33e57a9e6	Merge branch '10.4' into 10.5	2021-02-23 13:06:22 +01:00
Sergei Golubchik	34fcd726a6	Merge branch 'bb-10.4-release' into 10.4	2021-02-23 00:08:56 +01:00
Marko Mäkelä	16388f393c	Merge mariadb-10.5.9	2021-02-17 16:19:49 +02:00
Jan Lindström	a5bcec727b	MDEV-24865 : Server crashes when truncate mysql user table For truncate we try to find out possible foreign key tables using open_tables. However, table_list was not cleaned up properly and there was no error handling. Fixed by cleaning table_list and adding proper error handling.	2021-02-16 08:46:14 +02:00
Sergei Golubchik	25d9d2e37f	Merge branch 'bb-10.4-release' into bb-10.5-release	2021-02-15 16:43:15 +01:00
Sergei Golubchik	2696538723	updating @@wsrep_cluster_address deadlocks wsrep_cluster_address_update() causes LOCK_wsrep_slave_threads to be locked under LOCK_wsrep_cluster_config, while normally the order should be the opposite. Fix: don't protect @@wsrep_cluster_address value with the LOCK_wsrep_cluster_config, LOCK_global_system_variables is enough. Only protect wsrep reinitialization with the LOCK_wsrep_cluster_config. And make it use a local copy of the global @@wsrep_cluster_address. Also, introduce a helper function that checks whether wsrep_cluster_address is set and also asserts that it can be safely read by the caller.	2021-02-14 23:18:42 +01:00
Jan Lindström	b3df194e31	MDEV-24833 : Signal 11 on wsrep_can_run_in_toi at wsrep_mysqld.cc:1994 Problem was that when engine substitution is allowd (e.g. sql_mode='') we must also check db_type. Additionally, we did not resolve default storage engine on that case and used that to check is TOI possible or not.	2021-02-12 20:43:03 +02:00
Sergei Golubchik	259a1902a0	cleanup: THD::abort_current_cond_wait() * reuse the loop in THD::abort_current_cond_wait, don't duplicate it * find_thread_by_id should return whatever it has found, it's the caller's task not to kill COM_DAEMON (if the caller's a killer) and other minor changes	2021-02-12 18:05:34 +01:00
Sergei Golubchik	37e24970cb	merge	2021-02-02 12:43:17 +01:00
Sergei Golubchik	2676c9aad7	galera fixes related to THD::LOCK_thd_kill Since 2017 (`c2118a08b1`) THD::awake() no longer requires LOCK_thd_data. It uses LOCK_thd_kill, and this latter mutex is used to prevent a thread of dying, not LOCK_thd_data as before.	2021-02-02 10:02:17 +01:00
Sergei Golubchik	29bbcac0ee	MDEV-23328 Server hang due to Galera lock conflict resolution mutex order violation here. when wsrep bf thread kills a conflicting trx, the stack is wsrep_thd_LOCK() wsrep_kill_victim() lock_rec_other_has_conflicting() lock_clust_rec_read_check_and_lock() row_search_mvcc() ha_innobase::index_read() ha_innobase::rnd_pos() handler::ha_rnd_pos() handler::rnd_pos_by_record() handler::ha_rnd_pos_by_record() Rows_log_event::find_row() Update_rows_log_event::do_exec_row() Rows_log_event::do_apply_event() Log_event::apply_event() wsrep_apply_events() and mutexes are taken in the order lock_sys->mutex -> victim_trx->mutex -> victim_thread->LOCK_thd_data When a normal KILL statement is executed, the stack is innobase_kill_query() kill_handlerton() plugin_foreach_with_mask() ha_kill_query() THD::awake() kill_one_thread() and mutexes are victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex To fix the mutex order violation we kill the victim thd asynchronously, from the manager thread	2021-01-24 11:35:55 +01:00
Marko Mäkelä	8de233af81	Merge 10.4 into 10.5	2021-01-11 16:29:51 +02:00
Jan Lindström	49b8774951	MDEV-24546 : AddressSanitizer: initialization-order-fiasco on address ... in Sys_var_integer from __static_initialization_and_destruction_0, possibly inside global var wsrep_gtid_server Galera parameter wsrep_gtid_domain_id was defined using a class where actual parameter was not a first member. Fixed this by using normal variable and assigning this value to class member value.	2021-01-09 09:03:39 +02:00
Alexey Yurchenko	033f8d13ce	Update wsrep-lib (new logger interface) Ensure consistent use of logging macros in wsrep-related code Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2021-01-07 17:41:21 +02:00
Oleksandr Byelkin	02e7bff882	Merge commit '10.4' into 10.5	2021-01-06 10:53:00 +01:00
mkaruza	b79b3ff655	MDEV-23468: inline_mysql_socket_send: Assertion `mysql_socket.fd != -1' failed on shutdown Closing remaining threads in `wsrep_close_client_connections` should also check `thd_is_connection_alive` for thd before closing connection. Assert is happening when thread already doing shutdown, but still not removed from threads list. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2020-12-21 11:24:24 +02:00
Marko Mäkelä	6a1e655cb0	Merge 10.4 into 10.5	2020-12-02 18:29:49 +02:00
Marko Mäkelä	24ec8eaf66	MDEV-15532 after-merge fixes from Monty The Galera tests were massively failing with debug assertions.	2020-12-02 16:16:29 +02:00
Marko Mäkelä	d7a5824899	Merge 10.4 into 10.5	2020-11-13 21:54:21 +02:00
sjaakola	2fbcddbeaf	MDEV-24119 MDL BF-BF Conflict caused by TRUNCATE TABLE A follow-up fix, for original fix for MDEV-21577, which did not handle well temporary tables. OPTIMIZE and REPAIR TABLE statements can take a list of tables as argument, and some of the tables may be temporary. Proper handling of temporary tables is to skip them and continue working on the real tables. The bad version, skipped all tables, if a single temporary table was in the argument list. And this resulted so that FK parent tables were not scnanned for the remaining real tables. Some mtr tests, using OPTIMIZE or REPAIR for temporary tables caused regressions bacause of this, e.g. galera.galera_optimize_analyze_multi The fix in this PR opens temporary and real tables first, and in the table handling loop skips temporary tables, FK parent scanning is done only for real tables. The test has new scenario for OPTIMIZE and REPAIR issued for two tables of which one is temporary table. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2020-11-11 17:46:50 +02:00
sjaakola	4d6c661144	MDEV-21577 MDL BF-BF conflict Some DDL statements appear to acquire MDL locks for a table referenced by foreign key constraint from the actual affected table of the DDL statement. OPTIMIZE, REPAIR and ALTER TABLE belong to this class of DDL statements. Earlier MariaDB version did not take this in consideration, and appended only affected table in the certification key list in write set. Because of missing certification information, it could happen that e.g. OPTIMIZE table for FK child table could be allowed to apply in parallel with DML operating on the foreign key parent table, and this could lead to unhandled MDL lock conflicts between two high priority appliers (BF). The fix in this patch, changes the TOI replication for OPTIMIZE, REPAIR and ALTER TABLE statements so that before the execution of respective DDL statement, there is foreign key parent search round. This FK parent search contains following steps: * open and lock the affected table (with permissive shared locks) * iterate over foreign key contstraints and collect and array of Fk parent table names * close all tables open for the THD and release MDL locks * do the actual TOI replication with the affected table and FK parent table names as key values The patch contains also new mtr test for verifying that the above mentioned DDL statements replicate without problems when operating on FK child table. The mtr test scenario #1, which can be used to check if some other DDL (on top of OPTIMIZE, REPAIR and ALTER) could cause similar excessive FK parent table locking. Reviewed-by: Aleksey Midenkov <aleksey.midenkov@mariadb.com> Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2020-11-03 19:40:06 +02:00
Jan Lindström	c5517cd864	MDEV-23638 : DROP TRIGGER in Galera Cluster not replicating Drop trigger handling was missing from wsrep_can_run_in_toi in 10.5 for some reason.	2020-09-08 11:14:37 +03:00
Marko Mäkelä	5ff7e68c7e	Merge 10.4 into 10.5	2020-09-04 18:44:44 +03:00
Marko Mäkelä	c9cf6b13f6	Merge 10.3 into 10.4	2020-09-03 15:53:38 +03:00
Jan Lindström	33ae1616e0	MDEV-21578 : CREATE OR REPLACE TRIGGER in Galera cluster not replicating In 10.3 OR REPLACE trigger option is part of create_info.	2020-09-03 14:10:42 +03:00
Marko Mäkelä	c3752cef3c	Merge 10.2 into 10.3	2020-09-03 09:26:54 +03:00
Jan Lindström	c710c450e3	MDEV-21578 : CREATE OR REPLACE TRIGGER in Galera cluster not replicating While doing TOI buffer OR REPLACE option was not added to replicated string.	2020-08-28 16:40:12 +03:00
Marko Mäkelä	d5d8756de3	Merge 10.4 into 10.5	2020-08-20 12:52:44 +03:00
Daniele Sciascia	09dd06f14a	MDEV-22443 wsrep::runtime_error on START TRANSACTION This happens with global wsrep_on disabled and local wsrep_on enabled. The fix consists in avoiding sync wait when global wsrep_on is disabled.	2020-08-19 13:12:00 +03:00

1 2 3 4 5 ...

399 Commits