mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-08-31 22:22:30 +03:00

Author	SHA1	Message	Date
Sergei Golubchik	b2187662bc	Merge branch '10.5' into 10.6	2022-05-18 10:30:47 +02:00
Sergei Golubchik	7970ac7fe8	Merge branch '10.4' into 10.5	2022-05-18 09:50:26 +02:00
Sergei Golubchik	23ddc3518f	Merge branch '10.3' into 10.4	2022-05-18 01:25:30 +02:00
Sergei Golubchik	a0d4f0f306	Merge branch '10.2' into 10.3 commit `84984b79f2` is null-merged	2022-05-18 01:23:47 +02:00
Andrei	726bd8c968	MDEV-28550 improper handling of replication event group that contains GTID_LIST_EVENT or INCIDENT_EVENT. It's legal to have either of the two inside a group. E.g., Gtid_event, Gtid_log_list_event, Query_1, ... Xid_log_event is permitted. However, the slave IO thread treated both as the terminal event even when the group represents a DDL query. That causes a premature Gtid state update so the slave IO would think the whole group has been collected while in fact Query_1 etc. are yet to be processed. Fixed by correcting the condition that computes the terminal event of the group. Tested with rpl_mysqlbinlog_slave_consistency (of 10.9) and rpl_gtid_errorlog.test.	2022-05-13 09:45:32 +02:00
Sergei Golubchik	3bc98a4ec4	Merge branch '10.5' into 10.6	2022-05-10 14:01:23 +02:00
Sergei Golubchik	ef781162ff	Merge branch '10.4' into 10.5	2022-05-09 22:04:06 +02:00
Sergei Golubchik	a70a1cf3f4	Merge branch '10.3' into 10.4	2022-05-08 23:03:08 +02:00
Oleksandr Byelkin	9614fde1aa	Merge branch '10.2' into 10.3	2022-05-03 10:59:54 +02:00
Sergei Golubchik	1430cf7873	MDEV-28428 Master_SSL_Crl shows Master_SSL_CA value in SHOW SLAVE STATUS output it was showing ca and capath instead of crl and crl_path	2022-04-28 13:21:04 +02:00
Andrei	388032e990	MDEV-27697. Removed a false assert.	2022-04-26 19:47:59 +03:00
Andrei	945245aea4	MDEV-27697. Two affected tests fixed. A result file is updated in one case and former error simulation got refined.	2022-04-26 17:05:40 +03:00
Andrei	1bcdc3e9eb	MDEV-27697 slave must recognize incomplete replication event group When a faulty master, or an incorrect binlog event producer that the slave is working with, sends an incomplete group of events, the slave must react with an error so that it does not log into the relay-log any new events that do not belong to the incomplete group. Fixed by extending the received event properties check when the slave connects to the master in gtid mode. Specifically, for an event that can be part of a group, relay-logging is permitted only when its position within the group is validated. Otherwise the slave IO thread stops with ER_SLAVE_RELAY_LOG_WRITE_FAILURE.	2022-04-25 16:00:35 +03:00
Brandon Nesterenko	a83c7ab1ea	MDEV-11853: semisync thread can be killed after sync binlog but before ACK in the sync state Problem: ======== If a primary is shutdown during an active semi-sync connection during the period when the primary is awaiting an ACK, the primary hard kills the active communication thread and does not ensure the transaction was received by a replica. This can lead to an inconsistent replication state. Solution: ======== During shutdown, the primary should wait for an ACK or timeout before hard killing a thread which is awaiting a communication. We extend the `SHUTDOWN WAIT FOR SLAVES` logic to identify and ignore any threads waiting for a semi-sync ACK in phase 1. Then, before stopping the ack receiver thread, the shutdown is delayed until all waiting semi-sync connections receive an ACK or time out. The connections are then killed in phase 2. Notes: 1) There remains an unresolved corner case that affects this patch. MDEV-28141: Slave crashes with Packets out of order when connecting to a shutting down master. Specifically, If a slave is connecting to a master which is actively shutting down, the slave can crash with a "Packets out of order" assertion error. To get around this issue in the MTR tests, the primary will wait a small amount of time before phase 1 killing threads to let the replicas safely stop (if applicable). 2) This patch also fixes MDEV-28114: Semi-sync Master ACK Receiver Thread Can Error on COM_QUIT Reviewed By ============ Andrei Elkin <andrei.elkin@mariadb.com>	2022-04-22 12:59:54 -06:00
Andrei	5ccd845d51	MDEV-27760 event may non stop replicate in circular semisync setup MDEV-21117 had to relax own events acceptance condition for a case when a former semisync master server recovers after crash as the semisync slave. That however admitted a possibility for endless event "orbiting" in the non-strict slave gtid mode of semisync circular setup. The same server-id event termination is restored now for the non-strict gtid mode to follow regular rules (that is it's ignored unless @@global.replicate_same_server_id allows it in). To address MDEV-21117 recovery agenda, in the strict gtid mode and the transaction's gtid ordered strictly greater than the current slave gtid state, the same server-id transaction is accepted. The gtid strict mode is safe to accept transactions even if the slave state were not set correct by the user, e.g at the former master. An added test shows a typical out-of-order error at execution so no data corruption is guaranteed in such a case.	2022-03-22 19:20:19 +02:00
Oleksandr Byelkin	f5c5f8e41e	Merge branch '10.5' into 10.6	2022-02-03 17:01:31 +01:00
Oleksandr Byelkin	cf63eecef4	Merge branch '10.4' into 10.5	2022-02-01 20:33:04 +01:00
Oleksandr Byelkin	a576a1cea5	Merge branch '10.3' into 10.4	2022-01-30 09:46:52 +01:00
Oleksandr Byelkin	41a163ac5c	Merge branch '10.2' into 10.3	2022-01-29 15:41:05 +01:00
Brandon Nesterenko	96de6bfd5e	MDEV-16091: Seconds_Behind_Master spikes to millions of seconds Problem: ======== A slave’s relay log format description event is used when calculating Seconds_Behind_Master (SBM). This forces the SBM value to spike when processing these events, as their creation date is set to the timestamp that the IO thread begins. Solution: ======== When the slave generates a format description event, mark the event as a relay log event so it does not update the rli->last_master_timestamp variable. Reviewed By: ============ Andrei Elkin <andrei.elkin@mariadb.com>	2022-01-04 11:21:33 -07:00
sjaakola	ef2dbb8dbc	MDEV-23328 Server hang due to Galera lock conflict resolution Mutex order violation when wsrep bf thread kills a conflicting trx, the stack is wsrep_thd_LOCK() wsrep_kill_victim() lock_rec_other_has_conflicting() lock_clust_rec_read_check_and_lock() row_search_mvcc() ha_innobase::index_read() ha_innobase::rnd_pos() handler::ha_rnd_pos() handler::rnd_pos_by_record() handler::ha_rnd_pos_by_record() Rows_log_event::find_row() Update_rows_log_event::do_exec_row() Rows_log_event::do_apply_event() Log_event::apply_event() wsrep_apply_events() and mutexes are taken in the order lock_sys->mutex -> victim_trx->mutex -> victim_thread->LOCK_thd_data When a normal KILL statement is executed, the stack is innobase_kill_query() kill_handlerton() plugin_foreach_with_mask() ha_kill_query() THD::awake() kill_one_thread() and mutexes are victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex This patch is the plan D variant for fixing potential mutex locking order exercised by BF aborting and KILL command execution. In this approach, KILL command is replicated as TOI operation. This guarantees total isolation for the KILL command execution in the first node: there is no concurrent replication applying and no concurrent DDL executing. Therefore there is no risk of BF aborting to happen in parallel with KILL command execution either. Potential mutex deadlocks between the different mutex access paths with KILL command execution and BF aborting cannot therefore happen. TOI replication is used, in this approach, purely as means to provide isolated KILL command execution in the first node. KILL command should not (and must not) be applied in secondary nodes. In this patch, we make this sure by skipping KILL execution in secondary nodes, in applying phase, where we bail out if applier thread is trying to execute KILL command. This is effective, but skipping the applying of KILL command could happen much earlier as well.
This also fixed unprotected calls to wsrep_thd_abort that will use wsrep_abort_transaction. This is fixed by holding THD::LOCK_thd_data while we abort transaction. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2021-10-29 20:40:35 +02:00
Jan Lindström	d5bc05798f	MDEV-25114: Crash: WSREP: invalid state ROLLED_BACK (FATAL) Revert "MDEV-23328 Server hang due to Galera lock conflict resolution" This reverts commit `eac8341df4`.	2021-10-29 20:38:11 +02:00
sjaakola	5c230b21bf	MDEV-23328 Server hang due to Galera lock conflict resolution Mutex order violation when wsrep bf thread kills a conflicting trx, the stack is wsrep_thd_LOCK() wsrep_kill_victim() lock_rec_other_has_conflicting() lock_clust_rec_read_check_and_lock() row_search_mvcc() ha_innobase::index_read() ha_innobase::rnd_pos() handler::ha_rnd_pos() handler::rnd_pos_by_record() handler::ha_rnd_pos_by_record() Rows_log_event::find_row() Update_rows_log_event::do_exec_row() Rows_log_event::do_apply_event() Log_event::apply_event() wsrep_apply_events() and mutexes are taken in the order lock_sys->mutex -> victim_trx->mutex -> victim_thread->LOCK_thd_data When a normal KILL statement is executed, the stack is innobase_kill_query() kill_handlerton() plugin_foreach_with_mask() ha_kill_query() THD::awake() kill_one_thread() and mutexes are victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex This patch is the plan D variant for fixing potential mutex locking order exercised by BF aborting and KILL command execution. In this approach, KILL command is replicated as TOI operation. This guarantees total isolation for the KILL command execution in the first node: there is no concurrent replication applying and no concurrent DDL executing. Therefore there is no risk of BF aborting to happen in parallel with KILL command execution either. Potential mutex deadlocks between the different mutex access paths with KILL command execution and BF aborting cannot therefore happen. TOI replication is used, in this approach, purely as means to provide isolated KILL command execution in the first node. KILL command should not (and must not) be applied in secondary nodes. In this patch, we make this sure by skipping KILL execution in secondary nodes, in applying phase, where we bail out if applier thread is trying to execute KILL command. This is effective, but skipping the applying of KILL command could happen much earlier as well.
This also fixed unprotected calls to wsrep_thd_abort that will use wsrep_abort_transaction. This is fixed by holding THD::LOCK_thd_data while we abort transaction. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2021-10-29 09:52:52 +03:00
Jan Lindström	aa7ca987db	MDEV-25114: Crash: WSREP: invalid state ROLLED_BACK (FATAL) Revert "MDEV-23328 Server hang due to Galera lock conflict resolution" This reverts commit `eac8341df4`.	2021-10-29 09:52:40 +03:00
Marko Mäkelä	d8c6c53a06	Merge 10.5 into 10.6	2021-10-28 09:08:58 +03:00
Marko Mäkelä	a8ded39557	Merge 10.4 into 10.5	2021-10-28 08:48:36 +03:00
Julius Goryavsky	7948a1dc53	MDEV-26914: Unreleased mutex in the exec_relay_log_event() function In the replication-related code, in the exec_relay_log_event() (slave.cc) function, where the "data_lock" mutex is captured, this mutex is then not released on one of the early return branches within a specific insert for WSREP, namely under the branch: "if (wsrep_before_statement(thd))". As a result, the mutex remains captured, resulting in errors or hangs. This commit fixes this issue, which is now showing up as intermittent failures in mtr tests for galera and galera_sr suites.	2021-10-28 03:17:12 +02:00
Sujatha	6c39eaeb12	MDEV-21117: refine the server binlog-based recovery for semisync Problem: ======= When the semisync master is crashed and restarted as slave it could recover transactions that former slaves may never have seen. A known method existed to clear out all prepared transactions with --tc-heuristic-recover=rollback does not care to adjust binlog accordingly. Fix: === The binlog-based recovery is made to concern of the slave semisync role of post-crash restarted server. No change in behavior is made to the "normal" binlogging server and the semisync master. When the restarted server is configured with --rpl-semi-sync-slave-enabled=1 the refined recovery attempts to roll back prepared transactions and truncate binlog accordingly. In case of a partially committed (that is committed at least in one of the engine participants) such transaction gets committed. It's guaranteed no (partially as well) committed transactions exist beyond the truncate position. In case there exists a non-transactional replication event (being in a way a committed transaction) past the computed truncate position the recovery ends with an error. As after master crash and failover to slave, the demoted-to-slave ex-master must be ready to face and accept its own (generated by) events, without generally necessary --replicate-same-server-id. So the acceptance conditions are relaxed for the semisync slave to accept own events without that option. While gtid_strict_mode ON ensures no duplicate transaction can be (re-)executed the master_use_gtid=none slave has to be configured with --replicate-same-server-id. NOTE for reviewers. This patch does not handle the user XA which is done in next git commit.	2021-06-11 19:49:39 +03:00
Rucha Deodhar	4e19539c14	MDEV-22189: Change error messages inside code to have mariadb instead of mysql Fix: Changed error messages, rerecorded results and changed other relevant files.	2021-05-24 11:38:13 +05:30
Monty	85d6278fed	Change replication to use uchar for all buffers instead of char This change is to get rid of randomly failing tests, especially those that read random positions of the binary log. From looking at the logs it's clear that some failures occur because a read char (with value >= 128) is converted to a big long value. Using uchar everywhere makes this much less likely to happen. Another benefit is that a lot of casts of char to uchar could be removed. Other things: - Removed some extra space before '=' and '+=' in assignments - Fixed indentations and lines > 80 characters - Replace '16' with 'element_size' (from class definition) in Gtid_list_log_event()	2021-05-19 22:54:12 +02:00
Monty	a206658b98	Change CHARSET_INFO character set and collation names to LEX_CSTRING This change removed 68 explicit strlen() calls from the code. The following renames were done to ensure we don't use the old names when merging code from earlier releases, as using the new variables for print function could result in crashes: - charset->csname renamed to charset->cs_name - charset->name renamed to charset->coll_name Almost everything were mechanical changes except: - Changed to use the new Protocol::store(LEX_CSTRING..) when possible - Changed to use field->store(LEX_CSTRING, CHARSET_INFO) when possible - Changed to use String->append(LEX_CSTRING&) when possible Other things: - There were compiler issues with ensuring that all character set names point to the same string: gcc doesn't allow one to use integer constants when defining global structures (constant char * pointers work fine). To get around this, I declared defines for each character set name length.	2021-05-19 22:54:07 +02:00
Monty	b6ff139aa3	Reduce usage of strlen() Changes: - To detect automatic strlen() I removed the methods in String that use 'const char *' without a length: - String::append(const char *) - Binary_string(const char *str) - String(const char *str, CHARSET_INFO *cs) - append_for_single_quote(const char *) All usage of append(const char *) is changed to either use String::append(char), String::append(const char *, size_t length) or String::append(LEX_CSTRING) - Added STRING_WITH_LEN() around constant string arguments to String::append() - Added overflow argument to escape_string_for_mysql() and escape_quotes_for_mysql() instead of returning (size_t) -1 on overflow. This was needed as most usage of the above functions never tested the result for -1 and would have given wrong results or crashes in case of overflows. - Added Item_func_or_sum::func_name_cstring(), which returns LEX_CSTRING. Changed all Item_func::func_name()'s to func_name_cstring()'s. The old Item_func_or_sum::func_name() is now an inline function that returns func_name_cstring().str. - Changed Item::mode_name() and Item::func_name_ext() to return LEX_CSTRING. - Changed for some functions the name argument from const char * to const LEX_CSTRING &: - Item::Item_func_fix_attributes() - Item::check_type_...() - Type_std_attributes::agg_item_collations() - Type_std_attributes::agg_item_set_converter() - Type_std_attributes::agg_arg_charsets...() - Type_handler_hybrid_field_type::aggregate_for_result() - Type_handler_geometry::check_type_geom_or_binary() - Type_handler::Item_func_or_sum_illegal_param() - Predicant_to_list_comparator::add_value_skip_null() - Predicant_to_list_comparator::add_value() - cmp_item_row::prepare_comparators() - cmp_item_row::aggregate_row_elements_for_comparison() - Cursor_ref::print_func() - Removes String_space() as it was only used in one case and that could be simplified to not use String_space(), thanks to the fixed my_vsnprintf().
- Added some const LEX_CSTRING's for common strings: - NULL_clex_str, DATA_clex_str, INDEX_clex_str. - Changed primary_key_name to a LEX_CSTRING - Renamed String::set_quick() to String::set_buffer_if_not_allocated() to clarify what the function really does. - Rename of protocol function: bool store(const char *from, CHARSET_INFO *cs) to bool store_string_or_null(const char *from, CHARSET_INFO *cs). This was done to both clarify the difference between this 'store' function and also to make it easier to find suboptimal usage of store() calls. - Added Protocol::store(const LEX_CSTRING, CHARSET_INFO) - Changed some 'const char' arrays to instead be of type LEX_CSTRING. - class Item_func_units now uses LEX_CSTRING for name. Other things: - Fixed a bug in mysql.cc:construct_prompt() where a wrong escape character in the prompt would cause some part of the prompt to be duplicated. - Fixed a lot of instances where the length of the argument to append is known or easily obtained but was not used. - Removed some unneeded 'virtual' definitions for functions that were inherited from the parent. I added override to these. - Fixed Ordered_key::print() to preallocate needed buffer. Old code could cause memory overruns. - Simplified some loops when adding char to a String with delimiters.	2021-05-19 22:27:48 +02:00
Marko Mäkelä	ca3f497564	Merge 10.2 into 10.3, except MDEV-25682	2021-05-18 08:40:19 +03:00
Sachin Kumar	355dc74b76	MDEV-22370 safe_mutex: Trying to lock uninitialized mutex at /data/src/10.4-bug/sql/rpl_parallel.cc, line 470 upon shutdown during FTWRL Problem:- When we issue FTWRL with shutdown in parallel, there is a race between FTWRL and shutdown. Shutdown might destroy the mutex (pool->LOCK_rpl_thread_pool) before FTWRL can lock it, so the FTWRL thread can crash. Solution:- mysql_mutex_destroy(pool->LOCK_rpl_thread_pool) should wait for the FTWRL thread to complete its work, and only then destroy the mutex. So slave_prepare_for_shutdown will just deactivate the pool, and the mutex is destroyed later in end_slave()	2021-05-14 11:49:46 +01:00
Andrei Elkin	3616640a31	MDEV-20821 parallel slave server shutdown hang Parallel slave server shutdown found to be hanging in close_connections() triggered by shutdown due to a slave worker thread would not be notified to exit in case the worker was sitting idle. Fixed with destroying the worker pool earlier that is in slave_prepare_for_shutdown() when all their driver threads have already left. A test file is added to simulate the bug condition as well as check multi-sourced and not-idle worker cases.	2021-05-14 11:49:26 +01:00
Sujatha	70642871bc	MDEV-16437: merge 5.7 P_S replication instrumentation and tables Merge 'replication_applier_status_by_coordinator' table. This table captures SQL_THREAD status in case of both single threaded and multi threaded slave configuration. When multi_source replication is enabled this table will display each source specific SQL_THREAD status. Added new columns for: - LAST_SEEN_TRANSACTION - LAST_TRANS_RETRY_COUNT	2021-04-16 09:02:00 +05:30
Marko Mäkelä	94b4578704	Merge 10.5 into 10.6	2021-02-17 19:39:05 +02:00
Sergei Golubchik	25d9d2e37f	Merge branch 'bb-10.4-release' into bb-10.5-release	2021-02-15 16:43:15 +01:00
Sergei Golubchik	eac8341df4	MDEV-23328 Server hang due to Galera lock conflict resolution adaptation of `29bbcac0ee` for 10.4	2021-02-12 18:17:06 +01:00
Sergei Golubchik	9703cffa8c	don't take mutexes conditionally	2021-02-12 18:14:20 +01:00
Sergei Golubchik	00a313ecf3	Merge branch 'bb-10.3-release' into bb-10.4-release Note, the fix for "MDEV-23328 Server hang due to Galera lock conflict resolution" was null-merged. 10.4 version of the fix is coming up separately	2021-02-12 17:44:22 +01:00
Sergei Golubchik	60ea09eae6	Merge branch '10.2' into 10.3	2021-02-01 13:49:33 +01:00
Sergei Golubchik	6a1cb449fe	cleanup: remove slave background thread, use handle_manager thread instead	2021-01-24 11:35:55 +01:00
Daniel Black	29d9897fe2	MDEV-10272: add master host/port info to slave thread exit messages Sample log error message generated: 2021-01-21 2:33:24 139912137520896 [Note] Slave SQL thread exiting, replication stopped in log 'master-bin.000001' at position 369 33:24 139912137520896 [Note] master was 127.0.0.1:16400 2021-01-21 2:33:24 139912137828096 [Note] Slave I/O thread exiting, read up to log 'master-bin.000001', position 369 2021-01-21 2:33:24 139912137828096 [Note] master was 127.0.0.1:16400 Based on work by Hartmut Holzgraefe. Reviewer: knielsen@knielsen-hq.org, Andrei, Sachin	2021-01-22 10:06:33 +11:00
Hartmut Holzgraefe	fa14c423cd	MDEV-10271: add master host/port info to slave thread exit messages Sample log error message generated: mysql-test/var/log/mysqld.2.err:2021-01-21 13:02:30 8 [Note] Slave SQL thread exiting, replication stopped in log 'master-bin.000001' at position 329, master: 127.0.0.1:16000 mysql-test/var/log/mysqld.2.err:2021-01-21 13:02:30 7 [Note] Slave I/O thread exiting, read up to log 'master-bin.000001', position 329, master 127.0.0.1:16000 mysql-test/var/log/mysqld.2.err:2021-01-21 13:02:30 12 [Note] Slave SQL thread exiting, replication stopped in log 'master-bin.000001' at position 329; GTID position '', master: 127.0.0.1:16000 Reviewer: knielsen@knielsen-hq.org, Andrei and Sachin	2021-01-21 13:03:54 +11:00
Oleksandr Byelkin	10aa576483	Merge branch '10.5' into 10.6	2020-11-14 20:05:35 +01:00
Marko Mäkelä	d7a5824899	Merge 10.4 into 10.5	2020-11-13 21:54:21 +02:00
Sujatha	b2029c0300	Merge branch '10.3' into 10.4	2020-11-12 15:39:02 +05:30
Sujatha	bafb011a82	Merge branch '10.2' into 10.3	2020-11-12 14:10:05 +05:30
Sujatha	984a06db2c	MDEV-4633: multi_source.simple test fails sporadically Analysis: ======== Writes to 'rli->log_space_total' needs to be synchronized, otherwise both SQL_THREAD and IO_THREAD can try to modify the variable simultaneously resulting in incorrect rli->log_space_total. In the current test scenario SQL_THREAD is trying to decrement 'rli->log_space_total' in 'purge_first_log' and IO_THREAD is trying to increment the 'rli->log_space_total' in 'queue_event' simultaneously. Hence test occasionally fails with result mismatch. Fix: === Convert 'rli->log_space_total' variable to atomic type.	2020-11-12 13:04:39 +05:30
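
The MDEV-4633 entry above describes fixing a race by converting 'rli->log_space_total' to an atomic type, so that the SQL thread's decrement and the IO thread's increment cannot interleave. A minimal sketch of that idea follows. It is not MariaDB's code: the `RelayLogSpace` struct, the `run_concurrent_updates` function, and the use of `std::atomic` and `std::thread` are assumptions chosen for illustration, and the server may use its own atomic wrappers.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical stand-in for the relay-log bookkeeping; not MariaDB's class.
struct RelayLogSpace {
  std::atomic<std::uint64_t> log_space_total{0};
};

// Starts one thread that adds event_size per event (the role the commit
// message assigns to the IO thread in queue_event) and one thread that
// subtracts it (the SQL thread's role in purge_first_log), then returns
// the final total. Each update is a single atomic read-modify-write, so
// no increment or decrement can be lost.
std::uint64_t run_concurrent_updates(std::uint64_t initial, int events,
                                     std::uint64_t event_size) {
  RelayLogSpace rli;
  rli.log_space_total.store(initial);

  std::thread io_thread([&rli, events, event_size] {
    for (int i = 0; i < events; ++i)
      rli.log_space_total.fetch_add(event_size);
  });
  std::thread sql_thread([&rli, events, event_size] {
    for (int i = 0; i < events; ++i)
      rli.log_space_total.fetch_sub(event_size);
  });

  io_thread.join();
  sql_thread.join();
  return rli.log_space_total.load();
}
```

Because both threads apply the same number of equal-sized updates, the function returns the initial value. With a plain non-atomic integer the same program would contain a data race and could return a different total, which matches the sporadic result mismatch the commit message reports.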


2637 Commits