mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-07-05 12:42:17 +03:00

Author	SHA1	Message	Date
Sergei Golubchik	87e13722a9	Merge branch '10.6' into 10.11	2024-02-01 18:36:14 +01:00
Sergei Golubchik	3f6038bc51	Merge branch '10.5' into 10.6	2024-01-31 18:04:03 +01:00
Sergei Golubchik	01f6abd1d4	Merge branch '10.4' into 10.5	2024-01-31 17:32:53 +01:00
Oleksandr Byelkin	fe490f85bb	Merge branch '10.11' into 11.0	2024-01-30 08:54:10 +01:00
Oleksandr Byelkin	14d930db5d	Merge branch '10.6' into 10.11	2024-01-30 08:17:58 +01:00
Oleksandr Byelkin	25c0806867	Merge branch '10.5' into 10.6	2024-01-30 07:43:15 +01:00
Oleksandr Byelkin	50107c4b22	Merge branch '10.4' into 10.5	2024-01-30 07:26:17 +01:00
Brandon Nesterenko	c75905cacb	MDEV-33327: rpl_seconds_behind_master_spike Sensitive to IO Thread Stop Position rpl.rpl_seconds_behind_master_spike uses the DEBUG_SYNC mechanism to count how many format descriptor events (FDEs) have been executed, to attempt to pause on a specific relay log FDE after executing transactions. However, depending on when the IO thread is stopped, it can send an extra FDE before sending the transactions. This forces the test to pause before executing any transactions, so the table that the test then reads with COUNT does not yet exist. This patch fixes this by no longer counting FDEs, but rather by programmatically waiting until the SQL thread has executed the transaction and then automatically activating the DEBUG_SYNC point to trigger at the next relay log FDE.	2024-01-30 06:58:44 +01:00
Michael Widenius	7af50e4df4	MDEV-32551: "Read semi-sync reply magic number error" warnings on master rpl_semi_sync_slave_enabled_consistent.test and the first part of the commit message comes from Brandon Nesterenko. A test to show how to induce the "Read semi-sync reply magic number error" message on a primary. In short, suppose semi-sync is turned on during the handshake between a primary and replica, but a user later disables the rpl_semi_sync_slave_enabled variable while the replica's IO thread is running. If the IO thread then exits, the replica can skip a necessary call to kill_connection() in repl_semisync_slave.slave_stop() due to its reliance on a global variable. Then, the replica will send a COM_QUIT packet to the primary on an active semi-sync connection, causing the magic number error. The test in this patch exits the IO thread by forcing an error. Note that a call to STOP SLAVE could also do this, but it ends up needing more synchronization. That is, the STOP SLAVE command also tries to kill the VIO of the replica, which creates a race with the IO thread trying to send the COM_QUIT before this happens (which would need more debug_sync to get around). See THD::awake_no_mutex for details on the killing of the replica's VIO. Notes: - The MariaDB documentation does not make it clear that when one enables semi-sync replication it does not matter if one enables it first in the master or slave. Any order works. Changes done: - The rpl_semi_sync_slave_enabled variable is now a default value for when semisync is started. The variable no longer affects semisync if it is already running. This fixes the original reported bug. Internally we now use repl_semisync_slave.get_slave_enabled() instead of rpl_semi_sync_slave_enabled. To check if semisync is active one should check the @@rpl_semi_sync_slave_status variable (as before).
- The semisync protocol conflicts with the way the original MySQL/MariaDB client-server protocol was designed (client-server send and reply packets are strictly ordered and include a packet number to allow one to check if a packet is lost). When using semi-sync the master and slave can send packets at 'any time', so packet numbering does not work. The 'solution' has been that each communication starts with packet number 1, but in some cases there is still a chance that the packet number check can fail. Fixed by adding a flag (pkt_nr_can_be_reset) in the NET struct that one can use to signal that packet number checking should not be done. This flag is set when semi-sync is used. - Added Master_info::semi_sync_reply_enabled to allow one to configure some slaves with semisync and other slaves without semisync. Removed global variable semi_sync_need_reply that would not work with multi-master. - Repl_semi_sync_master::report_reply_packet() can now recognize the COM_QUIT packet from semisync slave and not give a "Read semi-sync reply magic number error" error for this case. The slave will be removed from the Ack listener. - On Windows, don't stop semisync Ack listener just because one slave connection is using socket_id > FD_SETSIZE. - Removed busy loop in Ack_receiver::run() by using "Self-pipe trick" to signal new slave and stop Ack_receiver. - Changed some Repl_semi_sync_slave functions that always return 0 from int to void. - Added Repl_semi_sync_slave::slave_reconnect(). - Removed dummy_function Repl_semi_sync_slave::reset_slave(). - Removed some duplicate semisync notes from the error log. - Add test of "if (get_slave_enabled() && semi_sync_need_reply)" before calling Repl_semi_sync_slave::slave_reply(). (Speeds up the code as we can skip all initializations). - If repl_semisync_slave.slave_reply() fails, we disable semisync for that connection. - We do not call semisync.switch_off() if there are no active slaves.
Instead we check in Repl_semi_sync_master::commit_trx() if there are no active threads. This simplifies the code. - Changed assert() to DBUG_ASSERT() to ensure that the DBUG log is flushed in case of asserts. - Removed the internal rpl_semi_sync_slave_status as it is not needed anymore. The @@rpl_semi_sync_slave_status status variable is now mapped to rpl_semi_sync_enabled. - Removed rpl_semi_sync_slave_enabled as it is not needed anymore. Repl_semi_sync_slave::get_slave_enabled() contains the active status. - Added checking that we do not add a slave twice with Ack_receiver::add_slave(). This could happen with old code. - Removed Repl_semi_sync_master::check_and_switch() as it is not needed anymore. - Ensure that when we call Ack_receiver::remove_slave() that the slave is removed from the listener before function returns. - Call listener.listen_on_sockets() outside of mutex for better performance and less contested mutex. - Ensure that listening is ignoring newly added slaves when checking for responses. - Fixed that the master ack_receiver listener is not killed if there are no connected slaves (and thus stop semisync handling of future connections). This could happen if all slave sockets were marked as unreliable. - Added unlink() to base_ilist_iterator and remove() to I_List_iterator. This enables us to remove 'dead' slaves in Ack_receiver::run(). - kill_zombie_dump_threads() now does killing of dump threads properly. - It can now kill several threads (should be impossible but could happen if IO slaves reconnect very fast). - We now wait until the dump thread is done before starting the dump. - Added an error if kill_zombie_dump_threads() fails. - Set thd->variables.server_id before calling kill_zombie_dump_threads(). This simplifies the code. - Added a lot of comments both in code and tests. - Removed DBUG_EVALUATE_IF "failed_slave_start" as it is not used. Test changes: - rpl.rpl_session_var2 added which runs rpl.rpl_session_var test with semisync enabled.
- Some timings changed slightly with startup of slave, which caused rpl_binlog_dump_slave_gtid_state_info.test to fail as it checked the error log file before the slave had started properly. Fixed by adding wait_for_pattern_in_file.inc that allows waiting for the pattern to appear in the log file. - Tests have been updated so that we first set rpl_semi_sync_master_enabled on the master and then set rpl_semi_sync_slave_enabled on the slaves (this follows how the MariaDB documentation describes setting up semi-sync). - Error text "Master server does not have semi-sync enabled" has been replaced with "Master server does not support semi-sync" for the case when the master supports semi-sync but semi-sync is not enabled. Other things: - Some trivial cleanups in Repl_semi_sync_master::update_sync_header(). - In 11.3 we should change the default value for rpl-semi-sync-master-wait-no-slave from TRUE to FALSE as TRUE does not make much sense as default. The main difference with using FALSE is that we do not wait for semisync Ack if there are no slave threads. In the case of TRUE we wait once, which did not bring any notable benefits except slower startup of master configured for using semisync. Co-author: Brandon Nesterenko <brandon.nesterenko@mariadb.com> This solves the problem reported in MDEV-32960 where a new slave may not be registered in time and the master disables semi sync because of that.	2024-01-23 13:03:11 +02:00
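The "Self-pipe trick" named in commit 7af50e4df4 can be sketched as follows. This is a minimal POSIX illustration, not the server's Ack_receiver code: the names `WakeupPipe` and `wait_for_event` are hypothetical, and the real listener may differ in every detail. The idea is that the listener thread blocks in `poll()` on the slave sockets plus the read end of a pipe, and another thread writes one byte to the pipe to wake it, so no busy loop is needed.

```cpp
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>
#include <cstddef>
#include <vector>

// Hypothetical wakeup channel (illustration only, not MariaDB code).
class WakeupPipe {
public:
  WakeupPipe() {
    if (pipe(fds_) != 0) {
      fds_[0] = fds_[1] = -1;
      return;
    }
    // Non-blocking ends: signal() never stalls and drain() never blocks.
    for (int fd : fds_)
      fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK);
  }
  ~WakeupPipe() {
    if (fds_[0] >= 0) close(fds_[0]);
    if (fds_[1] >= 0) close(fds_[1]);
  }
  WakeupPipe(const WakeupPipe &) = delete;
  WakeupPipe &operator=(const WakeupPipe &) = delete;

  bool ok() const { return fds_[0] >= 0; }
  int read_fd() const { return fds_[0]; }

  // Called by another thread, e.g. when a slave is added or on shutdown.
  void signal() {
    char byte = 1;
    ssize_t n = write(fds_[1], &byte, 1);
    (void) n;  // A full pipe already guarantees a pending wakeup.
  }

  // Consume all pending wakeup bytes.
  void drain() {
    char buf[64];
    while (read(fds_[0], buf, sizeof(buf)) > 0) {
    }
  }

private:
  int fds_[2];
};

// Blocks until a watched socket is readable, the pipe is signalled, or the
// timeout expires. Fills `readable` with ready sockets and returns true when
// the wakeup pipe was signalled.
bool wait_for_event(WakeupPipe &wakeup, const std::vector<int> &sockets,
                    int timeout_ms, std::vector<int> *readable) {
  std::vector<pollfd> pfds;
  pfds.push_back({wakeup.read_fd(), POLLIN, 0});
  for (int s : sockets)
    pfds.push_back({s, POLLIN, 0});
  readable->clear();

  int n = poll(pfds.data(), pfds.size(), timeout_ms);
  if (n <= 0)
    return false;  // Timeout, or an error such as EINTR.

  for (std::size_t i = 1; i < pfds.size(); i++)
    if (pfds[i].revents & (POLLIN | POLLHUP))
      readable->push_back(pfds[i].fd);

  if (pfds[0].revents & POLLIN) {
    wakeup.drain();
    return true;
  }
  return false;
}
```

A listener loop would call `wait_for_event()` with no short polling interval and, on a `true` return, re-read its slave list or exit if a stop was requested.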
Sergei Golubchik	c154aafe1a	Merge remote-tracking branch '11.3' into 11.4	2023-12-21 15:40:55 +01:00
Sergei Golubchik	7f0094aac8	Merge branch '11.2' into 11.3	2023-12-21 02:14:59 +01:00
Marko Mäkelä	590036b021	Merge 10.11 into 11.0	2023-12-20 16:05:20 +02:00
Marko Mäkelä	2b99e5f7ef	Merge 10.6 into 10.11	2023-12-20 15:58:36 +02:00
Marko Mäkelä	2b01e5103d	Merge 10.5 into 10.6	2023-12-19 18:41:42 +02:00
Marko Mäkelä	12995559f9	Merge 10.4 into 10.5	2023-12-19 18:30:58 +02:00
Sergei Golubchik	8c8bce05d2	Merge branch '10.11' into 11.0	2023-12-19 15:53:18 +01:00
Kristian Nielsen	eaa4968fc5	MDEV-10653: Fix segfault in SHOW MASTER STATUS with NULL inuse_relaylog The previous patch for MDEV-10653 changes the rpl_parallel::workers_idle() function to use Relay_log_info::last_inuse_relaylog to check for idle workers. But the code was missing a NULL check. Also, there was one place during SQL slave thread start which was missing mutex synchronisation when updating inuse_relaylog. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-12-19 12:08:54 +01:00
Sergei Golubchik	fd0b47f9d6	Merge branch '10.6' into 10.11	2023-12-18 11:19:04 +01:00
Marko Mäkelä	4ae105a37d	Merge 10.4 into 10.5	2023-12-18 08:59:07 +02:00
Sergei Golubchik	e95bba9c58	Merge branch '10.5' into 10.6	2023-12-17 11:20:43 +01:00
Brandon Nesterenko	8dad51481b	MDEV-10653: SHOW SLAVE STATUS Can Deadlock an Errored Slave AKA rpl.rpl_parallel, binlog_encryption.rpl_parallel fails in buildbot with timeout in include A replication parallel worker thread can deadlock with another connection running SHOW SLAVE STATUS. That is, if the replication worker thread is in do_gco_wait() and is killed, it will already hold the LOCK_parallel_entry, and during error reporting, try to grab the err_lock. SHOW SLAVE STATUS, however, grabs these locks in reverse order. It will initially grab the err_lock, and then try to grab LOCK_parallel_entry. This leads to a deadlock when both threads have grabbed their first lock without the second. This patch implements the MDEV-31894 proposed fix to optimize the workers_idle() check to compare the last in-use relay log’s queued_count==dequeued_count for idleness. This removes the need for workers_idle() to grab LOCK_parallel_entry, as these values are atomically updated. Huge thanks to Kristian Nielsen for diagnosing the problem! Reviewed By: ============ Kristian Nielsen <knielsen@knielsen-hq.org> Andrei Elkin <andrei.elkin@mariadb.com>	2023-12-11 07:45:23 -07:00
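The lock-free idleness check described in commit 8dad51481b can be sketched as below. This is an illustration under assumptions, not the server code: `RelayLogUsage` is a hypothetical stand-in for the last in-use relay log entry, and treating a null pointer as "idle" is an assumption made here (the follow-up commit eaa4968fc5 only states that a NULL check was missing). Because both counters are atomic, the reader needs no mutex, so it cannot take part in a lock-order inversion.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for the last in-use relay log bookkeeping.
struct RelayLogUsage {
  std::atomic<uint64_t> queued_count{0};    // events handed to workers
  std::atomic<uint64_t> dequeued_count{0};  // events workers have finished
};

// Workers are idle when everything queued has also been dequeued.
// No lock is taken: each counter is read atomically.
bool workers_idle(const RelayLogUsage *last_inuse) {
  if (last_inuse == nullptr)
    return true;  // Assumption: nothing has been queued yet.
  return last_inuse->queued_count.load(std::memory_order_acquire) ==
         last_inuse->dequeued_count.load(std::memory_order_acquire);
}
```

The two loads are not one atomic snapshot, so a caller that needs a stable answer would still have to tolerate the counters changing right after the check.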
Sergei Golubchik	98a39b0c91	Merge branch '10.4' into 10.5	2023-12-02 01:02:50 +01:00
Monty	1ffa8c5072	Fixed build failure on aarch64-macos debug_sync.h was wrongly combined with replication	2023-11-28 15:31:44 +02:00
Vladislav Vaintroub	013fc02a23	MDEV-32567 Remove thr_alarm from server codebase This allows to simplify net_real_read() and net_real_write() a bit. Removed some superfluous #ifdef/ifndef MYSQL_SERVER from net_serv.cc The code always runs in server, either normal or embedded. Dead code for switching socket between blocking and non-blocking modes, is also removed. Removed pthread_kill() with alarm signal that woke up main thread on server shutdown. Used shutdown(2) on polling sockets instead, to the same effect. Removed yet another superstitious pthread_kill(), that ran on non-Windows in terminate_slave_thread().	2023-11-23 11:52:38 +11:00
Vladislav Vaintroub	bb8e1bf7a2	Merge 11.3 into 11.4	2023-11-21 15:43:20 +01:00
Anel Husakovic	a7d186a17d	MDEV-32168: slave_error_param condition is never checked from the wait_for_slave_param.inc - Reviewer: <knielsen@knielsen-hq.org> <brandon.nesterenko@mariadb.com> <andrei.elkin@mariadb.com>	2023-11-16 10:41:11 +01:00
Oleksandr Byelkin	34272bd6a5	Merge branch '11.2' into 11.3	2023-11-14 18:33:03 +01:00
Oleksandr Byelkin	48af85db21	Merge branch '10.11' into 11.0	2023-11-08 17:09:44 +01:00
Oleksandr Byelkin	fecd78b837	Merge branch '10.10' into 10.11	2023-11-08 16:46:47 +01:00
Oleksandr Byelkin	04d9a46c41	Merge branch '10.6' into 10.10	2023-11-08 16:23:30 +01:00
Oleksandr Byelkin	b83c379420	Merge branch '10.5' into 10.6	2023-11-08 15:57:05 +01:00
Oleksandr Byelkin	6cfd2ba397	Merge branch '10.4' into 10.5	2023-11-08 12:59:00 +01:00
Marko Mäkelä	7b842f1536	Merge 11.2 into 11.3	2023-10-27 10:48:29 +03:00
Kristian Nielsen	8eee9806fb	MDEV-31273: Eliminate Log_event::checksum_alg This is a preparatory commit for pre-computing checksums outside of holding LOCK_log, no functional changes. Which checksum algorithm is used (if any) when writing an event does not belong in the event, it is a property of the log being written to. Instead decide the checksum algorithm when constructing the Log_event_writer object, and store it there. Introduce a client-only Log_event::read_checksum_alg to be able to print the checksum read, and a Format_description_log_event::source_checksum_alg which is the checksum algorithm (if any) to use when reading events from a log. Also eliminate some redundant `enum` keywords on the enum_binlog_checksum_alg type. Reviewed-by: Monty <monty@mariadb.org> Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-10-26 20:45:35 +02:00
Brandon Nesterenko	c5f776e9fa	MDEV-32265: seconds_behind_master is inaccurate for Delayed replication If a replica is actively delaying a transaction when restarted (STOP SLAVE/START SLAVE), when the sql thread is back up, Seconds_Behind_Master will present as 0 until the configured MASTER_DELAY has passed. That is, before the restart, last_master_timestamp is updated to the timestamp of the delayed event. Then after the restart, the negation of sql_thread_caught_up is skipped because the timestamp of the event has already been used for the last_master_timestamp, and their update is grouped together in the same conditional block. This patch fixes this by separating the negation of sql_thread_caught_up out of the timestamp-dependent block, so it is called any time an idle parallel slave queues an event to a worker. Note that sql_thread_caught_up is still left in the check for internal events, as SBM should remain idle in such case to not "magically" begin incrementing. Reviewed By: ============ Andrei Elkin <andrei.elkin@mariadb.com>	2023-10-23 14:25:03 -06:00
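The fix in commit c5f776e9fa can be illustrated with a small model. Everything here is a simplified sketch: `ReplicaClock` and the three functions are hypothetical names, the starting state after a restart (timestamp kept, caught-up flag set) is an assumption inferred from the commit message, and the special handling of internal events is left out.

```cpp
// Hypothetical model of the two fields the commit message discusses.
struct ReplicaClock {
  long last_master_timestamp = 0;
  bool sql_thread_caught_up = true;
};

// Before the fix: the flag is only cleared together with a timestamp update.
void queue_event_before_fix(ReplicaClock &c, long event_timestamp) {
  if (event_timestamp > c.last_master_timestamp) {
    c.last_master_timestamp = event_timestamp;
    c.sql_thread_caught_up = false;
  }
}

// After the fix: the flag is cleared whenever an event is queued to a worker,
// independent of whether the timestamp advances.
void queue_event_after_fix(ReplicaClock &c, long event_timestamp) {
  c.sql_thread_caught_up = false;
  if (event_timestamp > c.last_master_timestamp)
    c.last_master_timestamp = event_timestamp;
}

// Simplified Seconds_Behind_Master: 0 while the SQL thread counts as caught up.
long seconds_behind_master(const ReplicaClock &c, long now) {
  return c.sql_thread_caught_up ? 0 : now - c.last_master_timestamp;
}
```

With a clock restarted at timestamp 100 and the delayed event (also timestamp 100) queued again, the first variant keeps reporting 0 at time 130, while the second reports 30.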
Brandon Nesterenko	0c1bf5e247	MDEV-27247: Add keywords "SQL_BEFORE_GTIDS" and "SQL_AFTER_GTIDS" for START SLAVE UNTIL New Feature: ============ This patch extends the START SLAVE UNTIL command with options SQL_BEFORE_GTIDS and SQL_AFTER_GTIDS to allow user control of whether the replica stops before or after a provided GTID state. Its syntax is: START SLAVE UNTIL (SQL_BEFORE_GTIDS|SQL_AFTER_GTIDS)="<gtid_list>" When providing SQL_BEFORE_GTIDS="<gtid_list>", for each domain specified in the gtid_list, the replica will execute transactions up to the GTID found, and immediately stop processing events in that domain (without executing the transaction of the specified GTID). Once all domains have stopped, the replica will stop. Events originating from domains that are not specified in the list are not replicated. START SLAVE UNTIL SQL_AFTER_GTIDS="<gtid_list>" is an alias to the default behavior of START SLAVE UNTIL master_gtid_pos="<gtid_list>". That is, the replica will only execute transactions originating from domain ids provided in the list, and will stop once all transactions provided in the UNTIL list have been executed. Example: ========= Suppose a primary server has a binary log consisting of the following GTIDs: 0-1-1 1-1-1 0-1-2 1-1-2 0-1-3 1-1-3. If a fresh replica (i.e. one with an empty GTID position, @@gtid_slave_pos='') is started with SQL_BEFORE_GTIDS, i.e. START SLAVE UNTIL SQL_BEFORE_GTIDS="1-1-2", the resulting gtid_slave_pos of the replica will be "1-1-1". This is because the replica will execute only events from domain 1 until it sees the transaction with sequence number 2, and immediately stop without executing it. If the replica is started with SQL_AFTER_GTIDS, i.e. START SLAVE UNTIL SQL_AFTER_GTIDS="1-1-2", then the resulting gtid_slave_pos of the replica will be "1-1-2". This is because it will only execute events from domain 1 until it has executed the provided GTID.
Reviewed By: ============ Kristian Nielsen <knielsen@knielsen-hq.org>	2023-10-23 06:40:05 -06:00
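The stop semantics described in commit 0c1bf5e247 can be replayed with a small simulation. This is a sketch, not the replica's implementation: `Gtid`, `UntilMode`, `replay_until`, and `format_pos` are hypothetical names, and comparing sequence numbers within a domain is a simplification of whatever matching the server actually performs.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

struct Gtid {
  uint32_t domain;
  uint32_t server;
  uint64_t seq;
};

enum class UntilMode { BeforeGtids, AfterGtids };

// Replays `binlog` in order and returns the resulting position per domain.
// Domains missing from `until` are skipped entirely, as the commit describes.
std::map<uint32_t, Gtid> replay_until(const std::vector<Gtid> &binlog,
                                      const std::vector<Gtid> &until,
                                      UntilMode mode) {
  std::map<uint32_t, uint64_t> stop_seq;
  for (const Gtid &g : until)
    stop_seq[g.domain] = g.seq;

  std::map<uint32_t, Gtid> pos;
  std::set<uint32_t> stopped;
  for (const Gtid &g : binlog) {
    auto it = stop_seq.find(g.domain);
    if (it == stop_seq.end() || stopped.count(g.domain))
      continue;  // Unlisted domain, or this domain has already stopped.
    bool reached = g.seq >= it->second;
    if (mode == UntilMode::BeforeGtids && reached) {
      stopped.insert(g.domain);  // Stop without executing this GTID.
    } else {
      pos[g.domain] = g;  // Execute the transaction.
      if (reached)
        stopped.insert(g.domain);  // AfterGtids: stop after executing it.
    }
    if (stopped.size() == stop_seq.size())
      break;  // All listed domains have stopped, so the replica stops.
  }
  return pos;
}

// Formats a position as comma-separated domain-server-seq triples.
std::string format_pos(const std::map<uint32_t, Gtid> &pos) {
  std::string out;
  for (const auto &entry : pos) {
    const Gtid &g = entry.second;
    if (!out.empty())
      out += ",";
    out += std::to_string(g.domain) + "-" + std::to_string(g.server) + "-" +
           std::to_string(g.seq);
  }
  return out;
}
```

Fed the binary log from the commit message (0-1-1, 1-1-1, 0-1-2, 1-1-2, 0-1-3, 1-1-3) and the list 1-1-2, the simulation yields 1-1-1 for `BeforeGtids` and 1-1-2 for `AfterGtids`, matching the positions the message states.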
Marko Mäkelä	be24e75229	Merge 10.11 into 11.0	2023-10-19 08:12:16 +03:00
Marko Mäkelä	2ecc0443ec	Merge 10.10 into 10.11	2023-10-17 16:04:21 +03:00
Marko Mäkelä	d5e15424d8	Merge 10.6 into 10.10 The MDEV-29693 conflict resolution is from Monty, as is a bug fix for ANALYZE TABLE wrongly building histograms for a single-column PRIMARY KEY. Also includes a fix for safe_malloc error reporting. Other things: - Copied main.log_slow from 10.4 to avoid mtr issue Disabled tests: - spider/bugfix.mdev_27239 because we started to get +Error 1429 Unable to connect to foreign data source: localhost -Error 1158 Got an error reading communication packets - main.delayed - Bug#54332 Deadlock with two connections doing LOCK TABLE+INSERT DELAYED This part is disabled for now as it fails randomly with different warnings/errors (no corruption).	2023-10-14 13:36:11 +03:00
Marko Mäkelä	6a470db552	Merge 10.5 into 10.6	2023-09-14 15:25:53 +03:00
Yuchen Pei	cb1965bd9d	Merge branch '10.4' into 10.5	2023-09-14 16:30:11 +10:00
Marko Mäkelä	0f9acce3f2	Merge 10.5 into 10.6	2023-09-14 09:01:15 +03:00
Brandon Nesterenko	1407f99963	MDEV-31177: SHOW SLAVE STATUS Last_SQL_Errno Race Condition on Errored Slave Restart The SQL thread and a user connection executing SHOW SLAVE STATUS have a race condition on Last_SQL_Errno: when a slave that previously errored and stopped is started again, SHOW SLAVE STATUS can show that the SQL thread is running while the previous error is still showing. The fix is to clear the last error, when the SQL thread starts, before setting the status of Slave_SQL_Running. Thanks to Kristian Nielsen for his work diagnosing the problem! Reviewed By: ============ Andrei Elkin <andrei.elkin@mariadb.com> Kristian Nielsen <knielsen@knielsen-hq.org>	2023-09-13 12:01:47 -06:00
sjaakola	a3cbc44b24	MDEV-31833 replication breaks when using optimistic replication and replica is a galera node MariaDB async replication SQL thread was stopped for any failure in applying of replication events and error message logged for the failure was: "Node has dropped from cluster". The assumption was that event applying failure is always due to node dropping out. With optimistic parallel replication, event applying can fail for natural reasons and applying should be retried to handle the failure. This retry logic was never exercised because the slave SQL thread was stopped with first applying failure. To support optimistic parallel replication retrying logic this commit will now skip replication slave abort, if node remains in cluster (wsrep_ready==ON) and replication is configured for optimistic or aggressive retry logic. During the development of this fix, galera.galera_as_slave_nonprim test showed some problems. The test was analyzed, and it appears to need some attention. One excessive sleep command was removed in this commit, but it will need more fixes still to be fully deterministic. After this commit galera_as_slave_nonprim is successful, though. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2023-09-12 02:37:30 +02:00
Alexander Barkov	baf00fc553	Merge remote-tracking branch 'origin/10.11' into 11.0	2023-08-18 07:34:54 +04:00
Sergei Petrunia	725bd56834	Merge 10.10 into 10.11	2023-08-17 13:44:05 +03:00
Marko Mäkelä	9cd2989589	Merge 10.6 into 10.10	2023-08-16 15:28:42 +03:00
Kristian Nielsen	7c9837ce74	Merge 10.4 into 10.5 Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-08-15 18:02:18 +02:00
Kristian Nielsen	18acbaf416	MDEV-31655: Parallel replication deadlock victim preference code erroneously removed Restore code to make InnoDB choose the second transaction as a deadlock victim if two transactions deadlock that need to commit in-order for parallel replication. This code was erroneously removed when VATS was implemented in InnoDB. Also add a test case for InnoDB choosing the right deadlock victim. Also fixes this bug, with testcase that reliably reproduces: MDEV-28776: rpl.rpl_mark_optimize_tbl_ddl fails with timeout on sync_with_master Reviewed-by: Marko Mäkelä <marko.makela@mariadb.com> Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-08-15 16:39:49 +02:00
Kristian Nielsen	900c4d6920	MDEV-31655: Parallel replication deadlock victim preference code erroneously removed Restore code to make InnoDB choose the second transaction as a deadlock victim if two transactions deadlock that need to commit in-order for parallel replication. This code was erroneously removed when VATS was implemented in InnoDB. Also add a test case for InnoDB choosing the right deadlock victim. Also fixes this bug, with testcase that reliably reproduces: MDEV-28776: rpl.rpl_mark_optimize_tbl_ddl fails with timeout on sync_with_master Note: This should be null-merged to 10.6, as a different fix is needed there due to InnoDB locking code changes. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-08-15 16:35:30 +02:00
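The victim preference restored by MDEV-31655 can be sketched as a small selection function. This is not InnoDB's deadlock resolver: `ReplTxn` and `preferred_victim` are hypothetical names, and representing the required commit order as a single sequence number is an assumption for illustration. Since the earlier transaction must commit first anyway, rolling back the later one lets replication make progress.

```cpp
#include <cstdint>

// Hypothetical view of a transaction applied by a parallel replication worker.
struct ReplTxn {
  uint64_t trx_id;
  bool in_order_commit;  // true if it must commit in a fixed order
  uint64_t commit_seq;   // position in that order (smaller commits first)
};

// Returns the transaction that should be rolled back when `a` and `b`
// deadlock, or nullptr when commit order gives no preference and the
// caller should fall back to its usual victim selection.
const ReplTxn *preferred_victim(const ReplTxn &a, const ReplTxn &b) {
  if (!a.in_order_commit || !b.in_order_commit)
    return nullptr;
  if (a.commit_seq == b.commit_seq)
    return nullptr;
  // Pick the one that commits second: it would have to wait regardless.
  return a.commit_seq > b.commit_seq ? &a : &b;
}
```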

2831 Commits