mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-09-13 13:47:59 +03:00

Author	SHA1	Message	Date
Oleksandr Byelkin	69d033d165	Merge branch '10.11' into 11.2	2024-10-29 16:42:46 +01:00
Oleksandr Byelkin	3d0fb15028	Merge branch '10.6' into 10.11	2024-10-29 15:24:38 +01:00
Monty	066f920484	MDEV-35110 Deadlock on Replica during BACKUP STAGE BLOCK_COMMIT on XA transactions This is an extension of MDEV-30423 "Deadlock on Replica during BACKUP STAGE BLOCK_COMMIT on XA transactions" The original commit in MDEV-30423 was not complete as some usage in XA of MDL_BACKUP_COMMIT locks did not set thd->backup_commit_lock. This is required to be set when using parallel replication. Fixed by ensuring that all usage of BACKUP_COMMIT lock i XA is uniform and all sets thd->backup_commit_lock. I also changed all locks to be MDL_EXPLICIT to keep also that part uniform. A regression test is added.	2024-10-28 13:29:21 +02:00
Brandon Nesterenko	1ed30e08af	MDEV-34122: Assertion `entry' failed in Active_tranx::assert_thd_is_waiter If semi-sync is switched off then on while a transaction is in-between binlogging and waiting for an ACK, the semi-sync state of the transaction is removed, leading to a debug assertion that indicates the transaction tried to wait, but cannot receive an ACK signal. More specifically, when semi-sync is switched off, the Active_tranx list is cleared (where a transaction adds an entry to this list during binlogging), and each entry in this list saves the thread which will wait for an ACK, and the thread has the COND variable to signal to wake itself. So if the entry is lost, the Ack_receiver thread won’t be able to find the thread to wake up when an ACK comes in The fix is to ensure that the entry exists before awaiting the ACK, and if there is no entry, skip the wait. In debug builds, an informative message is written explaining that the transaction is skipping its wait. Additional debug-build only logic is added to ensure that the cause of the missing entry is due to semi-sync being turned off and on Reviewed By: ============ Kristian Nielsen <knielsen@knielsen-hq.org>	2024-10-21 15:35:54 -06:00
Sergei Golubchik	3a1cf2c85b	MDEV-34679 ER_BAD_FIELD uses non-localizable substrings	2024-10-17 21:37:37 +02:00
Sergei Golubchik	5ebda30ccc	Revert "MDEV-35019 Provide a way to enable "rollback XA on disconnect" behavior we had before 10.5.2" This reverts commit `8ae462a220`.	2024-10-16 13:23:47 +02:00
Kristian Nielsen	8ae462a220	MDEV-35019 Provide a way to enable "rollback XA on disconnect" behavior we had before 10.5.2 Implement variable legacy_xa_rollback_at_disconnect to support backwards compatibility for applications that rely on the pre-10.5 behavior for connection disconnect, which is to rollback the transaction (in violation of the XA specification). Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-10-16 10:18:36 +02:00
Marko Mäkelä	b53b81e937	Merge 11.2 into 11.4	2024-10-03 14:32:14 +03:00
Marko Mäkelä	12a91b57e2	Merge 10.11 into 11.2	2024-10-03 13:24:43 +03:00
Marko Mäkelä	63913ce5af	Merge 10.6 into 10.11	2024-10-03 10:55:08 +03:00
Marko Mäkelä	7e0afb1c73	Merge 10.5 into 10.6	2024-10-03 09:31:39 +03:00
Brandon Nesterenko	68938d2b42	MDEV-33500 (part 2): rpl.rpl_parallel_sbm can still fail The failing test case validates Seconds_Behind_Master for a delayed slave, while STOP SLAVE is executed during a delay. The test fixes initially added to the test (commit `b04c857596`) added a table lock to ensure a transaction could not finish before validating the Seconds_Behind_Master field after SLAVE START, but did not address a possibility that the transaction could finish before running the STOP SLAVE command, which invalidates the validations for the rest of the test case. Specifically, this would result in 1) a timeout in “Waiting for table metadata lock” on the replica, which expects the transaction to retry after slave restart and hit a lock conflict on the locked tables (added in `b04c857596`), and 2) that Seconds_Behind_Master should have increased, but did not. The failure can be reproduced by synchronizing the slave to the master before the MDEV-32265 echo statement (i.e. before the SLAVE STOP). This patch fixes the test by adding a mechanism to use DEBUG_SYNC to synchronize a MASTER_DELAY, rather than continually increase the duration of the delay each time the test fails on buildbot. This is to ensure that on slow machines, a delay does not pass before the test gets a chance to validate results. Additionally, it decreases overall test time because the test can continue immediately after validation, thereby bypassing the remainder of a full delay for each transaction.	2024-09-17 06:29:20 -06:00
Marko Mäkelä	44733aa8cf	Merge 11.2 into 11.4	2024-08-29 19:10:38 +03:00
Marko Mäkelä	e91a799458	Merge 10.11 into 11.2	2024-08-29 16:02:57 +03:00
Marko Mäkelä	cfcf27c6fe	Merge 10.6 into 10.11	2024-08-29 07:47:29 +03:00
Marko Mäkelä	48becffd07	Merge 10.5 into 10.6	2024-08-27 08:52:10 +03:00
Kristian Nielsen	214e6c5b3d	Fix sporadic failure of test case rpl.rpl_old_master Remove the test for MDEV-14528. This is supposed to test that parallel replication from pre-10.0 master will update Seconds_Behind_Master. But after MDEV-12179 the SQL thread is blocked from even beginning to fetch events from the relay log due to FLUSH TABLES WITH READ LOCK, so the test case is no longer testing what is was intended to. And pre-10.0 versions are long since out of support, so does not seem worthwhile to try to rewrite the test to work another way. The root cause of the test failure is MDEV-34778. Briefly, depending on exact timing during slave stop, the rli->sql_thread_caught_up flag may end up with different value. If it ends up as "true", this causes Seconds_Behind_Master to be 0 during next slave start; and this caused test case timeout as the test was waiting for Seconds_Behind_Master to become non-zero. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-08-26 14:39:24 +02:00
Brandon Nesterenko	bd54475efa	MDEV-34779: Sporadic test failure in rpl.rpl_semi_sync_cond_var_per_thd In a merge, an mtr.call_suppression was erroneously removed for "Got an error writing communication packets". So when the error would expectedly occur, the test would fail. This patch adds this suppression back. Note that the test will still fail due MDEV-34799, which is to be fixed in 10.6.	2024-08-22 13:44:15 -06:00
Kristian Nielsen	5dc2fe4815	Fix sporadic failure in test rpl.rpl_rotate_logs The test started failing after push of MDEV-31404. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-08-16 22:27:01 +02:00
Oleksandr Byelkin	1640c9b06e	Merge branch '11.2' into 11.4	2024-08-04 17:27:48 +02:00
Oleksandr Byelkin	dced6cbdb6	Merge branch '11.1' into 11.2	2024-08-03 09:50:16 +02:00
Oleksandr Byelkin	80abd847da	Merge branch '10.11' into 11.1	2024-08-03 09:32:42 +02:00
Oleksandr Byelkin	8f020508c8	Merge branch '10.5' into 10.6	2024-08-03 09:04:24 +02:00
Brandon Nesterenko	001608de7e	MDEV-15393: Fix rpl_mysqldump_gtid_slave_pos The slave would try to sync_with_master_gtid.inc, but the master never actually saved its gtid position so the test would move on too quickly.	2024-07-31 14:17:46 -06:00
Andrei	c944cd6fec	MDEV-15393 post-push: complete rpl_mysqldump_gtid_slave_pos fixes. Added a missed --source include/save_master_gtid.inc by the previous commit.	2024-07-22 20:52:26 +03:00
Oleksandr Byelkin	0fe39d368a	Merge branch '10.6' into 10.11	2024-07-22 15:14:50 +02:00
Oleksandr Byelkin	a938503cfb	Merge branch '10.5' into 10.6	2024-07-20 08:12:42 +02:00
Andrei	b8f92ade57	MDEV-15393 gtid_slave_pos duplicate key errors after mysqldump restore When mysqldump is run to dump the `mysql` system database, it generates INSERT statements into the table `mysql.gtid_slave_pos`. After running the backup script those inserts did not produce the expected gtid state on slave. In particular the maximum of mysql.gtid_slave_pos.sub_id did not make into rpl_global_gtid_slave_state.last_sub_id an in-memory object that is supposed to match the current state of the table. And that was regardless of whether --gtid option was specified or not. Later when the backup recipient server starts as slave in non-gtid mode this desychronization may lead to a duplicate key error. This effect is corrected for --gtid mode mysqldump/mariadb-dump only as the following. The fixes ensure the insert block of the dump script is followed with a "summing-up" SET @global.gtid_slave_pos assignment. For the implemenation part, note a deferred print-out of SET-gtid_slave_pos and associated comments is prefered over relocating of the entire blocks if (opt_master,slave_data && do_show_master,slave_status) ... because of compatiblity concern. Namely an error inside do_show_*() is handled in the new code the same way, as early as, as before. A regression test can be run in how-to-reproduce mode as well. One affected mtr test observed. rpl_mysqldump_slave.result "mismatch" shows now the new deferring print of SET-gtid_slave_pos policy in action.	2024-07-19 21:44:12 +03:00
Oleksandr Byelkin	9af2caca33	Merge branch '10.5' into 10.6	2024-07-18 16:25:33 +02:00
Brandon Nesterenko	a061ae1079	MDEV-33921: Fix rpl_xa_empty_transaction.test The test was missing a save_master_gtid.inc on the master, leading to the slave thinking it was in sync after executing sync_with_master_gtid.inc, despite not having executed the latest transaction. This skipped transaction, XA COMMIT, was supposed to error-to-be-ignored because its XID could not be found, but be thrown out because the replication filters would filter out the target database. However, if the slave was able to stop before executing the transaction, then the replication filer is reset (to empty), and when the slave is later restarted, that transactions error would no longer be ignored. Additionally, as the test cases added in MDEV-33921 rely on GTID synchronization, the test cases now force master_use_gtid=slave_pos for consistency	2024-07-17 16:38:26 -06:00
Sergei Golubchik	d60f5c11ea	MDEV-34318 mariadb-dump SQL syntax error with MAX_STATEMENT_TIME against Percona MySQL server protect MariaDB conditional comments from a bug in Percona MySQL comment parser	2024-07-17 21:25:40 +02:00
Yuchen Pei	f071b7620b	Merge branch '10.5' into 10.6	2024-07-16 15:54:22 +08:00
Brandon Nesterenko	632dd304c7	MDEV-34554: rpl_change_master_demote sporadically fails on buildbot MDEV-34274 did not fix the test failure. The test has a START SLAVE UNTIL condition, where we can't use sync_with_master_gtid.inc, wait_for_slave_to_start.inc, or wait_for_slave_to_stop.inc because our MTR connection thread races with the start/stop of the SQL/IO threads. So instead, for slave start, we prove the threads started by waiting for the connection count to increase by 2; and for slave stop, we wait for the processlist count to return to its pre start slave number.	2024-07-11 14:45:12 -06:00
Brandon Nesterenko	fa80449725	MDEV-34274: Test rpl.rpl_change_master_demote frequently fails on buildbot with "IO thread should not be running..." Note this is a backport of `8c8b3ab784` from 11.1. The test rpl.rpl_change_master_demote used a `sleep 1` command to give time for a START SLAVE UNTIL to start the slave threads and wait for them to automatically die by UNTIL. On machines with heavy load (especially MSAN bb builders), one second was not enough, and the test would fail due to the IO thread still being up. This patch fixes the test by replacing the sleep with specific conditions to wait for. The test cannot wait for the IO or SQL threads to start, as it would be possible that they would be started and stopped by the time the MTR executor would check the slave status. So instead, we test for proof that they existed via the Connections status variable being incremented by at least 2 (Connections just shows the global thread id). At this point, we still can't use the wait_for_slave_to_stop helper, as the SQL/IO_Running fields of SHOW SLAVE STATUS may not be updated yet. So instead, we use information_schema.processlist, which would show the presence of the Slave_SQL/IO threads. So to "wait for the slave to stop", we wait for the Slave_SQL/IO threads to be gone from the processlist.	2024-07-11 09:06:23 -06:00
Brandon Nesterenko	ea9869504d	MDEV-33921: Replication breaks when filtering two-phase XA transactions There are two problems. First, replication fails when XA transactions are used where the slave has replicate_do_db set and the client has touched a different database when running DML such as inserts. This is because XA commands are not treated as keywords, and are thereby not exempt from the replication filter. The effect of this is that during an XA transaction, if its logged “use db” from the master is filtered out by the replication filter, then XA END will be ignored, yet its corresponding XA PREPARE will be executed in an invalid state, thereby breaking replication. Second, if the slave replicates an XA transaction which results in an empty transaction, the XA START through XA PREPARE first phase of the transaction won’t be binlogged, yet the XA COMMIT will be binlogged. This will break replication in chain configurations. The first problem is fixed by treating XA commands in Query_log_event as keywords, thus allowing them to bypass the replication filter. Note that Query_log_event::is_trans_keyword() is changed to accept a new parameter to define its mode, to either check for XA commands or regular transaction commands, but not both. In addition, mysqlbinlog is adapted to use this mode so its --database filter does not remove XA commands from its output. The second problem fixed by overwriting the XA state in the XID cache to be XA_ROLLBACK_ONLY, so at commit time, the server knows to rollback the transaction and skip its binlogging. If the xid cache is cleared before an XA transaction receives its completion command (e.g. on server shutdown), then before reporting ER_XAER_NOTA when the completion command is executed, the filter is first checked if the database is ignored, and if so, the error is ignored. Reviewed By: ============ Kristian Nielsen <knielsen@knielsen-hq.org> Andrei Elkin <andrei.elkin@mariadb.com>	2024-07-10 14:37:39 -06:00
Monty	dd99780967	MDEV-34504 PURGE BINARY LOGS not working anymore PURGE BINARY LOGS did not always purge binary logs. This commit fixes some of the issues and adds notifications if a binary log cannot be purged. User visible changes: - 'PURGE BINARY LOG TO log_name' and 'PURGE BINARY LOGS BEFORE date' worked differently. 'TO' ignored 'slave_connections_needed_for_purge' while 'BEFORE' did not. Now both versions ignores the 'slave_connections_needed_for_purge variable'. - 'PURGE BINARY LOG..' commands now returns 'note' if a binary log cannot be deleted like Note 1375 Binary log 'master-bin.000004' is not purged because it is the current active binlog - Automatic binary log purges, based on date or size, will write a note to the error log if a binary log matching the size or date cannot yet be deleted. - If 'slave_connections_needed_for_purge' is set from a config or command line, it is set to 0 if Galera is enabled and 1 otherwise (old default). This ensures that automatic binary log purge works with Galera as before the addition of 'slave_connections_needed_for_purge'. If the variable is changed to 0, a warning will be printed to the error log. Code changes: - Added THD argument to several purge_logs related functions that needed THD. - Added 'interactive' options to purge_logs functions. This allowed me to remove testing of sql_command == SQLCOM_PURGE. - Changed purge_logs_before_date() to first check if log is applicable before calling can_purge_logs(). This ensures we do not get a notification for logs that does not match the remove criteria. - MYSQL_BIN_LOG::can_purge_log() will write notifications to the user or error log if a log cannot yet be removed. - log_in_use() will return reason why a binary log cannot be removed. Changes to keep code consistent: - Moved checking of binlog_format for Galera to be after Galera is initialized (The old check never worked). If Galera is enabled we now change the binlog_format to ROW, with a warning, instead of aborting the server. If this change happens a warning will be printed to the error log. - Print a warning if Galera or FLASHBACK changes the binlog_format to ROW. Before it was done silently. Reviewed by: Sergei Golubchik <serg@mariadb.com>, Kristian Nielsen <knielsen@knielsen-hq.org>	2024-07-10 18:50:08 +03:00
Alexander Barkov	5fb07d942b	Merge remote-tracking branch 'origin/11.2' into 11.4	2024-07-09 21:45:37 +04:00
Alexander Barkov	8aad19ddfc	Merge remote-tracking branch 'origin/11.1' into 11.2	2024-07-09 14:04:11 +04:00
Julius Goryavsky	4026f04425	Merge branch 10.5 into 10.6	2024-07-09 11:56:47 +02:00
Alexander Barkov	44af9bfc67	Merge remote-tracking branch 'origin/10.11' into 11.1	2024-07-09 10:45:47 +04:00
Oleksandr Byelkin	2447dda2c0	Merge branch '10.11' into 11.1	2024-07-08 22:40:16 +02:00
Alexander Barkov	4d71a117a3	Merge remote-tracking branch 'origin/10.6' into 10.11	2024-07-08 21:52:08 +04:00
Brandon Nesterenko	744580d5a7	MDEV-32892: IO Thread Reports False Error When Stopped During Connecting to Primary The IO thread can report error code 2013 into the error log when it is stopped during the initial connection process to the primary, as well as when trying to read an event. However, because the IO thread is being stopped, its connection to the primary is force-killed by the signaling thread (see THD::awake_no_mutex()), and thereby these connection errors should be ignored. Reviewed By: ============ Kristian Nielsen <knielsen@knielsen-hq.org>	2024-07-08 10:39:17 -06:00
Alexander Barkov	e56040fee8	Merge remote-tracking branch 'origin/10.5' into 10.6	2024-07-08 18:59:04 +04:00
Brandon Nesterenko	eb4458e993	MDEV-33465: an option to enable semisync recovery The current semi-sync binlog fail-over recovery process uses rpl_semi_sync_slave_enabled==TRUE as its condition to truncate a primary server’s binlog, as it is anticipating the server to re-join a replication topology as a replica. However, for servers configured with both rpl_semi_sync_master_enabled=1 and rpl_semi_sync_slave_enabled=1, if a primary is just re-started (i.e. retaining its role as master), it can truncate its binlog to drop transactions which its replica(s) has already received and executed. If this happens, when the replica reconnects, its gtid_slave_pos can be ahead of the recovered primary’s gtid_binlog_pos, resulting in an error state where the replica’s state is ahead of the primary’s. This patch changes the condition for semi-sync recovery to truncate the binlog to instead use the configuration variable --init-rpl-role, when set to SLAVE. This allows for both rpl_semi_sync_master_enabled and rpl_semi_sync_slave_enabled to be set for a primary that is restarted, and no transactions will be lost, so long as --init-rpl-role is not set to SLAVE. Reviewed By: ============ Sergei Golubchik <serg@mariadb.com>	2024-07-05 19:53:57 -06:00
Brandon Nesterenko	cbc1898e82	MDEV-25607: Auto-generated DELETE from HEAP table can break replication The special logic used by the memory storage engine to keep slaves in sync with the master on a restart can break replication. In particular, after a restart, the master writes DELETE statements in the binlog for each MEMORY-based table so the slave can empty its data. If the DELETE is not executable, e.g. due to invalid triggers, the slave will error and fail, whereas the master will never see the problem. Instead of DELETE statements, use TRUNCATE to keep slaves in-sync with the master, thereby bypassing triggers. Reviewed By: =========== Kristian Nielsen <knielsen@knielsen-hq.org> Andrei Elkin <andrei.elkin@mariadb.com>	2024-07-05 12:00:09 -06:00
Marko Mäkelä	27a3366663	Merge 10.6 into 10.11	2024-06-27 10:26:09 +03:00
Marko Mäkelä	0076eb3d4e	Merge 10.5 into 10.6	2024-06-24 13:09:47 +03:00
Monty	3541bd63f0	MDEV-33582 Add more warnings to be able to better diagnose network issues Changed the logged messages from errors to warnings Also changed 'remain' to 'read_length' in the warning to make it more readable.	2024-06-20 09:53:01 +03:00
Brandon Nesterenko	8c8b3ab784	MDEV-34274: Test rpl.rpl_change_master_demote frequently fails on buildbot with "IO thread should not be running..." The test rpl.rpl_change_master_demote used a `sleep 1` command to give time for a START SLAVE UNTIL to start the slave threads and wait for them to automatically die by UNTIL. On machines with heavy load (especially MSAN bb builders), one second was not enough, and the test would fail due to the IO thread still being up. This patch fixes the test by replacing the sleep with specific conditions to wait for. The test cannot wait for the IO or SQL threads to start, as it would be possible that they would be started and stopped by the time the MTR executor would check the slave status. So instead, we test for proof that they existed via the Connections status variable being incremented by at least 2 (Connections just shows the global thread id). At this point, we still can't use the wait_for_slave_to_stop helper, as the SQL/IO_Running fields of SHOW SLAVE STATUS may not be updated yet. So instead, we use information_schema.processlist, which would show the presence of the Slave_SQL/IO threads. So to "wait for the slave to stop", we wait for the Slave_SQL/IO threads to be gone from the processlist.	2024-06-19 10:17:08 -06:00

1 2 3 4 5 ...

3665 Commits