mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-08-07 00:04:31 +03:00

Author	SHA1	Message	Date
Brandon Nesterenko	ea9869504d	MDEV-33921: Replication breaks when filtering two-phase XA transactions There are two problems. First, replication fails when XA transactions are used where the slave has replicate_do_db set and the client has touched a different database when running DML such as inserts. This is because XA commands are not treated as keywords, and are thereby not exempt from the replication filter. The effect of this is that during an XA transaction, if its logged “use db” from the master is filtered out by the replication filter, then XA END will be ignored, yet its corresponding XA PREPARE will be executed in an invalid state, thereby breaking replication. Second, if the slave replicates an XA transaction which results in an empty transaction, the XA START through XA PREPARE first phase of the transaction won’t be binlogged, yet the XA COMMIT will be binlogged. This will break replication in chain configurations. The first problem is fixed by treating XA commands in Query_log_event as keywords, thus allowing them to bypass the replication filter. Note that Query_log_event::is_trans_keyword() is changed to accept a new parameter to define its mode, to either check for XA commands or regular transaction commands, but not both. In addition, mysqlbinlog is adapted to use this mode so its --database filter does not remove XA commands from its output. The second problem fixed by overwriting the XA state in the XID cache to be XA_ROLLBACK_ONLY, so at commit time, the server knows to rollback the transaction and skip its binlogging. If the xid cache is cleared before an XA transaction receives its completion command (e.g. on server shutdown), then before reporting ER_XAER_NOTA when the completion command is executed, the filter is first checked if the database is ignored, and if so, the error is ignored. Reviewed By: ============ Kristian Nielsen <knielsen@knielsen-hq.org> Andrei Elkin <andrei.elkin@mariadb.com>	2024-07-10 14:37:39 -06:00
Monty	dd99780967	MDEV-34504 PURGE BINARY LOGS not working anymore PURGE BINARY LOGS did not always purge binary logs. This commit fixes some of the issues and adds notifications if a binary log cannot be purged. User visible changes: - 'PURGE BINARY LOG TO log_name' and 'PURGE BINARY LOGS BEFORE date' worked differently. 'TO' ignored 'slave_connections_needed_for_purge' while 'BEFORE' did not. Now both versions ignores the 'slave_connections_needed_for_purge variable'. - 'PURGE BINARY LOG..' commands now returns 'note' if a binary log cannot be deleted like Note 1375 Binary log 'master-bin.000004' is not purged because it is the current active binlog - Automatic binary log purges, based on date or size, will write a note to the error log if a binary log matching the size or date cannot yet be deleted. - If 'slave_connections_needed_for_purge' is set from a config or command line, it is set to 0 if Galera is enabled and 1 otherwise (old default). This ensures that automatic binary log purge works with Galera as before the addition of 'slave_connections_needed_for_purge'. If the variable is changed to 0, a warning will be printed to the error log. Code changes: - Added THD argument to several purge_logs related functions that needed THD. - Added 'interactive' options to purge_logs functions. This allowed me to remove testing of sql_command == SQLCOM_PURGE. - Changed purge_logs_before_date() to first check if log is applicable before calling can_purge_logs(). This ensures we do not get a notification for logs that does not match the remove criteria. - MYSQL_BIN_LOG::can_purge_log() will write notifications to the user or error log if a log cannot yet be removed. - log_in_use() will return reason why a binary log cannot be removed. Changes to keep code consistent: - Moved checking of binlog_format for Galera to be after Galera is initialized (The old check never worked). If Galera is enabled we now change the binlog_format to ROW, with a warning, instead of aborting the server. If this change happens a warning will be printed to the error log. - Print a warning if Galera or FLASHBACK changes the binlog_format to ROW. Before it was done silently. Reviewed by: Sergei Golubchik <serg@mariadb.com>, Kristian Nielsen <knielsen@knielsen-hq.org>	2024-07-10 18:50:08 +03:00
Alexander Barkov	5fb07d942b	Merge remote-tracking branch 'origin/11.2' into 11.4	2024-07-09 21:45:37 +04:00
Alexander Barkov	8aad19ddfc	Merge remote-tracking branch 'origin/11.1' into 11.2	2024-07-09 14:04:11 +04:00
Julius Goryavsky	4026f04425	Merge branch 10.5 into 10.6	2024-07-09 11:56:47 +02:00
Alexander Barkov	44af9bfc67	Merge remote-tracking branch 'origin/10.11' into 11.1	2024-07-09 10:45:47 +04:00
Alexander Barkov	4ffeca643d	Renaming a few mysql-test/suite/rpl/include/.test files into .inc Builders: - amd64-fedora-38-last-N-failed - amd64-debian-10-last-N-failed erroneously execute MTR for these files: - rpl/include/rpl_extra_col_master.test - rpl/include.rpl_binlog_max_cache_size.test - rpl/include.rpl_charset.test when these files are changed by a commit. Changing their extension from .test to .inc to avoid this.	2024-07-09 08:46:40 +04:00
Oleksandr Byelkin	2447dda2c0	Merge branch '10.11' into 11.1	2024-07-08 22:40:16 +02:00
Alexander Barkov	4d71a117a3	Merge remote-tracking branch 'origin/10.6' into 10.11	2024-07-08 21:52:08 +04:00
Brandon Nesterenko	744580d5a7	MDEV-32892: IO Thread Reports False Error When Stopped During Connecting to Primary The IO thread can report error code 2013 into the error log when it is stopped during the initial connection process to the primary, as well as when trying to read an event. However, because the IO thread is being stopped, its connection to the primary is force-killed by the signaling thread (see THD::awake_no_mutex()), and thereby these connection errors should be ignored. Reviewed By: ============ Kristian Nielsen <knielsen@knielsen-hq.org>	2024-07-08 10:39:17 -06:00
Alexander Barkov	e56040fee8	Merge remote-tracking branch 'origin/10.5' into 10.6	2024-07-08 18:59:04 +04:00
Brandon Nesterenko	eb4458e993	MDEV-33465: an option to enable semisync recovery The current semi-sync binlog fail-over recovery process uses rpl_semi_sync_slave_enabled==TRUE as its condition to truncate a primary server’s binlog, as it is anticipating the server to re-join a replication topology as a replica. However, for servers configured with both rpl_semi_sync_master_enabled=1 and rpl_semi_sync_slave_enabled=1, if a primary is just re-started (i.e. retaining its role as master), it can truncate its binlog to drop transactions which its replica(s) has already received and executed. If this happens, when the replica reconnects, its gtid_slave_pos can be ahead of the recovered primary’s gtid_binlog_pos, resulting in an error state where the replica’s state is ahead of the primary’s. This patch changes the condition for semi-sync recovery to truncate the binlog to instead use the configuration variable --init-rpl-role, when set to SLAVE. This allows for both rpl_semi_sync_master_enabled and rpl_semi_sync_slave_enabled to be set for a primary that is restarted, and no transactions will be lost, so long as --init-rpl-role is not set to SLAVE. Reviewed By: ============ Sergei Golubchik <serg@mariadb.com>	2024-07-05 19:53:57 -06:00
Brandon Nesterenko	cbc1898e82	MDEV-25607: Auto-generated DELETE from HEAP table can break replication The special logic used by the memory storage engine to keep slaves in sync with the master on a restart can break replication. In particular, after a restart, the master writes DELETE statements in the binlog for each MEMORY-based table so the slave can empty its data. If the DELETE is not executable, e.g. due to invalid triggers, the slave will error and fail, whereas the master will never see the problem. Instead of DELETE statements, use TRUNCATE to keep slaves in-sync with the master, thereby bypassing triggers. Reviewed By: =========== Kristian Nielsen <knielsen@knielsen-hq.org> Andrei Elkin <andrei.elkin@mariadb.com>	2024-07-05 12:00:09 -06:00
Marko Mäkelä	27a3366663	Merge 10.6 into 10.11	2024-06-27 10:26:09 +03:00
Marko Mäkelä	0076eb3d4e	Merge 10.5 into 10.6	2024-06-24 13:09:47 +03:00
Monty	3541bd63f0	MDEV-33582 Add more warnings to be able to better diagnose network issues Changed the logged messages from errors to warnings Also changed 'remain' to 'read_length' in the warning to make it more readable.	2024-06-20 09:53:01 +03:00
Brandon Nesterenko	8c8b3ab784	MDEV-34274: Test rpl.rpl_change_master_demote frequently fails on buildbot with "IO thread should not be running..." The test rpl.rpl_change_master_demote used a `sleep 1` command to give time for a START SLAVE UNTIL to start the slave threads and wait for them to automatically die by UNTIL. On machines with heavy load (especially MSAN bb builders), one second was not enough, and the test would fail due to the IO thread still being up. This patch fixes the test by replacing the sleep with specific conditions to wait for. The test cannot wait for the IO or SQL threads to start, as it would be possible that they would be started and stopped by the time the MTR executor would check the slave status. So instead, we test for proof that they existed via the Connections status variable being incremented by at least 2 (Connections just shows the global thread id). At this point, we still can't use the wait_for_slave_to_stop helper, as the SQL/IO_Running fields of SHOW SLAVE STATUS may not be updated yet. So instead, we use information_schema.processlist, which would show the presence of the Slave_SQL/IO threads. So to "wait for the slave to stop", we wait for the Slave_SQL/IO threads to be gone from the processlist.	2024-06-19 10:17:08 -06:00
Andrei	387bdb2a2e	MDEV-29934 rpl.rpl_start_alter_chain_basic, rpl.rpl_start_alter_restart_slave sometimes fail in BB with result content mismatch rpl.rpl_start_alter_chain_basic was used to fail sporadically due to a missed GTID master-slave synchronization which was necessary because of the following SELECT from GTID-state table. Fixed with arranging two synchronization pieces for two chain slaves requiring that. Note rpl.rpl_start_alter_restart_slave must have been fixed by MDEV-30460 and `87e13722a9` (manual) merge commit.	2024-06-19 14:09:00 +03:00
Brandon Nesterenko	6cab2f75fe	MDEV-23857: replication master password length After MDEV-4013, the maximum length of replication passwords was extended to 96 ASCII characters. After a restart, however, slaves only read the first 41 characters of MASTER_PASSWORD from the master.info file. This lead to slaves unable to reconnect to the master after a restart. After a slave restart, if a master.info file is detected, use the full allowable length of the password rather than 41 characters. Reviewed By: ============ Sergei Golubchik <serg@mariadb.com>	2024-06-18 07:21:18 -06:00
Andrei	c37b2a9f04	MDEV-30460 rpl.rpl_start_alter_restart_slave sometimes fails in BB with result length mismatch The test was used to fail because of lacking a synchronization point designating an expected for processing "CA_1" event has been indeed taken into this phase. The event apparently may not have even arrived at slave. Fixed with deploying the missed synchronization.	2024-06-17 19:06:34 +03:00
Alexander Barkov	c4bf4ce948	Merge remote-tracking branch 'origin/11.2' into 11.4	2024-06-17 15:46:39 +04:00
Marko Mäkelä	a21e49cbcc	Merge 11.1 into 11.2	2024-06-17 12:02:03 +03:00
Marko Mäkelä	d34289a3e2	Merge 10.11 into 11.1	2024-06-17 09:21:50 +03:00
Marko Mäkelä	b81d717387	Merge 10.6 into 10.11	2024-06-11 12:50:10 +03:00
Brandon Nesterenko	fcd21d3e40	MDEV-34355: rpl.rpl_semi_sync_no_missed_ack_after_add_slave ‘server_3 should have sent…’ The problem is that the test could query the status variable Rpl_semi_sync_slave_send_ack before the slave actually updated it. This would result in an immediate --die assertion killing the rest of the test. The bottom of this commit message has a small patch that can be applied to reproduce the test failure. This patch fixes the test failure by waiting for the variable to be updated before querying its value. diff --git a/sql/semisync_slave.cc b/sql/semisync_slave.cc index 9ddd4c5c8d7..60538079fce 100644 --- a/sql/semisync_slave.cc +++ b/sql/semisync_slave.cc @@ -303,7 +303,10 @@ int Repl_semi_sync_slave::slave_reply(Master_info *mi) reply_res= DBUG_EVALUATE_IF("semislave_failed_net_flush", 1, net_flush(net)); if (!reply_res) + { + sleep(1); rpl_semi_sync_slave_send_ack++; + } } DBUG_RETURN(reply_res); }	2024-06-10 12:27:20 -06:00
Oleksandr Byelkin	99b370e023	Merge branch '11.2' into 11.4	2024-05-21 19:38:51 +02:00
Sergei Golubchik	bf5da43e50	Merge branch '11.1' into 11.2	2024-05-13 10:00:26 +02:00
Sergei Golubchik	f0a5412037	Merge branch '11.0' into 11.1	2024-05-13 09:52:30 +02:00
Sergei Golubchik	f9807aadef	Merge branch '10.11' into 11.0	2024-05-12 12:18:28 +02:00
Sergei Golubchik	a6b2f820e0	Merge branch '10.6' into 10.11	2024-05-10 20:02:18 +02:00
Sergei Golubchik	7b53672c63	Merge branch '10.5' into 10.6	2024-05-08 20:06:00 +02:00
Kristian Nielsen	383ee364dc	Merge 10.6 to 10.11	2024-05-07 08:45:31 +02:00
Sergei Golubchik	7a789e2027	sporadic failures of rpl.rpl_parallel_sbm the test waits for the event to get stuck on MASTER_DELAY, but on a slow/overloaded slave the event might pass MASTER_DELAY before the test starts waiting. Wait for the event to get stuck on the LOCK TABLES (after MASTER_DELAY), the event cannot avoid that,	2024-05-05 21:37:07 +02:00
Sergei Golubchik	cea083af9f	cleanup: use THD_STAGE_INFO, not thd_proc_info and put master-slave.inc last in the series of includes	2024-05-05 21:37:07 +02:00
Kristian Nielsen	4b4db4a8e5	MDEV-34042: Deadlock kill of XA PREPARE can break replication / rpl.rpl_parallel_multi_domain_xa sporadic failure Refinement of the original patch. Move the code to reset the kill up into the parent class Xid_apply_log_event, to also fix the similar issue for XA COMMIT. Increase the number of slave retries in the test case rpl.rpl_parallel_multi_domain_xa to fix some sporadic failures. The test generates massive amounts of conflicting transactions in multiple independent domains, which can cause multiple rollback+retry for a transaction as it conflicts with transactions in other domains one-by-one. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-05-05 19:01:56 +02:00
Sergei Golubchik	3ee6f69d49	sporadic failures of binlog_encryption.rpl_parallel_gco_wait_kill CURRENT_TEST: binlog_encryption.rpl_parallel_gco_wait_kill mysqltest: In included file "./suite/rpl/t/rpl_parallel_gco_wait_kill.test": included from /home/buildbot/amd64-ubuntu-2004-debug/build/mysql-test/suite/binlog_encryption/rpl_parallel_gco_wait_kill.test at line 2: At line 334: Can't initialize replace from 'replace_result $thd_id THD_ID' An sql thread can reach the "Slave has read all relay log" state and then start reading relay log again. Let's use a more generic pattern to retrieve the sql thread ID even if it's not in the "read all relay log" state.	2024-05-02 22:14:19 +02:00
Kristian Nielsen	e365877bae	MDEV-33798: ROW base optimistic deadlock with concurrent writes on same table One case is conflicting transactions T1 and T2 with different domain id, in optimistic parallel replication in non-GTID mode. Then T2 will wait_for_prior_commit on T1; and if T1 got a row lock wait on T2 it would hang, as different domains caused the deadlock kill to be skipped in thd_rpl_deadlock_check(). More generally, if we have transactions T1 and T2 in one domain/master connection, and independent transactions U in another, then we can still deadlock like this: T1 row low wait on U U row lock wait on T2 T2 wait_for_prior_commit on T1 This commit enforces the deadlock kill in these cases. If the waited-for transaction is speculatively applied, then it will be deadlock killed in case of a conflict, even if the two transactions are in different domains or master connections. Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com> Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-05-02 21:07:45 +02:00
Sergei Golubchik	0aae11ac28	Merge branch '10.6' into 10.11	2024-04-30 16:56:49 +02:00
Andrei	3fa2caf553	MDEV-31404 post-push for rpl.max_binlog_total_size The test's header did not follow a correct `have_` and `master-slave` sourcing pattern. That's corrected.	2024-04-30 14:00:19 +03:00
Andrei	ae03374f29	MDEV-34030 rpl.rpl_using_gtid_default can fail in (BB) mtr The test's header is not written to follow strictly a correct order of checks by mtr at test start which may lead to an error. E.g ./mtr --mysqld=--binlog-format=row rpl.rpl_using_gtid_default to At line 175: query 'SET GLOBAL gtid_slave_pos= ""' failed: ER_SLAVE_MUST_STOP (1198): This operation cannot be performed as you have a running slave ''; run STOP SLAVE '' first Fixed to require the binlog format first in the test header.	2024-04-30 12:40:50 +03:00
Andrei	6a63204c36	MDEV-34029 rpl.rpl_heartbeat can fail when (BB) mtr reorders tests rpl.rpl_heartbeat turns out to miss a standard include/master-slave header which made it potentially in BB and actually with manual mtr failing as it may have used a previous slave GTID state. Fixed with installing the standard rpl suite header/footer in the test file.	2024-04-30 12:40:50 +03:00
Sergei Golubchik	c1f3eff53f	Merge branch '10.5' into 10.6	2024-04-29 10:08:58 +02:00
Sergei Golubchik	7ff649315e	sporadic failures of rpl.rpl_parallel_multi_domain_xa it's a slow test, the slave needs to catch up, reading >1500 transactions. A default MASTER_GTID_WAIT() timeout in sync_with_master_gtid.inc is 120 seconds, which might be not enough for a slow/overloaded slave. Let's wait forever or until ./mtr --testcase-timeout, whatever comes first.	2024-04-26 14:24:32 +02:00
Sergei Golubchik	9e92582024	sporadic failures of rpl.rpl_parallel_sbm the test waits for the event to get stuck on MASTER_DELAY, but on a slow/overloaded slave the event might pass MASTER_DELAY before the test starts waiting. Wait for the event to get stuck on the LOCK TABLES (after MASTER_DELAY), the event cannot avoid that,	2024-04-25 12:47:23 +02:00
Kristian Nielsen	553a4d6271	MDEV-33602: Sporadic test failure in rpl.rpl_gtid_stop_start The test could fail with a duplicate key error because switching to non-GTID mode could start at the wrong old-style position. The position could be wrong when the previous GTID connect was stopped before receiving the fake GTID list event which gives the old-style position corresponding to the GTID connected position. Work-around by injecting an extra event and syncing the slave before switching to non-GTID mode. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-04-25 11:00:45 +02:00
Sergei Golubchik	9cf718859f	cleanup: use THD_STAGE_INFO, not thd_proc_info and put master-slave.inc last in the series of includes	2024-04-24 22:08:52 +02:00
Brandon Nesterenko	8c7992165b	MDEV-33672: 10.11 Fix for Two Phase Alter Flags Extends `89c907bd4f` to account for binlog_two_phase_alter flags in a Gtid log event. I.e., if the FL_COMMIT_ALTER_E1 or FL_ROLLBACK_ALTER_E2 flags are set in the event flags, yet the length of the event is too short to hold the value, then set the event as invalid	2024-04-24 13:19:36 +02:00
Sergei Golubchik	7fe764b109	MDEV-32973 SHOW TABLES LIKE shows temporary tables with non-matching names * compare both db and table name * use the correct charset	2024-04-23 12:31:41 +02:00
Sergei Golubchik	f243c73788	sporadic failures of rpl.rpl_rewrite_db_sys_vars first stop the slave, then run commands on the master that are supposed to fail on the slave, then start the slave. if you swap first two steps, the slave might get and execute those commands before it's stopped, which will fail the test. also, improve debugability	2024-04-22 21:02:11 +02:00
Sergei Golubchik	018d537ec1	Merge branch '10.6' into 10.11	2024-04-22 15:23:10 +02:00

1 2 3 4 5 ...

4561 Commits