mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-08-07 00:04:31 +03:00

Author	SHA1	Message	Date
Marko Mäkelä	3d23adb766	Merge 10.6 into 10.11	2024-11-29 13:43:17 +02:00
Brandon Nesterenko	a06d81ff3f	MDEV-35477: rpl_semi_sync_no_missed_ack_after_add_slave fails after MDEV-35109 MTR test rpl_semi_sync_no_missed_ack_after_add_slave fails on buildbot after the preparatory commit for MDEV-35109 (`5290fa043b`) which changed a sleep to a debug_sync point. The problem is that the debug_sync point would time-out on a slave while waiting to enter the logic to send an ACK reply. More specifically, where the test config is a primary with two replicas, and the test waits on one of the replicas to start sending an ACK, if the other replica was able to receive the event and respond with an ACK before the binlog dump thread of the timing-out server would prepare to send event, it wouldn't set the SEMI_SYNC_NEED_ACK flag, and the replica wouldn't even try to respond with an ACK. Fix is to use debug_sync for both replicas such that both replicas are held before sending their ack, so one can’t temporarily disable semi-sync for the other before it receives the transaction.	2024-11-21 11:30:25 -07:00
Dave Gosselin	3f114a0930	MDEV-35046 SIGSEGV in list_delete in optimized builds when using pseudo_slave_mode slave_applier_reset_xa_trans() should clear the THD::pseudo_thread_id when called to reset XA transaction state completely. Clearing when pseudo_thread_id models the binlog applier that handles BASE64-encoded events which possibly contain the pseudo_thread_id, allowing us to restore the pre-event's state of the connection's respective session var.	2024-11-15 09:48:28 -05:00
Brandon Nesterenko	716ed2ce22	MDEV-35350: Consolidate MTR wait_for_pattern_in_file.inc and SEARCH_WAIT in search_pattern_in_file.inc Replace wait_for_pattern_in_file.inc and all of its uses to use search_pattern_in_file.inc with SEARCH_WAIT. Reviewed By: ============ Kristian Nielsen <knielsen@knielsen-hq.org> Sergei Golubchik <serg@mariadb.org>	2024-11-07 13:25:58 -07:00
Vladislav Vaintroub	faf9e755ba	MDEV-35109 fix test case rpl_semi_sync_after_sync_coord_consistency fails on release compilation	2024-11-05 22:38:55 +01:00
Brandon Nesterenko	b07258a0d5	MDEV-35109: Semi-sync Replication stalling Primary using wait point=AFTER_SYNC For a primary configured with wait_point=AFTER_SYNC, if two threads T1 (binlogging through MYSQL_BIN_LOG::write()) and T2 were binlogging at the same time, T1 could accidentally wait for its semi-sync ACK using the binlog coordinates of T2. Prior to MDEV-33551, this only resulted in delayed transactions, because all transactions shared the same condition variable for ACK signaling. However, with the MDEV-33551 changes, each thread has its own condition variable to signal. So T1 could wait indefinitely when either: 1) T1's ACK is received but not T2's when T1 goes into wait_after_sync(), because the ACK receiver thread has already notified about the T1 ACK, but T1 was _actually_ waiting on T2's ACK, and therefore tries to wait (in vain). 2) T1 goes to wait_after_sync() before any ACKs have arrived. When T1's ACK comes in, T1 is woken up; however, sees it needs to wait more (because it was actually waiting on T2's ACK), and goes to wait again (this time, in vain). Note that the actual cause of T1 waiting on T2's binlog coordinates is when MYSQL_BIN_LOG::write() would call Repl_semisync_master::wait_after_sync(), the binlog offset parameter was read as the end of MYSQL_BIN_LOG::log_file, which is shared among transactions. So if T2 had updated the binary log _after_ T1 had released LOCK_log, but not yet invoked wait_after_sync(), it would use the end of the binary log file as the binlog offset, which was that of T2 (or any future transaction). The fix in this patch ensures consistency between the binary log coordinates a transaction uses between report_binlog_update() and wait_after_sync(). Reviewed By ============ Kristian Nielsen <knielsen@knielsen-hq.org> Andrei Elkin <andrei.elkin@mariadb.com>	2024-11-04 10:45:58 -07:00
Brandon Nesterenko	5290fa043b	MDEV-35109 PREP: simulate_delay_semisync_slave_reply use debug_sync This is a preparatory commit for MDEV-35109 to make its testing code cleaner (and harden other tests too). The DEBUG_DBUG point simulate_delay_semisync_slave_reply up to this patch used my_sleep() to delay an ACK response, but sleeps are prone to test failures on machines that run tests when already having a heavy load (e.g. on buildbot). This patch changes this DEBUG_DBUG sleep to use DEBUG_SYNC to coordinate exactly when a slave should send its reply, which is safer and faster. As DEBUG_SYNC can't be used while a server is shutting down, to synchronize threads with SHUTDOWN WAIT FOR SLAVES logic, we use and extend wait_for_pattern_in_file.inc to wait for an informational error message in the logic to indicate that the shutdown process has reached the intended state (i.e. indicating that the shutdown has been delayed to await semi-sync ACKs). Specifically, the extensions are as follows: 1. wait_for_pattern_in_file.inc is extended with parameter wait_for_pattern_count as a number that indicates the number of times a pattern should occur in the file before returning control to the calling script. 2. search_for_pattern_in_file.inc is extended with parameter SEARCH_ABORT_IS_SUCCESS to invert the error/success logic, so the SEARCH_ABORT condition can be used to indicate success, rather than error.	2024-11-04 10:45:58 -07:00
Brandon Nesterenko	e9a502df08	Testing fix for rpl_semi_sync_cond_var_per_thd failure	2024-10-30 08:32:19 -06:00
Oleksandr Byelkin	c770bce898	Merge branch '11.2' into 11.4	2024-10-30 15:11:17 +01:00
Oleksandr Byelkin	69d033d165	Merge branch '10.11' into 11.2	2024-10-29 16:42:46 +01:00
Oleksandr Byelkin	3d0fb15028	Merge branch '10.6' into 10.11	2024-10-29 15:24:38 +01:00
Monty	066f920484	MDEV-35110 Deadlock on Replica during BACKUP STAGE BLOCK_COMMIT on XA transactions This is an extension of MDEV-30423 "Deadlock on Replica during BACKUP STAGE BLOCK_COMMIT on XA transactions" The original commit in MDEV-30423 was not complete as some usage in XA of MDL_BACKUP_COMMIT locks did not set thd->backup_commit_lock. This is required to be set when using parallel replication. Fixed by ensuring that all usage of BACKUP_COMMIT lock in XA is uniform and always sets thd->backup_commit_lock. I also changed all locks to be MDL_EXPLICIT to keep also that part uniform. A regression test is added.	2024-10-28 13:29:21 +02:00
Brandon Nesterenko	1ed30e08af	MDEV-34122: Assertion `entry' failed in Active_tranx::assert_thd_is_waiter If semi-sync is switched off then on while a transaction is in-between binlogging and waiting for an ACK, the semi-sync state of the transaction is removed, leading to a debug assertion that indicates the transaction tried to wait, but cannot receive an ACK signal. More specifically, when semi-sync is switched off, the Active_tranx list is cleared (where a transaction adds an entry to this list during binlogging), and each entry in this list saves the thread which will wait for an ACK, and the thread has the COND variable to signal to wake itself. So if the entry is lost, the Ack_receiver thread won’t be able to find the thread to wake up when an ACK comes in. The fix is to ensure that the entry exists before awaiting the ACK, and if there is no entry, skip the wait. In debug builds, an informative message is written explaining that the transaction is skipping its wait. Additional debug-build only logic is added to ensure that the cause of the missing entry is due to semi-sync being turned off and on. Reviewed By: ============ Kristian Nielsen <knielsen@knielsen-hq.org>	2024-10-21 15:35:54 -06:00
Sergei Golubchik	3a1cf2c85b	MDEV-34679 ER_BAD_FIELD uses non-localizable substrings	2024-10-17 21:37:37 +02:00
Sergei Golubchik	5ebda30ccc	Revert "MDEV-35019 Provide a way to enable "rollback XA on disconnect" behavior we had before 10.5.2" This reverts commit `8ae462a220`.	2024-10-16 13:23:47 +02:00
Kristian Nielsen	8ae462a220	MDEV-35019 Provide a way to enable "rollback XA on disconnect" behavior we had before 10.5.2 Implement variable legacy_xa_rollback_at_disconnect to support backwards compatibility for applications that rely on the pre-10.5 behavior for connection disconnect, which is to rollback the transaction (in violation of the XA specification). Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-10-16 10:18:36 +02:00
Marko Mäkelä	b53b81e937	Merge 11.2 into 11.4	2024-10-03 14:32:14 +03:00
Marko Mäkelä	12a91b57e2	Merge 10.11 into 11.2	2024-10-03 13:24:43 +03:00
Marko Mäkelä	63913ce5af	Merge 10.6 into 10.11	2024-10-03 10:55:08 +03:00
Marko Mäkelä	7e0afb1c73	Merge 10.5 into 10.6	2024-10-03 09:31:39 +03:00
Lena Startseva	0a5e4a0191	MDEV-31005: Make working cursor-protocol Updated tests: cases with bugs or which cannot be run with the cursor-protocol were excluded with "--disable_cursor_protocol"/"--enable_cursor_protocol" Fix for v.10.5	2024-09-18 18:39:26 +07:00
Brandon Nesterenko	68938d2b42	MDEV-33500 (part 2): rpl.rpl_parallel_sbm can still fail The failing test case validates Seconds_Behind_Master for a delayed slave, while STOP SLAVE is executed during a delay. The test fixes initially added to the test (commit `b04c857596`) added a table lock to ensure a transaction could not finish before validating the Seconds_Behind_Master field after SLAVE START, but did not address a possibility that the transaction could finish before running the STOP SLAVE command, which invalidates the validations for the rest of the test case. Specifically, this would result in 1) a timeout in “Waiting for table metadata lock” on the replica, which expects the transaction to retry after slave restart and hit a lock conflict on the locked tables (added in `b04c857596`), and 2) that Seconds_Behind_Master should have increased, but did not. The failure can be reproduced by synchronizing the slave to the master before the MDEV-32265 echo statement (i.e. before the SLAVE STOP). This patch fixes the test by adding a mechanism to use DEBUG_SYNC to synchronize a MASTER_DELAY, rather than continually increase the duration of the delay each time the test fails on buildbot. This is to ensure that on slow machines, a delay does not pass before the test gets a chance to validate results. Additionally, it decreases overall test time because the test can continue immediately after validation, thereby bypassing the remainder of a full delay for each transaction.	2024-09-17 06:29:20 -06:00
Marko Mäkelä	44733aa8cf	Merge 11.2 into 11.4	2024-08-29 19:10:38 +03:00
Marko Mäkelä	e91a799458	Merge 10.11 into 11.2	2024-08-29 16:02:57 +03:00
Marko Mäkelä	cfcf27c6fe	Merge 10.6 into 10.11	2024-08-29 07:47:29 +03:00
Marko Mäkelä	48becffd07	Merge 10.5 into 10.6	2024-08-27 08:52:10 +03:00
Kristian Nielsen	8642453ce6	Fix sporadic failure of test case rpl.rpl_start_stop_slave The test was expecting the I/O thread to be in a specific state, but thread scheduling may cause it to not yet have reached that state. So just have a loop that waits for the expected state to occur. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-08-26 14:39:24 +02:00
Kristian Nielsen	214e6c5b3d	Fix sporadic failure of test case rpl.rpl_old_master Remove the test for MDEV-14528. This is supposed to test that parallel replication from pre-10.0 master will update Seconds_Behind_Master. But after MDEV-12179 the SQL thread is blocked from even beginning to fetch events from the relay log due to FLUSH TABLES WITH READ LOCK, so the test case is no longer testing what it was intended to. And pre-10.0 versions are long since out of support, so it does not seem worthwhile to try to rewrite the test to work another way. The root cause of the test failure is MDEV-34778. Briefly, depending on exact timing during slave stop, the rli->sql_thread_caught_up flag may end up with a different value. If it ends up as "true", this causes Seconds_Behind_Master to be 0 during next slave start; and this caused test case timeout as the test was waiting for Seconds_Behind_Master to become non-zero. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-08-26 14:39:24 +02:00
Kristian Nielsen	7dc4ea5649	Fix sporadic test failure in rpl.rpl_create_drop_event Depending on timing, an extra event run could start just when the event scheduler is shut down and delay running until after the table has been dropped; this would cause the test to fail with a "table does not exist" error in the log. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-08-26 14:39:24 +02:00
Kristian Nielsen	33854d7324	Restore skipping rpl.rpl_mdev6020 under Valgrind (Revert a change done by mistake when XtraDB was removed.) Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-08-26 14:39:24 +02:00
Brandon Nesterenko	bd54475efa	MDEV-34779: Sporadic test failure in rpl.rpl_semi_sync_cond_var_per_thd In a merge, an mtr.call_suppression was erroneously removed for "Got an error writing communication packets". So when the error would expectedly occur, the test would fail. This patch adds this suppression back. Note that the test will still fail due to MDEV-34799, which is to be fixed in 10.6.	2024-08-22 13:44:15 -06:00
Kristian Nielsen	78fcb9474c	Fix sporadic failure in test rpl.rpl_rotate_logs Clarify confusing comments in the previous commit, and note that the failure started after push of MDEV-34504. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-08-19 21:18:56 +02:00
Kristian Nielsen	5dc2fe4815	Fix sporadic failure in test rpl.rpl_rotate_logs The test started failing after push of MDEV-31404. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-08-16 22:27:01 +02:00
Oleksandr Byelkin	1640c9b06e	Merge branch '11.2' into 11.4	2024-08-04 17:27:48 +02:00
Oleksandr Byelkin	dced6cbdb6	Merge branch '11.1' into 11.2	2024-08-03 09:50:16 +02:00
Oleksandr Byelkin	80abd847da	Merge branch '10.11' into 11.1	2024-08-03 09:32:42 +02:00
Oleksandr Byelkin	8f020508c8	Merge branch '10.5' into 10.6	2024-08-03 09:04:24 +02:00
Brandon Nesterenko	001608de7e	MDEV-15393: Fix rpl_mysqldump_gtid_slave_pos The slave would try to sync_with_master_gtid.inc, but the master never actually saved its gtid position so the test would move on too quickly.	2024-07-31 14:17:46 -06:00
Andrei	c944cd6fec	MDEV-15393 post-push: complete rpl_mysqldump_gtid_slave_pos fixes. Added a missed --source include/save_master_gtid.inc by the previous commit.	2024-07-22 20:52:26 +03:00
Oleksandr Byelkin	0fe39d368a	Merge branch '10.6' into 10.11	2024-07-22 15:14:50 +02:00
Oleksandr Byelkin	a938503cfb	Merge branch '10.5' into 10.6	2024-07-20 08:12:42 +02:00
Andrei	b8f92ade57	MDEV-15393 gtid_slave_pos duplicate key errors after mysqldump restore When mysqldump is run to dump the `mysql` system database, it generates INSERT statements into the table `mysql.gtid_slave_pos`. After running the backup script those inserts did not produce the expected gtid state on slave. In particular the maximum of mysql.gtid_slave_pos.sub_id did not make it into rpl_global_gtid_slave_state.last_sub_id, an in-memory object that is supposed to match the current state of the table. And that was regardless of whether --gtid option was specified or not. Later when the backup recipient server starts as slave in non-gtid mode this desynchronization may lead to a duplicate key error. This effect is corrected for --gtid mode mysqldump/mariadb-dump only, as follows. The fixes ensure the insert block of the dump script is followed with a "summing-up" SET @global.gtid_slave_pos assignment. For the implementation part, note a deferred print-out of SET-gtid_slave_pos and associated comments is preferred over relocating of the entire blocks if (opt_master,slave_data && do_show_master,slave_status) ... because of compatibility concern. Namely an error inside do_show_*() is handled in the new code the same way, as early as, as before. A regression test can be run in how-to-reproduce mode as well. One affected mtr test observed. rpl_mysqldump_slave.result "mismatch" shows now the new deferring print of SET-gtid_slave_pos policy in action.	2024-07-19 21:44:12 +03:00
Oleksandr Byelkin	9af2caca33	Merge branch '10.5' into 10.6	2024-07-18 16:25:33 +02:00
Brandon Nesterenko	a061ae1079	MDEV-33921: Fix rpl_xa_empty_transaction.test The test was missing a save_master_gtid.inc on the master, leading to the slave thinking it was in sync after executing sync_with_master_gtid.inc, despite not having executed the latest transaction. This skipped transaction, XA COMMIT, was supposed to error-to-be-ignored because its XID could not be found, but be thrown out because the replication filters would filter out the target database. However, if the slave was able to stop before executing the transaction, then the replication filter is reset (to empty), and when the slave is later restarted, that transaction's error would no longer be ignored. Additionally, as the test cases added in MDEV-33921 rely on GTID synchronization, the test cases now force master_use_gtid=slave_pos for consistency	2024-07-17 16:38:26 -06:00
Sergei Golubchik	d60f5c11ea	MDEV-34318 mariadb-dump SQL syntax error with MAX_STATEMENT_TIME against Percona MySQL server protect MariaDB conditional comments from a bug in Percona MySQL comment parser	2024-07-17 21:25:40 +02:00
Yuchen Pei	f071b7620b	Merge branch '10.5' into 10.6	2024-07-16 15:54:22 +08:00
Daniel Black	e8bcc4e455	MDEV-34568 rpl.rpl_mdev12179 - correct for Windows Simplify in an attempt to avoid: mysqltest: At line 275: File already exist: on the write_file lines. Using write_line as that's what a lot of other tests do for writing small bits to an expect file. Review thanks Vladislav Vaintroub	2024-07-12 12:55:28 +02:00
Brandon Nesterenko	632dd304c7	MDEV-34554: rpl_change_master_demote sporadically fails on buildbot MDEV-34274 did not fix the test failure. The test has a START SLAVE UNTIL condition, where we can't use sync_with_master_gtid.inc, wait_for_slave_to_start.inc, or wait_for_slave_to_stop.inc because our MTR connection thread races with the start/stop of the SQL/IO threads. So instead, for slave start, we prove the threads started by waiting for the connection count to increase by 2; and for slave stop, we wait for the processlist count to return to its pre start slave number.	2024-07-11 14:45:12 -06:00
Brandon Nesterenko	fa80449725	MDEV-34274: Test rpl.rpl_change_master_demote frequently fails on buildbot with "IO thread should not be running..." Note this is a backport of `8c8b3ab784` from 11.1. The test rpl.rpl_change_master_demote used a `sleep 1` command to give time for a START SLAVE UNTIL to start the slave threads and wait for them to automatically die by UNTIL. On machines with heavy load (especially MSAN bb builders), one second was not enough, and the test would fail due to the IO thread still being up. This patch fixes the test by replacing the sleep with specific conditions to wait for. The test cannot wait for the IO or SQL threads to start, as it would be possible that they would be started and stopped by the time the MTR executor would check the slave status. So instead, we test for proof that they existed via the Connections status variable being incremented by at least 2 (Connections just shows the global thread id). At this point, we still can't use the wait_for_slave_to_stop helper, as the SQL/IO_Running fields of SHOW SLAVE STATUS may not be updated yet. So instead, we use information_schema.processlist, which would show the presence of the Slave_SQL/IO threads. So to "wait for the slave to stop", we wait for the Slave_SQL/IO threads to be gone from the processlist.	2024-07-11 09:06:23 -06:00
Monty	e0cff1e72b	Fixed failure in rpl.rpl_change_master_demote : "IO thread should not be running..." The issue was that the test did not take into account that the IO thread could have been in COMMAND=Connecting state, which happens before the COMMAND=Slave_IO state. The test is a bit fragile as it depends on the COMMAND state to be synchronised with the Slave_IO_State, which is not the case. I added a new proc state and some more information to the error output to be able to diagnose future failures more easily.	2024-07-11 11:15:47 +03:00

4561 Commits