mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-09-11 05:52:26 +03:00

Author	SHA1	Message	Date
Brandon Nesterenko	68938d2b42	MDEV-33500 (part 2): rpl.rpl_parallel_sbm can still fail The failing test case validates Seconds_Behind_Master for a delayed slave, while STOP SLAVE is executed during a delay. The test fixes initially added to the test (commit `b04c857596`) added a table lock to ensure a transaction could not finish before validating the Seconds_Behind_Master field after SLAVE START, but did not address a possibility that the transaction could finish before running the STOP SLAVE command, which invalidates the validations for the rest of the test case. Specifically, this would result in 1) a timeout in “Waiting for table metadata lock” on the replica, which expects the transaction to retry after slave restart and hit a lock conflict on the locked tables (added in `b04c857596`), and 2) that Seconds_Behind_Master should have increased, but did not. The failure can be reproduced by synchronizing the slave to the master before the MDEV-32265 echo statement (i.e. before the SLAVE STOP). This patch fixes the test by adding a mechanism to use DEBUG_SYNC to synchronize a MASTER_DELAY, rather than continually increase the duration of the delay each time the test fails on buildbot. This is to ensure that on slow machines, a delay does not pass before the test gets a chance to validate results. Additionally, it decreases overall test time because the test can continue immediately after validation, thereby bypassing the remainder of a full delay for each transaction.	2024-09-17 06:29:20 -06:00
Sergei Golubchik	7a789e2027	sporadic failures of rpl.rpl_parallel_sbm the test waits for the event to get stuck on MASTER_DELAY, but on a slow/overloaded slave the event might pass MASTER_DELAY before the test starts waiting. Wait for the event to get stuck on the LOCK TABLES (after MASTER_DELAY), the event cannot avoid that.	2024-05-05 21:37:07 +02:00
Sergei Golubchik	cea083af9f	cleanup: use THD_STAGE_INFO, not thd_proc_info and put master-slave.inc last in the series of includes	2024-05-05 21:37:07 +02:00
Brandon Nesterenko	b04c857596	MDEV-33500: rpl.rpl_parallel_sbm can fail on slow machines, e.g. MSAN/Valgrind builders In the addition to test rpl.rpl_parallel_sbm made by MDEV-32265, the test uses sleep statements alone to test Seconds_Behind_Master with delayed replication. On slow running machines, the test can pass the intended MASTER_DELAY duration and Seconds_Behind_Master can become 0, when the test expects the transaction to still be actively in a delaying state. This can be consistently reproduced by adding a sleep statement before the call to --let = query_get_value(SHOW SLAVE STATUS, Seconds_Behind_Master, 1) to sleep past the delay end point. This patch fixes this by locking the table which the delayed transaction targets so Seconds_Behind_Master cannot be updated before the test reads it for validation.	2024-02-20 08:19:18 -07:00
Brandon Nesterenko	c5f776e9fa	MDEV-32265: seconds_behind_master is inaccurate for Delayed replication If a replica is actively delaying a transaction when restarted (STOP SLAVE/START SLAVE), when the sql thread is back up, Seconds_Behind_Master will present as 0 until the configured MASTER_DELAY has passed. That is, before the restart, last_master_timestamp is updated to the timestamp of the delayed event. Then after the restart, the negation of sql_thread_caught_up is skipped because the timestamp of the event has already been used for the last_master_timestamp, and their update is grouped together in the same conditional block. This patch fixes this by separating the negation of sql_thread_caught_up out of the timestamp-dependent block, so it is called any time an idle parallel slave queues an event to a worker. Note that sql_thread_caught_up is still left in the check for internal events, as SBM should remain idle in such case to not "magically" begin incrementing. Reviewed By: ============ Andrei Elkin <andrei.elkin@mariadb.com>	2023-10-23 14:25:03 -06:00
Brandon Nesterenko	063f4ac25e	MDEV-30619: Parallel Slave SQL Thread Can Update Seconds_Behind_Master with Active Workers MDEV-31749 sporadic assert in MDEV-30619 new test If the workers of a parallel replica are busy (potentially with long queues), but the SQL thread has no events left to distribute (so it goes idle), then the next event that comes from the primary will update mi->last_master_timestamp with its timestamp, even if the workers have not yet finished. This patch changes the parallel replica logic which updates last_master_timestamp after idling from using solely sql_thread_caught_up (added in MDEV-29639) to using the latter with rli queued/dequeued event counters. That is, if the queued count is equal to the dequeued count, it means all events have been processed and the replica is considered idle when the driver thread has also distributed all events. Low level details of the commit include - to make a more generalized test for Seconds_Behind_Master on the parallel replica, rpl_delayed_parallel_slave_sbm.test is renamed to rpl_parallel_sbm.test for this purpose. - pause_sql_thread_on_next_event usage was removed with the MDEV-30619 fixes. Rather than remove it, we adapt it to the needs of this test case - added test case to cover SBM spike of relay log read and LMT update that was fixed by MDEV-29639 - rpl_seconds_behind_master_spike.test is made to use the negate_clock_diff_with_master debug eval. Reviewed By: ============ Andrei Elkin <andrei.elkin@mariadb.com>	2023-07-25 16:36:14 +03:00

6 Commits
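
The idle-detection logic described in the MDEV-30619 and MDEV-32265 messages can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the server's code: the `ReplicaState` struct, the function names, and the simplified Seconds_Behind_Master arithmetic (no clock-difference correction) are hypothetical. Only the field names `sql_thread_caught_up` and `last_master_timestamp` and the queued/dequeued counters come from the commit messages.

```cpp
#include <cstdint>

// Hypothetical, simplified model of the state named in the commit messages.
// The real server keeps these in its Master_info / Relay_log_info structures.
struct ReplicaState {
    bool sql_thread_caught_up = true;        // driver thread has nothing left to distribute
    std::uint64_t queued_count = 0;          // events handed to parallel workers
    std::uint64_t dequeued_count = 0;        // events the workers have finished
    std::int64_t last_master_timestamp = 0;  // timestamp of the newest tracked event
};

// MDEV-30619 (as described): the replica counts as idle only when the driver
// thread is caught up AND the workers have drained everything queued so far.
bool replica_is_idle(const ReplicaState& s) {
    return s.sql_thread_caught_up && s.queued_count == s.dequeued_count;
}

// Called when the driver thread queues an event to a worker.
void queue_event(ReplicaState& s, std::int64_t event_timestamp) {
    if (replica_is_idle(s)) {
        // Timestamp-dependent block (assumed condition): only move forward.
        if (event_timestamp > s.last_master_timestamp) {
            s.last_master_timestamp = event_timestamp;
        }
        // MDEV-32265 (as described): the negation sits outside the
        // timestamp-dependent block, so it runs even when the timestamp
        // was already recorded before a STOP SLAVE / START SLAVE.
        s.sql_thread_caught_up = false;
    }
    ++s.queued_count;
}

// Simplified Seconds_Behind_Master: 0 when idle, else the age of the event.
std::int64_t seconds_behind_master(const ReplicaState& s, std::int64_t now) {
    return replica_is_idle(s) ? 0 : now - s.last_master_timestamp;
}
```

In this model, a replica restarted while delaying an event with timestamp 100 already has `last_master_timestamp == 100`. Queuing that event again still clears `sql_thread_caught_up`, so the reported lag grows with the clock instead of staying at 0 until the delay expires.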