mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-08-07 00:04:31 +03:00

Author	SHA1	Message	Date
Brandon Nesterenko	01fafd45eb	MDEV-35939: rpl.rpl_parallel_sbm: "Slave_last_event_time is not equal to Master_last_event_time" The test rpl.rpl_parallel_sbm would fail because Slave_last_event_time would not match Master_last_event_time after syncing with the master (i.e. via sync_with_master_gtid.inc). This happens because the timing statistics are updated after updating gtid_slave_pos. That is, gtid_slave_pos is updated with the transaction commit, whereas the timing statistics are updated when the relay log position is updated (i.e. after the commit). This is by design. For the test, this means there is a small amount of time after sync_with_master_gtid.inc where the SHOW SLAVE STATUS output lags behind the data state of the slave. This would cause the test to fail if comparing Slave_last_event_time to Master_last_event_time during this period, because Slave_last_event_time would not yet reflect the latest committed transaction. The fix is to add a wait_condition before the comparison to wait for the SHOW SLAVE STATUS statistics to be updated. The actual check is to wait for Seconds_Behind_Master == 0, because the slave should be idle at this point, which forces the value to be 0. The test failure can be reproduced with the following patch: diff --git a/sql/rpl_gtid.cc b/sql/rpl_gtid.cc index db5c26b6237..b346a32b573 100644 --- a/sql/rpl_gtid.cc +++ b/sql/rpl_gtid.cc @@ -300,6 +300,7 @@ rpl_slave_state::update(uint32 domain_id, uint32 server_id, uint64 sub_id, mysql_mutex_lock(&LOCK_slave_state); res= update_nolock(domain_id, server_id, sub_id, seq_no, hton, rgi); mysql_mutex_unlock(&LOCK_slave_state); + sleep(1); return res; }	2025-01-26 09:17:23 -07:00
Marko Mäkelä	43465352b9	Merge 11.4 into 11.6	2024-10-03 16:09:56 +03:00
Brandon Nesterenko	68938d2b42	MDEV-33500 (part 2): rpl.rpl_parallel_sbm can still fail The failing test case validates Seconds_Behind_Master for a delayed slave, while STOP SLAVE is executed during a delay. The test fixes initially added to the test (commit `b04c857596`) added a table lock to ensure a transaction could not finish before validating the Seconds_Behind_Master field after SLAVE START, but did not address a possibility that the transaction could finish before running the STOP SLAVE command, which invalidates the validations for the rest of the test case. Specifically, this would result in 1) a timeout in “Waiting for table metadata lock” on the replica, which expects the transaction to retry after slave restart and hit a lock conflict on the locked tables (added in `b04c857596`), and 2) that Seconds_Behind_Master should have increased, but did not. The failure can be reproduced by synchronizing the slave to the master before the MDEV-32265 echo statement (i.e. before the SLAVE STOP). This patch fixes the test by adding a mechanism to use DEBUG_SYNC to synchronize a MASTER_DELAY, rather than continually increase the duration of the delay each time the test fails on buildbot. This is to ensure that on slow machines, a delay does not pass before the test gets a chance to validate results. Additionally, it decreases overall test time because the test can continue immediately after validation, thereby bypassing the remainder of a full delay for each transaction.	2024-09-17 06:29:20 -06:00
Monty	25b5c63905	MDEV-33856: Alternative Replication Lag Representation via Received/Executed Master Binlog Event Timestamps This commit adds 3 new status variables to 'show all slaves status': - Master_last_event_time ; timestamp of the last event read from the master by the IO thread. - Slave_last_event_time ; Master timestamp of the last event committed on the slave. - Master_Slave_time_diff: The difference of the above two timestamps. All the above variables are NULL until the slave has started and the slave has read one query event from the master that changes data. - Added information_schema.slave_status, which allows us to remove: - show_master_info(), show_master_info_get_fields(), send_show_master_info_data(), show_all_master_info() - class Sql_cmd_show_slave_status. - Protocol::store(I_List<i_string_pair>* str_list) as it is not used anymore. - Changed old SHOW SLAVE STATUS and SHOW ALL SLAVES STATUS to use the SELECT code path, as all other SHOW ... STATUS commands. Other things: - Xid_log_time is set to time of commit to allow slave that reads the binary log to calculate Master_last_event_time and Slave_last_event_time. This is needed as there is not 'exec_time' for row events. - Fixed that Load_log_event calculates exec_time identically to Query_event. - Updated RESET SLAVE to reset Master/Slave_last_event_time - Updated SQL thread's update on first transaction read-in to only update Slave_last_event_time on group events. - Fixed possible (unlikely) bugs in sql_show.cc ...old_format() functions if allocation of 'field' would fail. Reviewed By: Brandon Nesterenko <brandon.nesterenko@mariadb.com> Kristian Nielsen <knielsen@knielsen-hq.org>	2024-07-25 08:57:27 -06:00
Sergei Golubchik	7a789e2027	sporadic failures of rpl.rpl_parallel_sbm the test waits for the event to get stuck on MASTER_DELAY, but on a slow/overloaded slave the event might pass MASTER_DELAY before the test starts waiting. Wait for the event to get stuck on the LOCK TABLES (after MASTER_DELAY), the event cannot avoid that,	2024-05-05 21:37:07 +02:00
Sergei Golubchik	cea083af9f	cleanup: use THD_STAGE_INFO, not thd_proc_info and put master-slave.inc last in the series of includes	2024-05-05 21:37:07 +02:00
Sergei Golubchik	9e92582024	sporadic failures of rpl.rpl_parallel_sbm the test waits for the event to get stuck on MASTER_DELAY, but on a slow/overloaded slave the event might pass MASTER_DELAY before the test starts waiting. Wait for the event to get stuck on the LOCK TABLES (after MASTER_DELAY), the event cannot avoid that,	2024-04-25 12:47:23 +02:00
Sergei Golubchik	9cf718859f	cleanup: use THD_STAGE_INFO, not thd_proc_info and put master-slave.inc last in the series of includes	2024-04-24 22:08:52 +02:00
Brandon Nesterenko	b04c857596	MDEV-33500: rpl.rpl_parallel_sbm can fail on slow machines, e.g. MSAN/Valgrind builders In an addition to test rpl.rpl_parallel_sbm added by MDEV-32265, the test uses sleep statements alone to test Seconds_Behind_Master with delayed replication. On slow running machines, the test can pass the intended MASTER_DELAY duration and Seconds_Behind_Master can become 0, when the test expects the transaction to still be actively in a delaying state. This can be consistently reproduced by adding a sleep statement before the call to --let = query_get_value(SHOW SLAVE STATUS, Seconds_Behind_Master, 1) to sleep past the delay end point. This patch fixes this by locking the table which the delayed transaction targets so Second_Behind_Master cannot be updated before the test reads it for validation.	2024-02-20 08:19:18 -07:00
Brandon Nesterenko	c5f776e9fa	MDEV-32265: seconds_behind_master is inaccurate for Delayed replication If a replica is actively delaying a transaction when restarted (STOP SLAVE/START SLAVE), when the sql thread is back up, Seconds_Behind_Master will present as 0 until the configured MASTER_DELAY has passed. That is, before the restart, last_master_timestamp is updated to the timestamp of the delayed event. Then after the restart, the negation of sql_thread_caught_up is skipped because the timestamp of the event has already been used for the last_master_timestamp, and their update is grouped together in the same conditional block. This patch fixes this by separating the negation of sql_thread_caught_up out of the timestamp-dependent block, so it is called any time an idle parallel slave queues an event to a worker. Note that sql_thread_caught_up is still left in the check for internal events, as SBM should remain idle in such case to not "magically" begin incrementing. Reviewed By: ============ Andrei Elkin <andrei.elkin@mariadb.com>	2023-10-23 14:25:03 -06:00
Brandon Nesterenko	063f4ac25e	MDEV-30619: Parallel Slave SQL Thread Can Update Seconds_Behind_Master with Active Workers MDEV-31749 sporadic assert in MDEV-30619 new test If the workers of a parallel replica are busy (potentially with long queues), but the SQL thread has no events left to distribute (so it goes idle), then the next event that comes from the primary will update mi->last_master_timestamp with its timestamp, even if the workers have not yet finished. This patch changes the parallel replica logic which updates last_master_timestamp after idling from using solely sql_thread_caught_up (added in MDEV-29639) to using the latter with rli queued/dequeued event counters. That is, if the queued count is equal to the dequeued count, it means all events have been processed and the replica is considered idle when the driver thread has also distributed all events. Low level details of the commit include - to make a more generalized test for Seconds_Behind_Master on the parallel replica, rpl_delayed_parallel_slave_sbm.test is renamed to rpl_parallel_sbm.test for this purpose. - pause_sql_thread_on_next_event usage was removed with the MDEV-30619 fixes. Rather than remove it, we adapt it to the needs of this test case - added test case to cover SBM spike of relay log read and LMT update that was fixed by MDEV-29639 - rpl_seconds_behind_master_spike.test is made to use the negate_clock_diff_with_master debug eval. Reviewed By: ============ Andrei Elkin <andrei.elkin@mariadb.com>	2023-07-25 16:36:14 +03:00

11 Commits