MDEV-6462: Slave replicating using GTID doesn't recover correctly when master crashes in the middle of transaction

If the slave gets a reconnect in the middle of a GTID event group, normally it will re-fetch that event group, skipping the first part that was already queued for the SQL thread.

However, if the master crashed while writing the event group, the group is incomplete. This patch detects this case and makes sure that the transaction is rolled back and nothing is skipped from any following event groups.

Similarly, a network proxy might cause the reconnect to end up on a different master server. Detect this by noticing a different server_id, and similarly in this case roll back the partially received group.
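
The decision described above can be pictured with a small standalone sketch. This is not the code from the patch: `ConnState`, `ReconnectAction` and `decide_after_reconnect` are hypothetical names invented for illustration, and the flag `same_group_resent` assumes the slave can tell whether the master is resending the group it had partially received. Only `master_id` and `prev_master_id` correspond to fields in `Master_info`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of the I/O thread state (not the real Master_info).
struct ConnState {
  uint32_t prev_master_id = 0;         // server_id seen on the previous connection
  bool in_event_group = false;         // a GTID event group was only partially queued
  uint64_t events_queued_in_group = 0; // events of that group already queued
};

enum class ReconnectAction {
  NothingPending,       // no partial group, continue normally
  SkipAlreadyQueued,    // same master resends the group: skip the part already queued
  RollbackPartialGroup  // discard the partial group and skip nothing afterwards
};

// Assumption: called once the new connection has identified the master's
// server_id and the slave knows whether the interrupted group is being resent.
ReconnectAction decide_after_reconnect(const ConnState &st,
                                       uint32_t new_master_id,
                                       bool same_group_resent) {
  // Invariant of this sketch: a partial group has at least one queued event.
  assert(!st.in_event_group || st.events_queued_in_group > 0);

  if (!st.in_event_group)
    return ReconnectAction::NothingPending;

  // A different server_id means the reconnect reached another master.
  if (new_master_id != st.prev_master_id)
    return ReconnectAction::RollbackPartialGroup;

  // The master crashed mid-group: the group is incomplete and never resent.
  if (!same_group_resent)
    return ReconnectAction::RollbackPartialGroup;

  return ReconnectAction::SkipAlreadyQueued;
}
```

In this sketch, rolling back is the conservative choice whenever the slave cannot be sure the remainder of the group will arrive from the same master.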
2025-08-01 03:47:19 +03:00 · 2014-09-02 14:07:01 +02:00
parent fbaaf3688d
commit 36f50be970
6 changed files with 373 additions and 0 deletions
--- a/sql/rpl_mi.h
+++ b/sql/rpl_mi.h
@@ -135,6 +135,12 @@ class Master_info : public Slave_reporting_capability
  ulonglong received_heartbeats;  // counter of received heartbeat events
  DYNAMIC_ARRAY ignore_server_ids;
  ulong master_id;
+  /*
+    At reconnect and until the first rotate event is seen, prev_master_id is
+    the value of master_id during the previous connection, used to detect
+    silent change of master server during reconnects.
+  */
+  ulong prev_master_id;
  /*
    Which kind of GTID position (if any) is used when connecting to master.