database/mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-09-11 05:52:26 +03:00

Commit Graph

Author	SHA1	Message	Date

Sujatha

475f69b985

MDEV-25958: rpl_semi_sync_fail_over.test fails in buildbot

Analysis:
========
In the multi-binlog truncation scenario, the debug sync points are in the
following order.

Two inserts are done on master as shown below.

INSERT INTO t1 VALUES (4, REPEAT("x", 4100))
commit_after_release_LOCK_after_binlog_sync

INSERT INTO t1 VALUES (5, REPEAT("x", 4100))
commit_after_release_LOCK_log

The debug sync point of the first insert ensures that the transaction is
synced to the binlog but not committed, and that it has reached the slave
through semi-sync.

The debug sync point of the second insert ensures that the transaction is
synced to the binlog but not committed. It does not ensure that 'INSERT 5'
has reached the slave.

Most of the time INSERT 5 reaches the slave, so when the slave is promoted
to master it sends 4,5 to the new slave. Occasionally, however, 5 does not
reach the slave; in those cases the post-recovery master has only 4. When
row 6 is inserted, the master has 4,6 and the slave has 4,5,6.

This results in test failure.

Fix:
===
For the first insert use the 'commit_before_get_LOCK_commit_ordered' debug
sync point. It ensures that the binlog was sent to the slave and that the
slave has acknowledged the receipt. Now enable debug code such that when the
next transaction is written to the binary log, the dump thread reads it,
sends it across the network, and notifies that the server can be killed.
Insert row 5 and wait for the notification from the dump thread. Kill the
server. This ensures that both 4 and 5 have reached the semi-sync slave.

Added a new test case:
Insert two rows on the master such that the first is present in the master's
binlog and has reached the semi-sync slave. The second insert should be
flushed to the binlog but not sent to the slave. Now crash and fail over to
the slave. The promoted master will send the extra transaction to the slave.

2021-08-19 11:59:39 +05:30

Sujatha

6c39eaeb12

MDEV-21117: refine the server binlog-based recovery for semisync

Problem:
=======
When a semisync master crashes and is restarted as a slave, it could
recover transactions that the former slaves may never have seen.
The known method of clearing out all prepared transactions
with --tc-heuristic-recover=rollback does not adjust the
binlog accordingly.

Fix:
===
Binlog-based recovery now takes into account the semisync slave role of
the server restarted after a crash.
No change in behavior is made to the "normal" binlogging server
or the semisync master.

When the restarted server is configured with
  --rpl-semi-sync-slave-enabled=1
the refined recovery attempts to roll back prepared transactions
and truncate the binlog accordingly.
A partially committed transaction (that is, one committed in at least
one of the engine participants) gets committed.
It is guaranteed that no committed (or partially committed) transactions
exist beyond the truncate position.
If a non-transactional replication event (which is, in a way, a committed
transaction) exists past the computed truncate position, the recovery
ends with an error.
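
The recovery rules above can be illustrated with a small model. This is
only a sketch of the decision logic as described in this message, not the
server code: the names `TrxState`, `BinlogEntry`, `RecoveryError` and
`plan_semisync_slave_recovery` are hypothetical, and treating a committed
transaction found past the truncate position as an error is an assumption
made to express the stated guarantee.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Optional, Tuple


class TrxState(Enum):
    """Hypothetical per-entry state seen during binlog-based recovery."""
    COMMITTED = "committed in every engine"
    PARTIALLY_COMMITTED = "committed in at least one engine"
    PREPARED = "prepared only"
    NON_TRANSACTIONAL = "non-transactional event"


@dataclass
class BinlogEntry:
    offset: int        # start position of the entry in the binlog
    state: TrxState


class RecoveryError(Exception):
    """Raised when the binlog cannot be truncated safely."""


def plan_semisync_slave_recovery(
    entries: Iterable[BinlogEntry],
) -> Tuple[Optional[int], List[int], List[int]]:
    """Return (truncate_pos, offsets_to_commit, offsets_to_roll_back).

    Assumption: the truncate position is the offset of the first
    prepared-only transaction; everything from there on is rolled back.
    """
    truncate_pos: Optional[int] = None
    to_commit: List[int] = []
    to_roll_back: List[int] = []

    for entry in sorted(entries, key=lambda e: e.offset):
        if truncate_pos is None:
            if entry.state is TrxState.PREPARED:
                # First prepared-only transaction: truncate from here.
                truncate_pos = entry.offset
                to_roll_back.append(entry.offset)
            elif entry.state is TrxState.PARTIALLY_COMMITTED:
                # Committed in some engine: finish the commit everywhere.
                to_commit.append(entry.offset)
            # Fully committed and non-transactional entries before the
            # truncate position need no action.
        elif entry.state is TrxState.PREPARED:
            to_roll_back.append(entry.offset)
        else:
            # A non-transactional event (or, by assumption, any committed
            # transaction) past the truncate position cannot be undone.
            raise RecoveryError(
                f"{entry.state.value} at offset {entry.offset} lies past "
                f"truncate position {truncate_pos}"
            )

    return truncate_pos, to_commit, to_roll_back
```

In this model, a binlog ending in two prepared-only transactions is
truncated at the first of them, while a non-transactional event after a
prepared transaction makes the recovery fail.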

After a master crash and failover to a slave, the demoted-to-slave
ex-master must be ready to face and accept events it generated itself,
without the otherwise necessary --replicate-same-server-id.
The acceptance conditions are therefore relaxed so that the semisync slave
accepts its own events without that option.
While gtid_strict_mode ON ensures that no duplicate transaction can be
(re-)executed, a master_use_gtid=none slave still has to be
configured with --replicate-same-server-id.
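
One way to read the relaxed acceptance condition is sketched below. The
function `accepts_event` and its parameters are hypothetical and merely
mirror the options named above; the exact check in the server may differ.
The sketch assumes that the relaxation applies only when the slave
connects using GTID, since a master_use_gtid=none slave is said to still
need --replicate-same-server-id.

```python
def accepts_event(
    event_server_id: int,
    own_server_id: int,
    *,
    replicate_same_server_id: bool,
    semi_sync_slave_enabled: bool,
    using_gtid: bool,
) -> bool:
    """Decide whether a slave applies an event (hypothetical model)."""
    if event_server_id != own_server_id:
        # Events that originated on another server are always accepted.
        return True
    if replicate_same_server_id:
        # Explicit opt-in to applying the server's own events.
        return True
    # Relaxed rule (assumption): a semisync slave connecting with GTID
    # accepts its own events without --replicate-same-server-id.
    return semi_sync_slave_enabled and using_gtid
```

Under this reading, a demoted ex-master with server id 1 that runs with
--rpl-semi-sync-slave-enabled=1 and GTID accepts an event stamped with
server id 1, while the same server with master_use_gtid=none rejects it
unless --replicate-same-server-id is set.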

*NOTE* for reviewers.

This patch does not handle user XA, which is done
in the next git commit.

2021-06-11 19:49:39 +03:00

2 Commits