1
0
mirror of https://github.com/MariaDB/server.git synced 2025-08-08 11:22:35 +03:00

MDEV-4473: mysql_binlog_send() starts sending events from wrong GTID position in some master failover scenarios

Suppose binlog file X has in its Gtid_list_event: 0-1-3,0-2-5, and suppose the
slave requests to start replicating after 0-1-3.

In this case the bug was that master would start sending events from the start
of X. This is wrong, because 0-2-4 and 0-2-5 are contained in X-1, and are
needed by the slave. So these events were lost.

On the other hand, if the slave requested 0-2-5, then it _is_ correct to start
sending from the beginning of binlog file X, because 0-2-5 is the last GTID
logged in earlier binlogs. The difference is that 0-2-5 is the last of the
GTIDs in the Gtid_list_event. The problem was that the code did not check that
the matched GTID was the last one in the list.

Fixed by checking if the gtid requested by slave that matches a gtid in the
Gtid_list_event is the last event for that domain in the list. If not, go back
to a prior binlog to ensure all needed events are sent to slave.

mysql-test/include/show_events.inc:
  Backport --let $binlog_file=LAST, used by MDEV-4473 test case.
This commit is contained in:
unknown
2013-05-03 11:27:29 +02:00
parent eb75edfb2b
commit 5aa0d185ca
5 changed files with 195 additions and 8 deletions

View File

@@ -703,7 +703,12 @@ get_gtid_list_event(IO_CACHE *cache, Gtid_list_log_event **out_gtid_list)
to build an in-memory hash or stuff like that.
We need to check that slave did not request GTID D-S-N1, when the
Gtid_list_log_event for this binlog file has D-S-N2 with N2 > N1.
Gtid_list_log_event for this binlog file has D-S-N2 with N2 >= N1.
(Because this means that requested GTID is in an earlier binlog).
However, if the Gtid_list_log_event indicates that D-S-N1 is the very last
GTID for domain D in prior binlog files, then it is ok to start from the
very start of this binlog file. This special case is important, as it
allows to purge old logs even if some domain is unused for long.
In addition, we need to check that we do not have a GTID D-S-N3 in the
Gtid_list_log_event where D is not present in the requested slave state at
@@ -717,7 +722,8 @@ contains_all_slave_gtid(slave_connection_state *st, Gtid_list_log_event *glev)
for (i= 0; i < glev->count; ++i)
{
const rpl_gtid *gtid= st->find(glev->list[i].domain_id);
uint32 gl_domain_id= glev->list[i].domain_id;
const rpl_gtid *gtid= st->find(gl_domain_id);
if (!gtid)
{
/*
@@ -727,13 +733,28 @@ contains_all_slave_gtid(slave_connection_state *st, Gtid_list_log_event *glev)
return false;
}
if (gtid->server_id == glev->list[i].server_id &&
gtid->seq_no < glev->list[i].seq_no)
gtid->seq_no <= glev->list[i].seq_no)
{
/*
The slave needs to receive gtid, but it is contained in an earlier
binlog file. So we need to search back further.
The slave needs to start after gtid, but it is contained in an earlier
binlog file. So we need to search back further, unless it was the very
last gtid logged for the domain in earlier binlog files.
*/
return false;
if (gtid->seq_no < glev->list[i].seq_no)
return false;
/*
The slave requested D-S-N1, which happens to be the last GTID logged
in prior binlog files with same domain id D and server id S.
The Gtid_list is kept sorted on domain_id, with the last GTID in each
domain_id group being the last one logged. So if this is the last GTID
within the domain_id group, then it is ok to start from the very
beginning of this group, per the special case explained in comment at
the start of this function. If not, then we need to search back further.
*/
if (i+1 < glev->count && gl_domain_id == glev->list[i+1].domain_id)
return false;
}
}
@@ -997,7 +1018,15 @@ gtid_find_binlog_file(slave_connection_state *state, char *out_name)
const rpl_gtid *gtid= state->find(glev->list[i].domain_id);
if (!gtid)
{
/* contains_all_slave_gtid() would have returned false if so. */
/*
contains_all_slave_gtid() returns false if there is any domain in
Gtid_list_event which is not in the requested slave position.
We may delete a domain from the slave state inside this loop, but
we only do this when it is the very last GTID logged for that
domain in earlier binlogs, and then we can not encounter it in any
further GTIDs in the Gtid_list.
*/
DBUG_ASSERT(0);
continue;
}