Fix rare failures in test case rpl.rpl_gtid_basic:
- Add another possible error code when a connection is killed.
- Make sure that the IO thread has had time to complete its stop after START
SLAVE UNTIL. Otherwise, START SLAVE might run before the IO thread has
stopped, leaving the test case with a stopped IO thread that eventually
causes a wait timeout.
rpl.rpl_gtid_crash fails on PPC64
This is an addition to the original patch.
Restored the show binlog events output and adjusted the filters to
replace [\d-\d-\d,\d-\d-\d,\d-\d-\d] with [#-#-#].
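The effect of such a filter can be illustrated outside mysqltest. The C++ sketch below applies an equivalent std::regex substitution to one line of output. The helper name mask_gtid_lists is hypothetical, and its pattern is an assumption that generalises the filter above to any number of domain-server-seq_no triples with multi-digit numbers; it is not the test's actual replace line.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: mask a bracketed GTID list such as "[0-1-1,1-1-2]"
// (comma-separated domain_id-server_id-seq_no triples) as "[#-#-#]", so
// that result files do not depend on the exact GTID values.
std::string mask_gtid_lists(const std::string& line) {
  static const std::regex gtid_list(R"(\[\d+-\d+-\d+(,\d+-\d+-\d+)*\])");
  return std::regex_replace(line, gtid_list, "[#-#-#]");
}
```

Lines without a bracketed GTID list pass through unchanged.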
sql_slave_skip_counter is important for recovering replication from
certain errors. Often, an appropriate solution is to set
sql_slave_skip_counter to skip over a problem event. But setting
sql_slave_skip_counter produced an error in GTID mode, with a suggestion to
instead set @@gtid_slave_pos to point past the problem event. This however is
not always possible; for example, in case of an INCIDENT event, that event
does not have any GTID to assign to @@gtid_slave_pos.
With this patch, sql_slave_skip_counter now works in GTID mode the same way as
in non-GTID mode. When set, that many initial events are skipped when the SQL
thread starts, plus as many extra events as are needed to completely skip any
partially skipped event group. The GTID position is updated to point past the
skipped event(s).
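The skip rule above (skip N events, then keep skipping until the current event group is complete) can be sketched in C++. This is a simplified model, not the server's code: it assumes each event carries a hypothetical group_id shared by all events of one group, and that the events of a group are contiguous.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified, hypothetical model of a relay-log event: all events that
// belong to the same event group (e.g. one transaction) share a group_id.
struct Event {
  int group_id;
};

// Returns how many leading events are skipped for a skip counter of n:
// the first n events, extended to the end of the last partially skipped
// group so that the SQL thread never resumes in the middle of a group.
std::size_t events_to_skip(const std::vector<Event>& events, std::size_t n) {
  std::size_t skipped = n < events.size() ? n : events.size();
  if (skipped == 0) return 0;
  const int last_group = events[skipped - 1].group_id;
  while (skipped < events.size() && events[skipped].group_id == last_group)
    ++skipped;
  return skipped;
}
```

For example, with groups laid out as {1, 1, 1, 2, 3, 3}, a counter of 1 skips the whole first group (3 events), while a counter of 4 skips exactly the first two groups.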
As a side-effect of purge_relay_logs(), sql_slave_skip_counter
was silently ignored in GTID mode.
But sql_slave_skip_counter is in fact not a good match for GTID.
And it is not really needed either, as users can explicitly set
@@gtid_slave_pos to skip specific GTIDs, in a way that fits
how GTID replication works.
So with this patch, we give an error on attempts to set
sql_slave_skip_counter when using GTID, with a suggestion to use
gtid_slave_pos instead, if needed.
Rewrite the gtid_waiting::wait_for_gtid() function.
The code was rubbish (and buggy). Now the logic is
much clearer.
Also fix a missing slave sync that could cause test failure.
Some GTID test cases were using include/wait_condition.inc with a
condition like SELECT COUNT(*)=4 FROM t1 to wait for the slave to
catch up with the master. This causes races and test failures, as the
changes to the tables become visible at the COMMIT of the SQL thread
(or even before in case of MyISAM), but the changes to
@@gtid_slave_pos only become visible a little bit after the COMMIT.
Now that we have MASTER_GTID_WAIT(), just use that to sync up in a
GTID-friendly way, wrapped in nice include/save_master_gtid.inc and
include/sync_with_master_gtid.inc scripts.
MASTER_GTID_WAIT() is similar to MASTER_POS_WAIT(), but works with a
GTID position rather than an old-style filename/offset.
@@LAST_GTID gives the GTID assigned to the last transaction written
into the binlog.
Together, the two can be used by applications to obtain the GTID of
an update on the master, and then do a MASTER_GTID_WAIT() for that
position on any read slave where it is important to get results that
are caught up with the master at least to the point of the update.
MASTER_GTID_WAIT() is implemented in a way that tries to minimise
the performance impact on the SQL threads, even in the presence of
many waiters on single GTID positions (as obtained from @@LAST_GTID).
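The condition that such a wait has to test can be sketched as follows: a target position counts as reached once, for every replication domain it names, the slave has a GTID with at least that sequence number. The helpers parse_gtid_pos and gtid_pos_reached are hypothetical; the sketch compares seq_no per domain only, assumes well-formed input, and does not model the server's waiter bookkeeping.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: parse a position such as "0-1-100,1-2-5"
// (domain_id-server_id-seq_no, comma-separated) into domain_id -> seq_no.
// Malformed entries are silently ignored.
std::map<std::uint32_t, std::uint64_t> parse_gtid_pos(const std::string& pos) {
  std::map<std::uint32_t, std::uint64_t> result;
  std::istringstream in(pos);
  std::string gtid;
  while (std::getline(in, gtid, ',')) {
    std::uint32_t domain = 0, server = 0;
    std::uint64_t seq_no = 0;
    char dash1 = 0, dash2 = 0;
    std::istringstream g(gtid);
    if (g >> domain >> dash1 >> server >> dash2 >> seq_no &&
        dash1 == '-' && dash2 == '-')
      result[domain] = seq_no;
  }
  return result;
}

// True once `current` has reached `target`: every domain in the target is
// present in the current position with a seq_no at least as large.
bool gtid_pos_reached(const std::string& current, const std::string& target) {
  const auto cur = parse_gtid_pos(current);
  for (const auto& [domain, seq_no] : parse_gtid_pos(target)) {
    auto it = cur.find(domain);
    if (it == cur.end() || it->second < seq_no) return false;
  }
  return true;
}
```

A waiter would re-evaluate a check like this each time the slave position advances, for instance inside a condition-variable loop.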
The bug was that if mysql.gtid_slave_pos was missing, operations on the
variables gtid_slave_pos, gtid_binlog_pos, and gtid_current_pos would fail, and
continue to fail even after the table was fixed, until server restart.
Now, setting the variables retries loading the table, succeeding if it has
been restored. And querying the variables when the table is missing behaves
as if the table existed and was empty.
Also, attempt to fix a race in the rpl.rpl_gtid_basic test case.
Implement @@gtid_binlog_state. This is the internal state of the binlog
(the most recent GTID logged for every domain_id and server_id). This makes
it possible to save the state before RESET MASTER and restore it afterwards.
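The difference between a per-domain position and this per-(domain_id, server_id) state can be sketched with a small C++ model. The BinlogState class is hypothetical and only illustrates the bookkeeping and the save/clear/restore cycle; it is not the server's data structure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>

struct Gtid {
  std::uint32_t domain_id;
  std::uint32_t server_id;
  std::uint64_t seq_no;
};

// Hypothetical model of a binlog state: the most recent seq_no logged for
// each (domain_id, server_id) pair, not just one GTID per domain.
class BinlogState {
 public:
  void record(const Gtid& g) { latest_[{g.domain_id, g.server_id}] = g.seq_no; }

  // Most recent seq_no logged by server_id in domain_id, or 0 if none.
  std::uint64_t seq_no(std::uint32_t domain_id, std::uint32_t server_id) const {
    auto it = latest_.find({domain_id, server_id});
    return it == latest_.end() ? 0 : it->second;
  }

  void clear() { latest_.clear(); }  // loosely analogous to RESET MASTER
  std::size_t size() const { return latest_.size(); }

 private:
  std::map<std::pair<std::uint32_t, std::uint32_t>, std::uint64_t> latest_;
};
```

Saving the state amounts to copying it before the reset and assigning the copy back afterwards.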
Change the user interface to be more logical and, in line with user
expectations, to work more like old-style replication.
The user can now explicitly choose in CHANGE MASTER whether the binlog
position is taken into account (master_use_gtid=current_pos) or not
(master_use_gtid=slave_pos) when the slave connects to the master.
@@gtid_pos is replaced by three separate variables @@gtid_slave_pos (can
be set by user, replicated GTIDs only), @@gtid_binlog_pos (read only), and
@@gtid_current_pos (a combination of the two, most recent GTID within each
domain). mysql.rpl_slave_state is renamed to mysql.gtid_slave_pos to match.
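The combination described for @@gtid_current_pos can be sketched by taking the description literally: for each domain, keep whichever of the slave and binlog GTIDs is more recent, judged here by seq_no. The function combine_current_pos and its types are hypothetical, and the server may apply additional rules that this simplified sketch does not model.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

struct Gtid {
  std::uint32_t server_id;
  std::uint64_t seq_no;
};

// A GTID position modelled as domain_id -> GTID (one GTID per domain).
using GtidPos = std::map<std::uint32_t, Gtid>;

// Hypothetical sketch: per domain, keep the GTID with the higher seq_no.
// Domains present in only one of the two positions are copied unchanged.
GtidPos combine_current_pos(const GtidPos& slave_pos, const GtidPos& binlog_pos) {
  GtidPos current = slave_pos;
  for (const auto& [domain, gtid] : binlog_pos) {
    auto it = current.find(domain);
    if (it == current.end() || it->second.seq_no < gtid.seq_no)
      current[domain] = gtid;
  }
  return current;
}
```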
This fixes MDEV-4474.
A check for THD::killed was missing after THD::enter_cond(). This could
cause the binlog dump thread to miss the kill signal during server shutdown
and hang until it was force-closed.
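The underlying problem is a lost wakeup: if the kill flag is set and signalled before the waiting thread starts its wait, and the waiter does not re-check the flag once it holds the mutex, nothing wakes it up again. The sketch below shows the safe pattern with std::condition_variable rather than the server's THD primitives; the Connection type and its members are hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for a connection that can be killed while it waits,
// loosely modelled on the enter_cond()/killed pattern described above.
struct Connection {
  std::mutex mutex;
  std::condition_variable cond;
  bool killed = false;    // protected by mutex
  bool new_data = false;  // protected by mutex

  void kill() {
    {
      std::lock_guard<std::mutex> lock(mutex);
      killed = true;
    }
    cond.notify_all();
  }

  // Waits for new data; returns false if the connection was killed.
  // The killed flag is checked under the mutex before every wait, so a
  // kill that arrived before the wait started is not missed.
  bool wait_for_data() {
    std::unique_lock<std::mutex> lock(mutex);
    while (!new_data) {
      if (killed) return false;  // the check that must not be omitted
      cond.wait(lock);
    }
    new_data = false;
    return true;
  }
};
```

Without the killed check inside the loop, calling wait_for_data() after kill() would block forever.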
Also fix a race in a test case that occasionally fails in Buildbot.
Replace CHANGE MASTER TO ... master_gtid_pos='xxx' with a new system
variable @@global.gtid_pos.
This is more logical; @@gtid_pos is global, not per-master, and it is not
affected by RESET SLAVE.
Also rename master_gtid_pos=AUTO to master_use_gtid=1, which again is more
logical.
- Add a first basic mysql-test-run test case that tests switching to a new
master using MASTER_GTID_POS=AUTO.
- When we connect with GTID, do not use any old relay logs, as they may
contain the wrong events or be corrupt after crash.
- Fix an old bug that made replication fail if we received a heartbeat event
immediately after an event was omitted from the stream from the master.
- Fix rpl_end to clear Gtid_Pos_Auto, to keep check_testcase happy.