Some GTID test cases were using include/wait_condition.inc with a
condition like SELECT COUNT(*)=4 FROM t1 to wait for the slave to
catch up with the master. This causes races and test failures, as the
changes to the tables become visible at the COMMIT of the SQL thread
(or even before in case of MyISAM), but the changes to
@@gtid_slave_pos only become visible a little bit after the COMMIT.
Now that we have MASTER_GTID_WAIT(), just use that to sync up in a
GTID-friendly way, wrapped in nice include/save_master_gtid.inc and
include/sync_with_master_gtid.inc scripts.
MDEV-4725: Incorrect binlog state recovery if crash while writing event group
The binlog state was not recovered correctly if XA is not used (eg. InnoDB
disabled), or if server crashed in the middle of writing an event group to the
binlog.
With this patch, we ensure that recovery of binlog state is done even if we do
not do the full XA binlog recovery, and we ensure that we only recover fully
written event groups into the binlog state.
Change of user interface to be more logical and more in line with expectations
to work similar to old-style replication.
User can now explicitly choose in CHANGE MASTER whether binlog position is
taken into account (master_gtid_pos=current_pos) or not (master_gtid_pos=
slave_pos) when slave connects to master.
@@gtid_pos is replaced by three separate variables @@gtid_slave_pos (can
be set by user, replicated GTIDs only), @@gtid_binlog_pos (read only), and
@@gtid_current_pos (a combination of the two, most recent GTID within each
domain). mysql.rpl_slave_state is renamed to mysql.gtid_slave_pos to match.
This fixes MDEV-4474.
The slave dump thread running on the master only checked thd->killed whenever
it reached the end of a binlog file, not between events. This could
unnecessarily delay server shutdown.
This was found by code inspection while tracking down some occasional "forcing
close of thread..." errors in Buildbot. Hopefully this will fix the failures,
but the fix is correct in any case.
Also increase the wait during server shutdown, 2 seconds is a bit tight in
case of heavy I/O stall, and it seems better to delay shutdown a bit than
force-kill threads unnecessarily.
Also fix some races in test cases that restart the mysqld server. The .expect
file should be changed with --append_file, --remove_file + --write_file
creates a short window where mysqld can error out due to .expect file missing.
Replace CHANGE MASTER TO ... master_gtid_pos='xxx' with a new system
variable @@global.gtid_pos.
This is more logical; @@gtid_pos is global, not per-master, and it is not
affected by RESET SLAVE.
Also rename master_gtid_pos=AUTO to master_use_gtid=1, which again is more
logical.
More fixes for test failures in Buildbot:
- Do not run crashing test in Valgrind.
- FLUSH TABLES did not work to avoid errors about not closed tables when
crashing server. Suppress the messages instead.
- Rewrite multi-source test case to only start one pair of slave threads at a
time, to work-around the bug MDEV-4352.
Add tests crashing the slave in the middle of replication and checking that
replication picks-up again on restart in a crash-safe way.
Fix silly code that causes crash by inserting uninitialised data into a hash.
Test crashing the master, check that it recovers the binlog state.
Fix one bug introduced by previous commit (crash-recoved binlog state was
overwritten by loading stale binlog state file).
Fix Windows build error.
Adjust full test suite to work with GTID.
Huge patch, mainly due to having to update .result file for all SHOW BINLOG
EVENTS and mysqlbinlog outputs, where the new GTID events pop up.
Everything was painstakingly checked to be still correct and valid .result
file updates.
Fix MDEV-4275 - I/O thread restart duplicates events in the relay log.
The first time we connect to master after CHANGE MASTER or restart, we connect
from the GTID position. But then subsequent reconnects or IO thread restarts
reconnect with the old-style file/offset binlog pos from where it left off at
last disconnect. This is necessary to avoid duplicate events in the relay
logs, as there is nothing that synchronises the SQL thread update of GTID
state (multiple threads in case of multi-source) with IO thread reconnects.
Test cases.
Some small cleanups and fixes.