The test could fail with a duplicate key error because switching to non-GTID
mode could start at the wrong old-style position. The position could be
wrong when the previous GTID connect was stopped before receiving the fake
GTID list event which gives the old-style position corresponding to the GTID
connected position.
Work-around by injecting an extra event and syncing the slave before
switching to non-GTID mode.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Task:
=====
Update tests to reflect MDEV-20122, deprecation of master_use_gtid=current_pos.
Change Master (CM) statements were either removed or modified with
current_pos --> slave_pos based on original intention of the test.
Reviewed by:
============
Brandon Nesterenko <brandon.nesterenko@mariadb.com>
Let us disable Valgrind on tests that would fail because a
server shutdown or a STOP SLAVE command would take longer,
causing the test harness to forcibly and silently kill the server
due to an exceeded timeout.
Shutdown of mtr tests may be too impatient, esp on CI environment where
10 seconds of `arg` of `shutdown_server arg` may not be enough for the clean
shutdown to complete.
This is fixed to remove explicit non-zero timeout argument to
`shutdown_server` from all mtr tests. mysqltest computes 60 seconds default
value for the timeout for the argless `shutdown_server` command.
This policy is additionally ensured with a compile time assert.
Make all system tables in mysql directory of type
engine=Aria
Privilege tables are using transactional=1
Statistical tables are using transactional=0, to allow them
to be quickly updated with low overhead.
Help tables are also using transactional=0 as these are only
updated at init time.
Other changes:
- Aria store engine is now a required engine
- Update comment for Aria tables to reflect their new usage
- Fixed that _ma_reset_trn_for_table() removes unlocked table
from transaction table list. This was needed to allow one
to lock and unlock system tables separately from other
tables, for example when reading a procedure from mysql.proc
- Don't give a warning when using transactional=1 for engines
that is using transactions. This is both logical and also
to avoid warnings/errors when doing an alter of a privilege
table to InnoDB.
- Don't abort on warnings from ALTER TABLE for changes that
would be accepted by CREATE TABLE.
- New created Aria transactional tables are marked as not movable
(as they include create_rename_lsn).
- bootstrap.test was changed to kill orignal server, as one
can't anymore have two servers started at same time on same
data directory and data files.
- Disable maria.small_blocksize as one can't anymore change
aria block size after system tables are created.
- Speed up creation of help tables by using lock tables.
- wsrep_sst_resync now also copies Aria redo logs.
Intermediate commit.
Update some existing test cases to work with the new handling of
mysql.gtid_slave_pos* tables:
- The tables are now checked during START SLAVE, which causes some
errors or error injections to trigger differently.
- Some test cases that play games with renaming or altering the
mysql.gtid_slave_pos table need adjustments.
- MDEV-11621 rpl.rpl_gtid_stop_start fails sporadically in buildbot
- MDEV-11620 rpl.rpl_upgrade_master_info fails sporadically in buildbot
The issue above was probably that the build machine was overworked and the
shutdown took longer than 30 resp 10 seconds, which caused MyISAM tables
to be marked as crashed.
Fixed by flushing myisam tables before doing a forced shutdown/kill.
I also increased timeout for forced shutdown from 10 seconds to 60 seconds
to fix other possible issues on slow machines.
Fixed also some compiler warnings
Some GTID test cases were using include/wait_condition.inc with a
condition like SELECT COUNT(*)=4 FROM t1 to wait for the slave to
catch up with the master. This causes races and test failures, as the
changes to the tables become visible at the COMMIT of the SQL thread
(or even before in case of MyISAM), but the changes to
@@gtid_slave_pos only become visible a little bit after the COMMIT.
Now that we have MASTER_GTID_WAIT(), just use that to sync up in a
GTID-friendly way, wrapped in nice include/save_master_gtid.inc and
include/sync_with_master_gtid.inc scripts.
The bug was that if mysql.slave_gtid_pos was missing, operations on variables
gtid_slave_pos, gtid_binlog_pos, and gtid_current_pos would fail, and continue
to fail even after the table was fixed, until server restart.
Now setting the variables retry loading the table, succeeding if it has been
restored. And querying the variables when the table is not there acts as if
the table was there and was empty.
Also, attempt to fix a race in the rpl.rpl_gtid_basic test case.
When we load the slave state from the mysql.gtid_slave_pos at server start, we
need to load all the rows into the in-memory hash, not just the most recent
one in each replication domain. Otherwise we accumulate cruft in the form of
old rows each time the server restarts.
Now whenever we reach the GTID point requested from the slave (when using GTID
position to connect), we send a fake Gtid_list event. This event is used by
the slave to know the current old-style position for MASTER_POS_WAIT(), and
later the similar binlog position for MASTER_GTID_WAIT().
Without this fake event, if the slave is already fully up-to-date with the
master, there may be no events sent at the given position for an indeterminate
time.
If the mysql.gtid_slave_pos table is not available, we cannot load nor update
the current GTID position persistently. This can happen eg. after an upgrade,
before mysql_upgrade_db is run, or if the table is InnoDB and the server is
restarted without the InnoDB storage engine enabled.
Before, replication always failed to start if the table was unavailable. With
this patch, we try to continue with old-style replication, after suitable
complaints in the error log. In strict mode, or if slave is configured to use
GTID, slave still refuses to start.
There were several cases where the slave GTID position was not loaded
correctly before being used. This caused various failures such as
corrupting the position at slave start and empty values of
@@gtid_slave_pos and @@gtid_current_pos.
Fixed by adding more checks for loaded position, and by always loading
the position at server startup.
Change of user interface to be more logical and more in line with expectations
to work similar to old-style replication.
User can now explicitly choose in CHANGE MASTER whether binlog position is
taken into account (master_gtid_pos=current_pos) or not (master_gtid_pos=
slave_pos) when slave connects to master.
@@gtid_pos is replaced by three separate variables @@gtid_slave_pos (can
be set by user, replicated GTIDs only), @@gtid_binlog_pos (read only), and
@@gtid_current_pos (a combination of the two, most recent GTID within each
domain). mysql.rpl_slave_state is renamed to mysql.gtid_slave_pos to match.
This fixes MDEV-4474.
The slave dump thread running on the master only checked thd->killed whenever
it reached the end of a binlog file, not between events. This could
unnecessarily delay server shutdown.
This was found by code inspection while tracking down some occasional "forcing
close of thread..." errors in Buildbot. Hopefully this will fix the failures,
but the fix is correct in any case.
Also increase the wait during server shutdown, 2 seconds is a bit tight in
case of heavy I/O stall, and it seems better to delay shutdown a bit than
force-kill threads unnecessarily.
Also fix some races in test cases that restart the mysqld server. The .expect
file should be changed with --append_file, --remove_file + --write_file
creates a short window where mysqld can error out due to .expect file missing.
Replace CHANGE MASTER TO ... master_gtid_pos='xxx' with a new system
variable @@global.gtid_pos.
This is more logical; @@gtid_pos is global, not per-master, and it is not
affected by RESET SLAVE.
Also rename master_gtid_pos=AUTO to master_use_gtid=1, which again is more
logical.
Implement test case rpl_gtid_stop_start.test to test normal stop and restart
of master and slave mysqld servers.
Fix a couple bugs found with the test:
- When InnoDB is disabled (no XA), the binlog state was not read when master
mysqld starts.
- Remove old code that puts a bogus D-S-0 into the initial binlog state, it
is not correct in current design.
- Fix memory leak in gtid_find_binlog_file().