The test could fail with a duplicate key error because switching to non-GTID
mode could start at the wrong old-style position. The position could be
wrong when the previous GTID connect was stopped before receiving the fake
GTID list event which gives the old-style position corresponding to the GTID
connected position.
Work-around by injecting an extra event and syncing the slave before
switching to non-GTID mode.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Task:
=====
Update tests to reflect MDEV-20122, deprecation of master_use_gtid=current_pos.
Change Master (CM) statements were either removed or modified with
current_pos --> slave_pos based on original intention of the test.
Reviewed by:
============
Brandon Nesterenko <brandon.nesterenko@mariadb.com>
New Feature:
========
This feature adds a safe replacement to the
MASTER_USE_GTID=Current_Pos option for CHANGE MASTER TO as
MASTER_DEMOTE_TO_SLAVE=<bool>. The use case of Current_Pos is to
transition a master to become a slave; however, can break
replication state if the slave executes local transactions due to
actively updating gtid_current_pos with gtid_binlog_pos and
gtid_slave_pos.
MASTER_DEMOTE_TO_SLAVE changes this use case by forcing users to set
Using_Gtid=Slave_Pos and merging gtid_binlog_pos into gtid_slave_pos
once at CHANGE MASTER TO time. Note that if gtid_slave_pos is more
recent than gtid_binlog_pos (as in the case of chain replication),
the replication state should be preserved.
Additionally, deprecate the `Current_Pos` option of MASTER_USE_GTID
to suggest the safe alternative option MASTER_DEMOTE_TO_SLAVE=TRUE.
Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
This commit makes replicas crash-safe by default by changing the
Using_Gtid value to be Slave_Pos on a fresh slave start and after
RESET SLAVE is issued. If the primary server does not support GTIDs
(i.e., version < 10), the replica will fall back to Using_Gtid=No on
slave start and after RESET SLAVE.
The following additional informational messages/warnings are added:
1. When Using_Gtid is automatically changed. That is, if RESET
SLAVE reverts Using_Gtid back to Slave_Pos, or Using_Gtid is
inferred to No from a CHANGE MASTER TO given with log coordinates
without MASTER_USE_GTID.
2. If options are ignored in CHANGE MASTER TO. If CHANGE MASTER TO
is given with log coordinates, yet also specifies
MASTER_USE_GTID=Slave_Pos, a warning message is given that the log
coordinate options are ignored.
Additionally, an MTR macro has been added for RESET SLAVE,
reset_slave.inc, which provides modes/options for resetting a slave
in log coordinate or gtid modes. When in log coordinates mode, the
macro will execute CHANGE MASTER TO MASTER_USE_GTID=No after the
RESET SLAVE command. When in GTID mode, an extra parameter,
reset_slave_keep_gtid_state, can be set to reset or preserve the
value of gtid_slave_pos.
Reviewed By:
===========
Andrei Elkin <andrei.elkin@mariadb.com>
This patch changes how old rows in mysql.gtid_slave_pos* tables are deleted.
Instead of doing it as part of every replicated transaction in
record_gtid(), it is done periodically (every @@gtid_cleanup_batch_size
transaction) in the slave background thread.
This removes the deletion step from the replication process in SQL or worker
threads, which could speed up replication with many small transactions. It
also decreases contention on the global mutex LOCK_slave_state. And it
simplifies the logic, eg. when a replicated transaction fails after having
deleted old rows.
With this patch, the deletion of old GTID rows happens asynchroneously and
slightly non-deterministic. Thus the number of old rows in
mysql.gtid_slave_pos can temporarily exceed @@gtid_cleanup_batch_size. But
all old rows will be deleted eventually after sufficiently many new GTIDs
have been replicated.
Make all system tables in mysql directory of type
engine=Aria
Privilege tables are using transactional=1
Statistical tables are using transactional=0, to allow them
to be quickly updated with low overhead.
Help tables are also using transactional=0 as these are only
updated at init time.
Other changes:
- Aria store engine is now a required engine
- Update comment for Aria tables to reflect their new usage
- Fixed that _ma_reset_trn_for_table() removes unlocked table
from transaction table list. This was needed to allow one
to lock and unlock system tables separately from other
tables, for example when reading a procedure from mysql.proc
- Don't give a warning when using transactional=1 for engines
that is using transactions. This is both logical and also
to avoid warnings/errors when doing an alter of a privilege
table to InnoDB.
- Don't abort on warnings from ALTER TABLE for changes that
would be accepted by CREATE TABLE.
- New created Aria transactional tables are marked as not movable
(as they include create_rename_lsn).
- bootstrap.test was changed to kill orignal server, as one
can't anymore have two servers started at same time on same
data directory and data files.
- Disable maria.small_blocksize as one can't anymore change
aria block size after system tables are created.
- Speed up creation of help tables by using lock tables.
- wsrep_sst_resync now also copies Aria redo logs.
Intermediate commit.
Update some existing test cases to work with the new handling of
mysql.gtid_slave_pos* tables:
- The tables are now checked during START SLAVE, which causes some
errors or error injections to trigger differently.
- Some test cases that play games with renaming or altering the
mysql.gtid_slave_pos table need adjustments.
- MDEV-11621 rpl.rpl_gtid_stop_start fails sporadically in buildbot
- MDEV-11620 rpl.rpl_upgrade_master_info fails sporadically in buildbot
The issue above was probably that the build machine was overworked and the
shutdown took longer than 30 resp 10 seconds, which caused MyISAM tables
to be marked as crashed.
Fixed by flushing myisam tables before doing a forced shutdown/kill.
I also increased timeout for forced shutdown from 10 seconds to 60 seconds
to fix other possible issues on slow machines.
Fixed also some compiler warnings
Some GTID test cases were using include/wait_condition.inc with a
condition like SELECT COUNT(*)=4 FROM t1 to wait for the slave to
catch up with the master. This causes races and test failures, as the
changes to the tables become visible at the COMMIT of the SQL thread
(or even before in case of MyISAM), but the changes to
@@gtid_slave_pos only become visible a little bit after the COMMIT.
Now that we have MASTER_GTID_WAIT(), just use that to sync up in a
GTID-friendly way, wrapped in nice include/save_master_gtid.inc and
include/sync_with_master_gtid.inc scripts.
The bug was that if mysql.slave_gtid_pos was missing, operations on variables
gtid_slave_pos, gtid_binlog_pos, and gtid_current_pos would fail, and continue
to fail even after the table was fixed, until server restart.
Now setting the variables retry loading the table, succeeding if it has been
restored. And querying the variables when the table is not there acts as if
the table was there and was empty.
Also, attempt to fix a race in the rpl.rpl_gtid_basic test case.
When we load the slave state from the mysql.gtid_slave_pos at server start, we
need to load all the rows into the in-memory hash, not just the most recent
one in each replication domain. Otherwise we accumulate cruft in the form of
old rows each time the server restarts.
Now whenever we reach the GTID point requested from the slave (when using GTID
position to connect), we send a fake Gtid_list event. This event is used by
the slave to know the current old-style position for MASTER_POS_WAIT(), and
later the similar binlog position for MASTER_GTID_WAIT().
Without this fake event, if the slave is already fully up-to-date with the
master, there may be no events sent at the given position for an indeterminate
time.
If the mysql.gtid_slave_pos table is not available, we cannot load nor update
the current GTID position persistently. This can happen eg. after an upgrade,
before mysql_upgrade_db is run, or if the table is InnoDB and the server is
restarted without the InnoDB storage engine enabled.
Before, replication always failed to start if the table was unavailable. With
this patch, we try to continue with old-style replication, after suitable
complaints in the error log. In strict mode, or if slave is configured to use
GTID, slave still refuses to start.
There were several cases where the slave GTID position was not loaded
correctly before being used. This caused various failures such as
corrupting the position at slave start and empty values of
@@gtid_slave_pos and @@gtid_current_pos.
Fixed by adding more checks for loaded position, and by always loading
the position at server startup.
When @@GLOBAL.gtid_strict_mode=1, then certain operations result
in error that would otherwise result in out-of-order binlog files
between servers.
GTID sequence numbers are now allocated independently per domain;
this results in less/no holes in GTID sequences, increasing the
likelyhood that diverging binlogs will be caught by the slave when
GTID strict mode is enabled.
Change of user interface to be more logical and more in line with expectations
to work similar to old-style replication.
User can now explicitly choose in CHANGE MASTER whether binlog position is
taken into account (master_gtid_pos=current_pos) or not (master_gtid_pos=
slave_pos) when slave connects to master.
@@gtid_pos is replaced by three separate variables @@gtid_slave_pos (can
be set by user, replicated GTIDs only), @@gtid_binlog_pos (read only), and
@@gtid_current_pos (a combination of the two, most recent GTID within each
domain). mysql.rpl_slave_state is renamed to mysql.gtid_slave_pos to match.
This fixes MDEV-4474.
Replace CHANGE MASTER TO ... master_gtid_pos='xxx' with a new system
variable @@global.gtid_pos.
This is more logical; @@gtid_pos is global, not per-master, and it is not
affected by RESET SLAVE.
Also rename master_gtid_pos=AUTO to master_use_gtid=1, which again is more
logical.
Implement test case rpl_gtid_stop_start.test to test normal stop and restart
of master and slave mysqld servers.
Fix a couple bugs found with the test:
- When InnoDB is disabled (no XA), the binlog state was not read when master
mysqld starts.
- Remove old code that puts a bogus D-S-0 into the initial binlog state, it
is not correct in current design.
- Fix memory leak in gtid_find_binlog_file().