1
0
mirror of https://github.com/MariaDB/server.git synced 2025-12-07 17:42:39 +03:00
Files
mariadb/mysql-test/suite/galera/t/galera_as_slave_gtid.inc
seppo 4111a53079 MDEV-21096 async slave crash with gtid_log_pos table access (#1413)
The original crash happened when async replication IO thread was updating mysql.gtid_slave_pos table. Operations on this table should remain node local, but it appears that protection (THD::wsrep_ignore_table flag) to prevent wsrep replication for this table mas missing for innodb write_row() and update_row().
It was somewhat difficult to reproduce the issue, because mtr seems to create the affected table mysql.gtid_log_pos as of Aria engine type, and Aria engine operations will not be replicated anyhow. It looks, though, that in release installation, mysql.gtid_slave_pos table is of InnoDB engine.
It was possible to trigger somewhat related problem by running test galera.galera_as_slave_gtid with configuration: gtid_pos_auto_engines=InnoDB. However, this test mode, causes earlier crash when replication background thread creates aditional table: mysql.gtid_slave_pos_InnoDB, and this table create triggered wsrep TOI replication, which also failed for assertion. Actually, async replication IO and background threads should not replicate anything to cluster.

This pull request contains new test galera.galera_as_slave_gtid_auto_engine, which basically just runs galera.galera_as_slave_gtid with configuration of gtid_pos_auto_engines=InnoDB.
Test galera.galera_as_slave_gtid is also modified for better code reuse.
Actual fix for MDEV-21096 is in storage/innobase/handler/ha_innodb.cc, where THD::wsrep_ignore_table flag is now honored before wsrep key population.
There is additional fix in sql/service_wsrep.cc where async replication IO and background threads are marked as non-local. This fences these threads out of wsrep replication altogether. Note that this change, actually makes the use of THD::wsrep_ignore-table redundant. We may want to refactor THD::wsrep_ignore_table out in the future, if there is no other use case for it in sight.
2019-11-25 11:19:33 +02:00

87 lines
2.5 KiB
PHP

#
# Test Galera as a slave to a MariaDB master using GTIDs
#
# suite/galera/galera_2nodes_as_slave.cnf describes the setup of the nodes
# suite/galera/t/galera_as_slave_gtid.cnf has the GTID options
#
# In addition to performing DDL and DML, we check that the gtid of the master is preserved inside the cluster
#
--source include/have_innodb.inc
--source include/galera_cluster.inc
# As node #3 is not a Galera node, and galera_cluster.inc does not open connetion to it
# we open the node_3 connection here
--connect node_3, 127.0.0.1, root, , test, $NODE_MYPORT_3
--connection node_2
--disable_query_log
--eval CHANGE MASTER TO MASTER_HOST='127.0.0.1', MASTER_USER='root', MASTER_PORT=$NODE_MYPORT_3;
--enable_query_log
START SLAVE;
--connection node_3
CREATE TABLE t1 (f1 INTEGER PRIMARY KEY) ENGINE=InnoDB;
INSERT INTO t1 VALUES(1);
SELECT LENGTH(@@global.gtid_binlog_state) > 1;
--let $gtid_binlog_state_node1 = `SELECT @@global.gtid_binlog_state;`
--connection node_2
--let $wait_condition = SELECT COUNT(*) = 1 FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 't1';
--source include/wait_condition.inc
--let $wait_condition = SELECT COUNT(*) = 1 FROM t1;
--source include/wait_condition.inc
--disable_query_log
--eval SELECT '$gtid_binlog_state_node1' = @@global.gtid_binlog_state AS gtid_binlog_state_equal;
#--eval SELECT GTID_SUBSET('$gtid_executed_node1', @@global.gtid_executed) AS gtid_executed_equal;
--enable_query_log
--connection node_1
SELECT COUNT(*) = 1 FROM t1;
--disable_query_log
--eval SELECT '$gtid_binlog_state_node1' = @@global.gtid_binlog_state AS gtid_binlog_state_equal;
#--eval SELECT GTID_SUBSET('$gtid_executed_node1', @@global.gtid_executed) AS gtid_executed_equal;
--enable_query_log
--connection node_3
DROP TABLE t1;
#
# Unfortunately without the sleep below the following statement fails with "query returned no rows", which
# is difficult to understand given that it is an aggregate query. A "query execution was interrupted"
# warning is also reported by MTR, which is also weird.
#
--sleep 1
--connection node_1
--let $wait_condition = SELECT COUNT(*) = 0 FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 't1';
--source include/wait_condition.inc
--connection node_2
--let $wait_condition = SELECT COUNT(*) = 0 FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 't1';
--source include/wait_condition.inc
STOP SLAVE;
RESET SLAVE ALL;
--echo #cleanup
--connection node_1
set global wsrep_on=OFF;
reset master;
set global wsrep_on=ON;
--connection node_2
set global wsrep_on=OFF;
reset master;
set global wsrep_on=ON;
--connection node_3
reset master;