mirror of https://github.com/MariaDB/server.git synced 2025-08-07 00:04:31 +03:00

MDEV-32551: "Read semi-sync reply magic number error" warnings on master

rpl_semi_sync_slave_enabled_consistent.test and the first part of
the commit message come from Brandon Nesterenko.

A test to show how to induce the "Read semi-sync reply magic number
error" message on a primary. In short, if semi-sync is turned on
during the handshake between a primary and replica, but a user later
turns off the rpl_semi_sync_slave_enabled variable while the
replica's IO thread is running, then when the IO thread exits the
replica can skip a necessary call to kill_connection() in
repl_semisync_slave.slave_stop(), because slave_stop() relies on the
global variable. The replica will then send a COM_QUIT packet to the
primary on an active semi-sync connection, causing the magic number
error.
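
A minimal sketch of the failure mode described above, under stated
assumptions: GlobalSettings and ReplicaConnection are hypothetical
stand-ins and not MariaDB types, and the two must_kill_on_stop_*
methods only model the decision taken when the IO thread stops. It
shows why re-reading a global flag at stop time can disagree with the
state that was negotiated at handshake.

```cpp
#include <cassert>

// Hypothetical stand-in for the global rpl_semi_sync_slave_enabled variable.
struct GlobalSettings {
  bool semi_sync_slave_enabled = false;
};

// Hypothetical per-connection state (assumption: not a MariaDB class).
class ReplicaConnection {
 public:
  // The handshake snapshots whether semi-sync is active on this connection.
  explicit ReplicaConnection(const GlobalSettings &settings)
      : settings_(settings),
        semi_sync_active_(settings.semi_sync_slave_enabled) {}

  // Old behaviour (modelled): consult the live global when stopping.
  bool must_kill_on_stop_buggy() const {
    return settings_.semi_sync_slave_enabled;
  }

  // New behaviour (modelled): consult the state fixed at handshake.
  bool must_kill_on_stop_fixed() const { return semi_sync_active_; }

 private:
  const GlobalSettings &settings_;
  const bool semi_sync_active_;
};
```

If the global is switched off after the handshake, the "buggy" method
returns false even though the connection is still semi-sync, which
corresponds to the skipped kill_connection() call.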

The test in this patch exits the IO thread by forcing an error. A
call to STOP SLAVE could also do this, but it would need more
synchronization: the STOP SLAVE command also tries to kill the VIO
of the replica, so the IO thread would have to race to send the
COM_QUIT before that happens (which would need more debug_sync to
get around). See THD::awake_no_mutex for details as to the killing
of the replica's vio.

Notes:
- The MariaDB documentation does not make it clear that when one
  enables semi-sync replication it does not matter if one enables
  it first in the master or slave. Any order works.

Changes done:
- The rpl_semi_sync_slave_enabled variable is now a default value for
  when semisync is started. The variable no longer affects semisync
  if it is already running. This fixes the originally reported bug.
  Internally we now use repl_semisync_slave.get_slave_enabled()
  instead of rpl_semi_sync_slave_enabled. To check if semisync is
  active one should check the @@rpl_semi_sync_slave_status variable
  (as before).
- The semisync protocol conflicts with the way the original
  MySQL/MariaDB client-server protocol was designed (client-server
  send and reply packets are strictly ordered and include a packet
  number to allow one to check if a packet is lost). When using
  semi-sync the master and slave can send packets at 'any time', so
  packet numbering does not work. The 'solution' has been that each
  communication starts with packet number 1, but in some cases there
  is still a chance that the packet number check can fail.  Fixed by
  adding a flag (pkt_nr_can_be_reset) in the NET struct that one can
  use to signal that packet number checking should not be done. This
  flag is set when semi-sync is used.
- Added Master_info::semi_sync_reply_enabled to allow one to configure
  some slaves with semisync and other slaves without semisync.
  Removed global variable semi_sync_need_reply that would not work
  with multi-master.
- Repl_semi_sync_master::report_reply_packet() can now recognize
  the COM_QUIT packet from semisync slave and not give a
  "Read semi-sync reply magic number error" error for this case.
  The slave will be removed from the Ack listener.
- On Windows, don't stop semisync Ack listener just because one
  slave connection is using socket_id > FD_SETSIZE.
- Removed busy loop in Ack_receiver::run() by using the
  "Self-pipe trick" to signal new slave and stop Ack_receiver.
- Changed some Repl_semi_sync_slave functions that always return 0
  from int to void.
- Added Repl_semi_sync_slave::slave_reconnect().
- Removed dummy_function Repl_semi_sync_slave::reset_slave().
- Removed some duplicate semisync notes from the error log.
- Add test of "if (get_slave_enabled() && semi_sync_need_reply)"
  before calling Repl_semi_sync_slave::slave_reply().
  (Speeds up the code as we can skip all initializations).
- If repl_semisync_slave.slave_reply() fails, we disable semisync
  for that connection.
- We do not call semisync.switch_off() if there are no active slaves.
  Instead we check in Repl_semi_sync_master::commit_trx() if there are
  no active threads. This simplifies the code.
- Changed assert() to DBUG_ASSERT() to ensure that the DBUG log is
  flushed in case of asserts.
- Removed the internal rpl_semi_sync_slave_status as it is not needed
  anymore. The @@rpl_semi_sync_slave_status status variable is now
  mapped to rpl_semi_sync_enabled.
- Removed rpl_semi_sync_slave_enabled  as it is not needed anymore.
  Repl_semi_sync_slave::get_slave_enabled() contains the active status.
- Added checking that we do not add a slave twice with
  Ack_receiver::add_slave(). This could happen with old code.
- Removed Repl_semi_sync_master::check_and_switch() as it is not
  needed anymore.
- Ensure that when we call Ack_receiver::remove_slave() that the slave
  is removed from the listener before function returns.
- Call listener.listen_on_sockets() outside of mutex for better
  performance and less contested mutex.
- Ensure that listening is ignoring newly added slaves when checking for
  responses.
- Fixed that the master ack_receiver listener could be killed if there
  were no connected slaves (which would stop semisync handling of
  future connections). This could happen if all slave sockets were
  marked as unreliable.
- Added unlink() to base_ilist_iterator and remove() to
  I_List_iterator. This enables us to remove 'dead' slaves in
  Ack_receiver::run().
- kill_zombie_dump_threads() now does killing of dump threads properly.
  - It can now kill several threads (should be impossible but could
    happen if IO slaves reconnect very fast).
  - We now wait until the old dump thread is done before starting the
    new dump.
- Added an error if kill_zombie_dump_threads() fails.
- Set thd->variables.server_id before calling
  kill_zombie_dump_threads(). This simplifies the code.
- Added a lot of comments both in code and tests.
- Removed DBUG_EVALUATE_IF "failed_slave_start" as it is not used.
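
The "Self-pipe trick" mentioned in the list above can be sketched as
follows. This is an illustration only, assuming a POSIX system with
pipe() and poll(); WakeupPipe and wait_for_wakeup are hypothetical
names and this is not the code in Ack_receiver::run(). The idea is
that the listening thread blocks in poll() on the read end of a pipe
(in a real listener, together with the replica sockets), and another
thread writes one byte to wake it up instead of the listener polling
in a busy loop.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

// Hypothetical helper: a pipe whose only purpose is to wake up a poll() call.
class WakeupPipe {
 public:
  WakeupPipe() {
    if (pipe(fds_) != 0) {
      fds_[0] = fds_[1] = -1;
      return;
    }
    // Non-blocking on both ends: signal() never blocks on a full pipe and
    // drain() stops as soon as the pipe is empty.
    for (int fd : fds_) fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK);
  }
  ~WakeupPipe() {
    if (fds_[0] >= 0) close(fds_[0]);
    if (fds_[1] >= 0) close(fds_[1]);
  }
  WakeupPipe(const WakeupPipe &) = delete;
  WakeupPipe &operator=(const WakeupPipe &) = delete;

  bool ok() const { return fds_[0] >= 0; }
  int read_fd() const { return fds_[0]; }

  // Called by another thread to wake the listener. If the pipe is full
  // (EAGAIN) a wakeup is already pending, so the failed write is harmless.
  void signal() {
    const char byte = 1;
    ssize_t r;
    do {
      r = write(fds_[1], &byte, 1);
    } while (r < 0 && errno == EINTR);
  }

  // Discards all pending wakeup bytes.
  void drain() {
    char buf[64];
    while (read(fds_[0], buf, sizeof buf) > 0) {
    }
  }

 private:
  int fds_[2];
};

// Blocks for up to timeout_ms; returns true if woken through the pipe.
bool wait_for_wakeup(WakeupPipe &wakeup, int timeout_ms) {
  pollfd pfd;
  pfd.fd = wakeup.read_fd();
  pfd.events = POLLIN;
  pfd.revents = 0;
  int n;
  do {
    n = poll(&pfd, 1, timeout_ms);
  } while (n < 0 && errno == EINTR);
  if (n > 0 && (pfd.revents & POLLIN)) {
    wakeup.drain();
    return true;
  }
  return false;
}
```

A thread that adds a replica or requests shutdown would call signal();
the listener then returns from poll() promptly and can rebuild its set
of sockets or exit.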

Test changes:
- rpl.rpl_session_var2 added which runs rpl.rpl_session_var test with
  semisync enabled.
- Some timings changed slightly with startup of slave which caused
  rpl_binlog_dump_slave_gtid_state_info.test to fail as it checked the
  error log file before the slave had started properly. Fixed by
  adding wait_for_pattern_in_file.inc that allows waiting for the
  pattern to appear in the log file.
- Tests have been updated so that we first set
  rpl_semi_sync_master_enabled on the master and then set
  rpl_semi_sync_slave_enabled on the slaves (this is according to how
  the MariaDB documentation documents how to set up semi-sync).
- Error text "Master server does not have semi-sync enabled" has been
  replaced with "Master server does not support semi-sync" for the
  case when the master supports semi-sync but semi-sync is not
  enabled.
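
The "wait for a pattern instead of checking once" idea from the list
above can be sketched as a small polling loop. This is a hedged
illustration, not the mysql-test include file: the function below is a
hypothetical helper that merely shares its name with
wait_for_pattern_in_file.inc, and the 50 ms poll interval is an
assumption.

```cpp
#include <cassert>
#include <chrono>
#include <fstream>
#include <regex>
#include <string>
#include <thread>

// Hypothetical helper: re-reads the file until some line matches `pattern`
// or `timeout` has elapsed. Returns true on a match, false on timeout.
bool wait_for_pattern_in_file(
    const std::string &path, const std::regex &pattern,
    std::chrono::milliseconds timeout,
    std::chrono::milliseconds interval = std::chrono::milliseconds(50)) {
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  for (;;) {
    std::ifstream in(path);  // reopen each round to see newly appended lines
    if (in) {
      std::string line;
      while (std::getline(in, line))
        if (std::regex_search(line, pattern)) return true;
    }
    if (std::chrono::steady_clock::now() >= deadline) return false;
    std::this_thread::sleep_for(interval);
  }
}
```

A single check right after starting the slave can miss a log line
that is written a moment later; the loop tolerates that delay at the
cost of waiting for the full timeout when the pattern never appears.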

Other things:
- Some trivial cleanups in Repl_semi_sync_master::update_sync_header().
- In 11.3 we should change the default value of
  rpl-semi-sync-master-wait-no-slave from TRUE to FALSE, as TRUE
  does not make much sense as the default. The main difference with using
  FALSE is that we do not wait for semisync Ack if there are no slave
  threads.  In the case of TRUE we wait once, which did not bring any
  notable benefits except slower startup of master configured for
  using semisync.
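
The difference between the two settings of
rpl-semi-sync-master-wait-no-slave described above can be modelled
roughly as below. This is a simplified sketch under assumptions:
MasterState and should_wait_for_ack are hypothetical names, and the
real decision in the server involves more state (for example the
timeout handling that makes the TRUE case wait only once).

```cpp
#include <cassert>

// Hypothetical snapshot of the master's semi-sync state.
struct MasterState {
  bool semi_sync_status_on;  // models Rpl_semi_sync_master_status
  unsigned active_slaves;    // models Rpl_semi_sync_master_clients
};

// Returns true if a committing transaction would wait for a semi-sync Ack.
// Assumption: with no active slaves, only wait_no_slave=TRUE causes a wait.
bool should_wait_for_ack(const MasterState &state, bool wait_no_slave) {
  if (!state.semi_sync_status_on) return false;
  if (state.active_slaves > 0) return true;
  return wait_no_slave;
}
```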

Co-author: Brandon Nesterenko <brandon.nesterenko@mariadb.com>

This solves the problem reported in MDEV-32960 where a new
slave may not be registered in time and the master disables
semi sync because of that.
Michael Widenius
2023-11-08 16:57:58 -07:00
committed by Monty
parent ee7cc0a466
commit 7af50e4df4
57 changed files with 1171 additions and 360 deletions

View File

@@ -8,6 +8,7 @@ CHANGE MASTER TO MASTER_USE_GTID=current_pos;
include/start_slave.inc
connection master;
"Test Case 1: Start binlog_dump to slave_server(#), pos(master-bin.000001, ###), using_gtid(1), gtid('')"
include/wait_for_pattern_in_file.inc
FOUND 1 /using_gtid\(1\), gtid\(\'\'\).*/ in mysqld.1.err
connection slave;
include/stop_slave.inc
@@ -15,6 +16,7 @@ CHANGE MASTER TO MASTER_USE_GTID=no;
include/start_slave.inc
connection master;
"Test Case 2: Start binlog_dump to slave_server(#), pos(master-bin.000001, ###), using_gtid(0), gtid('')"
include/wait_for_pattern_in_file.inc
FOUND 1 /using_gtid\(0\), gtid\(\'\'\).*/ in mysqld.1.err
CREATE TABLE t (f INT) ENGINE=INNODB;
INSERT INTO t VALUES(10);
@@ -25,6 +27,7 @@ CHANGE MASTER TO MASTER_USE_GTID=slave_pos;
include/start_slave.inc
connection master;
"Test Case 3: Start binlog_dump to slave_server(#), pos(master-bin.000001, ###), using_gtid(1), gtid('0-1-2')"
include/wait_for_pattern_in_file.inc
FOUND 1 /using_gtid\(1\), gtid\(\'0-1-2\'\).*/ in mysqld.1.err
SET @@SESSION.gtid_domain_id=10;
INSERT INTO t VALUES(20);
@@ -35,6 +38,7 @@ CHANGE MASTER TO MASTER_USE_GTID=slave_pos;
include/start_slave.inc
connection master;
"Test Case 4: Start binlog_dump to slave_server(#), pos(master-bin.000001, ###), using_gtid(1), gtid('0-1-2,10-1-1')"
include/wait_for_pattern_in_file.inc
FOUND 1 /using_gtid\(1\), gtid\(\'0-1-2,10-1-1\'\).*/ in mysqld.1.err
"===== Clean up ====="
connection slave;

View File

@@ -1,5 +1,7 @@
include/master-slave.inc
[connection master]
connection server_2;
call mtr.add_suppression("Timeout waiting for reply of binlog");
# Master server_1 and Slave server_2 initialization ...
connection server_2;
include/stop_slave.inc
@@ -40,6 +42,8 @@ set @@global.rpl_semi_sync_master_enabled = 1;
INSERT INTO t1(a) VALUES (2);
include/save_master_gtid.inc
connection server_1;
include/stop_slave.inc
include/start_slave.inc
#
# the successful sync is a required proof
#

View File

@@ -67,6 +67,9 @@ include/stop_slave.inc
# CHANGE MASTER TO MASTER_DELAY = 2*T
include/start_slave.inc
connection master;
INSERT INTO t1 VALUES ('Syncing slave', 5);
connection slave;
connection master;
INSERT INTO t1 VALUES (delay_on_slave(1), 6);
Warnings:
Note 1592 Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. Statement is unsafe because it uses a system variable that may have a different value on the slave

View File

@@ -6,7 +6,6 @@ call mtr.add_suppression("Read semi-sync reply");
call mtr.add_suppression("Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT.");
call mtr.add_suppression("mysqld: Got an error reading communication packets");
connection slave;
call mtr.add_suppression("Master server does not support semi-sync");
call mtr.add_suppression("Semi-sync slave .* reply");
call mtr.add_suppression("Slave SQL.*Request to stop slave SQL Thread received while applying a group that has non-transactional changes; waiting for completion of the group");
connection master;
@@ -26,7 +25,7 @@ set global rpl_semi_sync_slave_enabled= 0;
# Main test of semi-sync replication start here
#
connection master;
set global rpl_semi_sync_master_timeout= 60000;
set global rpl_semi_sync_master_timeout= 2000;
[ default state of semi-sync on master should be OFF ]
show variables like 'rpl_semi_sync_master_enabled';
Variable_name Value
@@ -161,11 +160,15 @@ connection slave;
# Test semi-sync master will switch OFF after one transaction
# timeout waiting for slave reply.
#
connection master;
show status like "Rpl_semi_sync_master_status";
Variable_name Value
Rpl_semi_sync_master_status ON
connection slave;
include/stop_slave.inc
connection master;
include/kill_binlog_dump_threads.inc
set global rpl_semi_sync_master_timeout= 5000;
set global rpl_semi_sync_master_timeout= 2000;
[ master status should be ON ]
show status like 'Rpl_semi_sync_master_no_tx';
Variable_name Value
@@ -315,6 +318,8 @@ include/kill_binlog_dump_threads.inc
connection slave;
include/start_slave.inc
connection master;
connection slave;
connection master;
create table t1 (a int) engine = ENGINE_TYPE;
insert into t1 values (1);
insert into t1 values (2), (3);
@@ -357,6 +362,8 @@ show status like 'Rpl_semi_sync_slave_status';
Variable_name Value
Rpl_semi_sync_slave_status ON
connection master;
connection slave;
connection master;
[ master semi-sync should be ON ]
show status like 'Rpl_semi_sync_master_clients';
Variable_name Value

View File

@@ -7,7 +7,6 @@ call mtr.add_suppression("Read semi-sync reply");
call mtr.add_suppression("Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT.");
call mtr.add_suppression("mysqld: Got an error reading communication packets");
connection slave;
call mtr.add_suppression("Master server does not support semi-sync");
call mtr.add_suppression("Semi-sync slave .* reply");
call mtr.add_suppression("Slave SQL.*Request to stop slave SQL Thread received while applying a group that has non-transactional changes; waiting for completion of the group");
connection master;
@@ -27,7 +26,7 @@ set global rpl_semi_sync_slave_enabled= 0;
# Main test of semi-sync replication start here
#
connection master;
set global rpl_semi_sync_master_timeout= 60000;
set global rpl_semi_sync_master_timeout= 2000;
[ default state of semi-sync on master should be OFF ]
show variables like 'rpl_semi_sync_master_enabled';
Variable_name Value
@@ -162,11 +161,15 @@ connection slave;
# Test semi-sync master will switch OFF after one transaction
# timeout waiting for slave reply.
#
connection master;
show status like "Rpl_semi_sync_master_status";
Variable_name Value
Rpl_semi_sync_master_status ON
connection slave;
include/stop_slave.inc
connection master;
include/kill_binlog_dump_threads.inc
set global rpl_semi_sync_master_timeout= 5000;
set global rpl_semi_sync_master_timeout= 2000;
[ master status should be ON ]
show status like 'Rpl_semi_sync_master_no_tx';
Variable_name Value
@@ -316,6 +319,8 @@ include/kill_binlog_dump_threads.inc
connection slave;
include/start_slave.inc
connection master;
connection slave;
connection master;
create table t1 (a int) engine = ENGINE_TYPE;
insert into t1 values (1);
insert into t1 values (2), (3);
@@ -358,6 +363,8 @@ show status like 'Rpl_semi_sync_slave_status';
Variable_name Value
Rpl_semi_sync_slave_status ON
connection master;
connection slave;
connection master;
[ master semi-sync should be ON ]
show status like 'Rpl_semi_sync_master_clients';
Variable_name Value

View File

@@ -7,7 +7,6 @@ call mtr.add_suppression("Read semi-sync reply");
call mtr.add_suppression("Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT.");
call mtr.add_suppression("mysqld: Got an error reading communication packets");
connection slave;
call mtr.add_suppression("Master server does not support semi-sync");
call mtr.add_suppression("Semi-sync slave .* reply");
call mtr.add_suppression("Slave SQL.*Request to stop slave SQL Thread received while applying a group that has non-transactional changes; waiting for completion of the group");
connection master;
@@ -27,7 +26,7 @@ set global rpl_semi_sync_slave_enabled= 0;
# Main test of semi-sync replication start here
#
connection master;
set global rpl_semi_sync_master_timeout= 60000;
set global rpl_semi_sync_master_timeout= 2000;
[ default state of semi-sync on master should be OFF ]
show variables like 'rpl_semi_sync_master_enabled';
Variable_name Value
@@ -162,11 +161,15 @@ connection slave;
# Test semi-sync master will switch OFF after one transaction
# timeout waiting for slave reply.
#
connection master;
show status like "Rpl_semi_sync_master_status";
Variable_name Value
Rpl_semi_sync_master_status ON
connection slave;
include/stop_slave.inc
connection master;
include/kill_binlog_dump_threads.inc
set global rpl_semi_sync_master_timeout= 5000;
set global rpl_semi_sync_master_timeout= 2000;
[ master status should be ON ]
show status like 'Rpl_semi_sync_master_no_tx';
Variable_name Value
@@ -316,6 +319,8 @@ include/kill_binlog_dump_threads.inc
connection slave;
include/start_slave.inc
connection master;
connection slave;
connection master;
create table t1 (a int) engine = ENGINE_TYPE;
insert into t1 values (1);
insert into t1 values (2), (3);
@@ -358,6 +363,8 @@ show status like 'Rpl_semi_sync_slave_status';
Variable_name Value
Rpl_semi_sync_slave_status ON
connection master;
connection slave;
connection master;
[ master semi-sync should be ON ]
show status like 'Rpl_semi_sync_master_clients';
Variable_name Value

View File

@@ -7,7 +7,6 @@ call mtr.add_suppression("Read semi-sync reply");
call mtr.add_suppression("Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT.");
call mtr.add_suppression("mysqld: Got an error reading communication packets");
connection slave;
call mtr.add_suppression("Master server does not support semi-sync");
call mtr.add_suppression("Semi-sync slave .* reply");
call mtr.add_suppression("Slave SQL.*Request to stop slave SQL Thread received while applying a group that has non-transactional changes; waiting for completion of the group");
connection master;

View File

@@ -8,7 +8,6 @@ call mtr.add_suppression("Read semi-sync reply");
call mtr.add_suppression("Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT.");
call mtr.add_suppression("mysqld: Got an error reading communication packets");
connection slave;
call mtr.add_suppression("Master server does not support semi-sync");
call mtr.add_suppression("Semi-sync slave .* reply");
call mtr.add_suppression("Slave SQL.*Request to stop slave SQL Thread received while applying a group that has non-transactional changes; waiting for completion of the group");
connection master;

View File

@@ -5,6 +5,7 @@ include/stop_slave.inc
connection server_1;
RESET MASTER;
SET @@global.max_binlog_size= 4096;
set @@global.rpl_semi_sync_master_enabled = 1;
connection server_2;
RESET MASTER;
SET @@global.max_binlog_size= 4096;
@@ -14,7 +15,6 @@ CHANGE MASTER TO master_use_gtid= slave_pos;
include/start_slave.inc
connection server_1;
ALTER TABLE mysql.gtid_slave_pos ENGINE=InnoDB;
set @@global.rpl_semi_sync_master_enabled = 1;
set @@global.rpl_semi_sync_master_wait_point=AFTER_SYNC;
CREATE TABLE t1 (a INT PRIMARY KEY, b MEDIUMTEXT) ENGINE=Innodb;
INSERT INTO t1 VALUES (1, 'dummy1');

View File

@@ -0,0 +1,48 @@
include/rpl_init.inc [topology=1->2,1->3]
connection server_1;
set @old_enabled= @@global.rpl_semi_sync_master_enabled;
set @old_timeout= @@global.rpl_semi_sync_master_timeout;
set global rpl_semi_sync_master_enabled= 1;
set global rpl_semi_sync_master_timeout= 500;
connection server_2;
include/stop_slave.inc
set @old_enabled= @@global.rpl_semi_sync_slave_enabled;
set @old_dbug= @@global.debug_dbug;
set global rpl_semi_sync_slave_enabled= 1;
set global debug_dbug="+d,simulate_delay_semisync_slave_reply";
include/start_slave.inc
connection server_3;
include/stop_slave.inc
set @old_enabled= @@global.rpl_semi_sync_slave_enabled;
set global rpl_semi_sync_slave_enabled= 1;
include/start_slave.inc
# Ensure primary recognizes both replicas are semi-sync
connection server_1;
connection server_1;
create table t1 (a int);
connection server_2;
# Verifying server_2 did not send ACK
connection server_3;
# Verifying server_3 did send ACK
connection server_1;
# Verifying master's semi-sync status is still ON (This failed pre-MDEV-32960 fixes)
# Verifying rpl_semi_sync_master_yes_tx incremented
#
# Cleanup
connection server_2;
set global rpl_semi_sync_slave_enabled= @old_enabled;
set global debug_dbug= @old_dbug;
include/stop_slave.inc
connection server_3;
set global rpl_semi_sync_slave_enabled= @old_enabled;
include/stop_slave.inc
connection server_1;
set global rpl_semi_sync_master_enabled= @old_enabled;
set global rpl_semi_sync_master_timeout= @old_timeout;
drop table t1;
connection server_2;
include/start_slave.inc
connection server_3;
include/start_slave.inc
include/rpl_end.inc
# End of rpl_semi_sync_no_missed_ack_after_add_slave.test

View File

@@ -0,0 +1,35 @@
include/master-slave.inc
[connection master]
call mtr.add_suppression("Replication event checksum verification failed");
call mtr.add_suppression("could not queue event from master");
#
# Set up a semisync connection
connection master;
set @@global.rpl_semi_sync_master_enabled= ON;
connection slave;
stop slave io_thread;
set @@global.rpl_semi_sync_slave_enabled= ON;
set @old_dbug= @@global.debug_dbug;
set @@global.debug_dbug= "+d,corrupt_queue_event";
set @@global.debug_dbug= "+d,pause_before_io_read_event";
set @@global.debug_dbug= "+d,placeholder";
start slave io_thread;
# Disable semi-sync on the slave while the IO thread is active
set debug_sync='now wait_for io_thread_at_read_event';
set @@global.rpl_semi_sync_slave_enabled= OFF;
set debug_sync='now signal io_thread_continue_read_event';
# Waiting for the slave to stop with the error from corrupt_queue_event
connection slave;
include/wait_for_slave_io_error.inc [errno=1595,1743]
# Sleep 1 to give time for Ack_receiver to receive COM_QUIT
include/assert_grep.inc [Check that there is no 'Read semi-sync reply magic number error' in error log.]
#
# Cleanup
connection slave;
include/stop_slave.inc
set @@global.debug_dbug= @old_dbug;
include/start_slave.inc
connection master;
set @@global.rpl_semi_sync_master_enabled= default;
include/rpl_end.inc
# End of rpl_semi_sync_slave_enabled_consistent.test

View File

@@ -4,6 +4,7 @@ connection slave;
include/stop_slave.inc
connection master;
call mtr.add_suppression("Timeout waiting for reply of binlog*");
call mtr.add_suppression("Master server does not read semi-sync messages*");
set global rpl_semi_sync_master_enabled = ON;
SET @@GLOBAL.rpl_semi_sync_master_timeout=100;
create table t1 (i int);
@@ -15,8 +16,8 @@ SET GLOBAL debug_dbug="+d,semislave_failed_net_flush";
include/start_slave.inc
connection master;
connection slave;
"Assert that the net_fulsh() reply failed is present in slave error log.
FOUND 1 /Semi-sync slave net_flush\(\) reply failed/ in mysqld.2.err
"Assert that Master server does not read semi-sync messages" is present in slave error log.
FOUND 1 /Master server does not read semi-sync messages/ in mysqld.2.err
"Assert that Slave IO thread is up and running."
SHOW STATUS LIKE 'Slave_running';
Variable_name Value

View File

@@ -14,7 +14,6 @@ CALL mtr.add_suppression("Failed on request_dump()*");
CALL mtr.add_suppression("Semi-sync master failed on*");
CALL mtr.add_suppression("Master command COM_BINLOG_DUMP failed*");
CALL mtr.add_suppression("on master failed*");
CALL mtr.add_suppression("Master server does not support semi-sync*");
CALL mtr.add_suppression("Semi-sync slave net_flush*");
CALL mtr.add_suppression("Failed to flush master info*");
CALL mtr.add_suppression("Request to stop slave SQL Thread received while apply*");
@@ -196,7 +195,7 @@ Variable_name Value
Rpl_semi_sync_master_clients 0
show status like 'Rpl_semi_sync_master_status';
Variable_name Value
Rpl_semi_sync_master_status OFF
Rpl_semi_sync_master_status ON
connection slave;
START SLAVE IO_THREAD;
include/wait_for_slave_io_to_start.inc

View File

@@ -1,5 +1,16 @@
include/master-slave.inc
[connection master]
select @@rpl_semi_sync_master_enabled;
@@rpl_semi_sync_master_enabled
0
connection slave;
select @@rpl_semi_sync_slave_enabled;
@@rpl_semi_sync_slave_enabled
0
show status like "rpl_semi_sync_slave_status";
Variable_name Value
Rpl_semi_sync_slave_status OFF
connection master;
drop table if exists t1;
Warnings:
Note 1051 Unknown table 'test.t1'

View File

@@ -0,0 +1,69 @@
include/master-slave.inc
[connection master]
select @@rpl_semi_sync_master_enabled;
@@rpl_semi_sync_master_enabled
1
connection slave;
select @@rpl_semi_sync_slave_enabled;
@@rpl_semi_sync_slave_enabled
1
show status like "rpl_semi_sync_slave_status";
Variable_name Value
Rpl_semi_sync_slave_status ON
connection master;
drop table if exists t1;
Warnings:
Note 1051 Unknown table 'test.t1'
create table t1(a varchar(100),b int);
set @@session.sql_mode=pipes_as_concat;
insert into t1 values('My'||'SQL', 1);
set @@session.sql_mode=default;
insert into t1 values('1'||'2', 2);
select * from t1 where b<3 order by a;
a b
1 2
MySQL 1
connection slave;
select * from t1 where b<3 order by a;
a b
1 2
MySQL 1
connection master;
set @@session.sql_mode=ignore_space;
insert into t1 values(password ('MySQL'), 3);
set @@session.sql_mode=ansi_quotes;
create table "t2" ("a" int);
drop table t1, t2;
set @@session.sql_mode=default;
create table t1(a int auto_increment primary key);
create table t2(b int, a int);
set @@session.sql_auto_is_null=1;
insert into t1 values(null);
insert into t2 select 1,a from t1 where a is null;
set @@session.sql_auto_is_null=0;
insert into t1 values(null);
insert into t2 select 2,a from t1 where a is null;
select * from t2 order by b;
b a
1 1
connection slave;
select * from t2 order by b;
b a
1 1
connection master;
drop table t1,t2;
connection slave;
connection master;
CREATE TABLE t1 (
`id` int(11) NOT NULL auto_increment,
`data` varchar(100),
PRIMARY KEY (`id`)
) ENGINE=MyISAM;
INSERT INTO t1(data) VALUES(SESSION_USER());
connection slave;
SELECT length(data) < 100 FROM t1;
length(data) < 100
1
connection master;
drop table t1;
include/rpl_end.inc

View File

@@ -59,7 +59,7 @@ if(!$log_error_)
--let SEARCH_FILE=$log_error_
--let SEARCH_RANGE=-50000
--let SEARCH_PATTERN=using_gtid\(1\), gtid\(\'\'\).*
--source include/search_pattern_in_file.inc
--source include/wait_for_pattern_in_file.inc
--connection slave
--source include/stop_slave.inc
@@ -71,7 +71,7 @@ CHANGE MASTER TO MASTER_USE_GTID=no;
--let SEARCH_FILE=$log_error_
--let SEARCH_RANGE=-50000
--let SEARCH_PATTERN=using_gtid\(0\), gtid\(\'\'\).*
--source include/search_pattern_in_file.inc
--source include/wait_for_pattern_in_file.inc
CREATE TABLE t (f INT) ENGINE=INNODB;
INSERT INTO t VALUES(10);
save_master_pos;
@@ -89,7 +89,7 @@ CHANGE MASTER TO MASTER_USE_GTID=slave_pos;
--let SEARCH_FILE=$log_error_
--let SEARCH_RANGE=-50000
--let SEARCH_PATTERN=using_gtid\(1\), gtid\(\'0-1-2\'\).*
--source include/search_pattern_in_file.inc
--source include/wait_for_pattern_in_file.inc
SET @@SESSION.gtid_domain_id=10;
INSERT INTO t VALUES(20);
save_master_pos;
@@ -107,7 +107,7 @@ CHANGE MASTER TO MASTER_USE_GTID=slave_pos;
--let SEARCH_FILE=$log_error_
--let SEARCH_RANGE=-50000
--let SEARCH_PATTERN=using_gtid\(1\), gtid\(\'0-1-2,10-1-1\'\).*
--source include/search_pattern_in_file.inc
--source include/wait_for_pattern_in_file.inc
--echo "===== Clean up ====="
--connection slave

View File

@@ -7,6 +7,9 @@
--source include/have_binlog_format_mixed.inc
--source include/master-slave.inc
connection server_2;
call mtr.add_suppression("Timeout waiting for reply of binlog");
# The following tests prove
# A.
# no out-of-order gtid error is done to the stict gtid mode semisync
@@ -66,10 +69,18 @@ evalp CHANGE MASTER TO master_host='127.0.0.1', master_port=$SERVER_MYPORT_2, ma
--connection server_2
set @@global.gtid_strict_mode = true;
set @@global.rpl_semi_sync_master_enabled = 1;
# The following command is likely to cause the slave master is not yet setup
# for semi-sync
INSERT INTO t1(a) VALUES (2);
--source include/save_master_gtid.inc
--connection server_1
# Update slave to notice that server_2 now has rpl_semi_sync_master_enabled
--source include/stop_slave.inc
--source include/start_slave.inc
--echo #
--echo # the successful sync is a required proof
--echo #

View File

@@ -189,6 +189,12 @@ eval CHANGE MASTER TO MASTER_DELAY = $time2;
--enable_query_log
--source include/start_slave.inc
# Ensure that slave has started properly
--connection master
INSERT INTO t1 VALUES ('Syncing slave', 5);
--save_master_pos
--sync_slave_with_master
--connection master
INSERT INTO t1 VALUES (delay_on_slave(1), 6);
--save_master_pos

View File

@@ -11,7 +11,7 @@ set @old_master_binlog_checksum= @@global.binlog_checksum;
# empty Gtid_list event
#
# Test this by binlog rotation before we log any GTIDs.
connection slave;
sync_slave_with_master;
--source include/stop_slave.inc
--echo # Test slave with no capability gets dummy event, which is ignored.
set @old_dbug= @@global.debug_dbug;

View File

@@ -17,7 +17,6 @@ call mtr.add_suppression("Read semi-sync reply");
call mtr.add_suppression("Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT.");
call mtr.add_suppression("mysqld: Got an error reading communication packets");
connection slave;
call mtr.add_suppression("Master server does not support semi-sync");
call mtr.add_suppression("Semi-sync slave .* reply");
call mtr.add_suppression("Slave SQL.*Request to stop slave SQL Thread received while applying a group that has non-transactional changes; waiting for completion of the group");
connection master;
@@ -51,7 +50,7 @@ set global rpl_semi_sync_slave_enabled= 0;
connection master;
set global rpl_semi_sync_master_timeout= 60000; # 60s
set global rpl_semi_sync_master_timeout= 2000; # 2s
echo [ default state of semi-sync on master should be OFF ];
show variables like 'rpl_semi_sync_master_enabled';
@@ -195,12 +194,16 @@ sync_slave_with_master;
--echo # Test semi-sync master will switch OFF after one transaction
--echo # timeout waiting for slave reply.
--echo #
connection master;
show status like "Rpl_semi_sync_master_status";
connection slave;
source include/stop_slave.inc;
connection master;
--source include/kill_binlog_dump_threads.inc
set global rpl_semi_sync_master_timeout= 5000;
set global rpl_semi_sync_master_timeout= 2000;
# The first semi-sync check should be on because after slave stop,
# there are no transactions on the master.
@@ -232,8 +235,8 @@ show status like 'Rpl_semi_sync_master_status';
show status like 'Rpl_semi_sync_master_no_tx';
show status like 'Rpl_semi_sync_master_yes_tx';
# Semi-sync status on master is now OFF, so all these transactions
# will be replicated asynchronously.
# Semi-sync status on master is now ON, but there are no slaves attached,
# so all these transactions will be replicated asynchronously.
delete from t1 where a=10;
delete from t1 where a=9;
delete from t1 where a=8;
@@ -367,6 +370,9 @@ let $status_var= Rpl_semi_sync_master_clients;
let $status_var_value= 1;
source include/wait_for_status_var.inc;
sync_slave_with_master;
connection master;
replace_result $engine_type ENGINE_TYPE;
eval create table t1 (a int) engine = $engine_type;
insert into t1 values (1);
@@ -413,6 +419,10 @@ connection master;
let $status_var= Rpl_semi_sync_master_clients;
let $status_var_value= 1;
source include/wait_for_status_var.inc;
sync_slave_with_master;
connection master;
echo [ master semi-sync should be ON ];
show status like 'Rpl_semi_sync_master_clients';
show status like 'Rpl_semi_sync_master_status';

View File

@@ -14,7 +14,6 @@ call mtr.add_suppression("Unsafe statement written to the binary log using state
call mtr.add_suppression("mysqld: Got an error reading communication packets");
connection slave;
call mtr.add_suppression("Master server does not support semi-sync");
call mtr.add_suppression("Semi-sync slave .* reply");
call mtr.add_suppression("Slave SQL.*Request to stop slave SQL Thread received while applying a group that has non-transactional changes; waiting for completion of the group");

View File

@@ -18,6 +18,7 @@
--connection server_1
RESET MASTER;
SET @@global.max_binlog_size= 4096;
set @@global.rpl_semi_sync_master_enabled = 1;
--connection server_2
RESET MASTER;
@@ -29,7 +30,6 @@ CHANGE MASTER TO master_use_gtid= slave_pos;
--connection server_1
ALTER TABLE mysql.gtid_slave_pos ENGINE=InnoDB;
set @@global.rpl_semi_sync_master_enabled = 1;
set @@global.rpl_semi_sync_master_wait_point=AFTER_SYNC;
CREATE TABLE t1 (a INT PRIMARY KEY, b MEDIUMTEXT) ENGINE=Innodb;

View File

@@ -0,0 +1,12 @@
!include include/default_mysqld.cnf
[mysqld.1]
[mysqld.2]
[mysqld.3]
[ENV]
SERVER_MYPORT_1= @mysqld.1.port
SERVER_MYPORT_2= @mysqld.2.port
SERVER_MYPORT_3= @mysqld.3.port

View File

@@ -0,0 +1,122 @@
#
# This test ensures that a primary will listen for ACKs by newly added
# semi-sync connections connections, after a pre-existing connection is already
# established. MDEV-32960 reported that the newly added slave's ACK can be
# ignored if listen_on_sockets() does not timeout before
# rpl_semi_sync_master_timeout, and if the existing semi-sync connections fail
# to send ACKs, semi-sync is switched off.
#
# This test verifies this in a two-replica setup with a semi-sync timeout of
# 500ms, delaying the ACK reply of the first-established replica by 800ms
# to force a timeout while allowing the second replica to ACK immediately.
#
# References:
# MDEV-32960: Semi-sync ACKed Transaction can Timeout and Switch Off
# Semi-sync with Multiple Replicas
#
--source include/have_debug.inc
# binlog_format independent
--source include/have_binlog_format_statement.inc
--let $rpl_topology= 1->2,1->3
--source include/rpl_init.inc
--connection server_1
set @old_enabled= @@global.rpl_semi_sync_master_enabled;
set @old_timeout= @@global.rpl_semi_sync_master_timeout;
set global rpl_semi_sync_master_enabled= 1;
set global rpl_semi_sync_master_timeout= 500;
--connection server_2
--source include/stop_slave.inc
set @old_enabled= @@global.rpl_semi_sync_slave_enabled;
set @old_dbug= @@global.debug_dbug;
set global rpl_semi_sync_slave_enabled= 1;
set global debug_dbug="+d,simulate_delay_semisync_slave_reply";
--source include/start_slave.inc
--connection server_3
--source include/stop_slave.inc
set @old_enabled= @@global.rpl_semi_sync_slave_enabled;
set global rpl_semi_sync_slave_enabled= 1;
--source include/start_slave.inc
--echo # Ensure primary recognizes both replicas are semi-sync
--connection server_1
--let $status_var_value= 2
--let $status_var= rpl_semi_sync_master_clients
--source include/wait_for_status_var.inc
--let $master_ss_status= query_get_value(SHOW STATUS LIKE 'rpl_semi_sync_master_status', Value, 1)
if (`SELECT strcmp("$master_ss_status", "ON") != 0`)
{
SHOW STATUS LIKE 'rpl_semi_sync_master_status';
--die rpl_semi_sync_master_status should be ON to start
}
--connection server_1
--let $init_master_yes_tx= query_get_value(SHOW STATUS LIKE 'rpl_semi_sync_master_yes_tx', Value, 1)
create table t1 (a int);
--connection server_2
--echo # Verifying server_2 did not send ACK
--let $slave1_sent_ack= query_get_value(SHOW STATUS LIKE 'rpl_semi_sync_slave_send_ack', Value, 1)
if (`SELECT $slave1_sent_ack`)
{
SHOW STATUS LIKE 'rpl_semi_sync_slave_send_ack';
--die server_2 should not have sent semi-sync ACK to primary
}
--connection server_3
--echo # Verifying server_3 did send ACK
--let $slave2_sent_ack= query_get_value(SHOW STATUS LIKE 'rpl_semi_sync_slave_send_ack', Value, 1)
if (`SELECT NOT $slave2_sent_ack`)
{
SHOW STATUS LIKE 'rpl_semi_sync_slave_send_ack';
--die server_3 should have sent semi-sync ACK to primary
}
--connection server_1
--echo # Verifying master's semi-sync status is still ON (This failed pre-MDEV-32960 fixes)
let $master_ss_status= query_get_value(SHOW STATUS LIKE 'rpl_semi_sync_master_status', Value, 1);
if (`SELECT strcmp("$master_ss_status", "ON") != 0`)
{
SHOW STATUS LIKE 'rpl_semi_sync_master_status';
--die rpl_semi_sync_master_status should not have switched off after server_3 ACKed transaction
}
--echo # Verifying rpl_semi_sync_master_yes_tx incremented
--let $cur_master_yes_tx= query_get_value(SHOW STATUS LIKE 'rpl_semi_sync_master_yes_tx', Value, 1)
if (`SELECT $cur_master_yes_tx != ($init_master_yes_tx + 1)`)
{
--echo # Initial yes_tx: $init_master_yes_tx
--echo # Current yes_tx: $cur_master_yes_tx
--die rpl_semi_sync_master_yes_tx should have been incremented by primary
}
--echo #
--echo # Cleanup
--connection server_2
set global rpl_semi_sync_slave_enabled= @old_enabled;
set global debug_dbug= @old_dbug;
--source include/stop_slave.inc
--connection server_3
set global rpl_semi_sync_slave_enabled= @old_enabled;
--source include/stop_slave.inc
--connection server_1
set global rpl_semi_sync_master_enabled= @old_enabled;
set global rpl_semi_sync_master_timeout= @old_timeout;
drop table t1;
--connection server_2
--source include/start_slave.inc
--connection server_3
--source include/start_slave.inc
--source include/rpl_end.inc
--echo # End of rpl_semi_sync_no_missed_ack_after_add_slave.test

@@ -0,0 +1,73 @@
#
# MDEV-32551: "Read semi-sync reply magic number error" warnings on master
#
# Test that changing rpl_semi_sync_slave_enabled after startup does not
# cause problems with semi-sync cleanup.
#
--source include/have_debug.inc
--source include/have_debug_sync.inc
# Test is binlog format independent, so save resources
--source include/have_binlog_format_row.inc
--source include/master-slave.inc
call mtr.add_suppression("Replication event checksum verification failed");
call mtr.add_suppression("could not queue event from master");
--echo #
--echo # Set up a semisync connection
--connection master
set @@global.rpl_semi_sync_master_enabled= ON;
--connection slave
stop slave io_thread;
set @@global.rpl_semi_sync_slave_enabled= ON;
set @old_dbug= @@global.debug_dbug;
# Force an error to abort out of the main IO thread loop
set @@global.debug_dbug= "+d,corrupt_queue_event";
# Pause the IO thread as soon as the main loop starts. Note we can't use
# processlist where "Waiting for master to send event" because the
# "corrupt_queue_event" will trigger before we can turn semisync OFF
set @@global.debug_dbug= "+d,pause_before_io_read_event";
# The other debug_dbug points are automatically negated when they are run, and
# there is a bug where, if "-d" takes us to an empty debug string state, _all_
# debug_print statements are output. Add a placeholder keyword so the debug
# string never becomes empty.
set @@global.debug_dbug= "+d,placeholder";
start slave io_thread;
--echo # Disable semi-sync on the slave while the IO thread is active
set debug_sync='now wait_for io_thread_at_read_event';
set @@global.rpl_semi_sync_slave_enabled= OFF;
set debug_sync='now signal io_thread_continue_read_event';
--echo # Waiting for the slave to stop with the error from corrupt_queue_event
--connection slave
--let $slave_io_errno= 1595,1743
--source include/wait_for_slave_io_error.inc
--echo # Sleep 1 to give time for Ack_receiver to receive COM_QUIT
--sleep 1
--let $assert_text= Check that there is no 'Read semi-sync reply magic number error' in error log.
--let $assert_select=magic number error
--let $assert_file= $MYSQLTEST_VARDIR/log/mysqld.1.err
--let $assert_count= 0
--let $assert_only_after=CURRENT_TEST
--source include/assert_grep.inc
--echo #
--echo # Cleanup
--connection slave
--source include/stop_slave.inc
set @@global.debug_dbug= @old_dbug;
--source include/start_slave.inc
--connection master
set @@global.rpl_semi_sync_master_enabled= default;
--source include/rpl_end.inc
--echo # End of rpl_semi_sync_slave_enabled_consistent.test

@@ -31,6 +31,7 @@
--connection master
call mtr.add_suppression("Timeout waiting for reply of binlog*");
call mtr.add_suppression("Master server does not read semi-sync messages*");
--let $sav_timeout_master=`SELECT @@GLOBAL.rpl_semi_sync_master_timeout`
set global rpl_semi_sync_master_enabled = ON;
SET @@GLOBAL.rpl_semi_sync_master_timeout=100;
@@ -54,9 +55,9 @@ if(!$log_error_)
# does not know the location of its .err log, use default location
let $log_error_ = $MYSQLTEST_VARDIR/log/mysqld.2.err;
}
--echo "Assert that the net_fulsh() reply failed is present in slave error log.
--echo "Assert that Master server does not read semi-sync messages" is present in slave error log.
--let SEARCH_FILE=$log_error_
--let SEARCH_PATTERN=Semi-sync slave net_flush\(\) reply failed
--let SEARCH_PATTERN=Master server does not read semi-sync messages
--source include/search_pattern_in_file.inc
--echo "Assert that Slave IO thread is up and running."

@@ -16,7 +16,6 @@ CALL mtr.add_suppression("Failed on request_dump()*");
CALL mtr.add_suppression("Semi-sync master failed on*");
CALL mtr.add_suppression("Master command COM_BINLOG_DUMP failed*");
CALL mtr.add_suppression("on master failed*");
CALL mtr.add_suppression("Master server does not support semi-sync*");
CALL mtr.add_suppression("Semi-sync slave net_flush*");
CALL mtr.add_suppression("Failed to flush master info*");
CALL mtr.add_suppression("Request to stop slave SQL Thread received while apply*");

@@ -7,6 +7,12 @@ disable_query_log;
call mtr.add_suppression("Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT");
enable_query_log;
select @@rpl_semi_sync_master_enabled;
connection slave;
select @@rpl_semi_sync_slave_enabled;
show status like "rpl_semi_sync_slave_status";
connection master;
drop table if exists t1;
create table t1(a varchar(100),b int);
set @@session.sql_mode=pipes_as_concat;

@@ -0,0 +1 @@
--rpl_semi_sync_master_enabled=1 --rpl_semi_sync_slave_enabled=1

@@ -0,0 +1 @@
--rpl_semi_sync_slave_enabled=1

@@ -0,0 +1,3 @@
# Replication of session variables when semi-sync is on
--source rpl_session_var.test

@@ -28,6 +28,9 @@ while (`SELECT $i <= $slaves`)
--inc $i
}
# The following script will restart master and slaves. This will also set
# rpl_semi_sync_master_enabled=0
--source include/rpl_shutdown_wait_slaves.inc
--let i= 2
while (`SELECT $i <= $slaves`)