Major replication test framework cleanup. This does the following:
- Ensure that all tests clean up the replication state when they
finish, by making check-testcase check the output of SHOW SLAVE STATUS.
This implies:
- Slave must not be running after test finished. This is good
because it removes the risk for sporadic errors in subsequent
tests when a test forgets to sync correctly.
- Slave SQL and IO errors must be cleared when test ends. This is
good because we will notice if a test gets an unexpected error in
the slave threads near the end.
- We no longer have to clean up before a test starts.
- Ensure that all tests that wait for an error in one of the slave
threads waits for a specific error. It is no longer possible to
source wait_for_slave_[sql|io]_to_stop.inc when there is an error
in one of the slave threads. This is good because:
- If a test expects an error but there is a bug that causes
another error to happen, or if it stops the slave thread without
an error, then we will notice.
- When developing tests, wait_for_*_to_[start|stop].inc will fail
immediately if there is an error in the relevant slave thread.
Before this patch, we had to wait for the timeout.
- Remove duplicated and repeated code for setting up unusual replication
topologies. Now, there is a single file that is capable of setting
up arbitrary topologies (include/rpl_init.inc, but
include/master-slave.inc is still available for the most common
topology). Tests can now end with include/rpl_end.inc, which will clean
up correctly no matter what topology is used. The topology can be
changed with include/rpl_change_topology.inc.
- Improved debug information when tests fail. This includes:
- debug info is printed on all servers configured by include/rpl_init.inc
- User can set $rpl_debug=1, which makes auxiliary replication files
print relevant debug info.
- Improved documentation for all auxiliary replication files. Now they
describe purpose, usage, parameters, and side effects.
- Many small code cleanups:
- Made have_innodb.inc output a sensible error message.
- Moved contents of rpl000017-slave.sh into rpl000017.test
- Added mysqltest variables that expose the current state of
disable_warnings/enable_warnings and friends.
- Too many to list here: see per-file comments for details.
RBR was not considering the option --slave-skip-errors.
To fix the problem, we are reporting the ignored ERROR(s) as warnings thus avoiding
stopping the SQL Thread. Besides, it fixes the output of "SHOW VARIABLES LIKE
'slave_skip_errors'" which was showing nothing when the value "all" was assigned
to --slave-skip-errors.
@sql/log_event.cc
skipped rbr errors when the option skip-slave-errors is set.
@sql/slave.cc
fixed the output of for SHOW VARIABLES LIKE 'slave_skip_errors'"
@test-cases
fixed the output of rpl.rpl_idempotency
updated the test case rpl_skip_error
Problem: Many test cases don't clean up after themselves (fail
to drop tables or fail to reset variables). This implies that:
(1) check-testcase in the new mtr that currently lives in
5.1-rpl failed. (2) it may cause unexpected results in
subsequent tests.
Fix: make all tests clean up.
Also: cleaned away unnecessary output in rpl_packet.result
Also: fixed bug where rpl_log called RESET MASTER with a running
slave. This is not supposed to work.
Also: removed unnecessary code from rpl_stm_EE_err2 and made it
verify that an error occurred.
Also: removed unnecessary code from rpl_ndb_ctype_ucs2_def.
Problem: during a refactoring of mtr, a pattern for suppressing a warning from lowercase_table3 was lost.
Fix: re-introduce the suppression.
Problem 2: suppression was misspelt as supression. Fixed by adding a p.
Problem 1: tests often fail in pushbuild with a timeout when waiting
for the slave to start/stop/receive error.
Fix 1: Updated the wait_for_slave_* macros in the following way:
- The timeout is increased by a factor ten
- Refactored the macros so that wait_for_slave_param does the work for
the other macros.
Problem 2: Tests are often incorrectly written, lacking a
source include/wait_for_slave_to_[start|stop].inc.
Fix 2: Improved the chance to get it right by adding
include/start_slave.inc and include/stop_slave.inc, and updated tests
to use these.
Problem 3: The the built-in test language command
wait_for_slave_to_stop is a misnomer (does not wait for the slave io
thread) and does not give as much debug info in case of failure as
the otherwise equivalent macro
source include/wait_for_slave_sql_to_stop.inc
Fix 3: Replaced all calls to the built-in command by a call to the
macro.
Problem 4: Some, but not all, of the wait_for_slave_* macros had an
implicit connection slave. This made some tests confusing to read,
and made it more difficult to use the macro in circular replication
scenarios, where the connection named master needs to wait.
Fix 4: Removed the implicit connection slave from all
wait_for_slave_* macros, and updated tests to use an explicit
connection slave where necessary.
Problem 5: The macros wait_slave_status.inc and wait_show_pattern.inc
were unused. Moreover, using them is difficult and error-prone.
Fix 5: remove these macros.
Problem 6: log_bin_trust_function_creators_basic failed when running
tests because it assumed @@global.log_bin_trust_function_creators=1,
and some tests modified this variable without resetting it to its
original value.
Fix 6: All tests that use this variable have been updated so that
they reset the value at end of test.
without PK
Bug#31609 Not all RBR slave errors reported as errors
bug#32468 delete rows event on a table with foreign key constraint fails
The first two bugs comprise idempotency issues.
First, there was no error code reported under conditions of the bug
description although the slave sql thread halted.
Second, executions were different with and without presence of prim key in
the table.
Third, there was no way to instruct the slave whether to ignore an error
and skip to the following event or to halt.
Fourth, there are handler errors which might happen due to idempotent
applying of binlog but those were not listed among the "idempotent" error
list.
All the named issues are addressed.
Wrt to the 3rd, there is the new global system variable, changeble at run
time, which controls the slave sql thread behaviour.
The new variable allows further extensions to mimic the sql_mode
session/global variable.
To address the 4th, the new bug#32468 had to be fixed as it was staying
in the way.
The rpl_trigger test case indicated a problem with idempotency support when run
under row-based replication, which this patch fixes.
However, despite this, the test is not designed for execution under row-based
replication and hence rpl_trigger.test is not executed under row-based
replication.
The problem is that the test expects triggers to be executed when the slave
updates rows on the slave, and this is (deliberately) not done with row-based
replication.