Major replication test framework cleanup. This does the following:
- Ensure that all tests clean up the replication state when they
finish, by making check-testcase check the output of SHOW SLAVE STATUS.
This implies:
- Slave must not be running after test finished. This is good
because it removes the risk for sporadic errors in subsequent
tests when a test forgets to sync correctly.
- Slave SQL and IO errors must be cleared when test ends. This is
good because we will notice if a test gets an unexpected error in
the slave threads near the end.
- We no longer have to clean up before a test starts.
- Ensure that all tests that wait for an error in one of the slave
threads waits for a specific error. It is no longer possible to
source wait_for_slave_[sql|io]_to_stop.inc when there is an error
in one of the slave threads. This is good because:
- If a test expects an error but there is a bug that causes
another error to happen, or if it stops the slave thread without
an error, then we will notice.
- When developing tests, wait_for_*_to_[start|stop].inc will fail
immediately if there is an error in the relevant slave thread.
Before this patch, we had to wait for the timeout.
- Remove duplicated and repeated code for setting up unusual replication
topologies. Now, there is a single file that is capable of setting
up arbitrary topologies (include/rpl_init.inc, but
include/master-slave.inc is still available for the most common
topology). Tests can now end with include/rpl_end.inc, which will clean
up correctly no matter what topology is used. The topology can be
changed with include/rpl_change_topology.inc.
- Improved debug information when tests fail. This includes:
- debug info is printed on all servers configured by include/rpl_init.inc
- User can set $rpl_debug=1, which makes auxiliary replication files
print relevant debug info.
- Improved documentation for all auxiliary replication files. Now they
describe purpose, usage, parameters, and side effects.
- Many small code cleanups:
- Made have_innodb.inc output a sensible error message.
- Moved contents of rpl000017-slave.sh into rpl000017.test
- Added mysqltest variables that expose the current state of
disable_warnings/enable_warnings and friends.
- Too many to list here: see per-file comments for details.
mode
When the master was executing in sql_mode='traditional' (which
implies that really_abort_on_warning returns TRUE - because of
MODE_STRICT_ALL_TABLES), the error code (ER_DUP_ENTRY in the
reported case) was not being set in the
Query_log_event. Therefore, even if a failure was to be expected
when replaying the statement on the slave, a failure would occur,
because the Query_log_event was not transporting the expected
error code, but 0 instead.
This was because when the master was getting the error code to
set it in the Query_log_event, the executing thread would be
assumed to have been killed:
THD::killed==THD::KILL_BAD_DATA. This would make the error code
fetch routine not to check thd->main_da.sql_errno(), but instead
the thd->killed value. What's more, is that the server would
thd->killed value if thd->killed == THD::KILL_BAD_DATA and return
0 instead. So this is a double inconsistency, as the we should
not even check thd->killed but rather thd->main_da.sql_errno().
We fix this by extending the condition used to choose whether to
check the thd->main_da.sql_errno() or thd->killed, so that it
takes into consideration the case when:
thd->killed==THD::KILL_BAD_DATA.