Major replication test framework cleanup. This does the following:
- Ensure that all tests clean up the replication state when they
finish, by making check-testcase check the output of SHOW SLAVE STATUS.
This implies:
- Slave must not be running after test finished. This is good
because it removes the risk for sporadic errors in subsequent
tests when a test forgets to sync correctly.
- Slave SQL and IO errors must be cleared when test ends. This is
good because we will notice if a test gets an unexpected error in
the slave threads near the end.
- We no longer have to clean up before a test starts.
- Ensure that all tests that wait for an error in one of the slave
threads waits for a specific error. It is no longer possible to
source wait_for_slave_[sql|io]_to_stop.inc when there is an error
in one of the slave threads. This is good because:
- If a test expects an error but there is a bug that causes
another error to happen, or if it stops the slave thread without
an error, then we will notice.
- When developing tests, wait_for_*_to_[start|stop].inc will fail
immediately if there is an error in the relevant slave thread.
Before this patch, we had to wait for the timeout.
- Remove duplicated and repeated code for setting up unusual replication
topologies. Now, there is a single file that is capable of setting
up arbitrary topologies (include/rpl_init.inc, but
include/master-slave.inc is still available for the most common
topology). Tests can now end with include/rpl_end.inc, which will clean
up correctly no matter what topology is used. The topology can be
changed with include/rpl_change_topology.inc.
- Improved debug information when tests fail. This includes:
- debug info is printed on all servers configured by include/rpl_init.inc
- User can set $rpl_debug=1, which makes auxiliary replication files
print relevant debug info.
- Improved documentation for all auxiliary replication files. Now they
describe purpose, usage, parameters, and side effects.
- Many small code cleanups:
- Made have_innodb.inc output a sensible error message.
- Moved contents of rpl000017-slave.sh into rpl000017.test
- Added mysqltest variables that expose the current state of
disable_warnings/enable_warnings and friends.
- Too many to list here: see per-file comments for details.
Some of the test cases reference to binlog position and
these position numbers are written into result explicitly.
It is difficult to maintain if log event format changes.
There are a couple of cases explicit position number appears,
we handle them in different ways
A. 'CHANGE MASTER ...' with MASTER_LOG_POS or/and RELAY_LOG_POS options
Use --replace_result to mask them.
B. 'SHOW BINLOG EVENT ...'
Replaced by show_binlog_events.inc or wait_for_binlog_event.inc.
show_binlog_events.inc file's function is enhanced by given
$binlog_file and $binlog_limit.
C. 'SHOW SLAVE STATUS', 'show_slave_status.inc' and 'show_slave_status2.inc'
For the test cases just care a few items in the result of 'SHOW SLAVE STATUS',
only the items related to each test case are showed.
'show_slave_status.inc' is rebuild, only the given items in $status_items
will be showed.
'check_slave_is_running.inc' and 'check_slave_no_error.inc'
and 'check_slave_param.inc' are auxiliary files helping
to show running status and error information easily.
Problem: The rpl_ndb did not set binlog_format explicitly. Since
the default is binlog_format=statement, it means that the suite
ran with that. ndb does not support binlog_format=statement,
and many tests were skipped because they sourced
include/have_binlog_format_row_or_mixed.inc
Fix: set binlog_format=row explicitly in the configuration file
for the rpl_ndb suite.
When using a non-transactional table (t1) on the master
and with autocommit disabled, no COMMIT is recorded
in the binary log ending the statement. Therefore, if
the slave has t1 in a transactional engine, then it will
be as if a transaction is started but never ends. This is
actually BUG#29288 all over again.
We fix this by cherrypicking the cset for BUG#29288 which
was pushed to a later mysql version. The revision picked
was: mats@sun.com-20090923094343-bnheplq8n95opjay .
Additionally, a test case for covering the scenario depicted
in the bug report is included in this cset.
cant find record
Some engines return data for the record. Despite the fact that
the null bit is set for some fields, their old value may still in
the row. This can happen when unpacking an AI from the binlog on
top of a previous record in which a field is set to NULL, which
previously contained a value. Ultimately, this may cause the
comparison of records to fail when the slave is doing an index or
range scan.
We fix this by deploying a call to reset() for each field that is
set to null while unpacking a row from the binary log.
Furthermore, we also add mixed mode test case to cover the
scenario where updating and setting a field to null through a
Query event and later searching it through a rows event will
succeed.
Finally, we also change the reset() method, from Field_bit class,
so that it takes into account bits stored among the null bits and
not only the ones stored in the record.
failure to cleanup of table
The test case was not dropping a table before exiting (ie, it was
not cleaning itself after execution). In this case, the warning
message stating that the test did not do a proper cleanup was
deterministic (which can be annoying).
I have found other tests cases on which mtr sporadically reports
that they have not cleaned up after execution:
- rpl_ndb_circular
- rpl_failed_optimize
In this case, the master was dropping a table but there was no
synchronization between the slave and the master.
This patch addresses the rpl_ndb_circular_simplex case by adding
the missing DROP table. The other cases are fixed by deploying
the missing sync_slave_with_master instruction.
slave leaves slave unstable
Problem: when replicating from non-transactional to
transactional engine with autocommit off, no BEGIN/COMMIT
is written to the binlog. When the slave replicates, it
will start a transaction that never ends.
Fix: Force autocommit=on on slave by always replicating
autocommit=1 from the master.
The bug happened because filtering-out a STMT_END_F-flagged event so that
the transaction COMMIT finds traces of incomplete statement commit.
Such situation is only possible with ndb circular replication. The filtered-out
rows event is one that immediately preceeds the COMMIT query event.
Fixed with deploying an the rows-log-event statement commit at executing
of the transaction COMMIT event.
Resources that were allocated by other than STMT_END_F-flagged event of
the last statement are clean up prior execution of the commit logics.
There are two issues:
1. 6.0 uses the obsolate master-*** server options;
2. the test is not deterministic in that although master vs slave consistency is
fine, two runs of the test can have different results. The reason of the
non-determinism is the combination of
a chosen way to demo results and the ndb_autoincrement_prefetch_sz feature.
The current patch fixes the 2nd issue by putting out results via diff_table macro
instead of the former run-sensitive method.
The 1st issue is going to be fixed by a separate patch to 6.0.
conflicts:
Text conflict in client/mysqltest.cc
Text conflict in mysql-test/include/wait_until_connected_again.inc
Text conflict in mysql-test/lib/mtr_report.pm
Text conflict in mysql-test/mysql-test-run.pl
Text conflict in mysql-test/r/events_bugs.result
Text conflict in mysql-test/r/log_state.result
Text conflict in mysql-test/r/myisam_data_pointer_size_func.result
Text conflict in mysql-test/r/mysqlcheck.result
Text conflict in mysql-test/r/query_cache.result
Text conflict in mysql-test/r/status.result
Text conflict in mysql-test/suite/binlog/r/binlog_index.result
Text conflict in mysql-test/suite/binlog/r/binlog_innodb.result
Text conflict in mysql-test/suite/rpl/r/rpl_packet.result
Text conflict in mysql-test/suite/rpl/t/rpl_packet.test
Text conflict in mysql-test/t/disabled.def
Text conflict in mysql-test/t/events_bugs.test
Text conflict in mysql-test/t/log_state.test
Text conflict in mysql-test/t/myisam_data_pointer_size_func.test
Text conflict in mysql-test/t/mysqlcheck.test
Text conflict in mysql-test/t/query_cache.test
Text conflict in mysql-test/t/rpl_init_slave_func.test
Text conflict in mysql-test/t/status.test
- remove totally wrong (syntax) entries from disabled.def
- remove entries belonging to deleted tests from disabled.def
- correct the comments (correct the bug mentioned) of entries in disabled.def
- remove never completed tests which were accidently pushed
Problem: Many test cases don't clean up after themselves (fail
to drop tables or fail to reset variables). This implies that:
(1) check-testcase in the new mtr that currently lives in
5.1-rpl failed. (2) it may cause unexpected results in
subsequent tests.
Fix: make all tests clean up.
Also: cleaned away unnecessary output in rpl_packet.result
Also: fixed bug where rpl_log called RESET MASTER with a running
slave. This is not supposed to work.
Also: removed unnecessary code from rpl_stm_EE_err2 and made it
verify that an error occurred.
Also: removed unnecessary code from rpl_ndb_ctype_ucs2_def.
According to documenation:
http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-replication-issues.html
When setting up circular replication for clusters with different
SQL nodes in each cluster used as replication master and slave,
SQL nodes must not start with --log-slave-updates option.
This patch fixed the test case by remove log-slave-updates
configuration from test case configuration file.
pb notices differences in results at the very beginning of the test.
Absense of mysql.ndb_apply_status must be benign anyway, but the warning should not happen
if have_ndb.inc is invoked ahead of ndb_master-slave.
Fixed with relocation of the macros.
manually resolved conflicts:
Text conflict in client/mysqltest.c
Contents conflict in mysql-test/include/have_bug25714.inc
Text conflict in mysql-test/include/have_ndbapi_examples.inc
Text conflict in mysql-test/mysql-test-run.pl
Text conflict in mysql-test/suite/parts/inc/partition_check_drop.inc
Text conflict in mysql-test/suite/parts/inc/partition_layout.inc
Text conflict in mysql-test/suite/parts/inc/partition_layout_check1.inc
Text conflict in mysql-test/suite/parts/inc/partition_layout_check2.inc
Text conflict in mysql-test/suite/parts/r/partition_alter1_1_2_myisam.result
Text conflict in mysql-test/suite/parts/r/partition_alter1_1_myisam.result
Text conflict in mysql-test/suite/parts/r/partition_alter1_2_myisam.result
Text conflict in mysql-test/suite/parts/r/partition_alter2_myisam.result
Text conflict in mysql-test/suite/parts/r/partition_alter3_innodb.result
Text conflict in mysql-test/suite/parts/r/partition_alter3_myisam.result
Text conflict in mysql-test/suite/parts/r/partition_basic_innodb.result
Text conflict in mysql-test/suite/parts/r/partition_basic_myisam.result
Text conflict in mysql-test/suite/parts/r/partition_basic_symlink_myisam.result
Text conflict in mysql-test/suite/parts/r/partition_engine_myisam.result
Text conflict in mysql-test/suite/parts/r/partition_syntax_myisam.result
Text conflict in mysql-test/suite/rpl_ndb/t/disabled.def
Text conflict in mysql-test/t/disabled.def
Problem 1: tests often fail in pushbuild with a timeout when waiting
for the slave to start/stop/receive error.
Fix 1: Updated the wait_for_slave_* macros in the following way:
- The timeout is increased by a factor ten
- Refactored the macros so that wait_for_slave_param does the work for
the other macros.
Problem 2: Tests are often incorrectly written, lacking a
source include/wait_for_slave_to_[start|stop].inc.
Fix 2: Improved the chance to get it right by adding
include/start_slave.inc and include/stop_slave.inc, and updated tests
to use these.
Problem 3: The the built-in test language command
wait_for_slave_to_stop is a misnomer (does not wait for the slave io
thread) and does not give as much debug info in case of failure as
the otherwise equivalent macro
source include/wait_for_slave_sql_to_stop.inc
Fix 3: Replaced all calls to the built-in command by a call to the
macro.
Problem 4: Some, but not all, of the wait_for_slave_* macros had an
implicit connection slave. This made some tests confusing to read,
and made it more difficult to use the macro in circular replication
scenarios, where the connection named master needs to wait.
Fix 4: Removed the implicit connection slave from all
wait_for_slave_* macros, and updated tests to use an explicit
connection slave where necessary.
Problem 5: The macros wait_slave_status.inc and wait_show_pattern.inc
were unused. Moreover, using them is difficult and error-prone.
Fix 5: remove these macros.
Problem 6: log_bin_trust_function_creators_basic failed when running
tests because it assumed @@global.log_bin_trust_function_creators=1,
and some tests modified this variable without resetting it to its
original value.
Fix 6: All tests that use this variable have been updated so that
they reset the value at end of test.
Problem: rpl_ndb_transaction fails because it assumes nothing
is written to the binlog at a certain point. However, ndb may
binlog updates in ndb system tables at a nondeterministic
time point after an ndb table update has been committed.
Fix: break the test into two. rpl_ndb_transaction still does
the ndb updates needed by the first half of the test. The new
test case rpl_bug26395 includes the part that assumes nothing
more will be written to the binlog.
The reason is that we are using a sleep to wait for slave to reach the
slave_transaction_retries limit.
Fix: wait for the slave to stop instead. This is what we want to do, since
the slave stops when the limit is reached.