Major replication test framework cleanup. This does the following:
- Ensure that all tests clean up the replication state when they
finish, by making check-testcase check the output of SHOW SLAVE STATUS.
This implies:
- Slave must not be running after test finished. This is good
because it removes the risk for sporadic errors in subsequent
tests when a test forgets to sync correctly.
- Slave SQL and IO errors must be cleared when test ends. This is
good because we will notice if a test gets an unexpected error in
the slave threads near the end.
- We no longer have to clean up before a test starts.
- Ensure that all tests that wait for an error in one of the slave
threads waits for a specific error. It is no longer possible to
source wait_for_slave_[sql|io]_to_stop.inc when there is an error
in one of the slave threads. This is good because:
- If a test expects an error but there is a bug that causes
another error to happen, or if it stops the slave thread without
an error, then we will notice.
- When developing tests, wait_for_*_to_[start|stop].inc will fail
immediately if there is an error in the relevant slave thread.
Before this patch, we had to wait for the timeout.
- Remove duplicated and repeated code for setting up unusual replication
topologies. Now, there is a single file that is capable of setting
up arbitrary topologies (include/rpl_init.inc, but
include/master-slave.inc is still available for the most common
topology). Tests can now end with include/rpl_end.inc, which will clean
up correctly no matter what topology is used. The topology can be
changed with include/rpl_change_topology.inc.
- Improved debug information when tests fail. This includes:
- debug info is printed on all servers configured by include/rpl_init.inc
- User can set $rpl_debug=1, which makes auxiliary replication files
print relevant debug info.
- Improved documentation for all auxiliary replication files. Now they
describe purpose, usage, parameters, and side effects.
- Many small code cleanups:
- Made have_innodb.inc output a sensible error message.
- Moved contents of rpl000017-slave.sh into rpl000017.test
- Added mysqltest variables that expose the current state of
disable_warnings/enable_warnings and friends.
- Too many to list here: see per-file comments for details.
Some of the test cases reference to binlog position and
these position numbers are written into result explicitly.
It is difficult to maintain if log event format changes.
There are a couple of cases explicit position number appears,
we handle them in different ways
A. 'CHANGE MASTER ...' with MASTER_LOG_POS or/and RELAY_LOG_POS options
Use --replace_result to mask them.
B. 'SHOW BINLOG EVENT ...'
Replaced by show_binlog_events.inc or wait_for_binlog_event.inc.
show_binlog_events.inc file's function is enhanced by given
$binlog_file and $binlog_limit.
C. 'SHOW SLAVE STATUS', 'show_slave_status.inc' and 'show_slave_status2.inc'
For the test cases just care a few items in the result of 'SHOW SLAVE STATUS',
only the items related to each test case are showed.
'show_slave_status.inc' is rebuild, only the given items in $status_items
will be showed.
'check_slave_is_running.inc' and 'check_slave_no_error.inc'
and 'check_slave_param.inc' are auxiliary files helping
to show running status and error information easily.
This patch fixes two problems described as follows:
1 - If there is an on-going transaction and a temporary table is created or
dropped, any failed statement that follows the "create" or "drop commands"
triggers a rollback and by consequence the slave will go out sync because
the binary log will have a wrong sequence of events.
To fix the problem, we changed the expression that evaluates when the
cache should be flushed after either the rollback of a statment or
transaction.
2 - When a "CREATE TEMPORARY TABLE SELECT * FROM" was executed the
OPTION_KEEP_LOG was not set into the thd->options. For that reason, if
the transaction had updated only transactional engines and was rolled
back at the end (.e.g due to a deadlock) the changes were not written
to the binary log, including the creation of the temporary table.
To fix the problem, we have set the OPTION_KEEP_LOG into the thd->options
when a "CREATE TEMPORARY TABLE SELECT * FROM" is executed.
General overview:
The logic for switching to row format when binlog_format=MIXED had
numerous flaws. The underlying problem was the lack of a consistent
architecture.
General purpose of this changeset:
This changeset introduces an architecture for switching to row format
when binlog_format=MIXED. It enforces the architecture where it has
to. It leaves some bugs to be fixed later. It adds extensive tests to
verify that unsafe statements work as expected and that appropriate
errors are produced by problems with the selection of binlog format.
It was not practical to split this into smaller pieces of work.
Problem 1:
To determine the logging mode, the code has to take several parameters
into account (namely: (1) the value of binlog_format; (2) the
capabilities of the engines; (3) the type of the current statement:
normal, unsafe, or row injection). These parameters may conflict in
several ways, namely:
- binlog_format=STATEMENT for a row injection
- binlog_format=STATEMENT for an unsafe statement
- binlog_format=STATEMENT for an engine only supporting row logging
- binlog_format=ROW for an engine only supporting statement logging
- statement is unsafe and engine does not support row logging
- row injection in a table that does not support statement logging
- statement modifies one table that does not support row logging and
one that does not support statement logging
Several of these conflicts were not detected, or were detected with
an inappropriate error message. The problem of BUG#39934 was that no
appropriate error message was written for the case when an engine
only supporting row logging executed a row injection with
binlog_format=ROW. However, all above cases must be handled.
Fix 1:
Introduce new error codes (sql/share/errmsg.txt). Ensure that all
conditions are detected and handled in decide_logging_format()
Problem 2:
The binlog format shall be determined once per statement, in
decide_logging_format(). It shall not be changed before or after that.
Before decide_logging_format() is called, all information necessary to
determine the logging format must be available. This principle ensures
that all unsafe statements are handled in a consistent way.
However, this principle is not followed:
thd->set_current_stmt_binlog_row_based_if_mixed() is called in several
places, including from code executing UPDATE..LIMIT,
INSERT..SELECT..LIMIT, DELETE..LIMIT, INSERT DELAYED, and
SET @@binlog_format. After Problem 1 was fixed, that caused
inconsistencies where these unsafe statements would not print the
appropriate warnings or errors for some of the conflicts.
Fix 2:
Remove calls to THD::set_current_stmt_binlog_row_based_if_mixed() from
code executed after decide_logging_format(). Compensate by calling the
set_current_stmt_unsafe() at parse time. This way, all unsafe statements
are detected by decide_logging_format().
Problem 3:
INSERT DELAYED is not unsafe: it is logged in statement format even if
binlog_format=MIXED, and no warning is printed even if
binlog_format=STATEMENT. This is BUG#45825.
Fix 3:
Made INSERT DELAYED set itself to unsafe at parse time. This allows
decide_logging_format() to detect that a warning should be printed or
the binlog_format changed.
Problem 4:
LIMIT clause were not marked as unsafe when executed inside stored
functions/triggers/views/prepared statements. This is
BUG#45785.
Fix 4:
Make statements containing the LIMIT clause marked as unsafe at
parse time, instead of at execution time. This allows propagating
unsafe-ness to the view.
Problem: during a refactoring of mtr, a pattern for suppressing a warning from lowercase_table3 was lost.
Fix: re-introduce the suppression.
Problem 2: suppression was misspelt as supression. Fixed by adding a p.
without PK
Bug#31609 Not all RBR slave errors reported as errors
bug#32468 delete rows event on a table with foreign key constraint fails
The first two bugs comprise idempotency issues.
First, there was no error code reported under conditions of the bug
description although the slave sql thread halted.
Second, executions were different with and without presence of prim key in
the table.
Third, there was no way to instruct the slave whether to ignore an error
and skip to the following event or to halt.
Fourth, there are handler errors which might happen due to idempotent
applying of binlog but those were not listed among the "idempotent" error
list.
All the named issues are addressed.
Wrt to the 3rd, there is the new global system variable, changeble at run
time, which controls the slave sql thread behaviour.
The new variable allows further extensions to mimic the sql_mode
session/global variable.
To address the 4th, the new bug#32468 had to be fixed as it was staying
in the way.
replication):
Incremental patch to enable idempotency support for update events again.
The final handling of errors will be done in BUG#31609, and until then
the handling of errors should be consistent between the different types
of changes.
If a temporary error occured inside a group on an event that was not the first
event of the group, the slave could get stuck because the retry counter is reset
whenever an event was executed successfully.
This patch only reset the retry counter when an entire group has been successfully
executed, or failed with a non-transient error.