Network error happened here, but it can be caused by CR_CONNECTION_ERROR,
CR_CONN_HOST_ERROR, CR_SERVER_GONE_ERROR, CR_SERVER_LOST, ER_CON_COUNT_ERROR,
and ER_SERVER_SHUTDOWN. We just check CR_SERVER_LOST here, so the test fails.
To fix the problem, check all errors that can be cause by the master shutdown.
But there is no Last_IO_Error reported.
On the master, if a binary log event is larger than max_allowed_packet,
ER_MASTER_FATAL_ERROR_READING_BINLOG and the specific reason of this error is
sent to a slave when it requests a dump from the master, thus leading
the I/O thread to stop.
On a slave, the I/O thread stops when receiving a packet larger than max_allowed_packet.
In both cases, however, there was no Last_IO_Error reported.
This patch adds code to report the Last_IO_Error and exact reason before stopping the
I/O thread and also reports the case the out memory pops up while
handling packets from the master.
The test case rpl_do_grant fails sporadically on PB2 with "Access
denied for user 'create_rout_db'@'localhost' ...". Inspecting the
test case, one may find that if issues a GRANT on the master
connection and immediately after it creates two new connections
(one to the master and one to the slave) using the credentials
set with the GRANT.
Unfortunately, there is no synchronization between master and
slave after the grant and before the connections are
established. This can result in slave not having executed the
GRANT by the time the connection is attempted.
This patch fixes this by deploying a sync_slave_with_master
between the grant and the connections attempt.
The test case creates two temporary tables, then closes the
connection, waits for it to disconnect, then syncs the slave with
the master, checks for remaining opened temporary tables on
slave (which should be 0) and finally drops the used
database (mysqltest).
Unfortunately, sometimes, the test fails with one open table on
the slave. This seems to be caused by the fact that waiting for
the connection to be closed is not sufficient. The test needs to
wait for the DROP event to be logged and only then synchronize
the slave with the master and proceed with the check. This is
caused by the asynchronous nature of the disconnect wrt
binlogging of the DROP temporary table statement.
We fix this by deploying a call to wait_for_binlog_event.inc
on the test case, which makes execution to wait for the DROP
temp tables event before synchronizing master and slave.
In RBR, There is an inconsistency between slaves and master.
When INSERT statement which includes an auto_increment field is executed,
Store engine of master will check the value of the auto_increment field.
It will generate a sequence number and then replace the value, if its value is NULL or empty.
if the field's value is 0, the store engine will do like encountering the NULL values
unless NO_AUTO_VALUE_ON_ZERO is set into SQL_MODE.
In contrast, if the field's value is 0, Store engine of slave always generates a new sequence number
whether or not NO_AUTO_VALUE_ON_ZERO is set into SQL_MODE.
SQL MODE of slave sql thread is always consistency with master's.
Another variable is related to this bug.
If generateing a sequence number is decided by the values of
table->auto_increment_field_not_null and SQL_MODE(if includes MODE_NO_AUTO_VALUE_ON_ZERO)
The table->auto_increment_is_not_null is FALSE, which causes this bug to appear. ..
Essentially, Bug#45574 results in this bug. The 'CREATE DATABASE IF NOT EXISTS' statement was not
binlogged, when the database has existed.
Sometimes, the master and slaves become inconsistent. The "CREATE DATABASE
IF NOT EXISTS mysqltest1" statement is not binlogged
if the db 'mysqltest1' existed before the test case is executed.
So the db 'mysqltest1' can't be created on slave.
Patch of Bug#45574 has resolved this problem.
But I think it is better to replace 'mysqltest1' by default db 'test'.
If an EVENT is created without the DEFINER clause set explicitly or with it set
to CURRENT_USER, the master and slaves become inconsistent. This issue stems from
the fact that in both cases, the DEFINER is set to the CURRENT_USER of the current
thread. On the master, the CURRENT_USER is the mysqld's user, while on the slave,
the CURRENT_USER is empty for the SQL Thread which is responsible for executing
the statement.
To fix the problem, we do what follows. If the definer is not set explicitly,
a DEFINER clause is added when writing the query into binlog; if 'CURRENT_USER' is
used as the DEFINER, it is replaced with the value of the current user before
writing to binlog.
Slave does not correctly handle "expected errors" leading to inconsistencies
between the mater and slave. Specifically, when a statement changes both
transactional and non-transactional tables, the transactional changes are
automatically rolled back on the master but the slave ignores the error and
does not roll them back thus leading to inconsistencies.
To fix the problem, we automatically roll back a statement that fails on
the slave but note that the transaction is not rolled back unless a "rollback"
command is in the relay log file.
binlog
Mixing transactional (T) and non-transactional (N) tables on behalf of a
transaction may lead to inconsistencies among masters and slaves in STATEMENT
mode. The problem stems from the fact that although modifications done to
non-transactional tables on behalf of a transaction become immediately visible
to other connections they do not immediately get to the binary log and therefore
consistency is broken. Although there may be issues in mixing T and M tables in
STATEMENT mode, there are safe combinations that clients find useful.
In this bug, we fix the following issue. Mixing N and T tables in multi-level
(e.g. a statement that fires a trigger) or multi-table table statements (e.g.
update t1, t2...) were not handled correctly. In such cases, it was not possible
to distinguish when a T table was updated if the sequence of changes was N and T.
In a nutshell, just the flag "modified_non_trans_table" was not enough to reflect
that both a N and T tables were changed. To circumvent this issue, we check if an
engine is registered in the handler's list and changed something which means that
a T table was modified.
Check WL 2687 for a full-fledged patch that will make the use of either the MIXED or
ROW modes completely safe.
In STATEMENT based replication, a statement that failed on the master but that
updated non-transactional tables is written to binary log with the error code
appended to it. On the slave, the statement is executed and the same error is
expected. However, when an "expected error" did not happen on the slave and was
either ignored or was related to a concurrency issue on the master, the slave
did not rollback the effects of the statement and as such inconsistencies might
happen.
To fix the problem, we automatically rollback a statement that should have
failed on a slave but succeded and whose expected failure is either ignored or
stems from a concurrency issue on the master.
If the log_bin_trust_function_creators option is not defined, creating a stored
function requires either one of the modifiers DETERMINISTIC, NO SQL, or READS
SQL DATA. Executing a stored function should also follows the same rules if in
STATEMENT mode. However, this was not happening and a wrong error was being
printed out: ER_BINLOG_ROW_RBR_TO_SBR.
The patch makes the creation and execution compatible and prints out the correct
error ER_BINLOG_UNSAFE_ROUTINE when a stored function without one of the modifiers
above is executed in STATEMENT mode.
to wrong result
When using MIXED mode and issuing 'CREATE TEMPORARY TABLE t_tmp',
the statement is logged if the current binlogging mode is
STATEMENT. This causes the slave to replay the instruction and
create the temporary table as well. If there is no switch to ROW
mode, and later on a 'DROP TEMPORARY TABLE t_tmp' is issued, then
this statement will also be logged and the slave will
remove/close the temporary table.
However, if there is a switch to ROW mode between the CREATE and
DROP TEMPORARY table, the DROP statement will not be logged,
leaving the slave with a dangling temporary table.
This patch addresses this, by always logging a DROP TEMPORARY
TABLE IF EXISTS when in mixed mode and a drop statement is issued
for temporary table(s).
This is a post-push fix addressing review requests and
problems with extra warnings.
Problem 1: The sub-statement where an unsafe warning was detected was
printed as part of the warning. This was ok for statements that
were unsafe due to, e.g., calls to UUID(), but did not make
sense for statements that were unsafe because there was more than
one autoincrement column (unsafeness in this case comes from the
combination of several sub-statements).
Fix 1: Instead of printing the sub-statement, print an explanation
of why the statement is unsafe.
Problem 2:
When a recursive construct (i.e., stored proceure, stored
function, trigger, view, prepared statement) contained several
sub-statements, and at least one of them was unsafe, there would be
one unsafeness warning per sub-statement - even for safe
sub-statements.
Fix 2:
Ensure that each type of warning is printed at most once, by
remembering throughout the execution of the statement which types
of warnings have been printed.
binlog
The fix for BUG 43929 introduced a regression issue. In a nutshell, when a
statement that changes a non-transactional table fails, it is written to the
binary log with the error code appended. Unfortunately, after BUG 43929, this
failure was flushing the transactional chace causing mismatch between execution
and logging histories. To fix this issue, we avoid flushing the transactional
cache when a commit or rollback is not issued.
The "get_master_version_and_clock(...)" function in sql/slave.cc ignores
error and passes directly when queries fail, or queries succeed
but the result retrieved is empty.
The "get_master_version_and_clock(...)" function should try to reconnect master
if queries fail because of transient network problems, and fail otherwise.
The I/O thread should print a warning if the some system variables do not
exist on master (very old master)
NOTE: This undoes changes by BUG#42829 in sql_class.cc:binlog_query().
I will revert the change in a post-push fix (the binlog filter should
be checked in sql_base.cc:decide_logging_format()).
General overview:
The logic for switching to row format when binlog_format=MIXED had
numerous flaws. The underlying problem was the lack of a consistent
architecture.
General purpose of this changeset:
This changeset introduces an architecture for switching to row format
when binlog_format=MIXED. It enforces the architecture where it has
to. It leaves some bugs to be fixed later. It adds extensive tests to
verify that unsafe statements work as expected and that appropriate
errors are produced by problems with the selection of binlog format.
It was not practical to split this into smaller pieces of work.
Problem 1:
To determine the logging mode, the code has to take several parameters
into account (namely: (1) the value of binlog_format; (2) the
capabilities of the engines; (3) the type of the current statement:
normal, unsafe, or row injection). These parameters may conflict in
several ways, namely:
- binlog_format=STATEMENT for a row injection
- binlog_format=STATEMENT for an unsafe statement
- binlog_format=STATEMENT for an engine only supporting row logging
- binlog_format=ROW for an engine only supporting statement logging
- statement is unsafe and engine does not support row logging
- row injection in a table that does not support statement logging
- statement modifies one table that does not support row logging and
one that does not support statement logging
Several of these conflicts were not detected, or were detected with
an inappropriate error message. The problem of BUG#39934 was that no
appropriate error message was written for the case when an engine
only supporting row logging executed a row injection with
binlog_format=ROW. However, all above cases must be handled.
Fix 1:
Introduce new error codes (sql/share/errmsg.txt). Ensure that all
conditions are detected and handled in decide_logging_format()
Problem 2:
The binlog format shall be determined once per statement, in
decide_logging_format(). It shall not be changed before or after that.
Before decide_logging_format() is called, all information necessary to
determine the logging format must be available. This principle ensures
that all unsafe statements are handled in a consistent way.
However, this principle is not followed:
thd->set_current_stmt_binlog_row_based_if_mixed() is called in several
places, including from code executing UPDATE..LIMIT,
INSERT..SELECT..LIMIT, DELETE..LIMIT, INSERT DELAYED, and
SET @@binlog_format. After Problem 1 was fixed, that caused
inconsistencies where these unsafe statements would not print the
appropriate warnings or errors for some of the conflicts.
Fix 2:
Remove calls to THD::set_current_stmt_binlog_row_based_if_mixed() from
code executed after decide_logging_format(). Compensate by calling the
set_current_stmt_unsafe() at parse time. This way, all unsafe statements
are detected by decide_logging_format().
Problem 3:
INSERT DELAYED is not unsafe: it is logged in statement format even if
binlog_format=MIXED, and no warning is printed even if
binlog_format=STATEMENT. This is BUG#45825.
Fix 3:
Made INSERT DELAYED set itself to unsafe at parse time. This allows
decide_logging_format() to detect that a warning should be printed or
the binlog_format changed.
Problem 4:
LIMIT clause were not marked as unsafe when executed inside stored
functions/triggers/views/prepared statements. This is
BUG#45785.
Fix 4:
Make statements containing the LIMIT clause marked as unsafe at
parse time, instead of at execution time. This allows propagating
unsafe-ness to the view.
There is an inconsistency with DROP DATABASE|TABLE|EVENT IF EXISTS and
CREATE DATABASE|TABLE|EVENT IF NOT EXISTS. DROP IF EXISTS statements are
binlogged even if either the DB, TABLE or EVENT does not exist. In
contrast, Only the CREATE EVENT IF NOT EXISTS is binlogged when the EVENT
exists.
This patch fixes the following cases for all the replication formats:
CREATE DATABASE IF NOT EXISTS.
CREATE TABLE IF NOT EXISTS,
CREATE TABLE IF NOT EXISTS ... LIKE,
CREAET TABLE IF NOT EXISTS ... SELECT.
Had attempted to disable this test on Windows only, but the nature of this bug
does not allow for this. The master.opt file is processed before anything in
in the actual test. As a result, we must use disabled.def files to ensure
these tests are skipped on the problematic platforms.
Removed Windows-only code and updated the proper disabled.def files accordingly.
timeout
In STMT and MIXED modes, a statement that changes both non-transactional and
transactional tables must be written to the binary log whenever there are
changes to non-transactional tables. This means that the statement gets into the
binary log even when the changes to the transactional tables fail. In particular
, in the presence of a failure such statement is annotated with the error number
and wrapped in a begin/rollback. On the slave, while applying the statement, it
is expected the same failure and the rollback prevents the transactional changes
to be persisted.
Unfortunately, statements that fail due to concurrency issues (e.g. deadlocks,
timeouts) are logged in the same way causing the slave to stop as the statements
are applied sequentially by the SQL Thread. To fix this bug, we automatically
ignore concurrency failures on the slave. Specifically, the following failures
are ignored: ER_LOCK_WAIT_TIMEOUT, ER_LOCK_DEADLOCK and ER_XA_RBDEADLOCK.