This worklog aims at testing the two following scenarios:
1) Whenever the mysql_binlog_send method (dump thread)
reaches the end of file when reading events from the binlog, before
checking if it should wait for more events, there was a test to
check if the file being read was still active, i.e., it was the last
known binlog. However, it was possible that something was written to
the binary log and then a rotation would happen, after EOF was
detected and before the check for active was performed. In this
case, the end of the binary log would not be read by the dump
thread, and this would cause the slave to lose updates.
This test verifies that the problem has been fixed. It waits during
this window while forcing a rotation in the binlog.
2) Verify that the dump thread can correctly send events from the
active file after encountering an IO error.
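The race in scenario 1 can be illustrated with a toy model. This is only a sketch under stated assumptions: the Binlog class, dump_file() and the at_eof_hook parameter are hypothetical names invented for illustration, and re-reading the tail after the active check is one possible remedy, not necessarily what the actual patch does.

```python
class Binlog:
    """Toy binlog: a list of files, each a list of events; the last file is active."""

    def __init__(self):
        self.files = [[]]

    def write(self, event):
        self.files[-1].append(event)

    def rotate(self):
        self.files.append([])

    def is_active(self, index):
        return index == len(self.files) - 1


def dump_file(binlog, index, at_eof_hook=None, recheck_after_eof=True):
    """Return the events sent from one file.

    at_eof_hook (hypothetical) runs once in the window between detecting EOF
    and checking whether the file is still active.
    """
    sent, pos = [], 0
    while True:
        events = binlog.files[index]
        while pos < len(events):
            sent.append(events[pos])
            pos += 1
        # EOF detected; the race window opens here.
        if at_eof_hook is not None:
            at_eof_hook()
            at_eof_hook = None
        if binlog.is_active(index):
            return sent  # a real dump thread would wait for more events
        # The file was rotated away. Re-read any tail written after EOF.
        if recheck_after_eof and pos < len(binlog.files[index]):
            continue
        return sent  # a real dump thread would move on to the next file


def run(recheck_after_eof):
    binlog = Binlog()
    binlog.write("e1")
    binlog.write("e2")

    def write_then_rotate():
        # Happens after EOF was seen and before the active check.
        binlog.write("e3")
        binlog.rotate()

    return dump_file(binlog, 0, write_then_rotate, recheck_after_eof)
```

Calling run(False) models the old behaviour, where the event written inside the window is never sent; run(True) models a dump thread that notices the tail.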
DURING INNODB RECOVERY
Problem:
=======
The connection 'master' is dropped by mysqltest after
rpl_end.inc. At this point, the dropping of temporary tables
on the connection 'master' has not been synced to the slave.
So, the temporary tables replicated from master remain
on slave, leading to an inconsistent close of the test.
The following test thus complains about the presence of
temporary table(s) left over from the previous test.
Fix:
===
- Put explicit drop commands in replication tests so
that the temporary tables are dropped at slave as well.
- Added the check for Slave_open_temp_tables in
mtr_check.sql to warn about the remaining temporary
table, if any, at the close of a test.
--BINLOG-IGNORE-DB AND FULLY QUALIFIED TABLE
Problem:
=======
An ALTER TABLE statement is not written to the binlog if the server
is started with "--binlog-ignore-db some database" and 'fully
qualified' table names are used in the ALTER TABLE statement to
alter a table outside the current database context.
Analysis:
========
The above-mentioned problem affects not only "ALTER TABLE"
statements but all kinds of statements. Once the
current default database becomes "NULL", none of the
statements are binlogged.
The current behaviour is that if the user has specified
restrictions on which databases are to be replicated and the
default db is not specified, the statement is not replicated.
This means that "NULL" is considered to be equivalent to
everything (default db = NULL implies ignore, i.e. do not log the
statement).
Fix:
===
"NULL" should not be considered as equivalent to everything.
Since the filtering criterion is not equal to "NULL", the
statement should be logged into the binlog.
mysql-test/suite/rpl/r/rpl_loaddata_m.result:
Earlier when default database was "NULL" DROP TABLE
was not getting logged. Post this fix it will be logged
and the DROP will fail at slave as the table creation
was skipped by master as --binlog-ignore-db=test.
mysql-test/suite/rpl/t/rpl_loaddata_m.test:
Earlier when default database was "NULL" DROP TABLE
was not getting logged. Post this fix it will be logged
and the DROP will fail at slave as the table creation
was skipped by master as --binlog-ignore-db=test.
sql/rpl_filter.cc:
Replaced DBUG_RETURN(0) with DBUG_RETURN(1).
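The filtering decision described above can be sketched as a toy model. The function name db_ok and the null_db_passes switch are illustrative assumptions, not the server's actual code; the switch only exists so the old and new behaviour can be compared side by side.

```python
def db_ok(db, do_db=(), ignore_db=(), null_db_passes=True):
    """Toy database filter: return True if a statement should be binlogged.

    db is the current default database, or None when there is none.
    null_db_passes=False models the old behaviour (NULL matches every rule);
    null_db_passes=True models the behaviour after the fix.
    """
    if not do_db and not ignore_db:
        return True  # no rules configured: log everything
    if db is None:
        return null_db_passes
    if do_db:
        return db in do_db
    return db not in ignore_db
```

With ignore_db=("test",), db_ok(None, ...) returns True after the fix and False with null_db_passes=False, while statements whose default database is "test" are still filtered out.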
PLATFORM= MACOSX10.6 X86_64 MAX
Problem: The test was failing on pb2's mac machine because
it was not cleaned up properly. The test checks if
the command 'start slave until' throws a proper
error when issued with a wrong number/type of
parameters. After this, the replication stream was
stopped using the include file 'rpl_end.inc'.
The errors thrown earlier left the slave in an
inconsistent state to be closed by the include
file which was caught by the mac machine.
Fix: Started slave by invoking start_slave.inc to have a
working slave before calling rpl_reset.inc
Problem: The test file was not in a good shape. It tested
start slave until relay log file/pos combination
wrongly. A couple of commands were executed at
master and replicated at slave. Next, the
coordinates in terms of relay log file and pos
were noted down followed by reset slave and start
slave until saved relay log file/pos. Reset slave
deletes all relay log files and makes the slave
forget its replication position. So, using the
saved coordinates after reset slave is wrong.
Fix: Split the test in two parts:
a) Test for start slave until master log file/pos and
checking for correct errors in the failure
scenarios.
b) Test for start slave until relay log file/pos.
Problem: The variables auto_increment_increment and
auto_increment_offset were set in the include
file rpl_init.inc. This was only configured for
some connections that are rarely used by test
cases, so it is likely to cause confusion.
If replication tests want to setup these variables
they should do so explicitly.
Fix:
a) Removed code to set the variables
auto_increment_increment and auto_increment_offset
in the include file.
b) Updated the test files that used them.
post push fix:
rpl_stm_until.test was disabled because of
this bug. Enabled and fixed it.
Removed a part of the test that was obsolete.
It tested replication from 4.0 master to 5.0
slave.
RPL_ROW_UNTIL TIMES OUT
patch to fix post push failures in pb2
mysql-test/suite/rpl/r/rpl_row_until.result:
changes to account for the changes made in
corresponding test file.
mysql-test/suite/rpl/t/disabled.def:
disabled test in macosx
mysql-test/suite/rpl/t/rpl_row_until.test:
replaced static relay log file by an mtr variable
which saves the name of relay log file.
RPL_ROTATE_LOGS has been failing sporadically in what seems a
problem related to routines that update the coordinates. However,
the test lacks proper assert statements and because of this the
debug information upon failure simply points to the content
mismatch between the test and the result file.
Not as a solution, but as an improvement to the test to better
debug this failure, new assert statements were added to the test.
@rpl_rotate_logs.test
Added new assert statements reducing the
dependency on the result file.
@rpl_rotate_logs.result
Added new content to the result file to
match the test changes
Problem: The problem with the test is that the slave returns
from the start_slave.inc call too early, before the list
is actually updated. This caused stale slave
data to be reported.
Fix: Added a wait in the test until the slave's IO status
changes to "Waiting for master to send event", which
ensures the list is correctly updated.
=== Problem ===
The test is dependent on binlog positions and checks
to see if the command 'START SLAVE' functions correctly
with the 'UNTIL' clause added to it. The 'UNTIL' clause
is added to specify that the slave should start and run
until the SQL thread reaches a given point in the master
binary log or in the slave relay log.
The test uses hard coded values for MASTER_LOG_POS and
RELAY_LOG_POS, instead of extracting it using
query_get_value() function. There is a test
'rpl.rpl_row_until' which does a similar thing but uses
query_get_value() function to set the values of
MASTER_LOG_POS/ RELAY_LOG_POS. To be precise,
rpl.rpl_row_until is a modified version of
engines/funcs.rpl_row_until.test.
The use of hard coded values may lead the slave to stop at a position
which may differ from the expected position in the binlog file,
an example being the failure of engines/funcs.rpl_row_until in
mysql-5.1 given as:
"query 'select * from t2' failed. Table 'test.t2' doesn't exist".
In this case, the slave actually ran a couple of extra commands
as a result of which the slave first deleted the table and then
ran a select query on table, leading to the above mentioned failure.
=== Fix ===
1) Fixed the code for failure seen in rpl.rpl_row_until.
This test was also failing although the symptoms of
failure were different.
2) Copied the contents from rpl.rpl_row_until into
engines/funcs.rpl_row_until.
3) Updated engines/funcs.rpl_row_until.result accordingly.
mysql-test/suite/engines/funcs/r/rpl_row_until.result:
modified to accommodate the changes in corresponding
test file.
mysql-test/suite/engines/funcs/t/disabled.def:
removed from the list of disabled tests.
mysql-test/suite/engines/funcs/t/rpl_row_until.test:
fixed rpl.rpl_row_until and copied its content to
engines/funcs.rpl_row_until. The reason being both
are same tests but rpl.rpl_row_until is an
updated version.
mysql-test/suite/rpl/t/disabled.def:
removed from the list of disabled tests.
sql/sql_repl.cc:
Added a check to catch an improper combination
of arguments passed to 'START SLAVE UNTIL'. Earlier,
START SLAVE UNTIL MASTER_LOG_FILE='master-bin.000001',
MASTER_LOG_POS=561, RELAY_LOG_POS=12;
passed. It is now detected and an error is reported.
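A minimal sketch of such an argument check, assuming the rule is that an UNTIL clause must name a file and a position for exactly one kind of log and must not mix master and relay coordinates. The function name and keyword arguments are hypothetical; this is not the code added to sql/sql_repl.cc.

```python
def until_clause_is_valid(master_log_file=None, master_log_pos=None,
                          relay_log_file=None, relay_log_pos=None):
    """Return True only for a complete master pair or a complete relay pair."""
    master = [master_log_file is not None, master_log_pos is not None]
    relay = [relay_log_file is not None, relay_log_pos is not None]
    if any(master) and any(relay):
        return False  # master and relay coordinates must not be mixed
    return all(master) or all(relay)
```

Under this model the combination quoted above (MASTER_LOG_FILE, MASTER_LOG_POS and RELAY_LOG_POS together) is rejected, while a plain master file/pos pair is accepted.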
When a binlog is replayed into a server, e.g.:
$ mysqlbinlog binlog.000001 | mysql
it sets a pseudo slave mode on the client connection so that the server
is able to read binlog events; that is, a format description event is
needed to correctly read the following events.
This pseudo slave mode also applies to the current connection the
replication rules that are needed to correctly apply binlog events.
If a binlog dump is sourced on a connection, this pseudo slave mode
remains in effect afterwards, which applies rules that are unexpected
from the customer's perspective to the following commands.
Added a new SET statement to the binlog dump that unsets pseudo slave
mode at the end of the dump file.
rpl_cant_read_event_incident:
The slave applies updates from the bug11747416_32228_binlog.000001 file,
which contains a CREATE TABLE t statement and an incident. When the SQL
thread is running slowly, the IO thread may reach the incident before the
SQL thread executes the CREATE TABLE statement.
Execute "drop table if exists t" and also perform a RESET MASTER to
clean slave binary logs.
rpl_bug41902:
The suppression of the error "MYSQL_BIN_LOG::purge_logs was called with
file ./master-bin.000001 not listed in the index." does not take the
Windows path into account, where the file is ".\master-bin.000001".
Changed suppression to: "MYSQL_BIN_LOG::purge_logs was called with file
..master-bin.000001 not listed in the index", to match ".\" and "./".
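The effect of the changed suppression can be checked with a small regular-expression experiment. This sketch assumes the suppression text is treated as a regular expression in which "." matches any single character, and that Python's re module behaves the same way for this simple pattern; the variable names are illustrative.

```python
import re

OLD_SUPPRESSION = (r"MYSQL_BIN_LOG::purge_logs was called with file "
                   r"./master-bin.000001 not listed in the index")
NEW_SUPPRESSION = (r"MYSQL_BIN_LOG::purge_logs was called with file "
                   r"..master-bin.000001 not listed in the index")

UNIX_MESSAGE = ("MYSQL_BIN_LOG::purge_logs was called with file "
                "./master-bin.000001 not listed in the index.")
WINDOWS_MESSAGE = ("MYSQL_BIN_LOG::purge_logs was called with file "
                   ".\\master-bin.000001 not listed in the index.")


def suppressed(pattern, message):
    """Return True if the suppression pattern matches the logged message."""
    return re.search(pattern, message) is not None
```

The old pattern requires a literal "/" after the first character, so it misses the Windows form; in the new pattern both characters before "master-bin" are wildcards, so both path styles match.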
Problem
========
SQL statements close to the size of max_allowed_packet produce binary
log events larger than max_allowed_packet.
This failure occurs because the event length is more than
max_allowed_packet plus the maximum event header length. Since
the event length exceeds this size, the master dump
thread is unable to send the packet to the slave.
That can happen e.g. with row-based replication in an Update_rows event.
Fix
====
The problem was fixed by increasing max_allowed_packet for the
slave's threads (IO/SQL) to 1GB.
This is done using a new server option that
regulates the max_allowed_packet of the slave threads (IO/SQL).
This allows the slave to receive the large packets and apply
them successfully.
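The size comparison behind this fix can be written out as a small arithmetic sketch. HEADER_ALLOWANCE is a hypothetical stand-in for the maximum event header length, and the 3 MiB event and 1 MiB limit are made-up example values; only the 1GB figure comes from the description above.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# Hypothetical stand-in for the maximum event header length.
HEADER_ALLOWANCE = 1024


def fits(event_length, max_allowed_packet, header_allowance=HEADER_ALLOWANCE):
    """Return True if an event of this length can be transferred."""
    return event_length <= max_allowed_packet + header_allowance


# Example values only: a row event that grew beyond a 1 MiB packet limit.
EVENT_LENGTH = 3 * MIB
OLD_LIMIT = 1 * MIB
NEW_SLAVE_LIMIT = 1 * GIB
```

With these example numbers, fits(EVENT_LENGTH, OLD_LIMIT) is False and fits(EVENT_LENGTH, NEW_SLAVE_LIMIT) is True.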
sql/log_event.h:
Added the new option in the log_event.h file.
sql/mysqld.cc:
Added a new option to the server.
sql/slave.cc:
Increased the session max_allowed_packet to a large value for the
slave's threads, i.e. without taking the global max_allowed_packet into
consideration.
Problem - The failure on PB2 is possibly due to the port number still being in
use even after the server restarts, which is not reflected in the
server restart.
Fix - The problem is fixed by starting the servers forcefully using the option
file; the parameters for the server restart are also passed correctly.
mysql-test/suite/rpl/t/rpl_report_port-master.opt:
Option file for the master.
The function mysql_show_binlog_events has a local stack variable
'LOG_INFO linfo;', which is assigned to thd->current_linfo, however
this variable goes out of scope and is destroyed before
thd->current_linfo is cleared.
The problem is solved by moving 'LOG_INFO linfo;' to function scope.
BUG#11761686 insert_id event is not filtered.
Two issues are covered.
An INSERT into an autoincrement field that is not the first part of a composite
primary key is unsafe by autoincrement logging design. The case is specific to
the MyISAM engine because InnoDB does not allow such a table definition.
However, no warning was issued and row-format logging was not used in MIXED
mode; that is fixed.
Int-, Rand- and User-var log events were not filtered along with their parent
query, which made it possible for them to corrupt the execution context of the
following query.
Fixed by deferring their execution until the parent query.
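The deferral idea can be sketched with a toy applier. The Applier class, its method names and the dictionary-based context are hypothetical and only model the behaviour described above; they are not the Deferred_log_events implementation.

```python
class Applier:
    """Toy slave applier that defers context events until their parent query."""

    def __init__(self, ignored_dbs):
        self.ignored_dbs = set(ignored_dbs)
        self.deferred = []   # context events waiting for their parent query
        self.executed = []   # (sql, context) pairs that were actually applied

    def on_context_event(self, name, value):
        # Int-, Rand- and User-var events are only recorded, not applied yet.
        self.deferred.append((name, value))

    def on_query(self, db, sql):
        pending, self.deferred = self.deferred, []
        if db in self.ignored_dbs:
            return  # the query is filtered, and its deferred events with it
        self.executed.append((sql, dict(pending)))
```

If an insert_id event precedes a filtered query, it is dropped together with that query, so the next query starts with a clean context.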
******
Bug#11754117
Post review fixes.
mysql-test/suite/rpl/r/rpl_auto_increment_bug45679.result:
a new result file is added.
mysql-test/suite/rpl/r/rpl_filter_tables_not_exist.result:
results updated.
mysql-test/suite/rpl/t/rpl_auto_increment_bug45679.test:
regression test for BUG#11754117-45670 is added.
mysql-test/suite/rpl/t/rpl_filter_tables_not_exist.test:
regression test for filtering issue of BUG#11754117 - 45670 is added.
sql/log_event.cc:
Logic is added for deferring and executing events associated
with the Query event.
sql/log_event.h:
Interface to deferred events batch execution is added.
sql/rpl_rli.cc:
initialization for new RLI members is added.
sql/rpl_rli.h:
New members to RLI are added to facilitate deferred events gathering
and execution control;
two general-purpose RLI cleanup methods are added.
sql/rpl_utility.cc:
Deferred_log_events methods are defined.
sql/rpl_utility.h:
A new class Deferred_log_events is defined to implement
IRU events gathering, execution and cleanup.
sql/slave.cc:
Necessary changes to initialize `rli->deferred_events' and prevent
deferred event deletion in the main read-exec branch.
sql/sql_base.cc:
A new safe-check function for multi-part pk with auto-increment is defined
and deployed in lock_tables().
sql/sql_class.cc:
Initialization for a new member and replication cleanups are added
to THD class.
sql/sql_class.h:
THD class receives a new member to hold a specific execution
context for slave applier.
sql/sql_parse.cc:
Execution of the deferred event is started prior to its parent query.
Problem - this failure occurred in the test added for the fix of
bug-13333431. The root cause of the failure was that the
value of report_port persisted even after the end
of the test (i.e. rpl_end.inc). This causes the assertion
in the test to fail if it is executed again.
Fix - restarted the server with the default value passed to
report_port after testing the two expected cases, so that the
next run of the test does not encounter the previous value of
report_port.
mysql-test/suite/rpl/r/rpl_report_port.result:
Updated the corresponding result file.
mysql-test/suite/rpl/t/rpl_report_port-slave.opt:
Removed the slave option file.
mysql-test/suite/rpl/t/rpl_report_port.test:
Added the restart server option before ending the test.
BUG#64503: mysql frequently ignores --relay-log-space-limit
When the SQL thread goes to sleep, waiting for more events, it sets
the flag ignore_log_space_limit to true. This gives the IO thread a
chance to queue some more events and ultimately the SQL thread will be
able to purge the log once it is rotated. By then the SQL thread
resets the ignore_log_space_limit to false. However, between the time
the SQL thread has set the ignore flag and the time it resets it, the
IO thread will be queuing events in the relay log, possibly going way
over the limit.
This patch makes the IO and SQL threads synchronize when they reach
the space limit and only ask for one event at a time. Thus the SQL
thread sets ignore_log_space_limit flag and the IO thread resets it to
false every time it processes one more event. In addition, every time
the SQL thread processes the next event, and the limit has been
reached, it checks if the IO thread should rotate. If it should, it
instructs the IO thread to rotate, giving the SQL thread a chance to
purge the logs (freeing space). Finally, this patch removes the
resetting of the ignore_log_space_limit flag from purge_first_log,
because this is now reset by the IO thread every time it processes the
next event when the limit has been reached.
If the SQL thread is in a transaction, it cannot purge, so there is no
point in asking the IO thread to rotate. The only thing it can do is
to ask for more events until the transaction is over (then it can ask
the IO to rotate and purge the log right away). Otherwise, there would
be a deadlock (SQL would not be able to purge and IO thread would not
be able to queue events so that the SQL would finish the transaction).
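The one-event-at-a-time handshake can be sketched as a single-threaded toy model. The RelayLogSpace class and its method names are hypothetical, rotation and purging are left out, and "limit reached" is modelled simply as used >= limit.

```python
class RelayLogSpace:
    """Toy model of the relay log space limit handshake."""

    def __init__(self, limit, used):
        self.limit = limit
        self.used = used
        self.ignore_log_space_limit = False

    def sql_thread_waits_for_event(self):
        # The SQL thread has nothing to apply and asks for one more event.
        self.ignore_log_space_limit = True

    def io_thread_queue(self, event_size):
        """Queue one event if allowed; return False if the IO thread must wait."""
        if self.used >= self.limit and not self.ignore_log_space_limit:
            return False
        self.used += event_size
        # Reset after every queued event, so the limit is exceeded by
        # at most one event per request from the SQL thread.
        self.ignore_log_space_limit = False
        return True
```

Starting at the limit, the IO thread is blocked until the SQL thread asks for an event, queues exactly one, and is then blocked again.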
Fix - Changed the condition check from a comparison against the result file
to an assert.
mysql-test/suite/rpl/r/rpl_report_port.result:
Updated the result file.
mysql-test/suite/rpl/t/rpl_report_port.test:
Changed from a conditional check to an assert.
Problem - The default port number shown in SHOW SLAVE HOSTS is always 3306
though the slave is actually listening on a different port number.
This is a problem as the user cannot be sure whether this port
value can be trusted, so a client trying to read the replication
topology can get confused.
Fix - 3306 ceases to be the default value of report-port. Moreover report-port
does not have a static default any longer.
Instead we initialize report-port to 0 as the new default value and change
it based on two checks :
1) If report_port is not set, the slave reports the port number it is listening
on (i.e. if report-port is not set we get the actual value of the slave's
port number).
2) If report-port is set, we show the value report-port is set to, as the slave's
port number.
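The two checks amount to a one-line rule, sketched here with a hypothetical function name; 0 stands for "report-port not set", as described above.

```python
def reported_port(report_port, mysqld_port):
    """Return the port shown for the slave in SHOW SLAVE HOSTS (toy model).

    report_port == 0 means the option was not set, so the port the slave
    is actually listening on is reported instead.
    """
    return mysqld_port if report_port == 0 else report_port
```

For example, a slave listening on 13001 with report-port unset reports 13001, and with report-port set to 9000 it reports 9000 (13001 is an arbitrary example value).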
mysql-test/include/show_slave_hosts.inc:
A .inc file is added to use show slave hosts in the new test added.
mysql-test/r/mysqld--help-notwin.result:
Updated the result file to show the default value passed for report-port.
mysql-test/suite/rpl/r/rpl_report_port.result:
The result file for the new test that is added.
mysql-test/suite/rpl/r/rpl_show_slave_hosts.result:
Updated the result file to show the default value passed for report-port.
mysql-test/suite/rpl/t/rpl_report_port-slave.opt:
Option file for the new test added.
mysql-test/suite/rpl/t/rpl_report_port.test:
Added a test to check the correct functionality of report-port.
We check this by running the replication twice.
In the first run we do not set the value of report-port through the opt file
and get the actual port number of the slave's port.
We then restart the server with report-port set to some value (in this case 9000)
and check the value reported for the slave's port number.
mysql-test/suite/sys_vars/t/report_port_basic.test:
Update the test file to show the value for report-port. It is replaced with
SLAVE_PORT as the actual value of the report-port will change with each run.
sql/mysqld.cc:
Changed the value reported by report port :
1. If the value for report-port is not set we assign report-port to be the
actual port number of the slave (mysqld_port).
2. If report-port is set we get the value set for the report-port.
sql/sys_vars.cc:
Passed 0 as the default value of the report-port.
PROBLEM: After WL 4144, when using MyISAM Merge tables, the routine
open_and_lock_tables will append to the list of tables to lock, the
base tables that make up the MERGE table. This has two side-effects in
replication:
1. On the master side, we log additional table maps for the base
tables, since they appear in the list of locked tables, even
though we don't really use them at the slave.
2. On the slave side, when opening a MERGE table while applying a
ROW event, additional tables are appended to the list of tables
to lock.
Side-effect #1 is not harmful. It's just that when using MyISAM Merge
tables a few more table maps may be logged.
Side-effect #2 is harmful, because the list rli->tables_to_lock is an
extended structure from TABLE_LIST in which the extra fields are
filled from the table maps that are processed. Since
open_and_lock_tables appends tables to the list after all table map
events have been processed we end up with entries without
replication/table map data on them. Thus when trying to access that
info for these extra tables, the server will crash.
SOLUTION: We fix side-effect #2 by making sure that we access the
replication part of the structure only for those entries in the list
that were accounted for when processing the corresponding table map
events. All in all, we never go beyond rli->tables_to_lock_count.
We also deploy an assertion when clearing rli->tables_to_lock, making
sure that the base tables are not in the list anymore (were closed in
close_thread_tables).
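The bounded access can be sketched with plain Python lists. The dictionaries, make_entry() and mapped_table_ids() are hypothetical stand-ins for the extended TABLE_LIST entries; only the idea of stopping at tables_to_lock_count comes from the description above.

```python
def make_entry(name, table_id=None):
    """Build a toy list entry; only entries from table maps carry a table_id."""
    entry = {"name": name}
    if table_id is not None:
        entry["table_id"] = table_id
    return entry


def mapped_table_ids(tables_to_lock, tables_to_lock_count):
    """Read replication data only from the first tables_to_lock_count entries."""
    return [entry["table_id"] for entry in tables_to_lock[:tables_to_lock_count]]


# One entry created while processing a table map event.
tables_to_lock = [make_entry("merge_t", table_id=11)]
tables_to_lock_count = len(tables_to_lock)

# Opening the MERGE table appends its base tables without table map data.
tables_to_lock += [make_entry("base_t1"), make_entry("base_t2")]
```

Iterating up to tables_to_lock_count yields only the mapped entry, whereas walking the whole list would hit entries with no replication data (a KeyError in this toy model, a crash in the server).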
Problem: Statements that write to tables with auto_increment columns
based on the selection from another table, may lead to master
and slave going out of sync, as the order in which the rows
are retrieved from the table may differ on master and slave.
Solution: We mark writing to a table with an auto_increment column
based on the rows selected from another table as unsafe. This
will cause the execution of such statements to throw a warning
and force the statement to be logged in ROW format if the logging
format is mixed.
Changes:
1. All the statements that write to a table with auto_increment
column(s) based on the rows fetched from another table, will now
be unsafe.
2. CREATE TABLE with SELECT will now be unsafe.
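The effect of marking a statement unsafe can be sketched as a small decision function. Both function names are hypothetical, and the model only covers the format choice described above (unsafe statements are logged in ROW format when the logging format is mixed); warnings are not modelled.

```python
def is_unsafe(writes_auto_increment_table, selects_from_other_table):
    """Toy rule: writing auto_increment rows based on another table's rows."""
    return writes_auto_increment_table and selects_from_other_table


def effective_format(binlog_format, unsafe):
    """Return the format a statement is logged in under this toy model."""
    if binlog_format == "MIXED":
        return "ROW" if unsafe else "STATEMENT"
    return binlog_format
```

For instance, an INSERT ... SELECT into a table with an auto_increment column is logged as ROW under MIXED, while the same statement under STATEMENT stays statement-based.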
sql/share/errmsg-utf8.txt:
Added new warning messages.
sql/sql_base.cc:
-Created function to check statements that write to
tables with auto_increment column and has select.
-Marked all the statements that write to a table
with auto_increment column based on rows fetched
from other table(s) as unsafe.
sql/sql_table.cc:
mark CREATE TABLE[with auto_increment column] as unsafe.
Problem: Statements that write to tables with auto_increment columns
based on the selection from another table, may lead to master
and slave going out of sync, as the order in which the rows
are retrieved from the table may differ on master and slave.
Solution: We mark writing to a table with an auto_increment column
as unsafe. This will cause the execution of such statements to
throw a warning and force the statement to be logged in ROW format if
the logging format is mixed.
Changes:
1. All the statements that write to a table with auto_increment
column(s) based on the rows fetched from another table, will now
be unsafe.
2. CREATE TABLE with SELECT will now be unsafe.
sql/share/errmsg-utf8.txt:
Added new Warning messages
sql/sql_base.cc:
created a new function that checks for select + write on an autoinc table
made all such statements unsafe.
sql/sql_parse.cc:
made create autoincrement table + select unsafe
rpl_heartbeat_basic test fails sporadically on pushbuild because it does
not receive all heartbeats from the slave in circular replication.
MASTER_HEARTBEAT_PERIOD had the default value (slave_net_timeout/2), so
the wait on "Heartbeat event received on master", which only waits for 1
minute, sometimes times out before a heartbeat arrives. Fixed by setting a
smaller period value.
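The timing problem is simple arithmetic, sketched below. The slave_net_timeout value of 3600 seconds is an assumption about the server default at the time and is not stated in this log; the 5-second period is an arbitrary example of "a smaller period value".

```python
WAIT_TIMEOUT_SECONDS = 60  # the wait in the test lasts 1 minute

# Assumption: slave_net_timeout value in effect for the test, in seconds.
ASSUMED_SLAVE_NET_TIMEOUT = 3600


def default_heartbeat_period(slave_net_timeout):
    """Default MASTER_HEARTBEAT_PERIOD as described above: slave_net_timeout/2."""
    return slave_net_timeout / 2


def wait_can_time_out(heartbeat_period, wait_timeout=WAIT_TIMEOUT_SECONDS):
    """True if a heartbeat may arrive only after the wait has already expired."""
    return heartbeat_period > wait_timeout
```

Under the assumed timeout the default period is 1800 seconds, far longer than the 60-second wait, whereas a period of a few seconds always fits inside it.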