The function mysql_show_binlog_events has a local stack variable
'LOG_INFO linfo;', which is assigned to thd->current_linfo, however
this variable goes out of scope and is destroyed before clean
thd->current_linfo.
The problem is solved by moving 'LOG_INFO linfo;' to function scope.
BUG#11761686 insert_id event is not filtered.
Two issues are covered.
INSERT into autoincrement field which is not the first part in the composed primary key
is unsafe by autoincrement logging design. The case is specific to MyISAM engine
because Innodb does not allow such table definition.
However no warnings and row-format logging in the MIXED mode was done, and
that is fixed.
Int-, Rand-, User-var log-events were not filtered along with their parent
query that made possible them to screw up execution context of the following
query.
Fixed with deferring their execution until the parent query.
******
Bug#11754117
Post review fixes.
PROBLEM:
--------
When binary log statements are replayed on the slave, BEGIN is represented
in com_counters but COMMIT is not. Similarly in 'ROW' based replication
'INSERT','UPDATE',and 'DELETE' com_counters are not getting incremented
when the binary log statements are replayed at slave.
ANALYSIS:
---------
In 'ROW' based replication for COMMIT,INSERT,UPDATE and DELETE operations
following special events are invoked.
Xid_log_event,Write_rows_log_event,Update_rows_log_event,Update_rows_log_event.
The above mentioned events doesn't go through the parser where the
'COM_COUNTERS' are incremented.
FIX:
-----
Increment statements are added at appropriate events.
Respective functions are listed below.
'Xid_log_event::do_apply_event'
'Write_rows_log_event::do_before_row_operations'
'Update_rows_log_event::do_before_row_operations'
'Delete_rows_log_event::do_before_row_operations'
Problem - this failure occured in the test added for the fix of the
bug-13333431. The basic problem of the failure was the
value of the report_port which persisted even after the end
of the test (ie. rpl_end.inc). So this causes the assertion
in the test to fail if it is executed again.
Fix - restarted the server with the default value being passed to the
report_port after testing the two expected case so that in the
next run of the test we will not encounter the previous value of
report_port.
Description: When the table has more than one unique or primary key,
INSERT... ON DUP KEY UPDATE statement is sensitive to the order in which
the storage engines checks the keys. Depending on this order, the storage
engine may determine different rows to mysql, and hence mysql can update
different rows on master and slave.
Solution: We mark INSERT...ON DUP KEY UPDATE on a table with more than on unique
key as unsafe therefore the event will be logged in row format if it is available
(ROW/MIXED). If only STATEMENT format is available, a warning will be thrown.
BUG#64503: mysql frequently ignores --relay-log-space-limit
When the SQL thread goes to sleep, waiting for more events, it sets
the flag ignore_log_space_limit to true. This gives the IO thread a
chance to queue some more events and ultimately the SQL thread will be
able to purge the log once it is rotated. By then the SQL thread
resets the ignore_log_space_limit to false. However, between the time
the SQL thread has set the ignore flag and the time it resets it, the
IO thread will be queuing events in the relay log, possibly going way
over the limit.
This patch makes the IO and SQL thread to synchronize when they reach
the space limit and only ask for one event at a time. Thus the SQL
thread sets ignore_log_space_limit flag and the IO thread resets it to
false everytime it processes one more event. In addition, everytime
the SQL thread processes the next event, and the limit has been
reached, it checks if the IO thread should rotate. If it should, it
instructs the IO thread to rotate, giving the SQL thread a chance to
purge the logs (freeing space). Finally, this patch removes the
resetting of the ignore_log_space_limit flag from purge_first_log,
because this is now reset by the IO thread every time it processes the
next event when the limit has been reached.
If the SQL thread is in a transaction, it cannot purge so, there is no
point in asking the IO thread to rotate. The only thing it can do is
to ask for more events until the transaction is over (then it can ask
the IO to rotate and purge the log right away). Otherwise, there would
be a deadlock (SQL would not be able to purge and IO thread would not
be able to queue events so that the SQL would finish the transaction).
Problem - The default port number shown in SHOW SLAVE HOSTS is always 3306
though the slave is actually listening on a different port number.
This is a problem as the user can not be sure whether this port
value can be trusted and so client trying to read replication
topology can get confused.
Fix - 3306 ceases to be the default value of report-port. Moreover report-port
does not have a static default any longer.
Instead we initialize report-port to 0 as the new default value and change
it based on two checks :
1) If report_port is not set, the slave reports the port number its listening
on. (i.e. if report-port is not set we get the actual value of the slave's
port number).
2) If report-port is set, we show the value report-port is set to, as the slave's
port number.
PROBLEM: After WL 4144, when using MyISAM Merge tables, the routine
open_and_lock_tables will append to the list of tables to lock, the
base tables that make up the MERGE table. This has two side-effects in
replication:
1. On the master side, we log additional table maps for the base
tables, since they appear in the list of locked tables, even
though we don't really use them at the slave.
2. On the slave side, when opening a MERGE table while applying a
ROW event, additional tables are appended to the list of tables
to lock.
Side-effect #1 is not harmful. It's just that when using MyISAM Merge
tables a few table maps more may be logged.
Side-effect #2, is harmful, because the list rli->tables_to_lock is an
extended structure from TABLE_LIST in which the extra fields are
filled from the table maps that are processed. Since
open_and_lock_tables appends tables to the list after all table map
events have been processed we end up with entries without
replication/table map data on them. Thus when trying to access that
info for these extra tables, the server will crash.
SOLUTION: We fix side-effect #2 by making sure that we access the
replication part of the structure for those in the list that were
accounted for when processing the correspondent table map events. All
in all, we never go beyond rli->tables_to_lock_count.
We also deploy an assertion when clearing rli->tables_to_lock, making
sure that the base tables are not in the list anymore (were closed in
close_thread_tables).
Problem: Statements that write to tables with auto_increment columns
based on the selection from another table, may lead to master
and slave going out of sync, as the order in which the rows
are retrieved from the table may differ on master and slave.
Solution: We mark writing to a table with auto_increment table
based on the rows selected from another table as unsafe. This
will cause the execution of such statements to throw a warning
and forces the statement to be logged in ROW if the logging
format is mixed.
Changes:
1. All the statements that writes to a table with auto_increment
column(s) based on the rows fetched from another table, will now
be unsafe.
2. CREATE TABLE with SELECT will now be unsafe.
Problem: Statements that write to tables with auto_increment columns
based on the selection from another table, may lead to master
and slave going out of sync, as the order in which the rows
are retrived from the table may differ on master and slave.
Solution: We mark writing to a table with auto_increment table
as unsafe. This will cause the execution of such statements to
throw a warning and forces the statement to be logged in ROW if
the logging format is mixed.
Changes:
1. All the statements that writes to a table with auto_increment
column(s) based on the rows fetched from another table, will now
be unsafe.
2. CREATE TABLE with SELECT will now be unsafe.
rpl_heartbeat_basic test fails sporadically on pushbuild because did
not received all heartbeats from slave in circular replication.
MASTER_HEARTBEAT_PERIOD had the default value (slave_net_timeout/2) so
wait on "Heartbeat event received on master", that only waits for 1
minute, sometimes timeout before heartbeat arrives. Fixed setting a
smaller period value.
Problem : The basic problem is the way the thread sleeps in mysql-5.5 and also in mysql-5.1
when we execute a stop slave on windows platform.
On windows platform if the stop slave is executed after the master dies, we have
this long wait before the stop slave return a value. This is because there is a
sleep of the thread. The sleep is uninterruptable in the two above version,
which was fixed by Davi patch for the BUG#11765860 for mysql-trunk. Backporting
his patch for mysql-5.5 fixes the problem.
Solution : A new pair of mutex and condition variable is introduced to synchronize thread
sleep and finalization. A new mutex is required because the slave threads are
terminated while holding the slave thread locks (run_lock), which can not be
relinquished during termination as this would affect the lock order.
If an error message contains '\' backslash it is displayed correctly
through show-slave-status or
query_get_value(SHOW SLAVE STATUS, Last_IO_Error, 1);. But when
SELECT REPLACE(...) is applied backslash is escaped resulting in a
different test output.
Disabled backslash escape on show_slave_status.inc and replaced '\' for
'/' using replace_regex function in order to achieve the same test
output when different path separators are used.
The server crashes when receiving a COM_BINLOG_DUMP command with a position of 0 or
larger than the file size.
The execution proceeds to an error block having the last read replication coordinates
pointer be NULL and its dereferencing crashed the server.
Fixed with making "public" previously used only for heartbeat coordinates.
Test extra/rpl_tests/rpl_extra_col_master.test (used by
rpl_extra_col_master_*) ends with the active connection pointing to the
slave. Thus, the two last tests never succeed in changing the binlog
format of the master away from 'row'. With correct active connection
(master) tests fail for binlog 'statement' and 'mixed' formats.
Tests rpl_extra_col_master_* only run when binary log format is
row. Statement and mix replication do not make sense in this
tests since it will try to execute statements on columns that do
not exist. This fix is basically a backport from mysql-5.5, see
changes done as part of BUG 39934.
There was memory leak when running some tests on PB2.
The reason of the failure is an early return from change_master()
that was supposed to deallocate a dyn-array.
Actually the same bug58915 was fixed in trunk with relocating the dyn-array
destruction into THD::cleanup_after_query() which can't be bypassed.
The current patch backports magne.mahre@oracle.com-20110203101306-q8auashb3d7icxho
and adds two optimizations: were done: the static buffer for the dyn-array to base on,
and the array initialization is called precisely when it's necessary rather than
per each CHANGE-MASTER as before.
Refactored the test case: hardened and extended it. Created test inc file
to abstract the task of relocating binlogs.
Also, disabled it on windows for not cluttering the test case any further,
as it depends heavily on doing filesystem operations and path handling.
There was memory leak when running some tests on PB2.
The reason of the failure is an early return from change_master()
that was supposed to deallocate a dyn-array.
Fixed with relocating the dyn-array's destructor at ~LEX() that is
the end of the session, per Gleb's patch idea.
Two optimizations were done: the static buffer for the dyn-array to base on,
and the array initialization is called precisely when it's necessary rather than
per each CHANGE-MASTER as before.
BIN LOG HAS BEEN MOVED
When moving the binary/relay log files from one location to
another and restarting the server with a different log-bin or
relay-log paths, would cause the startup process to abort. The
root cause was that the server would not be able to find the log
files because it would consider old paths for entries in the
index file instead of the new location. What's even worse, the
relative paths would not be considered relative to the path
provided in log-bin and relay-log, but to mysql_data_dir.
We fix the cases where the server contains relative paths. When
the server is reading from the index file, it checks whether the
entry contains relative paths. If it does, we replace it with the
absolute path set in log-bin/relay-log option. Absolute paths
remain unchanged and the index must be manually edited to
consider the new log-bin and/or relay-log path (this should be
documented). This is a fix for a GA version, that does not break
behavior (that much).
For development versions, we should go with Zhenxing's approach
that removes paths altogether from index files.
When passing an empty user to the connect function will cause
valgrind warnings. Seems that the client code is not prepared
to handle empty users. On 5.6 this can even be triggered by
START SLAVE PASSWORD='...'; i.e., without setting USER='...' on
the START SLAVE command (see WL#4143 for details on the new
additional START SLAVE commands).
To fix this, we disallow empty users when configuring the slave
connection parameters (this decision might be revisited if the
client code accepts empty users in the future).
SCAN/CPU) => SLAVE FAILURE
When a statement containing a large amount of ROWs to be applied on
the slave, and the slave's table does not contain a PK, it can take a
considerable amount of time to find and change all the rows that are
to be changed.
The proper slave enhancement will be implemented in WL 5597. However,
in this bug we are making it clear to the user what the problem is, by
printing a message to the error log if the execution time, for a given
statement in RBR, takes more than LONG_FIND_ROW_THRESHOLD (set to 60
seconds). This shall help the DBA to diagnose what's happening when
facing a slave server that is quiet for no apparent reason...
The note is only printed to the error log if log_warnings is set to be
greater than 1.
post-push fixes for show_slave_io_error= 1 of wait_for_slave_io_error.inc;
Unix and win format path specifically so few tests have to change show_slave_io_error
to zero.
The bug case is similar to one fixed earlier bug_49536.
Deadlock involving LOCK_log appears to be possible because the purge running thread
is holding LOCK_log whereas there is no sense of doing that and which fact was
exploited by the earlier bug fixes.
Fixed with small reengineering of rotate_and_purge(), adding two new methods and
setting up a policy to execute those instead of the former
rotate_and_purge(RP_LOCK_LOG_IS_ALREADY_LOCKED).
The policy for using rotate(), purge() is that if the caller acquires LOCK_log itself,
it should call rotate(), release the mutex and run purge().
Side effect of this patch is refining error message of bug@11747416 to print
the whole path.
Binary log of master can get a partially logged event if the server
runs out of disk space and, while waiting for some space to be freed,
is shut down (or crashes). If the server is not stopped, it will just
wait endlessly for space to be freed, thus no partial event anomaly
occurs. The restarted master server has had a dubious policy to send
the incomplete event to slave which it apparently can't handle.
Although an error was printed out the fact of sending with unclear
error message is a source of confusion.
Actually the problem of presence an incomplete event in the binary log
was already fixed by WL 5493 (which was merged to our current trunk
branch, major version 5.6). The fix makes the server truncate the
binary log on server restart and recovery.
However 5.5 master can't do that. So the current issue is a problem of
sending incomplete events to the slave by 5.5 master.
It is fixed in this patch by changing the policy so that only complete
events are pushed by the dump thread to the IO thread. In addition,
the error text that master sends to the slave when an incomplete event
is found, now states that incomplete event may have been caused by an
out-of-disk space situation and provides coordinates of
the first and the last event bytes read.
Problem: The following statements can cause the slave to go out of sync
if logged in statement format:
INSERT IGNORE...SELECT
INSERT ... SELECT ... ON DUPLICATE KEY UPDATE
REPLACE ... SELECT
UPDATE IGNORE :
CREATE ... IGNORE SELECT
CREATE ... REPLACE SELECT
Background: Since the order of the rows returned by the SELECT
statement or otherwise may differ on master and slave, therefore
the above statements may cuase the salve to go out of sync with
the master.
Fix:
Issue a warning when statements like the above are exectued and
the bin-logging format is statement. If the logging format is mixed,
use row based logging. Marking a statement as unsafe has been
done in the sql/sql_parse.cc instead of sql/sql_yacc.cc, because while
parsing for a token has been done we cannot be sure if the parsing
of the other tokens has been done as well.
Six new warning messages has been added for each unsafe statement.
binlog.binlog_unsafe.test has been updated to incoporate these additional unsafe statments.
******
BUG#11758262 - 50439: MARK INSERT...SEL...ON DUP KEY UPD,REPLACE...SEL,CREATE...[IGN|REPL] SEL
Problem: The following statements can cause the slave to go out of sync
if logged in statement format:
INSERT IGNORE...SELECT
INSERT ... SELECT ... ON DUPLICATE KEY UPDATE
REPLACE ... SELECT
UPDATE IGNORE :
CREATE ... IGNORE SELECT
CREATE ... REPLACE SELECT
Background: Since the order of the rows returned by the SELECT
statement or otherwise may differ on master and slave, therefore
the above statements may cuase the salve to go out of sync with
the master.
Fix:
Issue a warning when statements like the above are exectued and
the bin-logging format is statement. If the logging format is mixed,
use row based logging. Marking a statement as unsafe has been
done in the sql/sql_parse.cc instead of sql/sql_yacc.cc, because while
parsing for a token has been done we cannot be sure if the parsing
of the other tokens has been done as well.
Six new warning messages has been added for each unsafe statement.
binlog.binlog_unsafe.test has been updated to incoporate these additional unsafe statments.
Before BUG#28796, an empty host was used to identify that an instance was no
longer a slave. However, BUG#28796 changed this behavior and one cannot set
an empty host. Besides, a RESET SLAVE only cleans up information on the next
event to retrieve from the master, disables ssl and resets heartbeat period.
So a call to SHOW SLAVE STATUS after issuing a RESET SLAVE still returns some
valid information, such as host, port, user and password.
To fix this problem, we have introduced the command RESET SLAVE ALL that does
what a regular RESET SLAVE does and also clears host, port, user and password
information thus allowing users to identify when an instance is no longer a
slave.