RPL_ROW_UNTIL TIMES OUT
patch to fix post push falures in pb2
mysql-test/suite/rpl/r/rpl_row_until.result:
changes to account for the changes made in
corresponding test file.
mysql-test/suite/rpl/t/disabled.def:
disabled test in macosx
mysql-test/suite/rpl/t/rpl_row_until.test:
replaced static relayy log file by an mtr variable
which saves the name of relay log file.
=== Problem ===
The test is dependent on binlog positions and checks
to see if the command 'START SLAVE' functions correctly
with the 'UNTIL' clause added to it. The 'UNTIL' clause
is added to specify that the slave should start and run
until the SQL thread reaches a given point in the master
binary log or in the slave relay log.
The test uses hard coded values for MASTER_LOG_POS and
RELAY_LOG_POS, instead of extracting it using
query_get_value() function. There is a test
'rpl.rpl_row_until' which does the similar thing but uses
query_get_value() function to set the values of
MASTER_LOG_POS/ RELAY_LOG_POS. To be precise,
rpl.rpl_row_until is a modified version of
engines/func.rpl_row_until.test.
The use of hard coded values may lead the slave to stop at a position
which may differ from the expected position in the binlog file,
an example being the failure of engines/funcs.rpl_row_until in
mysql-5.1 given as:
"query 'select * from t2' failed. Table 'test.t2' doesn't exist".
In this case, the slave actually ran a couple of extra commands
as a result of which the slave first deleted the table and then
ran a select query on table, leading to the above mentioned failure.
=== Fix ===
1) Fixed the code for failure seen in rpl.rpl_row_until.
This test was also failing although the symptoms of
failure were different.
2) Copied the contents from rpl.rpl_row_until into
into engines/funcs.rpl.rpl_row_until.
3) Updated engines/funcs.rpl_row_until.result accordingly.
mysql-test/suite/engines/funcs/r/rpl_row_until.result:
modified to accomodate the changes in corresponding
test file.
mysql-test/suite/engines/funcs/t/disabled.def:
removed from the list of disabled tests.
mysql-test/suite/engines/funcs/t/rpl_row_until.test:
fixed rpl.rpl_row_until and copied its content to
engines/funcs.rpl_row_until. The reason being both
are same tests but rpl.rpl_row_until is an
updated version.
mysql-test/suite/rpl/t/disabled.def:
removed from the list of disabled tests.
sql/sql_repl.cc:
Added a check to catch an improper combination
of arguements passed to 'START SLAVE UNTIL'. Earlier,
START SLAVE UNTIL MASTER_LOG_FILE='master-bin.000001',
MASTER_LOG_POS=561, RELAY_LOG_POS=12;
passed. It is now detected and an error is reported.
btr_lift_page_up() writes wrong page number (different by -1) for upper than father page.
But in almost all of the cases, the father page should be root page, no upper
pages. It is very rare path.
In addition the leaf page should not be lifted unless the father page is root.
Because the branch pages should not become the leaf pages.
rb://1336 approved by Marko Makela.
main.mysqlbinlog_row_innodb are skipped by mtr
=== Problem ===
The following tests are wrongly placed in main suite and as a
result these are not run with proper binlog format combinations.
Some are always skipped by mtr.
1) mysqlbinlog_row_myisam
2) mysqlbinlog_row_innodb
3) mysqlbinlog_row.test
4) mysqlbinlog_row_trans.test
5) mysqlbinlog-cp932
6) mysqlbinlog2
7) mysqlbinlog_base64
=== Background ===
mtr runs the tests placed in main suite with binlog format=stmt.
Those that need to be tested against binlog format=row or mixed
or more than one binlog format and require only one mysql server
are placed in binlog suite. mtr runs tests in binlog suite with
all three binlog formats(stmt,row and mixed).
=== Fix ===
1) Moved the test listed in problem section above to binlog suite.
2) Added prefix "binlog_" to the name of each test case moved.
Renamed the coresponding result files and option files accordingly.
mysql-test/extra/binlog_tests/mysqlbinlog_row_engine.inc:
include file for mysqlbinlog_row_myisam.test and
mysqlbinlog_row_myisam.test which are being moved to
binlog suite.
mysql-test/suite/binlog/r/binlog_mysqlbinlog-cp932.result:
result file for mysqlbinlog-cp932.test which is being moved to
binlog suite.
mysql-test/suite/binlog/r/binlog_mysqlbinlog2.result:
result file for mysqlbinlog2.test which is being moved to
binlog suite.
mysql-test/suite/binlog/r/binlog_mysqlbinlog_base64.result:
result file for mysqlbinlog_base64.test which is being moved to
binlog suite.
mysql-test/suite/binlog/r/binlog_mysqlbinlog_row.result:
result file for mysqlbinlog_row.test which is being moved to
binlog suite.
mysql-test/suite/binlog/r/binlog_mysqlbinlog_row_innodb.result:
result file for mysqlbinlog_row_innodb.test which is being moved to
binlog suite.
mysql-test/suite/binlog/r/binlog_mysqlbinlog_row_myisam.result:
result file for mysqlbinlog_row_myisam.test which is being moved to
binlog suite.
mysql-test/suite/binlog/r/binlog_mysqlbinlog_row_trans.result:
result file for mysqlbinlog_row_trans.test which is being moved to
binlog suite.
mysql-test/suite/binlog/t/binlog_mysqlbinlog-cp932-master.opt:
option file for mysqlbinlog-cp932.test which is being moved to
binlog suite.
mysql-test/suite/binlog/t/binlog_mysqlbinlog-cp932.test:
the test requires binlog format=stmt or mixed. Since, it was placed in
main suite earlier, it was only run with binlog format=stmt, and hence
this test was never run with binlog format=mixed.
mysql-test/suite/binlog/t/binlog_mysqlbinlog2.test:
the test requires binlog format=stmt or mixed. Since, it was placed in
main suite earlier, it was only run with binlog format=stmt, and hence
this test was never run with binlog format=mixed.
mysql-test/suite/binlog/t/binlog_mysqlbinlog_base64.test:
the test requires binlog format=row. Since, it was placed in main
suite earlier, it was only run with binlog format=stmt, and hence
this test was always skipped by mtr.
mysql-test/suite/binlog/t/binlog_mysqlbinlog_row.test:
the test requires binlog format=row. Since, it was placed in main
suite earlier, it was only run with binlog format=stmt, and hence
this test was always skipped by mtr.
mysql-test/suite/binlog/t/binlog_mysqlbinlog_row_innodb.test:
the test requires binlog format=row. Since, it was placed in main
suite earlier, it was only run with binlog format=stmt, and hence
this test was always skipped by mtr.
mysql-test/suite/binlog/t/binlog_mysqlbinlog_row_myisam.test:
the test requires binlog format=row. Since, it was placed in main
suite earlier, it was only run with binlog format=stmt, and hence
this test was always skipped by mtr.
mysql-test/suite/binlog/t/binlog_mysqlbinlog_row_trans.test:
the test requires binlog format=row. Since, it was placed in main
suite earlier, it was only run with binlog format=stmt, and hence
this test was always skipped by mtr.
SECONDARY INDEX UPDATES MAKE CONSISTENT READS DO O(N^2) UNDO PAGE
LOOKUPS (honoring kill query while accessing sec_index)
If secondary index is being used for select query evaluation and this
query is operating with consistent read snapshot it might take good time for
secondary index to return back control to mysql as MVCC would kick in.
If user issues "kill query <id>" while query is actively accessing
secondary index it will not be honored as there is no hook to check
for this condition. Added hook for this check.
-----
Parallely secondary index taking too long to evaluate for consistent
read snapshot case is being examined for performance improvement. WL#6540.
QUOTING IN REPLICATION
Problem: Misquoting or unquoted identifiers may lead to
incorrect statements to be logged to the binary log.
Fix: we use specialized functions to append quoted identifiers in
the statements generated by the server.
THOUGH IT IS NOT.
The following error message is misleading because it claims
that the BLOB space is not counted.
"ERROR 1118 (42000): Row size too large. The maximum row size for
the used table type, not counting BLOBs, is 8126. You have to
change some columns to TEXT or BLOBs"
When the ROW_FORMAT=compact or ROW_FORMAT=REDUNDANT is used,
the BLOB prefix is stored inline along with the row. So
the above error message is changed as follows depending on
the row format used:
For ROW_FORMAT=COMPRESSED or ROW_FORMAT=DYNAMIC, the error
message is as follows:
"ERROR 42000: Row size too large (> 8126). Changing some
columns to TEXT or BLOB may help. In current row format,
BLOB prefix of 0 bytes is stored inline."
For ROW_FORMAT=COMPACT or ROW_FORMAT=REDUNDANT, the error
message is as follows:
"ERROR 42000: Row size too large (> 8126). Changing some
columns to TEXT or BLOB or using ROW_FORMAT=DYNAMIC or
ROW_FORMAT=COMPRESSED may help. In current row
format, BLOB prefix of 768 bytes is stored inline."
rb://1252 approved by Marko Makela
Problem description:
Table 't' created with two colums having compound index on both the
columns under innodb/myisam engine at remote machine. In the local
machine same table is created undet the federated engine.
A select having where clause with along 'AND' operation gives wrong
results on local machine.
Analysis:
The given query at federated engine is wrongly transformed by
federated::create_where_from_key() function and the same was sent to
the remote machine. Hence the local machine is showing wrong results.
Given query "select c1 from t where c1 <= 2 and c2 = 1;"
Query transformed, after ha_federated::create_where_from_key() function is:
SELECT `c1`, `c2` FROM `t` WHERE (`c1` IS NOT NULL ) AND
( (`c1` >= 2) AND (`c2` <= 1) ) and the same sent to real_query().
In the above the '<=' and '=' conditions were transformed to '>=' and
'<=' respectively.
ha_federated::create_where_from_key() function behaving as below:
The key_range is having both the start_key and end_key. The start_key
is used to get "(`c1` IS NOT NULL )" part of the where clause, this
transformation is correct. The end_key is used to get "( (`c1` >= 2)
AND (`c2` <= 1) )", which is wrong, here the given conditions('<=' and '=')
are changed as wrong conditions('>=' and '<=').
The end_key is having {key = 0x39fa6d0 "", length = 10, keypart_map = 3,
flag = HA_READ_AFTER_KEY}
The store_length is having value '5'. Based on store_length and length
values the condition values is applied in HA_READ_AFTER_KEY switch case.
The switch case 'HA_READ_AFTER_KEY' is applicable to only the last part of
the end_key and for previous parts it is going to 'HA_READ_KEY_OR_NEXT' case,
here the '>=' is getting added as a condition instead of '<='.
Fix:
Updated the 'if' condition in 'HA_READ_AFTER_KEY' case to affect for all
parts of the end_key. i.e 'i > 0' will used for end_key, Hence added it in
the if condition.
mysql-test/suite/federated/federated.test:
modified the federated.inc file location
mysql-test/suite/federated/federated_archive.test:
modified the federated.inc file location
mysql-test/suite/federated/federated_bug_13118.test:
modified the federated.inc file location
mysql-test/suite/federated/federated_bug_25714.test:
modified the federated.inc file location
mysql-test/suite/federated/federated_bug_35333.test:
modified the federated.inc file location
mysql-test/suite/federated/federated_debug.test:
modified the federated.inc file location
mysql-test/suite/federated/federated_innodb.test:
modified the federated.inc file location
mysql-test/suite/federated/federated_server.test:
modified the federated.inc file location
mysql-test/suite/federated/federated_transactions.test:
modified the federated.inc file location
mysql-test/suite/federated/include/federated.inc:
moved the file from federated suite to federated/include folder
mysql-test/suite/federated/include/federated_cleanup.inc:
moved the file from federated suite to federated/include folder
mysql-test/suite/federated/include/have_federated_db.inc:
moved the file from federated suite to federated/include folder
storage/federated/ha_federated.cc:
updated the 'if condition' in ha_federated::create_where_from_key()
function.
Backporting the WL#5716, "Information schema table for InnoDB
buffer pool information". Backporting revisions 2876.244.113,
2876.244.102 from mysql-trunk.
rb://1175 approved by Jimmy Yang.
Print the warning(note):
YEAR(x) is deprecated and will be removed in a future release. Please use YEAR(4) instead
on "CREATE TABLE ... YEAR(x)" or "ALTER TABLE MODIFY ... YEAR(x)", where x != 4
Problem
========
Replication breaks in the cases if the event length exceeds
the size of master Dump thread's max_allowed_packet.
The reason why this failure is occuring is because the event length is
more than the total size of the max_allowed_packet, on addition of the
max_event_header length exceeds the max_allowed_packet of the DUMP thread.
This causes the Dump thread to break replication and throw an error.
That can happen e.g with row-based replication in Update_rows event.
Fix
====
The problem is fixed in 2 steps:
1.) The Dump thread limit to read event is increased to the upper limit
i.e. Dump thread reads whatever gets logged in the binary log.
2.) On the slave side we increase the the max_allowed_packet for the
slave's threads (IO/SQL) by increasing it to 1GB.
This is done using the new server option (slave_max_allowed_packet)
included, is used to regulate the max_allowed_packet of the
slave thread (IO/SQL) by the DBA, and facilitates the sending of
large packets from the master to the slave.
This causes the large packets to be received by the slave and apply
it successfully.
sql/log_event.cc:
The max_allowed_packet is not evaluated to the new option
slave_max_allowed_packet after the fix.
sql/log_event.h:
Added the new option in the log_event.h file.
sql/mysqld.cc:
Added a new option to the server.
sql/slave.cc:
Increasing the session max_allowed_packet to a large value,
i.e. not taking global(max_allowed) into consideration, for the slave's threads.
sql/sql_repl.cc:
The dump thread's max_allowed_packet is set to the upper limit
which makes it independent and it now reads whatever gets
logged in the binary log.
Problem
========
SQL statements close to the size of max_allowed_packet produce binary
log events larger than max_allowed_packet.
The reason why this failure is occuring is because the event length is
more than the total size of the max_allowed_packet + max_event_header
length. Now since the event length exceeds this size master Dump
thread is unable to send the packet on to the slave.
That can happen e.g with row-based replication in Update_rows event.
Fix
====
The problem was fixed by increasing the max_allowed_packet for the
slave's threads (IO/SQL) by increasing it to 1GB.
This is done using the new server option included which is used to
regulate the max_allowed_packet of the slave thread (IO/SQL).
This causes the large packets to be received by the slave and apply
it successfully.
sql/log_event.h:
Added the new option in the log_event.h file.
sql/mysqld.cc:
Added a new option to the server.
sql/slave.cc:
Increasing the session max_allowed_packet to a large value ,
i.e. not taking global(max_allowed) into consideration, for the slave's threads.
BY A CONCURRENT TRANSACTIO
The member function QUICK_RANGE_SELECT::init_ror_merged_scan() performs
a table handler clone. Innodb does not provide a clone operation.
The ha_innobase::clone() is not there. The handler::clone() does not
take care of the ha_innobase->prebuilt->select_lock_type. Because of
this what happens is that for one index we do a locking read, and
for the other index we were doing a non-locking (consistent) read.
The patch introduces ha_innobase::clone() member function.
It is implemented similar to ha_myisam::clone(). It calls the
base class handler::clone() and then does any additional operation
required. I am setting the ha_innobase->prebuilt->select_lock_type
correctly.
rb://1060 approved by Marko
The function mysql_show_binlog_events has a local stack variable
'LOG_INFO linfo;', which is assigned to thd->current_linfo, however
this variable goes out of scope and is destroyed before clean
thd->current_linfo.
The problem is solved by moving 'LOG_INFO linfo;' to function scope.
BUG#11761686 insert_id event is not filtered.
Two issues are covered.
INSERT into autoincrement field which is not the first part in the composed primary key
is unsafe by autoincrement logging design. The case is specific to MyISAM engine
because Innodb does not allow such table definition.
However no warnings and row-format logging in the MIXED mode was done, and
that is fixed.
Int-, Rand-, User-var log-events were not filtered along with their parent
query that made possible them to screw up execution context of the following
query.
Fixed with deferring their execution until the parent query.
******
Bug#11754117
Post review fixes.
mysql-test/suite/rpl/r/rpl_auto_increment_bug45679.result:
a new result file is added.
mysql-test/suite/rpl/r/rpl_filter_tables_not_exist.result:
results updated.
mysql-test/suite/rpl/t/rpl_auto_increment_bug45679.test:
regression test for BUG#11754117-45670 is added.
mysql-test/suite/rpl/t/rpl_filter_tables_not_exist.test:
regression test for filtering issue of BUG#11754117 - 45670 is added.
sql/log_event.cc:
Logics are added for deferring and executing events associated
with the Query event.
sql/log_event.h:
Interface to deferred events batch execution is added.
sql/rpl_rli.cc:
initialization for new RLI members is added.
sql/rpl_rli.h:
New members to RLI are added to facilitate deferred events gathering
and execution control;
two general character RLI cleanup methods are constructed.
sql/rpl_utility.cc:
Deferred_log_events methods are difined.
sql/rpl_utility.h:
A new class Deferred_log_events is defined to implement
IRU events gathering, execution and cleanup.
sql/slave.cc:
Necessary changes to initialize `rli->deferred_events' and prevent
deferred event deletion in the main read-exec branch.
sql/sql_base.cc:
A new safe-check function for multi-part pk with auto-increment is defined
and deployed in lock_tables().
sql/sql_class.cc:
Initialization for a new member and replication cleanups are added
to THD class.
sql/sql_class.h:
THD class receives a new member to hold a specific execution
context for slave applier.
sql/sql_parse.cc:
Execution of the deferred event in started prior to its parent query.
Currently SHOW MASTER LOGS and SHOW BINARY LOGS require the SUPER
privilege. Monitoring tools (such as MEM) often want to check this
output - for instance MEM generates the SUM of the sizes of the logs
reported here, and puts that in the Replication overview within the MEM
Dashboard.
However, because of the SUPER requirement, these tools often have an
account that holds open the connection whilst monitoring, and can lock
out administrators when the server gets overloaded and reaches
max_connections - there is already another SUPER privileged account
connected, the "monitor".
As SHOW MASTER STATUS, and all other replication related statements,
return with either REPLICATION CLIENT or SUPER privileges, this worklog
is to make SHOW MASTER LOGS and SHOW BINARY LOGS be consistent with this
as well, and allow both of these commands with either SUPER or
REPLICATION CLIENT.
This allows monitoring tools to not require a SUPER privilege any more,
so is safer in overloaded situations, as well as being more secure, as
lighter privileges can be given to users of such tools or scripts.
The test case must insert all the records using a single transaction. Otherwise the test
case takes more than 15 minutes and will time out in pb2 and mtr.
BUG#64503: mysql frequently ignores --relay-log-space-limit
When the SQL thread goes to sleep, waiting for more events, it sets
the flag ignore_log_space_limit to true. This gives the IO thread a
chance to queue some more events and ultimately the SQL thread will be
able to purge the log once it is rotated. By then the SQL thread
resets the ignore_log_space_limit to false. However, between the time
the SQL thread has set the ignore flag and the time it resets it, the
IO thread will be queuing events in the relay log, possibly going way
over the limit.
This patch makes the IO and SQL thread to synchronize when they reach
the space limit and only ask for one event at a time. Thus the SQL
thread sets ignore_log_space_limit flag and the IO thread resets it to
false everytime it processes one more event. In addition, everytime
the SQL thread processes the next event, and the limit has been
reached, it checks if the IO thread should rotate. If it should, it
instructs the IO thread to rotate, giving the SQL thread a chance to
purge the logs (freeing space). Finally, this patch removes the
resetting of the ignore_log_space_limit flag from purge_first_log,
because this is now reset by the IO thread every time it processes the
next event when the limit has been reached.
If the SQL thread is in a transaction, it cannot purge so, there is no
point in asking the IO thread to rotate. The only thing it can do is
to ask for more events until the transaction is over (then it can ask
the IO to rotate and purge the log right away). Otherwise, there would
be a deadlock (SQL would not be able to purge and IO thread would not
be able to queue events so that the SQL would finish the transaction).
truncating, inserting the same set of rows. When a table is
re-created with the same set of rows, the data file size must
not grow.
rb:968
Approved by Marko.
There are two threads. In one thread, dml operation is going on
involving cascaded update operation. In another thread, alter
table add foreign key constraint is happening. Under these
circumstances, it is possible for the dml thread to access a
dict_foreign_t object that has been freed by the ddl thread.
The debug sync test case provides the sequence of operations.
Without fix, the test case will crash the server (because of
newly added assert). With fix, the alter table stmt will return
an error message.
Backporting the fix from MySQL 5.5 to 5.1
rb:961
rb:947
Analysis:
========================
sql_mode "NO_BACKSLASH_ESCAPES": When user want to use backslash as character input,
instead of escape character in a string literal then sql_mode can be set to
"NO_BACKSLASH_ESCAPES". With this mode enabled, backslash becomes an ordinary
character like any other.
SQL_MODE set applies to the current client session. And while creating the stored
procedure, MySQL stores the current sql_mode and always executes the stored
procedure in sql_mode stored with the Procedure, regardless of the server SQL
mode in effect when the routine is invoked.
In the scenario (for which bug is reported), the routine is created with
sql_mode=NO_BACKSLASH_ESCAPES. And routine is executed with the invoker sql_mode
is "" (NOT SET) by executing statement "call testp('Axel\'s')".
Since invoker sql_mode is "" (NOT_SET), the '\' in 'Axel\'s'(argument to function)
is considered as escape character and column "a" (of table "t1") values are
updated with "Axel's". The binary log generated for above update operation is as below,
set sql_mode=XXXXXX (for no_backslash_escapes)
update test.t1 set a= NAME_CONST('var',_latin1'Axel\'s' COLLATE 'latin1_swedish_ci');
While logging stored procedure statements, the local variables (params) used in
statements are replaced with the NAME_CONST(var_name, var_value) (Internal function)
(http://dev.mysql.com/doc/refman/5.6/en/miscellaneous-functions.html#function_name-const)
On slave, these logs are applied. NAME_CONST is parsed to get the variable and its
value. Since, stored procedure is created with sql_mode="NO_BACKSLASH_ESCAPES", the sql_mode
is also logged in. So that at slave this sql_mode is set before executing the statements
of routine. So at slave, sql_mode is set to "NO_BACKSLASH_ESCAPES" and then while
parsing NAME_CONST of string variable, '\' is considered as NON ESCAPE character
and parsing reported error for "'" (as we have only one "'" no backslash).
At slave, parsing was proper with sql_mode "NO_BACKSLASH_ESCAPES".
But above error reported while writing bin log, "'" (of Axel's) is escaped with
"\" character. Actually, all special characters (n, r, ', ", \, 0...) are escaped
while writing NAME_CONST for string variable(param, local variable) in bin log
Airrespective of "NO_BACKSLASH_ESCAPES" sql_mode. So, basically, the problem is
that logging string parameter does not take into account sql_mode value.
Fix:
========================
So when sql_mode is set to "NO_BACKSLASH_ESCAPES", escaping characters as
(n, r, ', ", \, 0...) should be avoided. To do so, added a check to not to
escape such characters while writing NAME_CONST for string variables in bin
log.
And when sql_mode is set to NO_BACKSLASH_ESCAPES, quote character "'" is
represented as ''.
http://dev.mysql.com/doc/refman/5.6/en/string-literals.html (There are several
ways to include quote characters within a string: )
mysql-test/r/sql_mode.result:
Added test case for Bug#12601974.
mysql-test/suite/binlog/r/binlog_sql_mode.result:
Appended result of test cases added for Bug#12601974.
mysql-test/suite/binlog/t/binlog_sql_mode.test:
Added test case for Bug#12601974.
mysql-test/t/sql_mode.test:
Appended result of test cases added for Bug#12601974.
Suppress innodb_bug34300 from failing if InnoDB prints:
120221 11:05:03 InnoDB: ERROR: the age of the last checkpoint is 9439048,
InnoDB: which exceeds the log group capacity 9433498.
by default the log capacity is 2 log files, 5 MB each.
This bug was originally filed and fixed as Bug#12612184. The original
fix was buggy, and it was patched by Bug#12704861. Also that patch was
buggy (potentially breaking crash recovery), and both fixes were
reverted.
This fix was not ported to the built-in InnoDB of MySQL 5.1, because
the function signatures of many core functions are different from
InnoDB Plugin and later versions. The block allocation routines and
their callers would have to changed so that they handle block
descriptors instead of page frames.
When a record is updated so that its size grows, non-updated columns
can be selected for external (off-page) storage. The bug is that the
initially inserted updated record contains an all-zero BLOB pointer to
the field that was not updated. Only after the BLOB pages have been
allocated and written, the valid pointer can be written to the record.
Between the release of the page latch in mtr_commit(mtr) after
btr_cur_pessimistic_update() and the re-latching of the page in
btr_pcur_restore_position(), other threads can see the invalid BLOB
pointer consisting of 20 zero bytes. Moreover, if the system crashes
at this point, the situation could persist after crash recovery, and
the contents of the non-updated column would be permanently lost.
The problem is amplified by the ROW_FORMAT=DYNAMIC and
ROW_FORMAT=COMPRESSED that were introduced in
innodb_file_format=barracuda in InnoDB Plugin, but the bug does exist
in all InnoDB versions.
The fix is as follows. After a pessimistic B-tree operation that needs
to write out off-page columns, allocate the pages for these columns in
the mini-transaction that performed the B-tree operation (btr_mtr),
but write the pages in a separate mini-transaction (blob_mtr). Do
mtr_commit(blob_mtr) before mtr_commit(btr_mtr). A quirk: Do not reuse
pages that were previously freed in btr_mtr. Only write the off-page
columns to 'fresh' pages.
In this way, crash recovery will see redo log entries for blob_mtr
before any redo log entry for btr_mtr. It will apply the BLOB page
writes to pages that were marked free at that point. If crash recovery
fails to see all of the btr_mtr redo log, there will be some
unreachable BLOB data in free pages, but the B-tree will be in a
consistent state.
btr_page_alloc_low(): Renamed from btr_page_alloc(). Add the parameter
init_mtr. Return an allocated block, or NULL. If init_mtr!=mtr but
the page was already X-latched in mtr, do not initialize the page.
btr_page_alloc(): Wrapper for btr_page_alloc_for_ibuf() and
btr_page_alloc_low().
btr_page_free(): Add a debug assertion that the page was a B-tree page.
btr_lift_page_up(): Return the father block.
btr_compress(), btr_cur_compress_if_useful(): Add the parameter ibool
adjust, for adjusting the cursor position.
btr_cur_pessimistic_update(): Preserve the cursor position when
big_rec will be written and the new flag BTR_KEEP_POS_FLAG is defined.
Remove a duplicate rec_get_offsets() call. Keep the X-latch on
index->lock when big_rec is needed.
btr_store_big_rec_extern_fields(): Replace update_inplace with
an operation code, and local_mtr with btr_mtr. When not doing a
fresh insert and btr_mtr has freed pages, put aside any pages that
were previously X-latched in btr_mtr, and free the pages after
writing out all data. The data must be written to 'fresh' pages,
because btr_mtr will be committed and written to the redo log after
the BLOB writes have been written to the redo log.
btr_blob_op_is_update(): Check if an operation passed to
btr_store_big_rec_extern_fields() is an update or insert-by-update.
fseg_alloc_free_page_low(), fsp_alloc_free_page(),
fseg_alloc_free_extent(), fseg_alloc_free_page_general(): Add the
parameter init_mtr. Return an allocated block, or NULL. If
init_mtr!=mtr but the page was already X-latched in mtr, do not
initialize the page.
xdes_get_descriptor_with_space_hdr(): Assert that the file space
header is being X-latched.
fsp_alloc_from_free_frag(): Refactored from fsp_alloc_free_page().
fsp_page_create(): New function, for allocating, X-latching and
potentially initializing a page. If init_mtr!=mtr but the page was
already X-latched in mtr, do not initialize the page.
fsp_free_page(): Add ut_ad(0) to the error outcomes.
fsp_free_page(), fseg_free_page_low(): Increment mtr->n_freed_pages.
fsp_alloc_seg_inode_page(), fseg_create_general(): Assert that the
page was not previously X-latched in the mini-transaction. A file
segment or inode page should never be allocated in the middle of an
mini-transaction that frees pages, such as btr_cur_pessimistic_delete().
fseg_alloc_free_page_low(): If the hinted page was allocated, skip the
check if the tablespace should be extended. Return NULL instead of
FIL_NULL on failure. Remove the flag frag_page_allocated. Instead,
return directly, because the page would already have been initialized.
fseg_find_free_frag_page_slot() would return ULINT_UNDEFINED on error,
not FIL_NULL. Correct a bogus assertion.
fseg_alloc_free_page(): Redefine as a wrapper macro around
fseg_alloc_free_page_general().
buf_block_buf_fix_inc(): Move the definition from the buf0buf.ic to
buf0buf.h, so that it can be called from other modules.
mtr_t: Add n_freed_pages (number of pages that have been freed).
page_rec_get_nth_const(), page_rec_get_nth(): The inverse function of
page_rec_get_n_recs_before(), get the nth record of the record
list. This is faster than iterating the linked list. Refactored from
page_get_middle_rec().
trx_undo_rec_copy(): Add a debug assertion for the length.
trx_undo_add_page(): Return a block descriptor or NULL instead of a
page number or FIL_NULL.
trx_undo_report_row_operation(): Add debug assertions.
trx_sys_create_doublewrite_buf(): Assert that each page was not
previously X-latched.
page_cur_insert_rec_zip_reorg(): Make use of page_rec_get_nth().
row_ins_clust_index_entry_by_modify(): Pass BTR_KEEP_POS_FLAG, so that
the repositioning of the cursor can be avoided.
row_ins_index_entry_low(): Add DEBUG_SYNC points before and after
writing off-page columns. If inserting by updating a delete-marked
record, do not reposition the cursor or commit the mini-transaction
before writing the off-page columns.
row_build(): Tighten a debug assertion about null BLOB pointers.
row_upd_clust_rec(): Add DEBUG_SYNC points before and after writing
off-page columns. Do not reposition the cursor or commit the
mini-transaction before writing the off-page columns.
rb:939 approved by Jimmy Yang
GRACEFUL SHUTDOWN
During startup mysql picks up .frm files from the tmpdir directory and
tries to drop those tables in the storage engine.
The problem is that when tmpdir ends in / then ha_innobase::delete_table()
is passed a string like "/var/tmp//#sql123", then it wrongly normalizes it
to "/#sql123" and calls row_drop_table_for_mysql() which of course fails
to delete the table entry from the InnoDB dictionary cache.
ha_innobase::delete_table() returns an error but nevertheless mysql wipes
away the .frm file and the entry in the InnoDB dictionary cache remains
orphaned with no easy way to remove it.
The "no easy" way to remove it is to create a similar temporary table again,
copy its .frm file to tmpdir under "#sql123.frm" and restart mysqld with
tmpdir=/var/tmp (no trailing slash) - this way mysql will pick the .frm file
after restart and will try to issue drop table for "/var/tmp/#sql123"
(notice do double slash), ha_innobase::delete_table() will normalize it to
"tmp/#sql123" and row_drop_table_for_mysql() will successfully remove the
table entry from the dictionary cache.
The solution is to fix normalize_table_name_low() to normalize things like
"/var/tmp//table" correctly to "tmp/table".
This patch also adds a test function which invokes
normalize_table_name_low() with various inputs to make sure it works
correctly and a mtr test that calls this test function.
Reviewed by: Marko (http://bur03.no.oracle.com/rb/r/929/)
Test extra/rpl_tests/rpl_extra_col_master.test (used by
rpl_extra_col_master_*) ends with the active connection pointing to the
slave. Thus, the two last tests never succeed in changing the binlog
format of the master away from 'row'. With correct active connection
(master) tests fail for binlog 'statement' and 'mixed' formats.
Tests rpl_extra_col_master_* only run when binary log format is
row. Statement and mix replication do not make sense in this
tests since it will try to execute statements on columns that do
not exist. This fix is basically a backport from mysql-5.5, see
changes done as part of BUG 39934.
If we meet DB_TOO_MANY_CONCURRENT_TRXS during the execution tab_create_graph from row_create_table_for_mysql(), .ibd file for the table should be created already but was not deleted for the error handling.
rb:875 approved by Jimmy Yang