STATUS OF ROLLBACKED TRANSACTION" and bug #17054007 - "TRANSACTION
IS NOT FULLY ROLLED BACK IN CASE OF INNODB DEADLOCK".
The problem in the first bug report was that although a deadlock involving
metadata locks was reported using the same error code and message as an
InnoDB deadlock, it didn't roll back the transaction like the latter. This
confused users, because in some cases the transaction could be restarted
immediately after ER_LOCK_DEADLOCK and in other cases a rollback was
required.
The problem in the second bug report was that although InnoDB deadlock
caused transaction rollback in all storage engines it didn't cause release
of metadata locks. So concurrent DDL on the tables used in transaction was
blocked until implicit or explicit COMMIT or ROLLBACK was issued in the
connection which got InnoDB deadlock.
The former issue stemmed from the fact that when support for detecting
and reporting metadata lock deadlocks was added, we erroneously assumed
that InnoDB doesn't roll back the transaction on deadlock but only the last
statement (which is actually what happens on an InnoDB lock timeout), and
so didn't implement rollback of transactions on MDL deadlocks.
The latter issue was caused by the fact that rollback of transaction due
to deadlock is carried out by setting THD::transaction_rollback_request
flag at the point where deadlock is detected and performing rollback
inside of trans_rollback_stmt() call when this flag is set. And
trans_rollback_stmt() is not aware of MDL locks, so no MDL locks are
released.
This patch solves these two problems in the following way:
- When an MDL deadlock is detected, transaction rollback is requested
by setting the THD::transaction_rollback_request flag.
- The code that rolls back the transaction when
THD::transaction_rollback_request is set is moved out of
trans_rollback_stmt(). Now we handle the rollback request on the same
level as we call trans_rollback_stmt() and release statement/
transaction MDL locks.
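The reworked control flow can be pictured with a small Python model. This
is only a sketch: the Session class, its fields and the function names are
hypothetical stand-ins, and only the flag name mirrors
THD::transaction_rollback_request.

```python
class Session:
    """Hypothetical stand-in for a connection (THD); not server code."""

    def __init__(self):
        self.transaction_rollback_request = False
        self.in_transaction = True
        self.statement_rolled_back = False
        self.statement_mdl = {"t1"}    # locks held for the statement only
        self.transaction_mdl = {"t2"}  # locks held until transaction end


def report_deadlock(session):
    # Both InnoDB and MDL deadlocks now request a full transaction rollback.
    session.transaction_rollback_request = True


def rollback_statement(session):
    # Models trans_rollback_stmt(): undoes the statement, touches no MDL.
    session.statement_rolled_back = True


def finish_failed_statement(session):
    rollback_statement(session)
    # The request is handled here, at the level that also releases MDL,
    # so the transaction rollback and the lock release happen together.
    if session.transaction_rollback_request:
        session.in_transaction = False
        session.transaction_mdl.clear()
        session.transaction_rollback_request = False
    session.statement_mdl.clear()
```

In this model a failed statement without a deadlock keeps the transactional
locks, while a deadlock releases them along with the transaction.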
IN TIME RECOVERY FAILURE ON SLAVES
Problem:
DROP TEMP TABLE IF EXISTS commands can cause point
in time recovery (re-applying binlog) failures.
Analysis:
In RBR, 'DROP TEMPORARY TABLE' commands are
always binlogged by adding 'IF EXISTS' clauses.
Also, the slave SQL thread will not check replicate.* filter
rules for "DROP TEMPORARY TABLE IF EXISTS" queries.
If log-slave-updates is enabled on slave, these queries
will be binlogged in the format of "USE `db`;
DROP TEMPORARY TABLE IF EXISTS `t1`;" irrespective
of filtering rules and irrespective of the `db` existence.
When users try to recover the slave from its own binlog, the
USE `db` command might fail if `db` is not present on the slave.
Fix:
At the time of writing the 'DROP TEMPORARY TABLE
IF EXISTS' query into the binlog, 'use `db`' will not be
present and the table name in the query will be a fully
qualified table name.
Eg:
'USE `db`; DROP TEMPORARY TABLE IF EXISTS `t1`;'
will be logged as
'DROP TEMPORARY TABLE IF EXISTS `db`.`t1`;'.
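The difference between the two logged forms can be sketched in Python. Both
helper names are hypothetical, and the sketch assumes database and table
names that contain no backticks, so no escaping is shown.

```python
def old_binlog_form(db, table):
    # Form logged before the fix: depends on `db` existing at replay time.
    return f"USE `{db}`; DROP TEMPORARY TABLE IF EXISTS `{table}`;"


def new_binlog_form(db, table):
    # Form logged after the fix: no USE, fully qualified table name.
    return f"DROP TEMPORARY TABLE IF EXISTS `{db}`.`{table}`;"
```

Because the second form never switches the default database, replaying it
does not require `db` to exist.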
CAN LEAD TO MISSING TABLES
Overview
--------
If the FOREIGN_KEY_CHECKS system variable is set to 0, it is
possible to break a foreign key constraint by changing the type
or character set of the foreign key column, or by dropping the
foreign key index (without carrying out corresponding changes on
another table in the relationship).
If we subsequently set FOREIGN_KEY_CHECKS to 1 and execute ALTER
TABLE involving the COPY algorithm on such a table, the following
happens:
1) If ALTER TABLE does not contain a RENAME clause, the attempt
to install the new version of the table instead of the old one
will fail due to the fact that the inconsistency will be
detected. An attempt to revert the partially executed alter
table operation by restoring the old table definition will
fail as well due to FOREIGN_KEY_CHECKS == 1. As a result, the
table being altered will be lost.
2) If ALTER TABLE contains the RENAME clause, the inconsistency
will not be detected (most probably due to other bugs). But if
an attempt to install the new version of the table fails (for
example, due to a failure when updating triggers associated
with the table), reverting the partially executed alter table
by restoring the old table definition will fail too. So the
table being altered might be lost as well.
Suggested fix
-------------
The suggested fix is to temporarily unset the option bit
representing FOREIGN_KEY_CHECKS when the old table definition is
restored while reverting the partially executed operation.
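A minimal Python sketch of the save/clear/restore pattern follows. The bit
value, the Session class and the helper functions are assumptions made for
illustration and do not correspond to the server's actual option bits or
functions.

```python
from contextlib import contextmanager

FK_CHECKS_BIT = 1 << 0  # hypothetical bit standing for FOREIGN_KEY_CHECKS


class Session:
    def __init__(self, options):
        self.options = options


@contextmanager
def fk_checks_temporarily_off(session):
    saved = session.options
    session.options &= ~FK_CHECKS_BIT  # unset the bit for the revert only
    try:
        yield session
    finally:
        session.options = saved        # always restore the user's setting


def restore_old_definition(session):
    # Stand-in for the revert step: it fails while checks are enabled
    # because the table is already inconsistent.
    if session.options & FK_CHECKS_BIT:
        raise RuntimeError("foreign key inconsistency detected")
    return "old table restored"


def revert_partial_alter(session):
    with fk_checks_temporarily_off(session):
        return restore_old_definition(session)
```

The try/finally keeps the session's original setting intact even if the
revert itself raises.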
SHOW ENGINE INNOD
Problem:
The purpose of explain_filename() is to provide useful additional
information regarding the partitions given the filename. This function
was returning an error when it was not able to parse the given filename.
For example, within InnoDB, temporary files are created with #sql-
prefix. But this function was not able to parse it correctly.
Solution:
It is not an error, if explain_filename() could not parse the given
filename. If there is no partition information to explain, then silently
return from the function.
rb#1940 approved by mattiasj
DOWNGRADED FROM 5.6.11 TO 5.6.10
The problem was that the new syntax was not accepted by the previous version.
Fixed by adding version comment of /*!50531 around the
new syntax.
Like this in the .frm file:
'PARTITION BY KEY /*!50611 ALGORITHM = 2 */ () PARTITIONS 3'
and also changing the output from SHOW CREATE TABLE to:
CREATE TABLE t1 (a INT)
/*!50100 PARTITION BY KEY */ /*!50611 ALGORITHM = 1 */ /*!50100 ()
PARTITIONS 3 */
It will always add the ALGORITHM into the .frm for KEY [sub]partitioned
tables, but for SHOW CREATE TABLE it will only add it in case it is the non
default ALGORITHM = 1.
Also notice that for 5.5, it will say /*!50531 instead of /*!50611, which
will make upgrade from 5.5 > 5.5.31 to 5.6 < 5.6.11 fail!
If one downgrades a fixed version to the same major version (5.5 or 5.6) the
bug 14521864 will be visible again, but unless the .frm is updated, it will
work again when upgrading again.
Also fixed so that the .frm does not get updated version
if a single partition check passes.
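The effect of the version comment can be illustrated with a short Python
sketch. It models only the documented rule that the body of /*!NNNNN ... */
is used by servers whose version number is at least NNNNN and skipped by
older ones. The function name is hypothetical and this is not the server's
parser.

```python
import re

VERSION_COMMENT = re.compile(r"/\*!(\d{5})(.*?)\*/", re.DOTALL)


def sql_as_seen_by(sql, server_version):
    """Return the SQL text a server with the given version number acts on."""

    def expand(match):
        required = int(match.group(1))
        # Keep the body only if the server is new enough.
        return match.group(2) if server_version >= required else ""

    # Collapse whitespace so the two results are easy to compare.
    return " ".join(VERSION_COMMENT.sub(expand, sql).split())
```

For the .frm string above, sql_as_seen_by(..., 50610) yields
"PARTITION BY KEY () PARTITIONS 3", while 50611 yields
"PARTITION BY KEY ALGORITHM = 2 () PARTITIONS 3".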
PROBLEM
-------
OPTIMIZE on a partition will recreate the whole table
instead of just the partition.
ANALYSIS
--------
At present InnoDB doesn't support the optimize option, so we rebuild the
whole table and then call analyze() on the table. Presently for any optimize()
option (on table or partition) we display the following info to the user
"Table does not support optimize, doing recreate + analyze instead".
FIX
---
It was decided that for GA versions (5.1 and 5.5), whenever the user tries to
optimize one or more partitions, we will display the following info to the user:
"Table does not support optimize on partitions.
All partitions will be rebuilt and analyzed."
Earlier, partitions were not analyzed. Now all partitions will be analyzed.
If the user wants to optimize the whole table, we will display the
previous info to the user, i.e.
"Table does not support optimize, doing recreate + analyze instead"
For 5.6+ versions we will raise a new bug to support optimize() options
in innodb.
FAILED IN DEACTIVATE_DDL_LOG_ENTRY
deactivate_ddl_log_entry() can be called without having
locked LOCK_gdl. It uses a global buffer for reading and
writing entries in the ddl_log, and since it is not protected
by any mutex, two concurrent threads can overwrite the
content in the global buffer, so it can be different from
what was read.
Example: thread A reads entry 1 into the global buffer,
thread B reads entry 2 into the global buffer,
thread A writes the global buffer into entry 1
-> entry 1 now has the content of entry 2.
This is especially bad for replace entries, which use
two phases, and do not deactivate the whole entry
after the first phase, but increase the phase instead.
Fixed by using thread local storage (stack) instead of global
storage (global buffer).
Also added buffer and size arguments to
read/write_ddl_log_file_entry.
Also only read/write first bytes in entries in
deactivate_ddl_log_entry.
Also fixed the scenario where it will try to recover from a server
compiled with a different value of IO_SIZE (very uncommon!)
updated patch with set_ddl_log_entry_from_buf
and removed read_ddl_log_entry.
Manually tested, no test case included.
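The interleaving described in this message can be replayed deterministically
in a small Python model. The DdlLog class and its methods are hypothetical
and only mimic the idea of a shared buffer versus a caller-owned buffer.
No real threads are used, the steps are simply executed in the problematic
order.

```python
class DdlLog:
    def __init__(self, entries):
        self.entries = list(entries)
        self.global_buf = None  # the shared buffer of the buggy design

    # Buggy design: every read and write goes through one shared buffer.
    def read_into_global(self, index):
        self.global_buf = self.entries[index]

    def write_from_global(self, index):
        self.entries[index] = self.global_buf

    # Fixed design: the caller passes its own (stack-local) buffer.
    def read_entry(self, index):
        return self.entries[index]

    def write_entry(self, index, buf):
        self.entries[index] = buf


def buggy_interleaving():
    log = DdlLog(["entry-1", "entry-2"])
    log.read_into_global(0)   # thread A reads entry 1
    log.read_into_global(1)   # thread B reads entry 2
    log.write_from_global(0)  # thread A writes back: entry 1 is clobbered
    return log.entries


def fixed_interleaving():
    log = DdlLog(["entry-1", "entry-2"])
    buf_a = log.read_entry(0)  # thread A, local buffer
    buf_b = log.read_entry(1)  # thread B, local buffer
    log.write_entry(0, buf_a)  # thread A writes its own data back
    return log.entries
```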
QUOTING IN REPLICATION
Problem: Misquoted or unquoted identifiers may lead to
incorrect statements being logged to the binary log.
Fix: we use specialized functions to append quoted identifiers in
the statements generated by the server.
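What such a specialized function has to do can be sketched in Python. The
helper below is hypothetical and not the server's implementation. It applies
the standard MySQL rule that an identifier is wrapped in backticks and any
backtick inside it is doubled.

```python
def quote_identifier(name):
    # Double embedded backticks, then wrap the whole name in backticks.
    return "`" + name.replace("`", "``") + "`"


def build_drop_table(db, table):
    # Every identifier goes through the same quoting helper.
    return f"DROP TABLE {quote_identifier(db)}.{quote_identifier(table)}"
```

Naive concatenation of a name such as a`b would produce `a`b`, which no
longer parses as a single identifier.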
primary key with innodb tables
The bug was triggered if a single ALTER TABLE statement both
added and dropped indexes and ALTER TABLE failed during drop
(e.g. because the index was needed in a foreign key constraint).
In such cases, the server index information would get out of
sync with InnoDB - the added index would be present inside
InnoDB, but not in the server. This could then lead to InnoDB
error messages and/or server crashes.
The root cause is that new indexes are added before old indexes
are dropped. This means that if ALTER TABLE fails while dropping
indexes, index changes will be reverted in the server but not
inside InnoDB.
This patch fixes the problem by dropping any added indexes
if drop fails (for ALTER TABLE statements that both add
and drop indexes).
However, this won't work if we added a primary key as this
key might not be possible to drop inside InnoDB. Therefore,
we resort to the copy algorithm if a primary key is added
by an ALTER TABLE statement that also drops an index.
In 5.6 this bug is more properly fixed by the handler interface
changes done in the scope of WL#5534 "Online ALTER".
Fixed by backport of:
------------------------------------------------------------
revno: 3402.50.156
committer: Jon Olav Hauglid <jon.hauglid@oracle.com>
branch nick: mysql-trunk-test
timestamp: Wed 2012-02-08 14:10:23 +0100
message:
Bug#13417754 ASSERT IN ROW_DROP_DATABASE_FOR_MYSQL DURING DROP SCHEMA
This assert could be triggered if an InnoDB table was being moved
to a different database using ALTER TABLE ... RENAME, while this
database concurrently was being dropped by DROP DATABASE.
The reason for the problem was that no metadata lock was taken
on the target database by ALTER TABLE ... RENAME.
DROP DATABASE was therefore not blocked and could remove
the database while ALTER TABLE ... RENAME was executing. This
could cause the assert in InnoDB to be triggered.
This patch fixes the problem by taking an IX metadata lock on
the target database before ALTER TABLE ... RENAME starts
moving a table to a different database.
Note that this problem did not occur with RENAME TABLE which
already takes the correct metadata locks.
Also note that this patch slightly changes the behavior of
ALTER TABLE ... RENAME. Before, the statement would abort and
return an error if a lock on the target table name could not
be taken immediately. With this patch, ALTER TABLE ... RENAME
will instead block and wait until the lock can be taken
(or until we get a lock timeout). This also means that it is
possible to get ER_LOCK_DEADLOCK errors in this situation
since we allow ALTER TABLE ... RENAME to wait and not just
abort immediately.
UNHANDLED, CONFUSING ERROR
The main confusion with the error message is that "it
implies that your data dictionary may now be out of
sync". This patch will remove the unwanted and the
misleading error message by not doing an unnecessary
operation in the error handling code.
rb://980 approved by: Dmitry Lenev
TABLES IN INCORRECT ENGINE
PROBLEM:
CREATE/ALTER TABLE currently can move system tables like
mysql.db, user, host etc, to engines other than MyISAM. This is not
completely supported as of now, by mysqld. When some of system tables
like plugin, servers, event, func, *_priv, time_zone* are moved
to InnoDB, mysqld crashes on restart. Currently system tables
can even be moved to BLACKHOLE!
ANALYSIS:
The problem is that there is no check before creating or moving
a system table to some particular engine.
System tables are supposed to reside in MyISAM. We can think
of restricting system tables to exist only in MyISAM. But, there could
be future needs of these system tables to be part of other engines
by design. For example, NDB cluster expects some tables to be on the InnoDB
or ndb engine. This calls for a solution, by which system
tables can be supported by any desired engine, with minimal effort.
FIX:
The solution provides a handlerton interface using which,
mysqld server can query particular storage engine handlerton for
system tables that it supports. This way each storage engine
layer can define their own system database and system tables.
The check_engine() function uses the new handlerton function
ha_check_if_supported_system_table() to check if db.tablename
provided in the DDL is supported by the SE.
Note: This fix has modified a test in help.test, which was moving
mysql.help_* to innodb. The primary intention of the test was not
to move them between engines.
Problem: Statements that write to tables with auto_increment columns
based on the selection from another table, may lead to master
and slave going out of sync, as the order in which the rows
are retrieved from the table may differ on master and slave.
Solution: We mark writing to a table with an auto_increment column
based on the rows selected from another table as unsafe. This
causes the execution of such statements to throw a warning
and forces the statement to be logged in ROW format if the logging
format is mixed.
Changes:
1. All statements that write to a table with auto_increment
column(s) based on the rows fetched from another table will now
be unsafe.
2. CREATE TABLE with SELECT will now be unsafe.
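The logging decision can be summarized in a few lines of Python. This is a
sketch of the rule only, with hypothetical function names; it does not model
the warning and is not the server's decision code.

```python
def is_unsafe(writes_auto_increment_table, selects_from_other_table):
    # Row retrieval order may differ between master and slave, so the
    # generated auto_increment values may differ as well.
    return writes_auto_increment_table and selects_from_other_table


def effective_log_format(binlog_format, unsafe):
    # MIXED logs by statement unless the statement is unsafe.
    if binlog_format == "MIXED":
        return "ROW" if unsafe else "STATEMENT"
    # STATEMENT and ROW are used as configured.
    return binlog_format
```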
SMALL KEY CACHE
The server crashed on division by zero because the key cache was not
initialized and the block length was 0 which was used in a division.
The fix was to not allow CACHE INDEX if the key cache was not initialized.
Thus never try LOAD INDEX INTO CACHE for an uninitialized key cache.
Also added some windows files/directories to .bzrignore.
A patch for alter_table-big.test has been committed earlier.
This is a patch for create-big.test:
The test used to time-out after 900 seconds.
It relied on debug sleeps that are no longer present in the
code. Since the sleeps are long gone, fixing the problem didn't
involve just updating the result file or using macro
"show_binlog_events2.inc" instead of "show binlog events"
statement. The test needed to be rewritten using debug sync
points, and result then needed to be updated.
So, the sleeps have been replaced by debug_sync points and the test execution time has
been reduced significantly.
TO POSITION FIRST CAN CAUSE DATA TO BE CORRUPTED".
ALTER TABLE MODIFY/CHANGE ... FIRST did nothing except renaming
columns if new version of the table had exactly the same
structure as the old one (i.e. as result of such statement, names
of columns changed their order as specified but data in columns
didn't). The same thing happened for ALTER TABLE DROP COLUMN/ADD
COLUMN statements which were supposed to produce new version of
table with exactly the same structure as the old version of table.
I.e. in the latter case the result was the same as if old column
was renamed instead of being dropped and new column with default
as value being created.
Both these problems were caused by the fact that ALTER TABLE
implementation incorrectly interpreted both these situations as
simple renaming of columns and assumed that in-place ALTER TABLE
algorithm could have been used for them.
This patch fixes this problem by ensuring that in cases when some
column is moved to the first position or some column is dropped
the default ALTER TABLE algorithm involving table copying is
always used. This is achieved by detecting such situations in
mysql_prepare_alter_table() and setting Alter_info::change_level
to ALTER_TABLE_DATA_CHANGED for them.
In sql_class.cc, 'row_count', of type 'ha_rows', was used as last argument for
ER_TRUNCATED_WRONG_VALUE_FOR_FIELD which is
"Incorrect %-.32s value: '%-.128s' for column '%.192s' at row %ld".
So 'ha_rows' was used as 'long'.
On SPARC32 Solaris builds, 'long' is 4 bytes and 'ha_rows' is 'longlong' i.e. 8 bytes.
So the printf-like code was reading only the first 4 bytes.
Because the CPU is big-endian, 1LL is 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x01
so the first four bytes yield 0. So the warning message had "row 0" instead of
"row 1" in test outfile_loaddata.test:
-Warning 1366 Incorrect string value: '\xE1\xE2\xF7' for column 'b' at row 1
+Warning 1366 Incorrect string value: '\xE1\xE2\xF7' for column 'b' at row 0
All error-messaging functions which internally invoke some printf-like function
are potential candidates for such mistakes.
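The byte-level effect described above can be reproduced with a short Python
sketch. It only models the layout of an 8-byte integer and what a reader
sees when it consumes the first 4 bytes. The function name is hypothetical
and real varargs handling on a given ABI involves more than this.

```python
def read_first_four_bytes(value, byteorder):
    # Lay the value out as an 8-byte integer ('longlong' / ha_rows) ...
    raw = value.to_bytes(8, byteorder, signed=True)
    # ... and interpret only the first 4 bytes, as '%ld' would with a
    # 4-byte 'long'.
    return int.from_bytes(raw[:4], byteorder, signed=True)
```

With value 1, the big-endian layout gives 0 (the "row 0" seen on SPARC32),
while the little-endian layout happens to give 1, which is why the mistake
goes unnoticed on little-endian builds.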
One apparently easy way to catch such mistakes is to use
ATTRIBUTE_FORMAT (from my_attribute.h).
But this works only when call site has both:
a) the format as a string literal
b) the types of arguments.
So:
func(ER(ER_BLAH), 10);
will silently not be checked, because ER(ER_BLAH) is not known at
compile time (it is known at run-time, and depends on the chosen
language).
And
func("%s", a va_list argument);
has the same problem, as the *real* type of arguments is not
known at this site at compile time (it's known in some caller).
Moreover,
func(ER(ER_BLAH));
though possibly correct (if ER(ER_BLAH) has no '%' markers), will not
compile (gcc says "error: format not a string literal and no format
arguments").
Consequences:
1) ATTRIBUTE_FORMAT is here added only to functions which in practice
take "string literal" formats: "my_error_reporter" and "print_admin_msg".
2) it cannot be added to the other functions: my_error(),
push_warning_printf(), Table_check_intact::report_error(),
general_log_print().
To do a one-time check of functions listed in (2), the following
"static code analysis" has been done:
1) replace
my_error(ER_xxx, arguments for substitution in format)
with the equivalent
my_printf_error(ER_xxx,ER(ER_xxx), arguments for substitution in
format),
so that we have ER(ER_xxx) and the arguments *in the same call site*
2) add ATTRIBUTE_FORMAT to push_warning_printf(),
Table_check_intact::report_error(), general_log_print()
3) replace ER(xxx) with the hard-coded English text found in
errmsg.txt (like: ER(ER_UNKNOWN_ERROR) is replaced with
"Unknown error"), so that a call site has the format as string literal
4) this way, ATTRIBUTE_FORMAT can effectively do its job
5) compile, fix errors detected by ATTRIBUTE_FORMAT
6) revert steps 1-2-3.
The present patch has no compiler error when submitted again to the
static code analysis above.
It cannot catch all problems though: see Field::set_warning(), in
which a call to push_warning_printf() has a variable error
(thus, not replaceable by a string literal); I checked set_warning() calls
by hand though.
See also WL 5883 for one proposal to avoid such bugs from appearing
again in the future.
The issues fixed in the patch are:
a) mismatch in types (like 'int' passed to '%ld')
b) more arguments passed than specified in the format.
This patch resolves mismatches by changing the type/number of arguments,
not by changing error messages of sql/share/errmsg.txt. The latter would be wrong,
per the following old rule: errmsg.txt must be as stable as possible; no insertions
or deletions of messages, no changes of type or number of printf-like format specifiers,
are allowed, as long as the change impacts a message already released in a GA version.
If this rule is not followed:
- Connectors, which use error message numbers, will be confused (by insertions/deletions
of messages)
- using errmsg.sys of MySQL 5.1.n with mysqld of MySQL 5.1.(n+1)
could produce wrong messages or crash; such usage can easily happen if
installing 5.1.(n+1) while /etc/my.cnf still has --language=/path/to/5.1.n/xxx;
or if copying mysqld from 5.1.(n+1) into a 5.1.n installation.
When fixing b), I have verified that the superfluous arguments were not used in the format
in the first 5.1 GA (5.1.30 'bteam@astra04-20081114162938-z8mctjp6st27uobm').
Had they been used, then passing them today, even if the message doesn't use them
anymore, would have been necessary, as explained above.
STATEMENTS FAIL".
Attempt to execute CREATE TABLE LIKE statement on a MyISAM
table with INDEX or DATA DIRECTORY options specified as a
source resulted in "MyISAM table '...' is in use..." error.
According to our documentation such a statement should create
a copy of source table with DATA/INDEX DIRECTORY options
omitted.
The problem was that new implementation of CREATE TABLE LIKE
statement in 5.5 tried to copy value of INDEX and DATA DIRECTORY
parameters from the source table. Since in the description of the source
table these parameters also included the name of this table, an attempt
to create the target table with these parameters led to a file name
conflict and an error.
This fix addresses the problem by preserving documented and
backward-compatible behavior. I.e. by ensuring that contents
of DATA/INDEX DIRECTORY clauses for the source table is
ignored when target table is created.
SECONDARY INDEX IN INNODB
The patches for Bug#11751388 and Bug#11784056 enabled concurrent
reads while creating secondary indexes in InnoDB. However, they
introduced a regression. This regression occurred if ALTER TABLE
failed after the index had been added, for example during the
lock upgrade needed to update .FRM. If this happened, InnoDB
and the server got out of sync with regards to which indexes
actually existed. Therefore the patch for Bug#11815600 again
disabled concurrent reads.
This patch re-enables concurrent reads. The original regression
is fixed by splitting the ADD INDEX operation into two parts.
First the new index is created but not made active. This is
done while concurrent reads are allowed. The second part of
the operation makes the index active (or reverts the change).
This is done after lock upgrade, which prevents the original
regression.
In order to implement this change, the patch changes the storage
API for in-place index creation. handler::add_index() is split
into two functions, handler::add_index() and
handler::final_add_index(). The former creates indexes without
making them visible and the latter commits (i.e. makes
visible) the new indexes or reverts the changes.
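The two-phase protocol can be pictured with a small Python model. The Table
class is a hypothetical stand-in; only the method names echo add_index() and
final_add_index(), and the sketch does not reproduce their real signatures.

```python
class Table:
    def __init__(self):
        self.active_indexes = set()   # visible to the server and queries
        self.pending_indexes = set()  # built but not yet visible

    def add_index(self, name):
        # Phase 1: build the index while concurrent reads are allowed.
        self.pending_indexes.add(name)

    def final_add_index(self, commit):
        # Phase 2: runs after the lock upgrade. Either publish the new
        # indexes or throw them away, so both sides stay in sync.
        if commit:
            self.active_indexes |= self.pending_indexes
        self.pending_indexes.clear()


def alter_add_index(table, name, lock_upgrade_succeeds):
    table.add_index(name)
    table.final_add_index(commit=lock_upgrade_succeeds)
    return table.active_indexes
```

If the lock upgrade fails, nothing was ever made visible, so there is no
state to get out of sync.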
Large parts of this patch were written by Marko Mäkelä.
Test case added to innodb_mysql_lock.test.
CLAUSE FAILS OR ABORTS SERVER".
An attempt to re-execute a prepared ALTER TABLE statement which
involves .FRM-only changes and also has a RENAME clause led
to an unwarranted 'Table doesn't exist' error in production
builds and an assertion failure in debug builds.
This problem stemmed from the fact that for such ALTER TABLE
mysql_alter_table() code changed table list element for table
to be altered when it tried to re-open table under new name.
Since this change was not reverted back before next
re-execution, it made this statement re-execution unsafe.
This fix addresses this problem by avoiding changing table list
element from the main table list in such a situation. Instead
temporary TABLE_LIST object is used.
FLUSH TABLES under FLUSH TABLES <list> WITH READ LOCK leads
to assert failure.
This assert was triggered if a statement tried to upgrade a metadata
lock with an active FLUSH TABLE <list> WITH READ LOCK. The assert
checks that the connection already holds a global intention exclusive
metadata lock. However, FLUSH TABLE <list> WITH READ LOCK does not
acquire this lock in order to be compatible with FLUSH TABLES WITH
READ LOCK. Therefore any metadata lock upgrade caused the assert to
be triggered.
This patch fixes the problem by preventing metadata lock upgrade
if the connection has an active FLUSH TABLE <list> WITH READ LOCK.
ER_TABLE_NOT_LOCKED_FOR_WRITE will instead be reported to the client.
Test case added to flush.test.
- Add new "format section" in extra data segment with additional table and
column properties. This was originally introduced in 5.1.20 based MySQL Cluster
- Remove hardcoded STORAGE DISK for table and instead
output the real storage format used. Keep both TABLESPACE
and STORAGE inside same version guard.
- Implement default version of handler::get_tablespace_name() since tablespace
is now available in share and it's unnecessary for each handler to implement.
(the function could actually be removed totally now).
- Add test for combinations of TABLESPACE and STORAGE with CREATE TABLE
and ALTER TABLE
- Add test to show that 5.5 now can read a .frm file created by MySQL Cluster
7.0.22. Although it does not yet show the column level attributes, they are read.
This is a backport of the patch for MySQL Bug#50574.
Adding a SPATIAL INDEX on non-geometrical columns caused a
segmentation fault when the table was subsequently
inserted into.
A test was added in mysql_prepare_create_table to explicitly
check whether non-geometrical columns are used in a
spatial index, and throw an error if so.
For MySQL 5.5 and later, a new and more meaningful error
message was introduced. For 5.1, we (re-)use an existing
error code.
that implement add_index
The problem was that ALTER TABLE blocked reads on an InnoDB table
while adding a secondary index, even if this was not needed. It is
only needed for the final step where the .frm file is updated.
The reason queries were blocked, was that ALTER TABLE upgraded the
metadata lock from MDL_SHARED_NO_WRITE (which blocks writes) to
MDL_EXCLUSIVE (which blocks all accesses) before index creation.
The way the server handles index creation, is that storage engines
publish their capabilities to the server and the server determines
which of the following three ways this can be handled: 1) build a
new version of the table; 2) change the existing table but with
exclusive metadata lock; 3) change the existing table but without
metadata lock upgrade.
For InnoDB and secondary index creation, option 3) should have been
selected. However this failed for two reasons. First, InnoDB did
not publish this capability properly.
Second, the ALTER TABLE code failed to make proper use of the
information supplied by the storage engine. A variable
need_lock_for_indexes was set accordingly, but was not later used.
This patch fixes this problem by only doing metadata lock upgrade
before index creation/deletion if this variable has been set.
This patch also changes some of the related terminology used
in the code. Specifically the use of "fast" and "online" with
respect to ALTER TABLE. "Fast" was used to indicate that an
ALTER TABLE operation could be done without involving a
temporary table. "Fast" has been renamed "in-place" to more
accurately describe the behavior.
"Online" meant that the operation could be done without taking
a table lock. However, in the current implementation writes
are always prohibited during ALTER TABLE and an exclusive
metadata lock is held while updating the .frm, so ALTER TABLE
is not completely online. This patch replaces "online" with
"in-place", with additional comments indicating if concurrent
reads are allowed during index creation/deletion or not.
An important part of this update of terminology is renaming
of the handler flags used by handlers to indicate if index
creation/deletion can be done in-place and if concurrent reads
are allowed. For example, the HA_ONLINE_ADD_INDEX_NO_WRITES
flag has been renamed to HA_INPLACE_ADD_INDEX_NO_READ_WRITE,
while HA_ONLINE_ADD_INDEX is now HA_INPLACE_ADD_INDEX_NO_WRITE.
Note that this is a rename to clarify current behavior, the
flag values have not changed and no flags have been removed or
added.
Test case added to innodb_mysql_sync.test.
Silence a warning about old table name when InnoDB tests whether the
format has changed using a nonexistent table name.
Reviewed by: bar@mysql.com, marko.makela@oracle.com
leave the table unusable".
Failing ALTER statement on partitioned table could have left
this table in an unusable state. This has happened in cases
when ALTER was executed using "fast" algorithm, which doesn't
involve copying of data between old and new versions of table,
and the resulting new table was incompatible with partitioning
function in some way.
The problem stems from the fact that discrepancies between new
table definition and partitioning function are discovered only
when the table is opened. In case of "fast" algorithm this has
happened too late during ALTER's execution, at the moment when
all changes were already done and couldn't have been reverted.
In the cases when "slow" algorithm, which copies data, is used
such discrepancies are detected at the moment new table
definition is opened implicitly when new version of table is
created in storage engine. As result ALTER is aborted before
any changes to table were done.
This fix tries to address this issue by ensuring that "fast"
algorithm behaves similarly to "slow" algorithm and checks
compatibility between new definition and partitioning function
by trying to open new definition after .FRM file for it has
been created.
Long term we probably should implement some way to check
compatibility between partitioning function and new table
definition which won't involve opening it, as this should
allow much cleaner fix for this problem.
breaks SBR
This pre-requisite patch refactors the code for dropping tables, used
by DROP TABLE and DROP DATABASE. The patch moves the code for acquiring
metadata locks out of mysql_rm_table_part2() and makes it the
responsibility of the caller. This is in preparation for changing the
DROP DATABASE implementation to acquire all metadata locks before any
changes are made. mysql_rm_table_part2() is renamed
mysql_rm_table_no_locks() to reflect the change.
bug #57006 "Deadlock between HANDLER and FLUSH TABLES WITH READ
LOCK" and bug #54673 "It takes too long to get readlock for
'FLUSH TABLES WITH READ LOCK'".
The first bug manifested itself as a deadlock which occurred
when a connection, which had some table open through HANDLER
statement, tried to update some data through DML statement
while another connection tried to execute FLUSH TABLES WITH
READ LOCK concurrently.
What happened was that FTWRL in the second connection managed
to perform first step of GRL acquisition and thus blocked all
upcoming DML. After that it started to wait for table open
through HANDLER statement to be flushed. When the first connection
tried to execute DML it started to wait for the GRL (and thus for the
second connection), creating a deadlock.
The second bug manifested itself as starvation of FLUSH TABLES
WITH READ LOCK statements in cases when there was a constant
stream of concurrent DML statements (in two or more
connections).
This has happened because requests for protection against GRL
which were acquired by DML statements were ignoring presence of
pending GRL and thus the latter was starved.
This patch solves both these problems by re-implementing GRL
using metadata locks.
Similar to the old implementation acquisition of GRL in new
implementation is two-step. During the first step we block
all concurrent DML and DDL statements by acquiring global S
metadata lock (each DML and DDL statement acquires global IX
lock for its duration). During the second step we block commits
by acquiring global S lock in COMMIT namespace (commit code
acquires global IX lock in this namespace).
Note that unlike in old implementation acquisition of
protection against GRL in DML and DDL is semi-automatic.
We assume that any statement which should be blocked by GRL
will either open and acquire write locks on tables or acquire
metadata locks on objects it is going to modify. For any such
statement a global IX metadata lock is automatically acquired
for its duration.
The first problem is solved because waits for GRL become
visible to deadlock detector in metadata locking subsystem
and thus deadlocks like one in the first bug become impossible.
The second problem is solved because global S locks which
are used for GRL implementation are given preference over
IX locks which are acquired by concurrent DML (and we can
switch to fair scheduling in future if needed).
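The two-step scheme rests on the usual compatibility rule between shared (S)
and intention exclusive (IX) locks, which the following Python sketch models.
The Namespace class and ftwrl() are hypothetical names. The sketch shows
only grant/deny decisions and does not model waiting, the preference given
to S requests or deadlock detection.

```python
# Standard compatibility: IX is compatible with IX, S with S, S not with IX.
COMPATIBLE = {
    ("IX", "IX"): True,
    ("IX", "S"): False,
    ("S", "IX"): False,
    ("S", "S"): True,
}


class Namespace:
    """One lock namespace, e.g. the global one or the COMMIT one."""

    def __init__(self):
        self.granted = []

    def try_acquire(self, lock_type):
        if all(COMPATIBLE[(held, lock_type)] for held in self.granted):
            self.granted.append(lock_type)
            return True
        return False

    def release(self, lock_type):
        self.granted.remove(lock_type)


def ftwrl(global_ns, commit_ns):
    # Step 1: block new DML/DDL. Step 2: block commits.
    return global_ns.try_acquire("S") and commit_ns.try_acquire("S")
```

While any statement holds IX in the global namespace, ftwrl() cannot
complete step 1. Once the S locks are granted, new IX requests from DML and
from commits are refused.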
Important change:
FTWRL/GRL no longer blocks DML and DDL on temporary tables.
Before this patch behavior was not consistent in this respect:
in some cases DML/DDL statements on temporary tables were
blocked while in others they were not. Since the main use cases
for FTWRL are various forms of backups and temporary tables are
not preserved during backups we have opted for consistently
allowing DML/DDL on temporary tables during FTWRL/GRL.
Important change:
This patch changes thread state names which are used when
DML/DDL or FTWRL is waiting for the global read lock. It is now
either "Waiting for global read lock" or "Waiting for commit
lock" depending on the stage FTWRL is at.
Incompatible change:
To solve deadlock in events code which was exposed by this
patch we have to replace LOCK_event_metadata mutex with
metadata locks on events. As result we have to prohibit
DDL on events under LOCK TABLES.
This patch also adds extensive test coverage for interaction
of DML/DDL and FTWRL.
Performance of the new and old global read lock implementations
was compared in sysbench tests. There was no significant
difference between the new and old implementations.
ALTER TABLE RENAME, DISABLE KEYS.
The code of ALTER TABLE RENAME, DISABLE KEYS could
issue a commit while holding LOCK_open mutex.
This is a regression introduced by the fix for
Bug 54453.
This failed an assert guarding us against a potential
deadlock with connections trying to execute
FLUSH TABLES WITH READ LOCK.
The fix is to move acquisition of LOCK_open outside
the section that issues ha_autocommit_or_rollback().
LOCK_open is taken to protect against concurrent
operations with .frms and the table definition
cache, and doesn't need to cover the call to commit.
A test case added to innodb_mysql.test.
The patch is to be null-merged to 5.5, which
already has 54453 null-merged to it.
data dictionary confusion
On file systems with case insensitive file names, and
lower_case_table_names set to '2', the server could crash
due to a table definition cache inconsistency. This is
the default setting on MacOSX, but may also be set and
used on MS Windows.
The bug is caused by using two different strategies for
creating the hash key for the table definition cache, resulting
in failure to look up an entry which is present in the cache,
or failure to delete an existing entry. One strategy was to
use the real table name (with case preserved), and the other
to use a normalized table name (i.e. a lower case version).
This is manifested in two cases. One is during 'DROP DATABASE',
where all known files are removed. The removal from
the table definition cache is done via a generated list of
TABLE_LIST with keys (wrongly) created using the case preserved
name. The other is during CREATE TABLE, where the cache lookup
is also (wrongly) based on the case preserved name.
The fix was to use only the normalized table name when
creating hash keys.
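The mismatch between the two key strategies is easy to reproduce with a
plain Python dictionary standing in for the table definition cache. All
names below are hypothetical, and the real cache key contains more than the
table name.

```python
def normalized_key(table_name):
    # The single strategy used after the fix: always lower case.
    return table_name.lower()


def drop_with_mixed_strategies(table_name):
    cache = {normalized_key(table_name): "table definition"}
    # Buggy removal: uses the case-preserved name as the key.
    cache.pop(table_name, None)
    return cache


def drop_with_normalized_keys(table_name):
    cache = {normalized_key(table_name): "table definition"}
    cache.pop(normalized_key(table_name), None)
    return cache
```

With a name such as "MyTable", the mixed-strategy removal leaves a stale
entry behind, while the normalized removal empties the cache.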