The reason for the deadlock was an improper exit from
MDL_context::wait_for_locks() which caused mysys_var->current_mutex to remain
LOCK_mdl even though LOCK_mdl was no longer held by that connection.
This could for example lead to a deadlock in the following way:
1) INSERT DELAYED tries to open a table but fails and, while trying to recover,
calls wait_for_locks().
2) Due to a pending exclusive request, wait_for_locks() fails and exits without
resetting mysys_var->current_mutex for the delayed insert handler thread. So it
continues to point to LOCK_mdl.
3) The handler thread manages to open a table.
4) A different connection takes LOCK_open and tries to take LOCK_mdl.
5) FLUSH TABLES from a third connection notices that the handler thread has a
table open, and tries to kill it. This involves locking mysys_var->current_mutex
while having LOCK_open locked. Since current_mutex mistakenly points to LOCK_mdl,
we have a deadlock.
This patch makes sure MDL_EXIT_COND() is called before exiting wait_for_locks().
This clears mysys_var->current_mutex, which resolves the issue.
An assert is added to recover_from_failed_open_table_attempt() after
wait_for_locks() is called, to check that current_mutex is indeed reset.
With this assert in place, existing tests in (e.g.) mdl_sync.test will fail
without this patch.
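A minimal sketch of the idea, not the actual patch: the real code calls the
MDL_EXIT_COND() macro explicitly, while this illustration swaps in a C++ scope
guard so that every exit path resets the pointer. ThreadVar, CurrentMutexGuard,
wait_for_locks_sketch() and recover_sketch() are hypothetical stand-ins.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for the per-thread mysys_var structure.
struct ThreadVar {
  // Mutex that a killer thread would lock in order to wake this thread up.
  std::mutex *current_mutex = nullptr;
};

// Scope guard: publishes the mutex on entry and clears it on every exit path.
// It plays the role that MDL_EXIT_COND() plays in the patch (assumption).
class CurrentMutexGuard {
 public:
  CurrentMutexGuard(ThreadVar &tv, std::mutex &m) : tv_(tv) {
    tv_.current_mutex = &m;
  }
  ~CurrentMutexGuard() { tv_.current_mutex = nullptr; }
  CurrentMutexGuard(const CurrentMutexGuard &) = delete;
  CurrentMutexGuard &operator=(const CurrentMutexGuard &) = delete;

 private:
  ThreadVar &tv_;
};

// Simplified wait_for_locks(): returns true on failure.
bool wait_for_locks_sketch(ThreadVar &tv, std::mutex &lock_mdl,
                           bool pending_exclusive) {
  std::lock_guard<std::mutex> hold(lock_mdl);
  CurrentMutexGuard guard(tv, lock_mdl);
  if (pending_exclusive)
    return true;  // early exit: the guard still resets current_mutex
  return false;
}

// Mirrors the assert added after the wait_for_locks() call.
bool recover_sketch(ThreadVar &tv, std::mutex &lock_mdl,
                    bool pending_exclusive) {
  bool failed = wait_for_locks_sketch(tv, lock_mdl, pending_exclusive);
  assert(tv.current_mutex == nullptr);
  return failed;
}
```

With the guard in place the early return in step 2 can no longer leave
current_mutex pointing at a mutex the thread does not hold.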
-----------------------------------------------------------
2630.28.28 Magne Mahre 2008-12-05
Bug #38661 'all threads hang in "opening tables" or "waiting for table"
and cpu is at 100%'
Concurrent execution of FLUSH TABLES statement and at least two statements
using the same table might have led to live-lock which caused all three
connections to stall and hog 100% of CPU.
tdc_wait_for_old_versions() wrongly assumed that there cannot be a share
with an old version and no used TABLE instances and thus was failing to
perform wait in situation when such old share was cached in MDL subsystem
thanks to a still active metadata lock on the table. So it might have
happened that two or more connections simultaneously executing statements
which involve table being flushed managed to prevent each other from
waiting in this function by keeping shared metadata lock on the table
constantly active (i.e. one of the statements managed to take/hold this
lock while other statements were calling tdc_wait_for_old_versions()).
Thus they were forcing each other to loop infinitely in open_tables() -
close_thread_tables_for_reopen() - tdc_wait_for_old_versions() cycle
causing CPU hogging.
This patch fixes this problem by removing this false assumption from
tdc_wait_for_old_versions().
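The difference between the two wait conditions can be sketched roughly as
below. This is an illustration only: ShareSketch, its fields and both predicate
functions are hypothetical and do not reproduce the real TABLE_SHARE layout or
the code of tdc_wait_for_old_versions().

```cpp
// Hypothetical, simplified view of a cached table share.
struct ShareSketch {
  unsigned long version;  // version the share was opened with
  int used_tables;        // TABLE instances currently in use
};

// Condition with the false assumption: an old share is only waited for
// when it still has used TABLE instances.
bool must_wait_old(const ShareSketch &s, unsigned long refresh_version) {
  return s.version != refresh_version && s.used_tables > 0;
}

// Condition without the assumption: any share with an old version is
// waited for, even if it is only kept alive by a metadata lock.
bool must_wait_new(const ShareSketch &s, unsigned long refresh_version) {
  return s.version != refresh_version;
}
```

For an old share with no used TABLE instances the first predicate skips the
wait, which is the situation that let connections spin in the reopen cycle.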
Note that the problem is specific only for server versions >= 6.0.
No test case is submitted for this bug, as the test infrastructure
hasn't got the necessary primitives to test the behaviour. The
manifestation is that throughput will decrease to a low level
(possibly 0) after some time, and stay at that level. Several
transactions will not complete.
Manual testing can be done by running the code submitted by Shane
Bester attached to the bug report. If the bug persists, the
transaction throughput will almost immediately drop to near zero
(shown as the transaction count output from the test program staying
on a close to constant value, instead of increasing rapidly).
----------------------------------------------------
2736.2.10 Michael Widenius 2008-10-22
Fix for bug#39395 Maria: ma_extra.c:286: maria_extra:
Assertion `share->reopen == 1' failed
sql/sql_base.cc:
Race condition in wait_while_table_is_used() where a table used
by another connection could be forced closed, but there was no
protection against the other thread re-opening the table and trying
to lock it again before the table was name locked by the original thread.
(diagnostics_area)
Execution of CREATE TABLE ... SELECT statement was not atomic in
the sense that concurrent statements trying to affect its target
table might have sneaked in between the moment when the table was
created and moment when it was filled according to SELECT clause.
This resulted in an inconsistent binary log and unexpected target table
contents. In cases when the concurrent statement was a DDL statement,
CREATE TABLE ... SELECT might have failed with ER_CANT_LOCK error.
In more detail:
Due to premature metadata lock downgrade which occurred after CREATE
TABLE SELECT statement created table but before it managed to obtain
table-level lock on it, other statements were allowed to open, lock
and change target table in the middle of CREATE TABLE SELECT
execution. This also meant that it was possible that CREATE TABLE
SELECT would wait in mysql_lock_tables() when it was called for newly
created table and that this wait could have been aborted by concurrent
DDL. The latter led to execution of unexpected branch of code and
CREATE TABLE SELECT ending with ER_CANT_LOCK error.
The premature downgrade occurred because open_table(), which was called
for newly created table, decided that it is OK to downgrade metadata
lock from exclusive to shared since table exists, even though it
was not acquired within this call.
This fix ensures that open_table() does not downgrade metadata lock
if it is not acquired during its current invocation.
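The rule can be sketched as follows. This is a simplified illustration, not
the open_table() code: LockType, TicketSketch and maybe_downgrade() are
hypothetical names introduced only for this example.

```cpp
enum class LockType { Shared, Exclusive };

// Hypothetical stand-in for a metadata lock held on the table.
struct TicketSketch {
  LockType type;
};

// Downgrades an exclusive lock to shared only when the lock was acquired
// by the current invocation. Returns true if a downgrade happened.
bool maybe_downgrade(TicketSketch &ticket, bool acquired_in_this_call,
                     bool table_exists) {
  if (ticket.type == LockType::Exclusive && table_exists &&
      acquired_in_this_call) {
    ticket.type = LockType::Shared;
    return true;
  }
  // A lock taken earlier by the caller (for example by CREATE TABLE SELECT)
  // is left untouched.
  return false;
}
```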
Testing:
The bug is exposed in a race condition, and is thus difficult to
expose in a standard mysql-test-run test case. Instead, a stress
test using the Random Query Generator (https://launchpad.net/randgen)
will trip the problem occasionally.
% perl runall.pl \
--basedir=<build dir> \
--mysqld=--table-lock-wait-timeout=5 \
--mysqld=--skip-safemalloc \
--grammar=conf/maria_bulk_insert.yy \
--reporters=ErrorLog,Backtrace,WinPackage \
--mysqld=--log-output=file \
--queries=100000 \
--threads=10 \
--engine=myisam
Note: You will need a debug build to expose the bug
When the bug is tripped, the server will abort and dump core.
Backport from 6.0-codebase (revid: 2617.53.4)
2630.16.14 Sergei Golubchik 2008-08-25
fixed a crash in partition tests
introduced by HA_EXTRA_PREPARE_FOR_DROP patch
sql/sql_base.cc:
Don't call ::extra() for closed tables.
Bug #46654 False deadlock on concurrent DML/DDL with partitions,
inconsistent behavior
The problem was that if one connection is running a multi-statement
transaction which involves a single partitioned table, and another
connection attempts to alter the table, the first connection gets
ER_LOCK_DEADLOCK and cannot proceed anymore, even when the ALTER TABLE
statement in another connection has timed out or failed.
The reason for this was that the prepare phase for ALTER TABLE for
partitioned tables removed all instances of the table from the table
definition cache before it started waiting on the lock. The transaction
running in the first connection would notice this and report ER_LOCK_DEADLOCK.
This patch changes the prep_alter_part_table() ALTER TABLE code so that
tdc_remove_table() is no longer called. Instead, only the TABLE instance
changed by prep_alter_part_table() is marked as needing reopen.
The patch also removes an unnecessary call to tdc_remove_table() from
mysql_unpack_partition() as the changed TABLE object is destroyed by the
caller at a later point.
Test case added in partition_sync.test.
Bug#42546 Backup: RESTORE fails, thinking it finds an existing table
The problem occurred when an MDL locking conflict happened for a non-existent
table between a CREATE and an INSERT statement. The code for CREATE
interpreted this lock conflict to mean that the table existed,
which meant that the statement failed when it should not have.
The problem could occur for CREATE TABLE, CREATE TABLE LIKE and
ALTER TABLE RENAME.
This patch fixes the problem for CREATE TABLE and CREATE TABLE LIKE.
It is based on code backported from the mysql-6.1-fk tree written
by Dmitry Lenev. CREATE now uses normal open_and_lock_tables() code
to acquire exclusive locks. This means that for the test case in the bug
description, CREATE will wait until INSERT completes so that it can
get the exclusive lock. This resolves the reported bug.
The patch also prohibits CREATE TABLE and CREATE TABLE LIKE under
LOCK TABLES. Note that this is an incompatible change and must
be reflected in the documentation. Affected test cases have been
updated.
mdl_sync.test contains tests for CREATE TABLE and CREATE TABLE LIKE.
Fixing the issue for ALTER TABLE RENAME is beyond the scope of this
patch. ALTER TABLE cannot be prohibited from working under LOCK TABLES
as this could seriously impact customers and a proper fix would require
a significant rewrite.
------------------------------------------------------------
revno: 2617.68.25
committer: Dmitry Lenev <dlenev@mysql.com>
branch nick: mysql-next-bg-pre2-2
timestamp: Wed 2009-09-16 18:26:50 +0400
message:
Follow-up for one of pre-requisite patches for fixing bug #30977
"Concurrent statement using stored function and DROP FUNCTION
breaks SBR".
Made enum_mdl_namespace enum part of MDL_key class and removed MDL_
prefix from the names of enum members. In order to do the latter
changed name of PROCEDURE symbol to PROCEDURE_SYM (otherwise macro
which was automatically generated for this symbol conflicted with
MDL_key::PROCEDURE enum member).
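A small sketch of why the rename was needed. The class below is a hypothetical,
cut-down stand-in for MDL_key; only MDL_key::PROCEDURE is named in the message
above, so the other enumerators and the token value 700 are assumptions made
for illustration.

```cpp
// Token macro of the kind a parser generator emits for a grammar symbol
// (the numeric value is illustrative). Had it still been called PROCEDURE,
// the preprocessor would have replaced the enumerator below with a number
// and the enum would no longer compile.
#define PROCEDURE_SYM 700

class MDL_key_sketch {
 public:
  // Namespace enum nested in the class, members without the MDL_ prefix.
  enum enum_mdl_namespace { TABLE = 0, FUNCTION, PROCEDURE };

  explicit MDL_key_sketch(enum_mdl_namespace ns) : ns_(ns) {}
  enum_mdl_namespace mdl_namespace() const { return ns_; }

 private:
  enum_mdl_namespace ns_;
};
```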
------------------------------------------------------------
revno: 2617.68.24
committer: Dmitry Lenev <dlenev@mysql.com>
branch nick: mysql-next-bg-pre2-2
timestamp: Wed 2009-09-16 17:25:29 +0400
message:
Pre-requisite patch for fixing bug #30977 "Concurrent statement
using stored function and DROP FUNCTION breaks SBR".
Added MDL_request for stored routine as member to Sroutine_hash_entry
in order to be able to perform metadata locking for stored routines in
future (Sroutine_hash_entry is an equivalent of TABLE_LIST class for
stored routines).
(WL#4284, follow up fixes).
sql/mdl.cc:
Introduced version of MDL_request::init() method which initializes
lock request using pre-built MDL key.
MDL_key::table_name/table_name_length() getters were
renamed to reflect the fact that MDL_key objects are
now created not only for tables.
sql/mdl.h:
Extended enum_mdl_namespace enum with values which correspond
to namespaces for stored functions and triggers.
Renamed MDL_key::table_name/table_name_length() getters
to MDL_key::name() and name_length() correspondingly to
reflect the fact that MDL_key objects are now created
not only for tables.
Added MDL_key::mdl_namespace() getter.
Also added version of MDL_request::init() method which
initializes lock request using pre-built MDL key.
sql/sp.cc:
Added MDL_request for stored routine as member to Sroutine_hash_entry.
Changed code to use MDL_key from this request as a key for LEX::sroutines
set. Removed separate "key" member from Sroutine_hash_entry as it became
unnecessary.
sql/sp.h:
Added MDL_request for stored routine as member to Sroutine_hash_entry
in order to be able to perform metadata locking for stored routines in
future (Sroutine_hash_entry is an equivalent of TABLE_LIST class for
stored routines).
Removed Sroutine_hash_entry::key member as now we can use MDL_key from
this request as a key for LEX::sroutines set.
sql/sp_head.cc:
Removed sp_name::m_sroutines_key member and set_routine_type() method.
Since key for routine in LEX::sroutines set has no longer sp_name::m_qname
as suffix we won't save anything by creating it at sp_name construction
time.
Adjusted sp_name constructor used for creating temporary objects for
lookups in SP-cache to accept MDL_key as parameter and to avoid any
memory allocation.
Finally, removed sp_head::m_sroutines_key member for reasons similar
to why sp_name::m_sroutines_key was removed.
sql/sp_head.h:
Removed sp_name::m_sroutines_key member and set_routine_type() method.
Since key for routine in LEX::sroutines set has no longer sp_name::m_qname
as suffix we won't save anything by creating it at sp_name construction
time.
Adjusted sp_name constructor used for creating temporary objects for
lookups in SP-cache to accept MDL_key as parameter and to avoid any
memory allocation.
Finally, removed sp_head::m_sroutines_key member for reasons similar
to why sp_name::m_sroutines_key was removed.
sql/sql_base.cc:
Adjusted code to the fact that we now use MDL_key from
Sroutine_hash_entry::mdl_request as a key for LEX::sroutines set.
MDL_key::table_name/table_name_length() getters were
renamed to reflect the fact that MDL_key objects are
now created not only for tables.
sql/sql_trigger.cc:
sp_add_used_routine() now takes MDL_key as parameter as now we use
instance of this class as a key for LEX::sroutines set.
------------------------------------------------------------
revno: 2617.68.23
committer: Dmitry Lenev <dlenev@mysql.com>
branch nick: mysql-next-bg-pre1
timestamp: Wed 2009-09-16 09:34:42 +0400
message:
Pre-requisite patch for fixing bug #30977 "Concurrent statement
using stored function and DROP FUNCTION breaks SBR".
CREATE TABLE SELECT statements take exclusive metadata lock on table
being created. Invariant of metadata locking subsystem states that
such lock should be taken before taking any kind of shared locks.
Once metadata locks on stored routines are introduced statements like
"CREATE TABLE ... SELECT f1()" will break this invariant by taking
shared locks on routines before exclusive lock on target table.
To avoid this, open_tables() is reworked to process tables which are
directly used by the statement before stored routines are processed.
sql/sql_base.cc:
Refactored open_tables() implementation to process stored routines
only after tables which are directly used by statement were processed.
To achieve this, moved handling of routines in open_tables() out of
the loop which iterates over tables into a new separate loop. This in
turn allowed splitting the handling of a particular table or view into
an auxiliary function, which made the code in open_tables() simpler and
easier to understand.
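The resulting ordering can be illustrated with the sketch below. It is not the
open_tables() implementation: lock_order() is a hypothetical helper that only
shows that all directly used tables are handled in one loop before any stored
routine is handled in a second loop.

```cpp
#include <string>
#include <vector>

// Returns the order in which a statement's objects would be processed:
// first every directly used table, then every stored routine.
std::vector<std::string> lock_order(const std::vector<std::string> &tables,
                                    const std::vector<std::string> &routines) {
  std::vector<std::string> order;
  // First loop: tables (this is where an exclusive lock on the target
  // table of CREATE TABLE ... SELECT would be taken).
  for (const auto &t : tables) order.push_back("table:" + t);
  // Second loop: routines (shared locks once routine locking exists).
  for (const auto &r : routines) order.push_back("routine:" + r);
  return order;
}
```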
------------------------------------------------------------
revno: 2617.68.10
committer: Dmitry Lenev <dlenev@mysql.com>
branch nick: mysql-next-bg46673
timestamp: Tue 2009-09-01 19:57:05 +0400
message:
Fix for bug #46673 "Deadlock between FLUSH TABLES WITH READ LOCK and DML".
Deadlocks occurred when one concurrently executed transactions with
several statements modifying data and FLUSH TABLES WITH READ LOCK
statement or SET READ_ONLY=1 statement.
These deadlocks were introduced by the patch for WL 4284: "Transactional
DDL locking"/Bug 989: "If DROP TABLE while there's an active transaction,
wrong binlog order" which has changed FLUSH TABLES WITH READ LOCK/SET
READ_ONLY=1 to wait for pending transactions.
What happened was that FLUSH TABLES WITH READ LOCK blocked all further
statements changing tables by setting global_read_lock global variable
and started waiting for all pending transactions to complete.
Then one of those transactions tried to execute DML, detected that
global_read_lock was non-zero and tried to wait until the global read lock
was released (i.e. global_read_lock became 0). This led to a
deadlock.
Proper solution for this problem should probably involve full integration
of global read lock with metadata locking subsystem (which would allow
implementing waiting for pending transactions without blocking DML in them).
But since it requires significant changes, another, short-term solution
for the problem is implemented in this patch.
Basically, this patch restores behavior of FLUSH TABLES WITH READ LOCK/
SET READ_ONLY=1 before the patch for WL 4284/bug 989. By ensuring that
extra references to TABLE_SHARE are not stored for active metadata locks
it changes these statements not to wait for pending transactions.
As a result the deadlock is eliminated.
Note that this does not change the fact that active FLUSH TABLES WITH
READ LOCK lock or SET READ_ONLY=1 prevent modifications to tables as
they also block transaction commits.
mysql-test/r/flush_block_commit.result:
Adjusted test case after change in FLUSH TABLES WITH READ LOCK behavior
- it is no longer blocked by a pending transaction.
mysql-test/r/mdl_sync.result:
Added test for bug #46673 "Deadlock between FLUSH TABLES WITH READ LOCK
and DML".
mysql-test/r/read_only_innodb.result:
Adjusted test case after change in SET READ_ONLY behavior - it is no
longer blocked by a pending transaction.
mysql-test/t/flush_block_commit.test:
Adjusted test case after change in FLUSH TABLES WITH READ LOCK behavior
- it is no longer blocked by a pending transaction.
mysql-test/t/mdl_sync.test:
Added test for bug #46673 "Deadlock between FLUSH TABLES WITH READ LOCK
and DML".
mysql-test/t/read_only_innodb.test:
Adjusted test case after change in SET READ_ONLY behavior - it is no
longer blocked by a pending transaction.
sql/sql_base.cc:
Disable caching of pointers to TABLE_SHARE objects in MDL subsystem.
This means that transactions holding metadata lock on the table will
no longer have extra reference to the TABLE_SHARE (due to this lock)
and will no longer block concurrent FLUSH TABLES/FLUSH TABLES WITH
READ LOCK. Note that this does not change the fact that FLUSH TABLES
WITH READ LOCK prevents concurrent transactions from modifying data
as it also blocks all commits.
------------------------------------------------------------
revno: 2617.68.7
committer: Dmitry Lenev <dlenev@mysql.com>
branch nick: mysql-next-bg46044
timestamp: Thu 2009-08-27 10:22:17 +0400
message:
Fix for bug #46044 "MDL deadlock on LOCK TABLE + CREATE TABLE HIGH_PRIORITY
FOR UPDATE".
Deadlock occurred when during execution of query to I_S we tried to open
a table or its .FRM in order to get information about it and had to wait
because we encountered exclusive metadata lock on this table held by
a DDL operation from another connection which in its turn waited for some
resource currently owned by connection executing this I_S query.
For example, this might have happened if one connection under LOCK TABLES
executed an I_S query targeted at a particular table (which was not among
those locked) while another connection concurrently tried to create this
table using CREATE TABLE SELECT, which had to wait for one of the tables
locked by the first connection.
Another situation in which deadlock might have occurred is when I_S query,
which was executed as part of transaction, tried to get information about
table which just has been dropped by concurrent DROP TABLES executed under
LOCK TABLES and this DROP TABLES for its completion also had to wait for
the transaction from the first connection.
This problem stemmed from the fact that opening of tables/.FRMs for I_S
filling is happening outside of connection's main MDL_context so code
which tries to detect deadlocks due to conflicting metadata locks doesn't
work in this case. Indeed, this led to deadlocks when during I_S filling
we tried to wait for conflicting metadata lock to go away, while its owner
was waiting for some resource held by connection executing I_S query.
This patch solves this problem by avoiding waiting in such situation.
Instead we skip this table and produce warning that information about
it was omitted from I_S due to concurrent DDL operation. We still wait
for conflicting metadata lock to go away when it is known that deadlock
is not possible (i.e. when connection executing I_S query does not hold
any metadata or table-level locks).
Basically, we apply our standard deadlock avoidance technique for metadata
locks to the process of filling of I_S tables but replace ER_LOCK_DEADLOCK
error with a warning.
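The decision described above can be sketched as follows. This is only an
illustration: OpenResult, open_for_i_s() and the warning text are hypothetical
and are not the identifiers or the message used by the patch.

```cpp
#include <string>
#include <vector>

enum class OpenResult { Opened, Waited, SkippedWithWarning };

// Decides what to do when a table is opened in order to fill an I_S table.
// holds_locks: the connection already holds metadata or table-level locks,
// so waiting for a conflicting lock could deadlock.
OpenResult open_for_i_s(const std::string &table, bool conflicting_lock,
                        bool holds_locks, std::vector<std::string> &warnings) {
  if (!conflicting_lock) return OpenResult::Opened;
  if (!holds_locks) return OpenResult::Waited;  // deadlock is not possible
  // Waiting could deadlock: skip the table and report a warning instead
  // of failing with ER_LOCK_DEADLOCK.
  warnings.push_back("information about '" + table +
                     "' omitted due to concurrent DDL");
  return OpenResult::SkippedWithWarning;
}
```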
Note that this change is supposed to be safe for 'mysqldump' since the
only mode of it which is affected by this change, --single-transaction,
is not safe in the presence of concurrent DDL anyway (and this fact is
documented). Other modes are unaffected because they either use
SHOW TABLES/SELECT * FROM I_S.TABLE_NAMES which do not take any metadata
locks in the process of I_S table filling and thus cannot skip tables or
execute I_S queries for tables which were previously locked by LOCK TABLES
(or in the presence of global read lock) which excludes possibility of
encountering conflicting metadata lock.
mysql-test/r/mdl_sync.result:
Added test for bug #46044 "MDL deadlock on LOCK TABLE + CREATE TABLE
HIGH_PRIORITY FOR UPDATE".
mysql-test/t/mdl_sync.test:
Added test for bug #46044 "MDL deadlock on LOCK TABLE + CREATE TABLE
HIGH_PRIORITY FOR UPDATE".
sql/mysql_priv.h:
Added a new flag for open_table() call which allows it to fail
with an error in cases when conflicting metadata lock is discovered
instead of waiting until this lock goes away.
sql/share/errmsg-utf8.txt:
Added error/warning message to be generated in cases when information
about table is omitted from I_S since there is conflicting metadata lock
on the table.
sql/share/errmsg.txt:
Added error/warning message to be generated in cases when information
about table is omitted from I_S since there is conflicting metadata lock
on the table.
sql/sql_base.cc:
Added a new flag for open_table() call which allows it to fail
with an error in cases when conflicting metadata lock is discovered
instead of waiting until this lock goes away.
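A minimal sketch of the idea behind such a flag, not the server code. The
names OPEN_FAIL_ON_MDL_CONFLICT, ConflictingLocks and open_table_sketch() are
hypothetical, and the lock subsystem is reduced to a set of table names on
which another connection is assumed to hold a conflicting lock:

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical flag names, chosen for illustration only.
enum OpenFlag : unsigned {
  OPEN_DEFAULT = 0u,
  OPEN_FAIL_ON_MDL_CONFLICT = 1u << 0
};

enum class OpenResult { Opened, FailedOnConflict, MustWait };

// Toy stand-in for the metadata lock subsystem: names of tables on which
// some other connection holds a conflicting lock.
struct ConflictingLocks {
  std::set<std::string> tables;
};

// Models the decision only: with the flag set, a conflict is reported to
// the caller at once; without it, the caller would have to wait.
OpenResult open_table_sketch(const ConflictingLocks &locks,
                             const std::string &table, unsigned flags) {
  assert(!table.empty());  // precondition of this sketch
  if (locks.tables.count(table) == 0)
    return OpenResult::Opened;
  if (flags & OPEN_FAIL_ON_MDL_CONFLICT)
    return OpenResult::FailedOnConflict;
  return OpenResult::MustWait;  // a real server would block here
}
```

In this model the caller that passes the flag decides what to do with
FailedOnConflict, which is what lets the I_S code turn it into a warning.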
sql/sql_show.cc:
When we are opening a table (or just .FRM) in order to fill I_S with
information about this table and encounter conflicting metadata lock,
waiting for this lock to go away can lead to a deadlock in some
situations (under LOCK TABLES, within transaction, etc.). To avoid
these deadlocks we detect such situations and don't do waiting.
Instead, we skip table for which we have conflicting metadata lock,
thus omitting information about it from I_S table, and produce an
appropriate warning.
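The skip-instead-of-wait rule described above can be sketched as follows.
This is an illustration under stated assumptions, not the sql_show.cc code:
ConnectionState, FillResult, fill_i_s_sketch() and the warning text are all
hypothetical, and "waiting" is modelled as simply producing the row:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical summary of what the connection running the I_S query holds.
struct ConnectionState {
  bool holds_metadata_locks;
  bool holds_table_locks;
};

// Waiting is known to be deadlock-free only if the connection holds nothing
// that the owner of the conflicting lock could itself be waiting for.
bool waiting_is_deadlock_free(const ConnectionState &c) {
  return !c.holds_metadata_locks && !c.holds_table_locks;
}

struct FillResult {
  std::vector<std::string> rows;      // tables reported in the I_S result
  std::vector<std::string> warnings;  // one entry per skipped table
};

FillResult fill_i_s_sketch(const ConnectionState &conn,
                           const std::vector<std::string> &tables,
                           const std::set<std::string> &conflicting) {
  FillResult result;
  for (const std::string &name : tables) {
    const bool conflict = conflicting.count(name) != 0;
    if (conflict && !waiting_is_deadlock_free(conn)) {
      // Skip the table and warn instead of raising a deadlock error.
      result.warnings.push_back("information about '" + name +
                                "' omitted due to concurrent DDL");
      continue;
    }
    // No conflict, or waiting is safe: a real server would block until the
    // conflicting lock goes away; the sketch just emits the row.
    result.rows.push_back(name);
  }
  return result;
}
```

A connection under LOCK TABLES would get a shorter result plus a warning,
while a connection holding no locks would still get every row.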
------------------------------------------------------------
revno: 2617.68.7
committer: Dmitry Lenev <dlenev@mysql.com>
branch nick: mysql-next-bg46044
timestamp: Thu 2009-08-27 10:22:17 +0400
message:
Fix for bug #46044 "MDL deadlock on LOCK TABLE + CREATE TABLE HIGH_PRIORITY
FOR UPDATE".
Deadlock occurred when during execution of query to I_S we tried to open
a table or its .FRM in order to get information about it and had to wait
because we have encountered exclusive metadata lock on this table held by
a DDL operation from another connection which in its turn waited for some
resource currently owned by connection executing this I_S query.
For example, this might have happened if one connection, under LOCK TABLES,
executed an I_S query targeting a particular table (which was not among those
locked) while another connection concurrently tried to create this table using
CREATE TABLE SELECT, which had to wait for one of the tables locked by the
first connection.
Another situation in which deadlock might have occurred is when an I_S query,
which was executed as part of a transaction, tried to get information about a
table which had just been dropped by a concurrent DROP TABLES executed under
LOCK TABLES, and this DROP TABLES for its completion also had to wait for the
transaction from the first connection.
This problem stemmed from the fact that opening of tables/.FRMs for I_S
filling is happening outside of connection's main MDL_context so code
which tries to detect deadlocks due to conflicting metadata locks doesn't
work in this case. Indeed, this led to deadlocks when during I_S filling
we tried to wait for conflicting metadata lock to go away, while its owner
was waiting for some resource held by connection executing I_S query.
This patch solves this problem by avoiding waiting in such situation.
Instead we skip this table and produce warning that information about
it was omitted from I_S due to concurrent DDL operation. We still wait
for conflicting metadata lock to go away when it is known that deadlock
is not possible (i.e. when connection executing I_S query does not hold
any metadata or table-level locks).
Basically, we apply our standard deadlock avoidance technique for metadata
locks to the process of filling of I_S tables but replace ER_LOCK_DEADLOCK
error with a warning.
Note that this change is supposed to be safe for 'mysqldump' since the only
mode affected by this change is --single-transaction, which is not safe in
the presence of concurrent DDL anyway (and this fact is documented). Other
modes are unaffected because they either use SHOW TABLES/SELECT * FROM
I_S.TABLE_NAMES, which do not take any metadata locks in the process of I_S
table filling and thus cannot skip tables, or execute I_S queries for tables
which were previously locked by LOCK TABLES (or in the presence of global read
lock), which excludes possibility of encountering conflicting metadata lock.
------------------------------------------------------------
revno: 2617.69.37
committer: Dmitry Lenev <dlenev@mysql.com>
branch nick: mysql-next-bg46748
timestamp: Fri 2009-08-21 18:17:02 +0400
message:
Fix for bug #46748 "Assertion in MDL_context::wait_for_locks()
on INSERT + CREATE TRIGGER".
Concurrent execution of statements involving stored functions or triggers
which were using several tables and DDL statements which affected those
tables on debug build of server might have led to assertion failures in
MDL_context::wait_for_locks(). Non-debug build was not affected.
The problem was that during back-off which happens when open_tables()
encounters conflicting metadata lock for one of the tables being open
we didn't reset MDL_request::ticket value for requests which correspond
to tables from extended prelocking set. Since these requests are part
of the list of requests to be waited for in Open_table_context, this broke
assumption that ticket value for them is 0 in MDL_context::wait_for_locks()
and caused assertion failure.
This fix ensures that close_tables_for_reopen(), which performs this back-off,
resets MDL_request::ticket value not only for tables directly used by the
statement but also for tables from extended prelocking set, thus satisfying
assumption described above.
mysql-test/r/mdl_sync.result:
Added test case for bug #46748 "Assertion in MDL_context::wait_for_locks()
on INSERT + CREATE TRIGGER".
mysql-test/t/mdl_sync.test:
Added test case for bug #46748 "Assertion in MDL_context::wait_for_locks()
on INSERT + CREATE TRIGGER".
sql/sql_base.cc:
Since metadata lock requests for tables from extended part of prelocking
set are also part of list of requests to be waited for in Open_table_context,
in close_tables_for_reopen() we have to reset MDL_request::ticket
values for them to satisfy assumptions in MDL_context::wait_for_locks().
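The invariant involved can be illustrated with a small model. This is a
sketch under assumptions, not the server code: MdlTicket, MdlRequest,
TableRef and the *_sketch functions are simplified, hypothetical stand-ins
for the real classes and functions named above:

```cpp
#include <cassert>
#include <vector>

struct MdlTicket {};  // stand-in for a granted metadata lock

struct MdlRequest {
  const MdlTicket *ticket = nullptr;  // non-null once the lock is granted
};

struct TableRef {
  MdlRequest mdl_request;
  bool from_extended_prelocking = false;  // added via function or trigger
};

// Back-off: reset the ticket of every request, including those that belong
// to the extended prelocking set (the ones that were missed before the fix).
void close_tables_for_reopen_sketch(std::vector<TableRef> &tables) {
  for (TableRef &t : tables)
    t.mdl_request.ticket = nullptr;
}

bool tickets_all_reset(const std::vector<TableRef> &tables) {
  for (const TableRef &t : tables)
    if (t.mdl_request.ticket != nullptr)
      return false;
  return true;
}

// Models only the assumption checked on entry: no request in the list to be
// waited for may still carry a ticket.
void wait_for_locks_sketch(const std::vector<TableRef> &tables) {
  assert(tickets_all_reset(tables));
  // ... waiting for the conflicting locks would happen here ...
}
```

If the reset loop skipped entries with from_extended_prelocking set, the
assert in wait_for_locks_sketch() would fire in a debug build, which
mirrors the failure described in the commit message.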