- A prerequisite cleanup patch for making KILL reliable.
The test case main.kill did not work reliably.
The following problems have been identified:
1. A kill signal could be lost if it arrived shortly before a
thread started reading on the client connection.
2. A kill signal could be lost if it arrived shortly before a
thread started waiting on a condition variable.
These problems have been solved as follows; see also the
added code comments for more details.
1. There is no safe way to detect when a thread enters the
blocking state of a read(2) or recv(2) system call, where it
can be interrupted by a signal. Hence it is not possible to
wait for the right moment to send a kill signal. We decided
not to fix this in the code. Instead, the test case repeats
the KILL statement until the connection terminates.
2. Before waiting on a condition variable, we register it
together with a synchronizing mutex in THD::mysys_var. After
this registration, THD::killed must be tested again. In some
places it was only tested in a loop condition before the
registration. If THD::killed was set between that test and the
registration, the thread started waiting without noticing the
killed flag. Additional checks have been introduced where
required (see the sketch after this list).
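The following is a minimal, self-contained sketch of the
"register, re-check the kill flag, then wait" pattern from
item 2, using plain pthreads. The struct and names below are
illustrative stand-ins for the relevant THD/mysys_var members,
not the actual server code.

    #include <pthread.h>

    struct worker
    {
      pthread_mutex_t mutex;
      pthread_cond_t  cond;
      volatile bool   killed;          // stands in for THD::killed
      pthread_cond_t  *current_cond;   // stands in for mysys_var->current_cond
      pthread_mutex_t *current_mutex;  // stands in for mysys_var->current_mutex
      bool            work_ready;
    };

    void wait_for_work(worker *w)
    {
      pthread_mutex_lock(&w->mutex);
      /* 1. Register condition variable and mutex (roughly what THD::enter_cond() does). */
      w->current_mutex= &w->mutex;
      w->current_cond=  &w->cond;
      /*
        2. Re-check the kill flag AFTER the registration: if KILL was issued
        before the registration, the killer found no registered condition to
        broadcast, so without this check we would wait forever.
      */
      while (!w->work_ready && !w->killed)
        pthread_cond_wait(&w->cond, &w->mutex);
      /* 3. Deregister again (roughly what THD::exit_cond() does). */
      w->current_cond=  NULL;
      w->current_mutex= NULL;
      pthread_mutex_unlock(&w->mutex);
    }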
In addition to the above, the main.kill test case has been
rewritten. All sleeps have been replaced by Debug Sync
Facility synchronization, and a couple of sync points have
been added to the server code.
To avoid further problems in case the test case still fails
despite the fixes, it has been added to the "experimental"
list for now.
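For illustration, a Debug Sync point in the server code looks
roughly like the snippet below; the sync point and signal names
here are made up, not the ones actually added by this patch.

    /* Server side: a named sync point (requires a debug build). */
    DEBUG_SYNC(thd, "before_kill_wait");

    /*
      Test side (mysqltest), shown as a comment to keep one language:
        SET DEBUG_SYNC= 'before_kill_wait SIGNAL reached WAIT_FOR resume';
        send <statement that hits the sync point>;
        # from another connection:
        SET DEBUG_SYNC= 'now WAIT_FOR reached';
        KILL QUERY <id>;
        SET DEBUG_SYNC= 'now SIGNAL resume';
    */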
- Most of the work on this patch is authored by Ingo Struewing
Pre-requisite for the fix for "FLUSH TABLES WITH READ LOCK and
FLUSH TABLES <list> WITH READ LOCK are incompatible", to
be pushed as a separate patch.
Replaced the thread state name "Waiting for table", which was
used by threads waiting for a metadata lock or table flush,
with a set of names that better reflect the types of resources
being waited for.
Also replaced the "Table lock" thread state name, which was used
by threads waiting on a thr_lock.c table level lock, with the
more elaborate "Waiting for table level lock", to make it
more consistent with the other thread state names.
Updated test cases and their results according to these
changes.
Fixed the sys_vars.query_cache_wlock_invalidate_func test not
to wait for the timeout of the wait_condition.inc script.
Fix for "FLUSH TABLES WITH READ LOCK and FLUSH
TABLES <list> WITH READ LOCK are incompatible".
The problem was that a FLUSH TABLES <list> WITH READ LOCK
statement issued while another connection held the global
read lock (acquired with FLUSH TABLES WITH READ LOCK) was
blocked and had to wait until the global read lock was
released.
This issue stemmed from the fact that the FLUSH TABLES <list>
WITH READ LOCK implementation acquired X metadata locks
on the tables to be flushed. Since these locks required the
global IX lock to be acquired, the statement was incompatible
with the global read lock.
This patch addresses the problem by using the SNW metadata
lock type for the tables to be flushed by FLUSH TABLES <list>
WITH READ LOCK. It is OK to acquire these locks without the
global IX lock as long as we do not try to upgrade them. Since
SNW locks allow concurrent statements to use the same table,
FLUSH TABLES <list> WITH READ LOCK now has to wait, after
acquiring the metadata locks, until old versions of the tables
to be flushed go away. Since such waiting can lead to
deadlocks, the MDL deadlock detector was extended to take
waits for flush into account and resolve such deadlocks.
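A hedged sketch of what requesting such an SNW lock looks like
through the MDL API follows; the exact MDL_request/acquire_lock
signatures have varied between versions, so treat the calls as
approximate, and the database/table names are illustrative.

    /* Approximate sketch: request an SNW (shared, no write) lock instead of X. */
    MDL_request request;
    request.init(MDL_key::TABLE, "db", "t1", MDL_SHARED_NO_WRITE);
    if (thd->mdl_context.acquire_lock(&request,
                                      thd->variables.lock_wait_timeout))
      return TRUE;  /* timeout, or this thread was picked as a deadlock victim */
    /*
      The lock must never be upgraded afterwards: an upgrade would again
      require the global IX lock and reintroduce the conflict with
      FLUSH TABLES WITH READ LOCK.
    */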
As a bonus, the code in open_tables() which was responsible
for waiting for old versions of tables to go away was
refactored. Now, when we encounter an old version of a table
in open_table(), we don't back off and wait for all old
versions to go away, but instead wait for this particular
table to be flushed. Such an approach, supported by deadlock
detection, should reduce the number of scenarios in which
FLUSH TABLES aborts concurrent multi-statement transactions.
Note that an active FLUSH TABLES <list> WITH READ LOCK still
blocks a concurrent FLUSH TABLES WITH READ LOCK statement,
as the former keeps tables open and thus prevents the
latter statement from doing its flush.
------------------------------------------------------------
revno: 2630.4.7
committer: Dmitry Lenev <dlenev@mysql.com>
branch nick: mysql-6.0-3726-w
timestamp: Sun 2008-05-25 11:19:02 +0400
message:
WL#3726 "DDL locking for all metadata objects".
Fixed silly mistake in test case which caused sporadic
kill.test failures.
Backport of:
------------------------------------------------------------
revno: 2630.4.1
committer: Dmitry Lenev <dlenev@mysql.com>
branch nick: mysql-6.0-3726-w
timestamp: Fri 2008-05-23 17:54:03 +0400
message:
WL#3726 "DDL locking for all metadata objects".
After review fixes in progress.
------------------------------------------------------------
This is the first patch in the series. It transforms the metadata
locking subsystem to use a dedicated module (mdl.h, mdl.cc). There
are no significant changes in the locking protocol.
The import passes the test suite with the exception of
deprecated/removed 6.0 features, and MERGE tables. The latter
are subject to a fix by WL#4144.
Unfortunately, the original changeset comments got lost in a merge,
thus this import has its own (largely insufficient) comments.
This patch fixes Bug#25144 "replication / binlog with view breaks".
Warning: this patch introduces an incompatible change:
Under LOCK TABLES, it's no longer possible to FLUSH a table that
was not locked for WRITE.
Under LOCK TABLES, it's no longer possible to DROP a table or
VIEW that was not locked for WRITE.
******
Backport of:
------------------------------------------------------------
revno: 2630.4.2
committer: Dmitry Lenev <dlenev@mysql.com>
branch nick: mysql-6.0-3726-w
timestamp: Sat 2008-05-24 14:03:45 +0400
message:
WL#3726 "DDL locking for all metadata objects".
After review fixes in progress.
******
Backport of:
------------------------------------------------------------
revno: 2630.4.3
committer: Dmitry Lenev <dlenev@mysql.com>
branch nick: mysql-6.0-3726-w
timestamp: Sat 2008-05-24 14:08:51 +0400
message:
WL#3726 "DDL locking for all metadata objects"
Fixed failing Windows builds by adding mdl.cc to the lists
of files needed to build server/libmysqld on Windows.
******
Backport of:
------------------------------------------------------------
revno: 2630.4.4
committer: Dmitry Lenev <dlenev@mysql.com>
branch nick: mysql-6.0-3726-w
timestamp: Sat 2008-05-24 21:57:58 +0400
message:
WL#3726 "DDL locking for all metadata objects".
Fix for assert failures in kill.test which occurred when one
tried to kill an ALTER TABLE statement on a merge table while
it was waiting in wait_while_table_is_used() for other
connections to close this table.
These assert failures stemmed from the fact that the cleanup
code in this case assumed that the temporary table representing
the new version of the table had been opened and added to the
THD::temporary_tables list, while the code that opened this
temporary table did not always fulfil this assumption.
This patch changes the code that opens the new version of the
table to always do this linking in. It also streamlines the
cleanup process for cases when an error occurs while we have
the new version of the table open.
******
WL#3726 "DDL locking for all metadata objects"
Add libmysqld/mdl.cc to .bzrignore.
******
Backport of:
------------------------------------------------------------
revno: 2630.4.6
committer: Dmitry Lenev <dlenev@mysql.com>
branch nick: mysql-6.0-3726-w
timestamp: Sun 2008-05-25 00:33:22 +0400
message:
WL#3726 "DDL locking for all metadata objects".
Addition to the fix of assert failures in kill.test caused by
changes for this worklog.
Make sure we close the new table only once.
If a thread is killed in the server, we throw "shutdown" only if one is actually in
progress; otherwise, we throw "query interrupted".
Control-C in the mysql command-line client is "incremental" now.
The first Control-C sends KILL QUERY (when connected to a 5.0+
server; otherwise, see the next step).
The next Control-C sends KILL CONNECTION.
The next Control-C aborts the client.
As the first two steps only pertain to a running query,
Control-C will abort the client right away if no query is running.
The client now gives more detailed and consistent feedback on
Control-C.
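A self-contained sketch of the decision logic only (not the
client's actual code; the function and parameter names are made
up). Sending the KILL itself is done over a separate connection,
outside the signal handler.

    enum ctrlc_action { ABORT_CLIENT, SEND_KILL_QUERY, SEND_KILL_CONNECTION };

    static ctrlc_action on_ctrlc(int nth_ctrlc, bool query_running,
                                 bool server_is_50_or_newer)
    {
      if (!query_running)                          /* nothing to interrupt:       */
        return ABORT_CLIENT;                       /* abort the client right away */
      if (nth_ctrlc == 1 && server_is_50_or_newer)
        return SEND_KILL_QUERY;                    /* stop only the statement     */
      if (nth_ctrlc <= 2)
        return SEND_KILL_CONNECTION;               /* drop the whole connection   */
      return ABORT_CLIENT;                         /* give up and abort the client */
    }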
The problem is that, since MyISAM's concurrent_insert is on by
default, some concurrent SELECT statements might not see changes
made by INSERT statements in other connections, even if the
INSERT statement has returned.
The solution is to disable concurrent_insert so that INSERT
statements return only after the data is actually visible to
other statements.
Fix for KILL of the current connection yielding a
different error code depending on platform.
On Mac OS X, a KILL statement issued to kill the current
connection would return a different error code and message than on
other platforms ('MySQL server has gone away' instead of 'Shutdown
in progress').
The reason for this difference was that on Mac OS X the macro
SIGNAL_WITH_VIO_CLOSE is defined. This macro forces the KILL
implementation to close the communication socket of the thread
that is being killed. The SIGNAL_WITH_VIO_CLOSE macro is defined
on platforms where just sending a signal is not a reliable
mechanism to interrupt a thread sleeping on a blocking system
call. In a nutshell, closing the socket is a hack to work around
an operating system bug and wake the blocked thread no matter
what.
However, if the thread that is being killed is the same
thread that issued the KILL statement, closing the socket leads
to a prematurely lost connection. At the same time it is not
necessary to close the socket in this case, since the thread in
question is not inside a blocking system call.
The fix, therefore, is not to close the socket if the thread
that is being killed is the same one that issued the KILL
statement, even when SIGNAL_WITH_VIO_CLOSE is defined.
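Conceptually the fix is a guard around the socket close in the
kill path; the fragment below is a simplified sketch, not the
exact server code.

    /* Inside the code that processes a KILL of a session (simplified). */
    #ifdef SIGNAL_WITH_VIO_CLOSE
      if (this != current_thd)  /* killing another thread: it may be blocked    */
        close_active_vio();     /* in read()/recv(), so force its socket closed */
      /*
        Otherwise we are killing ourselves: we are clearly not blocked in a
        read, and closing our own socket would just drop the client
        prematurely.
      */
    #endif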
The reason "reap;" succeeded unexpectedly is that the query was
(almost always) completing and the network buffer was (sometimes)
big enough to store the query result on Windows, meaning the
response was completely sent before the server thread could be
killed.
Therefore we now use a much longer-running query that has no
chance to fully complete before the reap happens, which tests
the kill properly.
mysqld crashed when a long-running EXPLAIN query was killed from
another connection.
When the current thread caught a kill signal while executing the
function best_extension_by_limited_search, it just silently
returned to the calling function greedy_search without
initializing the elements of the join->best_positions array.
However, the greedy_search function ignored the thd->killed
status after calls to the best_extension_by_limited_search
function, and after several calls greedy_search used
uninitialized data from join->best_positions[idx] to search for
a position in the join->best_ref array.
That search failed, and greedy_search tried to call the
swap_variables function with a NULL argument, which caused the
crash.
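In essence, the caller has to check the kill flag after the
search function returns and before touching join->best_positions;
a simplified sketch of that check (surrounding code is elided and
the call is abridged, so this is illustrative rather than the
actual optimizer code):

    best_extension_by_limited_search(join, remaining_tables, idx, record_count,
                                     read_time, search_depth, prune_level);
    if (join->thd->killed)   /* the search bails out early on KILL and leaves  */
      DBUG_RETURN(TRUE);     /* join->best_positions[idx] uninitialized, so    */
                             /* return an error instead of reading it below    */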
If a stored function or a trigger was killed, it aborted but no
error was thrown, which allowed the calling statement to continue
without notice. This may lead to wrong data being inserted,
updated or deleted, as in such cases the correct result of the
stored function isn't guaranteed. In the case of triggers it
allowed the calling statement to ignore the kill signal and to
waste time re-evaluating triggers that will always fail because
the thd->killed flag is still set.
Now the Item_func_sp::execute() and sp_head::execute_trigger()
functions check whether the function or trigger was killed during
execution and throw an appropriate error if so.
The fill_record() function now stops filling the record if an
error was reported through thd->net.report_error.
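The added checks boil down to turning a pending kill into a real
error once the body has executed; a simplified sketch of the idea
(not the exact server code):

    /* After executing the stored function or trigger body: */
    if (thd->killed)
    {
      /* Raise ER_QUERY_INTERRUPTED (or the shutdown message) so the
         calling statement sees an error and stops. */
      thd->send_kill_message();
      return TRUE;
    }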
Events: crash with procedure which alters events with function
Post-review CS
This fix also changes the handling of the KILL command combined
with a subquery: the error returned is now "not supported"
instead of a parse error. The error for CREATE|ALTER EVENT has
also been changed to "not supported yet" instead of a parse
error. In the case of an SP call, the error is "not supported
yet". This change cleans the parser of code which should not
belong there. LEX::expr_allows_subselect still exists because it
simplifies the handling of SQLCOM_HA_READ, which forbids
subselects.
- Fixed tests
- Optimized new code
- Fixed some unlikely core dumps
- Better bug fixes for:
- #14397 - OPTIMIZE TABLE with an open HANDLER causes a crash
- #14850 - ERROR 1062 when querying a view using a GROUP BY on a column that can be NULL
mysql-test-run runs with --sleep=10; otherwise GET_LOCK() times out
before being killed so we get 0 instead of NULL. Verified that it
works on our powermacg5 where the test was failing.