THREAD POOLING STRESS TEST
PROBLEM:
Connection stress tests which consists of concurrent
kill connections interleaved with mysql ping queries
cause the mysqld server which uses thread pool scheduler
to crash.
FIX:
Killing a connection involves shutdown and close of client
socket and this can cause EPOLLHUP(or EPOLLERR) events to be
to be queued and handled after disarming and cleanup of
of the connection object (THD) is being done.We disarm the
the connection by modifying the epoll mask to zero which
ensure no events come and release the ownership of waiting
thread that collect events and then do the cleanup of THD.
object.As per the linux kernel epoll source code (
http://lxr.linux.no/linux+*/fs/eventpoll.c#L1771), EPOLLHUP
(or EPOLLERR) can't be masked even if we set EPOLL mask
to zero. So we disarm the connection and thus prevent
execution of any query processing handler/queueing to
client ctx. queue by removing the client fd from the epoll
set via EPOLL_CTL_DEL. Also there is a race condition which
involve the following threads:
1) Thread X executing KILL CONNECTION Y and is in THD::awake
and using mysys_var (holding LOCK_thd_data).
2) Thread Y in tp_process_event executing and is being killed.
3) Thread Z receives KILL flag internally and possible call
the tp_thd_cleanup function which set thread session variable
and changing mysys_var.
The fix for the above race is to set thread session variable
under LOCK_thd_data.
We also do not call THD::awake if we found the thread in the
thread list that is to be killed but it's KILL_CONNECTION flag
set thus avoiding any possible concurrent cleanup. This patch
is approved by Mikael Ronstrom via email review.
The use of Thread_iterator did not work on windows (linking problems).
Solution: Change the interface between the thread_pool and the server
to only use simple free functions.
This patch is for 5.5 only (mimicks similar solution in 5.6)
PROBLEM:
mysql provides a feature where in a session which is
idle for a period specified by the wait_timeout variable
(whose value is in seconds), the session is closed
This feature is not present when we use thread pool.
FIX:
This patch implements the interface functions which is
required to implement the wait_timeout functionality
in the thread pool plugin.
The types mysql_event_general/mysql_event_connection are
being cast to the incompatible type mysql_event. The way
mysql_event and the other types are designed are prone to
strict aliasing violations and can break things depending
on how compilers optimizes this code.
This patch fixes audit interface, so it confirms to strict-
aliasing rules. It introduces incompatible changes to audit
interface:
- mysql_event type has been removed;
- event_class has been removed from mysql_event_generic and
mysql_event_connection types;
- st_mysql_audit::event_notify() second argument is event_class;
- st_mysql_audit::event_notify() third argument is event of type
(const void *).
"Writing Audit Plugins" section of manual should be updated:
http://dev.mysql.com/doc/refman/5.5/en/writing-audit-plugins.html
Silence a warning about old table name when InnoDB tests whether the
format has changed using a nonexistent table name.
Reviewed by: bar@mysql.com, marko.makela@oracle.com
connectors plugins
Implemented changes needed to keep the client plugin API compatible with
the existing plugins :
1. Provided an options() client plugin API to let the application pass
options to the plugin after loading it
2. Added "License" (const char *) to specify the client plugin's license
3. Added "mysql_api" as a placeholder that the client library can use
to pass function pointers to the plugin so that the plugin can call the
C lib back.
4. Updated the existing client plugins to comply with the API change.
5. Added more detailed error message generation for Windows.
detector" that doesn't introduce bug #56715 "Concurrent
transactions + FLUSH result in sporadical unwarranted
deadlock errors".
Deadlock could have occurred when workload containing a mix
of DML, DDL and FLUSH TABLES statements affecting the same
set of tables was executed in a heavily concurrent environment.
This deadlock occurred when several connections tried to
perform deadlock detection in the metadata locking subsystem.
The first connection started traversing wait-for graph,
encountered a sub-graph representing a wait for flush, acquired
LOCK_open and dived into sub-graph inspection. Then it
encountered sub-graph corresponding to wait for metadata lock
and blocked while trying to acquire a rd-lock on
MDL_lock::m_rwlock, since some,other thread had a wr-lock on it.
When this wr-lock was released it could have happened (if there
was another pending wr-lock against this rwlock) that the rd-lock
from the first connection was left unsatisfied but at the same
time the new rd-lock request from the second connection sneaked
in and was satisfied (for this to be possible the second
rd-request should come exactly after the wr-lock is released but
before pending the wr-lock manages to grab rwlock, which is
possible both on Linux and in our own rwlock implementation).
If this second connection continued traversing the wait-for graph
and encountered a sub-graph representing a wait for flush it tried
to acquire LOCK_open and thus the deadlock was created.
The previous patch tried to workaround this problem by not
allowing the deadlock detector to lock LOCK_open mutex if
some other thread doing deadlock detection already owns it
and current search depth is greater than 0. Instead deadlock
was reported. As a result it has introduced bug #56715.
This patch solves this problem in a different way.
It introduces a new rw_pr_lock_t implementation to be used
by MDL subsystem instead of one based on Linux rwlocks or
our own rwlock implementation. This new implementation
never allows situation in which an rwlock is rd-locked and
there is a blocked pending rd-lock. Thus the situation which
has caused this bug becomes impossible with this implementation.
Due to fact that this implementation is optimized for
wr-lock/unlock scenario which is most common in the MDL
subsystem it doesn't introduce noticeable performance
regressions in sysbench tests. Moreover it significantly
improves situation for POINT_SELECT test when many
connections are used.
No test case is provided as this bug is very hard to repeat
in MTR environment but is repeatable with the help of RQG
tests.
This patch also doesn't include a test for bug #56715
"Concurrent transactions + FLUSH result in sporadical
unwarranted deadlock errors" as it takes too much time to
be run as part of normal test-suite runs.
The crash during boot was caused by a DBUG_PRINT statement in fill_schema_schemata() (in
sql_show.cc). This DBUG_PRINT statement contained several instances of %s in the format
string and for one of these we gave a NULL pointer as the argument. This caused the
call to vsnprintf() to crash when running on Solaris.
The fix for this problem is to replace the call to vsnprintf() with my_vsnprintf()
which handles that a NULL pointer is passed as argumens for %s.
This patch also extends my_vsnprintf() to support %i in the format string.