The assertion happens when: (i) the master and slave are configured to
use the semisync plugin; (ii) the DBA disables semisync on the master;
(iii) and he also unsets the option to wait for slaves ACK even if the
semisync slave count reaches 0 during the waiting period. This
combination of factors makes the server run into an assertion as soon
as the last semisync slave disconnects and its dump thread exits.
The root of the problem is the fact that when the dump thread
disconnects and calls the observer hook transmit_stop, which ends up
calling ReplSemiSyncMaster::remove_slave, there is no check whether
the master has already disabled semisync or not. If it has, the then a
second call to the switch_off member function must be avoided.
The quick fix is to avoid calling switch_off if the DBA has disabled
the semisync plugin interactively on the master. Also, the switch_off
member function should only be called if the plugin has not been
switched off already. This is basically the pattern throughout the
rest of the semisync plugin and no other calls seem vulnerable to
similar crashes/assertions.
(This a backport of the patch to 5.5, which is also vulnerable.)
RPL_SEMI_SYNC_MASTER_ENABLED OFF.
Problem:
=======
If master is waiting for a reply from slave, at this time
set global rpl_semi_sync_master_enabled=OFF, the master
server will crash.
Analysis:
========
When master is waiting for a reply from slave, at this time
if semi sync is switched off on master, during switch off if
active transactions are present the transactions will be
cleared and "active_tranxs_" variable will be set to NULL.
When the waiting master connection finds that semi sync is
switched of it tries to access "active_tranxs_" without
checking if the transaction list exists or not. Accessing
NULL transaction list causes the crash.
Fix:
===
A check has been added to see a valid list exists before
accessing the "active_tranxs_".
The autotools-based build system has been superseded and
is being removed in order to ease the maintenance burden on
developers tweaking and maintaining the build system.
In order to support tools that need to extract the server
version, a new file that (only) contains the server version,
called VERSION, is introduced. The file contents are human
and machine-readable. The format is:
MYSQL_VERSION_MAJOR=5
MYSQL_VERSION_MINOR=5
MYSQL_VERSION_PATCH=8
MYSQL_VERSION_EXTRA=-rc
The CMake based version extraction in cmake/mysql_version.cmake
is changed to extract the version from this file. The configure
to CMake wrapper is retained for backwards compatibility and to
support the BUILD/ scripts. Also, a new a makefile target
show-dist-name that prints the server version is introduced.
Essentially, the problem is that safemalloc is excruciatingly
slow as it checks all allocated blocks for overrun at each
memory management primitive, yielding a almost exponential
slowdown for the memory management functions (malloc, realloc,
free). The overrun check basically consists of verifying some
bytes of a block for certain magic keys, which catches some
simple forms of overrun. Another minor problem is violation
of aliasing rules and that its own internal list of blocks
is prone to corruption.
Another issue with safemalloc is rather the maintenance cost
as the tool has a significant impact on the server code.
Given the magnitude of memory debuggers available nowadays,
especially those that are provided with the platform malloc
implementation, maintenance of a in-house and largely obsolete
memory debugger becomes a burden that is not worth the effort
due to its slowness and lack of support for detecting more
common forms of heap corruption.
Since there are third-party tools that can provide the same
functionality at a lower or comparable performance cost, the
solution is to simply remove safemalloc. Third-party tools
can provide the same functionality at a lower or comparable
performance cost.
The removal of safemalloc also allows a simplification of the
malloc wrappers, removing quite a bit of kludge: redefinition
of my_malloc, my_free and the removal of the unused second
argument of my_free. Since free() always check whether the
supplied pointer is null, redudant checks are also removed.
Also, this patch adds unit testing for my_malloc and moves
my_realloc implementation into the same file as the other
memory allocation primitives.
Fix various mismatches between function's language linkage. Any
particular function that is declared in C++ but should be callable
from C must have C linkage. Note that function types with different
linkages are also distinct. Thus, if a function type is declared in
C code, it will have C linkage (same if declared in a extern "C"
block).
This patch:
- Moves all definitions from the mysql_priv.h file into
header files for the component where the variable is
defined
- Creates header files if the component lacks one
- Eliminates all include directives from mysql_priv.h
- Eliminates all circular include cycles
- Rename time.cc to sql_time.cc
- Rename mysql_priv.h to sql_priv.h
The problem was because the gettimeofday function was incorrect
implemented for Windows, and so the semisync master did not wait
for slave reply properly on Windows.
Fixed by removing the gettimeofday function for Windows, and using
set_timespec function to get current time for all platforms.
The root cause of the crash is that a TranxNode is freed before it is used.
A TranxNode is allocated and inserted into the active list each time
a log event is written and flushed into the binlog file.
The memory for TranxNode is allocated with thd_alloc and will be freed
at the end of the statement. The after_commit/after_rollback callback
was supposed to be called before the end of each statement and remove the node from
the active list. However this assumption is not correct in all cases(e.g. call
'CREATE TEMPORARY TABLE myisam_t SELECT * FROM innodb_t' in a transaction
and delete all temporary tables automatically when a session closed),
and can cause the memory allocated for TranxNode be freed
before it was removed from the active list. So The TranxNode pointer in the active
list would become a wild pointer and cause the crash.
After this patch, We have a class called a TranxNodeAllocate which manages the memory
for allocating and freeing TranxNode. It uses my_malloc to allocate memory.
from mysql-next-mr-bugfixing to mysql-trunk-bugfixing.
Original revision:
------------------------------------------------------------
revision-id: zhenxing.he@sun.com-20091127084945-wng7gakygduv3q8k
committer: He Zhenxing <zhenxing.he@sun.com>
branch nick: 5.1-rep-semisync
timestamp: Fri 2009-11-27 16:49:45 +0800
message:
Bug#48351 Inconsistent library names for semisync plugin
The semisync plugin library names on Unix like systems were prefixed with
'lib', which did not follow the conventions.
Fix the problem by removing the 'lib' prefix on Unix systems.
------------------------------------------------------------
Before this patch, semisync assumed transactions running in parallel
can not be larger than max_connections, but this is not true when
the event scheduler is executing events, and cause semisync run out
of preallocated transaction nodes.
Fix the problem by allocating transaction nodes dynamically.
This patch also fixed a possible deadlock when running UNINSTALL
PLUGIN rpl_semi_sync_master and updating in parallel. Fixed by
releasing the internal Delegate lock before unlock the plugins.
The semisync plugin library names on Unix like systems were prefixed with
'lib', which did not follow the conventions.
Fix the problem by removing the 'lib' prefix on Unix systems.
rpl_semi_sync_master_wait_sessions was reset by FLUSH STATUS,
which could cause the master fail to wake up waiting sessions and
result in master timeout waiting for slave reply.
rpl_semi_sync_master_wait_session should not be reset, this
problem is fixed by this patch.