special character sets like utf16, utf32, ucs2.
Analysis: MySQL server does not support few special character sets like
utf16,utf32 and ucs2 as "client's character set"(eg: utf16,utf32, ucs2).
It is known limitation listed in the documentation
http://dev.mysql.com/doc/refman/5.5/en/charset-connection.html.
The default value for default-character-set parameter is 'auto'
which means that if the server's character set is not supported,
then server automatically changes client's character set to
predefined character-set which is 'latin1' in the current code.
Eg:
$ ./mysql -uroot -S$SOCKET_FILE --default-character-set=utf16
ERROR 1231 (42000): Variable 'character_set_client' can't be set to the value of 'utf16'
$ ./mysql -uroot -S$SOCKET_FILE will be successfully connected to
server with 'latin1' as default client side character set.
When IO thread is trying to connect to Master, it sets server's character
set as client's character set. When Slave server is started with these
special character sets, IO thread (which is like a connection to Master)
fails because of the above said limitation.
Fix: Now even IO thread also behaves the same as a regular client behaves.
i.e., If server's character set is not supported as client's character set,
then set default's client character set(latin1) as client's character set.
SHOW PROCESSLIST, SHOW BINLOGS
Problem: A deadlock was occurring when 4 threads were
involved in acquiring locks in the following way
Thread 1: Dump thread ( Slave is reconnecting, so on
Master, a new dump thread is trying kill
zombie dump threads. It acquired thread's
LOCK_thd_data and it is about to acquire
mysys_var->current_mutex ( which LOCK_log)
Thread 2: Application thread is executing show binlogs and
acquired LOCK_log and it is about to acquire
LOCK_index.
Thread 3: Application thread is executing Purge binary logs
and acquired LOCK_index and it is about to
acquire LOCK_thread_count.
Thread 4: Application thread is executing show processlist
and acquired LOCK_thread_count and it is
about to acquire zombie dump thread's
LOCK_thd_data.
Deadlock Cycle:
Thread 1 -> Thread 2 -> Thread 3-> Thread 4 ->Thread 1
The same above deadlock was observed even when thread 4 is
executing 'SELECT * FROM information_schema.processlist' command and
acquired LOCK_thread_count and it is about to acquire zombie
dump thread's LOCK_thd_data.
Analysis:
There are four locks involved in the deadlock. LOCK_log,
LOCK_thread_count, LOCK_index and LOCK_thd_data.
LOCK_log, LOCK_thread_count, LOCK_index are global mutexes
where as LOCK_thd_data is local to a thread.
We can divide these four locks in two groups.
Group 1 consists of LOCK_log and LOCK_index and the order
should be LOCK_log followed by LOCK_index.
Group 2 consists of other two mutexes
LOCK_thread_count, LOCK_thd_data and the order should
be LOCK_thread_count followed by LOCK_thd_data.
Unfortunately, there is no specific predefined lock order defined
to follow in the MySQL system when it comes to locks across these
two groups. In the above problematic example,
there is no problem in the way we are acquiring the locks
if you see each thread individually.
But If you combine all 4 threads, they end up in a deadlock.
Fix:
Since everything seems to be fine in the way threads are taking locks,
In this patch We are changing the duration of the locks in Thread 4
to break the deadlock. i.e., before the patch, Thread 4
('show processlist' command) mysqld_list_processes()
function acquires LOCK_thread_count for the complete duration
of the function and it also acquires/releases
each thread's LOCK_thd_data.
LOCK_thread_count is used to protect addition and
deletion of threads in global threads list. While show
process list is looping through all the existing threads,
it will be a problem if a thread is exited but there is no problem
if a new thread is added to the system. Hence a new mutex is
introduced "LOCK_thd_remove" which will protect deletion
of a thread from global threads list. All threads which are
getting exited should acquire LOCK_thd_remove
followed by LOCK_thread_count. (It should take LOCK_thread_count
also because other places of the code still thinks that exit thread
is protected with LOCK_thread_count. In this fix, we are changing
only 'show process list' query logic )
(Eg: unlink_thd logic will be protected with
LOCK_thd_remove).
Logic of mysqld_list_processes(or file_schema_processlist)
will now be protected with 'LOCK_thd_remove' instead of
'LOCK_thread_count'.
Now the new locking order after this patch is:
LOCK_thd_remove -> LOCK_thd_data -> LOCK_log ->
LOCK_index -> LOCK_thread_count
Since log_throttle is not available in 5.5. Logging of
error message for failure of thread to create new connection
in "create_thread_to_handle_connection" is not backported.
Since, function "my_plugin_log_message" is not available in
5.5 version and since there is incompatibility between
sql_print_XXX function compiled with g++ and alog files with
gcc to use sql_print_error, changes related to audit log
plugin is not backported.
Problem - When the slave was disconnected from the master, under certain
conditions, upon reconnect, it will report that it received a
packet larger the slave_max_allowed_packet which causes the
replication to stop.
Analysis -The reason of this failure is that on reconnect
the slave sets the max_allowed_packet from the master's mi->mysql
object which keeps the max_allowed_packet as 1MB. This causes the
slave to report such error on recieving packet bigger than 1MB.
START SLAVE on the slave fixes the problem since it restarts
slave threads which initializes the max_allowed_packet to
slave_max_allowed_packet.
Fix - The problem is fixed by some code refactoring and introduction of a new
function which updates the max_allowed_packet for the THD object of the
slave thread and the mysql->options max_allowed_packet.
Problem
========
Replication breaks in the cases if the event length exceeds
the size of master Dump thread's max_allowed_packet.
The reason why this failure is occuring is because the event length is
more than the total size of the max_allowed_packet, on addition of the
max_event_header length exceeds the max_allowed_packet of the DUMP thread.
This causes the Dump thread to break replication and throw an error.
That can happen e.g with row-based replication in Update_rows event.
Fix
====
The problem is fixed in 2 steps:
1.) The Dump thread limit to read event is increased to the upper limit
i.e. Dump thread reads whatever gets logged in the binary log.
2.) On the slave side we increase the the max_allowed_packet for the
slave's threads (IO/SQL) by increasing it to 1GB.
This is done using the new server option (slave_max_allowed_packet)
included, is used to regulate the max_allowed_packet of the
slave thread (IO/SQL) by the DBA, and facilitates the sending of
large packets from the master to the slave.
This causes the large packets to be received by the slave and apply
it successfully.
Problem
========
SQL statements close to the size of max_allowed_packet produce binary
log events larger than max_allowed_packet.
The reason why this failure is occuring is because the event length is
more than the total size of the max_allowed_packet + max_event_header
length. Now since the event length exceeds this size master Dump
thread is unable to send the packet on to the slave.
That can happen e.g with row-based replication in Update_rows event.
Fix
====
The problem was fixed by increasing the max_allowed_packet for the
slave's threads (IO/SQL) by increasing it to 1GB.
This is done using the new server option included which is used to
regulate the max_allowed_packet of the slave thread (IO/SQL).
This causes the large packets to be received by the slave and apply
it successfully.
PROBLEM:
Threads end-up in deadlock due to locks acquired as described
below,
con1: Run Query on a table.
It is important that this SELECT must back-off while
trying to open the t1 and enter into wait_for_condition().
The SELECT then is blocked trying to lock mysys_var->mutex
which is held by con3. The very significant fact here is
that mysys_var->current_mutex will still point to LOCK_open,
even if LOCK_open is no longer held by con1 at this point.
con2: Try dropping table used in con1 or query some table.
It will hold LOCK_open and be blocked trying to lock
kernel_mutex held by con4.
con3: Try killing the query run by con1.
It will hold THD::LOCK_thd_data belonging to con1 while
trying to lock mysys_var->current_mutex belonging to con1.
But current_mutex will point to LOCK_open which is held
by con2.
con4: Get innodb engine status
It will hold kernel_mutex, trying to lock THD::LOCK_thd_data
belonging to con1 which is held by con3.
So while technically only con2, con3 and con4 participate in the
deadlock, con1's mysys_var->current_mutex pointing to LOCK_open
is a vital component of the deadlock.
CYCLE = (THD::LOCK_thd_data -> LOCK_open ->
kernel_mutex -> THD::LOCK_thd_data)
FIX:
LOCK_thd_data has responsibility of protecting,
1) thd->query, thd->query_length
2) VIO
3) thd->mysys_var (used by KILL statement and shutdown)
4) THD during thread delete.
Among above responsibilities, 1), 2)and (3,4) seems to be three
independent group of responsibility. If there is different LOCK
owning responsibility of (3,4), the above mentioned deadlock cycle
can be avoid. This fix introduces LOCK_thd_kill to handle
responsibility (3,4), which eliminates the deadlock issue.
Note: The problem is not found in 5.5. Introduction MDL subsystem
caused metadata locking responsibility to be moved from TDC/TC to
MDL subsystem. Due to this, responsibility of LOCK_open is reduced.
As the use of LOCK_open is removed in open_table() and
mysql_rm_table() the above mentioned CYCLE does not form.
Revision ID for changes,
open_table() = dlenev@mysql.com-20100727133458-m3ua9oslnx8fbbvz
mysql_rm_table() = jon.hauglid@oracle.com-20101116100012-kxep9txz2fxy3nmw
BUG#11761686 insert_id event is not filtered.
Two issues are covered.
INSERT into autoincrement field which is not the first part in the composed primary key
is unsafe by autoincrement logging design. The case is specific to MyISAM engine
because Innodb does not allow such table definition.
However no warnings and row-format logging in the MIXED mode was done, and
that is fixed.
Int-, Rand-, User-var log-events were not filtered along with their parent
query that made possible them to screw up execution context of the following
query.
Fixed with deferring their execution until the parent query.
******
Bug#11754117
Post review fixes.
BUG#64503: mysql frequently ignores --relay-log-space-limit
When the SQL thread goes to sleep, waiting for more events, it sets
the flag ignore_log_space_limit to true. This gives the IO thread a
chance to queue some more events and ultimately the SQL thread will be
able to purge the log once it is rotated. By then the SQL thread
resets the ignore_log_space_limit to false. However, between the time
the SQL thread has set the ignore flag and the time it resets it, the
IO thread will be queuing events in the relay log, possibly going way
over the limit.
This patch makes the IO and SQL thread to synchronize when they reach
the space limit and only ask for one event at a time. Thus the SQL
thread sets ignore_log_space_limit flag and the IO thread resets it to
false everytime it processes one more event. In addition, everytime
the SQL thread processes the next event, and the limit has been
reached, it checks if the IO thread should rotate. If it should, it
instructs the IO thread to rotate, giving the SQL thread a chance to
purge the logs (freeing space). Finally, this patch removes the
resetting of the ignore_log_space_limit flag from purge_first_log,
because this is now reset by the IO thread every time it processes the
next event when the limit has been reached.
If the SQL thread is in a transaction, it cannot purge so, there is no
point in asking the IO thread to rotate. The only thing it can do is
to ask for more events until the transaction is over (then it can ask
the IO to rotate and purge the log right away). Otherwise, there would
be a deadlock (SQL would not be able to purge and IO thread would not
be able to queue events so that the SQL would finish the transaction).
Problem : The basic problem is the way the thread sleeps in mysql-5.5 and also in mysql-5.1
when we execute a stop slave on windows platform.
On windows platform if the stop slave is executed after the master dies, we have
this long wait before the stop slave return a value. This is because there is a
sleep of the thread. The sleep is uninterruptable in the two above version,
which was fixed by Davi patch for the BUG#11765860 for mysql-trunk. Backporting
his patch for mysql-5.5 fixes the problem.
Solution : A new pair of mutex and condition variable is introduced to synchronize thread
sleep and finalization. A new mutex is required because the slave threads are
terminated while holding the slave thread locks (run_lock), which can not be
relinquished during termination as this would affect the lock order.
When passing an empty user to the connect function will cause
valgrind warnings. Seems that the client code is not prepared
to handle empty users. On 5.6 this can even be triggered by
START SLAVE PASSWORD='...'; i.e., without setting USER='...' on
the START SLAVE command (see WL#4143 for details on the new
additional START SLAVE commands).
To fix this, we disallow empty users when configuring the slave
connection parameters (this decision might be revisited if the
client code accepts empty users in the future).
Connection of slave to master using a replication account which authenticates
with an external plugin was not possible.
Fixed by making sure that the CLIENT_PLUGIN_AUTH capability is set when client connects using mysql_real_connect(). Also, a plugin-dir path used by client library to locate authentication plugins is set based on the analogous server setting. This is done in connect_to_master() function before a call to mysql_real_connect().
Background: Backporting fix for BUG 11752963 to Mysql5.1 branch.
Problem: Fix of bug 11752963 was only available for trunk and 5.5 branch.
Partial fix has been pushed to 5.1 branch as well.
Fix: backporting the fixes of bug 11752963 to 5.1 branch.
1. Made all major changes to make 5.1 branch in line with 5.5 and the trunk.
2. skipped the partial patch that was already applied to the 5.1 branch.
Manual merge from mysql-5.1 into mysql-5.5.
Conflicts
=========
Text conflict in mysql-test/suite/rpl/t/rpl_row_until.test
Text conflict in sql/handler.h
Text conflict in storage/archive/ha_archive.cc
Currently, rpl_semi_sync is failing in PB due to the warning message:
"Slave SQL: slave SQL thread is being stopped in the middle of "
"applying of a group having updated a non-transaction table; "
"waiting for the group completion ..."
The problem started happening after the fix for BUG#11762407 what was
automatically suppressing some warning messages.
To fix the current issue, we suppress the aforementioned warning message
and exploit the opportunity to make the sentence clearer.
rpl_packet got a timeout failure sporadically on PB when stopping
slave. The real reason of this bug is that STOP SLAVE stopped
IO thread first and then stopped SQL thread. It was
possible that IO thread stopped after replicating part of a
transaction which SQL thread was executing. SQL thread would
be hung if the transaction could not be rolled back safely.
After this patch, STOP SLAVE will stop SQL thread first and then stop IO
thread, which guarantees that IO thread will fetch the reset of the
events of the transaction that SQL thread is executing, so that SQL
thread can finish the transaction if it cannot be rolled back safely.
Added below auxiliary files to make the test code neater.
restart_slave_sql.inc
rpl_connection_master.inc
rpl_connection_slave.inc
rpl_connection_slave1.inc
Manual merge from mysql-5.1-bugteam into mysql-5.5-bugteam.
Conflicts
=========
Text conflict in sql/log.cc
Text conflict in sql/log.h
Text conflict in sql/slave.cc
Text conflict in sql/sql_parse.cc
Text conflict in sql/sql_priv.h
when generating new name.
If find_uniq_filename returns an error, then this error is not
being propagated upwards, and execution does not report error to
the user (although a entry in the error log is generated).
Additionally, some more errors were ignored in new_file_impl:
- when writing the rotate event
- when reopening the index and binary log file
This patch addresses this by propagating the error up in the
execution stack. Furthermore, when rotation of the binary log
fails, an incident event is written, because there may be a
chance that some changes for a given statement, were not properly
logged. For example, in SBR, LOAD DATA INFILE statement requires
more than one event to be logged, should rotation fail while
logging part of the LOAD DATA events, then the logged data would
become inconsistent with the data in the storage engine.
Problem: Extended characters outside of ASCII range where not displayed
properly in SHOW PROCESSLIST, because thd_info->query was always sent as
system_character_set (utf8). This was wrong, because query buffer
is never converted to utf8 - it is always have client character set.
Fix: sending query buffer using query character set
@ sql/sql_class.cc
@ sql/sql_class.h
Introducing a new class CSET_STRING, a LEX_STRING with character set.
Adding set_query(&CSET_STRING)
Adding reset_query(), to use instead of set_query(0, NULL).
@ sql/event_data_objects.cc
Using reset_query()
@ sql/log_event.cc
Using reset_query()
Adding charset argument to set_query_and_id().
@ sql/slave.cc
Using reset_query().
@ sql/sp_head.cc
Changing backing up and restore code to use CSET_STRING.
@ sql/sql_audit.h
Using CSET_STRING.
In the "else" branch it's OK not to use
global_system_variables.character_set_client.
&my_charset_latin1, which is set in constructor, is fine
(verified with Sergey Vojtovich).
@ sql/sql_insert.cc
Using set_query() with proper character set: table_name is utf8.
@ sql/sql_parse.cc
Adding character set argument to set_query_and_id().
(This is the main point where thd->charset() is stored
into thd->query_string.cs, for use in "SHOW PROCESSLIST".)
Using reset_query().
@ sql/sql_prepare.cc
Storing client character set into thd->query_string.cs.
@ sql/sql_show.cc
Using CSET_STRING to fetch and send charset-aware query information
from threads.
@ storage/myisam/ha_myisam.cc
Using set_query() with proper character set: table_name is utf8.
@ mysql-test/r/show_check.result
@ mysql-test/t/show_check.test
Adding tests
- A prerequisite cleanup patch for making KILL reliable.
The test case main.kill did not work reliably.
The following problems have been identified:
1. A kill signal could go lost if it came in, short before a
thread went reading on the client connection.
2. A kill signal could go lost if it came in, short before a
thread went waiting on a condition variable.
These problems have been solved as follows. Please see also
added code comments for more details.
1. There is no safe way to detect, when a thread enters the
blocking state of a read(2) or recv(2) system call, where it
can be interrupted by a signal. Hence it is not possible to
wait for the right moment to send a kill signal. It has been
decided, not to fix it in the code. Instead, the test case
repeats the KILL statement until the connection terminates.
2. Before waiting on a condition variable, we register it
together with a synchronizating mutex in THD::mysys_var. After
this, we need to test THD::killed again. At some places we did
only test it in a loop condition before the registration. When
THD::killed had been set between this test and the registration,
we entered waiting without noticing the killed flag. Additional
checks ahve been introduced where required.
In addition to the above, a re-write of the main.kill test
case has been done. All sleeps have been replaced by Debug
Sync Facility synchronization. A couple of sync points have
been added to the server code.
To avoid further problems, if the test case fails in spite of
the fixes, the test case has been added to the "experimental"
list for now.
- Most of the work on this patch is authored by Ingo Struewing
replication aborts
When recieving a 'SLAVE STOP' command, slave SQL thread will roll back the
transaction and stop immidiately if there is only transactional table updated,
even through 'CREATE|DROP TEMPOARY TABLE' statement are in it. But These
statements can never be rolled back. Because the temporary tables to the user
session mapping remain until 'RESET SLAVE', Therefore it will abort SQL thread
with an error that the table already exists or doesn't exist, when it restarts
and executes the whole transaction again.
After this patch, SQL thread always waits till the transaction ends and then stops,
if 'CREATE|DROP TEMPOARY TABLE' statement are in it.
When slave executes a transaction bigger than slave's max_binlog_cache_size,
slave will crash. It is caused by the assert that server should only roll back
the statement but not the whole transaction if the error ER_TRANS_CACHE_FULL
happens. But slave sql thread always rollbacks the whole transaction when
an error happens.
Ather this patch, we always clear any error set in sql thread(it is different
from the error in 'SHOW SLAVE STATUS') and it is cleared before rolling back
the transaction.
The procedure for setting up a fake binary log, by changing the
relay log files manually, is run twice when we issue mtr with
--repeat=2. However, when setting it up for the second time,
neither the sql thread is reset nor the server is restarted. This
means that previous stale relay log IO_CACHE metadata - from
first run - is left around. As a consequence, when the test is
run for the second time, the IO_CACHE for the relay log has
position in file different than zero, triggering the assertion.
We fix this by deploying a call to my_b_seek before calling
check_binlog_magic in next_event. This prevents the server
from asserting, in the cases that the SQL thread was reads
from a hot log (relay.NNNNN), then is stopped, then is restarted
from a previous cold log (relay.MMMMM), and then it reaches
the hot log relay.NNNNN again, in which case, the read
coordinates are not set to zero, but to the old values.
Additionally, some comments to the source code were added.
This patch also fixes Bug#55452 "SET PASSWORD is
replicated twice in RBR mode".
The goal of this patch is to remove the release of
metadata locks from close_thread_tables().
This is necessary to not mistakenly release
the locks in the course of a multi-step
operation that involves multiple close_thread_tables()
or close_tables_for_reopen().
On the same token, move statement commit outside
close_thread_tables().
Other cleanups:
Cleanup COM_FIELD_LIST.
Don't call close_thread_tables() in COM_SHUTDOWN -- there
are no open tables there that can be closed (we leave
the locked tables mode in THD destructor, and this
close_thread_tables() won't leave it anyway).
Make open_and_lock_tables() and open_and_lock_tables_derived()
call close_thread_tables() upon failure.
Remove the calls to close_thread_tables() that are now
unnecessary.
Simplify the back off condition in Open_table_context.
Streamline metadata lock handling in LOCK TABLES
implementation.
Add asserts to ensure correct life cycle of
statement transaction in a session.
Remove a piece of dead code that has also become redundant
after the fix for Bug 37521.