Problem : The basic problem is the way the thread sleeps in mysql-5.5 and also in mysql-5.1
when we execute a stop slave on windows platform.
On windows platform if the stop slave is executed after the master dies, we have
this long wait before the stop slave return a value. This is because there is a
sleep of the thread. The sleep is uninterruptable in the two above version,
which was fixed by Davi patch for the BUG#11765860 for mysql-trunk. Backporting
his patch for mysql-5.5 fixes the problem.
Solution : A new pair of mutex and condition variable is introduced to synchronize thread
sleep and finalization. A new mutex is required because the slave threads are
terminated while holding the slave thread locks (run_lock), which can not be
relinquished during termination as this would affect the lock order.
When passing an empty user to the connect function will cause
valgrind warnings. Seems that the client code is not prepared
to handle empty users. On 5.6 this can even be triggered by
START SLAVE PASSWORD='...'; i.e., without setting USER='...' on
the START SLAVE command (see WL#4143 for details on the new
additional START SLAVE commands).
To fix this, we disallow empty users when configuring the slave
connection parameters (this decision might be revisited if the
client code accepts empty users in the future).
Connection of slave to master using a replication account which authenticates
with an external plugin was not possible.
Fixed by making sure that the CLIENT_PLUGIN_AUTH capability is set when client connects using mysql_real_connect(). Also, a plugin-dir path used by client library to locate authentication plugins is set based on the analogous server setting. This is done in connect_to_master() function before a call to mysql_real_connect().
Background: Backporting fix for BUG 11752963 to Mysql5.1 branch.
Problem: Fix of bug 11752963 was only available for trunk and 5.5 branch.
Partial fix has been pushed to 5.1 branch as well.
Fix: backporting the fixes of bug 11752963 to 5.1 branch.
1. Made all major changes to make 5.1 branch in line with 5.5 and the trunk.
2. skipped the partial patch that was already applied to the 5.1 branch.
Manual merge from mysql-5.1 into mysql-5.5.
Conflicts
=========
Text conflict in mysql-test/suite/rpl/t/rpl_row_until.test
Text conflict in sql/handler.h
Text conflict in storage/archive/ha_archive.cc
Currently, rpl_semi_sync is failing in PB due to the warning message:
"Slave SQL: slave SQL thread is being stopped in the middle of "
"applying of a group having updated a non-transaction table; "
"waiting for the group completion ..."
The problem started happening after the fix for BUG#11762407 what was
automatically suppressing some warning messages.
To fix the current issue, we suppress the aforementioned warning message
and exploit the opportunity to make the sentence clearer.
rpl_packet got a timeout failure sporadically on PB when stopping
slave. The real reason of this bug is that STOP SLAVE stopped
IO thread first and then stopped SQL thread. It was
possible that IO thread stopped after replicating part of a
transaction which SQL thread was executing. SQL thread would
be hung if the transaction could not be rolled back safely.
After this patch, STOP SLAVE will stop SQL thread first and then stop IO
thread, which guarantees that IO thread will fetch the reset of the
events of the transaction that SQL thread is executing, so that SQL
thread can finish the transaction if it cannot be rolled back safely.
Added below auxiliary files to make the test code neater.
restart_slave_sql.inc
rpl_connection_master.inc
rpl_connection_slave.inc
rpl_connection_slave1.inc
Manual merge from mysql-5.1-bugteam into mysql-5.5-bugteam.
Conflicts
=========
Text conflict in sql/log.cc
Text conflict in sql/log.h
Text conflict in sql/slave.cc
Text conflict in sql/sql_parse.cc
Text conflict in sql/sql_priv.h
when generating new name.
If find_uniq_filename returns an error, then this error is not
being propagated upwards, and execution does not report error to
the user (although a entry in the error log is generated).
Additionally, some more errors were ignored in new_file_impl:
- when writing the rotate event
- when reopening the index and binary log file
This patch addresses this by propagating the error up in the
execution stack. Furthermore, when rotation of the binary log
fails, an incident event is written, because there may be a
chance that some changes for a given statement, were not properly
logged. For example, in SBR, LOAD DATA INFILE statement requires
more than one event to be logged, should rotation fail while
logging part of the LOAD DATA events, then the logged data would
become inconsistent with the data in the storage engine.
Problem: Extended characters outside of ASCII range where not displayed
properly in SHOW PROCESSLIST, because thd_info->query was always sent as
system_character_set (utf8). This was wrong, because query buffer
is never converted to utf8 - it is always have client character set.
Fix: sending query buffer using query character set
@ sql/sql_class.cc
@ sql/sql_class.h
Introducing a new class CSET_STRING, a LEX_STRING with character set.
Adding set_query(&CSET_STRING)
Adding reset_query(), to use instead of set_query(0, NULL).
@ sql/event_data_objects.cc
Using reset_query()
@ sql/log_event.cc
Using reset_query()
Adding charset argument to set_query_and_id().
@ sql/slave.cc
Using reset_query().
@ sql/sp_head.cc
Changing backing up and restore code to use CSET_STRING.
@ sql/sql_audit.h
Using CSET_STRING.
In the "else" branch it's OK not to use
global_system_variables.character_set_client.
&my_charset_latin1, which is set in constructor, is fine
(verified with Sergey Vojtovich).
@ sql/sql_insert.cc
Using set_query() with proper character set: table_name is utf8.
@ sql/sql_parse.cc
Adding character set argument to set_query_and_id().
(This is the main point where thd->charset() is stored
into thd->query_string.cs, for use in "SHOW PROCESSLIST".)
Using reset_query().
@ sql/sql_prepare.cc
Storing client character set into thd->query_string.cs.
@ sql/sql_show.cc
Using CSET_STRING to fetch and send charset-aware query information
from threads.
@ storage/myisam/ha_myisam.cc
Using set_query() with proper character set: table_name is utf8.
@ mysql-test/r/show_check.result
@ mysql-test/t/show_check.test
Adding tests
- A prerequisite cleanup patch for making KILL reliable.
The test case main.kill did not work reliably.
The following problems have been identified:
1. A kill signal could go lost if it came in, short before a
thread went reading on the client connection.
2. A kill signal could go lost if it came in, short before a
thread went waiting on a condition variable.
These problems have been solved as follows. Please see also
added code comments for more details.
1. There is no safe way to detect, when a thread enters the
blocking state of a read(2) or recv(2) system call, where it
can be interrupted by a signal. Hence it is not possible to
wait for the right moment to send a kill signal. It has been
decided, not to fix it in the code. Instead, the test case
repeats the KILL statement until the connection terminates.
2. Before waiting on a condition variable, we register it
together with a synchronizating mutex in THD::mysys_var. After
this, we need to test THD::killed again. At some places we did
only test it in a loop condition before the registration. When
THD::killed had been set between this test and the registration,
we entered waiting without noticing the killed flag. Additional
checks ahve been introduced where required.
In addition to the above, a re-write of the main.kill test
case has been done. All sleeps have been replaced by Debug
Sync Facility synchronization. A couple of sync points have
been added to the server code.
To avoid further problems, if the test case fails in spite of
the fixes, the test case has been added to the "experimental"
list for now.
- Most of the work on this patch is authored by Ingo Struewing
replication aborts
When recieving a 'SLAVE STOP' command, slave SQL thread will roll back the
transaction and stop immidiately if there is only transactional table updated,
even through 'CREATE|DROP TEMPOARY TABLE' statement are in it. But These
statements can never be rolled back. Because the temporary tables to the user
session mapping remain until 'RESET SLAVE', Therefore it will abort SQL thread
with an error that the table already exists or doesn't exist, when it restarts
and executes the whole transaction again.
After this patch, SQL thread always waits till the transaction ends and then stops,
if 'CREATE|DROP TEMPOARY TABLE' statement are in it.
When slave executes a transaction bigger than slave's max_binlog_cache_size,
slave will crash. It is caused by the assert that server should only roll back
the statement but not the whole transaction if the error ER_TRANS_CACHE_FULL
happens. But slave sql thread always rollbacks the whole transaction when
an error happens.
Ather this patch, we always clear any error set in sql thread(it is different
from the error in 'SHOW SLAVE STATUS') and it is cleared before rolling back
the transaction.
The procedure for setting up a fake binary log, by changing the
relay log files manually, is run twice when we issue mtr with
--repeat=2. However, when setting it up for the second time,
neither the sql thread is reset nor the server is restarted. This
means that previous stale relay log IO_CACHE metadata - from
first run - is left around. As a consequence, when the test is
run for the second time, the IO_CACHE for the relay log has
position in file different than zero, triggering the assertion.
We fix this by deploying a call to my_b_seek before calling
check_binlog_magic in next_event. This prevents the server
from asserting, in the cases that the SQL thread was reads
from a hot log (relay.NNNNN), then is stopped, then is restarted
from a previous cold log (relay.MMMMM), and then it reaches
the hot log relay.NNNNN again, in which case, the read
coordinates are not set to zero, but to the old values.
Additionally, some comments to the source code were added.
This patch also fixes Bug#55452 "SET PASSWORD is
replicated twice in RBR mode".
The goal of this patch is to remove the release of
metadata locks from close_thread_tables().
This is necessary to not mistakenly release
the locks in the course of a multi-step
operation that involves multiple close_thread_tables()
or close_tables_for_reopen().
On the same token, move statement commit outside
close_thread_tables().
Other cleanups:
Cleanup COM_FIELD_LIST.
Don't call close_thread_tables() in COM_SHUTDOWN -- there
are no open tables there that can be closed (we leave
the locked tables mode in THD destructor, and this
close_thread_tables() won't leave it anyway).
Make open_and_lock_tables() and open_and_lock_tables_derived()
call close_thread_tables() upon failure.
Remove the calls to close_thread_tables() that are now
unnecessary.
Simplify the back off condition in Open_table_context.
Streamline metadata lock handling in LOCK TABLES
implementation.
Add asserts to ensure correct life cycle of
statement transaction in a session.
Remove a piece of dead code that has also become redundant
after the fix for Bug 37521.
Fix warnings flagged by the new warning option -Wunused-but-set-variable
that was added to GCC 4.6 and that is enabled by -Wunused and -Wall. The
option causes a warning whenever a local variable is assigned to but is
later unused. It also warns about meaningless pointer dereferences.
Essentially, the problem is that safemalloc is excruciatingly
slow as it checks all allocated blocks for overrun at each
memory management primitive, yielding a almost exponential
slowdown for the memory management functions (malloc, realloc,
free). The overrun check basically consists of verifying some
bytes of a block for certain magic keys, which catches some
simple forms of overrun. Another minor problem is violation
of aliasing rules and that its own internal list of blocks
is prone to corruption.
Another issue with safemalloc is rather the maintenance cost
as the tool has a significant impact on the server code.
Given the magnitude of memory debuggers available nowadays,
especially those that are provided with the platform malloc
implementation, maintenance of a in-house and largely obsolete
memory debugger becomes a burden that is not worth the effort
due to its slowness and lack of support for detecting more
common forms of heap corruption.
Since there are third-party tools that can provide the same
functionality at a lower or comparable performance cost, the
solution is to simply remove safemalloc. Third-party tools
can provide the same functionality at a lower or comparable
performance cost.
The removal of safemalloc also allows a simplification of the
malloc wrappers, removing quite a bit of kludge: redefinition
of my_malloc, my_free and the removal of the unused second
argument of my_free. Since free() always check whether the
supplied pointer is null, redudant checks are also removed.
Also, this patch adds unit testing for my_malloc and moves
my_realloc implementation into the same file as the other
memory allocation primitives.
Apart strict-aliasing warnings, fix the remaining warnings
generated by GCC 4.4.4 -Wall and -Wextra flags.
One major source of warnings was the in-house function my_bcmp
which (unconventionally) took pointers to unsigned characters
as the byte sequences to be compared. Since my_bcmp and bcmp
are deprecated functions whose only difference with memcmp is
the return value, every use of the function is replaced with
memcmp as the special return value wasn't actually being used
by any caller.
There were also various other warnings, mostly due to type
mismatches, missing return values, missing prototypes, dead
code (unreachable) and ignored return values.
at mf_iocache.c, line 1722
The slave crashed while two threads: IO thread and user thread
raced for the same mutex (the append_buffer_lock protecting the
relay log's IO_CACHE). The IO thread was trying to flush the
cache, and for that was grabbing the append_buffer_lock.
However, the other thread was closing and reopening the relay log
when the IO thread tried to lock. Closing and reopening the log
includes destroying and reinitialising the IO_CACHE
mutex. Therefore, the IO thread tried to lock a destroyed mutex.
We fix this by backporting patch for BUG#50364 which fixed this
bug in mysql server 5.5+. The patch deploys missing
synchronization when flush_master_info is called and the relay
log is flushed by the IO thread. In detail the patch backports
revision (from mysql-trunk):
- luis.soares@sun.com-20100203165617-b1yydr0ee24ycpjm
This patch already includes the post-push fix also in BUG#50364:
- luis.soares@sun.com-20100222002629-0cijwqk6baxhj7gr
Conflicts:
Text conflict in mysql-test/r/grant.result
Text conflict in mysql-test/t/grant.test
Text conflict in mysys/mf_loadpath.c
Text conflict in sql/slave.cc
Text conflict in sql/sql_priv.h
When issuing a 'SET GLOBAL SQL_SLAVE_SKIP_COUNTER' statement, the previous
position along with the new position is dumped into the error log. Namely,
the following information is printed out: skip_counter, group_relay_log_name
and group_relay_log_pos.
This patch:
- Moves all definitions from the mysql_priv.h file into
header files for the component where the variable is
defined
- Creates header files if the component lacks one
- Eliminates all include directives from mysql_priv.h
- Eliminates all circular include cycles
- Rename time.cc to sql_time.cc
- Rename mysql_priv.h to sql_priv.h