(like DROP TABLE) has been scheduled before conflicting DDL's (like INSERT)
are commited.
What makes these bugs hard to detect is that in most cases any wrong
schduling are caught by MDL locks. It's only when there are timing issues
that the bugs (usually deadlocks) are noticed.
This patch adds a DBUG_ASSERT() that detects, in parallel replication,
if a DDL is scheduled before any depending DML'S are commited.
It does this be checking if there are any conflicting replication locks
when the DDL is about to wait for getting it's MDL lock.
I also did some minor code cleanups in sql_base.cc to make this code
similar to other related code.
Problem:
In debug builds, there is a chance that an out-of-bounds
read is performed when tables are locked in
LTM_PRELOCKED_UNDER_LOCK_TABLES mode. It can happen because
the debug code uses enum values as index for an array of
mode descriptions, but it only takes into consideration 3
out of 4 of the enum values.
Fix:
This patch fixes it by implementing a getter for the enum which
returns a string representation of the enum,
effectively removing the out-of-bounds read.
Moreover, it also fixes the lock mode descriptions that
would be print out in debug builds.
Problem was that notify_shared_lock() didn't abort an insert delayed thread
if it was in thr_upgrade_write_delay_lock().
ALTER TABLE first takes a weak_mdl_lock, then a thr_lock and then tries to upgrade
the mdl_lock.
Delayed insert thread first takes a mdl lock followed by a
thr_upgrade_write_delay_lock()
This caused insert delay to wait for alter table in thr_lock, while
alter table was waiting for the mdl lock by insert delay.
Fixed by telling mdl to run thr_lock_abort() for the insert delay thread table.
We also set thd->mysys_var->abort to 1 for the delay thread when it's killed
by alter table to ensure it doesn't ever get locked in thr_lock.
- Fixes query cache so that it is aware of wsrep_sync_wait.
Query cache would return (possibly stale) results to the
client, regardless of the value of wsrep_sync_wait.
- Includes the test case that reproduced the issue.
- Moves call wsrep_free_status() to THD::cleanup_after_query().
Wsrep status variables were previously freed only on SHOW STATUS.
- Removes valgrind suppression from mysql-test/valgrind.
- Fixes query cache so that it is aware of wsrep_sync_wait.
Query cache would return (possibly stale) results to the
client, regardless of the value of wsrep_sync_wait.
- Includes the test case that reproduced the issue.
- Added testing if connection is killed to shortcut reading of connection data
This will allow us later in 10.2 to do a cleaner shutdown of slaves (less errors in the log)
- Add new status variables: Slaves_connected, Slaves_running and Slave_connections.
- Use MYSQL_SLAVE_NOT_RUN instead of 0 with slave_running.
- Don't print obvious extra warnings to the error log when slave is shut down normally.
On shutdown feedback was sending a short report without creating
a THD. At that point current_thd was pointing to the already
destroyed THD from the previous full report.
backport from 10.1:
commit bfe703a
Author: Sergei Golubchik <serg@mariadb.org>
Date: Tue Feb 3 18:19:56 2015 +0100
don't let current_thd to point to a destroyed THD
As galera node (slave) received query log events from an async
replication master, it partially wrote the updates made to replication
state table (mysql.gtid_slave_pos) to galera transaction writeset post
TOI. As a result, the transaction handle, thus created within galera,
was never freed/purged as the corresponding trx did not commit.
Thus, it kept piling up for every query log event and was only reclaimed
upon server shutdown when the transaction map object got destructed.
Fixed by making sure that updates in replication slave state table
are not written to galera transaction writeset and thus, not replicated
to other nodes.
Problem:
At the end of first execution select_lex->prep_where is pointing to
a runtime created object (temporary table field). As a result
server exits trying to access a invalid pointer during second
execution.
Analysis:
While optimizing the join conditions for the query, after the
permanent transformation, optimizer makes a copy of the new
where conditions in select_lex->prep_where. "prep_where" is what
is used as the "where condition" for the query at the start of execution.
W.r.t the query in question, "where" condition is actually pointing
to a field in the temporary table. As a result, for the second
execution the pointer is no more valid resulting in server exit.
Fix:
At the end of the first execution, select_lex->where will have the
original item of the where condition.
Make prep_where the new place where the original item of select->where
has to be rolled back.
Fixed in 5.7 with the wl#7082 - Move permanent transformations from
JOIN::optimize to JOIN::prepare
Patch for 5.5 includes the following backports from 5.6:
Bugfix for Bug12603141 - This makes the first execute statement in the testcase
pass in 5.5
However it was noted later in in Bug16163596 that the above bugfix needed to
be modified. Although Bug16163596 is reproducible only with changes done for
Bug12582849, we have decided include the fix.
Considering that Bug12582849 is related to Bug12603141, the fix is
also included here. However this results in Bug16317817, Bug16317685,
Bug16739050. So fix for the above three bugs is also part of this patch.
Problem & Analysis: If DML invokes a trigger or a
stored function that inserts into an AUTO_INCREMENT column,
that DML has to be marked as 'unsafe' statement. If the
tables are locked in the transaction prior to DML statement
(using LOCK TABLES), then the same statement is not marked as
'unsafe' statement. The logic of checking whether unsafeness
is protected with if (!thd->locked_tables_mode). Hence if
we lock the tables prior to DML statement, it is *not* entering
into this if condition. Hence the statement is not marked
as unsafe statement.
Fix: Irrespective of locked_tables_mode value, the unsafeness
check should be done. Now with this patch, the code is moved
out to 'decide_logging_format()' function where all these checks
are happening and also with out 'if(!thd->locked_tables_mode)'.
Along with the specified test case in the bug scenario
(BINLOG_STMT_UNSAFE_AUTOINC_COLUMNS), we also identified that
other cases BINLOG_STMT_UNSAFE_AUTOINC_NOT_FIRST,
BINLOG_STMT_UNSAFE_WRITE_AUTOINC_SELECT, BINLOG_STMT_UNSAFE_INSERT_TWO_KEYS
are also protected with thd->locked_tables_mode which is not right. All
of those checks also moved to 'decide_logging_format()' function.
(even when configured with --binlog-format=statement).
Before we got an error on the slave and the slave stopped if the master
was configured with --binlog-format=mixed or --binlog-format=row.
The old code used pthread_setspecific() to store temporary data used by the thread.
This is not safe when used with thread pool, as the thread may change for the transaction.
The fix is to save the data in THD, which is guaranteed to be properly freed.
I also fixed the code so that we don't do a malloc() for every transaction.
When the slave processes the master restart format_description event,
parallel replication needs to complete any prior events before processing
the restart event (which closes temporary tables and such stuff).
This happens in wait_for_workers_idle(), however it was not waiting long
enough. The wait was using wait_for_prior_commit(), but at that points table
can still be open. This lead to assertion in this case.
So change wait_for_workers_idle() to wait until all worker threads have
reached finish_event_group(), at which point all tables should have been
closed.
including the big commit
commit 305130361bf72726de220f3d2b2787395e10be61
Author: Marc Alff <marc.alff@oracle.com>
Date: Tue Feb 10 11:31:32 2015 +0100
WL#8354 BACKPORT DIGEST IMPROVEMENTS TO MYSQL 5.6
(with the following commits) and related changes in sql/
on disconnect THD must clean user_var_events array before
dropping temporary tables. Otherwise when binlogging a DROP,
it'll access user_var_events, but they were allocated
in the already freed memroot.
This bug is a side-effect of fix for MDEV-6924, where we completely
stopped a statement-based event from getting into the binlog cache when
binary logging is not enabled (and thus, wsrep_emulate_binlog mode = 1).
As a result, the SBR events were not replicated.
Fixed by allowing the SBR events to be written into the binlog cache.
Note: Only DMLs were affected as DDLs are replicated via TOI.
Merged galera_create_trigger.test from github.com/codership/mysql-wsrep.