Problem: In regular replication, when the master binlogged a statement
using statement format, the slave might not have written an event to its
binary log when the Query event aimed at a temporary table.
Specifically this was observed with LOAD DATA INFILE.
This effect was possible because, unlike the master, the slave holds
temporary tables in its pool, so the master-side check for the existence
of a temporary table at the binlog-format decision did not apply.
Solution: replace THD::has_thd_temporary_tables() with
THD::has_temporary_tables(), which identifies temporary-table
presence on either side.
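A minimal stand-alone sketch of the check this commit changes; the THD
here is a stand-in with only the relevant members, not the server's real
class, and must_binlog_statement() is a hypothetical decision point:

```cpp
#include <cstdio>

// Stand-in THD: only the fields relevant to this sketch.
struct THD {
  bool slave_thread;          // true when this THD is a replication applier
  unsigned temp_table_count;  // temporary tables held in this THD's pool

  // Old check: effectively master-side only.
  bool has_thd_temporary_tables() const {
    return !slave_thread && temp_table_count > 0;
  }
  // New check: sees temporary tables on either side.
  bool has_temporary_tables() const { return temp_table_count > 0; }
};

// Hypothetical format decision: statements touching temporary tables
// must be written to the (slave's) binary log as well.
bool must_binlog_statement(const THD &thd) {
  return thd.has_temporary_tables();  // was: thd.has_thd_temporary_tables()
}

int main() {
  THD applier{true, 1};  // slave applier holding one temporary table
  std::printf("old: %d new: %d\n",
              applier.has_thd_temporary_tables(),  // 0: event would be lost
              applier.has_temporary_tables());     // 1: event is binlogged
  return 0;
}
```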
--
Reviewed by Andrei Elkin.
Using the parallel slave applier can cause a deadlock between DDL and
other events. A GTID with a lower seqno can be blocked in Galera when the
node has entered TOI mode, while the DDL GTID with the higher seqno can be
blocked until the previous GTIDs have been applied locally.
The fix is to check that prior commits have completed before entering TOI.
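A rough sketch of the ordering rule, with stand-in types (the real fix
lives in the server's commit-ordering machinery, not in a class like this):

```cpp
#include <condition_variable>
#include <mutex>

// Illustrative gate: a DDL waits until every event with a lower seqno
// has committed before it may enter TOI mode.
class PriorCommitGate {
  std::mutex m;
  std::condition_variable cv;
  long long last_committed_seqno = 0;

public:
  void committed(long long seqno) {
    std::lock_guard<std::mutex> lk(m);
    if (seqno > last_committed_seqno) last_committed_seqno = seqno;
    cv.notify_all();
  }
  // Called before entering TOI: block until all prior GTIDs are applied.
  void wait_for_prior(long long my_seqno) {
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [&] { return last_committed_seqno >= my_seqno - 1; });
  }
};
```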
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
Handle BF abort for prepared statement execution so that EXECUTE processing
continues until parameter setup is complete, before the BF abort bails out of
the statement execution.
The THD class has a new boolean member, wsrep_delayed_BF_abort, which is set
if a BF abort is observed in do_command() right after reading the client's
packet and the client has sent a PS execute command.
In that case the deadlock error is not returned to the client immediately;
instead, PS execution is started. The PS execution loop now checks whether
wsrep_delayed_BF_abort is set and stops the PS execution once the type
information has been assigned for the PS.
With this, the PS protocol type information, which is present in the first PS
EXECUTE command, is not lost even if the first PS EXECUTE command was marked
to abort.
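An illustrative control-flow sketch of the delay; the helper functions and
the command constant are stand-ins, only the wsrep_delayed_BF_abort member
name comes from the commit:

```cpp
// Stand-in THD carrying the new member from this commit.
struct THD {
  bool wsrep_delayed_BF_abort = false;
};

bool bf_abort_observed(const THD &) { return true; }  // assume a BF abort arrived
bool is_ps_execute(int cmd) { return cmd == 23; }     // stand-in for COM_STMT_EXECUTE

int dispatch_command(THD &thd, int cmd) {
  if (bf_abort_observed(thd) && is_ps_execute(cmd)) {
    // Do not return the deadlock error yet: let EXECUTE run far enough
    // to bind the parameter type information first.
    thd.wsrep_delayed_BF_abort = true;
  }
  // ... PS execution starts; the execute loop then checks the flag:
  bool types_assigned = true;  // becomes true once parameter types are bound
  if (thd.wsrep_delayed_BF_abort && types_assigned)
    return 1;  // bail out with the deadlock error, types already captured
  return 0;
}
```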
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
extra2_read_len resolved by keeping the implementation
in sql/table.cc and exposing it for use by ha_partition.cc.
Removed the identical implementation in unireg.h.
(ref: bfed2c7d57)
Problem:
Parse-time conversion from binary to tricky character sets like utf32
produced ill-formed strings. As a result, a crash happened later in debug
builds, or a wrong SHOW CREATE TABLE was returned in release builds.
Fix:
1. Backport a few methods from 10.3:
- THD::check_string_for_wellformedness()
- THD::convert_string() overloads
- THD::make_text_string_connection()
2. Add a new method THD::reinterpret_string_from_binary(),
which makes sure to either return a well-formed string
(optionally prepending it with zero bytes), or return an error.
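A sketch of the padding rule such a method has to enforce for utf32; the
function below is a stand-in, not the real THD method, and it omits the
code-point validation a real implementation would also need:

```cpp
#include <string>

// Reinterpreting binary data as utf32 requires a length that is a
// multiple of 4, so the string is padded with leading zero bytes.
std::string reinterpret_binary_as_utf32(const std::string &bin) {
  const std::string::size_type mbminlen = 4;  // utf32 code-unit size
  std::string::size_type pad = (mbminlen - bin.size() % mbminlen) % mbminlen;
  std::string result(pad, '\0');  // prepend zero bytes
  result += bin;
  // A real implementation would also verify that each 4-byte unit is a
  // valid Unicode code point and return an error otherwise.
  return result;
}
```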
close_connections() in mysqld.cc sends a signal to all threads.
But InnoDB is too busy purging and does not react immediately.
close_connections() waits 20 seconds, which is not enough in this
particular case, and then unlinks all threads from
the list and forcibly closes their vio connections.
InnoDB background threads have no vio connection to close, but
they are unlinked all the same. So when they finally notice
the shutdown request and try to unlink themselves, they fail to
assert that they are still linked.
Fix: don't assert_linked, as another thread can unlink this THD at any time.
When a transaction creates or drops temporary tables and a later statement
of it fails, even the failed statement's cached ROW-format events for
transactional tables get written to the binlog and are visible after the
transaction's commit.
Fixed with a proper analysis of whether the errored-out statement needs
to be rolled back in the binlog.
For instance, the mere fact that previous statements already cached a
CREATE or DROP for temporary tables does not by itself require retaining
the errored-out statement's events in the cache.
Conversely, if the statement creates or drops a temporary table
itself, it can't be rolled back; this rule remains.
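The decision can be summarized with a stand-alone sketch; the struct and
function below are illustrative stand-ins for the analysis the fix adds,
not the server's actual flags:

```cpp
// What is known about the failing statement and its transaction.
struct StmtState {
  bool stmt_creates_or_drops_tmp_table;  // this statement itself did tmp-table DDL
  bool earlier_stmts_cached_tmp_ddl;     // previous statements in the trx did
};

// After the fix: only temporary-table DDL done by the failing statement
// itself pins its events in the binlog cache; DDL cached by earlier
// statements alone no longer does.
bool keep_errored_stmt_events_in_cache(const StmtState &s) {
  return s.stmt_creates_or_drops_tmp_table;
}
```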
Mutex order violation when a wsrep BF thread kills a conflicting trx.
The stack is
wsrep_thd_LOCK()
wsrep_kill_victim()
lock_rec_other_has_conflicting()
lock_clust_rec_read_check_and_lock()
row_search_mvcc()
ha_innobase::index_read()
ha_innobase::rnd_pos()
handler::ha_rnd_pos()
handler::rnd_pos_by_record()
handler::ha_rnd_pos_by_record()
Rows_log_event::find_row()
Update_rows_log_event::do_exec_row()
Rows_log_event::do_apply_event()
Log_event::apply_event()
wsrep_apply_events()
and mutexes are taken in the order
lock_sys->mutex -> victim_trx->mutex -> victim_thread->LOCK_thd_data
When a normal KILL statement is executed, the stack is
innobase_kill_query()
kill_handlerton()
plugin_foreach_with_mask()
ha_kill_query()
THD::awake()
kill_one_thread()
and the mutexes are taken in the order
victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex
This patch is the plan D variant for fixing the potential mutex locking
order violation exercised by BF aborting and KILL command execution.
In this approach, the KILL command is replicated as a TOI operation.
This guarantees total isolation for the KILL command execution
on the first node: there is no concurrent replication applying
and no concurrent DDL executing. Therefore there is no risk of
a BF abort happening in parallel with the KILL command execution
either, and the potential mutex deadlocks between the different
mutex access paths of KILL command execution and BF aborting
cannot happen.
TOI replication is used in this approach purely as a means
to provide isolated KILL command execution on the first node.
The KILL command should not (and must not) be applied on secondary
nodes. This patch ensures that by skipping KILL
execution on secondary nodes in the applying phase: we
bail out if an applier thread is trying to execute a KILL command.
This is effective, but skipping the applying of the KILL command
could happen much earlier as well.
This also fixes unprotected calls to wsrep_thd_abort
that use wsrep_abort_transaction, by holding
THD::LOCK_thd_data while we abort the transaction.
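A control-flow sketch of the approach; wsrep_to_isolation_begin()/end()
exist in the server, but the simplified signatures, the stand-in THD and
the driver function here are illustrative only:

```cpp
// Stand-in THD: only the applier marker matters for this sketch.
struct THD {
  bool wsrep_applier;  // true on a secondary node's applier thread
};

bool wsrep_to_isolation_begin(THD &) { return true; }  // enter TOI (simplified)
void wsrep_to_isolation_end(THD &) {}                  // leave TOI (simplified)
void kill_one_thread(THD &, unsigned long /*victim*/) {}

void sql_kill(THD &thd, unsigned long victim_id) {
  if (thd.wsrep_applier)
    return;  // secondary node: never apply a replicated KILL
  if (!wsrep_to_isolation_begin(thd))
    return;  // total order isolation could not be reached
  // Inside TOI no applier or DDL runs concurrently on this node, so the
  // KILL cannot race with a BF abort on the victim's mutexes.
  kill_one_thread(thd, victim_id);
  wsrep_to_isolation_end(thd);
}
```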
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
This patch also fixes the mutex locking order and unprotected
THD member accesses in the BF aborting case. We try to hold
THD::LOCK_thd_data during BF aborting. The only case where that
is not possible is in wsrep_abort_transaction before the
call to wsrep_innobase_kill_one_trx, where we take the InnoDB
mutexes first and then THD::LOCK_thd_data.
This also fixes a possible race condition between
close_connection and wsrep disconnecting connections.
Added the wsrep_bf_kill_debug test case.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
THD::copy_db_to(): Always return true if the output parameter
was left uninitialized. This fixes a regression that was caused
by commit 7d0d934ca6 (MDEV-16473).
MariaDB Server 10.3 and later were unaffected by this bug
thanks to commit a7e352b54d.
Possibly this bug only affects mysql_list_fields()
in the Embedded Server (libmysqld).
This bug was found by GCC 11.2.0 in CMAKE_BUILD_TYPE=RelWithDebInfo.
Make BACKUP STAGE behave like FTWRL, desyncing and pausing the node
to prevent BF threads (appliers) from interfering with the blocking stages.
This is needed because BF threads do not respect BACKUP MDL locks.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
Because the name was misleading: it counts not threads but THDs,
and as THD_count is the only way to increment or decrement it, it
could just as well be declared inside THD_count.
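A minimal sketch of the resulting shape, assuming a scoped guard embedded
in every THD; the atomic counter and its placement here are illustrative:

```cpp
#include <atomic>

// The counter tracks live THD objects, and constructing/destroying a
// THD_count guard is the only way to change it.
class THD_count {
  static inline std::atomic<int> count{0};  // requires C++17

public:
  THD_count() { count.fetch_add(1, std::memory_order_relaxed); }
  ~THD_count() { count.fetch_sub(1, std::memory_order_relaxed); }
  static int value() { return count.load(std::memory_order_relaxed); }
};

// Hypothetical usage: every THD owns a guard, so the count follows the
// THD's lifetime automatically.
struct THD {
  THD_count counter;
};
```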
In the code that existed just before this patch, binding of a table
reference to the specification of the corresponding CTE happened in the
function open_and_process_table(). If the table reference is not the first
one in the query, the specification is cloned in the same way as the
specification of a view is cloned for any reference to the view. This works
fine for standalone queries, but does not work for stored procedures /
functions, for the following reason.
When the first call of a stored procedure / function SP is processed, the
body of SP is parsed. When a query of SP is parsed, the info on each
encountered table reference is put into a TABLE_LIST object linked into
a global chain associated with the query. When parsing of the query is
finished, the basic info on the table references from this chain, except
table references to derived tables and information schema tables, is put
into one hash table associated with SP. When parsing of the body of SP is
finished, this hash table is used to construct TABLE_LIST objects for all
table references mentioned in SP and to link them into the list of such
objects passed to the pre-locking process that calls open_and_process_table()
for each table from the list.
When a TABLE_LIST for a view is encountered, the view is opened and its
specification is parsed. For any table reference occurring in
the specification a new TABLE_LIST object is created to be included into
the list for pre-locking. After all objects in the pre-locking list have
been looked through, the tables mentioned in the list are locked. Note that
the objects referencing CTEs are just skipped here, as it is impossible to
resolve these references without any info on the context where they occur.
Then the statements from the body of SP are executed one by one.
At the very beginning of the execution of a query, the tables used in the
query are opened, and open_and_process_table() is now called for each table
reference mentioned in the list of TABLE_LIST objects associated with the
query that was built when the query was parsed.
Each table reference is first checked against the CTE
definitions in whose scope it occurred. If such a definition is found, the
reference is considered resolved, and if this is not the first reference
to the found CTE, the specification of the CTE is re-parsed and the
result of the parsing is added to the parse tree of the query as a
sub-tree. If this sub-tree contains references to other tables, they
are added to the list of TABLE_LIST objects associated with the query so
that the referenced tables can be opened. When the procedure that opens
the tables comes to the TABLE_LIST object created for a non-first
reference to a CTE, it discovers that the referenced table instance is not
locked and reports an error.
Thus, processing non-first table references to a CTE similarly to how
references to a view are processed does not work for queries used in stored
procedures / functions. The main problem is that the current
pre-locking mechanism employed for stored procedures / functions does not
allow saving the context in which a CTE reference occurs. It's not trivial
to save the info about the context where a CTE reference occurs, while the
resolution of the table reference cannot be done without this context and
consequently the specification for the table reference cannot be
determined.
This patch solves the above problem by moving the resolution of all CTE
references to the parsing stage. More exactly, references to CTEs occurring
in a query are resolved right after parsing of the query has finished. After
resolution, any CTE reference is marked as a reference to a derived
table, so it is excluded from the hash table created for pre-locking the
used base tables and views when the first call of a stored procedure /
function is processed.
This solution required recursive calls of the parser. The function
THD::sql_parser() has been added specifically for recursive invocations of
the parser.
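An illustrative sketch of the new shape: after the outer query is parsed,
each CTE reference is resolved by recursively parsing the CTE's
specification. THD::sql_parser() is named in the commit; the types and the
driver function below are stand-ins:

```cpp
#include <string>
#include <vector>

// Stand-in parse tree: just enough structure to attach sub-trees.
struct ParseTree {
  std::vector<ParseTree> children;
};

struct THD {
  // Stand-in for the reentrant parser entry point this patch adds.
  ParseTree sql_parser(const std::string & /*specification*/) {
    return ParseTree{};
  }
};

// Runs right after the outer query is parsed, i.e. still at parse time,
// so each resolved reference can be marked as a derived table and be
// excluded from the stored-routine pre-locking hash.
ParseTree resolve_cte_references(THD &thd, ParseTree outer,
                                 const std::vector<std::string> &cte_specs) {
  for (const std::string &spec : cte_specs)
    outer.children.push_back(thd.sql_parser(spec));  // recursive parse
  return outer;
}
```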
m_status == DA_OK_BULK' failed in Diagnostics_area::message from
get_schema_tables_record
Analysis: SET NAMES changes the character set for character_set_client,
character_set_connection and character_set_results to 'filename'. The .frm
file of the view has @xx sequences in the SELECT query, which cause a parse
error because the 'filename' character set is not parser friendly. When we
get the parse error (ER_PARSE_ERROR), we directly return true without
setting the error status. This is caught later in the assertion.
Fix: Disallow the 'filename' character set in SET NAMES because it is not
parser friendly.
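A stand-alone sketch of the guard such a fix adds; the checker function is
illustrative, although ER_WRONG_VALUE_FOR_VAR itself is a real server error
symbol (its numeric value here is an assumption):

```cpp
#include <cstring>

const int ER_WRONG_VALUE_FOR_VAR = 1231;  // assumed value, for the sketch

// Reject the non-parser-friendly 'filename' character set in SET NAMES.
int check_set_names_charset(const char *cs_name, int *err) {
  if (std::strcmp(cs_name, "filename") == 0) {
    *err = ER_WRONG_VALUE_FOR_VAR;  // report instead of silently failing
    return 1;
  }
  return 0;  // any other character set passes this check
}
```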
Problem:
=======
In the slave_parallel_mode=optimistic configuration, when admin commands and
DML operations on the same table are scheduled simultaneously for execution,
the result is a lock conflict, and the slave server either hangs due to a
deadlock or goes down with an assert.
Analysis:
========
The admin commands OPTIMIZE, REPAIR and ANALYZE are written to the binary
log as ordinary transactions. When 'slave_parallel_mode' is 'optimistic',
DMLs are allowed to run in parallel with them. But the locks taken by these
commands are not detected by the parallel-replication deadlock
detection-and-handling mechanism, and at times they result in a deadlock or
an assertion.
Fix:
===
Flag admin commands as DDL in the Gtid_log_event at the time of writing to
the binary log. Add a new bit EXECUTED_TABLE_ADMIN_CMD to
'm_unsafe_rollback_flags'. 'mysql_admin_table' accepts a list of tables to
be processed and executes them in a loop; upon successful execution it
enables the 'EXECUTED_TABLE_ADMIN_CMD' bit in
thd->transaction.stmt_unsafe_rollback_flags. The Gtid_log_event constructor
notices this flag and marks the current transaction with the 'FL_DDL' flag.
Gtid_log_events marked as FL_DDL will not be scheduled for parallel
execution on the slave; they will execute in isolation to prevent deadlocks.
Note: Removed the call to 'trans_commit_implicit' from the
'mysql_admin_table' function, as 'mysql_execute_command' takes care of
invoking 'trans_commit_implicit'.
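Illustrative bit-flag plumbing for the fix; the constant names mirror the
commit text, but their values and the helper functions are made up for this
sketch:

```cpp
using flag_t = unsigned int;

const flag_t EXECUTED_TABLE_ADMIN_CMD = 1u << 5;  // new unsafe-rollback bit
const flag_t FL_DDL                   = 1u << 0;  // Gtid_log_event DDL flag

struct Transaction {
  flag_t stmt_unsafe_rollback_flags = 0;
};

// mysql_admin_table() marks the statement after each table succeeds ...
void mark_admin_cmd_executed(Transaction &trx) {
  trx.stmt_unsafe_rollback_flags |= EXECUTED_TABLE_ADMIN_CMD;
}

// ... and the Gtid_log_event constructor turns that into FL_DDL, so the
// slave schedules the event in isolation instead of optimistically.
flag_t gtid_flags_for(const Transaction &trx) {
  return (trx.stmt_unsafe_rollback_flags & EXECUTED_TABLE_ADMIN_CMD)
             ? FL_DDL
             : 0;
}
```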
When you only need the view structure, don't call handle_derived with
DT_CREATE and rely on its internal hackish check to skip DT_CREATE,
because handle_derived is called from many different places
and that internal check is indiscriminate.
Instead, just don't ask handle_derived to do DT_CREATE
if you don't want it to do DT_CREATE.
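An illustrative call-site change; the phase constants are modeled on the
names in the commit, but their values and handle_derived()'s signature here
are stand-ins:

```cpp
using phases_t = unsigned int;

const phases_t DT_INIT    = 1u << 0;
const phases_t DT_PREPARE = 1u << 1;
const phases_t DT_CREATE  = 1u << 2;

int handle_derived(phases_t phases) { return static_cast<int>(phases); }  // stub

int open_view_structure_only() {
  // Before: pass DT_CREATE too and hope the internal check skips it.
  // After: simply do not request the phase you do not want.
  return handle_derived(DT_INIT | DT_PREPARE);
}
```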