Function wsrep_read_only_option was already removed in commit
d54bc3c0d1 because it could cause race condition on variable
opt_readonly so that value OFF can become permanent.
Removed function again and added test case. Note that writes
to TEMPORARY tables are still allowed when read_only=ON.
Window Functions code tries to minimize the number of times it
needs to sort the select's resultset by finding "compatible"
OVER (PARTITION BY ... ORDER BY ...) clauses.
This employs compare_order_elements(). That function assumed that
the order expressions are Item_field-derived objects (that refer
to a temp.table). But this is not always the case: one can
construct queries order expressions are arbitrary item expressions.
Add handling for such expressions: sort them according to the window
specification they appeared in.
This means we cannot detect that two compatible PARTITION BY clauses
that use expressions can share the sorting step.
But at least we won't crash.
IF an INSERT/REPLACE SELECT statement contained an ON expression in the top
level select and this expression used a subquery with a column reference
that could not be resolved then an attempt to resolve this reference as
an outer reference caused a crash of the server. This happened because the
outer context field in the Name_resolution_context structure was not set
to NULL for such references. Rather it pointed to the first element in
the select_stack.
Note that starting from 10.4 we cannot use the SELECT_LEX::outer_select()
method when parsing a SELECT construct.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
* Fix test galera.MW-44 to make it work with --ps-protocol
* Skip test galera.MW-328C under --ps-protocol This test
relies on wsrep_retry_autocommit, which has no effect
under ps-protocol.
* Return WSREP related errors on COM_STMT_PREPARE commands
Change wsrep_command_no_result() to allow sending back errors
when a statement is prepared. For example, to handle deadlock
error due to BF aborted transaction during prepare.
* Add sync waiting before statement prepare
When a statement is prepared, tables used in the statement may be
opened and checked for existence. Because of that, some tests (for
example galera_create_table_as_select) that CREATE a table in one node
and then SELECT from the same table in another node may result in errors
due to non existing table.
To make tests behave similarly under normal and PS protocol, we add a
call to sync wait before preparing statements that would sync wait
during normal execution.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
Handling BF abort for prepared statement execution so that EXECUTE processing will continue
until parameter setup is complete, before BF abort bails out the statement execution.
THD class has new boolean member: wsrep_delayed_BF_abort, which is set if BF abort is observed
in do_command() right after reading client's packet, and if the client has sent PS execute command.
In such case, the deadlock error is not returned immediately back to client, but the PS execution
will be started. However, the PS execution loop, will now check if wsrep_delayed_BF_abort is set, and
stop the PS execution after the type information has been assigned for the PS.
With this, the PS protocol type information, which is present in the first PS EXECUTE command, is not lost
even if the first PS EXECUTE command was marked to abort.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
extra2_read_len resolved by keeping the implementation
in sql/table.cc by exposed it for use by ha_partition.cc
Remove identical implementation in unireg.h
(ref: bfed2c7d57)
Per Marko's comment in JIRA, sql_kill is passing the thread id
as long long. We change the format of the error messages to match,
and cast the thread id to long long in sql_kill_user.
The 10.5 test error main.grant_kill showed up a incorrect
thread id on a big endian architecture.
The cause of this is the sql_kill_user function assumed the
error was ER_OUT_OF_RESOURCES, when the the actual error was
ER_KILL_DENIED_ERROR. ER_KILL_DENIED_ERROR as an error message
requires a thread id to be passed as unsigned long, however a
user/host was passed.
ER_OUT_OF_RESOURCES doesn't even take a user/host, despite
the optimistic comment. We remove this being passed as an
argument to the function so that when MDEV-21978 is implemented
one less compiler format warning is generated (which would
have caught this error sooner).
Thanks Otto for reporting and Marko for analysis.
Mutex order violation when wsrep bf thread kills a conflicting trx,
the stack is
wsrep_thd_LOCK()
wsrep_kill_victim()
lock_rec_other_has_conflicting()
lock_clust_rec_read_check_and_lock()
row_search_mvcc()
ha_innobase::index_read()
ha_innobase::rnd_pos()
handler::ha_rnd_pos()
handler::rnd_pos_by_record()
handler::ha_rnd_pos_by_record()
Rows_log_event::find_row()
Update_rows_log_event::do_exec_row()
Rows_log_event::do_apply_event()
Log_event::apply_event()
wsrep_apply_events()
and mutexes are taken in the order
lock_sys->mutex -> victim_trx->mutex -> victim_thread->LOCK_thd_data
When a normal KILL statement is executed, the stack is
innobase_kill_query()
kill_handlerton()
plugin_foreach_with_mask()
ha_kill_query()
THD::awake()
kill_one_thread()
and mutexes are
victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex
This patch is the plan D variant for fixing potetial mutex locking
order exercised by BF aborting and KILL command execution.
In this approach, KILL command is replicated as TOI operation.
This guarantees total isolation for the KILL command execution
in the first node: there is no concurrent replication applying
and no concurrent DDL executing. Therefore there is no risk of
BF aborting to happen in parallel with KILL command execution
either. Potential mutex deadlocks between the different mutex
access paths with KILL command execution and BF aborting cannot
therefore happen.
TOI replication is used, in this approach, purely as means
to provide isolated KILL command execution in the first node.
KILL command should not (and must not) be applied in secondary
nodes. In this patch, we make this sure by skipping KILL
execution in secondary nodes, in applying phase, where we
bail out if applier thread is trying to execute KILL command.
This is effective, but skipping the applying of KILL command
could happen much earlier as well.
This also fixed unprotected calls to wsrep_thd_abort
that will use wsrep_abort_transaction. This is fixed
by holding THD::LOCK_thd_data while we abort transaction.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
Mutex order violation when wsrep bf thread kills a conflicting trx,
the stack is
wsrep_thd_LOCK()
wsrep_kill_victim()
lock_rec_other_has_conflicting()
lock_clust_rec_read_check_and_lock()
row_search_mvcc()
ha_innobase::index_read()
ha_innobase::rnd_pos()
handler::ha_rnd_pos()
handler::rnd_pos_by_record()
handler::ha_rnd_pos_by_record()
Rows_log_event::find_row()
Update_rows_log_event::do_exec_row()
Rows_log_event::do_apply_event()
Log_event::apply_event()
wsrep_apply_events()
and mutexes are taken in the order
lock_sys->mutex -> victim_trx->mutex -> victim_thread->LOCK_thd_data
When a normal KILL statement is executed, the stack is
innobase_kill_query()
kill_handlerton()
plugin_foreach_with_mask()
ha_kill_query()
THD::awake()
kill_one_thread()
and mutexes are
victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex
This patch is the plan D variant for fixing potetial mutex locking
order exercised by BF aborting and KILL command execution.
In this approach, KILL command is replicated as TOI operation.
This guarantees total isolation for the KILL command execution
in the first node: there is no concurrent replication applying
and no concurrent DDL executing. Therefore there is no risk of
BF aborting to happen in parallel with KILL command execution
either. Potential mutex deadlocks between the different mutex
access paths with KILL command execution and BF aborting cannot
therefore happen.
TOI replication is used, in this approach, purely as means
to provide isolated KILL command execution in the first node.
KILL command should not (and must not) be applied in secondary
nodes. In this patch, we make this sure by skipping KILL
execution in secondary nodes, in applying phase, where we
bail out if applier thread is trying to execute KILL command.
This is effective, but skipping the applying of KILL command
could happen much earlier as well.
This also fixed unprotected calls to wsrep_thd_abort
that will use wsrep_abort_transaction. This is fixed
by holding THD::LOCK_thd_data while we abort transaction.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
Mutex order violation when wsrep bf thread kills a conflicting trx,
the stack is
wsrep_thd_LOCK()
wsrep_kill_victim()
lock_rec_other_has_conflicting()
lock_clust_rec_read_check_and_lock()
row_search_mvcc()
ha_innobase::index_read()
ha_innobase::rnd_pos()
handler::ha_rnd_pos()
handler::rnd_pos_by_record()
handler::ha_rnd_pos_by_record()
Rows_log_event::find_row()
Update_rows_log_event::do_exec_row()
Rows_log_event::do_apply_event()
Log_event::apply_event()
wsrep_apply_events()
and mutexes are taken in the order
lock_sys->mutex -> victim_trx->mutex -> victim_thread->LOCK_thd_data
When a normal KILL statement is executed, the stack is
innobase_kill_query()
kill_handlerton()
plugin_foreach_with_mask()
ha_kill_query()
THD::awake()
kill_one_thread()
and mutexes are
victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex
This patch is the plan D variant for fixing potetial mutex locking
order exercised by BF aborting and KILL command execution.
In this approach, KILL command is replicated as TOI operation.
This guarantees total isolation for the KILL command execution
in the first node: there is no concurrent replication applying
and no concurrent DDL executing. Therefore there is no risk of
BF aborting to happen in parallel with KILL command execution
either. Potential mutex deadlocks between the different mutex
access paths with KILL command execution and BF aborting cannot
therefore happen.
TOI replication is used, in this approach, purely as means
to provide isolated KILL command execution in the first node.
KILL command should not (and must not) be applied in secondary
nodes. In this patch, we make this sure by skipping KILL
execution in secondary nodes, in applying phase, where we
bail out if applier thread is trying to execute KILL command.
This is effective, but skipping the applying of KILL command
could happen much earlier as well.
This also fixed unprotected calls to wsrep_thd_abort
that will use wsrep_abort_transaction. This is fixed
by holding THD::LOCK_thd_data while we abort transaction.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
This patch is the plan D variant for fixing potetial mutex locking
order exercised by BF aborting and KILL command execution.
In this approach, KILL command is replicated as TOI operation.
This guarantees total isolation for the KILL command execution
in the first node: there is no concurrent replication applying
and no concurrent DDL executing. Therefore there is no risk of
BF aborting to happen in parallel with KILL command execution
either. Potential mutex deadlocks between the different mutex
access paths with KILL command execution and BF aborting cannot
therefore happen.
TOI replication is used, in this approach, purely as means
to provide isolated KILL command execution in the first node.
KILL command should not (and must not) be applied in secondary
nodes. In this patch, we make this sure by skipping KILL
execution in secondary nodes, in applying phase, where we
bail out if applier thread is trying to execute KILL command.
This is effective, but skipping the applying of KILL command
could happen much earlier as well.
This patch also fixes mutex locking order and unprotected
THD member accesses on bf aborting case. We try to hold
THD::LOCK_thd_data during bf aborting. Only case where it
is not possible is at wsrep_abort_transaction before
call wsrep_innobase_kill_one_trx where we take InnoDB
mutexes first and then THD::LOCK_thd_data.
This will also fix possible race condition during
close_connection and while wsrep is disconnecting
connections.
Added wsrep_bf_kill_debug test case
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
In a rebase of the merge, two preceding commits were accidentally reverted:
commit 112b23969a (MDEV-26308)
commit ac2857a5fb (MDEV-25717)
Thanks to Daniele Sciascia for noticing this.
Contains following fixes:
* allow TOI commands to timeout while trying to acquire TOI with
override lock_wait_timeout with a LONG_TIMEOUT only after
succesfully entering TOI
* only ignore lock_wait_timeout on TOI
* fix galera_split_brain test as TOI operation now returns ER_LOCK_WAIT_TIMEOUT after lock_wait_timeout
* explicitly test for TOI
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
Using ROLLBACK with `completion_type = CHAIN` result in start of
transaction and implicit commit before previous WSREP internal data is
cleared.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
because the name was misleading, it counts not threads, but THDs,
and as THD_count is the only way to increment/decrement it, it
could as well be declared inside THD_count.
In the code existed just before this patch binding of a table reference to
the specification of the corresponding CTE happens in the function
open_and_process_table(). If the table reference is not the first in the
query the specification is cloned in the same way as the specification of
a view is cloned for any reference of the view. This works fine for
standalone queries, but does not work for stored procedures / functions
for the following reason.
When the first call of a stored procedure/ function SP is processed the
body of SP is parsed. When a query of SP is parsed the info on each
encountered table reference is put into a TABLE_LIST object linked into
a global chain associated with the query. When parsing of the query is
finished the basic info on the table references from this chain except
table references to derived tables and information schema tables is put
in one hash table associated with SP. When parsing of the body of SP is
finished this hash table is used to construct TABLE_LIST objects for all
table references mentioned in SP and link them into the list of such
objects passed to a pre-locking process that calls open_and_process_table()
for each table from the list.
When a TABLE_LIST for a view is encountered the view is opened and its
specification is parsed. For any table reference occurred in
the specification a new TABLE_LIST object is created to be included into
the list for pre-locking. After all objects in the pre-locking have been
looked through the tables mentioned in the list are locked. Note that the
objects referenced CTEs are just skipped here as it is impossible to
resolve these references without any info on the context where they occur.
Now the statements from the body of SP are executed one by one that.
At the very beginning of the execution of a query the tables used in the
query are opened and open_and_process_table() now is called for each table
reference mentioned in the list of TABLE_LIST objects associated with the
query that was built when the query was parsed.
For each table reference first the reference is checked against CTEs
definitions in whose scope it occurred. If such definition is found the
reference is considered resolved and if this is not the first reference
to the found CTE the the specification of the CTE is re-parsed and the
result of the parsing is added to the parsing tree of the query as a
sub-tree. If this sub-tree contains table references to other tables they
are added to the list of TABLE_LIST objects associated with the query in
order the referenced tables to be opened. When the procedure that opens
the tables comes to the TABLE_LIST object created for a non-first
reference to a CTE it discovers that the referenced table instance is not
locked and reports an error.
Thus processing non-first table references to a CTE similar to how
references to view are processed does not work for queries used in stored
procedures / functions. And the main problem is that the current
pre-locking mechanism employed for stored procedures / functions does not
allow to save the context in which a CTE reference occur. It's not trivial
to save the info about the context where a CTE reference occurs while the
resolution of the table reference cannot be done without this context and
consequentially the specification for the table reference cannot be
determined.
This patch solves the above problem by moving resolution of all CTE
references at the parsing stage. More exactly references to CTEs occurred in
a query are resolved right after parsing of the query has finished. After
resolution any CTE reference it is marked as a reference to to derived
table. So it is excluded from the hash table created for pre-locking used
base tables and view when the first call of a stored procedure / function
is processed.
This solution required recursive calls of the parser. The function
THD::sql_parser() has been added specifically for recursive invocations of
the parser.
In the code existed just before this patch binding of a table reference to
the specification of the corresponding CTE happens in the function
open_and_process_table(). If the table reference is not the first in the
query the specification is cloned in the same way as the specification of
a view is cloned for any reference of the view. This works fine for
standalone queries, but does not work for stored procedures / functions
for the following reason.
When the first call of a stored procedure/ function SP is processed the
body of SP is parsed. When a query of SP is parsed the info on each
encountered table reference is put into a TABLE_LIST object linked into
a global chain associated with the query. When parsing of the query is
finished the basic info on the table references from this chain except
table references to derived tables and information schema tables is put
in one hash table associated with SP. When parsing of the body of SP is
finished this hash table is used to construct TABLE_LIST objects for all
table references mentioned in SP and link them into the list of such
objects passed to a pre-locking process that calls open_and_process_table()
for each table from the list.
When a TABLE_LIST for a view is encountered the view is opened and its
specification is parsed. For any table reference occurred in
the specification a new TABLE_LIST object is created to be included into
the list for pre-locking. After all objects in the pre-locking have been
looked through the tables mentioned in the list are locked. Note that the
objects referenced CTEs are just skipped here as it is impossible to
resolve these references without any info on the context where they occur.
Now the statements from the body of SP are executed one by one that.
At the very beginning of the execution of a query the tables used in the
query are opened and open_and_process_table() now is called for each table
reference mentioned in the list of TABLE_LIST objects associated with the
query that was built when the query was parsed.
For each table reference first the reference is checked against CTEs
definitions in whose scope it occurred. If such definition is found the
reference is considered resolved and if this is not the first reference
to the found CTE the the specification of the CTE is re-parsed and the
result of the parsing is added to the parsing tree of the query as a
sub-tree. If this sub-tree contains table references to other tables they
are added to the list of TABLE_LIST objects associated with the query in
order the referenced tables to be opened. When the procedure that opens
the tables comes to the TABLE_LIST object created for a non-first
reference to a CTE it discovers that the referenced table instance is not
locked and reports an error.
Thus processing non-first table references to a CTE similar to how
references to view are processed does not work for queries used in stored
procedures / functions. And the main problem is that the current
pre-locking mechanism employed for stored procedures / functions does not
allow to save the context in which a CTE reference occur. It's not trivial
to save the info about the context where a CTE reference occurs while the
resolution of the table reference cannot be done without this context and
consequentially the specification for the table reference cannot be
determined.
This patch solves the above problem by moving resolution of all CTE
references at the parsing stage. More exactly references to CTEs occurred in
a query are resolved right after parsing of the query has finished. After
resolution any CTE reference it is marked as a reference to to derived
table. So it is excluded from the hash table created for pre-locking used
base tables and view when the first call of a stored procedure / function
is processed.
This solution required recursive calls of the parser. The function
THD::sql_parser() has been added specifically for recursive invocations of
the parser.
This patch sets the proper name resolution context for outer references
used in a subquery from an ON clause. Usually this context is more narrow
than the name resolution context of the parent select that were used before
this fix.
This fix revealed another problem that concerned ON expressions used in
from clauses of specifications of derived tables / views / CTEs. The name
resolution outer context for such ON expression must be set to NULL to
prevent name resolution beyond the derived table where it is used.
The solution to resolve this problem applied in sql_derived.cc was provided
by Sergei Petrunia <sergey@mariadb.com>.
The change in sql_parse.cc is not good for 10.4+. A corresponding diff for
10.4+ will be provided in JIRA entry for this bug.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
A user connected to a server with an expired password
can't change password with the statement "SET password=..."
if this statement is run in PS mode. In mentioned use case a user
gets the error ER_MUST_CHANGE_PASSWORD on attempt to run
the statement PREPARE stmt FOR "SET password=...";
The reason of failure to reset password by a locked user using the
statement PREPARE stmt FOR "SET password=..." is that PS-related
statements are not listed among the commands allowed for execution
by a user with expired password. However, simple adding of PS-related
statements (PREPARE FOR/EXECUTE/DEALLOCATE PREPARE ) to the list of
statements allowed for execution by a locked user is not enough
to solve problems, since it opens the opportunity for a locked user
to execute any statement in the PS mode.
To exclude this opportunity, additional checking that the statement
being prepared for execution in PS-mode is the SET statement has to be added.
This extra checking has been added by this patch into the method
Prepared_statement::prepared() that executed on preparing any statement
for execution in PS-mode.
Assert was:
mariadbd: /my/maria-10.6/wsrep-lib/src/client_state.cpp:256:
int wsrep::client_state::after_statement(): Assertion `state() == s_exec'
The reason was because of two faults:
- A missing test for WSREP(thd) when checking wsrep_after_statement(()
- THD->wsrep_cs().state was set to s_idle instead of s_none
Store old value of binlog format before wsrep code so that
if we bail out because wsrep is not ready for connections
we can restore binlog format correctly.
Store old value of binlog format before wsrep code so that
if we bail out because wsrep is not ready for connections
we can restore binlog format correctly.
A bogus error message was issued for any outer references occurred in
ON expressions used in subqueries. This prevented execution of queries
containing subqueries as soon as they used outer references in their ON
clauses. This happened because the Name_resolution_context structure
created for any ON expression erroneously had the field outer_context set
to NULL. The fields select_lex of this structure was not set correctly
either.
The idea of the fix was taken from mysql code of the function
push_new_name_resolution_context().
Approved by dmitry.shulga@mariadb.com
Running statements with SET STATEMENT FOR clause is handled incorrectly in
case the whole statement is executed in prepared statement mode.
For example, running of the following statement
SET STATEMENT sql_mode = 'NO_ENGINE_SUBSTITUTION' FOR CREATE TABLE t1 AS SELECT CONCAT('abc') AS c1;
results in different definition of the table t1 depending on whether
the statement is executed as a prepared or as a regular statement.
In first case the column c1 is defined as
`c1` varchar(3) DEFAULT NULL
in the last case the column c1 is defined as
`c1` varchar(3) NOT NULL
Different definition for the column c1 arise due to the fact that
a value of the data memeber Item_func_concat::maybe_null depends on
whether strict mode is on or off. Below is definition of the method
fix_fields() of the class Item_str_func that is base class for the
class Item_func_concat that is created on parsing the
SET STATEMENT FOR clause.
bool Item_str_func::fix_fields(THD *thd, Item **ref)
{
bool res= Item_func::fix_fields(thd, ref);
/*
In Item_str_func::check_well_formed_result() we may set null_value
flag on the same condition as in test() below.
*/
maybe_null= maybe_null || thd->is_strict_mode();
return res;
}
Although the clause SET STATEMENT sql_mode = 'NO_ENGINE_SUBSTITUTION' FOR
is parsed on PREPARE phase during processing of the prepared statement,
real setting of the sql_mode system variable is done on EXECUTION phase.
On the other hand, the method Item_str_func::fix_fields is called on PREPARE
phase. In result, thd->is_strict_mode() returns true during calling the method
Item_str_func::fix_fields(), the data member maybe_null is assigned the value
true and column c1 is defined as DEFAULT NULL.
To fix the issue the system variables listed in the SET STATEMENT FOR clause
are set at the beginning of handling the PREPARE phase just right before
calling the function check_prepared_statement() and their original values
restored immediate after return from this function.
Additionally, to avoid code duplication the source code used in the function
mysql_execute_command for setting variables, specified by SET STATEMENT
clause, were extracted to the standalone functions
run_set_statement_if_requested(). This new function is called from
the function mysql_execute_command() and the method
Prepared_statement::prepare().