This reverts the server part of commit 775fccea0
but keeps the InnoDB part (which reverted MDEV-17092 5530a93f4).
So after this, both MDEV-23536 and MDEV-17092 are reverted,
and the original bug is resurrected.
A race condition may occur between the execution of a transaction commit
and the execution of a KILL statement that attempts to abort that
transaction.
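A hedged two-session sketch of the race (table name and thread id made up):

  -- session 1
  START TRANSACTION;
  INSERT INTO t1 VALUES (1);
  COMMIT;   -- the commit may run concurrently with the KILL below
  -- session 2, at the same time:
  KILL 42;  -- 42 = thread id of session 1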
MDEV-17092 worked around this race condition by modifying InnoDB code.
After that issue was closed, Sergey Vojtovich pointed out that this
race condition would be better fixed above the storage engine layer:
if you look carefully at the above, you can conclude that
thd->free_connection() can be called concurrently with
KILL/thd->awake(), which is the bug. It is only partially fixed in
THD::~THD(), whose destructor waits for the KILL to complete.
Fix: Add the necessary mutex operations to THD::free_connection()
and also move the WSREP-specific code there. This ensures that no
one is using the THD while we run free_connection(). These mutexes
also ensure that there cannot be a concurrent KILL/THD::awake().
innobase_kill_query():
We can now remove the usage of trx_sys_mutex introduced in MDEV-17092.
trx_t::free():
Poison trx->state and trx->mysql_thd.
This patch is validated with an RQG run similar to the one that
reproduced MDEV-17092.
This bug could cause a crash when executing queries that used mutually
recursive CTEs with the system variable big_tables set to 1. It happened
due to several bugs in the code that handled recursive table references
referring to mutually recursive CTEs. For each recursive table reference
a temporary table is created that contains all rows generated for the
corresponding recursive CTE table at the previous step of recursion.
This temporary table should be created in the same way as the temporary
table created for a regular materialized derived table, using the
method select_union::create_result_table(). In this case, when the
temporary table is created, it uses the select_union::TMP_TABLE_PARAM
structure as the parameter for the table construction. However, the
code created the temporary table using just the function create_tmp_table()
and passed pointers to certain fields of the TMP_TABLE_PARAM structure
used for accumulation of rows of the recursive CTE table as parameters
for update. This was a mistake, because different temporary tables
cannot share TMP_TABLE_PARAM fields in the general case. Besides,
depending on how the mutually recursive CTE tables were defined and which
of them were referred to in the executed query, the select_union object
allocated for a recursive table reference could be allocated again after
the temporary table had been created. In this case the TMP_TABLE_PARAM
object associated with the temporary table created for the recursive
table reference contained unassigned fields that are needed for execution
when the Aria engine is employed as the engine for temporary tables.
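For illustration, a hedged minimal sketch of a query of the kind that
used to crash (table, column and CTE names are made up):

  SET SESSION big_tables = 1;
  CREATE TABLE edges (src INT, dst INT);
  INSERT INTO edges VALUES (1,2),(2,3),(3,1);
  -- two mutually recursive CTEs: each one references the other
  WITH RECURSIVE
    reachable(node) AS (
      SELECT 1
      UNION
      SELECT e.dst FROM edges e JOIN hop ON e.src = hop.node
    ),
    hop(node) AS (
      SELECT node FROM reachable
    )
  SELECT * FROM reachable;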
This patch ensures that
- select_union object is created only once for any recursive table
reference
- any temporary table created for recursive CTEs uses its own
TMP_TABLE_PARAM structure
The patch also fixes a problem caused by incomplete cleanup of join tables
associated with recursive table references.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
The reason for the failure is that
thd->mdl_context.release_transactional_locks()
was called after commit & rollback even in cases where the current
transaction is still active.
For 10.2, 10.3 and 10.4 the fix is simple:
- Replace all calls to thd->mdl_context.release_transactional_locks() with
thd->release_transactional_locks(). The thd function will only call
the mdl_context function if there are no active transactional locks.
In 10.6 there will be a better fix, where we will change the return value
of some trans_xxx() functions to indicate whether they did close the
transaction or not. This will avoid the need for the indirect call.
Other things:
- trans_xa_commit() and trans_xa_rollback() will automatically
call release_transactional_locks() if the transaction is closed.
- We can't do that for the other functions, as the callers of many of
  these do additional work (like close_thread_tables) before calling
  release_transactional_locks().
- Added missing abort_result_set() and missing DBUG_RETURN in
select_create::send_eof()
- Fixed wrong indentation in injector::transaction::commit()
The issue here was that the system variable max_sort_length was being
applied to decimals, truncating decimal values to the number of bytes
set by max_sort_length.
This was leading to a buffer overflow, as the values were written
to the buffer without truncation while the offset was moved by the
number of bytes (set by max_sort_length) that are needed for comparison.
The fix is to not apply max_sort_length to fixed-size types like INT and
DECIMAL, and to only apply max_sort_length to CHAR, VARCHAR, TEXT and
BLOB.
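A hedged illustration (table and column names made up): with a small
max_sort_length, the fixed-size DECIMAL column is now always compared
on its full value, while variable-length types still honour the limit:

  SET SESSION max_sort_length = 8;
  CREATE TABLE t1 (d DECIMAL(30,10), c VARCHAR(100));
  INSERT INTO t1 VALUES
    (1234567890.1234567890, 'abcdefghA'),
    (1234567890.1234567891, 'abcdefghB');
  SELECT d FROM t1 ORDER BY d;  -- DECIMAL: full fixed-size value compared
  SELECT c FROM t1 ORDER BY c;  -- VARCHAR: only the first 8 bytes compared,
                                -- so these two values tie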
* Allocate items on thd->mem_root while re-fixing vcol exprs
* Register the vcol tree changes and roll them back after the statement
  is executed.
Explanation:
Due to collation implementation specifics, an Item tree could change
while being fixed. The tricky thing here is to do it on the proper arena.
It's usually not a problem when a field is deterministic, but it becomes
a pain for non-deterministic fields.
A non-deterministic field has to be re-fixed on each statement, since it
depends on the environment state. The tree changes must therefore be
temporary, and should be reverted after the statement execution.
For a low sort_buffer_size, in the cost calculation of using the Unique
object, the number of elements in the tree was evaluated to 0; make sure
to have at least 1 element in the Unique tree.
Also, for the function Unique::get(), allocate memory for at least
MERGEBUFF2+1 keys.
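A hedged sketch of a way to reach this code path (table and column names
made up; assumes the Sequence engine for seq_1_to_1000); an index_merge
sort-union plan buffers row ids in a Unique object:

  SET SESSION sort_buffer_size = 1024;  -- a very low sort buffer
  CREATE TABLE t1 (a INT, b INT, KEY(a), KEY(b));
  INSERT INTO t1 SELECT seq, seq FROM seq_1_to_1000;
  -- with OR-ed range conditions the optimizer may pick index_merge
  -- sort-union, which deduplicates row ids through a Unique object
  SELECT * FROM t1 WHERE a < 10 OR b < 10;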
cannot use the current THD::mem_root, because it can be temporarily
reassigned to something with a very different life time
(e.g. to TABLE::mem_root or range optimizer mem_root).
MDEV-21398 Deadlock (server hang) or assertion failure in
Diagnostics_area::set_error_status upon ALTER under lock
This failure could only happen if one locked the same table
multiple times and then did an ALTER TABLE on the table.
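A hedged sketch of the scenario (table name made up):

  CREATE TABLE t1 (a INT);
  LOCK TABLES t1 WRITE, t1 AS t2 WRITE;  -- the same table locked twice
  ALTER TABLE t1 ADD COLUMN b INT;       -- used to hang or hit the assertion
  UNLOCK TABLES;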
The major change is to replace all instances of
table->m_needs_reopen= true;
with
table->mark_table_for_reopen();
The main fix for the problem was to ensure that we mark all
instances of the table in the locked_table_list and that, when we
reopen the tables, we first close all of them before reopening
and locking them.
Other things:
- Don't call thd->locked_tables_list.reopen_tables if there
are no tables marked for reopen. (performance)
* size represents the size of an element in the Unique class
* full_size is used when the Unique class counts the number of
duplicates stored per element. This requires additional space per Unique
element.
mysql_insert() first opens all affected tables (which implicitly
starts a transaction in InnoDB), then the stat tables.
A failure to open a stat table caused open_tables() to abort
the current stmt transaction (trans_rollback_stmt()). So, from the
server's point of view, the following ha_write_row()-s happened outside
of a transaction, and the server didn't bother to commit them.
The server has a mechanism to prevent a transaction being
unexpectedly committed or rolled back in the middle of a statement -
if an operation takes place _in a sub-statement_ it cannot change
the transaction state. Operations on stat tables are exactly that -
they are not allowed to change a transaction state. Put them in
a sub-statement to make sure they don't.
read_statistics_for_tables_if_needed
A regression after 279a907: read_statistics_for_tables_if_needed() was
called even after an open_normal_and_derived_tables() failure.
Fixed by moving read_statistics_for_tables() call to a branch of
get_schema_stat_record() where result of open_normal_and_derived_tables()
is checked.
Removed THD::force_read_stats, added read_statistics_for_tables() instead.
Simplified away statistics_for_command_is_needed().
Problem:
When MySQL executes INSERT ... ON DUPLICATE KEY UPDATE, the storage engine
checks if the inserted row would generate a duplicate key error. If yes,
it returns the existing row to MySQL, and MySQL updates it and sends it
back to the storage engine. When the table has more than one unique or
primary key, this statement is sensitive to the order in which the
storage engine checks the keys. Depending on this order, the storage
engine may return different rows to MySQL, and hence MySQL can update
different rows. The order in which the storage engine checks keys is not
deterministic. For example, InnoDB checks keys in an order that depends
on the order in which the indexes were added to the table: the first
added index is checked first. So if master and slave have added their
indexes in different orders, the slave may go out of sync.
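A hedged illustration (table and column names made up) with two unique
keys, where the updated row depends on which key the engine checks first:

  CREATE TABLE t1 (a INT, b INT, c INT, UNIQUE KEY(a), UNIQUE KEY(b));
  INSERT INTO t1 VALUES (1, 2, 10), (2, 1, 20);
  -- conflicts with row (1,2,10) on key a and with row (2,1,20) on key b;
  -- which of the two rows gets updated depends on the key check order
  INSERT INTO t1 VALUES (1, 1, 30) ON DUPLICATE KEY UPDATE c = c + 1;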
Solution:
Make INSERT ... ON DUPLICATE KEY UPDATE unsafe when using statement or
mixed binlog format and the table has more than one unique key.
There are two exceptions:
1. An auto-increment key is not counted, because InnoDB will take a gap
lock for the failed insert, so a concurrent insert will get the next
auto-increment value. But if the user supplies the auto-increment value
explicitly, it can be unsafe.
2. Only unique keys for which the insertion is actually performed are
counted.
This patch also addresses bug #72921.
cmake -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Debug
Maintainer mode makes all warnings errors. This patch fixes the warnings,
mostly about the deprecated `register` keyword.
Too many warnings came from Mroonga, and I gave up on it.
The plugin was fixed to not lock LOCK_operations when not active.
The server was fixed to take LOCK_plugin less often: once per
thread, and after that only if a plugin was installed/uninstalled.
truncating a temporary table
TRUNCATE expects only one TABLE instance (which is used by TRUNCATE
itself) to be open. However, this requirement wasn't enforced after
"MDEV-5535: Cannot reopen temporary table".
Fixed by closing unused table instances before performing TRUNCATE.
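A hedged sketch of a way to get a second open instance of a temporary
table (table names made up; the actual reproducer may differ):

  CREATE TEMPORARY TABLE t1 (a INT) ENGINE=InnoDB;
  LOCK TABLES t1 WRITE, t1 AS t2 WRITE;  -- opens a second instance of t1
  TRUNCATE TABLE t1;                     -- used to find an extra open instance
  UNLOCK TABLES;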
The command SHOW INDEXES ignored the setting of the system variable
use_stat_tables to the value 'preferably' and showed statistical
data received from the engine. Similarly, queries over the table
STATISTICS from INFORMATION_SCHEMA ignored this setting. It happened
because the function fill_schema_table_by_open() did not read any data
from statistical tables.
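A hedged illustration (table name made up):

  SET SESSION use_stat_tables = 'preferably';
  CREATE TABLE t1 (a INT, KEY(a));
  INSERT INTO t1 VALUES (1),(2),(3);
  ANALYZE TABLE t1 PERSISTENT FOR ALL;  -- collect engine-independent statistics
  SHOW INDEX FROM t1;                   -- cardinality should now come from the stat tables
  SELECT * FROM information_schema.STATISTICS WHERE TABLE_NAME = 't1';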
This patch contains a fix for the MDEV-17262/17243 issues and a
new mtr test.
These issues (MDEV-17262/17243) have two causes:
1) After an intermediate commit, a transaction loses its status
of "transaction that is registered in MySQL for the 2pc coordinator"
(in InnoDB), because since version 10.2 the
write_row() function (which is located in ha_innodb.cc) does
not call trx_register_for_2pc(m_prebuilt->trx) during the processing
of split transactions. It is necessary to restore this call inside
write_row() when an intermediate commit was made (for a split
transaction).
Similarly, we need to set the flag of a started transaction
(m_prebuilt->sql_stat_start) after an intermediate commit.
The table->file->extra(HA_EXTRA_FAKE_START_STMT) called from the
wsrep_load_data_split() function (which is located in sql_load.cc)
also does this, but it happens too late. As a result, the call
to the wsrep_append_keys() function from the InnoDB engine may be
lost, or the function may be called with an invalid transaction
identifier.
2) If a transaction with a LOAD DATA statement is divided into
logical mini-transactions (of 10K rows each) and the binlog is rotated,
then in rare cases, due to the wsrep handler re-registration at the
boundary of the split, the last portion of data may be lost. Since
the splitting of LOAD DATA into mini-transactions is purely technical,
I believe that we should not allow these mini-transactions to fall
into separate binlogs. Therefore, it is necessary to prohibit the
rotation of the binlog in the middle of processing a LOAD DATA statement.
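A hedged sketch of the affected scenario (file path and table name made
up; wsrep_load_data_splitting enables the 10K-row splitting described
above):

  SET SESSION wsrep_load_data_splitting = ON;  -- commit every 10K rows
  LOAD DATA INFILE '/tmp/big_file.csv' INTO TABLE t1
    FIELDS TERMINATED BY ',';
  -- before the fix, a binlog rotation in the middle of this LOAD DATA
  -- could lose the last portion of rows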
https://jira.mariadb.org/browse/MDEV-17262 and
https://jira.mariadb.org/browse/MDEV-17243
Make mysqltest use --ps-protocol more:
use prepared statements for everything that the server supports,
with the exception of CALL (for now).
Fix discovered test failures and bugs.
tests:
* PROCESSLIST shows Execute state, not Query
* SHOW STATUS increments status variables more than in text protocol
* multi-statements should be avoided (see tests with a wrong delimiter)
* performance_schema events have different names in --ps-protocol
* --enable_prepare_warnings
mysqltest.cc:
* make sure run_query_stmt() doesn't crash if there's
no active connection (in wait_until_connected_again.inc)
* prepare all statements that server supports
protocol.h
* Protocol_discard::send_result_set_metadata() should not send
anything to the client.
sql_acl.cc:
* extract the functionality of getting the user for SHOW GRANTS
from check_show_access(), so that mysql_test_show_grants() could
generate the correct column names in the prepare step
sql_class.cc:
* result->prepare() can fail, don't ignore its return value
* use correct number of decimals for EXPLAIN columns
sql_parse.cc:
* discard profiling for SHOW PROFILE. In text protocol it's done in
prepare_schema_table(), but in --ps it is called on prepare only,
so nothing was discarding profiling during execute.
* move the permission checking code for SHOW CREATE VIEW to
mysqld_show_create_get_fields(), so that it would be called during
prepare step too.
* only set sel_result when it was created here and needs to be
destroyed in the same block. Avoid destroying lex->result.
* use the correct number of tables in check_show_access(). Saying
"as many as possible" doesn't work when first_not_own_table isn't
set yet.
sql_prepare.cc:
* use correct user name for SHOW GRANTS columns
* don't ignore verbose flag for SHOW SLAVE STATUS
* support preparing REVOKE ALL and ROLLBACK TO SAVEPOINT
* don't ignore errors from thd->prepare_explain_fields()
* use select_send result for sending ANALYZE and EXPLAIN, but don't
overwrite lex->result, because it might be needed to issue execute-time
errors (select_dumpvar - too many rows)
sql_show.cc:
* check grants for SHOW CREATE VIEW here, not in mysql_execute_command
sql_view.cc:
* use the correct function to check privileges. Old code was doing
check_access() for thd->security_ctx, which is invoker's sctx,
not definer's sctx. Hide various view related errors from the invoker.
sql_yacc.yy:
* initialize lex->select_lex for LOAD, otherwise it'll contain garbage
  data that happens to make tests with views fail in --ps (but not otherwise).
This solves the following issues:
* unlike lex->m_sql_cmd and lex->sql_command, thd->query_plan_flags
is not reset in Prepared_statement::execute, it survives
till the log_slow_statement(), so slow logging behaves correctly in --ps
* using thd->query_plan_flags for both slow_log_filter and
log_slow_admin_statements means the definition of "admin" statements
for the slow log is the same no matter how it is filtered out.
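For instance (a hedged sketch), statements like these are now exercised
through the prepared-statement path:

  PREPARE s1 FROM 'SHOW SLAVE STATUS';
  EXECUTE s1;
  DEALLOCATE PREPARE s1;
  -- REVOKE ALL and ROLLBACK TO SAVEPOINT can now be prepared as well
  PREPARE s2 FROM 'ROLLBACK TO SAVEPOINT sp1';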
If we have a cluster of 2+ nodes which is replicating from an async master,
the binlog_format is set to STATEMENT, and multi-row inserts are executed
on a table with an auto_increment column such that the values are
automatically generated by MySQL, then the server node generates wrong
auto_increment values, which are different from what was generated on
the async master.
In the title of MDEV-9519 it was proposed to ban START SLAVE on a Galera node
if master binlog_format = statement and wsrep_auto_increment_control = 1,
but the problem can be solved without such a restriction.
The causes and fixes:
1. We need to improve the processing of changes to the auto-increment
values after a change of the cluster size.
2. If wsrep_auto_increment_control is switched on while the node is
running, then we should immediately update the auto_increment_increment
and auto_increment_offset global variables, without waiting for the next
invocation of the wsrep_view_handler_cb() callback. In the current version
these variables retain their initial values if wsrep_auto_increment_control
is switched on while the node is running, which leads to inconsistent
results on the different nodes in some scenarios.
3. If wsrep_auto_increment_control is switched off while the node is
running, then we must restore the original values of the
auto_increment_increment and auto_increment_offset global variables, as
they were set by the user. To make this possible, we need to add "shadow
copies" of these variables (which store the latest values set by the user).
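A hedged sketch of the behaviour described in points 2 and 3 (values
made up):

  SET GLOBAL auto_increment_increment = 5;       -- user value, kept in the shadow copy
  SET GLOBAL wsrep_auto_increment_control = ON;  -- node takes over the variables at once
  SHOW GLOBAL VARIABLES LIKE 'auto_increment_%'; -- now reflects the cluster topology
  SET GLOBAL wsrep_auto_increment_control = OFF; -- the user value 5 is restored
  SHOW GLOBAL VARIABLES LIKE 'auto_increment_increment';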
https://jira.mariadb.org/browse/MDEV-9519