If UPDATE/DELETE does not change data it is skipped from
replication. We now force replication of such events when they trigger
partition auto-creation.
For ROLLBACK it is as simple as set OPTION_KEEP_LOG
flag. trans_cannot_safely_rollback() does the rest.
For UPDATE/DELETE .. LIMIT 0 we make additional binlog_query() calls
at the early points of return.
As a safety measure we also convert row format into statement if it is
needed. The condition is decided by
binlog_need_stmt_format(). Basically if there are some row events in
cache we don't need that: table open of row event will trigger
auto-creation anyway.
Multi-update/delete works via mysql_select(). There is no early points
of return, so binlogging is always checked by
send_eof()/abort_resultset(). But we must comply with the above
measure of converting into statement.
:: Syntax change ::
Keyword AUTO enables history partition auto-creation.
Examples:
CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO;
CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME INTERVAL 1 MONTH
STARTS '2021-01-01 00:00:00' AUTO PARTITIONS 12;
CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME LIMIT 1000 AUTO;
Or with explicit partitions:
CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO
(PARTITION p0 HISTORY, PARTITION pn CURRENT);
To disable or enable auto-creation one can use ALTER TABLE by adding
or removing AUTO from partitioning specification:
CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO;
# Disables auto-creation:
ALTER TABLE t1 PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR;
# Enables auto-creation:
ALTER TABLE t1 PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO;
If the rest of partitioning specification is identical to CREATE TABLE
no repartitioning will be done (for details see MDEV-27328).
:: Description ::
Before executing history-generating DML command (see the list of commands below)
add N history partitions, so that N would be sufficient for potentially
generated history. N > 1 may be required when history partitions are switched
by INTERVAL and current_timestamp is N times further than the interval
boundary of the last history partition.
If the last history partition equals or exceeds LIMIT records then new history
partition is created and selected as the working partition. According to
MDEV-28411 partitions cannot be switched (or created) while the command is
running. Thus LIMIT does not carry strict limitation and the history partition
size must be planned as LIMIT value plus average number of history one DML
command can generate.
Auto-creation is implemented by synchronous fast_alter_partition_table() call
from the thread of the executed DML command before the command itself is run
(by the fallback and retry mechanism similar to Discovery feature,
see Open_table_context).
The name for newly added partitions are generated like default partition names
with extension of MDEV-22155 (which avoids name clashes by extending assignment
counter to next free-enough gap).
These DML commands can trigger auto-creation:
DELETE (including multitable DELETE, excluding DELETE HISTORY)
UPDATE (including multitable UPDATE)
REPLACE (including REPLACE .. SELECT)
INSERT .. ON DUPLICATE KEY UPDATE (including INSERT .. SELECT .. ODKU)
LOAD DATA .. REPLACE
:: Bug fixes ::
MDEV-23642 Locking timeout caused by auto-creation affects original DML
The reasons for this are:
- Do not disrupt main business process (the history is auxiliary service);
- Consequences are non-fatal (history is not lost, but comes into wrong
partition; fixed by partitioning rebuild);
- There is more freedom for application to fail in this case or not: it may
read warning info and find corresponding error number.
- While non-failing command is easy to handle by an application and fail it,
the opposite is hard to handle: there is no automatic actions to fix
failed command and retry, DBA intervention is required and until then
application is non-functioning.
MDEV-23639 Auto-create does not work under LOCK TABLES or inside triggers
Don't do tdc_remove_table() for OT_ADD_HISTORY_PARTITION because it is
not possible in locked tables mode.
LTM_LOCK_TABLES mode (and LTM_PRELOCKED_UNDER_LOCK_TABLES) works out
of the box as fast_alter_partition_table() can reopen tables via
locked_tables_list.
In LTM_PRELOCKED we reopen and relock table manually.
:: More fixes ::
* some_table_marked_for_reopen flag fix
some_table_marked_for_reopen affets only reopen of
m_locked_tables. I.e. Locked_tables_list::reopen_tables() reopens only
tables from m_locked_tables.
* Unused can_recover_from_failed_open() condition
Is recover_from_failed_open() can be really used after
open_and_process_routine()?
:: Reviewed by ::
Sergei Golubchik <serg@mariadb.org>
Moved LIMIT warning from vers_set_hist_part() to new call
vers_check_limit() at table unlock phase. At that point
read_partitions bitmap is already pruned by DML code (see
prune_partitions(), find_used_partitions()) so we have to set
corresponding bits for working history partition.
Also we don't do my_error(ME_WARNING|ME_ERROR_LOG), because at that
point it doesn't update warnings number, so command reports 0 warnings
(but warning list is still updated). Instead we do
push_warning_printf() and sql_print_warning() separately.
Under LOCK TABLES external_lock(F_UNLCK) is not executed. There is
start_stmt(), but no corresponding "stop_stmt()". So for that mode we
call vers_check_limit() directly from close_thread_tables().
Test result has been changed according to new LIMIT and warning
printing algorithm. For convenience all LIMIT warnings are marked with
"You see warning above ^".
TODO MDEV-20345 fixed. Now vers_history_generating() contains
fine-grained list of DML-commands that can generate history (and TODO
mechanism worked well).
- Describe the lifetime of EXPLAIN data structures in
sql_explain.h:ExplainDataStructureLifetime.
- Make Item_field::set_field() call set_refers_to_temp_table()
when it refers to a temp. table.
- Introduce QT_DONT_ACCESS_TMP_TABLES flag for Item::print.
It directs Item_field::print to not try access its the
temp table.
- Introduce Explain_query::notify_tables_are_closed()
and call it right before the query closes its tables.
- Make Explain data stuctures' print_explain_json() methods
accept "no_tmp_tbl" parameter which means pass
QT_DONT_ACCESS_TMP_TABLES when printing items.
- Make Show_explain_request::call_in_target_thread() not call
set_current_thd(). This wasn't needed as the code inside
lex->print_explain() uses output->thd anyway. output->thd
refers to the SHOW command's THD object.
EXPLAIN FOR CONNECTION is a MySQL-compatible syntax for SHOW EXPLAIN.
This commit also adds support for FORMAT=JSON to SHOW EXPLAIN,
so the possible options to get JSON-formatted output are:
- SHOW EXPLAIN FORMAT=JSON FOR $con
- EXPLAIN FORMAT=JSON FOR CONNECTION $con
column generated using date_format() and if()
vcol_info->expr is allocated on expr_arena at parsing stage. Since
expr item is allocated on expr_arena all its containee items must be
allocated on expr_arena too. Otherwise fix_session_expr() will
encounter prematurely freed item.
When table is reopened from cache vcol_info contains stale
expression. We refresh expression via TABLE::vcol_fix_exprs() but
first we must prepare a proper context (Vcol_expr_context) which meets
some requirements:
1. As noted above expr update must be done on expr_arena as there may
be new items created. It was a bug in fix_session_expr_for_read() and
was just not reproduced because of no second refix. Now refix is done
for more cases so it does reproduce. Tests affected: vcol.binlog
2. Also name resolution context must be narrowed to the single table.
Tested by: vcol.update main.default vcol.vcol_syntax gcol.gcol_bugfixes
3. sql_mode must be clean and not fail expr update.
sql_mode such as MODE_NO_BACKSLASH_ESCAPES, MODE_NO_ZERO_IN_DATE, etc
must not affect vcol expression update. If the table was created
successfully any further evaluation must not fail. Tests affected:
main.func_like
Reviewed by: Sergei Golubchik <serg@mariadb.org>
1. moved fix_vcol_exprs() call to open_table()
mysql_alter_table() doesn't do lock_tables() so it cannot win from
fix_vcol_exprs() from there. Tests affected: main.default_session
2. Vanilla cleanups and comments.
Re-execution of a query containing subquery in the FROM clause results
in assert failure in case the query is run as part of a stored routine or
as a prepared statement AND derived table merge optimization is off.
As an example, the following test case
CREATE TABLE t1 (a INT) ;
CREATE PROCEDURE sp() SELECT * FROM (SELECT a FROM t1) tb;
CALL sp();
SET optimizer_switch='derived_merge=off';
CALL sp();
results in assert failure on the second invocation of the 'sp' stored routine.
The reason for assertion failure is that the expression
derived->is_excluded()
returns the value true where the value false expected.
The method is_excluded() returns the value true for a derived table
that has been merged to a parent select. Such transformation happens as part
of Derived Table Merge Optimization that is performed on first invocation of
a stored routine or a prepared statement containing a query with subquery
in the FROM clause of the main SELECT.
When the same routine or prepared statement is run the second time and
Derived Table Merge Optimization is OFF the MariaDB server tries to materialize
a derived table specified by the subquery that fails since this subquery
has already been merged to the top-most SELECT. This transformation is permanent
and can't be reverted. That is the reason why the assert
DBUG_ASSERT(!derived->is_excluded());
fails inside the function TABLE_LIST::set_check_materialized().
Similar behaviour can be observed in case a stored routine or prepared statement
containing a SELECT statement with subquery in the FROM clause, first is run
with the optimizer_switch option set to derived_merge=off and re-run after this
option has been switched to derived_merge=on. In this case a derived table for
subquery is materialized on the first execution and marked as merged derived
table on the second execution that results in error with misleading error
message:
MariaDB [test]> CALL sp1();
ERROR 1030 (HY000): Got error 1 "Operation not permitted" from storage engine MEMORY
To fix the issue, a derived table that has been already optimized shouldn't be
re-marked for one more round of optimization.
One significant consequence following from suggested change is that the data
member TABLE_LIST::derived_type is not updated once the table optimization
has been done. This fact should be taken into account when Prepared Statement
being handled since once a table listed in a query has been optimized on
execution of the statement PREPARE FROM it won't be touched anymore on handling
the statement EXECUTE.
One side effect caused by this change could be observed for the following
test case:
CREATE TABLE t1 (s1 INT);
CREATE VIEW v1 AS
SELECT s1,s2 FROM (SELECT s1 as s2 FROM t1 WHERE s1 <100) x, t1 WHERE t1.s1=x.s2;
INSERT INTO v1 (s1) VALUES (-300);
PREPARE stmt FROM "INSERT INTO v1 (s1) VALUES (-300)";
EXECUTE stmt;
Execution of the above EXECUTE statement results in issuing the error
ER_COLUMNACCESS_DENIED_ERROR since table_ref->is_merged_derived() is false
and check_column_grant_in_table_ref() called for a temporary table that
shouldn't be. To fix this issue the function find_field_in_tables has been
modified in such a way that the function check_column_grant_in_table_ref()
is not called for a temporary table.
This patch reverts the fixes of the bugs MDEV-24454 and MDEV-25631 from
the commit 3690c549c6.
It leaves the changes in plugin/feedback/feedback.cc and corresponding
test files introduced in this commit intact.
Proper fixes for the bug MDEV-24454 and MDEV-25631 will follow immediately.
Diagnostics_area::set_error_status (interrupted ALTER TABLE under LOCK)
Analysis: KILL_QUERY is not ignored when local memory used exceeds maximum
session memory. Hence the query proceeds, OK is sent and we end up
reopening tables that are marked for reopen. During this, kill status is
eventually checked and assertion failure happens during trying to send error
message because OK has already been sent.
Fix: Ok is already sent so statement has already executed. It is too
late to give error. So ignore kill.
Dead code cleanup:
part_info->num_parts usage was wrong and working incorrectly in
mysql_drop_partitions() because num_parts is already updated in
prep_alter_part_table(). We don't have to update part_info->partitions
because part_info is destroyed at alter_partition_lock_handling().
Cleanups:
- DBUG_EVALUATE_IF() macro replaced by shorter form DBUG_IF();
- Typo in ER_KEY_COLUMN_DOES_NOT_EXITS.
Refactorings:
- Splitted write_log_replace_delete_frm() into write_log_delete_frm()
and write_log_replace_frm();
- partition_info via DDL_LOG_STATE;
- set_part_info_exec_log_entry() removed.
DBUG_EVALUATE removed
DBUG_EVALUTATE was only added for consistency together with
DBUG_EVALUATE_IF. It is not used anywhere in the code.
DBUG_SUICIDE() fix on release build
On release DBUG_SUICIDE() was statement. It was wrong as
DBUG_SUICIDE() is used in expression context.
https://jira.mariadb.org/browse/MDEV-26221
my_sys DYNAMIC_ARRAY and DYNAMIC_STRING inconsistancy
The DYNAMIC_STRING uses size_t for sizes, but DYNAMIC_ARRAY used uint.
This patch adjusts DYNAMIC_ARRAY to use size_t like DYNAMIC_STRING.
As the MY_DIR member number_of_files is copied from a DYNAMIC_ARRAY,
this is changed to be size_t.
As MY_TMPDIR members 'cur' and 'max' are copied from a DYNAMIC_ARRAY,
these are also changed to be size_t.
The lists of plugins and stored procedures use DYNAMIC_ARRAY,
but their APIs assume a size of 'uint'; these are unchanged.
There are two fill_record() functions (lines 8343 and 8618). First one
is used when there are some explicit values, the second one is used
for all implicit values. First one does update_default_fields(), the
second one did not. Added update_default_fields() call to the implicit
version of fill_record().
Use in_sum_func (and so nest_level) only in LEX to which SELECT lex belong to
Reduce usage of current_select (because it does not always point on the correct
SELECT_LEX, for example with prepare.
Change context for all classes inherited from Item_ident (was only for Item_field) in case of pushing down it to HAVING.
Now name resolution context have to have SELECT_LEX reference if the context is present.
Fixed feedback plugin stack usage.