Problem:
========
Replication can break while applying a query log event if its
respective command errors on the primary, but is ignored by the
replication filter within Grant_tables on the replica. The bug
reported by MDEV-28530 shows this with REVOKE ALL PRIVILEGES using a
non-existent user. The primary will binlog the REVOKE command with
an error code, and the replica will think the command executed with
success because the replication filter will ignore the command while
accessing the Grant_tables classes. When the replica performs an
error check, it sees the difference between the error codes, and
replication breaks.
Solution:
========
If the replication filter check done by Grant_tables logic ignores
the tables, reset thd->slave_expected_error to 0 so that
Query_log_event::do_apply_event() can be made aware that the
underlying query was ignored when it compares errors.
Note that this bug also effects DROP USER if not all users exist
in the provided list, and the patch fixes and tests this case.
Reviewed By:
============
andrei.elkin@mariadb.com
Atomic CREATE OR REPLACE allows to keep an old table intact if the
command fails or during the crash. That is done through creating
a table with a temporary name and filling it with the data
(for CREATE OR REPLACE .. SELECT), then renaming the original table
to another temporary (backup) name and renaming the replacement table
to original table. The backup table is kept until the last chance of
failure and if that happens, the replacement table is thrown off and
backup recovered. When the command is complete and logged the backup
table is deleted.
Atomic replace algorithm
Two DDL chains are used for CREATE OR REPLACE:
ddl_log_state_create (C) and ddl_log_state_rm (D).
1. (C) Log CREATE_TABLE_ACTION of TMP table (drops TMP table);
2. Create new table as TMP;
3. Do everything with TMP (like insert data);
finalize_atomic_replace():
4. Link chains: (D) is executed only if (C) is closed;
5. (D) Log DROP_ACTION of BACKUP;
6. (C) Log RENAME_TABLE_ACTION from ORIG to BACKUP (replays BACKUP -> ORIG);
7. Rename ORIG to BACKUP;
8. (C) Log CREATE_TABLE_ACTION of ORIG (drops ORIG);
9. Rename TMP to ORIG;
finalize_ddl() in case of success:
10. Close (C);
11. Replay (D): BACKUP is dropped.
finalize_ddl() in case of error:
10. Close (D);
11. Replay (C):
1) ORIG is dropped (only after finalize_atomic_replace());
2) BACKUP renamed to ORIG (only after finalize_atomic_replace());
3) drop TMP.
If crash happens (C) or (D) is replayed in reverse order. (C) is
replayed if crash happens before it is closed, otherwise (D) is
replayed.
Temporary table for CREATE OR REPLACE
Before dropping "old" table, CREATE OR REPLACE creates "tmp" table.
ddl_log_state_create holds the drop of the "tmp" table. When
everything is OK (data is inserted, "tmp" is ready) ddl_log_state_rm
is written to replace "old" with "tmp". Until ddl_log_state_create
is closed ddl_log_state_rm is not executed.
After the binlogging is done ddl_log_state_create is closed. At that
point ddl_log_state_rm is executed and "tmp" is replaced with
"old". That is: final rename is done by the DDL log.
With that important role of DDL log for CREATE OR REPLACE operation
replay of ddl_log_state_rm must fail at the first hit error and
print the error message if possible. F.ex. foreign key error is
discovered at this phase: InnoDB rejects to drop the "old" table and
returns corresponding foreign key error code.
Additional notes
- CREATE TABLE without REPLACE is not affected by this commit.
- Engines having HTON_EXPENSIVE_RENAME flag set are not affected by
this commit.
- CREATE TABLE .. SELECT XID usage is fixed and now there is no need
to log DROP TABLE via DDL_CREATE_TABLE_PHASE_LOG (see comments in
do_postlock()). XID is now correctly updated so it disables
DDL_LOG_DROP_TABLE_ACTION. Note that binary log is flushed at the
final stage when the table is ready. So if we have XID in the
binary log we don't need to drop the table.
- Three variations of CREATE OR REPLACE handled:
1. CREATE OR REPLACE TABLE t1 (..);
2. CREATE OR REPLACE TABLE t1 LIKE t2;
3. CREATE OR REPLACE TABLE t1 SELECT ..;
- Test case uses 6 combinations for engines (aria, aria_notrans,
myisam, ib, lock_tables, expensive_rename) and 2 combinations for
binlog types (row, stmt). Combinations help to check differences
between the results. Error failures are tested for the above three
variations.
- expensive_rename tests CREATE OR REPLACE without atomic
replace. The effect should be the same as with the old behaviour
before this commit.
- Triggers mechanism is unaffected by this change. This is tested in
create_replace.test.
- LOCK TABLES is affected. Lock restoration must be done after "rm"
chain is replayed.
- Moved ddl_log_complete() from send_eof() to finalize_ddl(). This
checkpoint was not executed before for normal CREATE TABLE but is
executed now.
- CREATE TABLE will now rollback also if writing to the binary
logging failed. See rpl_gtid_strict.test
Rename and drop via DDL log
We replay ddl_log_state_rm to drop the old table and rename the
temporary table. In that case we must throw the correct error
message if ddl_log_revert() fails (f.ex. on FK error).
If table is deleted earlier and not via DDL log and the crash
happened, the create chain is not closed. Linked drop chain is not
executed and the new table is not installed. But the old table is
already deleted.
ddl_log.cc changes
Now we can place action before DDL_LOG_DROP_INIT_ACTION and it will
be replayed after DDL_LOG_DROP_TABLE_ACTION.
report_error parameter for ddl_log_revert() allows to fail at first
error and print the error message if possible.
ddl_log_execute_action() now can print error message.
Since we now can handle errors from ddl_log_execute_action() (in
case of non-recovery execution) unconditional setting "error= TRUE"
is wrong (it was wrong anyway because it was overwritten at the end
of the function).
On XID usage
Like with all other atomic DDL operations XID is used to avoid
inconsistency between master and slave in the case of a crash after
binary log is written and before ddl_log_state_create is closed. On
recovery XIDs are taken from binary log and corresponding DDL log
events get disabled. That is done by
ddl_log_close_binlogged_events().
On linking two chains together
Chains are executed in the ascending order of entry_pos of execute
entries. But entry_pos assignment order is undefined: it may assign
bigger number for the first chain and then smaller number for the
second chain. So the execution order in that case will be reverse:
second chain will be executed first.
To avoid that we link one chain to another. While the base chain
(ddl_log_state_create) is active the secondary chain
(ddl_log_state_rm) is not executed. That is: only one chain can be
executed in two linked chains.
The interface ddl_log_link_chains() was done in "MDEV-22166
ddl_log_write_execute_entry() extension".
More on CREATE OR REPLACE .. SELECT
We use create_and_open_tmp_table() like in ALTER TABLE to create
temporary TABLE object (tmp_table is (NON_)TRANSACTIONAL_TMP_TABLE).
After we created such TABLE object we use create_info->tmp_table()
instead of table->s->tmp_table when we need to check for
parser-requested tmp-table.
External locking is required for temporary table created by
create_and_open_tmp_table(). F.ex. that disables logging for Aria
transactional tables and without that (when no mysql_lock_tables()
is done) it cannot work correctly.
For making external lock the patch requires Aria table to work in
non-transactional mode. That is usually done by
ha_enable_transaction(false). But we cannot disable transaction
completely because: 1. binlog rollback removes pending row events
(binlog_remove_pending_rows_event()). The row events are added
during CREATE .. SELECT data insertion phase. 2. replication slave
highly depends on transaction and cannot work without it.
So we put temporary Aria table into non-transactional mode with
"thd->transaction->on hack". See comment for on_save variable.
Note that Aria table has internal_table mode. But we cannot use it
because:
if (!internal_table)
{
mysql_mutex_lock(&THR_LOCK_myisam);
old_info= test_if_reopen(name_buff);
}
For internal_table test_if_reopen() is not called and we get a new
MARIA_SHARE for each file handler. In that case duplicate errors are
missed because insert and lookup in CREATE .. SELECT is done via two
different handlers (see create_lookup_handler()).
For temporary table before dropping TABLE_SHARE by
drop_temporary_table() we must do ha_reset(). ha_reset() releases
storage share. Without that the share is kept and the second CREATE
OR REPLACE .. SELECT fails with:
HA_ERR_TABLE_EXIST (156): MyISAM table '#sql-create-b5377-4-t2' is
in use (most likely by a MERGE table). Try FLUSH TABLES.
HA_EXTRA_PREPARE_FOR_DROP also removes MYISAM_SHARE, but that is
not needed as ha_reset() does the job.
ha_reset() is usually done by
mark_tmp_table_as_free_for_reuse(). But we don't need that mechanism
for our temporary table.
Atomic_info in HA_CREATE_INFO
Many functions in CREATE TABLE pass the same parameters. These
parameters are part of table creation info and should be in
HA_CREATE_INFO (or whatever). Passing parameters via single
structure is much easier for adding new data and
refactoring.
InnoDB changes (revised by Marko Mäkelä)
row_rename_table_for_mysql(): Specify the treatment of FOREIGN KEY
constraints in a 4-valued enum parameter. In cases where FOREIGN KEY
constraints cannot exist (partitioned tables, or internal tables of
FULLTEXT INDEX), we can use the mode RENAME_IGNORE_FK.
The mod RENAME_REBUILD is for any DDL operation that rebuilds the
table inside InnoDB, such as TRUNCATE and native ALTER TABLE
(or OPTIMIZE TABLE). The mode RENAME_ALTER_COPY is used solely
during non-native ALTER TABLE in ha_innobase::rename_table().
Normal ha_innobase::rename_table() will use the mode RENAME_FK.
CREATE OR REPLACE will rename the old table (if one exists) along
with its FOREIGN KEY constraints into a temporary name. The replacement
table will be initially created with another temporary name.
Unlike in ALTER TABLE, all FOREIGN KEY constraints must be renamed
and not inherited as part of these operations, using the mode RENAME_FK.
dict_get_referenced_table(): Let the callers convert names when needed.
create_table_info_t::create_foreign_keys(): CREATE OR REPLACE creates
the replacement table with a temporary name table, so for
self-references foreign->referenced_table will be a table with
temporary name and charset conversion must be skipped for it.
Reviewed by:
Michael Widenius <monty@mariadb.org>
create_table duplicates select_insert::table_list. Since select_create
inherits select_insert and the functional role of the members is the
same we should remove one to eliminate the need of keeping them in
sync.
TABLEOP_HOOKS is a strange interface: proxy interface calls virtual
interface. Since it is used only for select_create::prepare() such
complexity is overwhelming.
Making changes to wsrep_mysqld.h causes large parts of server code to
be recompiled. The reason is that wsrep_mysqld.h is included by
sql_class.h, even tough very little of wsrep_mysqld.h is needed in
sql_class.h. This commit introduces a new header file, wsrep_on.h,
which is meant to be included from sql_class.h, and contains only
macros and variable declarations used to determine whether wsrep is
enabled.
Also, header wsrep.h should only contain definitions that are also
used outside of sql/. Therefore, move WSREP_TO_ISOLATION* and
WSREP_SYNC_WAIT macros to wsrep_mysqld.h.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
TIMESTAMP columns were compared as strings in ALL/ANY comparison,
which did not work well near DST time change.
Changing ALL/ANY comparison to use "Native" representation to compare
TIMESTAMP columns, like simple comparison does.
- Added one neutral and 22 tailored (language specific) collations based on
Unicode Collation Algorithm version 14.0.0.
Collations were added for Unicode character sets
utf8mb3, utf8mb4, ucs2, utf16, utf32.
Every tailoring was added with four accent and case
sensitivity flag combinations, e.g:
* utf8mb4_uca1400_swedish_as_cs
* utf8mb4_uca1400_swedish_as_ci
* utf8mb4_uca1400_swedish_ai_cs
* utf8mb4_uca1400_swedish_ai_ci
and their _nopad_ variants:
* utf8mb4_uca1400_swedish_nopad_as_cs
* utf8mb4_uca1400_swedish_nopad_as_ci
* utf8mb4_uca1400_swedish_nopad_ai_cs
* utf8mb4_uca1400_swedish_nopad_ai_ci
- Introducing a conception of contextually typed named collations:
CREATE DATABASE db1 CHARACTER SET utf8mb4;
CREATE TABLE db1.t1 (a CHAR(10) COLLATE uca1400_as_ci);
The idea is that there is no a need to specify the character set prefix
in the new collation names. It's enough to type just the suffix
"uca1400_as_ci". The character set is taken from the context.
In the above example script the context character set is utf8mb4.
So the CREATE TABLE will make a column with the collation
utf8mb4_uca1400_as_ci.
Short collations names can be used in any parts of the SQL syntax
where the COLLATE clause is understood.
- New collations are displayed only one time
(without character set combinations) by these statements:
SELECT * FROM INFORMATION_SCHEMA.COLLATIONS;
SHOW COLLATION;
For example, all these collations:
- utf8mb3_uca1400_swedish_as_ci
- utf8mb4_uca1400_swedish_as_ci
- ucs2_uca1400_swedish_as_ci
- utf16_uca1400_swedish_as_ci
- utf32_uca1400_swedish_as_ci
have just one entry in INFORMATION_SCHEMA.COLLATIONS and SHOW COLLATION,
with COLLATION_NAME equal to "uca1400_swedish_as_ci", which is the suffix
without the character set name:
SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.COLLATIONS
WHERE COLLATION_NAME LIKE '%uca1400_swedish_as_ci';
+-----------------------+
| COLLATION_NAME |
+-----------------------+
| uca1400_swedish_as_ci |
+-----------------------+
Note, the behaviour of old collations did not change.
Non-unicode collations (e.g. latin1_swedish_ci) and
old UCA-4.0.0 collations (e.g. utf8mb4_unicode_ci)
are still displayed with the character set prefix, as before.
- The structure of the table INFORMATION_SCHEMA.COLLATIONS was changed.
The NOT NULL constraint was removed from these columns:
- CHARACTER_SET_NAME
- ID
- IS_DEFAULT
and from the corresponding columns in SHOW COLLATION.
For example:
SELECT COLLATION_NAME, CHARACTER_SET_NAME, ID, IS_DEFAULT
FROM INFORMATION_SCHEMA.COLLATIONS
WHERE COLLATION_NAME LIKE '%uca1400_swedish_as_ci';
+-----------------------+--------------------+------+------------+
| COLLATION_NAME | CHARACTER_SET_NAME | ID | IS_DEFAULT |
+-----------------------+--------------------+------+------------+
| uca1400_swedish_as_ci | NULL | NULL | NULL |
+-----------------------+--------------------+------+------------+
The NULL value in these columns now means that the collation
is applicable to multiple character sets.
The behavioir of old collations did not change.
Make sure your client programs can handle NULL values in these columns.
- The structure of the table
INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY was changed.
Three new NOT NULL columns were added:
- FULL_COLLATION_NAME
- ID
- IS_DEFAULT
New collations have multiple entries in COLLATION_CHARACTER_SET_APPLICABILITY.
The column COLLATION_NAME contains the collation name without the character
set prefix. The column FULL_COLLATION_NAME contains the collation name with
the character set prefix.
Old collations have full collation name in both FULL_COLLATION_NAME and
COLLATION_NAME.
SELECT COLLATION_NAME, FULL_COLLATION_NAME, CHARACTER_SET_NAME, ID, IS_DEFAULT
FROM INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY
WHERE FULL_COLLATION_NAME RLIKE '^(utf8mb4|latin1).*swedish.*ci$';
+-----------------------------+-------------------------------------+--------------------+------+------------+
| COLLATION_NAME | FULL_COLLATION_NAME | CHARACTER_SET_NAME | ID | IS_DEFAULT |
+-----------------------------+-------------------------------------+--------------------+------+------------+
| latin1_swedish_ci | latin1_swedish_ci | latin1 | 8 | Yes |
| latin1_swedish_nopad_ci | latin1_swedish_nopad_ci | latin1 | 1032 | |
| utf8mb4_swedish_ci | utf8mb4_swedish_ci | utf8mb4 | 232 | |
| uca1400_swedish_ai_ci | utf8mb4_uca1400_swedish_ai_ci | utf8mb4 | 2368 | |
| uca1400_swedish_as_ci | utf8mb4_uca1400_swedish_as_ci | utf8mb4 | 2370 | |
| uca1400_swedish_nopad_ai_ci | utf8mb4_uca1400_swedish_nopad_ai_ci | utf8mb4 | 2372 | |
| uca1400_swedish_nopad_as_ci | utf8mb4_uca1400_swedish_nopad_as_ci | utf8mb4 | 2374 | |
+-----------------------------+-------------------------------------+--------------------+------+------------+
- Other INFORMATION_SCHEMA queries:
SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.COLUMNS;
SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.PARAMETERS;
SELECT TABLE_COLLATION FROM INFORMATION_SCHEMA.TABLES;
SELECT DEFAULT_COLLATION_NAME FROM INFORMATION_SCHEMA.SCHEMATA;
SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.ROUTINES;
SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.EVENTS;
SELECT DATABASE_COLLATION FROM INFORMATION_SCHEMA.EVENTS;
SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.ROUTINES;
SELECT DATABASE_COLLATION FROM INFORMATION_SCHEMA.ROUTINES;
SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.TRIGGERS;
SELECT DATABASE_COLLATION FROM INFORMATION_SCHEMA.TRIGGERS;
SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.VIEWS;
display full collation names, including character sets prefix,
for all collations, including new collations.
Corresponding SHOW commands also display full collation names
in collation related columns:
SHOW CREATE TABLE t1;
SHOW CREATE DATABASE db1;
SHOW TABLE STATUS;
SHOW CREATE FUNCTION f1;
SHOW CREATE PROCEDURE p1;
SHOW CREATE EVENT ev1;
SHOW CREATE TRIGGER tr1;
SHOW CREATE VIEW;
These INFORMATION_SCHEMA queries and SHOW statements may change in
the future, to display show collation names.
When creating a recursive CTE, the column types are taken from the
non recursive part of the CTE (this is according to the SQL standard).
This patch adds code to abort the CTE if the calculated values in the
recursive part does not fit in the fields in the created temporary table.
The new code only affects recursive CTE, so it should not cause any notable
problems for old applications.
Other things:
- Fixed that we get correct row numbers for warnings generated with
WITH RECURSIVE
Reviewer: Alexander Barkov <bar@mariadb.com>
New Feature:
============
This patch adds a new system variable, @@slave_max_statement_time,
which limits the execution time of s slave’s events that implements
an equivalent to @@max_statement_time for slave applier.
Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
Part #2: Extend heuristic pruning to use multiple tables as the
"Model tables".
Before the patch, heuristic pruning uses only one "Model table":
The table which had the best cost AND record became the "Model table".
After that, if a table's cost and record were both worse than
those of the Model Table, the table would be pruned away.
This didn't work well when the first table (the optimizer sorts them
by record_count) had low record_count but relatively high cost: nothing
could be pruned afterwards.
The patch adds the two additional "Model tables": one with the least
cost and the other with the least record_count.
(In both cases, a table can be pruned away if BOTH its cost and
record_count are worse than those of a Model table)
The new pruning is active when the number of tables to consider for
the prefix is higher than @@optimizer_extra_pruning_depth.
One can see the new pruning in the Optimizer Trace as
- "pruned_by_heuristic":"min_record_count", or
- "pruned_by_heuristic":"min_read_time".
Old heuristic pruning shows as "pruned_by_heuristic":1.
MDEV-28073 Slow query performance in MariaDB when using many tables
The faster we can find a good query plan, the more options we have for
finding and pruning (ignoring) bad plans.
This patch adds sorting of plans to best_extension_by_limited_search().
The plans, from best_access_path() are sorted according to the numbers
of found rows. This allows us to faster find 'good tables' and we are
thus able to eliminate 'bad plans' faster.
One side effect of this patch is that if two tables have equal cost,
the table that which was used earlier in the query is preferred.
This allows users to improve plans by reordering eq_ref tables in the
order they would like them to be uses.
Result changes caused by the patch:
- Traces are different as now we print the cost for using tables before
we start considering them in the plan.
- Table order are changed for some plans. In most cases this is because
the plans are equal and tables are in this case sorted according to
their usage in the original query.
- A few plans was changed as the optimizer was able to find a better
plan (that was pruned by the original code).
Other things:
- Added a new statistic variable: "optimizer_join_prefixes_check_calls",
which counts number of calls to best_extension_by_limited_search().
This can be used to check the prune efficiency in greedy_search().
- Added variable "JOIN_TAB::embedded_dependent" to be able to handle
XX IN (SELECT..) in the greedy_optimizer. The idea is that we
should prune a table if any of the tables in embedded_dependent is
not yet read.
- When using many tables in a query, there will be some additional
memory usage as we need to pre-allocate table of
table_count*table_count*sizeof(POSITION) objects (POSITION is 312
bytes for now) to hold the pre-calculated best_access_path()
information. This memory usage is offset by the expected
performance improvement when using many tables in a query.
- Removed the code from an earlier patch to keep the table order in
join->best_ref in the original order. This is not needed anymore as we
are now sorting the tables for each best_extension_by_limited_search()
call.
Also fixes
MDEV-27782 Wrong columns when using table level `CHARACTER SET utf8mb4 COLLATE DEFAULT`
MDEV-28644 Unexpected error on ALTER TABLE t1 CONVERT TO CHARACTER SET utf8mb3, DEFAULT CHARACTER SET utf8mb4
MDEV-21810 MBR: Unexpected "Unsafe statement" warning for unsafe IODKU
MDEV-17614 fixes to replication unsafety for INSERT ON DUP KEY UPDATE
on two or more unique key table left a flaw. The fixes checked the
safety condition per each inserted record with the idea to catch a user-created
value to an autoincrement column and when that succeeds the autoincrement column
would become the source of unsafety too.
It was not expected that after a duplicate error the next record's
write_set may become different and the unsafe decision for that
specific record will be computed to screw the Query's binlogging
state and when @@binlog_format is MIXED nothing gets bin-logged.
This case has been already fixed in 10.5.2 by 91ab42a823 that
relocated/optimized THD::decide_logging_format_low() out of the record insert
loop. The safety decision is computed once and at the right time.
Pertinent parts of the commit are cherry-picked.
Also a spurious warning about unsafety is removed when MIXED
@@binlog_format; original MDEV-17614 test result corrected.
The original test of MDEV-17614 is extended and made more readable.
If UPDATE/DELETE does not change data it is skipped from
replication. We now force replication of such events when they trigger
partition auto-creation.
For ROLLBACK it is as simple as set OPTION_KEEP_LOG
flag. trans_cannot_safely_rollback() does the rest.
For UPDATE/DELETE .. LIMIT 0 we make additional binlog_query() calls
at the early points of return.
As a safety measure we also convert row format into statement if it is
needed. The condition is decided by
binlog_need_stmt_format(). Basically if there are some row events in
cache we don't need that: table open of row event will trigger
auto-creation anyway.
Multi-update/delete works via mysql_select(). There is no early points
of return, so binlogging is always checked by
send_eof()/abort_resultset(). But we must comply with the above
measure of converting into statement.
:: Syntax change ::
Keyword AUTO enables history partition auto-creation.
Examples:
CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO;
CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME INTERVAL 1 MONTH
STARTS '2021-01-01 00:00:00' AUTO PARTITIONS 12;
CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME LIMIT 1000 AUTO;
Or with explicit partitions:
CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO
(PARTITION p0 HISTORY, PARTITION pn CURRENT);
To disable or enable auto-creation one can use ALTER TABLE by adding
or removing AUTO from partitioning specification:
CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO;
# Disables auto-creation:
ALTER TABLE t1 PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR;
# Enables auto-creation:
ALTER TABLE t1 PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO;
If the rest of partitioning specification is identical to CREATE TABLE
no repartitioning will be done (for details see MDEV-27328).
:: Description ::
Before executing history-generating DML command (see the list of commands below)
add N history partitions, so that N would be sufficient for potentially
generated history. N > 1 may be required when history partitions are switched
by INTERVAL and current_timestamp is N times further than the interval
boundary of the last history partition.
If the last history partition equals or exceeds LIMIT records then new history
partition is created and selected as the working partition. According to
MDEV-28411 partitions cannot be switched (or created) while the command is
running. Thus LIMIT does not carry strict limitation and the history partition
size must be planned as LIMIT value plus average number of history one DML
command can generate.
Auto-creation is implemented by synchronous fast_alter_partition_table() call
from the thread of the executed DML command before the command itself is run
(by the fallback and retry mechanism similar to Discovery feature,
see Open_table_context).
The name for newly added partitions are generated like default partition names
with extension of MDEV-22155 (which avoids name clashes by extending assignment
counter to next free-enough gap).
These DML commands can trigger auto-creation:
DELETE (including multitable DELETE, excluding DELETE HISTORY)
UPDATE (including multitable UPDATE)
REPLACE (including REPLACE .. SELECT)
INSERT .. ON DUPLICATE KEY UPDATE (including INSERT .. SELECT .. ODKU)
LOAD DATA .. REPLACE
:: Bug fixes ::
MDEV-23642 Locking timeout caused by auto-creation affects original DML
The reasons for this are:
- Do not disrupt main business process (the history is auxiliary service);
- Consequences are non-fatal (history is not lost, but comes into wrong
partition; fixed by partitioning rebuild);
- There is more freedom for application to fail in this case or not: it may
read warning info and find corresponding error number.
- While non-failing command is easy to handle by an application and fail it,
the opposite is hard to handle: there is no automatic actions to fix
failed command and retry, DBA intervention is required and until then
application is non-functioning.
MDEV-23639 Auto-create does not work under LOCK TABLES or inside triggers
Don't do tdc_remove_table() for OT_ADD_HISTORY_PARTITION because it is
not possible in locked tables mode.
LTM_LOCK_TABLES mode (and LTM_PRELOCKED_UNDER_LOCK_TABLES) works out
of the box as fast_alter_partition_table() can reopen tables via
locked_tables_list.
In LTM_PRELOCKED we reopen and relock table manually.
:: More fixes ::
* some_table_marked_for_reopen flag fix
some_table_marked_for_reopen affets only reopen of
m_locked_tables. I.e. Locked_tables_list::reopen_tables() reopens only
tables from m_locked_tables.
* Unused can_recover_from_failed_open() condition
Is recover_from_failed_open() can be really used after
open_and_process_routine()?
:: Reviewed by ::
Sergei Golubchik <serg@mariadb.org>