When binlog_get_pending_rows_event was refactored, one usage in
binlog_need_stmt_format has not been taken in mind.
As binlog_get_pending_rows_event now requires existing cache_mngr, this check
is now made first.
Constraints processing row_ins_check_foreign_constraint() was not
called because row_upd_check_references_constraints() didn't see
update as delete: node->is_delete was false.
Since MDEV-30378 we check for TRG_EVENT_DELETE to detect versioned
delete in ha_innobase::update_row().
Now we can use TRG_EVENT_DELETE to set upd_node->is_delete, so
constraints processing is triggered correctly.
When replicating MDL events for a table that uses system versioning
without primary keys, ensure that for data sets with duplicate
records, the updates to these records with duplicates are enacted on
the correct row. That is, there was a bug (reported in MDEV-30430)
such that the function to find the row to update would stop after
finding the first matching record. However, in the absence of
primary keys, the version of the record is needed to compare the row
to ensure we are updating the correct one.
The fix, therefore, updates the record comparison functionality to
use system version columns when there are no primary keys on the
table.
Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
* clarify the help text for --system-versioning-insert-history
* move the vers_write=false check from Item_field::fix_fields()
next to other vers field checks in find_field_in_table()
* move row_start validation from handler::write_row() next to
vers_update_fields()
* make secure_timestamp check to happen in one place only,
extract it into a function is_set_timestamp_vorbidden().
* overwriting vers fields is an error, just like setting @@timestamp
* don't run vers_insert_history() for every row
1. system_versioning_insert_history session variable allows
pseudocolumns ROW_START and ROW_END be specified in INSERT,
INSERT..SELECT and LOAD DATA.
2. Cleaned up select_insert::send_data() from setting vers_write as
this parameter is now set on TABLE initialization.
4. Replication of system_versioning_insert_history via option_bits in
OPTIONS_WRITTEN_TO_BIN_LOG.
If UPDATE/DELETE does not change data it is skipped from
replication. We now force replication of such events when they trigger
partition auto-creation.
For ROLLBACK it is as simple as set OPTION_KEEP_LOG
flag. trans_cannot_safely_rollback() does the rest.
For UPDATE/DELETE .. LIMIT 0 we make additional binlog_query() calls
at the early points of return.
As a safety measure we also convert row format into statement if it is
needed. The condition is decided by
binlog_need_stmt_format(). Basically if there are some row events in
cache we don't need that: table open of row event will trigger
auto-creation anyway.
Multi-update/delete works via mysql_select(). There is no early points
of return, so binlogging is always checked by
send_eof()/abort_resultset(). But we must comply with the above
measure of converting into statement.
:: Syntax change ::
Keyword AUTO enables history partition auto-creation.
Examples:
CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO;
CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME INTERVAL 1 MONTH
STARTS '2021-01-01 00:00:00' AUTO PARTITIONS 12;
CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME LIMIT 1000 AUTO;
Or with explicit partitions:
CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO
(PARTITION p0 HISTORY, PARTITION pn CURRENT);
To disable or enable auto-creation one can use ALTER TABLE by adding
or removing AUTO from partitioning specification:
CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO;
# Disables auto-creation:
ALTER TABLE t1 PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR;
# Enables auto-creation:
ALTER TABLE t1 PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO;
If the rest of partitioning specification is identical to CREATE TABLE
no repartitioning will be done (for details see MDEV-27328).
:: Description ::
Before executing history-generating DML command (see the list of commands below)
add N history partitions, so that N would be sufficient for potentially
generated history. N > 1 may be required when history partitions are switched
by INTERVAL and current_timestamp is N times further than the interval
boundary of the last history partition.
If the last history partition equals or exceeds LIMIT records then new history
partition is created and selected as the working partition. According to
MDEV-28411 partitions cannot be switched (or created) while the command is
running. Thus LIMIT does not carry strict limitation and the history partition
size must be planned as LIMIT value plus average number of history one DML
command can generate.
Auto-creation is implemented by synchronous fast_alter_partition_table() call
from the thread of the executed DML command before the command itself is run
(by the fallback and retry mechanism similar to Discovery feature,
see Open_table_context).
The name for newly added partitions are generated like default partition names
with extension of MDEV-22155 (which avoids name clashes by extending assignment
counter to next free-enough gap).
These DML commands can trigger auto-creation:
DELETE (including multitable DELETE, excluding DELETE HISTORY)
UPDATE (including multitable UPDATE)
REPLACE (including REPLACE .. SELECT)
INSERT .. ON DUPLICATE KEY UPDATE (including INSERT .. SELECT .. ODKU)
LOAD DATA .. REPLACE
:: Bug fixes ::
MDEV-23642 Locking timeout caused by auto-creation affects original DML
The reasons for this are:
- Do not disrupt main business process (the history is auxiliary service);
- Consequences are non-fatal (history is not lost, but comes into wrong
partition; fixed by partitioning rebuild);
- There is more freedom for application to fail in this case or not: it may
read warning info and find corresponding error number.
- While non-failing command is easy to handle by an application and fail it,
the opposite is hard to handle: there is no automatic actions to fix
failed command and retry, DBA intervention is required and until then
application is non-functioning.
MDEV-23639 Auto-create does not work under LOCK TABLES or inside triggers
Don't do tdc_remove_table() for OT_ADD_HISTORY_PARTITION because it is
not possible in locked tables mode.
LTM_LOCK_TABLES mode (and LTM_PRELOCKED_UNDER_LOCK_TABLES) works out
of the box as fast_alter_partition_table() can reopen tables via
locked_tables_list.
In LTM_PRELOCKED we reopen and relock table manually.
:: More fixes ::
* some_table_marked_for_reopen flag fix
some_table_marked_for_reopen affets only reopen of
m_locked_tables. I.e. Locked_tables_list::reopen_tables() reopens only
tables from m_locked_tables.
* Unused can_recover_from_failed_open() condition
Is recover_from_failed_open() can be really used after
open_and_process_routine()?
:: Reviewed by ::
Sergei Golubchik <serg@mariadb.org>
When there are E empty partitions left, auto-create N new empty
partitions for SYSTEM_TIME partitioning rotated by INTERVAL/LIMIT and
marked by AUTO_INCREMENT keyword. Syntax change: AUTO_INCREMENT
keyword (or shorter AUTO may be used instead) after LIMIT/INTERVAL
clause.
CREATE OR REPLACE TABLE t (x INT) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME LIMIT 100000 AUTO_INCREMENT;
CREATE OR REPLACE TABLE t (x INT) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME INTERVAL 1 WEEK AUTO_INCREMENT;
The current revision implements hard-coded values of 1 for E and N. As
well as auto-creation threshold MinInterval = 1 hour, MinLimit = 1000.
The name for newly added partition will be first chosen as "pX", where
X is partition number and "p" is hard-coded name prefix. If this name
is already occupied, the X will be incremented until the resulting
name will be free to use.
ALTER TABLE ADD PARTITION is now always fast. If there some history
partition overflow occurs manual ALTER TABLE REBUILD PARTITION is
needed.
RBR cannot work with system versioning on the master.
row_end column is either system time (not @@timestamp) with microsecond
precision or transaction id. Either way, it'll certainly be different
on the slave. So if the master row contains row_end column, it won't
match on the slave. And if we ignore row_end when comparing,
then some other row might match instead.