Hash index is vcol-based wrapper (MDEV-371). row_end is added to
unique index. So when row_end is updated unique hash index must be
recalculated via vcol_update_fields(). DELETE did not update virtual
fields, so DELETE HISTORY was getting wrong hash value.
The fix does update_virtual_fields() on vers_update_end() so in every
case row_end is updated virtual fields are updated as well.
In any test that uses wait_all_purged.inc, ensure that InnoDB tables
will be created without persistent statistics.
This is a follow-up to commit cd04673a17
after a similar failure was observed in the innodb_zip.blob test.
The motivation of introducing the parameter
innodb_purge_rseg_truncate_frequency in
mysql/mysql-server@28bbd66ea5 and
mysql/mysql-server@8fc2120fed
seems to have been to avoid stalls due to freeing undo log pages
or truncating undo log tablespaces. In MariaDB Server,
innodb_undo_log_truncate=ON should be a much lighter operation
than in MySQL, because it will not involve any log checkpoint.
Another source of performance stalls should be
trx_purge_truncate_rseg_history(), which is shrinking the history list
by freeing the undo log pages whose undo records have been purged.
To alleviate that, we will introduce a purge_truncation_task that will
offload this from the purge_coordinator_task. In that way, the next
innodb_purge_batch_size pages may be parsed and purged while the pages
from the previous batch are being freed and the history list being shrunk.
The processing of innodb_undo_log_truncate=ON will still remain the
responsibility of the purge_coordinator_task.
purge_coordinator_state::count: Remove. We will ignore
innodb_purge_rseg_truncate_frequency, and act as if it had been
set to 1 (the maximum shrinking frequency).
purge_coordinator_state::do_purge(): Invoke an asynchronous task
purge_truncation_callback() to free the undo log pages.
purge_sys_t::iterator::free_history(): Free those undo log pages
that have been processed. This used to be a part of
trx_purge_truncate_history().
purge_sys_t::clone_end_view(): Take a new value of purge_sys.head
as a parameter, so that it will be updated while holding exclusive
purge_sys.latch. This is needed for race-free access to the field
in purge_truncation_callback().
Reviewed by: Vladislav Lesin
1. In case of system-versioned table add row_end into FTS_DOC_ID index
in fts_create_common_tables() and innobase_create_key_defs().
fts_n_uniq() returns 1 or 2 depending on whether the table is
system-versioned.
After this patch recreate of FTS_DOC_ID index is required for
existing system-versioned tables. If you see this message in error
log or server warnings: "InnoDB: Table db/t1 contains 2 indexes
inside InnoDB, which is different from the number of indexes 1
defined in the MariaDB" use this command to fix the table:
ALTER TABLE db.t1 FORCE;
2. Fix duplicate history for secondary unique index like it was done
in MDEV-23644 for clustered index (932ec586aa). In case of
existing history row which conflicts with currently inseted row we
check in row_ins_scan_sec_index_for_duplicate() whether that row
was inserted as part of current transaction. In that case we
indicate with DB_FOREIGN_DUPLICATE_KEY that new history row is not
needed and should be silently skipped.
3. Some parts of MDEV-21138 (7410ff436e) reverted. Skipping of
FTS_DOC_ID index for history rows made problems with purge
system. Now this is fixed differently by p.2.
4. wait_all_purged.inc checks that we didn't affect non-history rows
so they are deleted and purged correctly.
Additional FTS fixes
fts_init_get_doc_id(): exclude history rows from max_doc_id
calculation. fts_init_get_doc_id() callback is used only for crash
recovery.
fts_add_doc_by_id(): set max value for row_end field.
fts_read_stopword(): stopwords table can be system-versioned too. We
now read stopwords only for current data.
row_insert_for_mysql(): exclude history rows from doc_id validation.
row_merge_read_clustered_index(): exclude history_rows from doc_id
processing.
fts_load_user_stopword(): for versioned table retrieve row_end field
and skip history rows. For non-versioned table we retrieve 'value'
field twice (just for uniformity).
FTS tests for System Versioning now include maybe_versioning.inc which
adds 3 combinations:
'vers' for debug build sets sysvers_force and
sysvers_hide. sysvers_force makes every created table
system-versioned, sysvers_hide hides WITH SYSTEM VERSIONING
for SHOW CREATE.
Note: basic.test, stopword.test and versioning.test do not
require debug for 'vers' combination. This is controlled by
$modify_create_table in maybe_versioning.inc and these
tests run WITH SYSTEM VERSIONING explicitly which allows to
test 'vers' combination on non-debug builds.
'vers_trx' like 'vers' sets sysvers_force_trx and sysvers_hide. That
tests FTS with trx_id-based System Versioning.
'orig' works like before: no System Versioning is added, no debug is
required.
Upgrade/downgrade test for System Versioning is done by
innodb_fts.versioning. It has 2 combinations:
'prepare' makes binaries in std_data (requires old server and OLD_BINDIR).
It tests upgrade/downgrade against old server as well.
'upgrade' tests upgrade against binaries in std_data.
Cleanups:
Removed innodb-fts-stopword.test as it duplicates stopword.test
:: Syntax change ::
Keyword AUTO enables history partition auto-creation.
Examples:
CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO;
CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME INTERVAL 1 MONTH
STARTS '2021-01-01 00:00:00' AUTO PARTITIONS 12;
CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME LIMIT 1000 AUTO;
Or with explicit partitions:
CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO
(PARTITION p0 HISTORY, PARTITION pn CURRENT);
To disable or enable auto-creation one can use ALTER TABLE by adding
or removing AUTO from partitioning specification:
CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO;
# Disables auto-creation:
ALTER TABLE t1 PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR;
# Enables auto-creation:
ALTER TABLE t1 PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO;
If the rest of partitioning specification is identical to CREATE TABLE
no repartitioning will be done (for details see MDEV-27328).
:: Description ::
Before executing history-generating DML command (see the list of commands below)
add N history partitions, so that N would be sufficient for potentially
generated history. N > 1 may be required when history partitions are switched
by INTERVAL and current_timestamp is N times further than the interval
boundary of the last history partition.
If the last history partition equals or exceeds LIMIT records then new history
partition is created and selected as the working partition. According to
MDEV-28411 partitions cannot be switched (or created) while the command is
running. Thus LIMIT does not carry strict limitation and the history partition
size must be planned as LIMIT value plus average number of history one DML
command can generate.
Auto-creation is implemented by synchronous fast_alter_partition_table() call
from the thread of the executed DML command before the command itself is run
(by the fallback and retry mechanism similar to Discovery feature,
see Open_table_context).
The name for newly added partitions are generated like default partition names
with extension of MDEV-22155 (which avoids name clashes by extending assignment
counter to next free-enough gap).
These DML commands can trigger auto-creation:
DELETE (including multitable DELETE, excluding DELETE HISTORY)
UPDATE (including multitable UPDATE)
REPLACE (including REPLACE .. SELECT)
INSERT .. ON DUPLICATE KEY UPDATE (including INSERT .. SELECT .. ODKU)
LOAD DATA .. REPLACE
:: Bug fixes ::
MDEV-23642 Locking timeout caused by auto-creation affects original DML
The reasons for this are:
- Do not disrupt main business process (the history is auxiliary service);
- Consequences are non-fatal (history is not lost, but comes into wrong
partition; fixed by partitioning rebuild);
- There is more freedom for application to fail in this case or not: it may
read warning info and find corresponding error number.
- While non-failing command is easy to handle by an application and fail it,
the opposite is hard to handle: there is no automatic actions to fix
failed command and retry, DBA intervention is required and until then
application is non-functioning.
MDEV-23639 Auto-create does not work under LOCK TABLES or inside triggers
Don't do tdc_remove_table() for OT_ADD_HISTORY_PARTITION because it is
not possible in locked tables mode.
LTM_LOCK_TABLES mode (and LTM_PRELOCKED_UNDER_LOCK_TABLES) works out
of the box as fast_alter_partition_table() can reopen tables via
locked_tables_list.
In LTM_PRELOCKED we reopen and relock table manually.
:: More fixes ::
* some_table_marked_for_reopen flag fix
some_table_marked_for_reopen affets only reopen of
m_locked_tables. I.e. Locked_tables_list::reopen_tables() reopens only
tables from m_locked_tables.
* Unused can_recover_from_failed_open() condition
Is recover_from_failed_open() can be really used after
open_and_process_routine()?
:: Reviewed by ::
Sergei Golubchik <serg@mariadb.org>
Moved LIMIT warning from vers_set_hist_part() to new call
vers_check_limit() at table unlock phase. At that point
read_partitions bitmap is already pruned by DML code (see
prune_partitions(), find_used_partitions()) so we have to set
corresponding bits for working history partition.
Also we don't do my_error(ME_WARNING|ME_ERROR_LOG), because at that
point it doesn't update warnings number, so command reports 0 warnings
(but warning list is still updated). Instead we do
push_warning_printf() and sql_print_warning() separately.
Under LOCK TABLES external_lock(F_UNLCK) is not executed. There is
start_stmt(), but no corresponding "stop_stmt()". So for that mode we
call vers_check_limit() directly from close_thread_tables().
Test result has been changed according to new LIMIT and warning
printing algorithm. For convenience all LIMIT warnings are marked with
"You see warning above ^".
TODO MDEV-20345 fixed. Now vers_history_generating() contains
fine-grained list of DML-commands that can generate history (and TODO
mechanism worked well).
Item_func_history (is_history()) is a bool function that checks if the
row is the history row by checking row_end->is_max(). The argument to
this function must be row_end system field.
Added the above function to conjunction with SYSTEM_TIME_BEFORE
versioning condition.
When there are E empty partitions left, auto-create N new empty
partitions for SYSTEM_TIME partitioning rotated by INTERVAL/LIMIT and
marked by AUTO_INCREMENT keyword. Syntax change: AUTO_INCREMENT
keyword (or shorter AUTO may be used instead) after LIMIT/INTERVAL
clause.
CREATE OR REPLACE TABLE t (x INT) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME LIMIT 100000 AUTO_INCREMENT;
CREATE OR REPLACE TABLE t (x INT) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME INTERVAL 1 WEEK AUTO_INCREMENT;
The current revision implements hard-coded values of 1 for E and N. As
well as auto-creation threshold MinInterval = 1 hour, MinLimit = 1000.
The name for newly added partition will be first chosen as "pX", where
X is partition number and "p" is hard-coded name prefix. If this name
is already occupied, the X will be incremented until the resulting
name will be free to use.
ALTER TABLE ADD PARTITION is now always fast. If there some history
partition overflow occurs manual ALTER TABLE REBUILD PARTITION is
needed.
MDEV-18957 UPDATE with LIMIT clause is wrong for versioned partitioned tables
UPDATE, DELETE: replace linear search of current/historical records
with vers_setup_conds().
Additional DML cases in view.test