InnoDB startup was discovering undo tablespaces in a dirty way.
It was reading a possibly stale copy of the TRX_SYS page before
processing any redo log records.
srv_start(): Do not call buf_pool_invalidate(). Invoke
trx_rseg_get_n_undo_tablespaces() after the recovery has been initiated.
recv_recovery_from_checkpoint_start(): Assert that the buffer pool is
empty. This used to be guaranteed by the buf_pool_invalidate() call.
trx_rseg_get_n_undo_tablespaces(): Move to the calling compilation unit,
and reimplement in a simpler way.
srv_undo_tablespace_create(): Remove the constant parameter
size=SRV_UNDO_TABLESPACE_SIZE_IN_PAGES.
srv_undo_tablespace_open(): Reimplement in a cleaner way, with
more robust error handling.
srv_all_undo_tablespaces_open(): Split from srv_undo_tablespaces_init().
srv_undo_tablespaces_init(): Read all "undo001","undo002" tablespace
files directly, without consulting the TRX_SYS page via calling
trx_rseg_get_n_undo_tablespaces().
This is joint work with Thirunarayanan Balathandayuthapani.
In MariaDB 10.2 master could have been configured so that there
is extra annotate events. When we peak next event type for CTAS we
need to skip annotate events.
This patch contains two fixes:
* wsrep_handle_mdl_conflict(): handle the case where SR transaction
is in aborting state. Previously, a BF-BF conflict was reported, and
the process would abort.
* wsrep_thd_bf_abort(): do not restore thread vars after calling
wsrep_bf_abort(). Thread vars are already restored in wsrep-lib if
necessary. This also removes the assumption that the caller of
wsrep_thd_bf_abort() is the given bf_thd, which is not the case.
Also in this patch:
* Remove unnecessary check for active victim transaction in
wsrep_thd_bf_abort(): the exact same check is performed later in
wsrep_bf_abort().
* Make wsrep_thd_bf_abort() and wsrep_log_thd() const-correct.
* Change signature of wsrep_abort_thd() to take THD pointers instead
of void pointers.
Unit prepare prematurely fixed field which must be fixed via
setup_conds() to correctly update table->covering_keys.
Call vers_setup_conds() directly instead, because actually everything
else is not needed.
Unit prepare prematurely fixed field which must be fixed via
setup_conds() to correctly update table->covering_keys.
Call vers_setup_conds() directly instead, because actually everything
else is not needed.
Wrong assertion condition. SYSTEM_TIME_ALL indicates that
vers_setup_conds() is done. In case FOR SYSTEM_TIME ALL is specified
in command the assertion passes but not checks anything.
Don't do skip_setup_conds() unless all errors are checked.
Fixes following errors:
ER_PERIOD_NOT_FOUND
ER_VERS_QUERY_IN_PARTITION
ER_VERS_ENGINE_UNSUPPORTED
ER_VERS_NOT_VERSIONED
Don't do skip_setup_conds() unless all errors are checked.
Fixes following errors:
ER_PERIOD_NOT_FOUND
ER_VERS_QUERY_IN_PARTITION
ER_VERS_ENGINE_UNSUPPORTED
ER_VERS_NOT_VERSIONED
LIMIT history partitions cannot be checked by existing algorithm of
check_misplaced_rows() because working history partition is
incremented each time another one is filled. The existing algorithm
gets record and tries to decide partition id for it by
get_partition_id(). For LIMIT history it will just get first
non-filled partition.
To fix such partitions it is required to do REBUILD instead of REPAIR.
When view is merged by DT_MERGE_FOR_INSERT it is then skipped from
processing and doesn't update WHERE clause with
vers_setup_conds(). Note that view itself cannot work in
vers_setup_conds() because it doesn't have row_start, row_end
fields. Thus it is required to descend down to material TABLE_LIST
through calls of mysql_derived_prepare() and run vers_setup_conds()
from there. Luckily, all views (views of views, views of views of
views, etc.) are linked in one list through next_global pointer, so we
can skip all views of views and get straight to non-view TABLE_LIST by
checking its merge_underlying_list property for zero value (it is
assigned by DT_MERGE_FOR_INSERT for merged derived tables).
We have to do that only for UPDATE and DELETE. Other DML commands
don't use WHERE clause.
MDEV-21146 Assertion `m_lock_type == 2' in handler::ha_drop_table upon LOAD DATA
LOAD DATA does not use WHERE and the above call of vers_setup_conds()
is not needed. unit->prepare() led to wrongly locked temporary table.
"write set" for replication finally got its correct place
(mark_columns_per_binlog_row_image()). When done generally in
mark_columns_needed_for_update() it affects optimization
algorithm. used_key_is_modified, query_plan.using_io_buffer are
wrongly set and that leads to wrong prepare_for_keyread() which limits
read_set.
Turn read cache off for update and multi-update for versioned
table. no_cache is reinited on each TABLE open because it is
applicable for specific algorithms.
As a side fix vers_insert_history_row() honors vers_write setting.
Aria with row_format=fixed uses IO_CACHE of type READ_CACHE for
sequential read in update loop. When history row is inserted inside
this loop the cache misses it and fails with error.
TODO:
Currently maria_extra() does not support SEQ_READ_APPEND. Probably it
might be possible to use this type of cache.
executing undo undo_key_delete" upon startup on datadir restored from
incremental backup
aria_log* files were not copied on --prepare --incremental-dir step from
incremental to destination backup directory.
Revert part of commit 6cedb671e9
because it turns out to be theoretically impossible to parse a
ROW_FORMAT=COMPACT or ROW_FORMAT=DYNAMIC metadata record where
the variable-length fields in the PRIMARY KEY have been written
as nonempty strings.
The problem happens when MariaDB master replicates writes for only non InnoDB
tables (e.g. writes to MyISAM table(s)). Async slave node, in Galera cluster,
can apply these writes successfully, but it will, in the end, write gtid position in
mysql.gtid_slave_pos table. mysql.gtid_slave_pos table is InnoDB engine, and
this write makes innodb handlerton part of the replicated "transaction".
Note that wsrep patch identifies that write to gtid_slave_pos should not be replicated
and skips appending wsrep keys for these writes. However, as InnoDB was present
in the transaction, and there are replication events (for MyISAM table) in transaction
cache, but there are no appended keys, wsrep raises an error, and this makes the söave
thread to stop.
The fix is simply to not treat it as an error if async slave tries to replicate a write
set with binlog events, but no keys. We just skip wsrep replication and return successfully.
This commit contains also a mtr test which forces mysql.gtid_slave_pos table isto be
of InnoDB engine, and executes MyISAM only write through asyn replication.
There is additional fix for declaring IO and background slave threads as non wsrep.
These threads should not write anything for wsrep replication, and this is just a safeguard
to make sure nothing leaks into cluster from these slave threads.
We must relax too strict debug assertions. For latin1_swedish_ci,
mtype=DATA_CHAR or mtype=DATA_VARCHAR will be used instead of
mtype=DATA_MYSQL or mtype=DATA_VARMYSQL. Likewise, some changes of
dtype_get_charset_coll() do not affect the data type encoding,
but only any indexes that are defined on the column.
Charset::same_encoding(): Check whether two charset-collations have
the same character set encoding.
dict_col_t::same_encoding(): Check whether two character columns
have the same character set encoding.
dict_col_t::same_type(): Check whether two columns have a compatible
data type encoding.
dict_col_t::same_format(), dict_table_t::instant_column(): Do not
compare mtype or the charset-collation of prtype directly.
Rely on dict_col_t::same_type() instead.
dtype_get_charset_coll(): Narrow the return type to uint16_t.
This is a refined version of a fix that was developed by
Thirunarayanan Balathandayuthapani.
The original crash happened when async replication IO thread was updating mysql.gtid_slave_pos table. Operations on this table should remain node local, but it appears that protection (THD::wsrep_ignore_table flag) to prevent wsrep replication for this table mas missing for innodb write_row() and update_row().
It was somewhat difficult to reproduce the issue, because mtr seems to create the affected table mysql.gtid_log_pos as of Aria engine type, and Aria engine operations will not be replicated anyhow. It looks, though, that in release installation, mysql.gtid_slave_pos table is of InnoDB engine.
It was possible to trigger somewhat related problem by running test galera.galera_as_slave_gtid with configuration: gtid_pos_auto_engines=InnoDB. However, this test mode, causes earlier crash when replication background thread creates aditional table: mysql.gtid_slave_pos_InnoDB, and this table create triggered wsrep TOI replication, which also failed for assertion. Actually, async replication IO and background threads should not replicate anything to cluster.
This pull request contains new test galera.galera_as_slave_gtid_auto_engine, which basically just runs galera.galera_as_slave_gtid with configuration of gtid_pos_auto_engines=InnoDB.
Test galera.galera_as_slave_gtid is also modified for better code reuse.
Actual fix for MDEV-21096 is in storage/innobase/handler/ha_innodb.cc, where THD::wsrep_ignore_table flag is now honored before wsrep key population.
There is additional fix in sql/service_wsrep.cc where async replication IO and background threads are marked as non-local. This fences these threads out of wsrep replication altogether. Note that this change, actually makes the use of THD::wsrep_ignore-table redundant. We may want to refactor THD::wsrep_ignore_table out in the future, if there is no other use case for it in sight.
Implement syntax like:
create table t1 (x int) with system versioning partition by system_time;
which will create 1 history partition and 1 current partition.
Also it is possible to specify the number of history partitions:
create table t1 (x int) with system versioning partition by system_time partitions 5;
which will create 4 history partitions (and 1 current partition).
Tests:
partition.test cases are duplicated where it is appropriate for default partitions.
partition_rotation.test cases are replaced by default partitions where possible.
MDEV-18957 UPDATE with LIMIT clause is wrong for versioned partitioned tables
UPDATE, DELETE: replace linear search of current/historical records
with vers_setup_conds().
Additional DML cases in view.test