This is 10.4 version.
Idea is to create monitor thread for both donor and joiner that will
periodically if needed extend systemd timeout while SST is being
processed. In 10.4 actual SST is executed by running SST script
and exchanging messages on pipe using blocking fgets. This fix
starts monitoring thread before SST script is started and
we stop monitoring thread when SST has been completed.
The long semaphore wait appeared to be the caused by the following
pattern in the MTR test:
```
SET DEBUG_SYNC = "now SIGNAL wsrep_after_certification_continue";
SET DEBUG_SYNC = "now SIGNAL signal.wsrep_apply_cb;
```
Raising two signals, one right after another, caused one signal to
overwrite the other, before the signal was consumed by the thread.
This caused one thread to be stuck until the debug sync point would
timeout.
A certification failure followed by a clean shutdown would cause an
inconsistency between the sequence number stored in innodb and the
sequence number stored in provider.
This happened both in the case of local certification failure, and in
the case where dummy writeset is applied.
The fix consists of:
- updating wsrep position after dummy writeset is delivered in
`Wsrep_high_priority_service::log_dummy_write_set()`
- updating wsrep position while releasing commit order in wsrep-lib
side
Added two tests which stress the situation where a server is shutdown
after a certification failure.
Assertion failed in wsrep-lib after transaction replay which
failed due to conflict in certification.
- Implemented reproducible test case MDEV-20793 to reproduce the crash.
- Fixed wsrep-lib to deal with certification error during replay.
Found two bugs
(1) have_committing_connections was missing mutex unlock on one
exit case. As this function is called on a loop it caused mutex
lock when we already owned the mutex. This could cause hang.
(2) wsrep_RSU_begin did set up error code when partition to
be dropped could not be MDL-locked because of concurrent
operations but wrong error code was propagated to upper layer
causing error to be ignored. This could have also caused
the hang.
In MariaDB 10.2 master could have been configured so that there
is extra annotate events. When we peak next event type for CTAS we
need to skip annotate events.
The problem happens when MariaDB master replicates writes for only non InnoDB
tables (e.g. writes to MyISAM table(s)). Async slave node, in Galera cluster,
can apply these writes successfully, but it will, in the end, write gtid position in
mysql.gtid_slave_pos table. mysql.gtid_slave_pos table is InnoDB engine, and
this write makes innodb handlerton part of the replicated "transaction".
Note that wsrep patch identifies that write to gtid_slave_pos should not be replicated
and skips appending wsrep keys for these writes. However, as InnoDB was present
in the transaction, and there are replication events (for MyISAM table) in transaction
cache, but there are no appended keys, wsrep raises an error, and this makes the söave
thread to stop.
The fix is simply to not treat it as an error if async slave tries to replicate a write
set with binlog events, but no keys. We just skip wsrep replication and return successfully.
This commit contains also a mtr test which forces mysql.gtid_slave_pos table isto be
of InnoDB engine, and executes MyISAM only write through asyn replication.
There is additional fix for declaring IO and background slave threads as non wsrep.
These threads should not write anything for wsrep replication, and this is just a safeguard
to make sure nothing leaks into cluster from these slave threads.
The original crash happened when async replication IO thread was updating mysql.gtid_slave_pos table. Operations on this table should remain node local, but it appears that protection (THD::wsrep_ignore_table flag) to prevent wsrep replication for this table mas missing for innodb write_row() and update_row().
It was somewhat difficult to reproduce the issue, because mtr seems to create the affected table mysql.gtid_log_pos as of Aria engine type, and Aria engine operations will not be replicated anyhow. It looks, though, that in release installation, mysql.gtid_slave_pos table is of InnoDB engine.
It was possible to trigger somewhat related problem by running test galera.galera_as_slave_gtid with configuration: gtid_pos_auto_engines=InnoDB. However, this test mode, causes earlier crash when replication background thread creates aditional table: mysql.gtid_slave_pos_InnoDB, and this table create triggered wsrep TOI replication, which also failed for assertion. Actually, async replication IO and background threads should not replicate anything to cluster.
This pull request contains new test galera.galera_as_slave_gtid_auto_engine, which basically just runs galera.galera_as_slave_gtid with configuration of gtid_pos_auto_engines=InnoDB.
Test galera.galera_as_slave_gtid is also modified for better code reuse.
Actual fix for MDEV-21096 is in storage/innobase/handler/ha_innodb.cc, where THD::wsrep_ignore_table flag is now honored before wsrep key population.
There is additional fix in sql/service_wsrep.cc where async replication IO and background threads are marked as non-local. This fences these threads out of wsrep replication altogether. Note that this change, actually makes the use of THD::wsrep_ignore-table redundant. We may want to refactor THD::wsrep_ignore_table out in the future, if there is no other use case for it in sight.
This PR contains a mtr test for reproducing a failure with replicating create table as select statement (CTAS) through asynchronous mariadb replication to mariadb galera cluster.
The problem happens when CTAS replication contains both create table statement followed by row events for populating the table. In such situation, the galera node operating as mariadb replication slave, will first replicate only the create table part into the cluster, and then perform another replication containing both the create table and row events. This will lead all other nodes to fail for duplicate table create attempt, and crash due to this failure.
PR contains also a fix, which identifies the situation when CTAS has been replicated, and makes further scan in async replication stream to see if there are following row events. The slave node will replicate either single TOI in case the CTAS table is empty, or if CTAS table contains rows, then single bundled write set with create table and row events is replicated to galera cluster.
This fix should keep master server's GTID's for CTAS replication in sync with GTID's in galera cluster.
Historically, InnoDB split the redo log into at least 2 files.
MDEV-12061 allowed the minimum to be innodb_log_files_in_group=1,
but it kept the default at innodb_log_files_in_group=2.
Because performance seems to be slightly better with only one log file,
and because implementing an append-only variant of the log would require
a single file, let us define the default to be 1, and have
innodb_log_file_size=96M, to retain the same default total size.