The long semaphore wait appeared to be caused by the following
pattern in the MTR test:
```
SET DEBUG_SYNC = "now SIGNAL wsrep_after_certification_continue";
SET DEBUG_SYNC = "now SIGNAL signal.wsrep_apply_cb";
```
Raising two signals, one right after another, caused the second signal
to overwrite the first before the waiting thread had consumed it.
This left one thread stuck until its debug sync point timed out.
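The overwrite can be illustrated with a toy model. This is a minimal
sketch, not the server's debug sync implementation: it assumes only what
is described above, namely that a newly raised signal replaces the one
raised before it. `SignalSlot`, `send` and `holds` are hypothetical names.

```cpp
#include <string>

// Toy model of a single signal slot: sending a signal replaces whatever
// was stored before, whether or not a waiter has seen it yet.
// All names here are hypothetical and exist only for illustration.
class SignalSlot {
public:
  void send(const std::string& name) { current_ = name; }

  // A waiter can proceed only if the slot holds the signal it waits for.
  bool holds(const std::string& name) const { return current_ == name; }

private:
  std::string current_;
};

// Raises both signals back to back, as the test did, and reports
// whether the waiter for the first signal can still see it.
inline bool first_signal_still_visible() {
  SignalSlot slot;
  slot.send("wsrep_after_certification_continue");
  slot.send("signal.wsrep_apply_cb");  // replaces the first signal
  return slot.holds("wsrep_after_certification_continue");
}
```

In this model the waiter for the first signal never sees it, which
matches the thread that stayed blocked until the sync point timed out.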
A certification failure followed by a clean shutdown would cause an
inconsistency between the sequence number stored in InnoDB and the
sequence number stored in the provider.
This happened both in the case of a local certification failure and in
the case where a dummy writeset is applied.
The fix consists of:
- updating the wsrep position after a dummy writeset is delivered in
`Wsrep_high_priority_service::log_dummy_write_set()`
- updating the wsrep position while releasing commit order on the
wsrep-lib side
Added two tests which stress the situation where a server is shut down
after a certification failure.
An assertion failed in wsrep-lib after a transaction replay which
failed due to a conflict in certification.
- Implemented test case MDEV-20793 to reproduce the crash.
- Fixed wsrep-lib to deal with a certification error during replay.
Found two bugs:
(1) have_committing_connections was missing a mutex unlock on one
exit path. As this function is called in a loop, the next call tried
to lock the mutex while we already owned it. This could cause a hang.
(2) wsrep_RSU_begin did set an error code when the partition to
be dropped could not be MDL-locked because of concurrent
operations, but the wrong error code was propagated to the upper
layer, causing the error to be ignored. This could also have caused
the hang.
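Bug (1) is the usual early-return-without-unlock pattern. Below is a
minimal sketch of one common way to rule it out, using std::mutex and
std::lock_guard. It is not the actual have_committing_connections()
code; `Connection`, `connections_mutex` and `any_committing` are
hypothetical names.

```cpp
#include <mutex>
#include <vector>

// Hypothetical stand-in for a client connection.
struct Connection {
  bool committing;
};

// Hypothetical lock guarding the connection list.
std::mutex connections_mutex;

// Returns true if any connection is committing. std::lock_guard
// releases the mutex on every return path, including the early one,
// so calling this function in a loop never relocks a held mutex.
bool any_committing(const std::vector<Connection>& connections) {
  std::lock_guard<std::mutex> guard(connections_mutex);
  for (const Connection& c : connections) {
    if (c.committing) return true;  // early exit still unlocks
  }
  return false;
}
```

With a scope-bound guard, a second call right after an early return
finds the mutex free, which is the property the missing unlock broke.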
The problem happens when a MariaDB master replicates writes for only non-InnoDB
tables (e.g. writes to MyISAM tables). An async slave node in a Galera cluster
can apply these writes successfully, but it will, in the end, write the GTID position to the
mysql.gtid_slave_pos table. The mysql.gtid_slave_pos table uses the InnoDB engine, and
this write makes the InnoDB handlerton part of the replicated "transaction".
Note that the wsrep patch identifies that the write to gtid_slave_pos should not be replicated
and skips appending wsrep keys for these writes. However, as InnoDB was present
in the transaction, and there are replication events (for the MyISAM table) in the transaction
cache but no appended keys, wsrep raises an error, and this makes the slave
thread stop.
The fix is simply to not treat it as an error if an async slave tries to replicate a write
set with binlog events but no keys. We just skip wsrep replication and return successfully.
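The changed rule can be summarized in a small sketch. This is only an
illustration of the decision described above, not the server code;
`WriteSetInfo` and `empty_key_set_is_error` are hypothetical names.

```cpp
// Hypothetical summary of a write set at the point where wsrep decides
// whether to replicate it.
struct WriteSetInfo {
  bool has_binlog_events;  // events are present in the transaction cache
  bool has_keys;           // wsrep keys were appended
  bool is_async_slave;     // executed by an async replication slave thread
};

// Returns true if "events but no keys" should be reported as an error.
// After the fix, an async slave in that state skips wsrep replication
// and returns successfully instead of raising an error.
bool empty_key_set_is_error(const WriteSetInfo& ws) {
  const bool events_without_keys = ws.has_binlog_events && !ws.has_keys;
  return events_without_keys && !ws.is_async_slave;
}
```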
This commit also contains an mtr test which forces the mysql.gtid_slave_pos table to be
of the InnoDB engine, and executes a MyISAM-only write through async replication.
There is an additional fix for declaring the IO and background slave threads as non-wsrep.
These threads should not write anything for wsrep replication, and this is just a safeguard
to make sure nothing leaks into the cluster from these slave threads.
The original crash happened when the async replication IO thread was updating the mysql.gtid_slave_pos table. Operations on this table should remain node-local, but it appears that the protection (the THD::wsrep_ignore_table flag) that prevents wsrep replication for this table was missing for InnoDB write_row() and update_row().
It was somewhat difficult to reproduce the issue, because mtr seems to create the affected table mysql.gtid_slave_pos with the Aria engine, and Aria engine operations are not replicated anyway. It looks, though, as if in a release installation the mysql.gtid_slave_pos table uses the InnoDB engine.
It was possible to trigger a somewhat related problem by running the test galera.galera_as_slave_gtid with the configuration gtid_pos_auto_engines=InnoDB. However, this test mode causes an earlier crash: the replication background thread creates an additional table, mysql.gtid_slave_pos_InnoDB, and this table creation triggered wsrep TOI replication, which also failed on an assertion. In fact, the async replication IO and background threads should not replicate anything to the cluster.
This pull request contains a new test, galera.galera_as_slave_gtid_auto_engine, which basically just runs galera.galera_as_slave_gtid with the configuration gtid_pos_auto_engines=InnoDB.
The test galera.galera_as_slave_gtid is also modified for better code reuse.
The actual fix for MDEV-21096 is in storage/innobase/handler/ha_innodb.cc, where the THD::wsrep_ignore_table flag is now honored before wsrep key population.
There is an additional fix in sql/service_wsrep.cc, where the async replication IO and background threads are marked as non-local. This fences these threads out of wsrep replication altogether. Note that this change actually makes the use of THD::wsrep_ignore_table redundant. We may want to refactor THD::wsrep_ignore_table out in the future if there is no other use case for it in sight.
This PR contains an mtr test for reproducing a failure with replicating a create table as select statement (CTAS) through asynchronous MariaDB replication to a MariaDB Galera cluster.
The problem happens when CTAS replication contains a create table statement followed by row events for populating the table. In such a situation, the Galera node operating as the MariaDB replication slave will first replicate only the create table part into the cluster, and then perform another replication containing both the create table and the row events. This leads all other nodes to fail on the duplicate table create attempt, and to crash due to this failure.
The PR also contains a fix, which identifies the situation when a CTAS has been replicated and scans further in the async replication stream to see whether row events follow. The slave node will replicate either a single TOI if the CTAS table is empty or, if the CTAS table contains rows, a single bundled write set with the create table and row events.
This fix should keep the master server's GTIDs for CTAS replication in sync with the GTIDs in the Galera cluster.
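The look-ahead can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in the PR: the event kinds, the use of a commit marker as the end of the event group, and the names `Event`, `CtasMode` and `choose_ctas_mode` are all hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, heavily reduced event kinds of the async replication stream.
enum class Event { CreateTable, Rows, Commit };

// How the slave node replicates the CTAS into the cluster.
enum class CtasMode { SingleToi, BundledWriteSet };

// Scans the events that follow the CTAS create table event at
// `create_pos`. Assumption: a Commit event ends the event group.
// Row events inside the group mean the table is not empty, so the
// create table and the rows go into one bundled write set.
CtasMode choose_ctas_mode(const std::vector<Event>& stream,
                          std::size_t create_pos) {
  for (std::size_t i = create_pos + 1; i < stream.size(); ++i) {
    if (stream[i] == Event::Rows) return CtasMode::BundledWriteSet;
    if (stream[i] == Event::Commit) break;  // end of this event group
  }
  return CtasMode::SingleToi;  // empty table: the create alone is a TOI
}
```

Either way the CTAS reaches the cluster exactly once, which is what avoids the duplicate create table on the other nodes.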
Historically, InnoDB split the redo log into at least 2 files.
MDEV-12061 allowed the minimum to be innodb_log_files_in_group=1,
but it kept the default at innodb_log_files_in_group=2.
Because performance seems to be slightly better with only one log file,
and because implementing an append-only variant of the log would require
a single file, let us define the default to be 1, and have
innodb_log_file_size=96M, to retain the same default total size.
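The size arithmetic can be checked at compile time. This sketch assumes
that the previous per-file default was innodb_log_file_size=48M, which
is what "the same default total size" implies for two files; `MiB` and
`total_redo_log_size` are hypothetical names.

```cpp
#include <cstdint>

constexpr std::uint64_t MiB = 1024ULL * 1024ULL;

// Total redo log size is the number of files times the size of each.
constexpr std::uint64_t total_redo_log_size(std::uint64_t files_in_group,
                                            std::uint64_t file_size) {
  return files_in_group * file_size;
}

// Old defaults (assumed): 2 files of 48M. New defaults: 1 file of 96M.
static_assert(total_redo_log_size(2, 48 * MiB) ==
                  total_redo_log_size(1, 96 * MiB),
              "the default total redo log size is unchanged");
```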
Instrumenting parallel slave worker thread with wsrep replication hooks.
Added mtr test for testing parallel slave support.
The test is based on the test attached in MDEV-6860 jira tracker.
* MDEV-20225 BF aborting SP execution
When stored procedure execution was chosen as the victim of a BF abort, the old implementation called for a rollback immediately
when execution was inside an SP instruction. Technically this happened in the wsrep_after_statement() call, which identified the
need for a rollback.
The problem was that MariaDB does not accept a rollback (nor a commit) inside a sub-statement; there are several asserts about it,
checking for THD::in_sub_stmt.
This patch contains a fix which skips calling wsrep_after_statement() for SP execution that is marked as BF must abort. Instead,
we return an error code to the upper level, where the rollback will eventually happen, outside of SP execution.
Also, the affected trigger table (dropped or created) is now appended to the populated key set for the write set,
which prevents parallel applying of other transactions working on the same table.
* MDEV-20225 BF aborting SP execution, second patch
First PR missed 4 commits, which are now squashed in this patch:
- Added galera_sp_bf_abort test.
An MTR test case which will reproduce a BF-BF conflict if all keys
corresponding to affected tables are not assigned for DROP TRIGGER.
- Fixed incorrect use of sync points in MDEV-20225
- Added condition for SQLCOM_DROP_TRIGGER in wsrep_can_run_in_toi()
to make it replicate.
* MDEV-20225 BF aborting SP execution, third patch
The galera_trigger.test caused a situation where an SP invocation caused a trigger
to fire, and the trigger executed as a sub-statement SP and was BF aborted by the applier.
Because wsrep_after_statement() was called at the sub-statement level, it ended up
executing a rollback and asserted there.
This fix catches sub-statement level SP execution and avoids calling wsrep_after_statement().
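The combined guard from the patches above can be sketched roughly as follows.
This is an illustration of the rule as described in these messages, not the server code:
`Session`, its fields and `may_run_after_statement_hook` are hypothetical names.

```cpp
// Hypothetical reduced view of the session state relevant to the guard.
struct Session {
  bool in_sub_stmt;        // executing at sub-statement level (e.g. a trigger)
  bool in_sp_instruction;  // executing inside a stored procedure instruction
  bool bf_must_abort;      // chosen as the victim of a BF abort
};

// Returns true if the after-statement hook, which may roll back, can be
// called at this point. When it returns false, the caller is expected to
// return an error code upwards so that the rollback happens outside of
// SP execution.
bool may_run_after_statement_hook(const Session& s) {
  // Rollback and commit are not accepted at sub-statement level.
  if (s.in_sub_stmt) return false;
  // A BF-aborted SP instruction must not roll back in place either.
  if (s.in_sp_instruction && s.bf_must_abort) return false;
  return true;
}
```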
- Fixes a situation in which a thread gets BF aborted and does not send the reply back to
the client, even though the connection is still alive. That caused
both sides to hang waiting for the next message. Now we explicitly
check that the connection is still alive.
- MTR test for the above
- Replaced thd->killed assignments with thd->reset_kill_query where applicable.
Command COM_SHUTDOWN was rejected in non-Primary because
server_command_flags[COM_SHUTDOWN] had value CF_NO_COM_MULTI
instead of CF_SKIP_WSREP_CHECK.
As a fix, removed the assignment
server_command_flags[CF_NO_COM_MULTI]= CF_NO_COM_MULTI
which overwrote server_command_flags[COM_SHUTDOWN].
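The collision can be reproduced in a small model. This is a sketch, not
the server source: the flag bit values below are assumptions chosen so
that CF_NO_COM_MULTI is numerically equal to COM_SHUTDOWN, which is what
the description above implies, and `build_flags` and
`shutdown_allowed_in_non_primary` are hypothetical names.

```cpp
#include <array>
#include <cstdint>

// Command index (8 in the client/server protocol).
constexpr unsigned COM_SHUTDOWN = 8;

// Flag bits. The exact values are assumptions for this sketch; what
// matters is that CF_NO_COM_MULTI has the same value as COM_SHUTDOWN.
constexpr std::uint32_t CF_SKIP_WSREP_CHECK = 1u << 2;
constexpr std::uint32_t CF_NO_COM_MULTI = 1u << 3;

using CommandFlags = std::array<std::uint32_t, 32>;

// Builds the per-command flag table, optionally with the bad assignment.
CommandFlags build_flags(bool with_buggy_assignment) {
  CommandFlags flags{};  // all entries start at zero
  flags[COM_SHUTDOWN] = CF_SKIP_WSREP_CHECK;
  if (with_buggy_assignment) {
    // A flag constant used as an index: this writes to slot 8, which
    // is the COM_SHUTDOWN entry, and replaces its value.
    flags[CF_NO_COM_MULTI] = CF_NO_COM_MULTI;
  }
  return flags;
}

// COM_SHUTDOWN passes in non-Primary only if the wsrep check is skipped.
bool shutdown_allowed_in_non_primary(const CommandFlags& flags) {
  return (flags[COM_SHUTDOWN] & CF_SKIP_WSREP_CHECK) != 0;
}
```

With the assignment present, the COM_SHUTDOWN entry no longer carries
CF_SKIP_WSREP_CHECK; without it, the entry keeps the intended flag.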