1
0
mirror of https://github.com/MariaDB/server.git synced 2025-12-07 17:42:39 +03:00
Commit Graph

751 Commits

Author SHA1 Message Date
Daniele Sciascia
7d31321464 MDEV-19803 Long semaphore wait error on galera.MW-388
The long semaphore wait appeared to be the caused by the following
pattern in the MTR test:

```
SET DEBUG_SYNC = "now SIGNAL wsrep_after_certification_continue";
SET DEBUG_SYNC = "now SIGNAL signal.wsrep_apply_cb;
```

Raising two signals, one right after another, caused one signal to
overwrite the other, before the signal was consumed by the thread.
This caused one thread to be stuck until the debug sync point would
timeout.
2020-01-14 09:11:35 +02:00
Daniele Sciascia
2d4b6571ec Wsrep position not updated in InnoDB after certification failures (#1432)
A certification failure followed by a clean shutdown would cause an
inconsistency between the sequence number stored in innodb and the
sequence number stored in provider.
This happened both in the case of local certification failure, and in
the case where dummy writeset is applied.
The fix consists of:
- updating wsrep position after dummy writeset is delivered in
 `Wsrep_high_priority_service::log_dummy_write_set()`
- updating wsrep position while releasing commit order in wsrep-lib
 side

Added two tests which stress the situation where a server is shutdown
after a certification failure.
2020-01-14 07:33:02 +02:00
Marko Mäkelä
ca8c3be47d Merge 10.4 into 10.5 2020-01-03 16:15:40 +02:00
Jan Lindström
2cff807d3f Add have_debug to MDEV-20793 and add wait condition to
galera_parallel_autoinc_largetrx to stabilize it.
2019-12-31 11:55:44 +02:00
Teemu Ollakka
13b3d7f1f1 MDEV-20793 Assertion failed after replay.
Assertion failed in wsrep-lib after transaction replay which
failed due to conflict in certification.

- Implemented reproducible test case MDEV-20793 to reproduce the crash.
- Fixed wsrep-lib to deal with certification error during replay.
2019-12-31 11:46:55 +02:00
Marko Mäkelä
8cc15c036d Merge 10.4 into 10.5 2019-12-27 21:17:16 +02:00
Marko Mäkelä
4c25e75ce7 Merge 10.3 into 10.4 2019-12-27 18:20:28 +02:00
Marko Mäkelä
5ab70e7f68 Merge 10.2 into 10.3 2019-12-27 15:14:48 +02:00
Jan Lindström
3fbd9f1522 MDEV-20909 : Galera test failure on galera.galera_gcs_fc_limit: Server crash with signal 6
Add proper wait condition when provider options are restored.
2019-12-23 16:06:25 +02:00
Jan Lindström
17b1b8118a MDEV-21189 : Dropping partition with 'wsrep_OSU_method=RSU' and 'SESSION sql_log_bin = 0' cases the galera node to hang
Test cleanup. Best practice for using RSU, is to isolate the node
up-front, so this test did not reflect real world scenario
2019-12-23 12:57:22 +02:00
Jan Lindström
cc9c55b2e2 MDEV-21189 : Dropping partition with 'wsrep_OSU_method=RSU' and 'SESSION sql_log_bin = 0' cases the galera node to hang
Test cleanup. Best practice for using RSU, is to isolate the node
up-front, so this test did not reflect real world scenario
2019-12-23 12:43:15 +02:00
Jan Lindström
c3824766c5 Fortify galera_partition test. 2019-12-18 10:02:57 +02:00
Jan Lindström
088de81d96 MDEV-21335 : Galera test failure on suite wsrep
Problem was that wsrep_on was OFF.

This is 10.4 version.
2019-12-18 08:22:07 +02:00
Marko Mäkelä
28c89b7151 Merge 10.4 into 10.5 2019-12-16 07:47:17 +02:00
Marko Mäkelä
8fa759a576 Merge 10.3 into 10.4
We disable the MDEV-21189 test galera.galera_partition
because it times out.
2019-12-13 17:30:37 +02:00
Marko Mäkelä
0a20e5ab77 Merge 10.2 into 10.3 2019-12-12 14:41:51 +02:00
Oleksandr Byelkin
a15234bf4b Merge branch '10.3' into 10.4 2019-12-09 15:09:41 +01:00
Jan Lindström
59e14b9684 MDEV-21189: Dropping partition with 'wsrep_OSU_method=RSU' and 'SESSION sql_log_bin = 0' cases the galera node to hang
Found two bugs

(1) have_committing_connections was missing mutex unlock on one
exit case. As this function is called on a loop it caused mutex
lock when we already owned the mutex. This could cause hang.

(2) wsrep_RSU_begin did set up error code when partition to
be dropped could not be MDL-locked because of concurrent
operations but wrong error code was propagated to upper layer
causing error to be ignored. This could have also caused
the hang.
2019-12-09 08:14:39 +02:00
Oleksandr Byelkin
008ee867a4 Merge branch '10.2' into 10.3 2019-12-04 17:46:28 +01:00
Oleksandr Byelkin
f8b5e147da Merge branch '10.1' into 10.2 2019-12-03 14:45:06 +01:00
Jan Lindström
88073dae79 MDEV-21198 : Galera test failure on galera_var_notify_cmd
Add proper wsrep sync wait.
2019-12-03 08:04:46 +02:00
Jan Lindström
9d9a2253c6 Merge remote-tracking branch 10.2 into 10.3
Conflicts:
	mysql-test/suite/galera/t/galera_binlog_event_max_size_max-master.opt
	mysql-test/suite/innodb/r/innodb-mdev-7513.result
	mysql-test/suite/innodb/t/innodb-mdev-7513.test
	mysql-test/suite/wsrep/disabled.def
	storage/innobase/ibuf/ibuf0ibuf.cc
2019-12-02 14:35:10 +02:00
Jan Lindström
c6ed37b88a MDEV-21182: Galera test failure on MW-284
galera_2nodes.cnf did not contain wsrep_on=1 on correct places. Fixed
restart options to use correct configuration.
2019-11-30 13:52:49 +02:00
seppo
38839854b7 MDEV-19572 async slave node fails to apply MyISAM only writes (#1418)
The problem happens when MariaDB master replicates writes for only non InnoDB
tables (e.g. writes to MyISAM table(s)). Async slave node, in Galera cluster,
can apply these writes successfully, but it will, in the end, write gtid position in
mysql.gtid_slave_pos table. mysql.gtid_slave_pos table is InnoDB engine, and
this write makes innodb handlerton part of the replicated "transaction".
Note that wsrep patch identifies that write to gtid_slave_pos should not be replicated
and skips appending wsrep keys for these writes. However, as InnoDB was present
in the transaction, and there are replication events (for MyISAM table) in transaction
cache, but there are no appended keys, wsrep raises an error, and this makes the söave
thread to stop.

The fix is simply to not treat it as an error if async slave tries to replicate a write
set with binlog events, but no keys. We just skip wsrep replication and return successfully.

This commit contains also a mtr test which forces mysql.gtid_slave_pos table isto be
of InnoDB engine, and executes MyISAM only write through asyn replication.

There is additional fix for declaring IO and background slave threads as non wsrep.
These threads should not write anything for wsrep replication, and this is just a safeguard
to make sure nothing leaks into cluster from these slave threads.
2019-11-26 08:49:50 +02:00
Aleksey Midenkov
0c05a2ed71 Merge 10.4 into 10.5 2019-11-25 17:24:09 +03:00
seppo
4111a53079 MDEV-21096 async slave crash with gtid_log_pos table access (#1413)
The original crash happened when async replication IO thread was updating mysql.gtid_slave_pos table. Operations on this table should remain node local, but it appears that protection (THD::wsrep_ignore_table flag) to prevent wsrep replication for this table mas missing for innodb write_row() and update_row().
It was somewhat difficult to reproduce the issue, because mtr seems to create the affected table mysql.gtid_log_pos as of Aria engine type, and Aria engine operations will not be replicated anyhow. It looks, though, that in release installation, mysql.gtid_slave_pos table is of InnoDB engine.
It was possible to trigger somewhat related problem by running test galera.galera_as_slave_gtid with configuration: gtid_pos_auto_engines=InnoDB. However, this test mode, causes earlier crash when replication background thread creates aditional table: mysql.gtid_slave_pos_InnoDB, and this table create triggered wsrep TOI replication, which also failed for assertion. Actually, async replication IO and background threads should not replicate anything to cluster.

This pull request contains new test galera.galera_as_slave_gtid_auto_engine, which basically just runs galera.galera_as_slave_gtid with configuration of gtid_pos_auto_engines=InnoDB.
Test galera.galera_as_slave_gtid is also modified for better code reuse.
Actual fix for MDEV-21096 is in storage/innobase/handler/ha_innodb.cc, where THD::wsrep_ignore_table flag is now honored before wsrep key population.
There is additional fix in sql/service_wsrep.cc where async replication IO and background threads are marked as non-local. This fences these threads out of wsrep replication altogether. Note that this change, actually makes the use of THD::wsrep_ignore-table redundant. We may want to refactor THD::wsrep_ignore_table out in the future, if there is no other use case for it in sight.
2019-11-25 11:19:33 +02:00
Jan Lindström
c6b097ab37 Remove excessive sleep from test. 2019-11-18 15:22:01 +02:00
seppo
5c68343db7 MDEV-18497 CTAS async replication from mariadb master crashes galera nodes (#1410)
This PR contains a mtr test for reproducing a failure with replicating create table as select statement (CTAS) through asynchronous mariadb replication to mariadb galera cluster.
The problem happens when CTAS replication contains both create table statement followed by row events for populating the table. In such situation, the galera node operating as mariadb replication slave, will first replicate only the create table part into the cluster, and then perform another replication containing both the create table and row events. This will lead all other nodes to fail for duplicate table create attempt, and crash due to this failure.

PR contains also a fix, which identifies the situation when CTAS has been replicated, and makes further scan in async replication stream to see if there are following row events. The slave node will replicate either single TOI in case the CTAS table is empty, or if CTAS table contains rows, then single bundled write set with create table and row events is replicated to galera cluster.

This fix should keep master server's GTID's for CTAS replication in sync with GTID's in galera cluster.
2019-11-18 15:18:00 +02:00
Marko Mäkelä
613e9e7d4d MDEV-20907 Set innodb_log_files_in_group=1 by default
Historically, InnoDB split the redo log into at least 2 files.
MDEV-12061 allowed the minimum to be innodb_log_files_in_group=1,
but it kept the default at innodb_log_files_in_group=2.

Because performance seems to be slightly better with only one log file,
and because implementing an append-only variant of the log would require
a single file, let us define the default to be 1, and have
innodb_log_file_size=96M, to retain the same default total size.
2019-10-28 17:11:10 +02:00
Marko Mäkelä
3043f38436 Merge 10.4 into 10.5 2019-10-28 17:10:34 +02:00
Jan Lindström
82f22d2f25 MDEV-18590 galera.versioning_trx_id: Test failure: mysqltest: Result content mismatch
Ignore warning.
2019-10-23 10:12:53 +03:00
seppo
421d52e896 MDEV-6860 Parallel async replication hangs (#1400)
Instrumenting parallel slave worker thread with wsrep replication hooks.
Added mtr test for testing parallel slave support.
The test is based on the test attached in MDEV-6860 jira tracker.
2019-10-16 07:51:36 +03:00
Marko Mäkelä
d04f2de80a Merge 10.4 into 10.5 2019-10-11 08:41:36 +03:00
Jan Lindström
c339487030 Try to fix galera_parallel_simple test case. 2019-10-07 08:47:42 +03:00
Jan Lindström
fe4f766e81 Add wait_condition to wait that node returns to ready state before
accessing it.
2019-10-04 14:45:25 +03:00
Marko Mäkelä
627027a674 Merge 10.4 into 10.5 2019-10-04 10:56:47 +03:00
Jan Lindström
eb0a10b072 Add missing have_debug to galera.MDEV-20225 test case. 2019-10-03 07:03:15 +03:00
Jan Lindström
97d82c3429 galera_sp_bf_abort requires debug Galera library. 2019-10-01 12:40:29 +03:00
seppo
c42c4233cb MDEV-20225 BF aborting SP execution (#1394)
* MDEV-20225 BF aborting SP execution

When stored procedure execution was chosen as victim for a BF abort, the old implemnetationn called for rollback immediately
when execution was inside SP isntruction. Technically this happened in wsrep_after_statement() call, which identified the
need for a rollback.
The problem was that MariaDB does not accept rollback (nor commit) inside sub statement, there are several asserts about it,
checking for THD::in_sub_stmt.

This patch contains a fix, which skips calling wsrep_after_statement() for SP execution, which is marked as BF must abort. Instead,
we return error code to upper level, where rollback will eventually happen, ouside of SP execution.
Also, appending the affected trigger table (dropped or created) in the populated key set for the write set,
which prevents parallel applying of other transactions working on the same table.

* MDEV-20225 BF aborting SP execution, second patch

First PR missed 4 commits, which are now squashed in this patch:
- Added galera_sp_bf_abort test.
  A MTR test case which will reproduce BF-BF conflict if all keys
  corresponding to affected tables are not assigned for DROP TRIGGER.
- Fixed incorrect use of sync pointsin MDEV-20225
- Added condition for SQLCOM_DROP_TRIGGER in wsrep_can_run_in_toi()
  to make it replicate.

* MDEV-20225 BF aborting SP execution, third patch

The galera_trigger.test caused a situation, where SP invocation caused a trigger
to fire, and the trigger executed as sub statement SP, and was BF aborted by applier.
because of wsrep_after_statement() was called for the sub-statement level, it ended up
in exeuting rollback and asserted there.
Thus fix will catch sub-statement level SP execution, and avoids calling wsrep_after_statement()
2019-10-01 10:41:33 +03:00
Marko Mäkelä
1333da90b5 Merge 10.4 into 10.5 2019-09-24 10:07:56 +03:00
Marko Mäkelä
5a92ccbaea Merge 10.3 into 10.4
Disable MDEV-20576 assertions until MDEV-20595 has been fixed.
2019-09-23 17:35:29 +03:00
Marko Mäkelä
c016ea660e Merge 10.2 into 10.3 2019-09-23 10:25:34 +03:00
Leandro Pacheco
efefafd02f fix for thread getting stuck after BF ABORT (#1362)
- Fixes a situation in which a thread gets BF aborted and does not send the reply back to
  the client, even though the connection is still alive. That caused
  both sides to hang waiting for the next message. Now we explicitly
  check that the connection is still alive.
- MTR test for the above
- Replaced thd->killed assignments to thd->reset_kill_query where applicable.
2019-09-17 10:58:20 +03:00
Marko Mäkelä
46a6cea5c5 Merge 10.4 into 10.5 2019-09-17 09:07:52 +03:00
Teemu Ollakka
509d103810 MDEV-20561 Galera node shutdown fails in non-Primary (#1386)
Command COM_SHUTDOWN was rejected in non-Primary because
server_command_flags[COM_SHUTDOWN] had value CF_NO_COM_MULTI
instead of CF_SKIP_WSREP_CHECK.

As a fix removed assignment
server_command_flags[CF_NO_COM_MULTI]= CF_NO_COM_MULTI
which overwrote server_command_flags[COM_SHUTDOWN].
2019-09-15 11:24:57 +03:00
Jan Lindström
2a98d0b5ca Fix #2 for galera.MW-336 2019-09-14 12:15:01 +03:00
Jan Lindström
8711f5505a Fix galera.galera_sst_mysqldump_with_key test case for 10.4 2019-09-14 11:48:41 +03:00
Jan Lindström
3422c13ab7 Try to fix galera.MW-336 test case. 2019-09-13 11:53:07 +03:00
Teemu Ollakka
40beeb1402 MDEV-20561 Galera node shutdown fails in non-Primary (#1386)
Command COM_SHUTDOWN was rejected in non-Primary because
server_command_flags[COM_SHUTDOWN] had value CF_NO_COM_MULTI
instead of CF_SKIP_WSREP_CHECK.

As a fix removed assignment
server_command_flags[CF_NO_COM_MULTI]= CF_NO_COM_MULTI
which overwrote server_command_flags[COM_SHUTDOWN].
2019-09-13 09:18:11 +03:00
Marko Mäkelä
d28686ada6 Merge 10.4 into 10.5 2019-09-12 16:36:46 +03:00