1
0
mirror of https://github.com/MariaDB/server.git synced 2025-11-12 10:22:39 +03:00
Commit Graph

144 Commits

Author SHA1 Message Date
sjaakola
a1e70388c4 MDEV-24966 Galera multi-master regression
After the merging of MDEV-24915, 10.6 branch has regressions with handling of
concurrent write load against two or more cluster nodes. These regressions may
surface as cluster hanging, node crashes or data inconsistency. With some test
scenarios, the only visible symptom could be that the BF victim aborting happens
only by innodb lock wait timeout expiration. This would result only to poor
performance (by default 50 sec hang for each BF conflict), and could be somewhat
difficult to diagnose.

This pull request has following fixes to handle concurrent write load from
multiple nodes:

In lock_wait_wsrep_kill(), the victim trx was expected to be only in
TRX_STATE_ACTIVE state. With the delayed BF conflict handling, it can happen
that victim has advanced into pre commit state. This was fixed by choosing
victim both in TRX_STATE_ACTIVE and TRX_STATE_PREPARED states.

Victim transaction may be in several different states at the time of detected
lock conflict, and due to delayed BF aborting practice in MDEV-24915, the victim
may advance further before the actual BF aborting takes place. The BF aborting
in MDEV-24915 did not wake the victim, if it was in the state of waiting for
some other lock (than the one that was blocking the high priority thread).
This anomaly caused the innodb lock wait timeout expiration delays and poor
performance symptom. To fix this, lock_wait_wsrep_kill() now looks if
victim is in lock waiting state, and uses lock_cancel_waiting_and_release()
to cancel this lock wait.

wsrep_bf_abort() checks if the victim has active transaction (in wsrep-lib),
and starts a new transaction if there was no active transaction before.
Due to late BF aborting, the victim may have e.g. failed in certification
and is already aborting or has aborted at this stage. This has caused
problems in testing where BF aborter tries to BF abort himself.
The fix in wsrep_bf_abort() now skips the BF abort, if victim is aborting
or has aborted. Victim may not have started transaction yet in wsrep context,
but it may have acquired MDL locks (due to DDL execution), and this has
caused BF conflict. Such case does not require aborting in wsrep or
replication provider state.

BF aborting could cause BF-BF conflict scenario, if victim was already aborted
and changed to replayer having high priority as well. This BF-BF conflict
scenario is now avoided in lock_wait_wsrep() where we now check if blocking
lock holder is also high priority and is ordered before, caller should wait
for the lock in this situation.

The natural innodb deadlock resolving algorithm could pick BF thread as
deadlock victim. This is fixed by giving max weigh to BF threads in
Deadlock::report().

MDEV-24341 has changed excution paths in do_command() and this affects BF
aborted victim execution. This PR fixes one assert in do_command():
 DBUG_ASSERT(!thd->async_state.pending_ops())
Which fired if the thd was BF aborted earlier. This assert is now changed
to allow pending_ops() if thd was BF aborted before.

With these fixes, long term highly conflicting write load could be run against
to node cluster. If binlogging is configured, log_slave_updates should be
also set.
2021-04-13 14:58:54 +03:00
Jan Lindström
27d66d644c MENT-411 : Implement wsrep_replicate_aria
Introduced two new wsrep_mode options
* REPLICATE_MYISAM
* REPLICATE_ARIA

Depracated wsrep_replicate_myisam parameter and we use
wsrep_mode = REPLICATE_MYISAM instead.

This required small refactoring of wsrep_check_mode_after_open_table
so that both MyISAM and Aria are handled on required DML cases.
Similarly, added Aria to wsrep_should_replicate_ddl to handle DDL
for Aria tables using TOI. Added test cases and improved MyISAM testing.
Changed use of wsrep_replicate_myisam to wsrep_mode = REPLICATE_MYISAM
2021-02-25 07:47:51 +02:00
Sergei Golubchik
e841957416 Merge branch '10.3' into 10.4 2021-02-23 09:25:57 +01:00
Sergei Golubchik
0ab1e3914c Merge branch '10.2' into 10.3 2021-02-22 22:42:27 +01:00
Sergei Golubchik
a638f1577a Merge branch 'bb-10.2-release' into 10.2 2021-02-22 18:43:03 +01:00
Sergei Golubchik
53123dfa3e Merge branch 'bb-10.3-release' into bb-10.4-release 2021-02-19 00:19:42 +01:00
Sergei Golubchik
0d55b020e1 Merge branch 'bb-10.2-release' into bb-10.3-release 2021-02-18 22:09:53 +01:00
Sergei Golubchik
ce3a2a688d make @@wsrep_provider and @@wsrep_notify_cmd read-only
this should simplify run-time cluster management
2021-02-18 19:03:01 +01:00
Jan Lindström
4d300ab1a8 MDEV-24867 : wsrep.variables MTR failed: Result length mismatch
Stabilize test case.
2021-02-17 10:28:37 +02:00
Jan Lindström
be5fce16a0 MDEV-24596 : Assertion `state_ == s_exec || state_ == s_quitting' failed in wsrep::client_state::disable_streaming
There were multiple problems here
* wsrep_trx_fragment_size should not be set when wsrep is disabled or provider is not loaded
* wsrep_trx_fragment_unit should not be set when wsrep is disabled or provider is not loaded
* wsrep_debug has no effect if wsrep is disabled or provider is not loaded
* wsrep_start_position should not be set when wsrep is disabled or provider is not loaded any other value than default
* wsrep_start_position should be changed only when we are joiner or initialized
* wsrep_start_position should be allowed to set only a value that exits, thus
we need to add error handling to wsrep_sst_complete
2021-01-21 11:41:29 +02:00
Marko Mäkelä
4b3690b504 fixup 67cb7ea22a 2020-11-03 14:48:08 +02:00
Jan Lindström
67cb7ea22a Clean up wsrep.variables 2020-11-03 09:12:06 +02:00
Jan Lindström
2391582ec3 Merge remote-tracking branch 10.2 into 10.3 2020-11-03 09:00:23 +02:00
Jan Lindström
94859d985e Clean up wsrep.variables 2020-11-03 08:49:10 +02:00
Teemu Ollakka
ec0e9d6f76 MDEV-22681 EXECUTE IMMEDIATE crashes server if wsrep is on.
A wsrep transaction was started for EXECUTE IMMEDIATE, which
caused assertion failure when the executed statement was
CREATE TABLE which should be executed in TOI mode.

As a fix, don't start wsrep transaction for EXECUTE IMMEDIATE
to let the wsrep state logic to be handled from inside stored
procedure codepath.

Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
2020-10-28 09:51:35 +02:00
Marko Mäkelä
45e10d46d8 After-merge fix to wsrep.variables
We must make the test depend on a debug version of Galera.
2020-10-22 13:50:06 +03:00
Marko Mäkelä
46957a6a77 Merge 10.3 into 10.4 2020-10-22 13:27:18 +03:00
Marko Mäkelä
e3d692aa09 Merge 10.2 into 10.3 2020-10-22 08:26:28 +03:00
Daniele Sciascia
fdf87973cb MDEV-23081 Stray XA transactions at startup, with wsrep_on=OFF
Change xarecover_handlerton so that transaction with WSREP prefixed
xids are rolled back when Galera is disabled.

Reviewd-by: Jan Lindström <jan.lindstrom@mariadb.com>
2020-10-21 16:29:07 +03:00
Jan Lindström
fc3b5c7db3 MDEV-17585 : wsrep.variables failed in buildbot with deadlock on CREATE USER
Stabilize test by using correct galera library and restore
original galera cluster at end.
2020-10-10 08:50:50 +03:00
Daniele Sciascia
7edfb72eff Fix MTR test wsrep.variables
Require galera_have_debug_sync.inc and re-record to include new
variables exposed by latest galera library.
2020-09-28 12:17:05 +03:00
Jan Lindström
b69e980a38 MDEV-20581 Fix MTR test wsrep.variables
Made the test work with --repeat option by adding
galera_wait_ready.inc at the end of test. Recorded the test output.
2020-09-14 18:21:17 +03:00
Daniele Sciascia
f8bf5b0f84 MDEV-23466 SIGABRT on SELECT WSREP_LAST_SEEN_GTID
SELECT WSREP_LAST_SEEN_GTID aborts the server if no provider is
loaded.
2020-08-19 13:12:00 +03:00
Daniele Sciascia
fe3284b2cc MDEV-23092 SIGABRT when setting invalid wsrep_provider
Some invalid wsrep_provider paths may be interpreted as a valid
directory. For example '/invalid/libgalera_smm.so' with UTF character
set is interpreted as '/', which is a valid directory. A early check
that wsrep_provider should not be a directory fixes it.
2020-08-19 13:12:00 +03:00
Daniele Sciascia
09dd06f14a MDEV-22443 wsrep::runtime_error on START TRANSACTION
This happens with global wsrep_on disabled and local wsrep_on enabled.
The fix consists in avoiding sync wait when global wsrep_on is
disabled.
2020-08-19 13:12:00 +03:00
Marko Mäkelä
84db10f27b Merge 10.2 into 10.3 2020-04-15 09:56:03 +03:00
mkaruza
2d16452a31 MDEV-22021: Galera database could get inconsistent with rollback to savepoint
When binlog is disabled, WSREP will not behave correctly when
SAVEPOINT ROLLBACK is executed since we don't register handlers for such case.
Fixed by registering WSREP handlerton for SAVEPOINT related commands.
2020-03-31 09:59:37 +03:00
Jan Lindström
46386661a2 MDEV-20625 : MariaDB asserting when enabling wsrep_on
We need to release global system variables mutex before
doing wsrep_init to avoid race with next show status and
we need to save wsrep_on value as it is changed on wsrep_init.
Added test case.
2020-02-04 11:14:21 +02:00
Julius Goryavsky
93278ee8ad MDEV-20625: MariaDB asserting when enabling wsrep=on 2020-02-04 08:56:03 +02:00
Marko Mäkelä
87a61355e8 Merge 10.3 into 10.4
The MDEV-17062 fix in commit c4195305b2
was omitted.
2020-01-20 15:49:48 +02:00
Sergei Petrunia
e709eb9bf7 Merge branch '10.2' into 10.3
# Conflicts:
#	mysql-test/suite/galera/r/MW-388.result
#	mysql-test/suite/galera/t/MW-388.test
#	mysql-test/suite/innodb/r/truncate_inject.result
#	mysql-test/suite/innodb/t/truncate_inject.test
#	mysql-test/suite/rpl/r/rpl_stop_slave.result
#	mysql-test/suite/rpl/t/rpl_stop_slave.test
#	sql/sp_head.cc
#	sql/sp_head.h
#	sql/sql_lex.cc
#	sql/sql_yacc.yy
#	storage/xtradb/buf/buf0dblwr.cc
2020-01-17 00:46:40 +03:00
Jan Lindström
a382f69e24 MDEV-21498 : wsrep.binlog_format test failed on Azure
Waiting wsrep_ready is possible only if wsrep_on=ON.
2020-01-16 08:46:45 +02:00
Marko Mäkelä
d60dcabd0f Merge 10.3 into 10.4 2020-01-07 13:23:41 +02:00
Marko Mäkelä
eda719793a Merge 10.2 into 10.3 2020-01-07 12:14:35 +02:00
Jan Lindström
5824e9f8df MDEV-13569: wsrep_info.plugin failed in buildbot with "no nodes coming from prim view
Modify configuration so that all nodes are part of galera cluster
i.e. wsrep_on=ON. Add missing wait conditions.

test changes only.
2020-01-07 08:57:30 +02:00
Marko Mäkelä
5ab70e7f68 Merge 10.2 into 10.3 2019-12-27 15:14:48 +02:00
Marko Mäkelä
73985d8301 Merge 10.1 into 10.2 2019-12-23 07:14:51 +02:00
Jan Lindström
e9259d50f0 MDEV-21335 : Galera test failure on suite wsrep
Problem was that wsrep_on was OFF.
2019-12-18 10:02:57 +02:00
Jan Lindström
088de81d96 MDEV-21335 : Galera test failure on suite wsrep
Problem was that wsrep_on was OFF.

This is 10.4 version.
2019-12-18 08:22:07 +02:00
Oleksandr Byelkin
a15234bf4b Merge branch '10.3' into 10.4 2019-12-09 15:09:41 +01:00
Jan Lindström
2b7e461cc0 MDEV-21209 : mysql_tzinfo_to_sql's Galera checks do not work
wsrep_on parameter can be visible even when wsrep_on is set OFF
so we need to check variable_value from I_S also.
2019-12-05 12:41:13 +02:00
Jan Lindström
9d9a2253c6 Merge remote-tracking branch 10.2 into 10.3
Conflicts:
	mysql-test/suite/galera/t/galera_binlog_event_max_size_max-master.opt
	mysql-test/suite/innodb/r/innodb-mdev-7513.result
	mysql-test/suite/innodb/t/innodb-mdev-7513.test
	mysql-test/suite/wsrep/disabled.def
	storage/innobase/ibuf/ibuf0ibuf.cc
2019-12-02 14:35:10 +02:00
Jan Lindström
c6ed37b88a MDEV-21182: Galera test failure on MW-284
galera_2nodes.cnf did not contain wsrep_on=1 on correct places. Fixed
restart options to use correct configuration.
2019-11-30 13:52:49 +02:00
Marko Mäkelä
ec40980ddd Merge 10.3 into 10.4 2019-11-01 15:23:18 +02:00
Jan Lindström
6cdde9ebbf MDEV-20836 : Galera test failure on wsrep.variables
Add one more wait to make sure all threads have been started.
2019-10-16 13:01:40 +03:00
Jan Lindström
57b666b2e5 Fix test case wsrep.mdev_6832 we need to wait until wsrep_ready
is ON before test can end.
2019-10-08 15:04:39 +03:00
Marko Mäkelä
32ec5fb979 Merge 10.2 into 10.3 2019-08-21 15:23:45 +03:00
Marko Mäkelä
48c67038b9 Merge 10.1 into 10.2
For MDEV-15955, the fix in create_tmp_field_from_item() would cause a
compilation error. After a discussion with Alexander Barkov, the fix
was omitted and only the test case was kept.

In 10.3 and later, MDEV-15955 is fixed properly by overriding
create_tmp_field() in Item_func_user_var.
2019-08-20 09:15:28 +03:00
Jan Lindström
e6b505fd3c MDEV-18778: mysql_tzinfo_to_sql does not work correctly in MariaDB Galera
There were two problems:

(1) If user wanted same time zone information on all nodes in the Galera
cluster all updates were not replicated as time zone information was
stored on MyISAM tables. This is fixed on Galera by altering time zone
tables to InnoDB while they are modified.

(2) If user wanted different time zone information to nodes in the Galera
cluster TRUNCATE TABLE for time zone tables was replicated by Galera
destroying time zone information from other nodes. This is fixed
on Galera by introducing new option for mysql_tzinfo_to_sql_symlink
tool --skip-write-binlog to disable Galera replication while
time zone tables are modified.

Changes to be committed:
        modified:   mysql-test/r/mysql_tzinfo_to_sql_symlink.result
        modified:   mysql-test/suite/wsrep/r/mysql_tzinfo_to_sql_symlink.result
        new file:   mysql-test/suite/wsrep/r/mysql_tzinfo_to_sql_symlink_skip.result
        new file:   mysql-test/suite/wsrep/t/mysql_tzinfo_to_sql_symlink_skip.test
        modified:   sql/tztime.cc

This is 10.4 version of commit fa74088838
2019-08-16 07:07:31 +03:00
Jan Lindström
fa74088838 MDEV-18778: mysql_tzinfo_to_sql does not work correctly in MariaDB Galera
There were two problems:

(1) If user wanted same time zone information on all nodes in the Galera
cluster all updates were not replicated as time zone information was
stored on MyISAM tables. This is fixed on Galera by altering time zone
tables to InnoDB while they are modified.

(2) If user wanted different time zone information to nodes in the Galera
cluster TRUNCATE TABLE for time zone tables was replicated by Galera
destroying time zone information from other nodes. This is fixed
on Galera by introducing new option for mysql_tzinfo_to_sql_symlink
tool --skip-write-binlog to disable Galera replication while
time zone tables are modified.

Changes to be committed:
	modified:   mysql-test/r/mysql_tzinfo_to_sql_symlink.result
	modified:   mysql-test/suite/wsrep/r/mysql_tzinfo_to_sql_symlink.result
	new file:   mysql-test/suite/wsrep/r/mysql_tzinfo_to_sql_symlink_skip.result
	new file:   mysql-test/suite/wsrep/t/mysql_tzinfo_to_sql_symlink_skip.test
	modified:   sql/tztime.cc
2019-08-16 07:01:30 +03:00