After the merging of MDEV-24915, 10.6 branch has regressions with handling of
concurrent write load against two or more cluster nodes. These regressions may
surface as cluster hanging, node crashes or data inconsistency. With some test
scenarios, the only visible symptom could be that the BF victim aborting happens
only by innodb lock wait timeout expiration. This would result only to poor
performance (by default 50 sec hang for each BF conflict), and could be somewhat
difficult to diagnose.
This pull request has following fixes to handle concurrent write load from
multiple nodes:
In lock_wait_wsrep_kill(), the victim trx was expected to be only in
TRX_STATE_ACTIVE state. With the delayed BF conflict handling, it can happen
that victim has advanced into pre commit state. This was fixed by choosing
victim both in TRX_STATE_ACTIVE and TRX_STATE_PREPARED states.
Victim transaction may be in several different states at the time of detected
lock conflict, and due to delayed BF aborting practice in MDEV-24915, the victim
may advance further before the actual BF aborting takes place. The BF aborting
in MDEV-24915 did not wake the victim, if it was in the state of waiting for
some other lock (than the one that was blocking the high priority thread).
This anomaly caused the innodb lock wait timeout expiration delays and poor
performance symptom. To fix this, lock_wait_wsrep_kill() now looks if
victim is in lock waiting state, and uses lock_cancel_waiting_and_release()
to cancel this lock wait.
wsrep_bf_abort() checks if the victim has active transaction (in wsrep-lib),
and starts a new transaction if there was no active transaction before.
Due to late BF aborting, the victim may have e.g. failed in certification
and is already aborting or has aborted at this stage. This has caused
problems in testing where BF aborter tries to BF abort himself.
The fix in wsrep_bf_abort() now skips the BF abort, if victim is aborting
or has aborted. Victim may not have started transaction yet in wsrep context,
but it may have acquired MDL locks (due to DDL execution), and this has
caused BF conflict. Such case does not require aborting in wsrep or
replication provider state.
BF aborting could cause BF-BF conflict scenario, if victim was already aborted
and changed to replayer having high priority as well. This BF-BF conflict
scenario is now avoided in lock_wait_wsrep() where we now check if blocking
lock holder is also high priority and is ordered before, caller should wait
for the lock in this situation.
The natural innodb deadlock resolving algorithm could pick BF thread as
deadlock victim. This is fixed by giving max weigh to BF threads in
Deadlock::report().
MDEV-24341 has changed excution paths in do_command() and this affects BF
aborted victim execution. This PR fixes one assert in do_command():
DBUG_ASSERT(!thd->async_state.pending_ops())
Which fired if the thd was BF aborted earlier. This assert is now changed
to allow pending_ops() if thd was BF aborted before.
With these fixes, long term highly conflicting write load could be run against
to node cluster. If binlogging is configured, log_slave_updates should be
also set.
Introduced two new wsrep_mode options
* REPLICATE_MYISAM
* REPLICATE_ARIA
Depracated wsrep_replicate_myisam parameter and we use
wsrep_mode = REPLICATE_MYISAM instead.
This required small refactoring of wsrep_check_mode_after_open_table
so that both MyISAM and Aria are handled on required DML cases.
Similarly, added Aria to wsrep_should_replicate_ddl to handle DDL
for Aria tables using TOI. Added test cases and improved MyISAM testing.
Changed use of wsrep_replicate_myisam to wsrep_mode = REPLICATE_MYISAM
There were multiple problems here
* wsrep_trx_fragment_size should not be set when wsrep is disabled or provider is not loaded
* wsrep_trx_fragment_unit should not be set when wsrep is disabled or provider is not loaded
* wsrep_debug has no effect if wsrep is disabled or provider is not loaded
* wsrep_start_position should not be set when wsrep is disabled or provider is not loaded any other value than default
* wsrep_start_position should be changed only when we are joiner or initialized
* wsrep_start_position should be allowed to set only a value that exits, thus
we need to add error handling to wsrep_sst_complete
A wsrep transaction was started for EXECUTE IMMEDIATE, which
caused assertion failure when the executed statement was
CREATE TABLE which should be executed in TOI mode.
As a fix, don't start wsrep transaction for EXECUTE IMMEDIATE
to let the wsrep state logic to be handled from inside stored
procedure codepath.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
Change xarecover_handlerton so that transaction with WSREP prefixed
xids are rolled back when Galera is disabled.
Reviewd-by: Jan Lindström <jan.lindstrom@mariadb.com>
Some invalid wsrep_provider paths may be interpreted as a valid
directory. For example '/invalid/libgalera_smm.so' with UTF character
set is interpreted as '/', which is a valid directory. A early check
that wsrep_provider should not be a directory fixes it.
When binlog is disabled, WSREP will not behave correctly when
SAVEPOINT ROLLBACK is executed since we don't register handlers for such case.
Fixed by registering WSREP handlerton for SAVEPOINT related commands.
We need to release global system variables mutex before
doing wsrep_init to avoid race with next show status and
we need to save wsrep_on value as it is changed on wsrep_init.
Added test case.
For MDEV-15955, the fix in create_tmp_field_from_item() would cause a
compilation error. After a discussion with Alexander Barkov, the fix
was omitted and only the test case was kept.
In 10.3 and later, MDEV-15955 is fixed properly by overriding
create_tmp_field() in Item_func_user_var.
There were two problems:
(1) If user wanted same time zone information on all nodes in the Galera
cluster all updates were not replicated as time zone information was
stored on MyISAM tables. This is fixed on Galera by altering time zone
tables to InnoDB while they are modified.
(2) If user wanted different time zone information to nodes in the Galera
cluster TRUNCATE TABLE for time zone tables was replicated by Galera
destroying time zone information from other nodes. This is fixed
on Galera by introducing new option for mysql_tzinfo_to_sql_symlink
tool --skip-write-binlog to disable Galera replication while
time zone tables are modified.
Changes to be committed:
modified: mysql-test/r/mysql_tzinfo_to_sql_symlink.result
modified: mysql-test/suite/wsrep/r/mysql_tzinfo_to_sql_symlink.result
new file: mysql-test/suite/wsrep/r/mysql_tzinfo_to_sql_symlink_skip.result
new file: mysql-test/suite/wsrep/t/mysql_tzinfo_to_sql_symlink_skip.test
modified: sql/tztime.cc
This is 10.4 version of commit fa74088838
There were two problems:
(1) If user wanted same time zone information on all nodes in the Galera
cluster all updates were not replicated as time zone information was
stored on MyISAM tables. This is fixed on Galera by altering time zone
tables to InnoDB while they are modified.
(2) If user wanted different time zone information to nodes in the Galera
cluster TRUNCATE TABLE for time zone tables was replicated by Galera
destroying time zone information from other nodes. This is fixed
on Galera by introducing new option for mysql_tzinfo_to_sql_symlink
tool --skip-write-binlog to disable Galera replication while
time zone tables are modified.
Changes to be committed:
modified: mysql-test/r/mysql_tzinfo_to_sql_symlink.result
modified: mysql-test/suite/wsrep/r/mysql_tzinfo_to_sql_symlink.result
new file: mysql-test/suite/wsrep/r/mysql_tzinfo_to_sql_symlink_skip.result
new file: mysql-test/suite/wsrep/t/mysql_tzinfo_to_sql_symlink_skip.test
modified: sql/tztime.cc