Issue:
The Commit: 0584846 'Add TL_FIRST_WRITE in SQL layer for determining R/W'
limits INSERT statements on write nodes to acquire MDL locks on it's all child
tables and thereby wsrep certification keys added, but on applier nodes it does
acquire MDL locks for all child tables. This can result into MDL BF-BF conflict
on applier node when transactions referring to parent and child tables are
executed concurrently. For example:
Tables with foreign keys: t1<-t2<-t3<-t4
Conflicting transactions: INSERT t1 and DROP TABLE t4
Wsrep certification keys taken on write node:
- for INSERT t1: t1 and t2
- for DROP TABLE t4: t4
On applier node MDL BF-BF conflict happened between two transaction because
MDL locks on t1, t2, t3 and t4 were taken for INSERT t1, which conflicted
with MDL lock on t4 taken by DROP TABLE t4.
The Wsrep certification keys helps in resolving this MDL BF-BF conflict by
prioritizing and scheduling concurrent transactions. But to generate Wsrep
certification keys it needs to open and take MDL locks on all the child tables.
The Commit: 0584846 change limits MDL lock to be taken on all child nodes for
read-only FK checks (INSERT t1). But this doesn't works on applier nodes
because Write_rows_log_event event logged for INSERT is also used to record
update (check Write_rows_log_event::get_trg_event_map()), and
therefore MDL locks is taken for all the child tables on applier node
for update and insert event.
Solution:
Additional keys for the referenced/foreign table needs to be added
to avoid potential MDL conflicts with concurrent update and DDLs.
Problem was that for partitioned tables base table storage engine
is DB_TYPE_PARTITION_DB and naturally different than DB_TYPE_INNODB
so operation was not allowed in Galera.
Fixed by requesting implementing storage engine for partitioned
tables i.e. table->file->partition_ht() or if that does not exist
we can use base table storage engine. Resulting storage engine
type is then used on condition is operation allowed when
wsrep_mode=DISALLOW_LOCAL_GTID or not. Operations to InnoDB
storage engine i.e DB_TYPE_INNODB should be allowed.
A new Galera feature that allows retrying of applying of writesets at
slave nodes (codership/mysql-wsrep-bugs/#1619). Currently replication
applying stops for first non ignored failure occurring in event
applying, and node will do emergency abort (or start inconsistency
voting). Some failures, however, can be concurrency related, and
applying may succeed if the operation is tried at later time.
This feature introduces a new dynamic global option variable
"wsrep_applier_retry_count" that controls the retry-applying feature:
a zero value disables retrying and a positive value sets the maximum
number of retry attempts. The default value for this option is zero,
which means that this feature is disabled by default.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
main/statistics_json.result is updated for f8ba5ced55 (MDEV-36099)
The test uses 'delete from t1' in many places and then populates
the table again. The natural order of rows in a MyISAM table is well
defined and the test was implicitly relying on that.
before f8ba5ced55 delete was deleting rows one by one, using
ha_myisam::delete_row() because the connection was stuck in rbr mode.
This caused rows to be shown in the reverse insertion order (because of
the delete link list).
MDEV-36099 fixes this bug and the server now correctly uses
ha_myisam::delete_all_rows(). This makes rows to be shown in the
insertion order as expected.
Cluster configuration was incorrect e.g. wsrep_node_address
was missing. Therefore, Galera replication was not properly
initialized and TOI is not supported.
Fix is to check when user tries to start Galera replication
with wsrep_on=ON that Galera replication is properly
initialized and node is ready to receive operations. If
Galera replication is not properly initialized return
a error.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
When replicating transactions from parallel slave replication
processing, Galera must respect the commit order of the parallel
slave replication. In the current implementation this is done by
calling `wait_for_prior_commit()` before the write set is
replicated and certified in before-prepare processing. This
however establishes a critical section which is held over
whole Galera replication step, and the commit rate will be
limited by Galera replication latency.
In order to allow concurrency in Galera replication step, the
critical section must be released at earliest point where Galera
can guarantee sequential consistency for replicated write sets.
This change passes a callback to release the critical section
by calling `wakeup_subsequent_commits()` to Galera library, which will
call the callback once the correct replication order can be established.
This functionality will be available from Galera 26.4.22 onwards.
Note that call to `wakeup_subsequent_commits()` at this stage is
safe from group commit point of view as Galera uses separate
`wait_for_commit` context to control commit ordering.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
This commit fixes a bug where Aria tables are used in
(master->slave1->slave2) and a backup is taken on slave2. In this case
it is possible that the replication position in the backup, stored in
mysql.gtid_slave_pos, will be wrong. This will lead to replication
errors if one is trying to use the backup as a new slave.
Analyze:
Replicated row events are committed with trans_commit_stmt() and
thd->transaction->all.ha_list != 0.
This means that backup_commit_lock is not taken for Aria tables,
which means the rows are committed and binary logged on the slave
under BLOCK_COMMIT which should not happen.
This issue does not occur on the master as thd->transaction->all.ha_list
is == 0 under AUTO_COMMIT, which sets 'is_real_trans' and 'rw_trans'
which in turn causes backup_commit_lock to be taken.
Fixed by checking in ha_check_and_coalesce_trx_read_only() if all handlers
supports rollback and if not, then wait for BLOCK_COMMIT also for
statement commit.
Issue:
Mariadb acquires additional MDL locks on UPDATE/INSERT/DELETE statements
on table with foreign keys. For example, table t1 references t2, an
UPDATE to t1 will MDL lock t2 in addition to t1.
A replica may deliver an ALTER t1 and UPDATE t2 concurrently for
applying. Then the UPDATE may acquire MDL lock for t1, followed by a
conflict when the ALTER attempts to MDL lock on t1. Causing a BF-BF
conflict.
Solution:
Additional keys for the referenced/foreign table needs to be added
to avoid potential MDL conflicts with concurrent update and DDLs.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Problem was that initial GTID was set on wsrep_before_prepare
out-of-order. In practice GTID was set to same as previous
executed transaction GTID. In recovery valid GTID was found
from prepared transaction and this transaction is committed
leading to fact that same GTID was executed twice.
This is fixed by setting invalid GTID at wsrep_before_prepare
and later in wsrep_before_commit actual correct GTID is set
and this setting is done while we are in commit monitor i.e.
assigment is done in order of replication.
In recovery if prepared transaction is found we check its
GTID, if it is invalid transaction will be rolled back
and if it is valid it will be committed.
Initialize gtid seqno from recovered seqno when
bootstrapping a new cluster.
Added two test cases for both mariabackup and rsync SST methods
to show that GTIDs remain consistent on cluster and that
all expected rows are in the table.
Added tests for wsrep GTID recovery with binlog on and off.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Problem was missing case from wsrep_handle_mdl_conflict. Test case
was trying to confirm that LOCK TABLE thread is not BF-aborted.
However as case was missing it was BF-aborted. Test case passed
because BF-aborting takes time and used wait condition might
see expected thread status before it was BF-aborted. Test naturally
failed if BF-aborting was done early enough.
Fix is to add missing case for SQLCOM_LOCK_TABLES to
wsrep_handle_mdl_conflict.
Note that using LOCK TABLE is still not recomended on cluster
because it could cause cluster hang.
This is a 10.5 specific commit that will then be overridden by
another one for 10.6+.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Problem was missing case from wsrep_handle_mdl_conflict. Test case
was trying to confirm that LOCK TABLE thread is not BF-aborted.
However as case was missing it was BF-aborted. Test case passed
because BF-aborting takes time and used wait condition might
see expected thread status before it was BF-aborted. Test naturally
failed if BF-aborting was done early enough.
Fix is to add missing case for SQLCOM_LOCK_TABLES to
wsrep_handle_mdl_conflict.
Note that using LOCK TABLE is still not recomended on cluster
because it could cause cluster hang.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
* rpl.rpl_system_versioning_partitions updated for MDEV-32188
* innodb.row_size_error_log_warnings_3 changed error for MDEV-33658
(checks are done in a different order)
Fix a regression that caused assertion thd->is_error() after
sync wait failures. If wsrep_sync_wait() fails make sure a appropriate
error is set. Partially revert 75dd0246f8.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>