If a replica failed to update the GTID slave state when committing
an XA PREPARE, the replica would retry the transaction and get an
out-of-order GTID error. This is because the commit phase of an XA
PREPARE is bifurcated. That is, first, the prepare is handled by the
relevant storage engines. Then second, the GTID slave state is
updated as a separate autocommit transaction. If the second phase
fails, and the transaction is retried, then the same transaction is
attempted to be committed again, resulting in a GTID out-of-order
error.
This patch fixes this error by immediately stopping the slave and
reporting the appropriate error. That is, there was logic to bypass
the error when updating the GTID slave state table if the underlying
error is allowed for retry on a parallel slave. This patch adds a
parameter to disallow the error bypass, thereby forcing the error
state to still happen.
Reviewed By
============
Andrei Elkin <andrei.elkin@mariadb.com>
In MariaDB, we have a confusing problem where:
* The transaction_isolation option can be set in a configuration file, but it cannot be set dynamically.
* The tx_isolation system variable can be set dynamically, but it cannot be set in a configuration file.
Therefore, we have two different names for the same thing in different contexts. This is needlessly confusing, and it complicates the documentation. The same thing applys for transaction_read_only.
MySQL 5.7 solved this problem by making them into system variables. https://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-20.html
This commit takes a similar approach by adding new system variables and marking the original ones as deprecated. This commit also resolves some legacy problems related to SET STATEMENT and transaction_isolation.
- agressively -> aggressively
- exising -> existing
- occured -> occurred
- releated -> related
- seperated -> separated
- sucess -> success
- use use -> use
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.
- Description:
- Before 10.3.8 semisync was a plugin that is built into the server with
MDEV-13073,starting with commit cbc71485e2.
There are still some usage of `rpl_semi_sync_master` in mtr.
Note:
- To recognize the replica in the `dump_thread`, replica is creating
local variable `rpl_semi_sync_slave` (the keyword of plugin) in
function `request_transmit`, that is catched by primary in
`is_semi_sync_slave()`. This is the user variable and as such not
related to the obsolete plugin.
- Found in `sys_vars.all_vars` and `rpl_semi_sync_wait_point` tests,
usage of plugins `rpl_semi_sync_master`, `rpl_semi_sync_slave`.
The former test is disabled by default (`sys_vars/disabled.def`)
and marked as `obsolete`, however this patch will remove the queries.
- Add cosmetic fixes to semisync codebase
Reviewer: <brandon.nesterenko@mariadb.com>
Closes PR #2528, PR #2380
The hang could be seen as show slave status displaying an error like
Last_Error: Could not execute Write_rows_v1
along with
Slave_SQL_Running: Yes
accompanied with one of the replication threads in show-processlist
characteristically having status like
2394 | system user | | NULL | Slave_worker | 50852| closing tables
It turns out that closing tables worker got entrapped in endless looping
in mark_start_commit_inner() across already garbage-collected gco items.
The reclaimed gco links are explained with actually possible
out-of-order groups of events termination due to the Last_Error.
This patch reinforces the correct ordering to perform
finish_event_group's cleanup actions, incl unlinking gco:s
from the active list.
Task:
=====
Update tests to reflect MDEV-20122, deprecation of master_use_gtid=current_pos.
Change Master (CM) statements were either removed or modified with
current_pos --> slave_pos based on original intention of the test.
Reviewed by:
============
Brandon Nesterenko <brandon.nesterenko@mariadb.com>
One of the constraints added in the MDEV-29639 patch, is that only
the first event after idling should update last_master_timestamp;
and as long as the replica has more events to execute, the variable
should not be updated. The corresponding test,
rpl_delayed_parallel_slave_sbm.test, aims to verify this; however,
if the IO thread takes too long to queue events, the SQL thread can
appear to catch up too fast.
This fix ensures that the relay log has been fully written before
executing the events.
Note that the underlying cause of this test failure needs to be
addressed as a bug-fix, this is a temporary fix to stop test
failures. To track work on the bug-fix for the underlying issue,
please see MDEV-30619.
Renames the upgrade state file, and ensures the old
file is properly removed when `mariadb-upgrade` tool is executed.
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer
Amazon Web Services, Inc.
ANALYZE was observed to race over a preceding in binlog order DML
in updating the binlog and slave gtid states.
Tagging ANALYZE and other admin class commands in binlog by the fixes
of MDEV-17515 left a flaw allowing such race leading to
the gtid mode out-of-order error.
This is fixed now to observe by ADMIN commands the ordered access to
the slave gtid status variables and binlog.
Problem
========
On a parallel, delayed replica, Seconds_Behind_Master will not be
calculated until after MASTER_DELAY seconds have passed and the
event has finished executing, resulting in potentially very large
values of Seconds_Behind_Master (which could be much larger than the
MASTER_DELAY parameter) for the entire duration the event is
delayed. This contradicts the documented MASTER_DELAY behavior,
which specifies how many seconds to withhold replicated events from
execution.
Solution
========
After a parallel replica idles, the first event after idling should
immediately update last_master_timestamp with the time that it began
execution on the primary.
Reviewed By
===========
Andrei Elkin <andrei.elkin@mariadb.com>
The user XA commit execution branch was caught not have been covered
with MDEV-21953 fixes.
The XA involved deadlock is resolved now to apply the former fixes
pattern.
Along the fixes the following changes have been implemented.
- MDL lock attribute correction
- dissociation of the externally completed XA from the current
thread's xid_state in the error branches
- cleanup_context() preseves the prepared XA
- wait_for_prior_commit() is relocated to satisfy both
the binlog ON (log-slave-updates and skip-log-bin)
and OFF slave execution branches.