* MDEV-16509 Improve wsrep commit performance with binlog disabled
Release commit order critical section early after trx_commit_low() if
binlog is not transaction coordinator. In order to avoid two phase commit,
binlog_hton is not registered for THD during IO_CACHE population.
Implemented a test which verifies that the transactions release
commit order early.
This optimization will change behavior during recovery as the commit
is not two phase when binlog is off. Fixed and recorded wsrep-recover-v25
and wsrep-recover to match the behavior.
* MDEV-18730 Ordering for wsrep binlog group commit
Previously out of order execution was allowed for wsrep commits.
Established proper ordering by populating wait_for_commit
for every wsrep THD and making group commit leader to wait for
prior commits before proceeding to trx_group_commit_leader().
* MDEV-18730 Added a test case to verify correct commit ordering
* MDEV-16509, MDEV-18730 Review fixes
Use WSREP_EMULATE_BINLOG() macro to decide if the binlog_hton
should be registered. Whitespace/syntax fixes and cleanups.
* MDEV-16509 Require binlog for galera_var_innodb_disallow_writes test
If the commit to InnoDB is done in one phase, the native InnoDB behavior
is that the transaction is committed in memory before it is persisted to
disk. This means that the innodb_disallow_writes=ON may not prevent
transaction to become visible to other readers before commit is completely
over. On the other hand, if the commit is two phase (as it is with binlog),
the transaction will be blocked in prepare phase.
Fixed the test to use binlog, which enforces two phase commit, which
in turn makes commit to block before the changes become visible to
other connections. This guarantees that the test produces expected
result.
If we have a 2+ node cluster which is replicating from an async master
and the binlog_format is set to STATEMENT and multi-row inserts are executed
on a table with an auto_increment column such that values are automatically
generated by MySQL, then the server node generates wrong auto_increment
values, which are different from what was generated on the async master.
In the title of the MDEV-9519 it was proposed to ban start slave on a Galera
if master binlog_format = statement and wsrep_auto_increment_control = 1,
but the problem can be solved without such a restriction.
The causes and fixes:
1. We need to improve processing of changing the auto-increment values
after changing the cluster size.
2. If wsrep auto_increment_control switched on during operation of
the node, then we should immediately update the auto_increment_increment
and auto_increment_offset global variables, without waiting of the next
invocation of the wsrep_view_handler_cb() callback. In the current version
these variables retain its initial values if wsrep_auto_increment_control
is switched on during operation of the node, which leads to inconsistent
results on the different nodes in some scenarios.
3. If wsrep auto_increment_control switched off during operation of the node,
then we must return the original values of the auto_increment_increment and
auto_increment_offset global variables, as the user has set. To make this
possible, we need to add a "shadow copies" of these variables (which stores
the latest values set by the user).
https://jira.mariadb.org/browse/MDEV-9519
Global variable wsrep_debug now can be used to filter wsrep-lib messages based on debug level provided.
Type of wsrep_debug is now set to be unsigned int, so tests and configuration files changed accordingly.
Removed including wsrep_api.h from service_wsrep.h. This caused
various kinds of collisions with definitions when wsrep is
not supposed to be built in. Defined functions wsrep_xid_seqno()
and wsrep_xid_uuid() in wsrep_dummy.cc. Replaced wsrep_seqno_t
with long long where wsrep_api.h is not included.
Removed wsrep_xid_seqno() macro from wsrep_mysqld.h and made
wsrep code using wsrep_xid_seqno() in handler.cc to be compiled
in only if WITH_WSREP is ON.
Included wsrep_api.h for mariabackup if WITH_WSREP is ON.
A new wsrep XID format was added to keep the XID implementation
backwards compatible. Original version always reads XID seqno
part in host byte order, the new version in little endian byte
order. Wsrep XID will always be written in the new format.
Included wsrep_api.h from service_wsrep.h for wsrep type definitions.
Removed redundant wsrep XID code from mariabackup and included
service_wsrep.h in order to use
The problem is that the seqno part of wsrep XID is always
stored in host byte order. This may cause issues when a physical
backup is restored on a host with different architecture, the
seqno part with XID may have incorrect value.
In order to fix this, wsrep XID seqno is always written into
XID data buffer in little endian byte order using int8store()
and read from data buffer using sint8korr(). For backwards
compatibility the seqno is read from TRX_SYS page in host
byte order during upgrade.
This patch implements byte ordering in wsrep_xid_init(),
wsrep_xid_seqno(), and exposes functions to read wsrep
XID uuid and seqno in wsrep_service_st. Backwards compatibility
for upgrade is provided in trx_rseg_init_wsrep_xid().