Galera parameter wsrep_gtid_domain_id was defined using a class where
actual parameter was not a first member. Fixed this by using normal
variable and assigning this value to class member value.
The setting innodb_lock_schedule_algorithm=VATS that was introduced
in MDEV-11039 (commit 021212b525e39d332cddd0b9f1656e2fa8044905)
causes conflicting exclusive locks to be incorrectly granted to
two transactions. Specifically, in lock_rec_insert_by_trx_age()
the predicate !lock_rec_has_to_wait_in_queue(in_lock) would hold even
though an active transaction is already holding an exclusive lock.
This was observed between two DELETE of the same clustered index record.
The HASH_DELETE invocation in lock_rec_enqueue_waiting() may be related.
Due to lack of progress in diagnosing the problem, we will remove the
option. The unsafe option was enabled by default between
commit 0c15d1a6ff0d18da946f050cfeac176387a76112 (MariaDB 10.2.3)
and the parent of
commit 1cc1d0429da14a041a6240c6fce17e0d31cad8e2 (MariaDB 10.2.17, 10.3.9),
and it was deprecated in
commit 295e2d500b31819422c97ad77beb6226b961c207 (MariaDB 10.2.34).
Some invalid wsrep_provider paths may be interpreted as a valid
directory. For example '/invalid/libgalera_smm.so' with UTF character
set is interpreted as '/', which is a valid directory. A early check
that wsrep_provider should not be a directory fixes it.
If the server is compiled WITH_WSREP=OFF, we should avoid evaluating
conditions on a global variable that is constant.
WSREP_ON_: Renamed from WSREP_ON. Defined only WITH_WSREP=ON.
WSREP_ON: Defined as unlikely(WSREP_ON_).
wsrep_on(): Defined as WSREP_ON && wsrep_service->wsrep_on_func().
The reason why we have wsrep_on() at all is that the macro WSREP(thd)
depends on the definition of THD, and that is intentionally an opaque
data type for InnoDB. So, we cannot avoid invoking wsrep_on(), but
we can evaluate the less expensive condition WSREP_ON before calling
the function.
We need to release global system variables mutex before
doing wsrep_init to avoid race with next show status and
we need to save wsrep_on value as it is changed on wsrep_init.
Added test case.
Support for galera GTID consistency thru cluster. All nodes in cluster
should have same GTID for replicated events which are originating from cluster.
Cluster originating commands need to contain sequential WSREP GTID seqno
Ignore manual setting of gtid_seq_no=X.
In master-slave scenario where master is non galera node replicated GTID is
replicated and is preserved in all nodes.
To have this - domain_id, server_id and seqnos should be same on all nodes.
Node which bootstraps the cluster, to achieve this, sends domain_id and
server_id to other nodes and this combination is used to write GTID for events
that are replicated inside cluster.
Cluster nodes that are executing non replicated events are going to have different
GTID than replicated ones, difference will be visible in domain part of gtid.
With wsrep_gtid_domain_id you can set domain_id for WSREP cluster.
Functions WSREP_LAST_WRITTEN_GTID, WSREP_LAST_SEEN_GTID and
WSREP_SYNC_WAIT_UPTO_GTID now works with "native" GTID format.
Fixed galera tests to reflect this chances.
Add variable to manually update WSREP GTID seqno in cluster
Add variable to manipulate and change WSREP GTID seqno. Next command
originating from cluster and on same thread will have set seqno and
cluster should change their internal counter to it's value.
Behavior is same as using @@gtid_seq_no for non WSREP transaction.
Problem was that tests select INFORMATION_SCHEMA.PROCESSLIST processes
from user system user and empty state. Thus, there is not clear
state for slave threads.
Changes:
- Added new status variables that store current amount of applier threads
(wsrep_applier_thread_count) and rollbacker threads
(wsrep_rollbacker_thread_count). This will make clear how many slave threads
of certain type there is.
- Added THD state "wsrep applier idle" when applier slave thread is
waiting for work. This makes finding slave/applier threads easier.
- Added force-restart option for mtr to always restart servers between tests
to avoid race on start of the test
- Added wait_condition_with_debug to wait until the passed statement returns
true, or the operation times out. If operation times out, the additional error
statement will be executed
Changes to be committed:
new file: mysql-test/include/force_restart.inc
new file: mysql-test/include/wait_condition_with_debug.inc
modified: mysql-test/mysql-test-run.pl
modified: mysql-test/suite/galera/disabled.def
modified: mysql-test/suite/galera/r/MW-336.result
modified: mysql-test/suite/galera/r/galera_kill_applier.result
modified: mysql-test/suite/galera/r/galera_var_slave_threads.result
new file: mysql-test/suite/galera/t/MW-336.cnf
modified: mysql-test/suite/galera/t/MW-336.test
modified: mysql-test/suite/galera/t/galera_kill_applier.test
modified: mysql-test/suite/galera/t/galera_parallel_autoinc_largetrx.test
modified: mysql-test/suite/galera/t/galera_parallel_autoinc_manytrx.test
modified: mysql-test/suite/galera/t/galera_var_slave_threads.test
modified: mysql-test/suite/wsrep/disabled.def
modified: mysql-test/suite/wsrep/r/variables.result
modified: mysql-test/suite/wsrep/t/variables.test
modified: sql/mysqld.cc
modified: sql/wsrep_mysqld.cc
modified: sql/wsrep_mysqld.h
modified: sql/wsrep_thd.cc
modified: sql/wsrep_var.cc
Refactored wsrep patch to not use LOCK_thread_count and COND_thread_count anymore.
This has partially been replaced by using old LOCK_wsrep_slave_threads mutex.
For slave thread count change waiting, new COND_wsrep_slave_threads signal has been added
Added LOCK_wsrep_cluster_config mutex to control that cluster address change cannot happen in parallel
Protected wsrep_slave_threads variable changes with LOCK_cluster_config mutex
This is for avoiding concurrent slave thread count and cluster joining operations to happen
Fixes according to Teemu's review
Global variable wsrep_debug now can be used to filter wsrep-lib messages based on debug level provided.
Type of wsrep_debug is now set to be unsigned int, so tests and configuration files changed accordingly.
Problem was that controlling connection i.e. connection that
executed the query SET GLOBAL wsrep_reject_queries = ALL_KILL;
was also killed but server would try to send result from that
query to controlling connection resulting a assertion
mysqld: /home/jan/mysql/10.2-sst/include/mysql/psi/mysql_socket.h:738: inline_mysql_socket_send: Assertion `mysql_socket.fd != -1' failed.
as socket was closed when controlling connection was closed.
wsrep_close_client_connections()
Do not close controlling connection and instead of
wsrep_close_thread() we do now soft kill by THD::awake
wsrep_reject_queries_update()
Call wsrep_close_client_connections using current thd.
Problem was that controlling connection i.e. connection that
executed the query SET GLOBAL wsrep_reject_queries = ALL_KILL;
was also killed but server would try to send result from that
query to controlling connection resulting a assertion
mysqld: /home/jan/mysql/10.2-sst/include/mysql/psi/mysql_socket.h:738: inline_mysql_socket_send: Assertion `mysql_socket.fd != -1' failed.
as socket was closed when controlling connection was closed.
wsrep_close_client_connections()
Do not close controlling connection and instead of
wsrep_close_thread() we do now soft kill by THD::awake
wsrep_reject_queries_update()
Call wsrep_close_client_connections using current thd.
MDEV-17058: Test failure on wsrep.variables
MDEV-17060: Test failure on galera.galera_var_slave_threads
Fix incorrect calculation of increased applier (slave) threads.
Note that increase change takes effect "immediately" but we should
use proper wait condition to wait it. Reducing the number of
slave threads is not immediate as thread will only exit after a
replication event.
MDEV-17058: Test failure on wsrep.variables
MDEV-17060: Test failure on galera.galera_var_slave_threads
Fix incorrect calculation of increased applier (slave) threads.
Note that increase change takes effect "immediately" but we should
use proper wait condition to wait it. Reducing the number of
slave threads is not immediate as thread will only exit after a
replication event.