This is the first part of the fixes for MDEV-24097. This commit
contains the fixes for instability when testing Galera and when
restarting nodes quickly:
1) Protection against a "stuck" old SST process during the execution
of the new SST (after restarting the node) is now implemented for
mariabackup / xtrabackup, which should help to avoid almost all
conflicts due to the use of the same ports - both during testing
with mtr, so and when restarting nodes quickly in a production
environment.
2) Added more protection to scripts against unexpected return of
the rc != 0 (in the commands for deleting temporary files, etc).
3) Added protection against unexpected crashes during binlog transfer
(in SST scripts for rsync).
4) Spaces and some special characters in binlog filenames shouldn't
be a problem now (at the script level).
5) Daemon process termination tracking has been made more robust
against crashes due to unexpected termination of the previous SST
process while new scripts are running.
6) Reading ssl encryption parameters has been moved from specific
SST scripts to a common wsrep_sst_common.sh script, which allows
unified error handling, unified diagnostics and simplifies script
revisions in the future.
7) Improved diagnostics of errors related to the use of openssl.
8) Corrections have been made for xtrabackup-v2 (both in tests and in
the script code) that restore the work of xtrabackup with updated
versions of innodb.
9) Fixed some tests for galera_3nodes, although the complete solution
for the problem of starting three nodes at the same time on fast
machines will be done in a separate commit.
No additional tests are required as this commit fixes problems with
existing tests.
1. Galera SST scripts should use ssl_capath (not ssl_ca) for CA
directory. The current implementation tries to automatically
detect the path using the trailing slash in the ssl_ca variable
value, but this approach is not compatible with the server
configuration. Now, by analogy with the server, SST scripts
also use a separate ssl_capath variable. In addition, a similar
tcapath variable has been added for the old-style configuration
(in the "sst" section).
2. Openssl utility detection made more reliable.
3. Removed extra spaces in automatically generated command lines -
to simplify debugging of the SST scripts.
4. In general, the code for detecting the presence or absence of
auxiliary utilities has been improved - it is made more reliable
in some configurations (and for shells other than bash).
* galera_kill_applier : we should make sure that node has
correct number of wsrep appliers
* galera_bad_wsrep_new_cluster: This test restarts both nodes,
so it is bad on mtr. Make sure it is run alone
* galera_update_limit : Make sure we have PK when needed
galera_as_slave_replay : bf abort was not consistent
* galera_unicode_pk : Add wait_conditions so that all nodes
are part of cluster and DDL and INSERT has replicated before
any further operations are done.
* galera_bf_abort_at_after_statement : Add wait_conditions
to make sure all nodes are part of cluster and that DDL
and INSERT has replicated. Make sure we reset DEBUG_SYNC.
Another batch of changes that should make the SST process
more reliable in all scenarios:
1) Added hostname or CN verification when stunnel is used
with certificate chain verification (verifyChain = yes);
2) Added check for the absence of the stunnel utility for
mtr tests;
3) Deletion of working files before and after SST is done
more accurately;
4) rsync on joiner can be run even if the path to its
configuration file contains spaces;
5) More accurate directory creation (for data files and
for logs);
6) IST with mysqldump no longer turns off statement logging;
7) Reset password for mysqldump when password is empty but
username is specified;
8) More reliable quoting when generating statements in
wsrep_sst_mysqldump;
9) Added explicit generation of 2048-bit Diffie-Hellman
parameters for sockat < 1.7.3, by analogy with xtrabackup;
10) Compression parameters for qpress are read from all
suitable server groups in configuration file, as well as
from the [sst] and [xtrabackup] groups;
11) Added a test that checks compression using qpress;
12) Checking for optional utilities is modified to work even
if they implemented as built-in shell commands (unlikely
on real systems, but more reliable).
Another batch of changes that should make the SST process
more reliable in all scenarios:
1) Added hostname or CN verification when stunnel is used
with certificate chain verification (verifyChain = yes);
2) Added check for the absence of the stunnel utility for
mtr tests;
3) Deletion of working files before and after SST is done
more accurately;
4) rsync on joiner can be run even if the path to its
configuration file contains spaces;
5) More accurate directory creation (for data files and
for logs);
6) IST with mysqldump no longer turns off statement logging;
7) Reset password for mysqldump when password is empty but
username is specified;
8) More reliable quoting when generating statements in
wsrep_sst_mysqldump;
9) Added explicit generation of 2048-bit Diffie-Hellman
parameters for sockat < 1.7.3, by analogy with xtrabackup;
10) Compression parameters for qpress are read from all
suitable server groups in configuration file, as well as
from the [sst] and [xtrabackup] groups;
11) Added a test that checks compression using qpress;
12) Checking for optional utilities is modified to work even
if they implemented as built-in shell commands (unlikely
on real systems, but more reliable).
Another batch of changes that should make the SST process
more reliable in all scenarios:
1) Added hostname or CN verification when stunnel is used
with certificate chain verification (verifyChain = yes);
2) Added check for the absence of the stunnel utility for
mtr tests;
3) Deletion of working files before and after SST is done
more accurately;
4) rsync on joiner can be run even if the path to its
configuration file contains spaces;
5) More accurate directory creation (for data files and
for logs);
6) IST with mysqldump no longer turns off statement logging;
7) Reset password for mysqldump when password is empty but
username is specified;
8) More reliable quoting when generating statements in
wsrep_sst_mysqldump;
9) Added explicit generation of 2048-bit Diffie-Hellman
parameters for sockat < 1.7.3, by analogy with xtrabackup;
10) Compression parameters for qpress are read from all
suitable server groups in configuration file, as well as
from the [sst] and [xtrabackup] groups;
11) Added a test that checks compression using qpress;
12) Checking for optional utilities is modified to work even
if they implemented as built-in shell commands (unlikely
on real systems, but more reliable).
After switching to the new mariabackup interface (instead of
the outdated innobackupex interface, which is supported for
compatibility), we need to explicitly pass a path to the datadir
directory as a parameter, since in the new interface the value
of this option is not automatically set in such a way that it
always matches the SST/IST logic. This commit adds passing this
option as an explicit parameter to mariabackup. This commit also
removed unnecessary options that are not used and not supported
by mariabackup.
Also, numerous flaws in the common wsrep_sst_common script have
been fixed:
1) There are many bash-specific constructs in the script that
may not be supported by other interpreters, which can lead
to the most unexpected errors during SST, because failures
in the interpretation of bash-specific constructs lead to
incorrect parsing of arguments;
2) There is parse_cnf() function which is often called by other
scripts for the "mysqld" or "--mysqld" group, but it does not
take into account the default group suffix, which leads to
reading values only from the default group, which then leads
to errors due to reading the default values instead of the
values for a specific group;
3) Some options such as --user, --innodb-data-home-dir or --datadir
are not removed from the --mysqld-args list, although they are
processed inside scripts (and passing of these options funther
may cause problems for mariabackup);
4) If an argument that the script understands is present in
the --mysqld-args list twice, then this causes SST to fail,
instead of reading the most recent value;
5) The "--host" parameter is technically still supported among
the arguments of the SST scripts, but in reality scripts do not
work with it as expected, especially if it has an IPv6 address;
6) If the port number is absent in the --address parameter value,
but the port number is explicitly passed through the --port
argument, then the scripts for mariabackup and xtrabackup-v2
fail;
7) If a new address interface is used (with the --address parameter),
then automatic default port substitution is not performed, although
it is supported for the legacy --host/--port interface.
8) If there are spaces in the parameter values after --mysqld_args,
then their further transfer does not occur correctly, which
causes mariabackup to fail during SST - the space splits
the argument in such a way that it breaks the parsing of the
following parameters;
9) If most of the parameters that are names or paths to the files
or directories contain spaces, then SST scripts fail in an
unpredictable way due to incorrect variable substitutions;
10) If the --log-bin option is passed among the arguments of myqlds
(--mysqld-args) without a parameter, and the --binlog option
is not specified, then the script cannot substitute the default
name for binlog and cannot construct binlog name using the
--log-basename argument (which is against server specifications);
11) Tail slashes are not removed from the directory names, which,
upon further substitution, leads to the appearance of a double
slash in the file paths;
12) The explicit --binlog parameter (which is now always transmitted
from the server side) and the "hidden" --log-bin parameter in the
list of arguments after --mysqld-args are perceived as two different
parameters in different parts of the scripts, and if they are do not
match for some reason, this will lead to failures during SST;
Also, all new changes from the 10.6 branch have been migrated here,
including the latest pull requests for authentication (only the part
that concerns SST scripts).
It also fixes dozens of other bugs in all SST scripts.
This commit removes the mtr test galera_sst_mariabackup_encrypt_with_key
from the list of disabled tests because the problem with it has already
been fixed.
The auto-increment variables may change intermittently during
the execution of some tests from the Galera mtr suite, causing
these tests to fail. This patch creates conditions in which
unpredictable changes to these variables are not possible
during the execution of those tests in which this problem
is noticed or their values are restored before the end of
testing.
MDEV-24649 galera.galera_bf_lock_wait MTR failed with sigabrt: Assertion `!is_ow
ned()' failed in sync0policy.ic on MutexDebug with Mutex = TTASEventMutex<GenericPolicy>
Bug was fixed as part of MDEV-23328, this just adds test cases to
regression set.
The test requires adaptation for MariaDB, which is done
in this patch. In addition, this patch includes a fix for
the SST script startup code that adds escaping for special
characters on the command line (in case they are contained
in the arguments to mysqld). The fix does not require
separate tests, as the required tests are already part
of the mtr suite for Galera.
We need to complete SST if both new and old start positions are
not same as initial positions. If they are initial positions
just set local uuid and seqno.
There were multiple problems here
* wsrep_trx_fragment_size should not be set when wsrep is disabled or provider is not loaded
* wsrep_trx_fragment_unit should not be set when wsrep is disabled or provider is not loaded
* wsrep_debug has no effect if wsrep is disabled or provider is not loaded
* wsrep_start_position should not be set when wsrep is disabled or provider is not loaded any other value than default
* wsrep_start_position should be changed only when we are joiner or initialized
* wsrep_start_position should be allowed to set only a value that exits, thus
we need to add error handling to wsrep_sst_complete