There was race between a committing transaction and the following in binlog
order FLUSH LOGS that could create a 2nd Binlog checkpoint (BCP) event
in the new file *before* the first logged-in-old-binlog transaction gets committed in
Innodb. That would cause the transaction loss at recovery, should
the server stop right after the BCP.
The race is tackled by enforcing the necessary set of mutexes to be acquired
by FLUSH-LOGS handler in the correct order (of the group commit leader
pattern).
Note, there remain two cases where a similar race is still possible:
- the above race as it is when the server is run with ("unlikely")
non-default `--binlog-optimize-thread-scheduling=0` (MDEV-24530), and
- at unlikely event of bin-logging of Incident event (MDEV-24531) that
also triggers binlog rotation,
in both cases though with lesser chances after the current fixes.
Analysis:
========
Writes to 'rli->log_space_total' needs to be synchronized, otherwise both
SQL_THREAD and IO_THREAD can try to modify the variable simultaneously
resulting in incorrect rli->log_space_total. In the current test scenario
SQL_THREAD is trying to decrement 'rli->log_space_total' in 'purge_first_log'
and IO_THREAD is trying to increment the 'rli->log_space_total' in
'queue_event' simultaneously. Hence test occasionally fails with result
mismatch.
Fix:
===
Convert 'rli->log_space_total' variable to atomic type.
EVEN IF I LOG TO FILE.
Analysis:
----------
MYSQL_UPGRADE of the master breaks the replication when
the query logging is enabled with FILE/NONE 'log-output'
option on the slave.
mysql_upgrade modifies the 'general_log' and 'slow_log'
tables after the logging is disabled as below:
SET @old_log_state = @@global.general_log;
SET GLOBAL general_log = 'OFF';
ALTER TABLE general_log
MODIFY event_time TIMESTAMP NOT NULL,
( .... );
SET GLOBAL general_log = @old_log_state;
and
SET @old_log_state = @@global.slow_query_log;
SET GLOBAL slow_query_log = 'OFF';
ALTER TABLE slow_log
MODIFY start_time TIMESTAMP NOT NULL,
( .... );
SET GLOBAL slow_query_log = @old_log_state;
In the binary log, only the ALTER statements are logged
but not the SET statements which turns ON/OFF the logging.
So when the slave replays the binary log,the ALTER of LOG
tables throws an error since the logging is enabled. Also
the 'log-output' option is not checked to determine
whether to allow/disallow the ALTER operation.
Fix:
----
The 'log-output' option is included in the check while
determining whether the query logging happens using the
log tables.
Picked from mysql respository at 0daaf8aecd8f84ff1fb400029139222ea1f0d812
Analysis:
========
RESET MASTER TO # command deletes all binary log files listed in the index
file, resets the binary log index file to be empty, and creates a new binary
log with number #. When the user provided binary log number is greater than
the max allowed value '2147483647' server fails to generate a new binary log.
The RESET MASTER statement marks the binlog closure status as
'LOG_CLOSE_TO_BE_OPENED' and exits. Statements which follow RESET MASTER
try to write to binary log they find the log_state != LOG_CLOSED and
proceed to write to binary log cache and it results in crash.
Fix:
===
During MYSQL_BIN_LOG open, if generation of new binary log name fails then the
"log_state" needs to be marked as "LOG_CLOSED". With this further statements
will find binary log as closed and they will skip writing to the binary log.
This is a backport of the applicable part of
commit 93475aff8d and
commit 2c39f69d34
from 10.4.
Before 10.4 and Galera 4, WSREP_ON is a macro that points to
a global Boolean variable, so it is not that expensive to
evaluate, but we will add an unlikely() hint around it.
WSREP_ON_NEW: Remove. This macro was introduced in
commit c863159c32
when reverting WSREP_ON to its previous definition.
We replace some use of WSREP_ON with WSREP(thd), like it was done
in 93475aff8d. Note: the macro
WSREP() in 10.1 is equivalent to WSREP_NNULL() in 10.4.
Item_func_rand::seed_random(): Avoid invoking current_thd
when WSREP is not enabled.
When binlog is disabled, WSREP will not behave correctly when
SAVEPOINT ROLLBACK is executed since we don't register handlers for such case.
Fixed by registering WSREP handlerton for SAVEPOINT related commands.
Problem:
-------
Accessing a member within 'xid_count_per_binlog' structure results in
following error when 'UBSAN' is enabled.
member access within address 0xXXX which does not point to an object of type
'xid_count_per_binlog'
Analysis:
---------
The problem appears to be that no constructor for 'xid_count_per_binlog' is
being called, and thus the vtable will not be initialized.
Fix:
---
Defined a parameterized constructor for 'xid_count_per_binlog' class.
Analysis:
========
'max_binlog_cache_size' is configured and a huge transaction is executed. When
the transaction specific events size exceeds 'max_binlog_cache_size' the event
cannot be written to the binary log cache and cache write error is raised.
Upon cache write error the statement is rolled back and the transaction cache
should be truncated to a previous statement specific position. The truncate
operation should reset the cache to earlier valid positions and flush the new
changes. Even though the flush is successful the cache write error is still in
marked state. The truncate code interprets the cache write error as cache flush
failure and returns abruptly without modifying the write cache parameters.
Hence cache is in a invalid state. When a COMMIT statement is executed in this
session it tries to flush the contents of transaction cache to binary log.
Since cache has partial events the cache write operation will report
'writer.remains' assert.
Fix:
===
Binlog truncate function resets the cache to a specified size. As a first step
of truncation, clear the cache write error flag that was raised during earlier
execution. With this new errors that surface during cache truncation can be
clearly identified.
cmake -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Debug
Maintainer mode makes all warnings errors. This patch fix warnings. Mostly about
deprecated `register` keyword.
Too much warnings came from Mroonga and I gave up on it.
now we can afford it. Fix -Werror errors. Note:
* old gcc is bad at detecting uninit variables, disable it.
* time_t is int or long, cast it for printf's
This patch contains a fix for the MDEV-17262/17243 issues and
new mtr test.
These issues (MDEV-17262/17243) have two reasons:
1) After an intermediate commit, a transaction loses its status
of "transaction that registered in the MySQL for 2pc coordinator"
(in the InnoDB) due to the fact that since version 10.2 the
write_row() function (which located in the ha_innodb.cc) does
not call trx_register_for_2pc(m_prebuilt->trx) during the processing
of split transactions. It is necessary to restore this call inside
the write_row() when an intermediate commit was made (for a split
transaction).
Similarly, we need to set the flag of the started transaction
(m_prebuilt->sql_stat_start) after intermediate commit.
The table->file->extra(HA_EXTRA_FAKE_START_STMT) called from the
wsrep_load_data_split() function (which located in sql_load.cc)
will also do this, but it will be too late. As a result, the call
to the wsrep_append_keys() function from the InnoDB engine may be
lost or function may be called with invalid transaction identifier.
2) If a transaction with the LOAD DATA statement is divided into
logical mini-transactions (of the 10K rows) and binlog is rotated,
then in rare cases due to the wsrep handler re-registration at the
boundary of the split, the last portion of data may be lost. Since
splitting of the LOAD DATA into mini-transactions is technical,
I believe that we should not allow these mini-transactions to fall
into separate binlogs. Therefore, it is necessary to prohibit the
rotation of binlog in the middle of processing LOAD DATA statement.
https://jira.mariadb.org/browse/MDEV-17262 and
https://jira.mariadb.org/browse/MDEV-17243
PROBLEM
-------
Memory sanitizer reports uninitialized comparisons
in log_in_use(), because strings are compared with
memcmp() instead of strncmp.
FIX
---
Use strncmp() to compare strings
extra/mariabackup/fil_cur.cc:361:42: warning: format specifies type 'unsigned long' but the argument has type 'ib_int64_t' (aka 'long long') [-Wformat]
extra/mariabackup/fil_cur.cc:376:9: warning: format specifies type 'unsigned long' but the argument has type 'ib_int64_t' (aka 'long long') [-Wformat]
sql/handler.cc:6196:45: warning: format specifies type 'unsigned long' but the argument has type 'wsrep_trx_id_t' (aka 'unsigned long long') [-Wformat]
sql/log.cc:1681:16: warning: format specifies type 'unsigned long' but the argument has type 'size_t' (aka 'unsigned int') [-Wformat]
sql/log.cc:1687:16: warning: format specifies type 'unsigned long' but the argument has type 'size_t' (aka 'unsigned int') [-Wformat]
sql/wsrep_sst.cc:1388:86: warning: format specifies type 'long' but the argument has type 'wsrep_seqno_t' (aka 'long long') [-Wformat]
sql/wsrep_sst.cc:232:86: warning: format specifies type 'long' but the argument has type 'wsrep_seqno_t' (aka 'long long') [-Wformat]
storage/connect/filamdbf.cpp:450:47: warning: format specifies type 'short' but the argument has type 'int' [-Wformat]
storage/connect/filamdbf.cpp:970:47: warning: format specifies type 'short' but the argument has type 'int' [-Wformat]
storage/connect/inihandl.cpp:197:16: warning: address of array 'key->name' will always evaluate to 'true' [-Wpointer-bool-conversion]
storage/innobase/btr/btr0scrub.cc:151:17: warning: format specifies type 'long' but the argument has type 'int' [-Wformat]
storage/innobase/buf/buf0buf.cc:5085:8: warning: nonnull parameter 'bpage' will evaluate to 'true' on first encounter [-Wpointer-bool-conversion]
storage/innobase/fil/fil0crypt.cc:2454:5: warning: format specifies type 'long' but the argument has type 'int' [-Wformat]
storage/innobase/handler/ha_innodb.cc:18685:7: warning: format specifies type 'unsigned long' but the argument has type 'wsrep_trx_id_t' (aka 'unsigned long long') [-Wformat]
storage/innobase/row/row0mysql.cc:3319:5: warning: format specifies type 'long' but the argument has type 'int' [-Wformat]
storage/innobase/row/row0mysql.cc:3327:5: warning: format specifies type 'long' but the argument has type 'int' [-Wformat]
storage/maria/ma_norec.c:35:10: warning: implicit conversion from 'int' to 'my_bool' (aka 'char') changes value from 131 to -125 [-Wconstant-conversion]
storage/maria/ma_norec.c:42:10: warning: implicit conversion from 'int' to 'my_bool' (aka 'char') changes value from 131 to -125 [-Wconstant-conversion]
storage/maria/ma_test2.c:1009:12: warning: format specifies type 'unsigned long' but the argument has type 'size_t' (aka 'unsigned int') [-Wformat]
storage/maria/ma_test2.c:1010:12: warning: format specifies type 'unsigned long' but the argument has type 'size_t' (aka 'unsigned int') [-Wformat]
storage/mroonga/ha_mroonga.cpp:9189:44: warning: use of logical '&&' with constant operand [-Wconstant-logical-operand]
storage/mroonga/vendor/groonga/lib/expr.c:4987:22: warning: comparison of constant -1 with expression of type 'grn_operator' is always false [-Wtautological-constant-out-of-range-compare]
storage/xtradb/btr/btr0scrub.cc:151:17: warning: format specifies type 'long' but the argument has type 'int' [-Wformat]
storage/xtradb/buf/buf0buf.cc:5047:8: warning: nonnull parameter 'bpage' will evaluate to 'true' on first encounter [-Wpointer-bool-conversion]
storage/xtradb/fil/fil0crypt.cc:2454:5: warning: format specifies type 'long' but the argument has type 'int' [-Wformat]
storage/xtradb/row/row0mysql.cc:3324:5: warning: format specifies type 'long' but the argument has type 'int' [-Wformat]
storage/xtradb/row/row0mysql.cc:3332:5: warning: format specifies type 'long' but the argument has type 'int' [-Wformat]
unittest/sql/mf_iocache-t.cc:120:35: warning: format specifies type 'unsigned long' but the argument has type 'int' [-Wformat]
unittest/sql/mf_iocache-t.cc:96:35: note: expanded from macro 'INFO_TAIL'
For the purpose of reporting an error to error log, shutdown thread was
attempting to access current_thd->variables.lc_messages->errmsgs->errmsgs.
Whereas current_thd was NULL.
We should log errors according to global lc_messages setting instead of
session setting.
Modern compilers (such as GCC 8) emit warnings that the
'register' keyword is deprecated and not valid C++17.
Let us remove most use of the 'register' keyword.
Code in 'extra/' is not touched.
MyRocks internally will print non-critical messages to
sql_print_verbose_info() which will do what InnoDB does in similar cases:
check if (global_system_variables.log_warnings > 2).
For galera compatibility, the main thing is to ensure the FD 1, 2 are
not opened with O_CLOEXEC otherwise galera sst errors don't appear in
the error.log
Files without O_CLOEXEC from the test below:
0 -> /dev/pts/9
1 -> /tmp/error.log (intended)
2 -> /tmp/error.log (intended)
5 -> /tmp/datadir
6 -> /tmp/datadir/aria_log.00000001
(Innodb temp files)
8 -> /tmp/ibIIrhFL (deleted)
9 -> /tmp/ibfx1vai (deleted)
10 -> /tmp/ibAQUKFO (deleted)
11 -> /tmp/ibWBQSHR (deleted)
15 -> /tmp/ibEXEcfo (deleted)
20 -> /tmp/datadir/mysql/host.MYD
22 -> /tmp/datadir/mysql/user.MYD
... (rest of MYD files)
Test for this and the previous commit.
sql/mysqld --skip-networking --datadir=/tmp/datadir --log-bin=/tmp/datadir/mysqlbin --socket /tmp/s.sock --lc-messages-dir=${PWD}/sql/share --verbose --log-error=/tmp/error.log --general-log-file=/tmp/general.log --general-log=1 --slow-query-log-file=/tmp/slow.log --slow-query-log=1
180302 10:56:41 [Note] sql/mysqld (mysqld 5.5.60-MariaDB-wsrep) starting as process 26056 ...
$ cd /proc/26056
$ ls -la --sort=none fd
total 0
dr-x------. 2 dan dan 0 Mar 2 10:57 .
dr-xr-xr-x. 9 dan dan 0 Mar 2 10:56 ..
lrwx------. 1 dan dan 64 Mar 2 10:57 0 -> /dev/pts/9
l-wx------. 1 dan dan 64 Mar 2 10:57 1 -> /tmp/error.log
l-wx------. 1 dan dan 64 Mar 2 10:57 2 -> /tmp/error.log
lrwx------. 1 dan dan 64 Mar 2 10:57 3 -> /tmp/datadir/mysqlbin.index
lrwx------. 1 dan dan 64 Mar 2 10:57 4 -> /tmp/datadir/aria_log_control
lr-x------. 1 dan dan 64 Mar 2 10:57 5 -> /tmp/datadir
lrwx------. 1 dan dan 64 Mar 2 10:57 6 -> /tmp/datadir/aria_log.00000001
lrwx------. 1 dan dan 64 Mar 2 10:57 7 -> /tmp/datadir/ibdata1
lrwx------. 1 dan dan 64 Mar 2 10:57 8 -> /tmp/ibIIrhFL (deleted)
lrwx------. 1 dan dan 64 Mar 2 10:57 9 -> /tmp/ibfx1vai (deleted)
lrwx------. 1 dan dan 64 Mar 2 10:57 10 -> /tmp/ibAQUKFO (deleted)
lrwx------. 1 dan dan 64 Mar 2 10:57 11 -> /tmp/ibWBQSHR (deleted)
lrwx------. 1 dan dan 64 Mar 2 10:57 12 -> /tmp/datadir/ib_logfile0
lrwx------. 1 dan dan 64 Mar 2 10:57 13 -> /tmp/datadir/ib_logfile1
l-wx------. 1 dan dan 64 Mar 2 10:57 14 -> /tmp/slow.log
lrwx------. 1 dan dan 64 Mar 2 10:57 15 -> /tmp/ibEXEcfo (deleted)
l-wx------. 1 dan dan 64 Mar 2 10:57 16 -> /tmp/general.log
lrwx------. 1 dan dan 64 Mar 2 10:57 17 -> socket:[1897356]
lrwx------. 1 dan dan 64 Mar 2 10:57 18 -> socket:[45335]
l-wx------. 1 dan dan 64 Mar 2 10:57 19 -> /tmp/datadir/mysqlbin.000004
lrwx------. 1 dan dan 64 Mar 2 10:57 20 -> /tmp/datadir/mysql/host.MYD
lrwx------. 1 dan dan 64 Mar 2 10:57 21 -> /tmp/datadir/mysql/host.MYI
lrwx------. 1 dan dan 64 Mar 2 10:57 22 -> /tmp/datadir/mysql/user.MYD
lrwx------. 1 dan dan 64 Mar 2 10:57 23 -> /tmp/datadir/mysql/user.MYI
lrwx------. 1 dan dan 64 Mar 2 10:57 24 -> /tmp/datadir/mysql/db.MYD
lrwx------. 1 dan dan 64 Mar 2 10:57 25 -> /tmp/datadir/mysql/db.MYI
lrwx------. 1 dan dan 64 Mar 2 10:57 26 -> /tmp/datadir/mysql/proxies_priv.MYD
lrwx------. 1 dan dan 64 Mar 2 10:57 27 -> /tmp/datadir/mysql/proxies_priv.MYI
lrwx------. 1 dan dan 64 Mar 2 10:57 28 -> /tmp/datadir/mysql/tables_priv.MYD
lrwx------. 1 dan dan 64 Mar 2 10:57 29 -> /tmp/datadir/mysql/tables_priv.MYI
lrwx------. 1 dan dan 64 Mar 2 10:57 30 -> /tmp/datadir/mysql/columns_priv.MYD
lrwx------. 1 dan dan 64 Mar 2 10:57 31 -> /tmp/datadir/mysql/columns_priv.MYI
lrwx------. 1 dan dan 64 Mar 2 10:57 32 -> /tmp/datadir/mysql/procs_priv.MYD
lrwx------. 1 dan dan 64 Mar 2 10:57 33 -> /tmp/datadir/mysql/procs_priv.MYI
lrwx------. 1 dan dan 64 Mar 2 10:57 34 -> /tmp/datadir/mysql/servers.MYD
lrwx------. 1 dan dan 64 Mar 2 10:57 35 -> /tmp/datadir/mysql/servers.MYI
lrwx------. 1 dan dan 64 Mar 2 10:57 36 -> /tmp/datadir/mysql/event.MYD
lrwx------. 1 dan dan 64 Mar 2 10:57 37 -> /tmp/datadir/mysql/event.MYI
O_CLOEXEC files are those with flags 02000000
/usr/include/bits/fcntl-linux.h:# define __O_CLOEXEC 02000000
/usr/include/bits/fcntl-linux.h:# define O_CLOEXEC __O_CLOEXEC /* Set close_on_exec. */
$ find fdinfo/ -type f -ls -exec cat {} \; | more
1924720 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/0
pos: 0
flags: 0100002
mnt_id: 25
1924721 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/1
pos: 9954
flags: 0102001
mnt_id: 82
1924722 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/2
pos: 10951
flags: 0102001
mnt_id: 82
1924723 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/3
pos: 116
flags: 02100002
mnt_id: 82
1924724 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/4
pos: 52
flags: 0100002
mnt_id: 82
lock: 1: POSIX ADVISORY WRITE 26056 00:2c:1866365 0 EOF
1924725 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/5
pos: 0
flags: 0100000
mnt_id: 82
1924726 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/6
pos: 16384
flags: 0100002
mnt_id: 82
1924727 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/7
pos: 0
flags: 02100002
mnt_id: 82
lock: 1: POSIX ADVISORY WRITE 26056 00:2c:1866491 0 EOF
1924728 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/8
pos: 0
flags: 0100002
mnt_id: 82
1924729 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/9
pos: 0
flags: 0100002
mnt_id: 82
1924730 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/10
pos: 0
flags: 0100002
mnt_id: 82
1924731 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/11
pos: 0
flags: 0100002
mnt_id: 82
1924732 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/12
pos: 0
flags: 02100002
mnt_id: 82
lock: 1: POSIX ADVISORY WRITE 26056 00:2c:1866492 0 EOF
1924733 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/13
pos: 0
flags: 02100002
mnt_id: 82
lock: 1: POSIX ADVISORY WRITE 26056 00:2c:1866493 0 EOF
1924734 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/14
pos: 763
flags: 02102001
mnt_id: 82
1924735 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/15
pos: 0
flags: 0100002
mnt_id: 82
1924736 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/16
pos: 473
flags: 02102001
mnt_id: 82
1924737 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/17
pos: 0
flags: 02000002
mnt_id: 9
1924738 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/18
pos: 0
flags: 02
mnt_id: 9
1924739 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/19
pos: 245
flags: 02100001
mnt_id: 82
1924740 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/20
pos: 0
flags: 0100002
mnt_id: 82
1924741 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/21
pos: 503
flags: 0500002
mnt_id: 82
1924742 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/22
pos: 324
flags: 0100002
mnt_id: 82
1924743 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/23
pos: 642
flags: 0500002
mnt_id: 82
1924744 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/24
pos: 880
flags: 0100002
mnt_id: 82
1924745 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/25
pos: 581
flags: 0500002
mnt_id: 82
1924746 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/26
pos: 1386
flags: 0100002
mnt_id: 82
1924747 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/27
pos: 498
flags: 0500002
mnt_id: 82
1924748 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/28
pos: 0
flags: 0100002
mnt_id: 82
1924749 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/29
pos: 513
flags: 0500002
mnt_id: 82
1924750 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/30
pos: 0
flags: 0100002
mnt_id: 82
1924751 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/31
pos: 494
flags: 0500002
mnt_id: 82
1924752 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/32
pos: 0
flags: 0100002
mnt_id: 82
1924753 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/33
pos: 535
flags: 0500002
mnt_id: 82
1924754 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/34
pos: 0
flags: 0100002
mnt_id: 82
1924755 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/35
pos: 396
flags: 0500002
mnt_id: 82
1924756 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/36
pos: 0
flags: 0100002
mnt_id: 82
1924757 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/37
pos: 517
flags: 0500002
mnt_id: 82
Problem:- Gtid are not transferred in Galera Cluster.
Solution:- We need to transfer gtid in the case on either when cluster is
slave/master in async replication. In normal Gtid replication gtid are generated on
recieving node itself and it is always on sync with other nodes. Because galera keeps
node in sync , So all nodes get same no of event groups. So the issue arises when
say galera is slave in async replication.
A
| (Async replication)
D <-> E <-> F {Galera replication}
So what should happen is that all node should apply the master gtid but this does
node happen, becuase node E, F does not recieve gtid from D in write set , So what E(or F)
does is that it applies wsrep_gtid_domain_id, D server-id , E gtid next seq no. This
generated gtid does not always work when say A has different domain id.
So In this commit, on galera node when we see that this event is recieved from master
we simply write Gtid_Log_Event in write_set and send it to other nodes.
The crash (sometimes assert) in MYSQL_BIN_LOG::mark_xid_done was caused by a
fact that log.cc:binlog_background_thread_queue could become a cyclic list.
This possibility becomes real with two checkpoint capable engines that
may execute TC_LOG_BINLOG::commit_checkpoint_notify() in succession before
binlog_background thread gets control and eventually finds a freed memory
while otherwise endlessly looping in while(queue).
It is fixed with counting the notificaion kind instead of en-listing the same notificaion kind in commit_checkpoint_notify as formerly. The while(queue) of binlog background thread is refined to pay attention to the new counter. In effectno more access to free memory is possible.
With combination of --log-bin and Galera the server may crash
reporting two characteristic stacks:
/usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG13mark_xid_doneEmb+0xc7)[0x7f182a8e2cb7]
/usr/sbin/mysqld(binlog_background_thread+0x2b5)[0x7f182a8e3275]
or
/usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG21do_checkpoint_requestEm+0x9d)[0x7ff395b2dafd]
/usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG20checkpoint_and_purgeEm+0x11)[0x7ff395b2db91]
/usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG16rotate_and_purgeEb+0xc2)[0x7ff395b300b2]
The reason of the failure appears to be non-matching decrements for
`xid_count_per_binlog::xid_count`
which can occur when a transaction is executed having its connection issued
`SET @@sql_log_bin=0`. In such case the xid count is not incremented but
its decrements still runs to turn `binlog_xid_count_list` into improper state
which the following FLUSH BINARY LOGS exposes through the crash.
*Note_1*: the regression test reuses an existing galera.sql_log_bin
which does not run stably (even in its base form) by mtr with --log-bin.
*Note_2*: 10.0-galera branch is free of this issue having missed MDEV-7205
fixes.
As reported in MDEV-11969 "there's no way to ditch knowledge" about some
domain that is no longer updated on a server. Besides being of annoyance to
clutter output in DBA console stale domains can prevent the slave
to connect the master as MDEV-12012 witnesses.
What domain is obsolete must be evaluated by the user (DBA) according
to whether the domain info is still relevant and will the domain ever
receive any update.
This patch introduces a method to discard obsolete gtid domains from
the server binlog state. The removal requires no event group from such
domain present in existing binlog files though. If there are any the
containing logs must be first PURGEd in order for
FLUSH BINARY LOGS DELETE_DOMAIN_ID=(list-of-domains)
succeed. Otherwise the command returns an error.
The list of obsolete domains can be computed through
intersecting two sets - the earliest (first) binlog's Gtid_list
and the current value of @@global.gtid_binlog_state - and extracting
the domain id components from the intersection list items.
The new DELETE_DOMAIN_ID featured FLUSH continues to rotate binlog
omitting the deleted domains from the active binlog file's Gtid_list.
Notice though when the command is ineffective - that none of requested to delete
domain exists in the binlog state - rotation does not occur.
Obsolete domain deletion is not harmful for connected slaves as long
as master side binlog files *purge* is synchronized with FLUSH-DELETE_DOMAIN_ID.
The slaves must have the last event from purged files processed as usual,
in order not to bump later into requesting a gtid from a file which
was already gone.
While the command is not replicated (as ordinary FLUSH BINLOG LOGS is)
slaves, even though having extra domains, won't suffer from reconnection errors
thanks to master-slave gtid connection protocol allowing the master
to be ignorant about a gtid domain.
Should at failover such slave to be promoted into master role it may run
the ex-master's
FLUSH BINARY LOGS DELETE_DOMAIN_ID=(list-of-domains)
to clean its own binlog state.
NOTES.
suite/perfschema/r/start_server_low_digest.result
is re-recorded as consequence of internal parser codes changes.