This is actually an existing problem in the old binlog implementation, and
this patch is applicable to old binlog also. The problem is that RESET
MASTER can run concurrently with binlog dump threads / connected slaves.
This will remove the binlog from under the feet of the reader, which can
cause all sorts of strange behaviour.
This patch fixes the problem by disallowing to run RESET MASTER when dump
threads (or other RESET MASTER or SHOW BINARY LOGS) are running. An error is
thrown in this case, user must stop slaves and/or kill dump threads to make
the RESET MASTER go through. A slave that connects in the middle of RESET
MASTER will wait for it to complete.
Fix a lot of test cases to kill any lingering dump threads before doing
RESET MASTER, mostly just by sourcing include/kill_binlog_dump_threads.inc.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
InnoDB binlog files are now backed up along with other InnoDB data by
mariadb-backup.
The files are copied after backup locks have been released. Backup files
created later than the backup LSN are skipped. Then during --prepare, any
data missing from the hot-copied binlog files will be restored by the
binlog recovery code, and any excess data written after the backup LSN will
be zeroed out.
A couple test cases test taking a consistent backup of a server with active
traffic during the backup, by provisioning a slave from the restored binlog
position and checking that the slave can replicate from the original master
and get identical data.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Keep track of, for each binlog file, how many open transactions have
out-of-band data starting in that file. Then at the start of each new binlog
file, in the header page, record the file_no of the earliest file that this
file might contain commit records with references back to OOB records in
that earlier file.
Use this in PURGE BINARY LOGS, so that when a dump thread (slave connection)
is active in file number N, and that file (or a later one) may require
looking back in an earlier file number M for out-of-band records, purge will
stop already at file number M. This way, we avoid that purge accidentally
deletes some binlog file that a dump thread would later get an error on
because it needs to read out-of-band data.
This patch also includes placeholder data for a similar facility for XA
references. The actual implementation of support for XA is for later though.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Mostly various fixes to avoid initializing or creating any data or files for
the legacy binlog.
A possible later refinement could be to sub-class the binlog class
differently for legacy and in-engine binlogs, writing separate virtual
functions for behaviour that differ, extracting common functionality into
sub-methods. This could remove some if (opt_binlog_engine_hton)
conditionals.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
With this commit, the out-of-band binlogging of large event groups in
multiple smaller records interleaved with other event groups is now working.
Instead of flushing the binlog cache to disk when they reach
@@binlog_cache_size, instead the cache is binlogged as an out-of-band
record. Then at transaction commit, a commit record is written containing
just the GTID and a link to the out-of-band data.
To facilitate append-only operation, the binlogged records do not have a
"next" pointer. Instead, they are written out as a forest of perfect binary
trees, the leftmost leaf of one tree pointing to the root of the previous
tree. This structure is used in the binlog reader to efficiently read out
the event group data consecutively for the binlog dump thread, needing to
maintain only O(log(N)) amount of memory during the reading.
As part of this commit, the existing binlog reader code is refactored to be
greatly improved, with a much cleaner explicit state machine and handling of
chunk/page/file boundaries etc.
Also fixes some bugs in the gtid_search::find_gtid_pos().
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
This will make mysql-test-run.pl try to schedule these long-running
(> 60 seconds) tests early in --parallel runs, which helps avoid that
the testsuite gets stuck with a few long-running tests at the end
while most other test workers are idle.
This speed up mtr --parallel=96 with 25 seconds for me.
Reviewed-by: Brandon Nesterenko <brandon.nesterenko@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
- Suppress a couple errors the slave can get as the master crashes.
- The mysql-test-run occasionally takes 120 seconds between crashing
the master and starting it back up for some (unknown) reason. For
now, work-around that by letting the slave try for 500 seconds to
connect to master before giving up instead of only 100 seconds.
Reviewed-by: Brandon Nesterenko <brandon.nesterenko@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
The test case set debug_sync=RESET without waiting for the server thread to
receive the prior signal. This can cause the signal to be lost, the thread to
not wake up, and thus the test to time out.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Make sure the table mysql.gtid_slave_pos is altered to InnoDB before
starting parallel replication. The parallel replication of the suppression
insertion in the test case was trying to update the GTID position in
parallel with the ALTER TABLE, which could occasionally deadlock on the MDL
lock.
Reviewed-by: Monty <monty@mariadb.org>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Disclaimer: This report was fixed in a previous commit with
MDEV-21117, this patch only adds a test to show the presence of
the fix.
Prior to MDEV-21117, the ordering of the handlers in a
transaction's ha_info list solely determined the order in which
the handlertons commit. The binlog is supposed to commit first,
and is normally placed first in the ha_list to do so; however,
in multi-engine 2-phase XA transactions, the binlog can be
placed second. This allowed a race-condition for other
concurrent transactions to commit and binlog before the prior
XA COMMIT would be written to binlog.
MDEV-21117 fixed this so the binlog is specially considered
to commit first, before traversing the ha_list (see
commit_one_phase_2() in sql/hander.cc for this specific change
in 6c39eaeb12).
Test rpl.rpl_parallel_gco_wait_kill has a race condition where
it identifies that SQL using its state as
"Slave has read all relay log", and immediately tries to save
the thread id of the SQL thread by querying for threads with
that same state. However, the state of the SQL thread may
change in-between this time, leading to the query that saves
the SQL thread finding no thread id that matches that state.
Commit 3ee6f69d49 aimed to fix
this in 10.11+ by simplifying the query string to
"%relay log%"; however, it could still fail (though less
often).
This commit instead changes the query to find the SQL thread
from using some state, to using the command "Slave_SQL", which
will not change
MDEV-35477 incorrectly reverted the original test fix of MDEV-34355,
thinking it superseded that fix. It is still needed though. Pasting
the content of the original patch’s commit message, as it is still
relevant (and the method to reproduce the test failure still works).
“””
The problem is that the test could query the status variable
Rpl_semi_sync_slave_send_ack before the slave actually updated it.
This would result in an immediate --die assertion killing the rest
of the test. The bottom of this commit message has a small patch
that can be applied to reproduce the test failure.
This patch fixes the test failure by waiting for the variable to be
updated before querying its value.
diff --git a/sql/semisync_slave.cc b/sql/semisync_slave.cc
index 9ddd4c5c8d7..60538079fce 100644
--- a/sql/semisync_slave.cc
+++ b/sql/semisync_slave.cc
@@ -303,7 +303,10 @@ int Repl_semi_sync_slave::slave_reply(Master_info *mi)
reply_res= DBUG_EVALUATE_IF("semislave_failed_net_flush", 1,
net_flush(net));
if (!reply_res)
+ {
+ sleep(1);
rpl_semi_sync_slave_send_ack++;
+ }
}
DBUG_RETURN(reply_res);
}
“””
In Log_event::read_log_event(), don't use IO_CACHE::error of the relay log's
IO_CACHE to signal an error back to the caller. When reading the active
relay log, this flag is also being used by the IO thread, and setting it can
randomly cause the IO thread to wrongly detect IO error on writing and
permanently disable the relay log.
This was seen sporadically in test case rpl.rpl_from_mysql80. The read
error set by the SQL thread in the IO_CACHE would be interpreted as a
write error by the IO thread, which would cause it to throw a fatal
error and close the relay log. And this would later cause CHANGE
MASTER to try to purge a closed relay log, resulting in nullptr crash.
SQL thread is not able to parse an event read from the relay log. This
can happen like here when replicating unknown events from a MySQL master,
potentially also for other reasons.
Also fix a mistake in my_b_flush_io_cache() introduced back in 2001
(fa09f2cd7e) where my_b_flush_io_cache() could wrongly return an error set
in IO_CACHE::error, even if the flush operation itself succeeded.
Also fix another sporadic failure in rpl.rpl_from_mysql80 where the outout
of MASTER_POS_WAIT() depended on timing of SQL and IO thread.
Reviewed-by: Monty <monty@mariadb.org>
Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Row-injection updates don’t correctly set the historical partition
for tables with system versioning and system_time partitions. This
results in inconsistencies between the master and slave when
replicating transactions that target such tables (i.e. the primary
server would correctly distribute archived rows amongst its
partitions, whereas the replica would have all archived rows in a
single partition). The function
partition_info::vers_set_hist_part(THD*) is used to set the
partition; however, its initial check for
vers_require_hist_part(THD*) returns false, bypassing the rest of
the function (which sets up the partition to use). This is because
the actual check uses the LEX sql_command (via
LEX::vers_history_generating()) to determine if the command is valid
to generate history. Row injections don’t have sql_commands though.
This patch provides a fix which extends the check in
vers_history_generating() to additionally allow row injections to be
history generating (via the function LEX::is_stmt_row_injection()).
Special thanks to Jan Lindstrom <jan.lindstrom@galeracluster.com>
for his work in reproducing the bug, and providing an initial test
case.
Reviewed By
============
Kristian Nielsen <knielsen@knielsen-hq.org>
Aleksey Midenkov <midenok@mariadb.com>
CURRENT_TEST: binlog_encryption.rpl_parallel_gco_wait_kill
mysqltest: In included file "./suite/rpl/t/rpl_parallel_gco_wait_kill.test":
included from /home/buildbot/amd64-ubuntu-2004-debug/build/mysql-test/suite/binlog_encryption/rpl_parallel_gco_wait_kill.test at line 2:
At line 334: Can't initialize replace from 'replace_result $thd_id THD_ID'
An sql thread can reach the "Slave has read all relay log" state
and then start reading relay log again. Let's use a more generic
pattern to retrieve the sql thread ID even if it's not
in the "read all relay log" state.
MDEV-29533 Crash when MariaDB is replica of MySQL 8.0
MySQL 8.0 has added the following new events in the MySQL binary log
PARTIAL_UPDATE_ROWS_EVENT
TRANSACTION_PAYLOAD_EVENT
HEARTBEAT_LOG_EVENT_V2
- PARTIAL_UPDATE_ROWS_EVENT is used by MySQL to generate update
statements using JSON_SET, JSON_REPLACE and JSON_REMOVE to make
update of JSON columns more efficient. These events can be
disabled by setting 'binlog-row-value-options=""'
- TRANSACTION_PAYLOAD_EVENT is used by MySQL to signal that a
row event is compressed. It an be disably by setting
'binlog_transaction_compression=0'.
- HEARTBEAT_LOG_EVENT_V2 is written to the binary log many times
per seconds. It can be ignored by the server.
What this patch does:
- If PARTIAL_UPDATE_ROWS_EVENT or TRANSACTION_PAYLOAD_EVENT is found,
the server will stop with an error message of how to disable the
MySQL server to generate such events.
- HEARTBEAT_LOG_EVENT_V2 events are ignored.
- mariadb-binlog will write the name of the new events.
- mariadb-binlog will stop if PARTIAL_UPDATE_ROWS_EVENT or
TRANSACTION_PAYLOAD_EVENT is found, unless --force is given.
- Fixes a crash in mariadb-binlog if a character set unknown to
MariaDB is found. (MDEV-29533)
From Kristian Nielsen:
- Add test case for MySQL 8.0 to MariaDB replication and fixed a
a small typo in post_header_len initialization.
Reviewer: knielsen@mariadb.org
Handle null bits for record comparison in row events the same way as in
handler::calculate_checksum(), forcing bits that can be undefined to 1.
These bits are the trailing unused bits, as well as the first bit for
tables not using HA_OPTION_PACK_RECORD.
The csv storage engine leaves these bits at 0, while the row-based
replication has them set to 1, which otherwise cause can't find record error.
Reviewed-by: Monty <monty@mariadb.org>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
The test turns out to be senstive to @@global.gtid_cleanup_batch_size.
With a rather small default value of the latter
SELECTing from mysql.gtid_slave_pos may not be deterministic: tests
that run before may increase a pending for automitic deletion batch.
The test is refined to set its own value for the batch size which
is virtually unreachable.
Thanks to Kristian Nielsen for the analysis.
Under unknown circumstances, the SQL layer may wrongly disregard an
invocation of thd_mark_transaction_to_rollback() when an InnoDB
transaction had been aborted (rolled back) due to one of the following errors:
* HA_ERR_LOCK_DEADLOCK
* HA_ERR_RECORD_CHANGED (if innodb_snapshot_isolation=ON)
* HA_ERR_LOCK_WAIT_TIMEOUT (if innodb_rollback_on_timeout=ON)
Such an error used to cause a crash of InnoDB during transaction commit.
These changes aim to catch and report the error earlier, so that not only
this crash can be avoided but also the original root cause be found and
fixed more easily later.
The idea of this fix is from Michael 'Monty' Widenius.
HA_ERR_ROLLBACK: A new error code that will be translated into
ER_ROLLBACK_ONLY, signalling that the current transaction
has been aborted and the only allowed action is ROLLBACK.
trx_t::state: Add TRX_STATE_ABORTED that is like
TRX_STATE_NOT_STARTED, but noting that the transaction had been
rolled back and aborted.
trx_t::is_started(): Replaces trx_is_started().
ha_innobase: Check the transaction state in various places.
Simplify the logic around SAVEPOINT.
ha_innobase::is_valid_trx(): Replaces ha_innobase::is_read_only().
The InnoDB logic around transaction savepoints, commit, and rollback
was unnecessarily complex and might have contributed to this
inconsistency. So, we are simplifying that logic as well.
trx_savept_t: Replace with const undo_no_t*. When we rollback to
a savepoint, all we need to know is the number of undo log records
that must survive.
trx_named_savept_t, DB_NO_SAVEPOINT: Remove. We can store undo_no_t
directly in the space allocated at innobase_hton->savepoint_offset.
fts_trx_create(): Do not copy previous savepoints.
fts_savepoint_rollback(): If a savepoint was not found, roll back
everything after the default savepoint of fts_trx_create().
The test innodb_fts.savepoint is extended to cover this code.
Reviewed by: Vladislav Lesin
Tested by: Matthias Leich
The slave replication should accept not supported table options (eg.
"transactional" for MyISAM), as such options can end up being set from the
master in binlogged CREATE TABLE.
This was already handled in report_unknown_option(), which skips the error
in slave threads. But in mysql_prepare_create_table_finalize() there was still
a warning given, and this warning gets converted into an error when
STRICT_(ALL|TRANS)_TABLES. So skip this warning for replication also.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
The assertion occurred in the SQL thread if an event group was incompletely
written, missing the end XID or COMMIT event, and immediately followed by a
new event group. This could also lead to the incomplete event group being
committed, and with the wrong GTID.
Fix by rolling back any active transaction from a prior event group when
applying the following GTID event.
Getting an incomplete event like this is somewhat rare to happen. If the
server crashes in the middle of writing an event group, the server restart
will write a new format description event, which makes the slave roll back
the partial event group. But presumably it could happen if the master
experiences temporary write errors in the binlog, like intermittent disk
full for example.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Disallow changing @@gtid_domain_id while a temporary table is open in
STATEMENT or MIXED binlog mode. Otherwise, a slave may try to replicate
events refering to the same temporary table in parallel, using domain-based
out-of-order parallel replication. This is not valid, temporary tables are
only available for use within a single thread at a time.
One concrete consequence seen from this bug was a ROLLBACK on an
InnoDB temporary table running in one domain in parallel with DROP
TEMPORARY TABLE in another domain, causing an assertion inside InnoDB:
InnoDB: Failing assertion: table->get_ref_count() == 0 in
dict_sys_t::remove.
Use an existing error code that's somewhat close to the real issue
(ER_INSIDE_TRANSACTION_PREVENTS_SWITCH_GTID_DOMAIN_ID_SEQ_NO), to not add a
new error code in a GA release. When this is merged to the next GA release,
we could optionally introduce a new and more precise error code for an
attempt to change the domain_id while temporary tables are open.
Reviewed-by: Brandon Nesterenko <brandon.nesterenko@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
The partitioning error handling code was looking at
thd->lex->alter_info.partition_flags in non-alter-table cases, in which cases
the value is stale and contains whatever was set by any earlier ALTER TABLE.
This could cause the wrong error code to be generated, which then in some cases
can cause replication to break with "different errorcode" error.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>