The test rpl.rpl_parallel_sbm would fail because
Slave_last_event_time would not match Master_last_event_time after
syncing with the master (i.e. via sync_with_master_gtid.inc). This
happens because the timing statistics are updated after updating
gtid_slave_pos. That is, gtid_slave_pos is updated with the
transaction commit, whereas the timing statistics are updated when
the relay log position is updated (i.e. after the commit). This is
by design. For the test, this means there is a small amount of time
after sync_with_master_gtid.inc where the SHOW SLAVE STATUS output
lags behind the data state of the slave. This would cause the test
to fail if comparing Slave_last_event_time to Master_last_event_time
during this period, because Slave_last_event_time would not yet
reflect the latest committed transaction.
The fix is to add a wait_condition before the comparison to wait for
the SHOW SLAVE STATUS statistics to be updated. The actual check is
to wait for Seconds_Behind_Master == 0, because the slave should be
idle at this point, which forces the value to be 0.
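A minimal sketch of such a wait, assuming the standard
include/wait_for_slave_param.inc helper (the actual mechanism used by
the test may differ):

# Wait until SHOW SLAVE STATUS reports Seconds_Behind_Master == 0,
# i.e. until the timing statistics have caught up with the last commit.
--let $slave_param= Seconds_Behind_Master
--let $slave_param_value= 0
--source include/wait_for_slave_param.inc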
The test failure can be reproduced with the following patch:
diff --git a/sql/rpl_gtid.cc b/sql/rpl_gtid.cc
index db5c26b6237..b346a32b573 100644
--- a/sql/rpl_gtid.cc
+++ b/sql/rpl_gtid.cc
@@ -300,6 +300,7 @@ rpl_slave_state::update(uint32 domain_id, uint32 server_id, uint64 sub_id,
mysql_mutex_lock(&LOCK_slave_state);
res= update_nolock(domain_id, server_id, sub_id, seq_no, hton, rgi);
mysql_mutex_unlock(&LOCK_slave_state);
+ sleep(1);
return res;
}
Test rpl.rpl_parallel_gco_wait_kill has a race condition where
it identifies the SQL thread by its state
"Slave has read all relay log", and immediately tries to save
the thread id of the SQL thread by querying for threads with
that same state. However, the state of the SQL thread may
change in the meantime, so the query that saves the SQL
thread id finds no thread matching that state.
Commit 3ee6f69d49 aimed to fix
this in 10.11+ by simplifying the query string to
"%relay log%"; however, it could still fail (though less
often).
This commit instead changes the query that finds the SQL thread
to match on the command "Slave_SQL" rather than on a state, as
the command does not change.
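A sketch of the changed lookup (hypothetical query; the actual test
may differ):

# The SQL thread's COMMAND column stays "Slave_SQL" regardless of its state.
--let $sql_thd_id= `SELECT ID FROM information_schema.PROCESSLIST WHERE COMMAND = 'Slave_SQL'`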
MDEV-35477 incorrectly reverted the original test fix of MDEV-34355,
thinking it superseded that fix. It is still needed though. Pasting
the content of the original patch’s commit message, as it is still
relevant (and the method to reproduce the test failure still works).
"""
The problem is that the test could query the status variable
Rpl_semi_sync_slave_send_ack before the slave actually updated it.
This would result in an immediate --die assertion killing the rest
of the test. The bottom of this commit message has a small patch
that can be applied to reproduce the test failure.
This patch fixes the test failure by waiting for the variable to be
updated before querying its value.
diff --git a/sql/semisync_slave.cc b/sql/semisync_slave.cc
index 9ddd4c5c8d7..60538079fce 100644
--- a/sql/semisync_slave.cc
+++ b/sql/semisync_slave.cc
@@ -303,7 +303,10 @@ int Repl_semi_sync_slave::slave_reply(Master_info *mi)
reply_res= DBUG_EVALUATE_IF("semislave_failed_net_flush", 1,
net_flush(net));
if (!reply_res)
+ {
+ sleep(1);
rpl_semi_sync_slave_send_ack++;
+ }
}
DBUG_RETURN(reply_res);
}
"""
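A sketch of the wait described above, assuming the standard
include/wait_for_status_var.inc helper (the concrete expected value is
illustrative):

# Wait until the slave has incremented Rpl_semi_sync_slave_send_ack
# before the test reads and asserts on its value.
--let $status_var= Rpl_semi_sync_slave_send_ack
--let $status_var_value= 1
--source include/wait_for_status_var.inc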
In Log_event::read_log_event(), don't use IO_CACHE::error of the relay log's
IO_CACHE to signal an error back to the caller. When reading the active
relay log, this flag is also being used by the IO thread, and setting it can
randomly cause the IO thread to wrongly detect IO error on writing and
permanently disable the relay log.
This was seen sporadically in test case rpl.rpl_from_mysql80. The read
error set by the SQL thread in the IO_CACHE would be interpreted as a
write error by the IO thread, which would cause it to throw a fatal
error and close the relay log. And this would later cause CHANGE
MASTER to try to purge a closed relay log, resulting in nullptr crash.
The triggering scenario is that the SQL thread is not able to parse
an event read from the relay log. This can happen, as here, when
replicating unknown events from a MySQL master, and potentially for
other reasons as well.
Also fix a mistake in my_b_flush_io_cache() introduced back in 2001
(fa09f2cd7e) where my_b_flush_io_cache() could wrongly return an error set
in IO_CACHE::error, even if the flush operation itself succeeded.
Also fix another sporadic failure in rpl.rpl_from_mysql80 where the output
of MASTER_POS_WAIT() depended on the timing of the SQL and IO threads.
Reviewed-by: Monty <monty@mariadb.org>
Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Row-injection updates don’t correctly set the historical partition
for tables with system versioning and system_time partitions. This
results in inconsistencies between the master and slave when
replicating transactions that target such tables (i.e. the primary
server would correctly distribute archived rows amongst its
partitions, whereas the replica would have all archived rows in a
single partition). The function
partition_info::vers_set_hist_part(THD*) is used to set the
partition; however, its initial check for
vers_require_hist_part(THD*) returns false, bypassing the rest of
the function (which sets up the partition to use). This is because
the actual check uses the LEX sql_command (via
LEX::vers_history_generating()) to determine if the command is valid
to generate history. Row injections don’t have sql_commands though.
This patch provides a fix which extends the check in
vers_history_generating() to additionally allow row injections to be
history generating (via the function LEX::is_stmt_row_injection()).
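For illustration, the kind of table affected looks like this (a
generic example, not the test's schema):

CREATE TABLE t1 (a INT) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR
(PARTITION p0 HISTORY,
 PARTITION p1 HISTORY,
 PARTITION pn CURRENT);

Before the fix, row events applied on the replica would leave all
history rows for such a table in a single partition.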
Special thanks to Jan Lindstrom <jan.lindstrom@galeracluster.com>
for his work in reproducing the bug, and providing an initial test
case.
Reviewed By
============
Kristian Nielsen <knielsen@knielsen-hq.org>
Aleksey Midenkov <midenok@mariadb.com>
CURRENT_TEST: binlog_encryption.rpl_parallel_gco_wait_kill
mysqltest: In included file "./suite/rpl/t/rpl_parallel_gco_wait_kill.test":
included from /home/buildbot/amd64-ubuntu-2004-debug/build/mysql-test/suite/binlog_encryption/rpl_parallel_gco_wait_kill.test at line 2:
At line 334: Can't initialize replace from 'replace_result $thd_id THD_ID'
An SQL thread can reach the "Slave has read all relay log" state
and then start reading the relay log again. Let's use a more generic
pattern to retrieve the SQL thread ID even if it's not
in the "read all relay log" state.
MDEV-29533 Crash when MariaDB is replica of MySQL 8.0
MySQL 8.0 has added the following new events to the MySQL binary log:
PARTIAL_UPDATE_ROWS_EVENT
TRANSACTION_PAYLOAD_EVENT
HEARTBEAT_LOG_EVENT_V2
- PARTIAL_UPDATE_ROWS_EVENT is used by MySQL to generate update
statements using JSON_SET, JSON_REPLACE and JSON_REMOVE to make
update of JSON columns more efficient. These events can be
disabled by setting 'binlog-row-value-options=""'
- TRANSACTION_PAYLOAD_EVENT is used by MySQL to signal that a
row event is compressed. It can be disabled by setting
'binlog_transaction_compression=0'.
- HEARTBEAT_LOG_EVENT_V2 is written to the binary log many times
per second. It can be ignored by the server.
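For example, the first two event types can be avoided on the MySQL 8.0
master with settings like these (option names as given above):

# Disable PARTIAL_UPDATE_ROWS_EVENT (partial JSON updates).
SET GLOBAL binlog_row_value_options= '';
# Disable TRANSACTION_PAYLOAD_EVENT (binlog transaction compression).
SET GLOBAL binlog_transaction_compression= 0;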
What this patch does:
- If PARTIAL_UPDATE_ROWS_EVENT or TRANSACTION_PAYLOAD_EVENT is found,
the server will stop with an error message explaining how to configure
the MySQL server to not generate such events.
- HEARTBEAT_LOG_EVENT_V2 events are ignored.
- mariadb-binlog will write the name of the new events.
- mariadb-binlog will stop if PARTIAL_UPDATE_ROWS_EVENT or
TRANSACTION_PAYLOAD_EVENT is found, unless --force is given.
- Fixes a crash in mariadb-binlog if a character set unknown to
MariaDB is found. (MDEV-29533)
From Kristian Nielsen:
- Add a test case for MySQL 8.0 to MariaDB replication and fix a
small typo in post_header_len initialization.
Reviewer: knielsen@mariadb.org
Handle null bits for record comparison in row events the same way as in
handler::calculate_checksum(), forcing bits that can be undefined to 1.
These bits are the trailing unused bits, as well as the first bit for
tables not using HA_OPTION_PACK_RECORD.
The csv storage engine leaves these bits at 0, while row-based
replication has them set to 1, which would otherwise cause a
"can't find record" error.
Reviewed-by: Monty <monty@mariadb.org>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
The test turns out to be sensitive to @@global.gtid_cleanup_batch_size.
With the rather small default value of the latter,
SELECTing from mysql.gtid_slave_pos may not be deterministic: tests
that run before may grow the batch of rows pending automatic deletion.
The test is refined to set its own, virtually unreachable value for
the batch size.
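A sketch of the refinement (the concrete value used by the test may
differ):

# Make automatic deletion from mysql.gtid_slave_pos effectively never
# trigger while the test runs.
SET @@global.gtid_cleanup_batch_size= 1000000;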
Thanks to Kristian Nielsen for the analysis.
Under unknown circumstances, the SQL layer may wrongly disregard an
invocation of thd_mark_transaction_to_rollback() when an InnoDB
transaction had been aborted (rolled back) due to one of the following errors:
* HA_ERR_LOCK_DEADLOCK
* HA_ERR_RECORD_CHANGED (if innodb_snapshot_isolation=ON)
* HA_ERR_LOCK_WAIT_TIMEOUT (if innodb_rollback_on_timeout=ON)
Such an error used to cause a crash of InnoDB during transaction commit.
These changes aim to catch and report the error earlier, so that not only
this crash can be avoided but also the original root cause be found and
fixed more easily later.
The idea of this fix is from Michael 'Monty' Widenius.
HA_ERR_ROLLBACK: A new error code that will be translated into
ER_ROLLBACK_ONLY, signalling that the current transaction
has been aborted and the only allowed action is ROLLBACK.
trx_t::state: Add TRX_STATE_ABORTED that is like
TRX_STATE_NOT_STARTED, but noting that the transaction had been
rolled back and aborted.
trx_t::is_started(): Replaces trx_is_started().
ha_innobase: Check the transaction state in various places.
Simplify the logic around SAVEPOINT.
ha_innobase::is_valid_trx(): Replaces ha_innobase::is_read_only().
The InnoDB logic around transaction savepoints, commit, and rollback
was unnecessarily complex and might have contributed to this
inconsistency. So, we are simplifying that logic as well.
trx_savept_t: Replace with const undo_no_t*. When we rollback to
a savepoint, all we need to know is the number of undo log records
that must survive.
trx_named_savept_t, DB_NO_SAVEPOINT: Remove. We can store undo_no_t
directly in the space allocated at innobase_hton->savepoint_offset.
fts_trx_create(): Do not copy previous savepoints.
fts_savepoint_rollback(): If a savepoint was not found, roll back
everything after the default savepoint of fts_trx_create().
The test innodb_fts.savepoint is extended to cover this code.
Reviewed by: Vladislav Lesin
Tested by: Matthias Leich
The slave replication should accept not supported table options (eg.
"transactional" for MyISAM), as such options can end up being set from the
master in binlogged CREATE TABLE.
This was already handled in report_unknown_option(), which skips the error
in slave threads. But in mysql_prepare_create_table_finalize() a
warning was still given, and this warning gets converted into an error
under STRICT_(ALL|TRANS)_TABLES. So skip this warning for replication as well.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
The assertion occurred in the SQL thread if an event group was incompletely
written, missing the end XID or COMMIT event, and immediately followed by a
new event group. This could also lead to the incomplete event group being
committed, and with the wrong GTID.
Fix by rolling back any active transaction from a prior event group when
applying the following GTID event.
Getting an incomplete event group like this is somewhat rare. If the
server crashes in the middle of writing an event group, the server restart
will write a new format description event, which makes the slave roll back
the partial event group. But presumably it could happen if the master
experiences temporary write errors in the binlog, like intermittent disk
full for example.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Disallow changing @@gtid_domain_id while a temporary table is open in
STATEMENT or MIXED binlog mode. Otherwise, a slave may try to replicate
events referring to the same temporary table in parallel, using domain-based
out-of-order parallel replication. This is not valid; temporary tables are
only available for use within a single thread at a time.
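A sketch of the now-disallowed sequence (the error is the existing
code mentioned below):

CREATE TEMPORARY TABLE t1 (a INT);
# In STATEMENT or MIXED binlog mode this now fails with
# ER_INSIDE_TRANSACTION_PREVENTS_SWITCH_GTID_DOMAIN_ID_SEQ_NO.
SET SESSION gtid_domain_id= 100;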
One concrete consequence seen from this bug was a ROLLBACK on an
InnoDB temporary table running in one domain in parallel with DROP
TEMPORARY TABLE in another domain, causing an assertion inside InnoDB:
InnoDB: Failing assertion: table->get_ref_count() == 0 in
dict_sys_t::remove.
Use an existing error code that's somewhat close to the real issue
(ER_INSIDE_TRANSACTION_PREVENTS_SWITCH_GTID_DOMAIN_ID_SEQ_NO), to not add a
new error code in a GA release. When this is merged to the next GA release,
we could optionally introduce a new and more precise error code for an
attempt to change the domain_id while temporary tables are open.
Reviewed-by: Brandon Nesterenko <brandon.nesterenko@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
The partitioning error handling code was looking at
thd->lex->alter_info.partition_flags in non-alter-table cases, in which
case the value is stale and contains whatever was set by any earlier ALTER TABLE.
This could cause the wrong error code to be generated, which then in some cases
can cause replication to break with "different errorcode" error.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
- mariadb-dump utility performs logical backups by producing a
set of SQL statements that can be executed. By enabling this
no-autocommit option, InnoDB can load the data in an efficient
way and write only one undo log record for the whole operation.
Only the first insert statement undergoes the bulk insert operation;
the remaining insert statements don't write undo log entries and go
through the normal insert code path.
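A sketch of the resulting dump output (illustrative, not exact
mariadb-dump output):

SET autocommit=0;
INSERT INTO t1 VALUES (1),(2); # first INSERT: bulk insert operation
INSERT INTO t1 VALUES (3),(4); # later INSERTs: normal insert code path
COMMIT;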
MTR test rpl_semi_sync_no_missed_ack_after_add_slave fails on
buildbot after the preparatory commit for MDEV-35109 (5290fa043b)
which changed a sleep to a debug_sync point. The problem is that the
debug_sync point would time-out on a slave while waiting to enter
the logic to send an ACK reply. More specifically, the test
config is a primary with two replicas, and the test waits for one of
the replicas to start sending an ACK; if the other replica was able
to receive the event and respond with an ACK before the binlog dump
thread of the timing-out server prepared to send the event, that dump
thread wouldn't set the SEMI_SYNC_NEED_ACK flag, and its replica
wouldn't even try to respond with an ACK.
The fix is to use debug_sync for both replicas such that both are
held before sending their ACK, so one can't temporarily disable
semi-sync for the other before it receives the transaction.
slave_applier_reset_xa_trans() should clear THD::pseudo_thread_id when called,
to reset the XA transaction state completely. Clearing pseudo_thread_id models
the binlog applier that handles BASE64-encoded events, which possibly contain
the pseudo_thread_id, allowing us to restore the pre-event state of the
connection's respective session variable.
Replace wait_for_pattern_in_file.inc, converting all of its uses
to search_pattern_in_file.inc with SEARCH_WAIT.
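A sketch of a converted call site, assuming SEARCH_WAIT=FOUND means
"wait until the pattern appears" (pattern and file are placeholders):

--let SEARCH_FILE= $MYSQLTEST_VARDIR/log/mysqld.1.err
--let SEARCH_PATTERN= pattern_to_wait_for
--let SEARCH_WAIT= FOUND
--source include/search_pattern_in_file.inc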
Reviewed By:
============
Kristian Nielsen <knielsen@knielsen-hq.org>
Sergei Golubchik <serg@mariadb.org>
For a primary configured with wait_point=AFTER_SYNC, if two threads
T1 (binlogging through MYSQL_BIN_LOG::write()) and T2 were
binlogging at the same time, T1 could accidentally wait for its
semi-sync ACK using the binlog coordinates of T2. Prior to
MDEV-33551, this only resulted in delayed transactions, because all
transactions shared the same condition variable for ACK signaling.
However, with the MDEV-33551 changes, each thread has its own
condition variable to signal. So T1 could wait indefinitely when
either:
1) T1's ACK is received, but not T2's, when T1 goes into
wait_after_sync(). The ACK receiver thread has already
notified about T1's ACK, but T1 is _actually_ waiting on T2's
ACK, and therefore waits (in vain).
2) T1 goes to wait_after_sync() before any ACKs have arrived. When
T1's ACK comes in, T1 is woken up; however, it sees that it needs to
wait more (because it was actually waiting on T2's ACK), and goes to
wait again (this time, in vain).
Note that the actual cause of T1 waiting on T2's binlog coordinates
is that, when MYSQL_BIN_LOG::write() called
Repl_semisync_master::wait_after_sync(), the binlog offset parameter
was read as the end of MYSQL_BIN_LOG::log_file, which is shared
among transactions. So if T2 had updated the binary log _after_ T1
had released LOCK_log, but not yet invoked wait_after_sync(), it
would use the end of the binary log file as the binlog offset, which
was that of T2 (or any future transaction).
The fix in this patch ensures consistency between the binary log
coordinates a transaction uses between report_binlog_update() and
wait_after_sync().
Reviewed By
============
Kristian Nielsen <knielsen@knielsen-hq.org>
Andrei Elkin <andrei.elkin@mariadb.com>
This is a preparatory commit for MDEV-35109 to make its
testing code cleaner (and harden other tests too).
The DEBUG_DBUG point simulate_delay_semisync_slave_reply
up to this patch used my_sleep() to delay an ACK response,
but sleeps are prone to test failures on machines that
run tests when already having a heavy load (e.g. on
buildbot).
This patch changes this DEBUG_DBUG sleep to use DEBUG_SYNC
to coordinate exactly when a slave should send its reply,
which is safer and faster.
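A generic sketch of the coordination (sync point and signal names are
hypothetical; the server side would execute the named sync point where
the reply is sent):

# Server side (C++): the DEBUG_SYNC point in the reply path is set to
# 'SIGNAL reached WAIT_FOR go'. From the test connection:
SET DEBUG_SYNC= 'now WAIT_FOR reached';
# ... the slave is now held just before sending its ACK ...
SET DEBUG_SYNC= 'now SIGNAL go';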
As DEBUG_SYNC can't be used while a server is shutting
down, to synchronize threads with the SHUTDOWN WAIT FOR SLAVES
logic, we use and extend wait_for_pattern_in_file.inc to
wait for an informational message in the error log that
indicates that the shutdown process has reached the
intended state (i.e. that the shutdown has
been delayed to await semi-sync ACKs). Specifically, the
extensions are as follows:
1. wait_for_pattern_in_file.inc is extended with a parameter
wait_for_pattern_count, a number that indicates the
number of times a pattern should occur in the file before
returning control back to the calling script.
2. search_pattern_in_file.inc is extended with a parameter
SEARCH_ABORT_IS_SUCCESS to invert the error/success
logic, so the SEARCH_ABORT condition can be used to
indicate success rather than error.
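A sketch of the extended usage (parameter names from the list above;
values and patterns are illustrative, and the usual SEARCH_FILE /
SEARCH_PATTERN parameters are still required):

# 1. Wait until the pattern has occurred twice in the file.
--let $wait_for_pattern_count= 2
--source include/wait_for_pattern_in_file.inc

# 2. Treat hitting the SEARCH_ABORT condition as success, not failure.
--let SEARCH_ABORT_IS_SUCCESS= 1
--source include/search_pattern_in_file.inc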
* remove duplicate test file
* move all uuidv7 tests into plugin/type_uuid/mysql-test/type_uuid/
* remove mysys/ changes
* auto my_random_bytes() fallback - removes duplicate code from uuid,
and fixes all other users of my_random_bytes() that don't check
the return value (because, perhaps, they don't need crypto-strong
random bytes)
* End of 11.6 -> 11.7 in tests
* clarify the warning text
* UUID_VERSION_MASK()/UUID_VARIANT_MASK() must not depend on the version
* allow 4x more monotonic uuidv7 values per millisecond - instead of stretching
1000 microseconds over 12 bits, let's use the extra 2 bits as a counter
* rename for compatibility with Percona Server (uuid_v4, uuid_v7)
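For example, after the rename (function names from the item above):

SELECT UUID_V4(), UUID_V7();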