The long semaphore wait appeared to be caused by the following
pattern in the MTR test:
```
SET DEBUG_SYNC = "now SIGNAL wsrep_after_certification_continue";
SET DEBUG_SYNC = "now SIGNAL signal.wsrep_apply_cb";
```
Raising two signals, one right after the other, caused the second
signal to overwrite the first before it was consumed by the waiting
thread. This left one thread stuck until the debug sync point timed
out.
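A minimal C++ sketch of why back-to-back signals get lost, assuming a
simplified single-slot model of the DEBUG_SYNC signal state (illustrative
only, not the actual sql/debug_sync.cc code):
```
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <string>

// Single-slot signal model: a new SIGNAL overwrites the previous one.
std::mutex debug_sync_mutex;
std::condition_variable debug_sync_cond;
std::string current_signal;

void debug_sync_signal(const std::string &name) {
  std::lock_guard<std::mutex> lock(debug_sync_mutex);
  current_signal = name;          // a second SIGNAL replaces the first
  debug_sync_cond.notify_all();
}

// Returns false on timeout, i.e. when the expected signal was
// overwritten before this thread got a chance to observe it.
bool debug_sync_wait_for(const std::string &name,
                         std::chrono::seconds timeout) {
  std::unique_lock<std::mutex> lock(debug_sync_mutex);
  return debug_sync_cond.wait_for(lock, timeout,
                                  [&] { return current_signal == name; });
}
```
In this model, a waiter that has not yet reached its WAIT_FOR when the
second SIGNAL fires never observes the first signal and can only time out.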
Analysis:
========
'max_binlog_cache_size' is configured and a huge transaction is executed.
When the size of the transaction's events exceeds 'max_binlog_cache_size',
the events cannot be written to the binary log cache and a cache write error
is raised.
Upon a cache write error the statement is rolled back and the transaction
cache should be truncated to the previous statement-specific position. The
truncate operation should reset the cache to the earlier valid position and
flush the new changes. Even though the flush is successful, the cache write
error remains marked. The truncate code interprets this stale cache write
error as a cache flush failure and returns abruptly without modifying the
write cache parameters. Hence the cache is left in an invalid state. When a
COMMIT statement is executed in this session, it tries to flush the contents
of the transaction cache to the binary log. Since the cache holds partial
events, the cache write operation triggers the 'writer.remains' assert.
Fix:
===
The binlog truncate function resets the cache to a specified size. As a first
step of truncation, clear the cache write error flag that was raised during
the earlier execution. With this, any new errors that surface during cache
truncation can be clearly identified.
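A hedged sketch of the fix idea; the names below are illustrative, not the
actual server symbols:
```
#include <cstddef>

struct BinlogCache {
  size_t write_pos = 0;
  bool write_error = false; // set when an event exceeded max_binlog_cache_size

  bool flush() { return true; } // stand-in for the real flush of new changes

  // Truncate back to the last statement-start position.
  bool truncate(size_t saved_pos) {
    write_error = false;   // step 1 of the fix: clear the stale write error
    write_pos = saved_pos; // reset the cache to the earlier valid position
    // Any error reported from here on genuinely came from the truncation.
    return flush();
  }
};
```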
MDEV-18046: Assortment of crashes, assertion failures and ASAN errors in mysql_show_binlog_events
Problem:
========
SHOW BINLOG EVENTS FROM <pos> reports the following assert when ASAN is
enabled:
uint32 binlog_get_uncompress_len(const char*):
Assertion `(buf[0] & 0xe0) == 0x80' failed
Fix:
===
**Part11: Converted debug assert to error handler code**
Problem:
========
SHOW BINLOG EVENTS FROM <pos> causes a variety of failures, some of which
are listed below. It is not a race condition issue, but there is some
non-determinism involved.
Analysis:
========
The "show binlog events from <pos>" code treats the user-given position as a
valid event start position. It starts reading data from this position
onwards and tries to map it to a set of known events. Each event has a
specific structure, and asserts have been added to ensure that the read
event data satisfies the event-specific requirements. When a random position
is supplied to the "show binlog events" command, these event-structure
checks fail and result in an assert.
Fix:
====
The fix is split into different parts. Each part addresses either an ASAN
issue or an assert/crash.
**Part1: Checksum based position validation when checksum is enabled**
Using the checksum, validate the very first event read at the user-specified
position. If there is a checksum mismatch, report an appropriate error for
the invalid event.
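A sketch of such a validation, assuming CRC32 checksums (the binlog checksum
algorithm when enabled) stored in the last 4 bytes of each event, on a
little-endian host; names are illustrative:
```
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <zlib.h>  // crc32()

// Returns true if the bytes at 'ev' look like a whole event whose trailing
// 4-byte checksum matches its body; a mismatch means <pos> was not a real
// event boundary.
bool first_event_checksum_ok(const unsigned char *ev, size_t ev_len) {
  if (ev_len <= 4)
    return false;                            // too short to hold a CRC32
  uint32_t stored;
  std::memcpy(&stored, ev + ev_len - 4, 4);  // checksum trails the event
  uint32_t computed =
      static_cast<uint32_t>(crc32(0L, ev, static_cast<uInt>(ev_len - 4)));
  return stored == computed;
}
```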
By default (innodb_strict_mode=ON), InnoDB attempts to guarantee
at DDL time that any INSERT to the table can succeed.
MDEV-19292 recently revised the "row size too large" check in InnoDB.
The check is still somewhat inaccurate;
that should be addressed in MDEV-20194.
Note: If a table contains multiple long string columns so that each column
is part of a column prefix index, then an UPDATE that attempts to modify
all those columns at once may fail, because the undo log record might
not fit in a single undo log page (of innodb_page_size). In the worst case,
the undo log record would grow by about 3KiB for each updated column.
The DDL-time check (since the InnoDB Plugin for MySQL 5.1) is optional
in the sense that when the maximum B-tree record size or undo log
record size would be exceeded, the DML operation will fail and the
transaction will be properly rolled back.
create_table_info_t::row_size_is_acceptable(): Add the parameter
'bool strict' so that innodb_strict_mode=ON can be overridden during
TRUNCATE, OPTIMIZE and ALTER TABLE...FORCE (when the storage format
is not changing).
create_table_info_t::create_table(): Perform a sloppy check for
TRUNCATE TABLE (create_fk=false).
prepare_inplace_alter_table_dict(): Perform a sloppy check for
simple operations.
trx_is_strict(): Remove. The function became unused in
commit 98694ab0cb (MDEV-20949).
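An illustrative sketch of how the 'bool strict' parameter could gate the
outcome (not the actual InnoDB code):
```
#include <cstddef>

// In strict mode an over-long row definition is refused at DDL time; in the
// sloppy mode used for TRUNCATE, OPTIMIZE and ALTER TABLE...FORCE the DDL
// proceeds and any oversized DML fails and rolls back later.
bool row_size_is_acceptable(size_t max_record_size, size_t page_limit,
                            bool strict) {
  if (max_record_size <= page_limit)
    return true;  // rows always fit: nothing to complain about
  return !strict; // strict: refuse the DDL; sloppy: warn and accept
}
```
In the sloppy mode the DDL succeeds with a warning, and an oversized record
is rejected at DML time with a proper rollback, as described above.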
PR#1127: Fix is_check_constraints.result to be compatible with 10.3
The patch follows the original patch for MDEV-14474
(1edd09c325), not the one merged into the server
(d526679efd).
This patch includes:
- Renaming of `is_check_constraint` to `is_check_constraints` in tests
  and results
- Per review, a change to the order of fields in the IS check_constraints
  table, adding the column `table_name` before `constraint_name`. According
  to the 2006 standard there is no `table_name` column.
- The original patch and the one in `10.3` support the embedded server;
  this patch does not. After the merge, `10.3` will not support it either.
- Don't use patch c8b8b01b61 to change the length of the `CHECK_CLAUSE`
  field
PR#1150: MDEV-18440: Information_schema.check_constraints possible data leak
This patch is an extension of PR 1127 and includes:
- Check for table grants
- Additional test according to the MDEV specification
Signed-off-by: Vicențiu Ciorbaru <vicentiu@mariadb.org>
copy thread)
mariabackup hangs waiting until the InnoDB redo log copy thread has read the
log up to a certain LSN, and it waits under FTWRL. If a redo log read error
occurs in that thread, the thread finishes, but the main thread knows
nothing about it, which leads to the hang.
As it hangs under FTWRL, slave threads on the server side can be blocked
due to an MDL lock conflict.
The fix is to finish mariabackup with an error message on an InnoDB redo log
read failure.
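A sketch of the fix idea, assuming a shared failure flag between the redo
log copy thread and the main thread; all names are illustrative:
```
#include <atomic>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <thread>

// Shared failure flag between the copy thread and the main thread.
std::atomic<bool> log_copy_failed{false};

// Called by the copy thread when a redo log read fails.
void on_redo_log_read_error() {
  log_copy_failed.store(true, std::memory_order_relaxed);
}

// Main thread: instead of waiting forever under FTWRL for an LSN that will
// never arrive, fail fast as soon as the copy thread reports an error.
void wait_for_lsn(uint64_t target_lsn,
                  const std::atomic<uint64_t> &copied_lsn) {
  while (copied_lsn.load(std::memory_order_relaxed) < target_lsn) {
    if (log_copy_failed.load(std::memory_order_relaxed)) {
      std::fprintf(stderr, "mariabackup: redo log read failed, aborting\n");
      std::exit(EXIT_FAILURE);
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
  }
}
```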
The InnoDB RNG maintains global state, causing otherwise unnecessary bus
traffic. Even worse, this is cross-mutex traffic: spin loops on different
mutexes contend with each other over the shared RNG state.
A fixed delay of 4 was verified to give the best throughput by OLTP update
index and read-write benchmarks on Intel Broadwell (2/20/40) and
ARM (1/46/46).
This is a backport of ce04790065 from
MariaDB Server 10.3.
We should not need anywhere near 32 bits of entropy, so we might
just limit ourselves to a 32-bit random number generator.
Also, it might be cheaper to use exclusive-or, bit shifting and
conditional jumps, instead of multiplication and addition.
We use relaxed atomic operations on the global random number generator
state in an attempt to silence any warnings about race conditions.
There is an obvious race condition between the load and store in
ut_rnd_gen(), but we do not think that it matters much that the
state of the random number generator could 'stutter'.
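A sketch of the resulting generator, assuming the Marsaglia xorshift32 steps;
illustrative, not necessarily the exact constants used in ut_rnd_gen():
```
#include <atomic>
#include <cstdint>

// A 32-bit xorshift generator (shifts and XORs instead of multiply-add)
// whose global state is read and written with relaxed atomics; a lost
// update between the load and store merely makes the sequence 'stutter'.
static std::atomic<uint32_t> rnd_state{1};

uint32_t ut_rnd_gen() {
  uint32_t x = rnd_state.load(std::memory_order_relaxed);
  x ^= x << 13;  // xorshift32 steps
  x ^= x >> 17;
  x ^= x << 5;
  rnd_state.store(x, std::memory_order_relaxed);
  return x;
}
```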
This change seems to make the 'uncompress_ops' nondeterministic
in innodb_zip.cmp_per_index after the restart. It looks like
there is an inherent race condition in the test, because the
table could be opened for InnoDB statistics recalculation
already before innodb_cmp_per_index_enabled was set. We might
end up having uncompress_ops anywhere between 0 and 9, or perhaps
even more. Let us remove that part of the test.
Found two bugs:
(1) have_committing_connections was missing a mutex unlock on one
exit path. As this function is called in a loop, it ended up trying
to lock a mutex we already owned. This could cause a hang.
(2) wsrep_RSU_begin did set an error code when the partition to
be dropped could not be MDL-locked because of concurrent
operations, but the wrong error code was propagated to the upper
layer, causing the error to be ignored. This could also have caused
the hang.
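A sketch of the shape of fix (1), using a scoped guard so that every exit
path unlocks; illustrative code, not the actual wsrep sources:
```
#include <mutex>
#include <vector>

std::mutex commit_monitor_mutex;

struct Connection { bool committing; };

bool have_committing_connections(const std::vector<Connection> &conns) {
  // The guard releases the mutex on every return path, unlike the buggy
  // version, which left the mutex held on one early exit.
  std::lock_guard<std::mutex> guard(commit_monitor_mutex);
  for (const Connection &c : conns)
    if (c.committing)
      return true;
  return false;
}
```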
In MariaDB 10.2 the master could have been configured so that there
are extra annotate events. When we peek at the next event type for CTAS we
need to skip annotate events.
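A sketch of the peeking loop; the event-type names and the reader callback
are illustrative:
```
#include <functional>

enum EventType { ANNOTATE_ROWS_EVENT, TABLE_MAP_EVENT, QUERY_EVENT, OTHER };

// Keep reading event types until something other than an annotate event
// shows up; annotate events only carry the original statement text.
EventType peek_next_event_type(const std::function<EventType()> &read_next) {
  EventType type = read_next();
  while (type == ANNOTATE_ROWS_EVENT)
    type = read_next();
  return type;
}
```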
executing undo undo_key_delete" upon startup on datadir restored from
incremental backup
aria_log* files were not copied on the --prepare --incremental-dir step from
the incremental to the destination backup directory.
The problem happens when a MariaDB master replicates writes only for
non-InnoDB tables (e.g. writes to MyISAM table(s)). An async slave node in a
Galera cluster can apply these writes successfully, but it will, in the end,
write the gtid position into the mysql.gtid_slave_pos table.
mysql.gtid_slave_pos is an InnoDB table, and this write makes the innodb
handlerton part of the replicated "transaction".
Note that the wsrep patch identifies that the write to gtid_slave_pos should
not be replicated and skips appending wsrep keys for these writes. However,
as InnoDB was present in the transaction, and there are replication events
(for the MyISAM table) in the transaction cache but no appended keys, wsrep
raises an error, and this makes the slave thread stop.
The fix is simply to not treat it as an error if an async slave tries to
replicate a write set with binlog events but no keys. We just skip wsrep
replication and return successfully.
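A sketch of the fix idea (illustrative names, not the actual wsrep API):
```
enum wsrep_result { WSREP_OK, WSREP_ERROR };

wsrep_result replicate_async_slave_write_set(bool has_binlog_events,
                                             bool has_keys) {
  if (has_binlog_events && !has_keys)
    return WSREP_OK;  // nothing certifiable: skip replication, succeed
  // ... normal wsrep certification and replication path ...
  return WSREP_OK;
}
```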
This commit also contains an mtr test which forces the mysql.gtid_slave_pos
table to be of InnoDB engine, and executes a MyISAM-only write through async
replication.
There is an additional fix for declaring the IO and background slave threads
as non-wsrep. These threads should not write anything for wsrep replication,
and this is just a safeguard to make sure nothing leaks into the cluster
from these slave threads.
buf_read_ibuf_merge_pages(): Discard any page numbers that are
outside the current bounds of the tablespace, by invoking the
function ibuf_delete_recs() that was introduced in MDEV-20934.
This could avoid an infinite change buffer merge loop on
innodb_fast_shutdown=0, because normally the change buffer merge
would only be attempted if a page was successfully loaded into
the buffer pool.
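An illustrative sketch of the discarding step; the real code invokes
ibuf_delete_recs() where the comment stands:
```
#include <algorithm>
#include <cstdint>
#include <vector>

// Before issuing reads for a change buffer merge, drop page numbers beyond
// the tablespace end and purge their buffered entries, instead of retrying
// them forever on innodb_fast_shutdown=0.
void discard_out_of_bounds_pages(std::vector<uint32_t> &page_nos,
                                 uint32_t space_size_in_pages) {
  auto out_of_bounds = [&](uint32_t page_no) {
    if (page_no < space_size_in_pages)
      return false;  // page exists; a normal merge can handle it
    // ibuf_delete_recs(space_id, page_no);  // MDEV-20934 helper (real code)
    return true;     // never attempt to read a nonexistent page
  };
  page_nos.erase(std::remove_if(page_nos.begin(), page_nos.end(),
                                out_of_bounds),
                 page_nos.end());
}
```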
dict_drop_index_tree(): Add the parameter trx_t*.
To prevent the DROP TABLE crash, do not invoke btr_free_if_exists()
if the entire .ibd file will be dropped. Thus, we will avoid a crash
if the BTR_SEG_LEAF or BTR_SEG_TOP of the index is corrupted,
and we will also avoid unnecessarily accessing the to-be-dropped
tablespace via the buffer pool.
In MariaDB 10.2, we disable the DROP TABLE fix if innodb_safe_truncate=0,
because the backup-unsafe MySQL 5.7 WL#6501 form of TRUNCATE TABLE
requires that the individual pages be freed inside the tablespace.
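An illustrative sketch of the guard described above (not the actual
dict_drop_index_tree() code):
```
// When the whole .ibd file is being deleted, skip freeing the index trees
// page by page, so corrupted BTR_SEG_LEAF/BTR_SEG_TOP segments are never
// accessed through the buffer pool.
void drop_index_tree_sketch(bool whole_tablespace_dropped,
                            bool safe_truncate_enabled) {
  if (whole_tablespace_dropped && safe_truncate_enabled)
    return;  // the file goes away anyway; nothing to free inside it
  // btr_free_if_exists(...);  // legacy path: free pages inside the space
}
```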
This PR contains an mtr test for reproducing a failure with replicating a
CREATE TABLE AS SELECT statement (CTAS) through asynchronous MariaDB
replication to a MariaDB Galera cluster.
The problem happens when the CTAS replication contains both the create
table statement and the following row events for populating the table. In
such a situation, the Galera node operating as a MariaDB replication slave
will first replicate only the create table part into the cluster, and then
perform another replication containing both the create table and the row
events. This leads all other nodes to fail with a duplicate table create
attempt, and to crash due to this failure.
The PR also contains a fix, which identifies the situation when a CTAS has
been replicated and scans further in the async replication stream to see
whether row events follow. The slave node will replicate either a single
TOI in case the CTAS table is empty or, if the CTAS table contains rows, a
single bundled write set with the create table and row events to the Galera
cluster.
This fix should keep the master server's GTIDs for CTAS replication in sync
with the GTIDs in the Galera cluster.
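The resulting decision can be summarized in a small sketch; the names are
illustrative:
```
// After seeing a CTAS query event, scan ahead in the async replication
// stream; an empty CTAS is replicated as a single TOI, a populated one as
// one bundled write set with DDL plus row events.
enum CtasReplication { SINGLE_TOI, BUNDLED_WRITE_SET };

CtasReplication choose_ctas_replication(bool row_events_follow) {
  return row_events_follow ? BUNDLED_WRITE_SET : SINGLE_TOI;
}
```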
Move row size check to early CREATE/ALTER TABLE phase. Stop checking
on table open.
dict_index_add_to_cache(): remove the parameter 'strict', stop checking row
size
dict_index_t::record_size_info_t: the result of a row size check operation
create_table_info_t::row_size_is_acceptable(): performs the row size check;
issues an error or warning and writes the first overflow field to the InnoDB
log
create_table_info_t::create_table(): add the row size check
dict_index_t::record_size_info(): a refactored version of
dict_index_t::rec_potentially_too_big(). The new version doesn't change the
global state of the program but returns all the interesting info, and its
callers decide how to handle row size overflow.
dict_index_t::rec_potentially_too_big(): removed
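An illustrative sketch of the new shape (simplified types, not the actual
InnoDB code):
```
#include <cstddef>
#include <iostream>

// The check computes and returns the interesting facts without touching
// global state; callers decide whether an overflow is an error or warning.
struct record_size_info_t {
  size_t max_record_size = 0;     // worst-case clustered index record size
  size_t first_overrun_field = 0; // first field that no longer fits
  bool row_is_too_big = false;
};

record_size_info_t record_size_info(size_t row_size, size_t page_limit) {
  record_size_info_t info;
  info.max_record_size = row_size;
  info.row_is_too_big = row_size > page_limit;
  return info;
}

void create_table_row_size_check(size_t row_size, size_t page_limit,
                                 bool strict_mode) {
  record_size_info_t info = record_size_info(row_size, page_limit);
  if (!info.row_is_too_big)
    return;
  if (strict_mode)
    std::cerr << "ERROR: row size too large\n";   // refuse the DDL
  else
    std::cerr << "Warning: row size too large\n"; // accept with a warning
}
```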
Problem:
========
CURRENT_TEST: binlog_encryption.rpl_corruption
mysqltest: In included file "./include/wait_for_slave_io_error.inc":
...
At line 72: Slave stopped with wrong error code
**** Slave stopped with wrong error code: 1743 (expected 1595,1913) ****
Analysis:
========
The test emulates corruption at various stages of replication, for example
in the binlog file, in the network, and in the relay log. It verifies that
all corruption cases are handled through appropriate error messages.
The test cases which emulate network failure expect the following errors:
--ER_SLAVE_RELAY_LOG_WRITE_FAILURE (1595)
--ER_NETWORK_READ_EVENT_CHECKSUM_FAILURE (1743)
Ideally the test should expect the error codes 1595 and 1743,
but it actually waits on the incorrect error codes 1595,1913.
Fix:
===
Added the appropriate error code for
'ER_NETWORK_READ_EVENT_CHECKSUM_FAILURE' and replaced 1913 with 1743.
The assert indicates that the current transaction got caught uncleaned from
the semisync master's cache when it was signaled to proceed upon its
ack receive.
The reason for the missed cleanup turns out to be a flaw in the gtid
connect mode.
The binlog file *name* of the connecting slave's last received event,
as submitted by the slave, was adopted into
{{Repl_semi_sync_master::m_reply_file_name}} as a part of semisync
initialization.
Notice that the initialization still refines the position part of the
submitted last received event's binlog coordinates.
The master-side binlog filename:pos refinement is specific to the gtid
connect mode, for the purpose of computing the latest binlog file to
resume slave feeding from.
Effectively, in the gtid connect mode the computed resumption filename:pos
may turn out to be smaller, in which case a new post-connect-time committing
transaction may be logged with its filename:pos also less than the
submitted coordinates, and that triggers the assert.
Fixed by making the semisync initialization use the refined filename:pos.
It is guaranteed to be less than any newly generated transaction's
binlog:pos.
Due to a data corruption bug that may have occurred a long time earlier
(possibly involving physical backup and MySQL Bug #69122, which was
addressed in commit f166ec71b7)
it seems possible that the InnoDB change buffer might end up containing
entries, while no buffered changes exist according to the change buffer
bitmap pages in the .ibd files.
ibuf_delete_recs(): New function, to be invoked on slow shutdown only.
Remove all buffered changes for a specific page.
ibuf_merge_or_delete_for_page(): If the change buffer bitmap is clean
and a slow shutdown is in progress, invoke ibuf_delete_recs().
We do not want to do that during normal operation, due to the additional
overhead that is involved. The bitmap page should be consistent with
the change buffer in the first place.
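A sketch of the control flow described above; names and parameters are
illustrative:
```
#include <cstdint>

// When the bitmap says the page has no buffered changes, normal operation
// trusts it and returns, but a slow shutdown additionally purges any
// orphan change buffer entries for the page.
void merge_or_delete_for_page_sketch(bool bitmap_shows_buffered_changes,
                                     bool slow_shutdown_in_progress,
                                     uint32_t space_id, uint32_t page_no) {
  if (!bitmap_shows_buffered_changes) {
    if (slow_shutdown_in_progress) {
      // ibuf_delete_recs(space_id, page_no);  // remove orphan records
    }
    return;  // normal operation: avoid the extra overhead
  }
  // ... normal change buffer merge path ...
  (void)space_id;
  (void)page_no;
}
```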
InnoDB: Assertion failure in file .../dict/dict0dict.cc line ...
InnoDB: Failing assertion: table->can_be_evicted
This fixes a regression that was caused by the fix of MDEV-20621
(commit a41d429765).
MySQL 5.6 (and MariaDB 10.0) introduced eviction of tables from
the InnoDB data dictionary cache. Tables that are connected to
FOREIGN KEY constraints or FULLTEXT INDEX are exempt of the eviction.
With the problematic change, a table that would already be exempt
from eviction due to FOREIGN KEY would cause the problem if there
also was a FULLTEXT INDEX defined on it.
dict_load_table(): Only prevent eviction if table->can_be_evicted holds.
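An illustrative sketch of the guarded call (simplified stand-in types, not
the actual dictionary code):
```
// Only move the table to the non-evictable list if it is currently
// evictable; a table already pinned by FOREIGN KEY constraints must not
// be pinned a second time, which is what tripped the assertion.
struct dict_table_sketch_t {
  bool can_be_evicted;
  bool has_fulltext_index;
};

void prevent_eviction(dict_table_sketch_t *table) {
  table->can_be_evicted = false;  // stand-in for moving to the non-LRU list
}

void dict_load_table_sketch(dict_table_sketch_t *table) {
  if (table->has_fulltext_index && table->can_be_evicted)
    prevent_eviction(table);
}
```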
Unfortunate DROP TEMPORARY..IF EXISTS on a regular table may allow
subsequent CREATE TABLE statements to steal away the PFS_table_share
instance from the dropped table.
mysql_insert() first opens all affected tables (which implicitly
starts a transaction in InnoDB), then stat tables.
A failure to open a stat table caused open_tables() to abort
the current stmt transaction (trans_rollback_stmt()). So, from the
server's point of view the following ha_write_row()-s happened outside
of a transaction, and the server didn't bother to commit them.
The server has a mechanism to prevent a transaction being
unexpectedly committed or rolled back in the middle of a statement -
if an operation takes place _in a sub-statement_ it cannot change
the transaction state. Operations on stat tables are exactly that -
they are not allowed to change a transaction state. Put them in
a sub-statement to make sure they don't.
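A sketch of the mechanism, using a simplified stand-in for THD (illustrative
only):
```
// While in_sub_stmt is set, commit and rollback of the outer transaction
// are refused, so a failure while writing the statistics tables cannot
// end the statement's transaction.
struct ThdSketch {
  bool in_sub_stmt = false;
};

void update_statistics_tables(ThdSketch *thd) {
  bool saved = thd->in_sub_stmt;
  thd->in_sub_stmt = true;   // sub-statement: transaction state is frozen
  // ... open the stat tables and write the rows ...
  thd->in_sub_stmt = saved;  // restore on the way out
}
```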
Apply the changes to InnoDB and XtraDB that had been
inadvertently skipped in the merge
commit ae476868a5
That merge failure sabotaged part of MDEV-20127:
>Revert a problematic auto_increment_increment 'fix' from 2014.
>This involves replacing the MDEV-8827 fix and in 10.1,
>removing some WSREP instrumentation.
The code changes were re-merged manually by executing the following:
```
# Get the parent of the problematic merge.
git checkout ae476868a5394041a00e75a29c7d45917e8dfae8^
# Perform the merge again.
git merge ae476868a5394041a00e75a29c7d45917e8dfae8^2
# Get the conflict resolution from that merge.
git checkout ae476868a5 .
# Note: Any changes to these files were removed (empty diff)!
git diff HEAD storage/{innobase,xtradb}/handler/ha_innodb.cc
# Apply the code changes:
git diff cf40393471b10ca68cc1d2804c22ab9203900978^2..MERGE_HEAD \
storage/{innobase,xtradb}/handler/ha_innodb.cc|
patch -p1
```