WL#6326 in MariaDB 10.2.2 introduced a potential hang on purge or rollback
when an index tree is being shrunk by multiple levels.
This fix is based on
mysql/mysql-server@f2c5852630
with the main difference that our version of the test case uses
DEBUG_SYNC instrumentation on ROLLBACK, not on purge.
btr_cur_will_modify_tree(): Simplify the check further.
This is the actual bug fix.
row_undo_mod_remove_clust_low(), row_undo_mod_clust(): Add DEBUG_SYNC
instrumentation for the test case.
By default (innodb_strict_mode=ON), InnoDB attempts to guarantee
at DDL time that any INSERT to the table can succeed.
MDEV-19292 recently revised the "row size too large" check in InnoDB.
The check still is somewhat inaccurate;
that should be addressed in MDEV-20194.
Note: If a table contains multiple long string columns so that each column
is part of a column prefix index, then an UPDATE that attempts to modify
all those columns at once may fail, because the undo log record might
not fit in a single undo log page (of innodb_page_size). In the worst case,
the undo log record would grow by about 3KiB of for each updated column.
The DDL-time check (since the InnoDB Plugin for MySQL 5.1) is optional
in the sense that when the maximum B-tree record size or undo log
record size would be exceeded, the DML operation will fail and the
transaction will be properly rolled back.
create_table_info_t::row_size_is_acceptable(): Add the parameter
'bool strict' so that innodb_strict_mode=ON can be overridden during
TRUNCATE, OPTIMIZE and ALTER TABLE...FORCE (when the storage format
is not changing).
create_table_info_t::create_table(): Perform a sloppy check for
TRUNCATE TABLE (create_fk=false).
prepare_inplace_alter_table_dict(): Perform a sloppy check for
simple operations.
trx_is_strict(): Remove. The function became unused in
commit 98694ab0cb (MDEV-20949).
buf_read_ibuf_merge_pages(): Discard any page numbers that are
outside the current bounds of the tablespace, by invoking the
function ibuf_delete_recs() that was introduced in MDEV-20934.
This could avoid an infinite change buffer merge loop on
innodb_fast_shutdown=0, because normally the change buffer merge
would only be attempted if a page was successfully loaded into
the buffer pool.
dict_drop_index_tree(): Add the parameter trx_t*.
To prevent the DROP TABLE crash, do not invoke btr_free_if_exists()
if the entire .ibd file will be dropped. Thus, we will avoid a crash
if the BTR_SEG_LEAF or BTR_SEG_TOP of the index is corrupted,
and we will also avoid unnecessarily accessing the to-be-dropped
tablespace via the buffer pool.
In MariaDB 10.2, we disable the DROP TABLE fix if innodb_safe_truncate=0,
because the backup-unsafe MySQL 5.7 WL#6501 form of TRUNCATE TABLE
requires that the individual pages be freed inside the tablespace.
In the test innodb.instant_alter,4k we would be flagging an error
for too large row size. That error was previously only being reported
if the table was being rebuilt. Thus, this merge is fixing a small
omission in MDEV-11369 (instant ADD COLUMN).
Move row size check to early CREATE/ALTER TABLE phase. Stop checking
on table open.
dict_index_add_to_cache(): remove parameter 'strict', stop checking row size
dict_index_t::record_size_info_t: this is a result of row size check operation
create_table_info_t::row_size_is_acceptable(): performs row size check.
Issues error or warning. Writes first overflow field to InnoDB log.
create_table_info_t::create_table(): add row size check
dict_index_t::record_size_info(): this is a refactored version
of dict_index_t::rec_potentially_too_big(). New version doesn't change global
state of a program but return all interesting info. And it's callers who
decide how to handle row size overflow.
dict_index_t::rec_potentially_too_big(): removed
Due to MDEV-12288, the slow shutdown in MariaDB 10.3 will include
resetting the DB_TRX_ID for all inserted records. This might
cause the 60-second shutdown_server timeout to be exceeded.
Let us wait for the purge to complete before initiating slow shutdown.
Due to a data corruption bug that may have occurred a long time earlier
(possibly involving physical backup and MySQL Bug #69122, which was
addressed in commit f166ec71b7)
it seems possible that the InnoDB change buffer might end up containing
entries, while no buffered changes exist according to the change buffer
bitmap pages in the .ibd files.
ibuf_delete_recs(): New function, to be invoked on slow shutdown only.
Remove all buffered changes for a specific page.
ibuf_merge_or_delete_for_page(): If the change buffer bitmap is clean
and a slow shutdown is in progress, invoke ibuf_delete_recs().
We do not want to do that during normal operation, due to the additional
overhead that is involved. The bitmap page should be consistent with
the change buffer in the first place.
innobase_drop_foreign_try(): Don't evict and reload the dict_foreign_t
during instant ALTER TABLE if the FOREIGN KEY constraint is being
dropped.
The MDEV-19630 fix (commit 07b1a26c33)
was incomplete, because it did not cover a case where the
FOREIGN KEY constraint is being dropped.
mysql_insert() first opens all affected tables (which implicitly
starts a transaction in InnoDB), then stat tables.
A failure to open a stat table caused open_tables() to abort
the current stmt transaction (trans_rollback_stmt()). So, from the
server point of view the following ha_write_row()-s happened outside
of a transactions, and the server didn't bother to commit them.
The server has a mechanism to prevent a transaction being
unexpectedly committed or rolled back in the middle of a statement -
if an operation takes place _in a sub-statement_ it cannot change
the transaction state. Operations on stat tables are exactly that -
they are not allowed to change a transaction state. Put them in
a sub-statement to make sure they don't.
Apply the changes to InnoDB and XtraDB that had been
inadvertently skipped in the merge
commit ae476868a5
That merge failure sabotaged part of MDEV-20127:
>Revert a problematic auto_increment_increment 'fix' from 2014.
>This involves replacing the MDEV-8827 fix and in 10.1,
>removing some WSREP instrumentation.
The code changes were re-merged manually by executing the following:
# Get the parent of the problematic merge.
git checkout ae476868a5394041a00e75a29c7d45917e8dfae8^
# Perform the merge again.
git merge ae476868a5394041a00e75a29c7d45917e8dfae8^2
# Get the conflict resolution from that merge.
git checkout ae476868a5 .
# Note: Any changes to these files were removed (empty diff)!
git diff HEAD storage/{innobase,xtradb}/handler/ha_innodb.cc
# Apply the code changes:
git diff cf40393471b10ca68cc1d2804c22ab9203900978^2..MERGE_HEAD \
storage/{innobase,xtradb}/handler/ha_innodb.cc|
patch -p1
To diagnose a hang in slow shutdown (innodb_fast_shutdown=0),
let us introduce a Boolean startup option in debug builds
that will cause the contents of the InnoDB change buffer
to be dumped to the server error log at startup.
Test innodb_read_only startup (which will be refused after a crash),
and test also innodb_force_recovery=5, and extract some change buffer
merge statistics. Omit any statistics about delete (purge) buffering,
because purge could happen at any time.
Use the sequence storage engine for populating the table.
Try to use more deterministic floating-point operations.
Apparently, 2.2 > 2.2 wrongly holds on many platforms, but
not ppc64le on the compiler used on Red Had Enterprise Linux 8.
The reason could be an infinite binary presentation:
2.2 = 0b10.001100110011…
With t1_f = 2.5 = 0b10.1, t1_f > 2.5 would no longer hold on AMD64.
Let us replace the 2.2 with 2.5 and compare t1_f >= 2.5 in order to
get more consistent results across all platforms.
We were missing a test that would exercise trx_free_prepared()
with innodb_fast_shutdown=0. Add a test.
Note: if shutdown hangs due to the XA PREPARE transactions,
in MariaDB 10.2 the test would unfortunately pass, but take
2*60 seconds longer, because of two shutdown_server statements
timing out after 60 seconds. Starting with MariaDB 10.3, the
hung server would be killed with SIGABRT, and the test could
fail thanks to a backtrace message.