The test encryption.innodb-redo-badkey was accidentally disabled
until commit 23657a21018d0b3d0464bbd55236113ebcd3d4b7 enabled
it recently. Once it was enabled, it started failing randomly.
recv_recover_corrupt_page(): Do not assume that any redo log exists
for the page. A page may be unnecessarily read by read-ahead.
When noting the corruption, reset recv_addr->state to RECV_PROCESSED,
so that even if the same page is re-read again, we will only
decrement recv_sys->n_addrs once.
This allows one to run the test suite even if any of the following
options are changed:
- character-set-server
- collation-server
- join-cache-level
- log-basename
- max-allowed-packet
- optimizer-switch
- query-cache-size and query-cache-type
- skip-name-resolve
- table-definition-cache
- table-open-cache
- Some innodb options
etc
Changes:
- Don't print out the value of system variables as one can't depend on
them to being constants.
- Don't set global variables to 'default' as the default may not
be the same as the test was started with if there was an additional
option file. Instead save original value and reset it at end of test.
- Test that depends on the latin1 character set should include
default_charset.inc or set the character set to latin1
- Test that depends on the original optimizer switch, should include
default_optimizer_switch.inc
- Test that depends on the value of a specific system variable should
set it in the test (like optimizer_use_condition_selectivity)
- Split subselect3.test into subselect3.test and subselect3.inc to
make it easier to set and reset system variables.
- Added .opt files for test that required specfic options that could
be changed by external configuration files.
- Fixed result files in rockdsb & tokudb that had not been updated for
a while.
========
During ibd file creation, InnoDB flushes the page0 without crypt
information. During recovery, InnoDB encounters encrypted page read
before initialising the crypt data of the tablespace. So it leads t
corruption of page and doesn't allow innodb to start.
Solution:
=========
Write crypt_data information in page0 while creating .ibd file creation.
During recovery, crypt_data will be initialised while processing
MLOG_FILE_NAME redo log record.
- Introduce a new variable called innodb_encrypt_temporary_tables which is
a boolean variable. It decides whether to encrypt the temporary tablespace.
- Encrypts the temporary tablespace based on full checksum format.
- Introduced a new counter to track encrypted and decrypted temporary
tablespace pages.
- Warnings issued if temporary table creation has conflict value with
innodb_encrypt_temporary_tables
- Added a new test case which reads and writes the pages from/to temporary
tablespace.
Added the condition in innochecksum tool to check page id mismatch.
This could catch the write corruption caused by InnoDB.
Added the debug insert inside fil_io() to check whether it writes
the page to wrong offset.
Remove the test, because it easily fails with a result difference.
Analysis by Thirunarayanan Balathandayuthapani:
By default, innodb_encrypt_tables=0.
1) Test case creates 100 tables in innodb_encrypt_1.
2) creates another 100 unencrypted tables (encryption=off) in innodb_encrypt_2
3) creates another 100 encrypted tables (encryption=on) in innodb_encrypt_3
4) enabling innodb_encrypt_tables=1 and checking that only
100 encrypted tables exist. (already we have 100 in dictionary)
5) opening all tables again (no idea why)
6) After that, set innodb_encrypt_tables=0 and wait for 100 tables
to be decrypted (already we have 100 unencrypted tables)
7) dropping all databases
Sporadic failure happens because after step 4, it could encrypt the
normal table too, because innodb_encryption_threads=4.
This test was added in MDEV-9931, which was about InnoDB startup being
slow due to all .ibd files being opened. There have been a number of
later fixes to this problem. Currently the latest one is
commit cad56fbabaea7b5dab0ccfbabb98d0a9c61f3dc3, in which some tests
(in particular the test innodb.alter_kill) could fail if all InnoDB
.ibd files are read during startup. That could make this test redundant.
Let us remove the test, because it is big, slow, unreliable, and
does not seem to reliably catch the problem that all files are being
read on InnoDB startup.
Problem:
=======
fil_iterate() writes imported tablespace page0 as it is to discarded
tablespace. Space id wasn't even changed. While opening the tablespace,
tablespace fails with space id mismatch error.
Fix:
====
fil_iterate() copies the page0 with discarded space id to imported
tablespace.
row_search_mvcc(): Duplicate the logic of btr_pcur_move_to_next()
so that an infinite loop can be avoided when advancing to the next
page fails due to a corrupted page.
- Don't apply redo log for the corrupted page when innodb_force_recovery > 0.
- Allow the table to be dropped when index root page is
corrupted when innodb_force_recovery > 0.
The statement
SET GLOBAL innodb_encryption_rotate_key_age=0;
would have the unwanted side effect that ENCRYPTION=DEFAULT tablespaces
would no longer be encrypted or decrypted according to the setting of
innodb_encrypt_tables.
We implement a trigger, so that whenever one of the following is executed:
SET GLOBAL innodb_encrypt_tables=OFF;
SET GLOBAL innodb_encrypt_tables=ON;
SET GLOBAL innodb_encrypt_tables=FORCE;
all wrong-state ENCRYPTION=DEFAULT tablespaces will be added to
fil_system_t::rotation_list, so that the encryption will be added
or removed.
Note: This will *NOT* happen automatically after a server restart.
Before reading the first page of a data file, InnoDB cannot know
the encryption status of the data file. The statement
SET GLOBAL innodb_encrypt_tables will have the side effect that
all not-yet-read InnoDB data files will be accessed in order to
determine the encryption status.
innodb_encrypt_tables_validate(): Stop disallowing
SET GLOBAL innodb_encrypt_tables when innodb_encryption_rotate_key_age=0.
This reverts part of commit 50eb40a2a8aa3af6cc271f6028f4d6d74301d030
that addressed MDEV-11738 and MDEV-11581.
fil_system_t::read_page0(): Trigger a call to fil_node_t::read_page0().
Refactored from fil_space_get_space().
fil_crypt_rotation_list_fill(): If innodb_encryption_rotate_key_age=0,
initialize fil_system->rotation_list. This is invoked both on
SET GLOBAL innodb_encrypt_tables and
on SET GLOBAL innodb_encryption_rotate_key_age=0.
fil_space_set_crypt_data(): Remove.
fil_parse_write_crypt_data(): Simplify the logic.
This is joint work with Marko Mäkelä.
InnoDB crash recovery used to read every data page for which
redo log exists. This is unnecessary for those pages that are
initialized by the redo log. If a newly created page is corrupted,
recovery could unnecessarily fail. It would suffice to reinitialize
the page based on the redo log records.
To add insult to injury, InnoDB crash recovery could hang if it
encountered a corrupted page. We will fix also that problem.
InnoDB would normally refuse to start up if it encounters a
corrupted page on recovery, but that can be overridden by
setting innodb_force_recovery=1.
Data pages are completely initialized by the records
MLOG_INIT_FILE_PAGE2 and MLOG_ZIP_PAGE_COMPRESS.
MariaDB 10.4 additionally recognizes MLOG_INIT_FREE_PAGE,
which notifies that a page has been freed and its contents
can be discarded (filled with zeroes).
The record MLOG_INDEX_LOAD notifies that redo logging has
been re-enabled after being disabled. We can avoid loading
the page if all buffered redo log records predate the
MLOG_INDEX_LOAD record.
For the internal tables of FULLTEXT INDEX, no MLOG_INDEX_LOAD
records were written before commit aa3f7a107ce3a9a7f80daf3cadd442a61c5493ab.
Hence, we will skip these optimizations for tables whose
name starts with FTS_.
This is joint work with Thirunarayanan Balathandayuthapani.
fil_space_t::enable_lsn, file_name_t::enable_lsn: The LSN of the
latest recovered MLOG_INDEX_LOAD record for a tablespace.
mlog_init: Page initialization operations discovered during
redo log scanning. FIXME: This really belongs in recv_sys->addr_hash,
and should be removed in MDEV-19176.
recv_addr_state: Add the new state RECV_WILL_NOT_READ to
indicate that according to mlog_init, the page will be
initialized based on redo log record contents.
recv_add_to_hash_table(): Set the RECV_WILL_NOT_READ state
if appropriate. For now, we do not treat MLOG_ZIP_PAGE_COMPRESS
as page initialization. This works around bugs in the crash
recovery of ROW_FORMAT=COMPRESSED tables.
recv_mark_log_index_load(): Process a MLOG_INDEX_LOAD record
by resetting the state to RECV_NOT_PROCESSED and by updating
the fil_name_t::enable_lsn.
recv_init_crash_recovery_spaces(): Copy fil_name_t::enable_lsn
to fil_space_t::enable_lsn.
recv_recover_page(): Add the parameter init_lsn, to ignore
any log records that precede the page initialization.
Add DBUG output about skipped operations.
buf_page_create(): Initialize FIL_PAGE_LSN, so that
recv_recover_page() will not wrongly skip applying
the page-initialization record due to the field containing
some newer LSN as a leftover from a different page.
Do not invoke ibuf_merge_or_delete_for_page() during
crash recovery.
recv_apply_hashed_log_recs(): Remove some unnecessary lookups.
Note if a corrupted page was found during recovery.
After invoking buf_page_create(), do invoke
ibuf_merge_or_delete_for_page() via mlog_init.ibuf_merge()
in the last recovery batch.
ibuf_merge_or_delete_for_page(): Relax a debug assertion.
innobase_start_or_create_for_mysql(): Abort startup if
a corrupted page was found during recovery. Corrupted pages
will not be flagged if innodb_force_recovery is set.
However, the recv_sys->found_corrupt_fs flag can be set
regardless of innodb_force_recovery if file names are found
to be incorrect (for example, multiple files with the same
tablespace ID).
If InnoDB crash recovery was needed, the InnoDB function srv_start()
would invoke extra validation, reading something from every InnoDB
data file. This should be unnecessary now that MDEV-14717 made
RENAME operations crash-safe inside InnoDB (which can be
disabled in MariaDB 10.2 by setting innodb_safe_truncate=OFF).
dict_check_sys_tables(): Skip tables that would be dropped by
row_mysql_drop_garbage_tables(). Perform extra validation only
if innodb_safe_truncate=OFF, innodb_force_recovery=0 and
crash recovery was needed.
dict_load_table_one(): Validate the root page of the table.
In this way, we can deny access to corrupted or mismatching tables
not only after crash recovery, but also after a clean shutdown.
The problem with the InnoDB table attribute encryption_key_id is that it is
not being persisted anywhere in InnoDB except if the table attribute
encryption is specified and is something else than encryption=default.
MDEV-17320 made it a hard error if encryption_key_id is specified to be
anything else than 1 in that case.
Ideally, we would always persist encryption_key_id in InnoDB. But, then we
would have to be prepared for the case that when encryption is being enabled
for a table whose encryption_key_id attribute refers to a non-existing key.
In MariaDB Server 10.1, our best option remains to not store anything
inside InnoDB. But, instead of returning the error that MDEV-17320
introduced, we should merely issue a warning that the specified
encryption_key_id is going to be ignored if encryption=default.
To improve the situation a little more, we will issue a warning if
SET [GLOBAL|SESSION] innodb_default_encryption_key_id is being set
to something that does not refer to an available encryption key.
Starting with MariaDB Server 10.2, thanks to MDEV-5800, we could open the
table definition from InnoDB side when the encryption is being enabled,
and actually fix the root cause of what was reported in MDEV-17320.
The initial fix only covered a part of Mariabackup.
This fix hardens InnoDB and XtraDB in a similar way, in order
to reduce the probability of mistaking a corrupted encrypted page
for a valid unencrypted one.
This is based on work by Thirunarayanan Balathandayuthapani.
fil_space_verify_crypt_checksum(): Assert that key_version!=0.
Let the callers guarantee that. Now that we have this assertion,
we also know that buf_page_is_zeroes() cannot hold.
Also, remove all diagnostic output and related parameters,
and let the relevant callers emit such messages.
Last but not least, validate the post-encryption checksum
according to the innodb_checksum_algorithm (only accepting
one checksum for the strict variants), and no longer
try to validate the page as if it was unencrypted.
buf_page_is_zeroes(): Move to the compilation unit of the only callers,
and declare static.
xb_fil_cur_read(), buf_page_check_corrupt(): Add a condition before
calling fil_space_verify_crypt_checksum(). This is a non-functional
change.
buf_dblwr_process(): Validate the page only as encrypted or unencrypted,
but not both.
Also, apply the MDEV-17957 changes to encrypted page checksums,
and remove error message output from the checksum function,
because these messages would be useless noise when mariabackup
is retrying reads of corrupted-looking pages, and not that
useful during normal server operation either.
The error messages in fil_space_verify_crypt_checksum()
should be refactored separately.