Now there can be only one log file instead of several which
logically work as a single file.
Possible names of redo log files: ib_logfile0,
ib_logfile101 (for just created one)
innodb_log_fiels_in_group: value of this variable is not used
by InnoDB. Possible values are still 1..100, to not break upgrade
LOG_FILE_NAME: add constant of value "ib_logfile0"
LOG_FILE_NAME_PREFIX: add constant of value "ib_logfile"
get_log_file_path(): convenience function that returns full
path of a redo log file
SRV_N_LOG_FILES_MAX: removed
srv_n_log_files: we can't remove this for compatibility reasons,
but now server doesn't use this variable
log_sys_t::file::fd: now just one, not std::vector
log_sys_t::log_capacity: removed word 'group'
find_and_check_log_file(): part of logic from huge srv_start()
moved here
recv_sys_t::files: file descriptors of redo log files.
There can be several of those in case we're upgrading
from older MariaDB version.
recv_sys_t::remove_extra_log_files: whether to remove
ib_logfile{1,2,3...} after successfull upgrade.
recv_sys_t::read(): open if needed and read from one
of several log files
recv_sys_t::files_size(): open if needed and return files count
redo_file_sizes_are_correct(): check that redo log files
sizes are equal. Just to log an error for a user.
Corresponding check was moved from srv0start.cc
namespace deprecated: put all deprecated variables here to
prevent usage of it by us, developers
log_t::FORMAT_10_5: physical redo log format tag
log_phys_t: Buffered records in the physical format.
The log record bytes will follow the last data field,
making use of alignment padding that would otherwise be wasted.
If there are multiple records for the same page, also those
may be appended to an existing log_phys_t object if the memory
is available.
In the physical format, the first byte of a record identifies the
record and its length (up to 15 bytes). For longer records, the
immediately following bytes will encode the remaining length
in a variable-length encoding. Usually, a variable-length-encoded
page identifier will follow, followed by optional payload, whose
length is included in the initially encoded total record length.
When a mini-transaction is updating multiple fields in a page,
it can avoid repeating the tablespace identifier and page number
by setting the same_page flag (most significant bit) in the first
byte of the log record. The byte offset of the record will be
relative to where the previous record for that page ended.
Until MDEV-14425 introduces a separate file-level log for
redo log checkpoints and file operations, we will write the
file-level records in the page-level redo log file.
The record FILE_CHECKPOINT (which replaces MLOG_CHECKPOINT)
will be removed in MDEV-14425, and one sequential scan of the
page recovery log will suffice.
Compared to MLOG_FILE_CREATE2, FILE_CREATE will not include any flags.
If the information is needed, it can be parsed from WRITE records that
modify FSP_SPACE_FLAGS.
MLOG_ZIP_WRITE_STRING: Remove. The record was only introduced temporarily
as part of this work, before being replaced with WRITE (along with
MLOG_WRITE_STRING, MLOG_1BYTE, MLOG_nBYTES).
mtr_buf_t::empty(): Check if the buffer is empty.
mtr_t::m_n_log_recs: Remove. It suffices to check if m_log is empty.
mtr_t::m_last, mtr_t::m_last_offset: End of the latest m_log record,
for the same_page encoding.
page_recv_t::last_offset: Reflects mtr_t::m_last_offset.
Valid values for last_offset during recovery should be 0 or above 8.
(The first 8 bytes of a page are the checksum and the page number,
and neither are ever updated directly by log records.)
Internally, the special value 1 indicates that the same_page form
will not be allowed for the subsequent record.
mtr_t::page_create(): Take the block descriptor as parameter,
so that it can be compared to mtr_t::m_last. The INIT_INDEX_PAGE
record will always followed by a subtype byte, because same_page
records must be longer than 1 byte.
trx_undo_page_init(): Combine the writes in WRITE record.
trx_undo_header_create(): Write 4 bytes using a special MEMSET
record that includes 1 bytes of length and 2 bytes of payload.
flst_write_addr(): Define as a static function. Combine the writes.
flst_zero_both(): Replaces two flst_zero_addr() calls.
flst_init(): Do not inline the function.
fsp_free_seg_inode(): Zerofill the whole inode.
fsp_apply_init_file_page(): Initialize FIL_PAGE_PREV,FIL_PAGE_NEXT
to FIL_NULL when using the physical format.
btr_create(): Assert !page_has_siblings() because fsp_apply_init_file_page()
must have been invoked.
fil_ibd_create(): Do not write FILE_MODIFY after FILE_CREATE.
fil_names_dirty_and_write(): Remove the parameter mtr.
Write the records using a separate mini-transaction object,
because any FILE_ records must be at the start of a mini-transaction log.
recv_recover_page(): Add a fil_space_t* parameter.
After applying log to the a ROW_FORMAT=COMPRESSED page,
invoke buf_zip_decompress() to restore the uncompressed page.
buf_page_io_complete(): Remove the temporary hack to discard the
uncompressed page of a ROW_FORMAT=COMPRESSED page.
page_zip_write_header(): Remove. Use mtr_t::write() or
mtr_t::memset() instead, and update the compressed page frame
separately.
trx_undo_header_add_space_for_xid(): Remove.
trx_undo_seg_create(): Perform the changes that were previously
made by trx_undo_header_add_space_for_xid().
btr_reset_instant(): New function: Reset the table to MariaDB 10.2
or 10.3 format when rolling back an instant ALTER TABLE operation.
page_rec_find_owner_rec(): Merge with the only callers.
page_cur_insert_rec_low(): Combine writes by using a local buffer.
MEMMOVE data from the preceding record whenever feasible
(copying at least 3 bytes).
page_cur_insert_rec_zip(): Combine writes to page header fields.
PageBulk::insertPage(): Issue MEMMOVE records to copy a matching
part from the preceding record.
PageBulk::finishPage(): Combine the writes to the page header
and to the sparse page directory slots.
mtr_t::write(): Only log the least significant (last) bytes
of multi-byte fields that actually differ.
For updating FSP_SIZE, we must always write all 4 bytes to the
redo log, so that the fil_space_set_recv_size() logic in
recv_sys_t::parse() will work.
mtr_t::memcpy(), mtr_t::zmemcpy(): Take a pointer argument
instead of a numeric offset to the page frame. Only log the
last bytes of multi-byte fields that actually differ.
In fil_space_crypt_t::write_page0(), we must log also any
unchanged bytes, so that recovery will recognize the record
and invoke fil_crypt_parse().
Future work:
MDEV-21724 Optimize page_cur_insert_rec_low() redo logging
MDEV-21725 Optimize btr_page_reorganize_low() redo logging
MDEV-21727 Optimize redo logging for ROW_FORMAT=COMPRESSED
Instead of writing the high-level redo log records
MLOG_LIST_END_COPY_CREATED, MLOG_COMP_LIST_END_COPY_CREATED
write log for each individual insert of a record.
page_copy_rec_list_end_to_created_page(): Remove.
This will improve the fill factor of some pages.
Adjust some tests accordingly.
PageBulk::init(), PageBulk::finish(): Avoid setting bogus limits
to PAGE_HEAP_TOP and PAGE_N_DIR_SLOTS. Avoid accessor functions
that would enforce these limits before the correct ones are set
at the end of PageBulk::finish().
The test encryption.innodb-redo-badkey will by design cause access
to pages that appear corrupted (due to incorrect encryption key).
Let us disable the page dumps by requiring the test to be run on
a debug server. Page dumps on debug builds were already disabled
in MDEV-19766.
Historically, InnoDB split the redo log into at least 2 files.
MDEV-12061 allowed the minimum to be innodb_log_files_in_group=1,
but it kept the default at innodb_log_files_in_group=2.
Because performance seems to be slightly better with only one log file,
and because implementing an append-only variant of the log would require
a single file, let us define the default to be 1, and have
innodb_log_file_size=96M, to retain the same default total size.
In commit 0f7732d1d1
we introduced a innodb_checksum_algorithm=full_crc32 combination
to a number of encryption tests, and also fixed the code accordingly.
The default in MariaDB 10.5 is innodb_checksum_algorithm=full_crc32.
In a test merge to 10.5, the test encryption.innodb-redo-badkey failed
once due to a message that had been added in that commit.
Let us introduce a full_crc32 option to that test.
And let us use strict_crc32 and strict_full_crc32 instead of the
non-strict variants, for the previously augmented tests, to be in
line with the earlier tests encryption.corrupted_during_recovery and
encryption.innodb_encrypt_temporary_tables.
When MDEV-12026 introduced innodb_checksum_algorithm=full_crc32 in
MariaDB 10.4, it accidentally added a dependency on buf_page_t::encrypted.
Now that the flag has been removed, we must adjust the page-read routine.
buf_page_io_complete(): When the full_crc32 page checksum matches but the
tablespace ID in the page does not match after decrypting, we should
declare it a decryption failure and suppress the page dump output and
any attempts to re-read the page.
The test encryption.innodb-redo-badkey was accidentally disabled
until commit 23657a2101 enabled
it recently. Once it was enabled, it started failing randomly.
recv_recover_corrupt_page(): Do not assume that any redo log exists
for the page. A page may be unnecessarily read by read-ahead.
When noting the corruption, reset recv_addr->state to RECV_PROCESSED,
so that even if the same page is re-read again, we will only
decrement recv_sys->n_addrs once.
The test had been disabled in 10.2 due to frequent failures,
in 5ec9b88e11.
After the problems were addressed, we failed to re-enable the test
until now.
This allows one to run the test suite even if any of the following
options are changed:
- character-set-server
- collation-server
- join-cache-level
- log-basename
- max-allowed-packet
- optimizer-switch
- query-cache-size and query-cache-type
- skip-name-resolve
- table-definition-cache
- table-open-cache
- Some innodb options
etc
Changes:
- Don't print out the value of system variables as one can't depend on
them to being constants.
- Don't set global variables to 'default' as the default may not
be the same as the test was started with if there was an additional
option file. Instead save original value and reset it at end of test.
- Test that depends on the latin1 character set should include
default_charset.inc or set the character set to latin1
- Test that depends on the original optimizer switch, should include
default_optimizer_switch.inc
- Test that depends on the value of a specific system variable should
set it in the test (like optimizer_use_condition_selectivity)
- Split subselect3.test into subselect3.test and subselect3.inc to
make it easier to set and reset system variables.
- Added .opt files for test that required specfic options that could
be changed by external configuration files.
- Fixed result files in rockdsb & tokudb that had not been updated for
a while.
The MDEV-20265 commit e746f451d5
introduces DBUG_ASSERT(right_op == r_tbl) in
st_select_lex::add_cross_joined_table(), and that assertion would
fail in several tests that exercise joins. That commit was skipped
in this merge, and a separate fix of MDEV-20265 will be necessary in 10.4.
========
During ibd file creation, InnoDB flushes the page0 without crypt
information. During recovery, InnoDB encounters encrypted page read
before initialising the crypt data of the tablespace. So it leads t
corruption of page and doesn't allow innodb to start.
Solution:
=========
Write crypt_data information in page0 while creating .ibd file creation.
During recovery, crypt_data will be initialised while processing
MLOG_FILE_NAME redo log record.
Problem:
========
Checksum for the encrypted temporary tablespace is not stored in the page
for full crc32 format.
Solution:
========
Made temporary tablespace in full crc32 format irrespective of encryption
parameter.
buf_tmp_page_encrypt(), buf_tmp_page_decrypt() - Both follows full_crc32
format.
- Introduce a new variable called innodb_encrypt_temporary_tables which is
a boolean variable. It decides whether to encrypt the temporary tablespace.
- Encrypts the temporary tablespace based on full checksum format.
- Introduced a new counter to track encrypted and decrypted temporary
tablespace pages.
- Warnings issued if temporary table creation has conflict value with
innodb_encrypt_temporary_tables
- Added a new test case which reads and writes the pages from/to temporary
tablespace.
Added the condition in innochecksum tool to check page id mismatch.
This could catch the write corruption caused by InnoDB.
Added the debug insert inside fil_io() to check whether it writes
the page to wrong offset.
Remove the test, because it easily fails with a result difference.
Analysis by Thirunarayanan Balathandayuthapani:
By default, innodb_encrypt_tables=0.
1) Test case creates 100 tables in innodb_encrypt_1.
2) creates another 100 unencrypted tables (encryption=off) in innodb_encrypt_2
3) creates another 100 encrypted tables (encryption=on) in innodb_encrypt_3
4) enabling innodb_encrypt_tables=1 and checking that only
100 encrypted tables exist. (already we have 100 in dictionary)
5) opening all tables again (no idea why)
6) After that, set innodb_encrypt_tables=0 and wait for 100 tables
to be decrypted (already we have 100 unencrypted tables)
7) dropping all databases
Sporadic failure happens because after step 4, it could encrypt the
normal table too, because innodb_encryption_threads=4.
This test was added in MDEV-9931, which was about InnoDB startup being
slow due to all .ibd files being opened. There have been a number of
later fixes to this problem. Currently the latest one is
commit cad56fbaba, in which some tests
(in particular the test innodb.alter_kill) could fail if all InnoDB
.ibd files are read during startup. That could make this test redundant.
Let us remove the test, because it is big, slow, unreliable, and
does not seem to reliably catch the problem that all files are being
read on InnoDB startup.
Problem:
=======
fil_iterate() writes imported tablespace page0 as it is to discarded
tablespace. Space id wasn't even changed. While opening the tablespace,
tablespace fails with space id mismatch error.
Fix:
====
fil_iterate() copies the page0 with discarded space id to imported
tablespace.