Some builders in our CI, most notably FreeBSD and IBM AIX, do not support
sparse files. Also, Microsoft Windows requires special means for creating
sparse files. Since these platforms do not run ./mtr --big-test, we will
for now simply move the test to a separate file that requires that option.
recv_log_recover_10_4(): Widen the operand of bitwise AND to 64 bits,
so that the upgrade check will work when the redo log record is located
more than 4 gigabytes from the start of the first file.
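The kind of bug that widening the operand avoids can be sketched as
follows. This is only an illustration, not the actual code of
recv_log_recover_10_4(): BLOCK_SIZE and both function names are
hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical block size, chosen only for this illustration.
constexpr uint32_t BLOCK_SIZE = 512;

// Narrow mask: ~(BLOCK_SIZE - 1) is computed in 32 bits (0xFFFFFE00)
// and then zero-extended, so the AND also clears bits 32..63.
inline uint64_t align_down_narrow(uint64_t offset)
{
  return offset & ~(BLOCK_SIZE - 1);
}

// Wide mask: the operand is widened to 64 bits before complementing,
// so only the low 9 bits of the offset are cleared.
inline uint64_t align_down_wide(uint64_t offset)
{
  return offset & ~uint64_t{BLOCK_SIZE - 1};
}
```

With an offset beyond 4 gigabytes, align_down_narrow() silently drops
the upper 32 bits, while align_down_wide() keeps them.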
ibuf_init_at_db_start(): Validate the change buffer root page.
A later version may stop creating a change buffer, and this
validation check will prevent a downgrade from such later versions.
ibuf_max_size_update(): If the change buffer was not loaded, do nothing.
dict_boot(): Merge the local variable "error" into "err". Ignore
failures of ibuf_init_at_db_start() if innodb_force_recovery>=4.
mtr_t::page_lock(): Validate the page number.
ibuf_tree_root_get(): Remove assertions that became redundant.
The assertions in btr_validate_level() are kind of redundant as well,
but because they are ut_a(), they are also present in release builds,
while the ones in mtr_t::page_lock() are only present in debug builds.
btr_cur_position(): Do not duplicate an assertion that is part of
page_cur_position().
dict_load_tablespace(): Introduce a new option
DICT_ERR_IGNORE_TABLESPACE that will suppress loading a tablespace
when a table is going to be dropped.
Now there can be only one log file instead of several which
logically work as a single file.
Possible names of redo log files: ib_logfile0,
ib_logfile101 (for a newly created one)
innodb_log_files_in_group: the value of this variable is not used
by InnoDB. Possible values are still 1..100, so as not to break upgrades
LOG_FILE_NAME: add constant of value "ib_logfile0"
LOG_FILE_NAME_PREFIX: add constant of value "ib_logfile"
get_log_file_path(): convenience function that returns full
path of a redo log file
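As a rough sketch of how these names fit together (the two constants
are from the text above; log_file_name() and the signature of
get_log_file_path() with an explicit directory argument are
assumptions made for this example only):

```cpp
#include <cassert>
#include <string>

// Constants named in the text above.
static const char LOG_FILE_NAME_PREFIX[] = "ib_logfile";
static const char LOG_FILE_NAME[] = "ib_logfile0";

// Hypothetical helper: name of the n-th file, e.g. 101 for a newly
// created log, or 1, 2, 3... for files left over from an older version.
std::string log_file_name(unsigned n)
{
  return LOG_FILE_NAME_PREFIX + std::to_string(n);
}

// Hypothetical signature: the log directory is passed in explicitly.
std::string get_log_file_path(const std::string &log_dir,
                              const std::string &filename = LOG_FILE_NAME)
{
  std::string path = log_dir;
  if (!path.empty() && path.back() != '/')
    path.push_back('/');
  return path + filename;
}
```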
SRV_N_LOG_FILES_MAX: removed
srv_n_log_files: we can't remove this for compatibility reasons,
but the server no longer uses this variable
log_sys_t::file::fd: now just one, not std::vector
log_sys_t::log_capacity: removed word 'group'
find_and_check_log_file(): part of logic from huge srv_start()
moved here
recv_sys_t::files: file descriptors of redo log files.
There can be several of those in case we're upgrading
from an older MariaDB version.
recv_sys_t::remove_extra_log_files: whether to remove
ib_logfile{1,2,3...} after a successful upgrade.
recv_sys_t::read(): open if needed and read from one
of several log files
recv_sys_t::files_size(): open if needed and return the file count
redo_file_sizes_are_correct(): check that the sizes of the redo log
files are equal. This is only used for logging an error for the user.
The corresponding check was moved from srv0start.cc
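The size check could look roughly like this. It is a minimal sketch,
not the actual redo_file_sizes_are_correct(): the function name and the
idea of passing the sizes as a vector are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical helper: true if all files have the same size.
// An empty or single-element list is treated as consistent.
bool redo_file_sizes_are_equal(const std::vector<uint64_t> &sizes)
{
  // adjacent_find() returns the first position where two neighbours
  // differ; reaching end() means that no such pair exists.
  return std::adjacent_find(sizes.begin(), sizes.end(),
                            std::not_equal_to<uint64_t>()) == sizes.end();
}
```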
namespace deprecated: put all deprecated variables here to
prevent developers from using them
log_t::FORMAT_10_5: physical redo log format tag
log_phys_t: Buffered records in the physical format.
The log record bytes will follow the last data field,
making use of alignment padding that would otherwise be wasted.
If there are multiple records for the same page, those may also
be appended to an existing log_phys_t object if the memory
is available.
In the physical format, the first byte of a record identifies the
record and its length (up to 15 bytes). For longer records, the
immediately following bytes will encode the remaining length
in a variable-length encoding. Usually, a variable-length-encoded
page identifier will follow, followed by optional payload, whose
length is included in the initially encoded total record length.
When a mini-transaction is updating multiple fields in a page,
it can avoid repeating the tablespace identifier and page number
by setting the same_page flag (most significant bit) in the first
byte of the log record. The byte offset of the record will be
relative to where the previous record for that page ended.
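Decoding the first byte of such a record can be sketched as below.
The same_page flag in the most significant bit and the 4-bit length
are from the description above; placing the record type in bits 6..4
is an assumption of this sketch, as are the names RecordHeader and
decode_first_byte(). The variable-length continuation for longer
records is not decoded here.

```cpp
#include <cassert>
#include <cstdint>

struct RecordHeader
{
  bool same_page;  // most significant bit of the first byte
  uint8_t type;    // assumed to occupy bits 6..4
  uint8_t length;  // low 4 bits: record length of up to 15 bytes
};

inline RecordHeader decode_first_byte(uint8_t b)
{
  return RecordHeader{(b & 0x80) != 0,
                      static_cast<uint8_t>((b >> 4) & 7),
                      static_cast<uint8_t>(b & 15)};
}
```

For example, the byte 0xB5 (binary 1011 0101) would decode to
same_page set, type 3 and length 5 under these assumptions.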
Until MDEV-14425 introduces a separate file-level log for
redo log checkpoints and file operations, we will write the
file-level records in the page-level redo log file.
The record FILE_CHECKPOINT (which replaces MLOG_CHECKPOINT)
will be removed in MDEV-14425, and one sequential scan of the
page recovery log will suffice.
Compared to MLOG_FILE_CREATE2, FILE_CREATE will not include any flags.
If the information is needed, it can be parsed from WRITE records that
modify FSP_SPACE_FLAGS.
MLOG_ZIP_WRITE_STRING: Remove. The record was only introduced temporarily
as part of this work, before being replaced with WRITE (along with
MLOG_WRITE_STRING, MLOG_1BYTE, MLOG_nBYTES).
mtr_buf_t::empty(): Check if the buffer is empty.
mtr_t::m_n_log_recs: Remove. It suffices to check if m_log is empty.
mtr_t::m_last, mtr_t::m_last_offset: End of the latest m_log record,
for the same_page encoding.
page_recv_t::last_offset: Reflects mtr_t::m_last_offset.
Valid values for last_offset during recovery should be 0 or above 8.
(The first 8 bytes of a page are the checksum and the page number,
and neither is ever updated directly by log records.)
Internally, the special value 1 indicates that the same_page form
will not be allowed for the subsequent record.
mtr_t::page_create(): Take the block descriptor as parameter,
so that it can be compared to mtr_t::m_last. The INIT_INDEX_PAGE
record will always be followed by a subtype byte, because same_page
records must be longer than 1 byte.
trx_undo_page_init(): Combine the writes in WRITE record.
trx_undo_header_create(): Write 4 bytes using a special MEMSET
record that includes 1 byte of length and 2 bytes of payload.
flst_write_addr(): Define as a static function. Combine the writes.
flst_zero_both(): Replaces two flst_zero_addr() calls.
flst_init(): Do not inline the function.
fsp_free_seg_inode(): Zerofill the whole inode.
fsp_apply_init_file_page(): Initialize FIL_PAGE_PREV,FIL_PAGE_NEXT
to FIL_NULL when using the physical format.
btr_create(): Assert !page_has_siblings() because fsp_apply_init_file_page()
must have been invoked.
fil_ibd_create(): Do not write FILE_MODIFY after FILE_CREATE.
fil_names_dirty_and_write(): Remove the parameter mtr.
Write the records using a separate mini-transaction object,
because any FILE_ records must be at the start of a mini-transaction log.
recv_recover_page(): Add a fil_space_t* parameter.
After applying log to a ROW_FORMAT=COMPRESSED page,
invoke buf_zip_decompress() to restore the uncompressed page.
buf_page_io_complete(): Remove the temporary hack to discard the
uncompressed page of a ROW_FORMAT=COMPRESSED page.
page_zip_write_header(): Remove. Use mtr_t::write() or
mtr_t::memset() instead, and update the compressed page frame
separately.
trx_undo_header_add_space_for_xid(): Remove.
trx_undo_seg_create(): Perform the changes that were previously
made by trx_undo_header_add_space_for_xid().
btr_reset_instant(): New function: Reset the table to MariaDB 10.2
or 10.3 format when rolling back an instant ALTER TABLE operation.
page_rec_find_owner_rec(): Merge with the only callers.
page_cur_insert_rec_low(): Combine writes by using a local buffer.
MEMMOVE data from the preceding record whenever feasible
(copying at least 3 bytes).
page_cur_insert_rec_zip(): Combine writes to page header fields.
PageBulk::insertPage(): Issue MEMMOVE records to copy a matching
part from the preceding record.
PageBulk::finishPage(): Combine the writes to the page header
and to the sparse page directory slots.
mtr_t::write(): Only log the least significant (last) bytes
of multi-byte fields that actually differ.
For updating FSP_SIZE, we must always write all 4 bytes to the
redo log, so that the fil_space_set_recv_size() logic in
recv_sys_t::parse() will work.
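The idea of logging only the differing last bytes can be sketched as
follows, assuming a big-endian field whose old and new contents are
available as byte arrays. LoggedRange and bytes_to_log() are
hypothetical names; this is not the code of mtr_t::write().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct LoggedRange
{
  size_t offset;  // first byte of the field that must be logged
  size_t length;  // number of bytes to log; 0 means nothing changed
};

// Skip the leading (most significant) bytes that are unchanged and
// log everything from the first differing byte to the end of the field.
inline LoggedRange bytes_to_log(const uint8_t *old_val,
                                const uint8_t *new_val, size_t size)
{
  size_t skip = 0;
  while (skip < size && old_val[skip] == new_val[skip])
    ++skip;
  return LoggedRange{skip, size - skip};
}
```

A caller that must always log the whole field, as described for
FSP_SIZE above, would bypass this and use offset 0 and the full size.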
mtr_t::memcpy(), mtr_t::zmemcpy(): Take a pointer argument
instead of a numeric offset to the page frame. Only log the
last bytes of multi-byte fields that actually differ.
In fil_space_crypt_t::write_page0(), we must log also any
unchanged bytes, so that recovery will recognize the record
and invoke fil_crypt_parse().
Future work:
MDEV-21724 Optimize page_cur_insert_rec_low() redo logging
MDEV-21725 Optimize btr_page_reorganize_low() redo logging
MDEV-21727 Optimize redo logging for ROW_FORMAT=COMPRESSED
Historically, InnoDB split the redo log into at least 2 files.
MDEV-12061 allowed the minimum to be innodb_log_files_in_group=1,
but it kept the default at innodb_log_files_in_group=2.
Because performance seems to be slightly better with only one log file,
and because implementing an append-only variant of the log would require
a single file, let us define the default to be 1, and have
innodb_log_file_size=96M, to retain the same default total size.
In tests that directly write InnoDB data file pages,
compute the innodb_checksum_algorithm=crc32 checksums,
instead of writing the 0xdeadbeef value used by
innodb_checksum_algorithm=none. In this way, these tests
will not cause failures when executing
./mtr --mysqld=--loose-innodb-checksum-algorithm=strict_crc32
recv_log_recover_10_3(): Determine if a log from MariaDB 10.3 is clean.
recv_find_max_checkpoint(): Allow startup with a clean 10.3 redo log.
srv_prepare_to_delete_redo_log_files(): When starting up with a 10.3 log,
display a "Downgrading redo log" message instead of "Upgrading".
While the redo log format was changed in MariaDB 10.3.2 and 10.3.3
due to MDEV-12288 and MDEV-11369, it should be technically possible
to upgrade from a crashed MariaDB 10.2 instance.
On a related note, it should be possible for Mariabackup 10.3
to create a backup from a running MariaDB Server 10.2.
mlog_id_t: Put back the 10.2 specific redo log record types
MLOG_UNDO_INSERT, MLOG_UNDO_ERASE_END, MLOG_UNDO_INIT,
MLOG_UNDO_HDR_REUSE.
trx_undo_parse_add_undo_rec(): Parse or apply MLOG_UNDO_INSERT.
trx_undo_erase_page_end(): Apply MLOG_UNDO_ERASE_END.
trx_undo_parse_page_init(): Parse or apply MLOG_UNDO_INIT.
trx_undo_parse_page_header_reuse(): Parse or apply MLOG_UNDO_HDR_REUSE.
recv_log_recover_10_2(): Remove. Always parse the redo log from 10.2.
recv_find_max_checkpoint(), recv_recovery_from_checkpoint_start():
Always parse the redo log from MariaDB 10.2.
recv_parse_or_apply_log_rec_body(): Parse or apply
MLOG_UNDO_INSERT, MLOG_UNDO_ERASE_END, MLOG_UNDO_INIT.
srv_prepare_to_delete_redo_log_files(),
innobase_start_or_create_for_mysql(): Upgrade from a previous (supported)
redo log format.
The redo log format will be changed by MDEV-12288, and it could
be changed further during MariaDB 10.3 development. We will
allow startup from a clean redo log from any earlier InnoDB
version (up to MySQL 5.7 or MariaDB 10.3), but we will refuse
to do crash recovery from older-format redo logs.
recv_log_format_0_recover(): Remove a reference to MySQL documentation,
which may be misleading when it comes to MariaDB.
recv_log_recover_10_2(): Check if a MariaDB 10.2.2/MySQL 5.7.9
redo log is clean.
recv_find_max_checkpoint(): Invoke recv_log_recover_10_2() if the
redo log is in the MariaDB 10.2.2 or MySQL 5.7.9 format.
1. Special mode to search in error logs: if SEARCH_RANGE is not set,
the file is considered an error log and the search starts from
the last CURRENT_TEST: line
2. Number of matches is printed too. "FOUND 5 /foo/ in bar".
Use greedy .* at the end of the pattern if number of matches
isn't stable. If nothing is found it's still "NOT FOUND",
not "FOUND 0".
3. SEARCH_ABORT specifies the prefix of the output.
Can be "NOT FOUND" or "FOUND" as before,
but also "FOUND 5 " if needed.
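The output format of item 2 can be illustrated with a small sketch.
This is not the mtr implementation: search_report() is a hypothetical
function, std::regex stands in for the real pattern matching, and the
" /pattern/ in file" suffix on the "NOT FOUND" line is assumed to
mirror the "FOUND" line.

```cpp
#include <cassert>
#include <iterator>
#include <regex>
#include <string>

std::string search_report(const std::string &content,
                          const std::string &pattern,
                          const std::string &file_name)
{
  const std::regex re(pattern);
  // Count the non-overlapping matches of the pattern in the content.
  const auto count = std::distance(
      std::sregex_iterator(content.begin(), content.end(), re),
      std::sregex_iterator());
  const std::string tail = " /" + pattern + "/ in " + file_name;
  if (count == 0)
    return "NOT FOUND" + tail;  // never "FOUND 0"
  return "FOUND " + std::to_string(count) + tail;
}
```

Appending a greedy .* to the pattern makes one match swallow the rest
of the text, so that the reported count stays stable. Note that in
std::regex the dot does not match a newline, so this sketch only
shows the effect within a single line.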
Provide more useful progress reporting of crash recovery.
recv_sys_t::progress_time: The time of the last report.
recv_sys_t::report(ib_time_t): Determine whether progress should
be reported.
recv_scan_print_counter: Remove.
log_group_read_log_seg(): After each I/O request, invoke
recv_sys_t::report() and report progress if needed.
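A time-based throttle of this kind might look like the sketch below.
The struct name and the 15-second INTERVAL are assumptions (the text
above does not state an interval), and time_t stands in for ib_time_t.

```cpp
#include <cassert>
#include <ctime>

struct recovery_progress
{
  // Assumed reporting interval in seconds.
  static constexpr time_t INTERVAL = 15;

  time_t progress_time = 0;  // the time of the last report

  // Returns true if progress should be reported now, and if so,
  // remembers the current time as the time of the last report.
  bool report(time_t now)
  {
    if (now - progress_time < INTERVAL)
      return false;
    progress_time = now;
    return true;
  }
};
```

Calling report() after every I/O request is cheap, because most calls
return false without producing any output.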
recv_apply_hashed_log_recs(): Change the return type back to void
(DB_SUCCESS was always returned), and rename the parameter to last_batch.
At the start of each batch, if there are pages to be recovered,
issue a message.
This fixes MySQL Bug#80788 in MariaDB 10.2.5.
When I made the InnoDB crash recovery more robust by implementing
WL#7142, I also introduced an extra redo log scan pass that can be
shortened.
This fix will slightly extend the InnoDB redo log format that I
introduced in MySQL 5.7.9 by writing the start LSN of the MLOG_CHECKPOINT
mini-transaction to the end of the log checkpoint page, so that recovery
can jump straight to it without scanning all the preceding redo log.
LOG_CHECKPOINT_END_LSN: At the end of the checkpoint page, the start LSN
of the MLOG_CHECKPOINT mini-transaction. Previously, these bytes were
written as 0.
log_write_checkpoint_info(), log_group_checkpoint(): Add the parameter
end_lsn for writing LOG_CHECKPOINT_END_LSN.
log_checkpoint(): Remember the LSN at which the MLOG_CHECKPOINT
mini-transaction is starting (or at which the redo log ends on
shutdown).
recv_init_crash_recovery(): Remove.
recv_group_scan_log_recs(): Add the parameter checkpoint_lsn.
recv_recovery_from_checkpoint_start(): Read LOG_CHECKPOINT_END_LSN
and if it is set, start the first scan from it instead of the
checkpoint LSN. Improve some messages and remove bogus assertions.
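The choice of the starting point can be summarized in a few lines.
This is a simplified sketch with a hypothetical function name; it
assumes that a value of 0 means that the field was not set, since
these bytes were previously written as 0.

```cpp
#include <cassert>
#include <cstdint>

typedef uint64_t lsn_t;

// Pick the LSN at which the first redo log scan should start.
inline lsn_t first_scan_start(lsn_t checkpoint_lsn, lsn_t checkpoint_end_lsn)
{
  // If LOG_CHECKPOINT_END_LSN was written, jump straight to the
  // MLOG_CHECKPOINT mini-transaction; otherwise scan from the checkpoint.
  return checkpoint_end_lsn != 0 ? checkpoint_end_lsn : checkpoint_lsn;
}
```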
recv_parse_log_recs(): Do not skip DBUG_PRINT("ib_log") for some
file-level redo log records.
recv_parse_or_apply_log_rec_body(): If we have not parsed all redo
log between the checkpoint and the corresponding MLOG_CHECKPOINT
record, defer the check for MLOG_FILE_DELETE or MLOG_FILE_NAME records
to recv_init_crash_recovery_spaces().
recv_init_crash_recovery_spaces(): Refuse recovery if
MLOG_FILE_NAME or MLOG_FILE_DELETE records are missing.
Write only one encryption key to the checkpoint page.
Use 4 bytes of nonce. Encrypt more of each redo log block,
only skipping the 4-byte field LOG_BLOCK_HDR_NO which the
initialization vector is derived from.
Issue notes, not warning messages for rewriting the redo log files.
recv_recovery_from_checkpoint_finish(): Do not generate any redo log,
because we must avoid that before rewriting the redo log files, or
otherwise a crash during a redo log rewrite (removing or adding
encryption) may end up making the database unrecoverable.
Instead, do these tasks in innobase_start_or_create_for_mysql().
Issue a firm "Missing MLOG_CHECKPOINT" error message. Remove some
unreachable code and duplicated error messages for log corruption.
LOG_HEADER_FORMAT_ENCRYPTED: A flag for identifying an encrypted redo
log format.
log_group_t::is_encrypted(), log_t::is_encrypted(): Determine
if the redo log is in encrypted format.
recv_find_max_checkpoint(): Interpret LOG_HEADER_FORMAT_ENCRYPTED.
srv_prepare_to_delete_redo_log_files(): Display NOTE messages about
adding or removing encryption. Do not issue warnings for redo log
resizing any more.
innobase_start_or_create_for_mysql(): Rebuild the redo logs also when
the encryption changes.
innodb_log_checksums_func_update(): Always use the CRC-32C checksum
if innodb_encrypt_log. If needed, issue a warning
that innodb_encrypt_log implies innodb_log_checksums.
log_group_write_buf(): Compute the checksum on the encrypted
block contents, so that transmission errors or incomplete blocks can be
detected without decrypting.
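Why checksumming the encrypted contents helps can be shown with a toy
example. Everything here is illustrative: toy_encrypt() is a
placeholder XOR and not real encryption, stored_block, write_block()
and block_is_intact() are hypothetical names, and crc32c() is a plain
bitwise software CRC-32C rather than the server's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Bitwise CRC-32C (Castagnoli polynomial, reflected form 0x82F63B78).
uint32_t crc32c(const uint8_t *data, size_t len)
{
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; i++) {
    crc ^= data[i];
    for (int k = 0; k < 8; k++)
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
  }
  return ~crc;
}

// Placeholder "cipher": XOR with one byte. Not real encryption.
void toy_encrypt(std::vector<uint8_t> &block, uint8_t key)
{
  for (uint8_t &b : block)
    b ^= key;
}

struct stored_block
{
  std::vector<uint8_t> ciphertext;
  uint32_t checksum;  // computed over the ciphertext
};

// Encrypt first, then compute the checksum on the encrypted bytes.
stored_block write_block(std::vector<uint8_t> plaintext, uint8_t key)
{
  toy_encrypt(plaintext, key);
  const uint32_t sum = crc32c(plaintext.data(), plaintext.size());
  return stored_block{std::move(plaintext), sum};
}

// Verification needs no key, because the checksum covers the ciphertext.
bool block_is_intact(const stored_block &b)
{
  return crc32c(b.ciphertext.data(), b.ciphertext.size()) == b.checksum;
}
```

A reader that does not have the key can still call block_is_intact()
to reject a damaged or incomplete block.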
Rewrite most of the redo log encryption code. Only remember one
encryption key at a time (but remember up to 5 when upgrading from the
MariaDB 10.1 format).
LOG_CHECKPOINT_ARRAY_END, LOG_CHECKPOINT_SIZE: Remove.
Change some error messages to refer to MariaDB 10.2.2 instead of
MySQL 5.7.9.
recv_find_max_checkpoint_0(): Do not abort when decrypting one of the
checkpoint pages fails.
Test server startup with an empty encrypted redo log from 10.1.21.
FIXME: Pass the encryption parameters. Currently we only test startup
without properly set up encryption.
Remove the dependency on unzip. Instead, generate the InnoDB files
with perl.
log_block_checksum_is_ok(): Correct the error message.
recv_scan_log_recs(): Remove the duplicated error message for
log block checksum mismatch.
innobase_start_or_create_for_mysql(): If the server is in read-only
mode or if innodb_force_recovery>=3, do not try to modify the system
tablespace. (If the doublewrite buffer or the non-core system tables
do not exist, do not try to create them.)
innodb_shutdown(): Relax a debug assertion. If the system tablespace
did not contain a doublewrite buffer and if we started up in
innodb_read_only mode or with innodb_force_recovery>=3, it will not
be created.
dict_create_or_check_sys_tablespace(): Set the flag
srv_sys_tablespaces_open when the tables exist.