1
0
mirror of https://github.com/MariaDB/server.git synced 2025-08-07 00:04:31 +03:00
Commit Graph

555 Commits

Author SHA1 Message Date
Marko Mäkelä
f27e9c8947 MDEV-29694 Remove the InnoDB change buffer
The purpose of the change buffer was to reduce random disk access,
which could be useful on rotational storage, but maybe less so on
solid-state storage.
When we wished to
(1) insert a record into a non-unique secondary index,
(2) delete-mark a secondary index record,
(3) delete a secondary index record as part of purge (but not ROLLBACK),
and the B-tree leaf page where the record belongs to is not in the buffer
pool, we inserted a record into the change buffer B-tree, indexed by
the page identifier. When the page was eventually read into the buffer
pool, we looked up the change buffer B-tree for any modifications to the
page, applied these upon the completion of the read operation. This
was called the insert buffer merge.

We remove the change buffer, because it has been the source of
various hard-to-reproduce corruption bugs, including those fixed in
commit 5b9ee8d819 and
commit 165564d3c3 but not limited to them.

A downgrade will fail with a clear message starting with
commit db14eb16f9 (MDEV-30106).

buf_page_t::state: Merge IBUF_EXIST to UNFIXED and
WRITE_FIX_IBUF to WRITE_FIX.

buf_pool_t::watch[]: Remove.

trx_t: Move isolation_level, check_foreigns, check_unique_secondary,
bulk_insert into the same bit-field. The only purpose of
trx_t::check_unique_secondary is to enable bulk insert into an
empty table. It no longer enables insert buffering for UNIQUE INDEX.

btr_cur_t::thr: Remove. This field was originally needed for change
buffering. Later, its use was extended to cover SPATIAL INDEX.
Much of the time, rtr_info::thr holds this field. When it does not,
we will add parameters to SPATIAL INDEX specific functions.

ibuf_upgrade_needed(): Check if the change buffer needs to be updated.

ibuf_upgrade(): Merge and upgrade the change buffer after all redo log
has been applied. Free any pages consumed by the change buffer, and
zero out the change buffer root page to mark the upgrade completed,
and to prevent a downgrade to an earlier version.

dict_load_tablespaces(): Renamed from
dict_check_tablespaces_and_store_max_id(). This needs to be invoked
before ibuf_upgrade().

btr_cur_open_at_rnd_pos(): Specialize for use in persistent statistics.
The change buffer merge does not need this function anymore.

btr_page_alloc(): Renamed from btr_page_alloc_low(). We no longer
allocate any change buffer pages.

btr_cur_open_at_rnd_pos(): Specialize for use in persistent statistics.
The change buffer merge does not need this function anymore.

row_search_index_entry(), btr_lift_page_up(): Add a parameter thr
for the SPATIAL INDEX case.

rtr_page_split_and_insert(): Specialized from btr_page_split_and_insert().

rtr_root_raise_and_insert(): Specialized from btr_root_raise_and_insert().

Note: The support for upgrading from the MySQL 3.23 or MySQL 4.0
change buffer format that predates the MySQL 4.1 introduction of
the option innodb_file_per_table was removed in MySQL 5.6.5
as part of mysql/mysql-server@69b6241a79
and MariaDB 10.0.11 as part of 1d0f70c2f8.

In the tests innodb.log_upgrade and innodb.log_corruption, we create
valid (upgraded) change buffer pages.

Tested by: Matthias Leich
2023-01-11 17:59:36 +02:00
Marko Mäkelä
e581396b7a MDEV-29983 Deprecate innodb_file_per_table
Before commit 6112853cda in MySQL 4.1.1
introduced the parameter innodb_file_per_table, all InnoDB data was
written to the InnoDB system tablespace (often named ibdata1).
A serious design problem is that once the system tablespace has grown to
some size, it cannot shrink even if the data inside it has been deleted.

There are also other design problems, such as the server hang MDEV-29930
that should only be possible when using innodb_file_per_table=0 and
innodb_undo_tablespaces=0 (storing both tables and undo logs in the
InnoDB system tablespace).

The parameter innodb_change_buffering was deprecated
in commit b5852ffbee.
Starting with commit baf276e6d4
(MDEV-19229) the number of innodb_undo_tablespaces can be increased,
so that the undo logs can be moved out of the system tablespace
of an existing installation.

If all these things (tables, undo logs, and the change buffer) are
removed from the InnoDB system tablespace, the only variable-size
data structure inside it is the InnoDB data dictionary.

DDL operations on .ibd files was optimized in
commit 86dc7b4d4c (MDEV-24626).
That should have removed any thinkable performance advantage of
using innodb_file_per_table=0.

Since there should be no benefit of setting innodb_file_per_table=0,
the parameter should be deprecated. Starting with MySQL 5.6 and
MariaDB Server 10.0, the default value is innodb_file_per_table=1.
2023-01-11 17:55:56 +02:00
Daniel Black
069eb169b3 Merge branch '10.8' into 10.9 2022-12-15 08:39:46 +11:00
Marko Mäkelä
3faa68d15b Add missing error suppression 2022-12-14 10:18:37 +02:00
Marko Mäkelä
3ba8828396 Merge 10.8 into 10.9 2022-11-30 12:21:10 +02:00
Marko Mäkelä
b7ae4d442a Merge 10.6 into 10.7 2022-11-30 12:09:01 +02:00
Thirunarayanan Balathandayuthapani
bb29712b45 MDEV-30119 INFORMATION_SCHEMA.INNODB_TABLESPACES_ENCRYPTION.NAME is NULL for undo tablespaces
- Information_schema.innodb_tablespaces_encryption should print
undo tablespace name as innodb_undo001, innodb_undo002 and soon.

- Encryption test should include undo tablespaces count when
the tests are waiting for the condition to check whether all
tables are encrypted or decrypted.
2022-11-29 19:49:53 +05:30
Marko Mäkelä
d48db97d0a Merge 10.8 into 10.9 2022-11-24 08:35:36 +02:00
Marko Mäkelä
22ab79c430 Merge 10.6 into 10.7 2022-11-24 08:34:45 +02:00
Marko Mäkelä
6d40274f65 Merge 10.5 into 10.6 2022-11-23 18:13:28 +02:00
Marko Mäkelä
cff9939d09 MDEV-30068 Confusing error message when encryption is not available on recovery
fil_name_process(): If fil_ibd_load() returns FIL_LOAD_INVALID,
display the file name and the tablespace identifier.
2022-11-22 15:31:12 +02:00
Marko Mäkelä
b35a048ece Merge 10.8 into 10.9 2022-11-21 10:25:38 +02:00
Marko Mäkelä
d5332086d7 Merge 10.6 into 10.7 2022-11-17 09:19:32 +02:00
Marko Mäkelä
9aea7d83c8 Merge 10.5 into 10.6 2022-11-17 08:37:35 +02:00
Marko Mäkelä
41028d70f6 MDEV-29982 fixup: Relax the test
The log overwrite warnings are not being reliably emitted in all
debug-instrumented environments. It may be related to the
scheduling of some InnoDB internal activity, such as the purging
of committed transaction history.
2022-11-17 08:33:05 +02:00
Marko Mäkelä
ae6ebafd81 Merge 10.5 into 10.6 2022-11-14 15:44:55 +02:00
Marko Mäkelä
e0e096faaa MDEV-29982 Improve the InnoDB log overwrite error message
The InnoDB write-ahead log ib_logfile0 is of fixed size,
specified by innodb_log_file_size. If the tail of the log
manages to overwrite the head (latest checkpoint) of the log,
crash recovery will be broken.

Let us clarify the messages about this, including adding
a message on the completion of a log checkpoint that notes
that the dangerous situation is over.

To reproduce the dangerous scenario, we will introduce the
debug injection label ib_log_checkpoint_avoid_hard, which will
avoid log checkpoints even harder than the previous
ib_log_checkpoint_avoid.

log_t::overwrite_warned: The first known dangerous log sequence number.
Set in log_close() and cleared in log_write_checkpoint_info(),
which will output a "Crash recovery was broken" message.
2022-11-14 12:18:03 +02:00
Oleksandr Byelkin
ebf2121529 Merge branch '10.8' into 10.9 2022-11-01 10:33:44 +01:00
Oleksandr Byelkin
1ebfa2af62 Merge branch '10.6' into 10.7 2022-10-29 19:22:04 +02:00
Marko Mäkelä
aeccbbd926 Merge 10.5 into 10.6
To prevent ASAN heap-use-after-poison in the MDEV-16549 part of
./mtr --repeat=6 main.derived
the initialization of Name_resolution_context was cleaned up.
2022-10-25 14:25:42 +03:00
Marko Mäkelä
9a0b9e3360 Merge 10.4 into 10.5 2022-10-25 11:26:37 +03:00
Marko Mäkelä
667d3fbbb5 Merge 10.3 into 10.4 2022-10-25 10:04:37 +03:00
kurt
e11661a4a2 MDEV-25343 Error log message not helpful when filekey is too long
Add a test related to the Encrypted Key File by following instructions in kb example
https://mariadb.com/kb/en/file-key-management-encryption-plugin/#creating-the-key-file

Reviewed by Daniel Black (with minor formatting and re-org of duplicate
close(f) calls).
2022-10-21 15:54:17 +11:00
Daniel Black
3a62ff7e89 Revert "MDEV-25343 add read secret size in file key plugin"
This reverts commit cee7175b79.
2022-10-19 20:05:59 +11:00
kurt
cee7175b79 MDEV-25343 add read secret size in file key plugin 2022-10-19 16:44:16 +11:00
Oleksandr Byelkin
55e07d9ade Merge branch '10.8' into 10.9 2022-10-04 13:23:13 +02:00
Oleksandr Byelkin
b6ebadaa66 Merge branch '10.6' into 10.7 2022-10-04 07:41:35 +02:00
Marko Mäkelä
829e8111c7 Merge 10.5 into 10.6 2022-09-26 14:34:43 +03:00
Marko Mäkelä
6286a05d80 Merge 10.4 into 10.5 2022-09-26 13:34:38 +03:00
Marko Mäkelä
a69cf6f07e MDEV-29613 Improve WITH_DBUG_TRACE=OFF
In commit 28325b0863
a compile-time option was introduced to disable the macros
DBUG_ENTER and DBUG_RETURN or DBUG_VOID_RETURN.

The parameter name WITH_DBUG_TRACE would hint that it also
covers DBUG_PRINT statements. Let us do that: WITH_DBUG_TRACE=OFF
shall disable DBUG_PRINT() as well.

A few InnoDB recovery tests used to check that some output from
DBUG_PRINT("ib_log", ...) is present. We can live without those checks.

Reviewed by: Vladislav Vaintroub
2022-09-23 13:40:42 +03:00
Jan Lindström
ddd8901cd2 Merge 10.8 into 10.9 2022-09-06 09:45:54 +03:00
Jan Lindström
5fdbb3a72e Merge 10.6 into 10.7 2022-09-05 14:55:47 +03:00
Marko Mäkelä
bdf62ece6c MDEV-29374 InnoDB recovery fails with "Data structure corruption"
recv_sys_t::free_corrupted_page(): Identify the corrupted page in
an error or warning message.

buf_page_free(): Just in case, register the page as modified.
This should already have been done in mtr_t::free() as part of
fseg_free_page_low().

mtr_t::memo_push(): Simplify a condition, so that when invoked
with MTR_MEMO_PAGE_X_MODIFY, we will do the right thing.

fseg_free_page_low(): Remove an accidentally added return statement
that prevented mtr_t::free() from being called. This fixes a regression
that was introduced in
commit 0b47c126e3 (MDEV-13542).
2022-08-31 17:52:16 +03:00
Oleksandr Byelkin
22d455612b Merge branch '10.8' into 10.9 2022-08-09 09:57:13 +02:00
Oleksandr Byelkin
1d48041982 Merge branch '10.6' into 10.7 2022-08-08 17:12:32 +02:00
Marko Mäkelä
c980350438 MDEV-13542 fixup: Improve a recovery error message
A message used to say "failed to read or decrypt"
but the "or decrypt" part was removed in
commit 0b47c126e3
without adjusting rarely needed error message suppressions in some
encryption tests.

Let us improve the error message so that it mentions the file name,
and adjust all error message suppressions in tests.

Thanks to Oleksandr Byelkin for noticing one test failure.
2022-08-05 11:02:18 +03:00
Marko Mäkelä
f53f64b7b9 Merge 10.8 into 10.9 2022-07-28 10:47:33 +03:00
Thirunarayanan Balathandayuthapani
19283c67c6 MDEV-28679 After upgrade to 10.7.3-1 with enabled data-at-rest encryption unable to restore dump file.
- InnoDB bulk insert fails to use encryption buffer for encrypting
the temporary log file. Declare the m_crypt_block, m_crypt_pfx in
row_merge_bulk_t to be used for encrypting the temporary file.
2022-07-26 11:25:56 +05:30
Marko Mäkelä
4a164364d7 Merge 10.8 into 10.9 2022-06-29 16:22:22 +03:00
Marko Mäkelä
cac6f0a8c4 Merge 10.6 into 10.7 2022-06-29 16:17:14 +03:00
Marko Mäkelä
02a313dc56 MDEV-18976 fixup: encryption.innodb-redo-nokeys
This test failure is similar to encryption.innodb-redo-badkey,
which was fixed in commit 0f0a45b2dc.
2022-06-28 12:29:30 +03:00
Marko Mäkelä
b81460f07e Merge 10.8 into 10.9 2022-06-23 13:47:22 +03:00
Marko Mäkelä
5d0496c749 Merge 10.6 into 10.7 2022-06-23 13:20:25 +03:00
Marko Mäkelä
0f0a45b2dc MDEV-18976 fixup: encryption.innodb-redo-badkey
When attempting to recover a database with an incorrect encryption key,
the unencrypted page contents should be expected to differ from what
was written before recovery. Let us suppress some more messages.
This caused intermittent failures, depending on when the latest
log checkpoint was triggered.
2022-06-22 17:27:49 +03:00
Marko Mäkelä
5a33a37682 Merge 10.8 into 10.9 2022-06-07 09:20:07 +03:00
Marko Mäkelä
7e39470e33 Merge 10.6 into 10.7 2022-06-06 14:56:20 +03:00
Marko Mäkelä
0b47c126e3 MDEV-13542: Crashing on corrupted page is unhelpful
The approach to handling corruption that was chosen by Oracle in
commit 177d8b0c12
is not really useful. Not only did it actually fail to prevent InnoDB
from crashing, but it is making things worse by blocking attempts to
rescue data from or rebuild a partially readable table.

We will try to prevent crashes in a different way: by propagating
errors up the call stack. We will never mark the clustered index
persistently corrupted, so that data recovery may be attempted by
reading from the table, or by rebuilding the table.

This should also fix MDEV-13680 (crash on btr_page_alloc() failure);
it was extensively tested with innodb_file_per_table=0 and a
non-autoextend system tablespace.

We should now avoid crashes in many cases, such as when a page
cannot be read or allocated, or an inconsistency is detected when
attempting to update multiple pages. We will not crash on double-free,
such as on the recovery of DDL in system tablespace in case something
was corrupted.

Crashes on corrupted data are still possible. The fault injection mechanism
that is introduced in the subsequent commit may help catch more of them.

buf_page_import_corrupt_failure: Remove the fault injection, and instead
corrupt some pages using Perl code in the tests.

btr_cur_pessimistic_insert(): Always reserve extents (except for the
change buffer), in order to prevent a subsequent allocation failure.

btr_pcur_open_at_rnd_pos(): Merged to the only caller ibuf_merge_pages().

btr_assert_not_corrupted(), btr_corruption_report(): Remove.
Similar checks are already part of btr_block_get().

FSEG_MAGIC_N_BYTES: Replaces FSEG_MAGIC_N_VALUE.

dict_hdr_get(), trx_rsegf_get_new(), trx_undo_page_get(),
trx_undo_page_get_s_latched(): Replaced with error-checking calls.

trx_rseg_t::get(mtr_t*): Replaces trx_rsegf_get().

trx_rseg_header_create(): Let the caller update the TRX_SYS page if needed.

trx_sys_create_sys_pages(): Merged with trx_sysf_create().

dict_check_tablespaces_and_store_max_id(): Do not access
DICT_HDR_MAX_SPACE_ID, because it was already recovered in dict_boot().
Merge dict_check_sys_tables() with this function.

dir_pathname(): Replaces os_file_make_new_pathname().

row_undo_ins_remove_sec(): Do not modify the undo page by adding
a terminating NUL byte to the record.

btr_decryption_failed(): Report decryption failures

dict_set_corrupted_by_space(), dict_set_encrypted_by_space(),
dict_set_corrupted_index_cache_only(): Remove.

dict_set_corrupted(): Remove the constant parameter dict_locked=false.
Never flag the clustered index corrupted in SYS_INDEXES, because
that would deny further access to the table. It might be possible to
repair the table by executing ALTER TABLE or OPTIMIZE TABLE, in case
no B-tree leaf page is corrupted.

dict_table_skip_corrupt_index(), dict_table_next_uncorrupted_index(),
row_purge_skip_uncommitted_virtual_index(): Remove, and refactor
the callers to read dict_index_t::type only once.

dict_table_is_corrupted(): Remove.

dict_index_t::is_btree(): Determine if the index is a valid B-tree.

BUF_GET_NO_LATCH, BUF_EVICT_IF_IN_POOL: Remove.

UNIV_BTR_DEBUG: Remove. Any inconsistency will no longer trigger
assertion failures, but error codes being returned.

buf_corrupt_page_release(): Replaced with a direct call to
buf_pool.corrupted_evict().

fil_invalid_page_access_msg(): Never crash on an invalid read;
let the caller of buf_page_get_gen() decide.

btr_pcur_t::restore_position(): Propagate failure status to the caller
by returning CORRUPTED.

opt_search_plan_for_table(): Simplify the code.

row_purge_del_mark(), row_purge_upd_exist_or_extern_func(),
row_undo_ins_remove_sec_rec(), row_undo_mod_upd_del_sec(),
row_undo_mod_del_mark_sec(): Avoid mem_heap_create()/mem_heap_free()
when no secondary indexes exist.

row_undo_mod_upd_exist_sec(): Simplify the code.

row_upd_clust_step(), dict_load_table_one(): Return DB_TABLE_CORRUPT
if the clustered index (and therefore the table) is corrupted, similar
to what we do in row_insert_for_mysql().

fut_get_ptr(): Replace with buf_page_get_gen() calls.

buf_page_get_gen(): Return nullptr and *err=DB_CORRUPTION
if the page is marked as freed. For other modes than
BUF_GET_POSSIBLY_FREED or BUF_PEEK_IF_IN_POOL this will
trigger a debug assertion failure. For BUF_GET_POSSIBLY_FREED,
we will return nullptr for freed pages, so that the callers
can be simplified. The purge of transaction history will be
a new user of BUF_GET_POSSIBLY_FREED, to avoid crashes on
corrupted data.

buf_page_get_low(): Never crash on a corrupted page, but simply
return nullptr.

fseg_page_is_allocated(): Replaces fseg_page_is_free().

fts_drop_common_tables(): Return an error if the transaction
was rolled back.

fil_space_t::set_corrupted(): Report a tablespace as corrupted if
it was not reported already.

fil_space_t::io(): Invoke fil_space_t::set_corrupted() to report
out-of-bounds page access or other errors.

Clean up mtr_t::page_lock()

buf_page_get_low(): Validate the page identifier (to check for
recently read corrupted pages) after acquiring the page latch.

buf_page_t::read_complete(): Flag uninitialized (all-zero) pages
with DB_FAIL. Return DB_PAGE_CORRUPTED on page number mismatch.

mtr_t::defer_drop_ahi(): Renamed from mtr_defer_drop_ahi().

recv_sys_t::free_corrupted_page(): Only set_corrupt_fs()
if any log records exist for the page. We do not mind if read-ahead
produces corrupted (or all-zero) pages that were not actually needed
during recovery.

recv_recover_page(): Return whether the operation succeeded.

recv_sys_t::recover_low(): Simplify the logic. Check for recovery error.

Thanks to Matthias Leich for testing this extensively and to the
authors of https://rr-project.org for making it easy to diagnose
and fix any failures that were found during the testing.
2022-06-06 14:03:22 +03:00
Sergei Golubchik
bf2bdd1a1a Merge branch '10.8' into 10.9 2022-05-19 14:07:55 +02:00
Sergei Golubchik
fd132be117 Merge branch '10.6' into 10.7 2022-05-11 11:25:33 +02:00
Marko Mäkelä
f67d65e331 MDEV-28484 InnoDB encryption key rotation is not being marked completed
fil_crypt_flush_space(): Correct a condition that was refactored
incorrectly in commit aaef2e1d8c
2022-05-06 11:23:13 +03:00