1
0
mirror of https://github.com/MariaDB/server.git synced 2025-09-11 05:52:26 +03:00
Commit Graph

449 Commits

Author SHA1 Message Date
Marko Mäkelä
2719cc4925 Merge 10.11 into 11.4 2024-12-02 11:35:34 +02:00
Marko Mäkelä
3d23adb766 Merge 10.6 into 10.11 2024-11-29 13:43:17 +02:00
Marko Mäkelä
895cd553a3 MDEV-32175: Reduce page_align(), page_offset() calls
When srv_page_size and innodb_page_size were introduced,
the functions page_align() and page_offset() got more expensive.
Let us try to replace such calls with simpler pointer arithmetics
with respect to the buffer page frame.

page_rec_get_next_non_del_marked(): Add a page frame as a parameter,
and template<bool comp>.

page_rec_next_get(): A more efficient variant of page_rec_get_next(),
with template<bool comp> and const page_t* parameters.

lock_get_heap_no(): Replaces page_rec_get_heap_no() outside debug checks.

fseg_free_step(), fseg_free_step_not_header(): Take the header block
as a parameter.

Reviewed by: Vladislav Lesin
2024-11-21 11:01:30 +02:00
Oleksandr Byelkin
80abd847da Merge branch '10.11' into 11.1 2024-08-03 09:32:42 +02:00
Oleksandr Byelkin
0e8fb977b0 Merge branch '10.6' into 10.11 2024-08-03 09:15:40 +02:00
Thirunarayanan Balathandayuthapani
ee5f7692d7 MDEV-34357 InnoDB: Assertion failure in file ./storage/innobase/page/page0zip.cc line 4211
During InnoDB root page split, InnoDB does the following
1) First move the root records to the new page(p1)
2) Empty the root, insert the node pointer to the root page
3) Split the new page and make it as child nodes.
4) Finds the split record, allocate another new page(p2)
to the index
5) InnoDB stores the record(ret) predecessor to the supremum
record of the page (p2).
6) In page_copy_rec_list_start(), move the records from p1 to p2
upto the split record
6) Given table is a compressed row format page, InnoDB attempts to
compress the page p2 and failed (due to innodb_compression_level = 0)
7) Since the compression fails, InnoDB gets the number of preceding
records(ret_pos) of a record (ret) on the page (p2)
8) Page (p2) is a new page, ret points to infimum record.
ret_pos can be 0. InnoDB have wrong condition that ret_pos shouldn't
be 0 and returns corruption. InnoDB has similar wrong check in
page_copy_rec_list_end()
2024-07-30 14:36:34 +05:30
Sergei Golubchik
f9807aadef Merge branch '10.11' into 11.0 2024-05-12 12:18:28 +02:00
Oleksandr Byelkin
c9b1ebee2f Merge branch '10.6' into 10.11 2024-04-26 08:02:49 +02:00
Marko Mäkelä
07faba08b9 MDEV-27924 fixup: cmake -DWITH_INNODB_EXTRA_DEBUG=ON 2024-04-23 12:57:39 +03:00
Marko Mäkelä
313c5a1dfb MDEV-31443 [FATAL] InnoDB: Unable to find charset-collation in ibuf_upgrade()
dtype_new_read_for_order_and_null_size(): Correctly assign type->prtype.
This caused the fatal error and crash.

ibuf_merge(): Relax a too strict condition that would result in
[ERROR] InnoDB: Unable to upgrade the change buffer
when there exist buffered changes to redundant secondary indexes, such as
PRIMARY KEY(x), INDEX(x).

ibuf_upgrade(): Modify at most one user tablespace per mini-transaction,
to be crash-safe.

page_cur_insert_rec_zip(), page_cur_delete_rec(): Relax debug assertions
for ibuf_upgrade().

ibuf_log_rebuild_if_needed(): Invoke recv_sys.debug_free() only after
srv_log_rebuild_if_needed() to avoid an assertion failure. This code
is executed when the innodb_log_file_size is changed when upgrading
from 10.x to 11.0.

Tested by: Matthias Leich, Christian Hesse
2023-07-05 12:37:05 +03:00
Sergei Petrunia
c7fe8e51de Merge 10.11 into 11.0 2023-04-17 16:50:01 +03:00
Marko Mäkelä
1d1e0ab2cc Merge 10.6 into 10.8 2023-04-12 15:50:08 +03:00
Marko Mäkelä
5bada1246d Merge 10.5 into 10.6 2023-04-11 16:15:19 +03:00
Oleksandr Byelkin
ac5a534a4c Merge remote-tracking branch '10.4' into 10.5 2023-03-31 21:32:41 +02:00
Marko Mäkelä
5e01255732 Merge 10.11 into 11.0 2023-03-29 17:20:42 +03:00
Marko Mäkelä
dd2fe81122 Merge 10.6 into 10.8 2023-03-29 15:16:42 +03:00
Marko Mäkelä
1efdf67e60 Merge 10.5 into 10.6 2023-03-22 15:54:45 +02:00
Marko Mäkelä
701399ad37 MDEV-30882 Crash on ROLLBACK in a ROW_FORMAT=COMPRESSED table
btr_cur_upd_rec_in_place(): Avoid calling page_zip_write_rec() if we
are not modifying any fields that are stored in compressed format.

btr_cur_update_in_place_zip_check(): New function to check if a
ROW_FORMAT=COMPRESSED record can actually be updated in place.

btr_cur_pessimistic_update(): If the BTR_KEEP_POS_FLAG is not set
(we are in a ROLLBACK and cannot write any BLOBs), ignore the potential
overflow and let page_zip_reorganize() or page_zip_compress() handle it.
This avoids a failure when an attempted UPDATE of an NULL column to 0 is
rolled back. During the ROLLBACK, we would try to move a non-updated
long column to off-page storage in order to avoid a compression failure
of the ROW_FORMAT=COMPRESSED page.

page_zip_write_trx_id_and_roll_ptr(): Remove an assertion that would fail
in row_upd_rec_in_place() because the uncompressed page would already
have been modified there.

This is a 10.5 version of commit ff3d4395d8
(different because of commit 08ba388713).
2023-03-22 15:19:52 +02:00
Marko Mäkelä
ff3d4395d8 MDEV-30882 Crash on ROLLBACK in a ROW_FORMAT=COMPRESSED table
row_upd_rec_in_place(): Avoid calling page_zip_write_rec() if we
are not modifying any fields that are stored in compressed format.

btr_cur_update_in_place_zip_check(): New function to check if a
ROW_FORMAT=COMPRESSED record can actually be updated in place.

btr_cur_pessimistic_update(): If the BTR_KEEP_POS_FLAG is not set
(we are in a ROLLBACK and cannot write any BLOBs), ignore the potential
overflow and let page_zip_reorganize() or page_zip_compress() handle it.
This avoids a failure when an attempted UPDATE of an NULL column to 0 is
rolled back. During the ROLLBACK, we would try to move a non-updated
long column to off-page storage in order to avoid a compression failure
of the ROW_FORMAT=COMPRESSED page.

page_zip_write_trx_id_and_roll_ptr(): Remove an assertion that would fail
in row_upd_rec_in_place() because the uncompressed page would already
have been modified there.

Thanks to Jean-François Gagné for providing a copy of a page that
triggered these bugs on the ROLLBACK of UPDATE and DELETE.

A 10.6 version of this was tested by Matthias Leich using
cmake -DWITH_INNODB_EXTRA_DEBUG=ON a.k.a. UNIV_ZIP_DEBUG.
2023-03-22 14:31:00 +02:00
Marko Mäkelä
26e4ba5eb5 Fix cmake -DWITH_INNODB_EXTRA_DEBUG (UNIV_ZIP_DEBUG) 2023-03-20 14:12:52 +02:00
Marko Mäkelä
2e431ff7e6 Merge 10.11 into 11.0 2023-02-16 13:34:45 +02:00
Oleksandr Byelkin
638625278e Merge branch '10.7' into 10.8 2023-01-31 09:57:52 +01:00
Oleksandr Byelkin
c3a5cf2b5b Merge branch '10.5' into 10.6 2023-01-31 09:31:42 +01:00
Marko Mäkelä
672cdcbb93 MDEV-30404: Inconsistent updates of PAGE_MAX_TRX_ID on ROW_FORMAT=COMPRESSED pages
page_copy_rec_list_start(): Do not update the PAGE_MAX_TRX_ID
on the compressed copy of the page. The modification is supposed
to be logged as part of page_zip_compress() or page_zip_reorganize().
If the page cannot be compressed (due to running out of space),
then page_zip_decompress() must be able to roll back the changes.

This fixes a regression that was introduced in
commit 56f6dab1d0 (MDEV-21174).
2023-01-26 14:47:52 +02:00
Marko Mäkelä
f27e9c8947 MDEV-29694 Remove the InnoDB change buffer
The purpose of the change buffer was to reduce random disk access,
which could be useful on rotational storage, but maybe less so on
solid-state storage.
When we wished to
(1) insert a record into a non-unique secondary index,
(2) delete-mark a secondary index record,
(3) delete a secondary index record as part of purge (but not ROLLBACK),
and the B-tree leaf page where the record belongs to is not in the buffer
pool, we inserted a record into the change buffer B-tree, indexed by
the page identifier. When the page was eventually read into the buffer
pool, we looked up the change buffer B-tree for any modifications to the
page, applied these upon the completion of the read operation. This
was called the insert buffer merge.

We remove the change buffer, because it has been the source of
various hard-to-reproduce corruption bugs, including those fixed in
commit 5b9ee8d819 and
commit 165564d3c3 but not limited to them.

A downgrade will fail with a clear message starting with
commit db14eb16f9 (MDEV-30106).

buf_page_t::state: Merge IBUF_EXIST to UNFIXED and
WRITE_FIX_IBUF to WRITE_FIX.

buf_pool_t::watch[]: Remove.

trx_t: Move isolation_level, check_foreigns, check_unique_secondary,
bulk_insert into the same bit-field. The only purpose of
trx_t::check_unique_secondary is to enable bulk insert into an
empty table. It no longer enables insert buffering for UNIQUE INDEX.

btr_cur_t::thr: Remove. This field was originally needed for change
buffering. Later, its use was extended to cover SPATIAL INDEX.
Much of the time, rtr_info::thr holds this field. When it does not,
we will add parameters to SPATIAL INDEX specific functions.

ibuf_upgrade_needed(): Check if the change buffer needs to be updated.

ibuf_upgrade(): Merge and upgrade the change buffer after all redo log
has been applied. Free any pages consumed by the change buffer, and
zero out the change buffer root page to mark the upgrade completed,
and to prevent a downgrade to an earlier version.

dict_load_tablespaces(): Renamed from
dict_check_tablespaces_and_store_max_id(). This needs to be invoked
before ibuf_upgrade().

btr_cur_open_at_rnd_pos(): Specialize for use in persistent statistics.
The change buffer merge does not need this function anymore.

btr_page_alloc(): Renamed from btr_page_alloc_low(). We no longer
allocate any change buffer pages.

btr_cur_open_at_rnd_pos(): Specialize for use in persistent statistics.
The change buffer merge does not need this function anymore.

row_search_index_entry(), btr_lift_page_up(): Add a parameter thr
for the SPATIAL INDEX case.

rtr_page_split_and_insert(): Specialized from btr_page_split_and_insert().

rtr_root_raise_and_insert(): Specialized from btr_root_raise_and_insert().

Note: The support for upgrading from the MySQL 3.23 or MySQL 4.0
change buffer format that predates the MySQL 4.1 introduction of
the option innodb_file_per_table was removed in MySQL 5.6.5
as part of mysql/mysql-server@69b6241a79
and MariaDB 10.0.11 as part of 1d0f70c2f8.

In the tests innodb.log_upgrade and innodb.log_corruption, we create
valid (upgraded) change buffer pages.

Tested by: Matthias Leich
2023-01-11 17:59:36 +02:00
Marko Mäkelä
d7a4ce3c80 Merge 10.7 into 10.8 2022-12-13 18:11:24 +02:00
Marko Mäkelä
a8a5c8a1b8 Merge 10.5 into 10.6 2022-12-13 16:58:58 +02:00
Marko Mäkelä
1dc2f35598 Merge 10.4 into 10.5 2022-12-13 14:39:18 +02:00
Marko Mäkelä
fdf43b5c78 Merge 10.3 into 10.4 2022-12-13 11:37:33 +02:00
Thirunarayanan Balathandayuthapani
71c93fb8fd MDEV-28462 Race condition between instant alter and AHI access
- InnoDB AHI tries to access the concurrent instant alter column,
leads to asan failure. Instant alter column should acquire the
clustered index search latch in exclusive mode before changing
the table cache definition.

- Removed the default parameter for the function
btr_search_drop_page_hash_index()

- Addressed the DWITH_INNODB_AHI=0 compilation failure
by passing two parameters from all callers of
btr_search_drop_page_hash_index()
2022-11-22 15:24:44 +05:30
Marko Mäkelä
f46efb4476 Merge 10.7 into 10.8 2022-11-17 21:35:12 +02:00
Marko Mäkelä
24fe53477c MDEV-29603 btr_cur_open_at_index_side() is missing some consistency checks
btr_cur_t: Zero-initialize all fields in the default constructor.

btr_cur_t::index: Remove; it duplicated page_cur.index.

Many functions: Remove arguments that were duplicating
page_cur_t::index and page_cur_t::block.

page_cur_open_level(), btr_pcur_open_level(): Replaces
btr_cur_open_at_index_side() for dict_stats_analyze_index().
At the end, release all latches except the dict_index_t::lock
and the buf_page_t::lock on the requested page.

dict_stats_analyze_index(): Rely on mtr_t::rollback_to_savepoint()
to release all uninteresting page latches.

btr_search_guess_on_hash(): Simplify the logic, and invoke
mtr_t::rollback_to_savepoint().

We will use plain C++ std::vector<mtr_memo_slot_t> for mtr_t::m_memo.
In this way, we can avoid setting mtr_memo_slot_t::object to nullptr
and instead just remove garbage from m_memo.

mtr_t::rollback_to_savepoint(): Shrink the vector. We will be needing this
in dict_stats_analyze_index(), where we will release page latches and
only retain the index->lock in mtr_t::m_memo.

mtr_t::release_last_page(): Release the last acquired page latch.
Replaces btr_leaf_page_release().

mtr_t::release(const buf_block_t&): Release a single page latch.
Used in btr_pcur_move_backward_from_page().

mtr_t::memo_release(): Replaced with mtr_t::release().

mtr_t::upgrade_buffer_fix(): Acquire a latch for a buffer-fixed page.
This replaces the double bookkeeping in btr_cur_t::open_leaf().

Reviewed by: Vladislav Lesin
2022-11-17 08:19:01 +02:00
Marko Mäkelä
71fc31ba36 Merge 10.7 into 10.8 2022-09-07 09:25:46 +03:00
Marko Mäkelä
4318391346 Merge 10.5 into 10.6 2022-09-06 12:28:39 +03:00
Marko Mäkelä
c0470caf5a MDEV-29471 Buffer overflow in page_cur_insert_rec_low()
In commit 244fdc435d (MDEV-29438)
we made sure that if the preceding record is the page infimum record,
no more than 8 bytes will be read from it. But, if the data payload of
the being-inserted record is less than 8 bytes (this can happen in
secondary indexes), we must not compare all 8 bytes.

This was caught by a failure of the test gcol.innodb_virtual_basic
under MemorySanitizer and some builds with AddressSanitizer.
2022-09-06 11:33:52 +03:00
Jan Lindström
dee24f3155 Merge 10.7 into 10.8 2022-09-05 15:59:56 +03:00
Marko Mäkelä
3c7887a85f Merge 10.5 into 10.6 2022-09-05 10:09:03 +03:00
Marko Mäkelä
244fdc435d MDEV-29438 Recovery or backup of instant ALTER TABLE is incorrect
This bug was found in MariaDB Server 10.6 thanks to the
OPT_PAGE_CHECKSUM record that was implemented
in commit 4179f93d28 for catching
this type of recovery failures.

page_cur_insert_rec_low(): If the previous record is the page infimum,
correctly limit the end of the record. We do not want to copy data from
the header of the page supremum. This omission caused the incorrect
recovery of DB_TRX_ID in an instant ALTER TABLE metadata record, because
part of the DB_TRX_ID was incorrectly copied from the n_owned of the
page supremum, which in recovery would be updated after the copying,
but in normal operation would already have been updated at the time the
common prefix was being determined.

log_phys_t::apply(): If a data page is found to be corrupted, do not
flag the log corrupted but instead return a new status APPLIED_CORRUPTED
so that the caller may discard all log for this page. We do not want
the recovery of unrelated pages to fail in recv_recover_page().

No test case is included, because the known test case would only work
in 10.6, and even after this fix, it would trigger another bug in
instant ALTER TABLE crash recovery.
2022-09-05 09:54:47 +03:00
Marko Mäkelä
9cbf8ccf29 Merge 10.7 into 10.8 2022-08-02 08:52:57 +03:00
Marko Mäkelä
63478e72de MDEV-21098: Assertion failure in rec_get_offsets_func()
The function rec_get_offsets_func() used to hit ut_error
due to an invalid rec_get_status() value of a
ROW_FORMAT!=REDUNDANT record. This fix is twofold:
We will not only avoid a crash on corruption in this case,
but we will also make more effort to validate each record
every time we are iterating over index page records.

rec_get_offsets_func(): Do not crash on a corrupted record.

page_rec_get_nth(): Return nullptr on error.

page_dir_slot_get_rec_validate(): Like page_dir_slot_get_rec(),
but validate the pointer and return nullptr on error.

page_cur_search_with_match(), page_cur_search_with_match_bytes(),
page_dir_split_slot(), page_cur_move_to_next():
Indicate failure in a return value.

page_cur_search(): Replaced with page_cur_search_with_match().

rec_get_next_ptr_const(), rec_get_next_ptr(): Replaced with
page_rec_get_next_low().

TODO: rtr_page_split_initialize_nodes(), rtr_update_mbr_field(),
and possibly other SPATIAL INDEX functions fail to properly handle
errors.

Reviewed by: Thirunarayanan Balathandayuthapani
Tested by: Matthias Leich
Performance tested by: Axel Schwenke
2022-08-01 11:25:50 +03:00
Marko Mäkelä
0af9346079 Merge 10.7 into 10.8 2022-06-09 14:37:53 +03:00
Marko Mäkelä
77b3959b5c MDEV-28457 Crash in page_dir_find_owner_slot()
A prominent remaining source of crashes on corrupted index pages
is page directory corruption.

A frequent caller of page_dir_find_owner_slot() is page_rec_get_prev().
Some of those calls can be replaced with simpler logic that is less
prone to fail.

page_dir_find_owner_slot(),
page_rec_get_prev(), page_rec_get_prev_const(),
btr_pcur_move_to_prev(), btr_pcur_move_to_prev_on_page(),
btr_cur_upd_rec_sys(),
page_delete_rec_list_end(),
rtr_page_copy_rec_list_end_no_locks(),
rtr_page_copy_rec_list_start_no_locks(): Return an error code on failure.

fil_space_t::io(), buf_page_get_low(): Use DB_CORRUPTION for
out-of-bounds page reads.

PageBulk::getSplitRec(), PageBulk::copyOut(): Simplify the code.

btr_validate_level(): Prevent some more CHECK TABLE crashes on
corrupted pages.

btr_block_get(), btr_pcur_move_to_next_page(): Implement some checks that
were previously only part of IndexPurge::next().

IndexPurge::next(): Use btr_pcur_move_to_next_page().
2022-06-08 14:53:24 +03:00
Marko Mäkelä
57d4a242da Merge 10.7 into 10.8 2022-06-06 16:22:09 +03:00
Marko Mäkelä
4179f93d28 MDEV-18976 Implement OPT_PAGE_CHECKSUM log record for improved validation
We will introduce an optional log record OPT_PAGE_CHECKSUM for recording
page checksums, so that more inconsistencies on crash recovery may be
caught.

mtr_t::page_checksum(const buf_page_t&): Write OPT_PAGE_CHECKSUM
(currently not for ROW_FORMAT=COMPRESSED pages).

mtr_t::do_write(): Write OPT_PAGE_CHECKSUM records for all pages
(currently, in debug builds only).

mtr_t::is_logged(): Return whether log should be written.

mtr_t::set_log_mode_sub(const mtr_t&): Set the logging mode of
a sub-minitransaction when another mini-transaction is holding
latches on some modified pages. When creating or freeing BLOB pages,
we may only write OPT_PAGE_CHECKSUM records in the main mini-transaction,
after all changes have been written to the log.

MTR_LOG_SUB: Log mode for a sub-mini-transaction.

mtr_t::free(): Define non-inline, and invoke MarkFreed.

MarkFreed: For any matching page in the mini-transaction log,
change the first entry to say MTR_MEMO_PAGE_X_MODIFY and any subsequent
entries to MTR_MEMO_PAGE_X_FIX.

FindModified: Simplify a condition. MTR_MEMO_MODIFY can only be set
if MTR_MEMO_PAGE_X_FIX or MTR_MEMO_PAGE_SX_FIX are set.

FindBlockX: Consider also MTR_MEMO_PAGE_X_MODIFY.

recv_sys_t::parse(): Store OPT_PAGE_CHECKSUM records.

log_phys_t::apply(): Validate OPT_PAGE_CHECKSUM records.

log_phys_t::page_checksum(): Validate an OPT_PAGE_CHECKSUM record.

Tested by: Matthias Leich
2022-06-06 14:05:01 +03:00
Marko Mäkelä
0b47c126e3 MDEV-13542: Crashing on corrupted page is unhelpful
The approach to handling corruption that was chosen by Oracle in
commit 177d8b0c12
is not really useful. Not only did it actually fail to prevent InnoDB
from crashing, but it is making things worse by blocking attempts to
rescue data from or rebuild a partially readable table.

We will try to prevent crashes in a different way: by propagating
errors up the call stack. We will never mark the clustered index
persistently corrupted, so that data recovery may be attempted by
reading from the table, or by rebuilding the table.

This should also fix MDEV-13680 (crash on btr_page_alloc() failure);
it was extensively tested with innodb_file_per_table=0 and a
non-autoextend system tablespace.

We should now avoid crashes in many cases, such as when a page
cannot be read or allocated, or an inconsistency is detected when
attempting to update multiple pages. We will not crash on double-free,
such as on the recovery of DDL in system tablespace in case something
was corrupted.

Crashes on corrupted data are still possible. The fault injection mechanism
that is introduced in the subsequent commit may help catch more of them.

buf_page_import_corrupt_failure: Remove the fault injection, and instead
corrupt some pages using Perl code in the tests.

btr_cur_pessimistic_insert(): Always reserve extents (except for the
change buffer), in order to prevent a subsequent allocation failure.

btr_pcur_open_at_rnd_pos(): Merged to the only caller ibuf_merge_pages().

btr_assert_not_corrupted(), btr_corruption_report(): Remove.
Similar checks are already part of btr_block_get().

FSEG_MAGIC_N_BYTES: Replaces FSEG_MAGIC_N_VALUE.

dict_hdr_get(), trx_rsegf_get_new(), trx_undo_page_get(),
trx_undo_page_get_s_latched(): Replaced with error-checking calls.

trx_rseg_t::get(mtr_t*): Replaces trx_rsegf_get().

trx_rseg_header_create(): Let the caller update the TRX_SYS page if needed.

trx_sys_create_sys_pages(): Merged with trx_sysf_create().

dict_check_tablespaces_and_store_max_id(): Do not access
DICT_HDR_MAX_SPACE_ID, because it was already recovered in dict_boot().
Merge dict_check_sys_tables() with this function.

dir_pathname(): Replaces os_file_make_new_pathname().

row_undo_ins_remove_sec(): Do not modify the undo page by adding
a terminating NUL byte to the record.

btr_decryption_failed(): Report decryption failures

dict_set_corrupted_by_space(), dict_set_encrypted_by_space(),
dict_set_corrupted_index_cache_only(): Remove.

dict_set_corrupted(): Remove the constant parameter dict_locked=false.
Never flag the clustered index corrupted in SYS_INDEXES, because
that would deny further access to the table. It might be possible to
repair the table by executing ALTER TABLE or OPTIMIZE TABLE, in case
no B-tree leaf page is corrupted.

dict_table_skip_corrupt_index(), dict_table_next_uncorrupted_index(),
row_purge_skip_uncommitted_virtual_index(): Remove, and refactor
the callers to read dict_index_t::type only once.

dict_table_is_corrupted(): Remove.

dict_index_t::is_btree(): Determine if the index is a valid B-tree.

BUF_GET_NO_LATCH, BUF_EVICT_IF_IN_POOL: Remove.

UNIV_BTR_DEBUG: Remove. Any inconsistency will no longer trigger
assertion failures, but error codes being returned.

buf_corrupt_page_release(): Replaced with a direct call to
buf_pool.corrupted_evict().

fil_invalid_page_access_msg(): Never crash on an invalid read;
let the caller of buf_page_get_gen() decide.

btr_pcur_t::restore_position(): Propagate failure status to the caller
by returning CORRUPTED.

opt_search_plan_for_table(): Simplify the code.

row_purge_del_mark(), row_purge_upd_exist_or_extern_func(),
row_undo_ins_remove_sec_rec(), row_undo_mod_upd_del_sec(),
row_undo_mod_del_mark_sec(): Avoid mem_heap_create()/mem_heap_free()
when no secondary indexes exist.

row_undo_mod_upd_exist_sec(): Simplify the code.

row_upd_clust_step(), dict_load_table_one(): Return DB_TABLE_CORRUPT
if the clustered index (and therefore the table) is corrupted, similar
to what we do in row_insert_for_mysql().

fut_get_ptr(): Replace with buf_page_get_gen() calls.

buf_page_get_gen(): Return nullptr and *err=DB_CORRUPTION
if the page is marked as freed. For other modes than
BUF_GET_POSSIBLY_FREED or BUF_PEEK_IF_IN_POOL this will
trigger a debug assertion failure. For BUF_GET_POSSIBLY_FREED,
we will return nullptr for freed pages, so that the callers
can be simplified. The purge of transaction history will be
a new user of BUF_GET_POSSIBLY_FREED, to avoid crashes on
corrupted data.

buf_page_get_low(): Never crash on a corrupted page, but simply
return nullptr.

fseg_page_is_allocated(): Replaces fseg_page_is_free().

fts_drop_common_tables(): Return an error if the transaction
was rolled back.

fil_space_t::set_corrupted(): Report a tablespace as corrupted if
it was not reported already.

fil_space_t::io(): Invoke fil_space_t::set_corrupted() to report
out-of-bounds page access or other errors.

Clean up mtr_t::page_lock()

buf_page_get_low(): Validate the page identifier (to check for
recently read corrupted pages) after acquiring the page latch.

buf_page_t::read_complete(): Flag uninitialized (all-zero) pages
with DB_FAIL. Return DB_PAGE_CORRUPTED on page number mismatch.

mtr_t::defer_drop_ahi(): Renamed from mtr_defer_drop_ahi().

recv_sys_t::free_corrupted_page(): Only set_corrupt_fs()
if any log records exist for the page. We do not mind if read-ahead
produces corrupted (or all-zero) pages that were not actually needed
during recovery.

recv_recover_page(): Return whether the operation succeeded.

recv_sys_t::recover_low(): Simplify the logic. Check for recovery error.

Thanks to Matthias Leich for testing this extensively and to the
authors of https://rr-project.org for making it easy to diagnose
and fix any failures that were found during the testing.
2022-06-06 14:03:22 +03:00
Marko Mäkelä
600751e769 Merge 10.7 into 10.8 2022-06-02 08:01:17 +03:00
Marko Mäkelä
05d049bdbe Merge 10.5 into 10.6 2022-05-25 14:39:42 +03:00
Marko Mäkelä
a0e4853eff MDEV-28668 Recovery or backup of INSERT may be incorrect
page_cur_insert_rec_low(): When checking for common bytes with
the preceding record, exclude the header bytes of next_rec
that could have been updated by this function.

The scenario where this caused corruption was an insert of
a node pointer record. The child page number was written as
0x203 but recovered as 0x103 because the n_owned field of next_rec
was changed from 1 to 2 before the comparison was invoked.
2022-05-25 13:15:56 +03:00
Marko Mäkelä
133c2129cd Merge 10.7 into 10.8 2022-04-27 10:43:00 +03:00
Marko Mäkelä
2ca1123464 MDEV-26217 Failing assertion: list.count > 0 in ut_list_remove or Assertion `lock->trx == this' failed in dberr_t trx_t::drop_table
This follows up the previous fix in
commit c3c53926c4 (MDEV-26554).

ha_innobase::delete_table(): Work around the insufficient
metadata locking (MDL) during DML operations by acquiring exclusive
InnoDB table locks on all child tables. Previously, this was only
done on TRUNCATE and ALTER.

ibuf_delete_rec(), btr_cur_optimistic_delete(): Do not invoke
lock_update_delete() during change buffer operations.
The revised trx_t::commit(std::vector<pfs_os_file_t>&) will
hold exclusive lock_sys.latch while invoking fil_delete_tablespace(),
which in turn may invoke ibuf_delete_rec().

dict_index_t::has_locking(): A new predicate, replacing the dummy
!dict_table_is_locking_disabled(index->table). Used for skipping lock
operations during ibuf_delete_rec().

trx_t::commit(std::vector<pfs_os_file_t>&): Release the locks
and remove the table from the cache while holding exclusive
lock_sys.latch.

trx_t::commit_in_memory(): Skip release_locks() if dict_operation holds.

trx_t::commit(): Reset dict_operation before invoking commit_in_memory()
via commit_persist().

lock_release_on_drop(): Release locks while lock_sys.latch is
exclusively locked.

lock_table(): Add a parameter for a pointer to the table.
We must not dereference the table before a lock_sys.latch has
been acquired. If the pointer to the table does not match the table
at that point, the table is invalid and DB_DEADLOCK will be returned.

row_ins_foreign_check_on_constraint(): Improve the checks.
Remove a bogus DB_LOCK_WAIT_TIMEOUT return that was needed
before commit c5fd9aa562 (MDEV-25919).

row_upd_check_references_constraints(),
wsrep_row_upd_check_foreign_constraints(): Simplify checks.
2022-04-26 18:09:03 +03:00