The function ibuf_remove_free_page() may be called while the caller
is holding several mutexes or rw-locks. Because of this, this
housekeeping loop may cause performance glitches for operations that
involve tables that are stored in the InnoDB system tablespace.
Also deadlocks might be possible.
The worst impact of all is that due to the mutexes being held, calls to
log_free_check() had to be skipped during this housekeeping.
This means that the cyclic InnoDB redo log may be overwritten.
If the system crashes during this, it would be unable to recover.
The entry point to the problematic code is ibuf_free_excess_pages().
It would make sense to call it before acquiring any mutexes or rw-locks,
in any 'pessimistic' operation that involves the system tablespace.
fseg_create_general(), fseg_alloc_free_page_general(): Do not call
ibuf_free_excess_pages() while potentially holding some latches.
ibuf_remove_free_page(): Do call log_free_check(), like every operation
that is about to generate redo log should do.
ibuf_free_excess_pages(): Remove some assertions that are replaced
by stricter assertions in the log_free_check() that is now called by
ibuf_remove_free_page().
row_ins_sec_index_entry(), row_undo_ins_remove_sec_low(),
row_undo_mod_del_mark_or_remove_sec_low(),
row_undo_mod_del_unmark_sec_and_undo_update(): Call
ibuf_free_excess_pages() if the operation may involve allocating pages
and change buffering in the system tablespace.
Comment from Codership:-
To fix the problem, we changed the certification logic in galera to treat insert
on child table row as exclusive to prevent any operation on referenced
parent table row. At the same time, update and delete on
child table row were demoted to "shared", which makes it possible to
update/delete referenced parent table row, but only in a later transaction.
This change allows somewhat more concurrency for foreign key constrained
transactions, but is still safe for correct certification end result.
The parameter thr of the function btr_cur_optimistic_insert()
is not declared as nonnull, but GCC 7.1.0 with -O3 is wrongly
optimizing away the first part of the condition
UNIV_UNLIKELY(thr && thr_get_trx(thr)->fake_changes)
when the function is being called by row_merge_insert_index_tuples()
with thr==NULL.
The fake_changes is an XtraDB addition. This GCC bug only appears
to have an impact on XtraDB, not InnoDB.
We work around the problem by not attempting to dereference thr
when both BTR_NO_LOCKING_FLAG and BTR_NO_UNDO_LOG_FLAG are set
in the flags. Probably BTR_NO_LOCKING_FLAG alone should suffice.
btr_cur_optimistic_insert(), btr_cur_pessimistic_insert(),
btr_cur_pessimistic_update(): Correct comments that disagree with
usage and with nonnull attributes. No other parameter than thr can
actually be NULL.
row_ins_duplicate_error_in_clust(): Remove an unused parameter.
innobase_is_fake_change(): Unused function; remove.
ibuf_insert_low(), row_log_table_apply(), row_log_apply(),
row_undo_mod_clust_low():
Because we will be passing BTR_NO_LOCKING_FLAG | BTR_NO_UNDO_LOG_FLAG
in the flags, the trx->fake_changes flag will be treated as false,
which is the right thing to do at these low-level operations
(change buffer merge, ALTER TABLE…LOCK=NONE, or ROLLBACK).
This might be fixing actual XtraDB bugs.
Other callers that pass these two flags are also passing thr=NULL,
implying fake_changes=false. (Some callers in ROLLBACK are passing
BTR_NO_LOCKING_FLAG and a nonnull thr. In these callers, fake_changes
better be false, to avoid corruption.)
Problem was that bpage was referenced after it was already freed
from LRU. Fixed by adding a new variable encrypted that is
passed down to buf_page_check_corrupt() and used in
buf_page_get_gen() to stop processing page read.
This patch should also address following test failures and
bugs:
MDEV-12419: IMPORT should not look up tablespace in
PageConverter::validate(). This is now removed.
MDEV-10099: encryption.innodb_onlinealter_encryption fails
sporadically in buildbot
MDEV-11420: encryption.innodb_encryption-page-compression
failed in buildbot
MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8
Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing
and replaced these with dict_table_t::file_unreadable. Table
ibd file is missing if fil_get_space(space_id) returns NULL
and encrypted if not. Removed dict_table_t::is_corrupted field.
Ported FilSpace class from 10.2 and using that on buf_page_check_corrupt(),
buf_page_decrypt_after_read(), buf_page_encrypt_before_write(),
buf_dblwr_process(), buf_read_page(), dict_stats_save_defrag_stats().
Added test cases when enrypted page could be read while doing
redo log crash recovery. Also added test case for row compressed
blobs.
btr_cur_open_at_index_side_func(),
btr_cur_open_at_rnd_pos_func(): Avoid referencing block that is
NULL.
buf_page_get_zip(): Issue error if page read fails.
buf_page_get_gen(): Use dberr_t for error detection and
do not reference bpage after we hare freed it.
buf_mark_space_corrupt(): remove bpage from LRU also when
it is encrypted.
buf_page_check_corrupt(): @return DB_SUCCESS if page has
been read and is not corrupted,
DB_PAGE_CORRUPTED if page based on checksum check is corrupted,
DB_DECRYPTION_FAILED if page post encryption checksum matches but
after decryption normal page checksum does not match. In read
case only DB_SUCCESS is possible.
buf_page_io_complete(): use dberr_t for error handling.
buf_flush_write_block_low(),
buf_read_ahead_random(),
buf_read_page_async(),
buf_read_ahead_linear(),
buf_read_ibuf_merge_pages(),
buf_read_recv_pages(),
fil_aio_wait():
Issue error if page read fails.
btr_pcur_move_to_next_page(): Do not reference page if it is
NULL.
Introduced dict_table_t::is_readable() and dict_index_t::is_readable()
that will return true if tablespace exists and pages read from
tablespace are not corrupted or page decryption failed.
Removed buf_page_t::key_version. After page decryption the
key version is not removed from page frame. For unencrypted
pages, old key_version is removed at buf_page_encrypt_before_write()
dict_stats_update_transient_for_index(),
dict_stats_update_transient()
Do not continue if table decryption failed or table
is corrupted.
dict0stats.cc: Introduced a dict_stats_report_error function
to avoid code duplication.
fil_parse_write_crypt_data():
Check that key read from redo log entry is found from
encryption plugin and if it is not, refuse to start.
PageConverter::validate(): Removed access to fil_space_t as
tablespace is not available during import.
Fixed error code on innodb.innodb test.
Merged test cased innodb-bad-key-change5 and innodb-bad-key-shutdown
to innodb-bad-key-change2. Removed innodb-bad-key-change5 test.
Decreased unnecessary complexity on some long lasting tests.
Removed fil_inc_pending_ops(), fil_decr_pending_ops(),
fil_get_first_space(), fil_get_next_space(),
fil_get_first_space_safe(), fil_get_next_space_safe()
functions.
fil_space_verify_crypt_checksum(): Fixed bug found using ASAN
where FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly
accessed from row compressed tables. Fixed out of page frame
bug for row compressed tables in
fil_space_verify_crypt_checksum() found using ASAN. Incorrect
function was called for compressed table.
Added new tests for discard, rename table and drop (we should allow them
even when page decryption fails). Alter table rename is not allowed.
Added test for restart with innodb-force-recovery=1 when page read on
redo-recovery cant be decrypted. Added test for corrupted table where
both page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION is corrupted.
Adjusted the test case innodb_bug14147491 so that it does not anymore
expect crash. Instead table is just mostly not usable.
fil0fil.h: fil_space_acquire_low is not visible function
and fil_space_acquire and fil_space_acquire_silent are
inline functions. FilSpace class uses fil_space_acquire_low
directly.
recv_apply_hashed_log_recs() does not return anything.
Analysis: Problem was that in fil_read_first_page we do find that
table has encryption information and that encryption service
or used key_id is not available. But, then we just printed
fatal error message that causes above assertion.
Fix: When we open single table tablespace if it has encryption
information (crypt_data) store this crypt data to the table
structure. When we open a table and we find out that tablespace
is not available, check has table a encryption information
and from there is encryption service or used key_id is not available.
If it is, add additional warning for SQL-layer.
Merge Facebook commit 25295d003cb0c17aa8fb756523923c77250b3294
authored by Steaphan Greene from https://github.com/facebook/mysql-5.6
This adds a pointer to the trx to each mtr.
This allows the trx to be accessed in parts of the code
where it was otherwise not available. This is needed later.
Merged lp:maria/maria-10.0-galera up to revision 3879.
Added a new functions to handler API to forcefully abort_transaction,
producing fake_trx_id, get_checkpoint and set_checkpoint for XA. These
were added for future possiblity to add more storage engines that
could use galera replication.