InnoDB shutdown assumes that once the server has entered
SRV_SHUTDOWN_FLUSH_PHASE, no change to persistent data is allowed.
It was possible for the master thread to wake up while shutdown
is executing in SRV_SHUTDOWN_FLUSH_PHASE or
even in SRV_SHUTDOWN_LAST_PHASE.
We do not yet know if further crashes at shutdown are possible.
Also, we do not know if all the observed crashes could be explained
by the race conditions that we are now fixing.
srv_shutdown_print_master_pending(): Remove a redundant ut_time() call.
srv_shutdown(): Renamed from srv_master_do_shutdown_tasks().
srv_master_thread(): Do not resume after shutdown has been initiated.
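The intended behaviour of the master thread can be sketched as follows. This is a minimal illustration, not the InnoDB code: the enum, the atomic variable and both functions are hypothetical stand-ins for the shutdown states named above.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-ins for the shutdown states named above.
enum shutdown_state_t {
    SHUTDOWN_NONE = 0,
    SHUTDOWN_FLUSH_PHASE,
    SHUTDOWN_LAST_PHASE
};

static std::atomic<shutdown_state_t> shutdown_state{SHUTDOWN_NONE};

// The master thread may only do background work while no shutdown
// has been initiated.
bool master_may_resume()
{
    return shutdown_state.load(std::memory_order_acquire) == SHUTDOWN_NONE;
}

// One wake-up of the sketched master loop. Returns the number of
// background tasks performed, which is 0 once shutdown has started.
int master_thread_wakeup()
{
    if (!master_may_resume()) {
        return 0;
    }
    return 1; // placeholder for the periodic background work
}
```

Checking the state on every wake-up, rather than once before sleeping, is what prevents a late wake-up from modifying persistent data during the flush phase.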
The Snappy compression method requires that the output buffer
used for compression is bigger than the input buffer.
Similarly, LZO requires an additional work memory buffer.
Increase the allocated buffer accordingly.
buf_tmp_buffer_t: removed unnecessary lzo_mem, crypt_buf_free and
comp_buf_free.
buf_pool_reserve_tmp_slot: use an aligned allocation; if snappy is
available, allocate the size based on snappy_max_compressed_length, and
if lzo is available, increase the buffer by LZO1X_1_15_MEM_COMPRESS.
fil_compress_page: Remove the unneeded lzo mem (we use the same buffer);
if the output buffer is not yet allocated, allocate it in the same way
as above.
Decompression does not require additional work area.
Modify test to use same test as other compression method tests.
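The buffer sizing described above could look roughly like this. It is a sketch under stated assumptions: the helper names are hypothetical, the worst-case bound is assumed to match what snappy_max_compressed_length() returns (real code should call the library), and lzo_work_mem stands for LZO1X_1_15_MEM_COMPRESS when LZO is compiled in.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: mirrors the worst-case bound of
// snappy_max_compressed_length(); call the library function in real code.
std::size_t assumed_snappy_max_compressed_length(std::size_t n)
{
    return 32 + n + n / 6;
}

// Hypothetical helper: size of the temporary compression buffer.
// lzo_work_mem is the extra LZO work area, or 0 when LZO is not used.
std::size_t tmp_compress_buf_size(std::size_t page_size,
                                  bool have_snappy,
                                  std::size_t lzo_work_mem)
{
    std::size_t size = page_size;
    if (have_snappy) {
        // The output buffer must be larger than the input buffer.
        size = assumed_snappy_max_compressed_length(page_size);
    }
    // The LZO work memory shares the same allocation.
    return size + lzo_work_mem;
}
```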
In my merge of the MySQL fix for Oracle Bug#23333990 / WL#9513
I overlooked some subsequent revisions to the test, and I also
failed to notice that the test is actually always failing.
Oracle introduced the parameter innodb_stats_include_delete_marked
but failed to consistently take it into account in FOREIGN KEY
constraints that involve CASCADE or SET NULL.
When innodb_stats_include_delete_marked=ON, obviously the purge of
delete-marked records should update the statistics as well.
One more omission was that statistics were never updated on ROLLBACK.
We are fixing that as well, properly taking into account the
parameter innodb_stats_include_delete_marked.
dict_stats_analyze_index_level(): Simplify an expression.
(Using the ternary operator with a constant operand is unnecessary
obfuscation.)
page_scan_method_t: Revert the change done by Oracle. Instead,
examine srv_stats_include_delete_marked directly where it is needed.
dict_stats_update_if_needed(): Renamed from
row_update_statistics_if_needed().
row_update_for_mysql_using_upd_graph(): Assert that the table statistics
are initialized, as guaranteed by ha_innobase::open(). Update the
statistics in a consistent way, both for FOREIGN KEY triggers and
for the main table. If FOREIGN KEY constraints exist, do not dereference
a freed pointer, but cache the proper value of node->is_delete so that
it matches prebuilt->table.
row_purge_record_func(): Update statistics if
innodb_stats_include_delete_marked=ON.
row_undo_ins(): Update statistics (on ROLLBACK of a fresh INSERT).
This is independent of the parameter; the record is not delete-marked.
row_undo_mod(): Update statistics on the ROLLBACK of updating key columns,
or (if innodb_stats_include_delete_marked=OFF) updating delete-marks.
innodb.innodb_stats_persistent: Renamed and extended from
innodb.innodb_stats_del_mark. Reduced the unnecessarily large dataset
from 262,144 to 32 rows. Test both values of the configuration
parameter innodb_stats_include_delete_marked.
Test that purge is updating the statistics.
innodb_fts.innodb_fts_multiple_index: Adjust the result. The test
is performing a ROLLBACK of an INSERT, which now affects the statistics.
include/wait_all_purged.inc: Moved from innodb.innodb_truncate_debug
to its own file.
lz4.cmake: Check whether the shared or static lz4 library has the
LZ4_compress_default function and, if it does, define
HAVE_LZ4_COMPRESS_DEFAULT.
fil_compress_page: If HAVE_LZ4_COMPRESS_DEFAULT is defined, use the
LZ4_compress_default function for compression; otherwise use the
LZ4_compress_limitedOutput function.
Introduced an innodb-page-compression.inc file for page compression
tests that also searches the .ibd file to verify that pages
are compressed (i.e. the search string is not found). Modified
page compression tests to use this file.
Note that the snappy method is not included because of MDEV-12615
(InnoDB page compression method snappy mostly does not compress pages),
which will be fixed in a different commit.
The issue was that my_errno was not set properly when a repair was killed,
which confused the rpl_killed_ddl script.
I also added an extra test line in varchar.inc to ensure we don't give
duplicate error rows.
The InnoDB temporary tablespace is only usable if innodb_read_only=OFF.
It is useless to create the tablespace in read-only mode, because
CREATE TEMPORARY TABLE is disallowed if innodb_read_only, and nothing
can be written to the temporary tablespace if no temporary tables
can be created.
Added a new file ha_xtradb.h where XtraDB parameters are defined. This
file is included in two places to avoid too intrusive change to
ha_innodb.cc that would make future merges harder.
innodb_show_locks_held and innodb_show_verbose_locks should be
implemented (but in a different commit).
row_undo_mod_parse_undo_rec(): Relax the too strict assertion and
correct the comment.
innodb.innodb-blob: Force a flush of the redo log right before
killing the server, to ensure that the code path gets exercised.
(The bogus debug assertion failed on the rollback of the statement
UPDATE t3 SET c=REPEAT('j',3000) WHERE a=2 which did not modify
any indexes before the server was killed.)
ha_innobase::check_if_supported_inplace_alter(): For now, reject
ALGORITHM=INPLACE when a non-constant DEFAULT expression is specified
for ADD COLUMN or for changing a NULL column to NOT NULL.
Later, we should evaluate the non-constant column values in these cases.
The actual error number returned from the query depends on the point
at which the corrupted page is accessed: either when we read
one of the pages for the result set, or during a
background page read.
table_already_fk_prelocked() was looking for a table in the wrong
list (not the complete list of prelocked tables, but only in its tail,
starting from the current table - which is always empty for the last
added table), so for circular FKs it kept adding same tables to the list
indefinitely.
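The effect of searching only the tail of the list can be reproduced with a small model. This is an illustrative sketch, not the server code: fk_graph and prelock() are hypothetical, and the limit parameter exists only to stop the otherwise endless growth.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical model: table name -> tables referenced through FKs.
typedef std::map<std::string, std::vector<std::string> > fk_graph;

// Builds the prelocking list for 'start'. With search_whole_list=false,
// only the entries after the current table are checked (the bug).
std::vector<std::string> prelock(const fk_graph& g,
                                 const std::string& start,
                                 bool search_whole_list,
                                 std::size_t limit)
{
    std::vector<std::string> list(1, start);
    for (std::size_t cur = 0;
         cur < list.size() && list.size() < limit; ++cur) {
        fk_graph::const_iterator it = g.find(list[cur]);
        if (it == g.end()) continue;
        for (std::size_t r = 0; r < it->second.size(); ++r) {
            const std::string& ref = it->second[r];
            std::size_t from = search_whole_list ? 0 : cur + 1;
            bool found = false;
            for (std::size_t i = from; i < list.size(); ++i) {
                if (list[i] == ref) { found = true; break; }
            }
            if (!found) list.push_back(ref);
        }
    }
    return list;
}
```

With two tables that reference each other, the whole-list search stops at two entries, while the tail-only search keeps appending until the limit is reached.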
In MariaDB Server before 10.2, InnoDB will not be shut down properly
if startup fails. So, Valgrind failures are to be expected.
Disable the test under Valgrind. In 10.2, it should pass with Valgrind.
This only merges MDEV-12253, adapting it to MDEV-12602 which is already
present in 10.2 but not yet in the 10.1 revision that is being merged.
TODO: Error handling in crash recovery needs to be improved.
If a page cannot be decrypted (or read), we should cleanly abort
the startup. If innodb_force_recovery is specified, we should
ignore the problematic page and apply redo log to other pages.
Currently, the test encryption.innodb-redo-badkey randomly fails
like this (the last messages are from cmake -DWITH_ASAN):
2017-05-05 10:19:40 140037071685504 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1635994
2017-05-05 10:19:40 140037071685504 [ERROR] InnoDB: Missing MLOG_FILE_NAME or MLOG_FILE_DELETE before MLOG_CHECKPOINT for tablespace 1
2017-05-05 10:19:40 140037071685504 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[2201] with error Data structure corruption
2017-05-05 10:19:41 140037071685504 [Note] InnoDB: Starting shutdown...
=================================================================
==5226==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x612000018588 in thread T0
#0 0x736750 in operator delete(void*) (/mariadb/server/build/sql/mysqld+0x736750)
#1 0x1e4833f in LatchCounter::~LatchCounter() /mariadb/server/storage/innobase/include/sync0types.h:599:4
#2 0x1e480b8 in LatchMeta<LatchCounter>::~LatchMeta() /mariadb/server/storage/innobase/include/sync0types.h:786:17
#3 0x1e35509 in sync_latch_meta_destroy() /mariadb/server/storage/innobase/sync/sync0debug.cc:1622:3
#4 0x1e35314 in sync_check_close() /mariadb/server/storage/innobase/sync/sync0debug.cc:1839:2
#5 0x1dfdc18 in innodb_shutdown() /mariadb/server/storage/innobase/srv/srv0start.cc:2888:2
#6 0x197e5e6 in innobase_init(void*) /mariadb/server/storage/innobase/handler/ha_innodb.cc:4475:3
Split the test case so that a server restart is not needed.
Reduce the test cases and use a simpler mechanism for triggering
and waiting for purge.
fil_table_accessible(): Check if a table can be accessed without
enjoying MDL protection.
PROBLEM
When truncating single tablespace tables, we need to scan the entire
buffer pool to remove the pages of the table from the buffer pool.
During this scan and removal dict_sys->mutex is being held, causing
stalls in other DDL operations.
FIX
Release the dict_sys->mutex during the scan and reacquire it after the
scan. Make sure that the purge thread doesn't purge the records of the
table being truncated and that the background stats collection thread
skips updating the stats for the table being truncated.
[#rb 14564 Approved by Jimmy and satya ]
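The locking pattern of the fix can be sketched like this. All names are hypothetical: dict_mutex stands in for dict_sys->mutex, the set marks tables being truncated, and the during_scan callback exists only so that the behaviour can be observed while the scan runs.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <set>

static std::mutex dict_mutex;           // stand-in for dict_sys->mutex
static std::set<int> being_truncated;   // protected by dict_mutex

// Purge and background stats collection would call this before
// touching a table.
bool background_may_touch(int table_id)
{
    std::lock_guard<std::mutex> guard(dict_mutex);
    return being_truncated.count(table_id) == 0;
}

// Returns the number of pages evicted from the (simulated) buffer pool.
int truncate_table(int table_id, int pages_in_pool,
                   const std::function<void()>& during_scan)
{
    std::unique_lock<std::mutex> lock(dict_mutex);
    being_truncated.insert(table_id);  // mark before releasing the mutex
    lock.unlock();                     // other DDL is not stalled now

    int evicted = 0;
    for (int i = 0; i < pages_in_pool; ++i) {
        ++evicted;                     // placeholder for the page removal
    }
    if (during_scan) during_scan();

    lock.lock();                       // reacquire after the scan
    being_truncated.erase(table_id);
    return evicted;
}
```

Marking the table before the mutex is released is what keeps purge and stats collection away from it while the scan runs unprotected.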
Problem :
---------
Information_Schema.referential_constraints (UNIQUE_CONSTRAINT_NAME)
shows NULL for a foreign key constraint after restarting the server.
If any DML or query (select/insert/update/delete) is done on the
referenced table, then the constraint name is correctly shown.
Solution :
----------
UNIQUE_CONSTRAINT_NAME column is the key name of the referenced table.
In innodb, FK reference is stored as a list of columns in referenced
table in INNODB_SYS_FOREIGN and INNODB_SYS_FOREIGN_COLS. The referenced
column must have at least one index/key with the referenced column as
prefix but the key name itself is not included in FK metadata. For this
reason, the UNIQUE_CONSTRAINT_NAME is only filled up when the
referenced table is actually loaded in innodb dictionary cache.
The information_schema view calls handler::get_foreign_key_list() on
foreign key table to read the FK metadata. The UNIQUE_CONSTRAINT_NAME
information shows NULL based on whether the referenced table is
already loaded or not.
One way to fix this issue is to load the referenced table while reading
the FK metadata information, if needed.
Reviewed-by: Sunny Bains <sunny.bains@oracle.com>
RB: 14654
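The idea of loading the referenced table on demand can be illustrated with a toy dictionary cache. Everything here is hypothetical (dict_cache, its two maps and the method name); it only models that the key name is known once the referenced table is in the cache.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical dictionary cache: referenced table -> name of the key
// that matches the FK columns.
struct dict_cache {
    std::map<std::string, std::string> loaded;   // tables in the cache
    std::map<std::string, std::string> on_disk;  // all known tables

    // Returns the key name, or nullptr (shown as NULL) if the
    // referenced table is not loaded and may not be loaded.
    const std::string* unique_constraint_name(const std::string& table,
                                              bool load_if_needed)
    {
        std::map<std::string, std::string>::iterator it =
            loaded.find(table);
        if (it == loaded.end() && load_if_needed) {
            std::map<std::string, std::string>::iterator d =
                on_disk.find(table);
            if (d != on_disk.end()) {
                it = loaded.insert(*d).first;  // load on demand
            }
        }
        return it == loaded.end() ? nullptr : &it->second;
    }
};
```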
Problem :
---------
This bug is filed from the base replication bug#25040331 where the
slave thread times out while INSERT operation waits on GAP lock taken
during Foreign Key validation.
The primary reason for the lock wait is because the statements are
getting replayed in different order. However, we also observed
two things ...
1. The slave thread could always use "Read Committed" isolation for
row level replication.
2. It is not necessary to have GAP locks in "Read Committed" isolation
level in InnoDB.
This bug is filed to address point (2) to avoid taking GAP locks during
Foreign Key validation.
Solution :
----------
Innodb is primarily designed for "Repeatable Read" and the GAP lock
behaviour is default. For "Read Committed" isolation, we have special
handling in row_search_mvcc to avoid taking the GAP lock while
scanning records.
While looking for Foreign Key, the code is following the default
behaviour taking GAP locks. The suggested fix is to avoid GAP
locking during FK validation similar to normal search operation
(row_search_mvcc) for "Read Committed" isolation level.
Reviewed-by: Sunny Bains <sunny.bains@oracle.com>
RB: 14526
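The suggested lock choice can be summarised in a few lines. This is a sketch with hypothetical enum and function names; it covers only the two isolation levels discussed above and does not reproduce the actual InnoDB lock constants.

```cpp
#include <cassert>

// Hypothetical names; only the two levels discussed above are modelled.
enum isolation_t { READ_COMMITTED, REPEATABLE_READ };

enum rec_lock_t {
    LOCK_RECORD_AND_GAP,  // default behaviour: record plus preceding gap
    LOCK_RECORD_ONLY      // no GAP lock
};

// Lock mode used while validating a foreign key, following the same
// rule as a normal search under the given isolation level.
rec_lock_t fk_check_lock_mode(isolation_t iso)
{
    return iso == READ_COMMITTED ? LOCK_RECORD_ONLY : LOCK_RECORD_AND_GAP;
}
```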
Problem was that bpage was referenced after it was already freed
from LRU. Fixed by adding a new variable encrypted that is
passed down to buf_page_check_corrupt() and used in
buf_page_get_gen() to stop processing page read.
This patch should also address following test failures and
bugs:
MDEV-12419: IMPORT should not look up tablespace in
PageConverter::validate(). This is now removed.
MDEV-10099: encryption.innodb_onlinealter_encryption fails
sporadically in buildbot
MDEV-11420: encryption.innodb_encryption-page-compression
failed in buildbot
MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8
Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing
and replaced these with dict_table_t::file_unreadable. Table
ibd file is missing if fil_get_space(space_id) returns NULL
and encrypted if not. Removed dict_table_t::is_corrupted field.
Ported FilSpace class from 10.2 and using that on buf_page_check_corrupt(),
buf_page_decrypt_after_read(), buf_page_encrypt_before_write(),
buf_dblwr_process(), buf_read_page(), dict_stats_save_defrag_stats().
Added test cases where an encrypted page could be read while doing
redo log crash recovery. Also added a test case for row compressed
blobs.
btr_cur_open_at_index_side_func(),
btr_cur_open_at_rnd_pos_func(): Avoid referencing block that is
NULL.
buf_page_get_zip(): Issue error if page read fails.
buf_page_get_gen(): Use dberr_t for error detection and
do not reference bpage after we have freed it.
buf_mark_space_corrupt(): remove bpage from LRU also when
it is encrypted.
buf_page_check_corrupt(): @return DB_SUCCESS if page has
been read and is not corrupted,
DB_PAGE_CORRUPTED if page based on checksum check is corrupted,
DB_DECRYPTION_FAILED if page post encryption checksum matches but
after decryption normal page checksum does not match. In read
case only DB_SUCCESS is possible.
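The return values listed for buf_page_check_corrupt() can be read as the following decision. This is a simplified sketch: the enum reuses the error names from the text, while classify_page() and its boolean inputs are hypothetical and replace the real checksum computations.

```cpp
#include <cassert>

enum db_err { DB_SUCCESS, DB_PAGE_CORRUPTED, DB_DECRYPTION_FAILED };

// Hypothetical classification of a page that has just been read.
//   encrypted:                the page was stored encrypted
//   post_encryption_sum_ok:   checksum over the encrypted image matches
//   page_sum_ok:              normal page checksum matches (after
//                             decryption, if the page was encrypted)
db_err classify_page(bool encrypted,
                     bool post_encryption_sum_ok,
                     bool page_sum_ok)
{
    if (!encrypted) {
        return page_sum_ok ? DB_SUCCESS : DB_PAGE_CORRUPTED;
    }
    if (!post_encryption_sum_ok) {
        return DB_PAGE_CORRUPTED;     // damaged on disk
    }
    if (!page_sum_ok) {
        return DB_DECRYPTION_FAILED;  // intact image, wrong key
    }
    return DB_SUCCESS;
}
```

Distinguishing the two failures lets the caller report a wrong or missing key separately from on-disk corruption.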
buf_page_io_complete(): use dberr_t for error handling.
buf_flush_write_block_low(),
buf_read_ahead_random(),
buf_read_page_async(),
buf_read_ahead_linear(),
buf_read_ibuf_merge_pages(),
buf_read_recv_pages(),
fil_aio_wait():
Issue error if page read fails.
btr_pcur_move_to_next_page(): Do not reference page if it is
NULL.
Introduced dict_table_t::is_readable() and dict_index_t::is_readable()
that will return true if the tablespace exists and the pages read from
the tablespace are not corrupted and page decryption has not failed.
Removed buf_page_t::key_version. After page decryption the
key version is not removed from the page frame. For unencrypted
pages, the old key_version is removed at buf_page_encrypt_before_write().
dict_stats_update_transient_for_index(),
dict_stats_update_transient()
Do not continue if table decryption failed or table
is corrupted.
dict0stats.cc: Introduced a dict_stats_report_error function
to avoid code duplication.
fil_parse_write_crypt_data():
Check that key read from redo log entry is found from
encryption plugin and if it is not, refuse to start.
PageConverter::validate(): Removed access to fil_space_t as
tablespace is not available during import.
Fixed error code on innodb.innodb test.
Merged test cases innodb-bad-key-change5 and innodb-bad-key-shutdown
to innodb-bad-key-change2. Removed innodb-bad-key-change5 test.
Decreased unnecessary complexity on some long lasting tests.
Removed fil_inc_pending_ops(), fil_decr_pending_ops(),
fil_get_first_space(), fil_get_next_space(),
fil_get_first_space_safe(), fil_get_next_space_safe()
functions.
fil_space_verify_crypt_checksum(): Fixed bug found using ASAN
where FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly
accessed from row compressed tables. Fixed out of page frame
bug for row compressed tables in
fil_space_verify_crypt_checksum() found using ASAN. Incorrect
function was called for compressed table.
Added new tests for discard, rename table and drop (we should allow them
even when page decryption fails). Alter table rename is not allowed.
Added test for restart with innodb-force-recovery=1 when a page read on
redo-recovery cannot be decrypted. Added test for a corrupted table where
both page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION are corrupted.
Adjusted the test case innodb_bug14147491 so that it no longer
expects a crash. Instead, the table is just mostly not usable.
fil0fil.h: fil_space_acquire_low is not visible function
and fil_space_acquire and fil_space_acquire_silent are
inline functions. FilSpace class uses fil_space_acquire_low
directly.
recv_apply_hashed_log_recs() does not return anything.
PROBLEM
By design, stats estimation always reads uncommitted data. In this scenario
an uncommitted transaction has deleted all rows in the table. In InnoDB,
records deleted by an uncommitted transaction are only delete-marked and
remain in the B-tree while the transaction is uncommitted or a read view
for the rows is present. While calculating persistent stats we were
ignoring the delete-marked records. Since all the records were
delete-marked, we estimated the number of rows in the table as zero,
which leads to bad plans in other transactions operating on the table.
Fix
Introduced a system variable called innodb_stats_include_delete_marked
which, when enabled, includes delete-marked records in the stats
calculations.
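The effect of the variable on the row estimate can be shown with a toy model. This is a sketch only: rec and estimate_rows() are hypothetical, and real persistent statistics are sampled rather than counted exactly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical index record with its delete-mark flag.
struct rec {
    int  key;
    bool delete_marked;
};

// Counts the records that contribute to the row estimate.
// include_delete_marked models innodb_stats_include_delete_marked.
std::size_t estimate_rows(const std::vector<rec>& recs,
                          bool include_delete_marked)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < recs.size(); ++i) {
        if (include_delete_marked || !recs[i].delete_marked) {
            ++n;
        }
    }
    return n;
}
```

With every record delete-marked by an uncommitted DELETE, the estimate is zero when the variable is OFF and equals the record count when it is ON.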
Problem:
=======
Autoincrement generates duplicate values for the following reasons.
(1) In the InnoDB handler function, the current autoincrement value is not
adjusted for a newly set auto_increment_increment or
auto_increment_offset variable.
(2) The handler function applies rounding logic that changes the current
autoincrement value, and InnoDB is not aware of the change in the current
autoincrement value.
Solution:
========
To fix problem (1), InnoDB always respects the auto_increment_increment
and auto_increment_offset values when computing the current autoincrement
value. With problem (2) fixed, the handler layer no longer changes the
current autoincrement value.
Reviewed-by: Jimmy Yang <jimmy.yang@oracle.com>
RB: 13748
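The rounding that both layers have to agree on can be written down as a small function. This is a sketch with a hypothetical name, not the InnoDB routine: it returns the smallest value of the form offset + k * increment that is not below the current value, and it ignores overflow and the server's special cases for these variables.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: smallest value >= current that lies on the
// series offset, offset + increment, offset + 2 * increment, ...
// Overflow is not handled in this sketch.
std::uint64_t next_autoinc(std::uint64_t current,
                           std::uint64_t increment,
                           std::uint64_t offset)
{
    if (increment == 0) {
        increment = 1;  // defensive: treat 0 as the default step
    }
    if (current <= offset) {
        return offset;
    }
    // Round (current - offset) up to a whole number of steps.
    std::uint64_t steps = (current - offset + increment - 1) / increment;
    return offset + steps * increment;
}
```

For example, with increment 5 and offset 3 the series is 3, 8, 13, so a current value of 7 is rounded up to 8, and 8 stays 8. If only one layer applies this rounding, the two layers disagree about the current value.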
Problem:
=======
The inplace alter algorithm decides whether the table must be rebuilt
(because of a row format or key block size change) only when the handler
flags contain nothing but the change table create option. If an ALTER
combines operations covered by the inplace ignore flags with a change of
table create options, it always leads to a table rebuild.
Solution:
========
During the check for rebuild, ignore the inplace ignore flag and check for
table create options.
Reviewed-by: Jimmy Yang <jimmy.yang@oracle.com>
Reviewed-by: Marko Makela <marko.makela@oracle.com>
RB: 13172
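The corrected rebuild check can be sketched with bit flags. All flag names and values below are hypothetical and do not match the real handler flags; the point is only that the "inplace ignore" bits are masked out before testing whether the create option is the sole remaining operation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag bits; the real handler flags differ.
const std::uint32_t CHANGE_CREATE_OPTION = 1u << 0;
const std::uint32_t REBUILDING_OP        = 1u << 1; // always needs rebuild
const std::uint32_t IGNORED_OP           = 1u << 2; // an "inplace ignore" op

const std::uint32_t INPLACE_IGNORE_MASK = IGNORED_OP;
const std::uint32_t REBUILD_MASK = CHANGE_CREATE_OPTION | REBUILDING_OP;

// row_format_or_kbs_changed: the create options really change the
// row format or the key block size.
bool need_rebuild(std::uint32_t flags, bool row_format_or_kbs_changed)
{
    // Ignore the "inplace ignore" bits before the comparison.
    if ((flags & ~INPLACE_IGNORE_MASK) == CHANGE_CREATE_OPTION) {
        return row_format_or_kbs_changed;
    }
    return (flags & REBUILD_MASK) != 0;
}
```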