InnoDB was writing unnecessary information to the
update undo log records. Most notably, if an indexed column is updated,
the old value of the column would be logged twice: first as part of
the update vector, and then another time because it is an indexed column.
Because the InnoDB undo log record must fit in a single page,
this would cause unnecessary failure of certain updates.
Even after this fix, InnoDB still seems to be unnecessarily logging
indexed column values for non-updated columns. It seems that non-updated
secondary index columns only need to be logged when a PRIMARY KEY
column is updated. To reduce risk, we are not fixing this remaining flaw
in GA versions.
trx_undo_page_report_modify(): Log updated indexed columns only once.
remove remnants of 10.0 bugfix, incorrectly merged into 10.2
Using col_names[i] was obviously, wrong, must've been col_names[ifield->col_no].
incorrect column name resulted in innodb having index unique_id2(id1),
while the server thought it's unique_id2(id4).
But col_names[ifield->col_no] is wrong too, because `table` has non-renamed
columns, so the correct column name is always dict_table_get_col_name(table, ifield->col_no)
When MySQL 5.6.10 introduced innodb_read_only mode, it skipped the
creation of the InnoDB buffer pool dump/restore subsystem in that mode.
Attempts to set the variable innodb_buf_pool_dump_now would have
no effect in innodb_read_only mode, but the corresponding condition
was forgotten in from the other two update functions.
MySQL 5.7.20 would fix the innodb_buffer_pool_load_now,
but not innodb_buffer_pool_load_abort. Let us fix both in MariaDB.
There are two bugs related to failed ADD INDEX and
the InnoDB table cache eviction.
dict_table_close(): Try dropping failed ADD INDEX when releasing
the last table handle, not when releasing the last-but-one.
dict_table_remove_from_cache_low(): Do not invoke
row_merge_drop_indexes() after freeing all index metadata.
Instead, directly invoke row_merge_drop_indexes_dict() to
remove the metadata from the persistent data dictionary
and to free the index pages.
Problem was that we could take page latches on different
order than wat is entitled with SX-lock. To follow the
latching order defined in WL#6326, acquire index->lock X-latch.
This entitles us to acquire page latches in any order for the index.
btr0btr.cc
Document latch rules before and after MariaDB 10.2.2
sync0rw.cc
Document latch compatibility rules better.
btr_defragment_merge_pages
Fix parameter value.
btr_defragment_thread
Acquire X-lock to dict_index_t::lock before restoring
cursor position and continuing defragmentation.
ha_innobase::optimize
Restore defragment feature.
Testing
Add GIS-index and FT-index to table being defragmented.
Defragmentation is not done to GIS-indexes and FT auxiliary
tables.
Reverted incorrect changes done on MDEV-7367 and MDEV-9469. Fixes properly
also related bugs:
MDEV-13668: InnoDB unnecessarily rebuilds table when renaming a column and adding index
MDEV-9469: 'Incorrect key file' on ALTER TABLE
MDEV-9548: Alter table (renaming and adding index) fails with "Incorrect key file for table"
MDEV-10535: ALTER TABLE causes standalone/wsrep cluster crash
MDEV-13640: ALTER TABLE CHANGE and ADD INDEX on auto_increment column fails with "Incorrect key file for table..."
Root cause for all these bugs is the fact that MariaDB .frm file
can contain virtual columns but InnoDB dictionary does not and
previous fixes were incorrect or unnecessarily forced table
rebuilt. In index creation key_part->fieldnr can be bigger than
number of columns in InnoDB data dictionary. We need to skip not
stored fields when calculating correct column number for InnoDB
data dictionary.
dict_table_get_col_name_for_mysql
Remove
innobase_match_index_columns
Revert incorrect change done on MDEV-7367
innobase_need_rebuild
Remove unnecessary rebuild force when column is renamed.
innobase_create_index_field_def
Calculate InnoDB column number correctly and remove
unnecessary column name set.
innobase_create_index_def, innobase_create_key_defs
Remove unneeded fields parameter. Revert unneeded memset.
prepare_inplace_alter_table_dict
Remove unneeded col_names parameter
index_field_t
Remove unneeded col_name member.
row_merge_create_index
Remove unneeded col_names parameter and resolution.
Effected tests:
innodb-alter-table : Add test case for MDEV-13668
innodb-alter : Remove MDEV-13668, MDEV-9469 FIXMEs
and restore original tests
innodb-wl5980-alter : Remove MDEV-13668, MDEV-9469 FIXMEs
and restore original tests
Mariabackup 10.2.7 would delete the redo log files after a successful
--prepare operation. If the user is manually copying the prepared files
instead of using the --copy-back option, it could happen that some old
redo log file would be preserved in the restored location. These old
redo log files could cause corruption of the restored data files when
the server is started up.
We prevent this scenario by creating a "poisoned" redo log file
ib_logfile0 at the end of the --prepare step. The poisoning consists
of simply truncating the file to an empty file. InnoDB will refuse
to start up on an empty redo log file.
copy_back(): Delete all redo log files in the target if the source
file ib_logfile0 is empty. (Previously we did this if the source
file is missing.)
SRV_OPERATION_RESTORE_EXPORT: A new variant of SRV_OPERATION_RESTORE
when the --export option is specified. In this mode, we will keep
deleting all redo log files, instead of truncating the first one.
delete_log_files(): Add a parameter for the first file to delete,
to be passed as 0 or 1.
innobase_start_or_create_for_mysql(): In mariabackup --prepare,
tolerate an empty ib_logfile0 file. Otherwise, require the first
redo log file to be longer than 4 blocks (2048 bytes). Unless
--export was specified, truncate the first log file at the
end of --prepare.
fil_space_extend_must_retry(): If the table is being truncated,
do not call fil_flush_low(). The operation is covered by the
truncate log. File extension during TRUNCATE only occurs
if there are many indexes on the table. With smaller innodb_page_size,
the file extension occurs already with fewer indexes on the table.
Ensure that no adaptive hash index exists for any system tables,
so that the blocked TRUNCATE TABLE t1 will not block the concurrent
TRUNCATE TABLE t2.
This bug is a regression caused by the code refactoring in
commit f5a833c3e0. It was not present
in any release of the MariaDB server. The bug affects table-rebuilding
ALTER TABLE when the source table is in ROW_FORMAT=REDUNDANT and
contains no virtual columns.
row_log_table_low_redundant(): Log virtual column data only if
virtual columns are present.
Another fix (work around MDEV-12699): Ensure that the 1234-byte
truncated page is all zero, so that after data file extension
pads the page with zeroes to full page size, the page will read
as a valid one (consisting of zero bytes only).
The purpose of the test is to ensure that redo log apply will
extend data files before applying page-level redo log records.
The test intermittently failed, because the doublewrite buffer
would sometimes contain data for the pages that the test
truncated. When the test truncates data files, it must also remove
the doublewrite buffer entries, because under normal operation, the
doublewrite buffer would only be written to if the data file already
has been extended.
MySQL 5.7 introduced some optimizations to avoid file I/O during
ALGORITHM=INPLACE operations. While both innodb-index-online and
innodb-table-online will exercise both the merge sort files and
the online log files in 10.1, in 10.2 they would only exercise the
online log files.
Modify one test case in innodb.innodb-table-online so that
skip_pk_sort will not hold. In this way, this test case will
write and read the merge sort files. The other instrumented tests
in innodb-index-online and innodb-table-online will only write
and read online_log files.
This should also fix the MariaDB 10.2.2 bug
MDEV-13826 CREATE FULLTEXT INDEX on encrypted table fails.
MDEV-12634 FIXME: Modify innodb-index-online, innodb-table-online
so that they will write and read merge sort files. InnoDB 5.7
introduced some optimizations to avoid using the files for small tables.
Many collation test results have been adjusted for MDEV-10191.
Introduce innodb_encrypt_log.combinations and prove that
the encryption and decryption take place during both
online ADD INDEX (WL#5266) and online table-rebuilding ALTER (WL#6625).
Import the changes to innodb.innodb-index innodb.innodb-index-debug
Note: As noted in MDEV-13613, due to the behaviour change in MDEV-11114,
DROP COLUMN will not imply DROP/ADD PRIMARY/UNIQUE KEY, like it does
in MySQL. The tests have been adjusted accordingly.
The InnoDB purge subsystem can be best stopped by opening a read view,
for example by START TRANSACTION WITH CONSISTENT SNAPSHOT.
To ensure that everything is purged, use wait_all_purged.inc,
which waits for the History list length in SHOW ENGINE INNODB STATUS
to reach 0. Setting innodb_purge_run_now never guaranteed this.
The bug only affects ROW_FORMAT=DYNAMIC tables.
It is reproducible already with 2 records in the table.
Keep testing with ROW_FORMAT=REDUNDANT just in case.
For running the Galera tests, the variable my_disable_leak_check
was set to true in order to avoid assertions due to memory leaks
at shutdown.
Some adjustments due to MDEV-13625 (merge InnoDB tests from MySQL 5.6)
were performed. The most notable behaviour changes from 10.0 and 10.1
are the following:
* innodb.innodb-table-online: adjustments for the DROP COLUMN
behaviour change (MDEV-11114, MDEV-13613)
* innodb.innodb-index-online-fk: the removal of a (1,NULL) record
from the result; originally removed in MySQL 5.7 in the
Oracle Bug #16244691 fix
377774689b
* innodb.create-index-debug: disabled due to MDEV-13680
(the MySQL Bug #77497 fix was not merged from 5.6 to 5.7.10)
* innodb.innodb-alter-autoinc: MariaDB 10.2 behaves like MySQL 5.6/5.7,
while MariaDB 10.0 and 10.1 assign different values when
auto_increment_increment or auto_increment_offset are used.
Also MySQL 5.6/5.7 exhibit different behaviour between
LGORITHM=INPLACE and ALGORITHM=COPY, so something needs to be tested
and fixed in both MariaDB 10.0 and 10.2.
* innodb.innodb-wl5980-alter: disabled because it would trigger an
InnoDB assertion failure (MDEV-13668 may need additional effort in 10.2)
Import the MySQL 5.6 addition from innodb.create-index to a new debug-only
test, innodb.create-index-debug. The existing test innodb.create-index
also runs on a debug server.
FIXME: MDEV-13668 InnoDB unnecessarily rebuilds table
FIXME: MDEV-13671 InnoDB should use case-insensitive column name comparisons
like the rest of the server
FIXME: MDEV-13640 / Properly fix MDEV-9469 'Incorrect key file' on ALTER TABLE
FIXME: investigate result difference in innodb.innodb-alter-autoinc
and ensure that MariaDB does the right thing with auto_increment_increment
and auto_increment_offset, for both ALGORITHM=INPLACE and ALGORITHM=COPY
(Oracle MySQL behaviour differs between those two).
This bug was a regression caused by MDEV-12698.
On non-leaf pages, the delete-mark flag in the node pointer records is
basically garbage. (Delete-marking only makes sense at the leaf level
anyway. The purpose of the delete-mark is to tell MVCC, locking and purge
that a leaf-level record does not exist in the READ UNCOMMITTED view,
but it used to exist.)
Node pointer records and non-leaf pages are glue that attaches multiple
leaf pages to an index. This glue is supposed to be transparent to the
transactional layer.
When a page is split, InnoDB creates a node pointer record out of the
child page record that the cursor is positioned on. The node pointer record
for the parent page will be a copy of the child page record, amended with
the child page number. If the child page record happened to carry the
delete-mark flag, then the node pointer record would also carry this flag
(even though the flag makes no sense outside child pages).
(On a related note, for the first node pointer record in the first
node pointer page of each tree level, if the MIN_REC_FLAG is set,
the rest of the record contents (except the child page number)
is basically garbage. From this garbage you could deduce at which point
the child was originally split.)
page_scan_method_t: Replace with bool, as there are only 2 values.
dict_stats_scan_page(): Replace the parameter scan_method with is_leaf.
Ignore the bogus (garbage) delete-mark flag if !is_leaf.
When MySQL 5.0.3 introduced InnoDB support for two-phase commit,
it also introduced the questionable logic to roll back XA PREPARE
transactions on startup when innodb_force_recovery is 1 or 2.
Remove this logic in order to avoid unwanted side effects when
innodb_force_recovery is being set for other reasons. That is,
XA PREPARE transactions will always remain in that state until
InnoDB receives an explicit XA ROLLBACK or XA COMMIT request
from the upper layer.
At the time the logic was introduced in MySQL 5.0.3, there already
was a startup parameter that is the preferred way of achieving
the behaviour: --tc-heuristic-recover=ROLLBACK.
row_ins_check_foreign_constraint(): On timeout,
return DB_LOCK_WAIT_TIMEOUT instead of DB_LOCK_WAIT,
so that the lock wait will be properly terminated.
Also, replace some redundant assignments.
It looks like this bug was introduced in MySQL 5.7.8 by:
commit a97f6b91227c7e0fc3151cfe5421891e79c12d19
Author: Annamalai Gurusami <annamalai.gurusami@oracle.com>
Date: Tue Jun 9 16:02:31 2015 +0530
Bug #20953265 INNODB: FAILING ASSERTION: RESULT != FTS_INVALID
MDEV-13498 is a performance regression that was introduced in MariaDB 10.2.2
by commit fec844aca8
which introduced some Galera-specific conditions that were being
evaluated even if the write-set replication was not enabled.
MDEV-13246 Stale rows despite ON DELETE CASCADE constraint
is a correctness regression that was introduced by the same commit.
Especially the subcondition
!(parent && que_node_get_type(parent) == QUE_NODE_UPDATE)
which is equivalent to
!parent || que_node_get_type(parent) != QUE_NODE_UPDATE
makes little sense. If parent==NULL, the evaluation would proceed to the
std::find() expression, which would dereference parent. Because no SIGSEGV
was observed related to this, we can conclude that parent!=NULL always
holds. But then, the condition would be equivalent to
que_node_get_type(parent) != QUE_NODE_UPDATE
which would not make sense either, because the std::find() expression
is actually assuming the opposite when casting parent to upd_node_t*.
It looks like this condition never worked properly, or that
it was never properly tested, or both.
wsrep_must_process_fk(): Helper function to check if FOREIGN KEY
constraints need to be processed. Only evaluate the costly std::find()
expression when write-set replication is enabled.
Also, rely on operator<<(std::ostream&, const id_name_t&) and
operator<<(std::ostream&, const table_name_t&) for pretty-printing
index and table names.
row_upd_sec_index_entry(): Add !wsrep_thd_is_BF() to the condition.
This is applying part of "Galera MW-369 FK fixes"
f37b79c6da
that is described by the following part of the commit comment:
additionally: skipping wsrep_row_upd_check_foreign_constraint if thd has
BF, essentially is applier or replaying
This FK check would be needed only for populating parent row FK keys
in write set, so no use for appliers
trx_set_rw_mode(): Check the flag high_level_read_only instead
of testing srv_force_recovery (innodb_force_recovery) directly.
There is no need to prevent the creation of read-write transactions
if innodb_force_recovery=3 is used. Yes, in that mode any recovered
incomplete transactions will not be rolled back, but these transactions
will continue to hold locks on the records that they have modified.
If the new read-write transactions hit conflicts with already existing
(possibly recovered) transactions, the lock wait timeout mechanism
will work just fine.