mirror of https://github.com/MariaDB/server.git
synced 2025-07-27 18:02:13 +03:00
Bug#19330255 WL#7142 - CRASH DURING ALTER TABLE LEADS TO DATA DICTIONARY INCONSISTENCY

The server crashes on a SELECT because of space id mismatch. The
mismatch happens if the server crashes during an ALTER TABLE.

There are actually two cases of inconsistency, and three fixes needed
for the InnoDB problems.

We have dictionary data (tablespace or table name) in 3 places:

(a) The *.frm file is for the old table definition.
(b) The InnoDB data dictionary is for the new table definition.
(c) The file system did not rename the tablespace files yet.

In this fix, we will not care if the *.frm file is in sync with the
InnoDB data dictionary and file system. We will concentrate on the
mismatch between (b) and (c).

Two scenarios have been mentioned in this bug report. The simpler one
first:

1. The changes to SYS_TABLES were committed, and MLOG_FILE_RENAME2
records were written in a single mini-transaction commit. The files
were not yet renamed in the file system.
2a. The server is killed, without making a log checkpoint.
3a. The server refuses to start up, because replaying
MLOG_FILE_RENAME2 fails.

I failed to repeat this myself. I repeated step 3a with a saved
dataset. The problem seems to be that MLOG_FILE_RENAME2 replay is
incorrectly being skipped when there is no page-redo log or
MLOG_FILE_NAME record for the old name of the tablespace.

FIX#1: Recover the id-to-name mapping also from MLOG_FILE_RENAME2
records when scanning the redo log. It is not necessary to write
MLOG_FILE_NAME records in addition to MLOG_FILE_RENAME2 records for
renaming tablespace files.

The scenario in the original Description involves a log checkpoint:

1. The changes to SYS_TABLES were committed, and MLOG_FILE_RENAME2
records were written in a single mini-transaction commit.
2. A log checkpoint and a server kill was injected.
3. Crash recovery will see no records (other than the MLOG_CHECKPOINT).
4. dict_check_tablespaces_and_store_max_id() will emit a message about
a non-found table #sql-ib22*.
5. A mismatch is triggering the assertion failure.

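The idea behind FIX#1 can be illustrated with a small, self-contained
model. This is only a sketch under stated assumptions, not the InnoDB
code: RecType, FileRec, ScanResult, scan_file_records() and
replay_renames() are hypothetical names, the record layout is invented,
and the "file system" is just an in-memory map. What it shows is that a
rename record by itself is enough to register the space id, so that the
rename is not skipped when no MLOG_FILE_NAME record was seen.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified record kinds; the real redo log encoding differs.
enum class RecType { FileName, FileDelete, FileRename2 };

struct FileRec {
    RecType type;
    std::uint32_t space_id;
    std::string name;      // file name (for FileRename2: the old name)
    std::string new_name;  // only meaningful for FileRename2
};

struct ScanResult {
    std::map<std::uint32_t, std::string> names;  // space id -> file name
    std::vector<FileRec> renames;                // renames to replay later
};

// Scans the file-level records and builds the id-to-name mapping.
ScanResult scan_file_records(const std::vector<FileRec>& log) {
    ScanResult out;
    for (const FileRec& rec : log) {
        switch (rec.type) {
        case RecType::FileName:
            out.names[rec.space_id] = rec.name;
            break;
        case RecType::FileDelete:
            out.names.erase(rec.space_id);
            break;
        case RecType::FileRename2:
            assert(!rec.new_name.empty());
            // Learn the mapping from the rename record itself, so that no
            // separate FileName record is needed for the old name.
            out.names[rec.space_id] = rec.name;
            out.renames.push_back(rec);
            break;
        }
    }
    return out;
}

// Replays the collected renames against the mapping (not a real file system).
void replay_renames(ScanResult& scan) {
    for (const FileRec& rec : scan.renames) {
        auto it = scan.names.find(rec.space_id);
        if (it != scan.names.end() && it->second == rec.name) {
            it->second = rec.new_name;
        }
    }
}
```

With a log that contains nothing but one rename record, the scan still
knows the space id, and the replay maps it to the new name.
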
In my test, at step 4 the SYS_TABLES root page (0:8) contains these 3
records right before the page supremum:

* delete-marked (committed) name=#sql-ib21* record, with space=10.
* name=#sql-ib22*, space=9.
* name=t1, space=10.

space=10 is the rebuilt table (#sql-ib21*.ibd in the file system).
space=9 is the old table (t1.ibd in the file system).

The function dict_check_tablespaces_and_store_max_id() will enter
t1.ibd with space_id=10 into the fil_system cache without noticing
that t1.ibd contains space_id=9, because it invokes
fil_open_single_table_tablespace() with validate=false.

In MySQL 5.6, the space_id from all *.ibd files are being read when
the redo log checkpoint LSN disagrees with the FIL_PAGE_FILE_FLUSH_LSN
in the system tablespace. This field is only updated during a clean
shutdown, after performing the final log checkpoint.

FIX#2: dict_check_tablespaces_and_store_max_id() should pass
validate=true to fil_open_single_table_tablespace() when a non-clean
shutdown is detected, forcing the first page of each *.ibd file to be
read. (We do not want to slow down startup after a normal shutdown.)

With FIX#2, the SELECT would fail to find the table. This would
introduce a regression, because before WL#7142, a copy of the table
was accessible after recovery.

FIX#3: Maintain a list of MLOG_FILE_RENAME2 records that have been
written to the redo log, but not performed yet in the file system.
When performing a checkpoint, re-emit these records to the redo log.
In this way, a mismatch between (b) and (c) should be impossible.

fil_name_process(): Refactored from fil_name_parse(). Adds an item to
the id-to-filename mapping.

fil_name_parse(): Parses and applies a MLOG_FILE_NAME,
MLOG_FILE_DELETE or MLOG_FILE_RENAME2 record. This implements FIX#1.

fil_name_write_rename(): A wrapper function for writing
MLOG_FILE_RENAME2 records.

fil_op_replay_rename(): Apply MLOG_FILE_RENAME2 records. Replaces
fil_op_log_parse_or_replay(), whose logic was moved to
fil_name_parse().

fil_tablespace_exists_in_mem(): Return fil_space_t* instead of bool.

dict_check_tablespaces_and_store_max_id(): Add the parameter
"validate" to implement FIX#2.

log_sys->append_on_checkpoint: Extra log records to append in case of
a checkpoint. Needed for FIX#3.

log_append_on_checkpoint(): New function, to update
log_sys->append_on_checkpoint.

mtr_write_log(): New function, to append mtr_buf_t to the redo log.

fil_names_clear(): Append the data from log_sys->append_on_checkpoint
if needed.

ha_innobase::commit_inplace_alter_table(): Add any MLOG_FILE_RENAME2
records to log_sys->append_on_checkpoint(), and remove them once the
files have been renamed in the file system.

mtr_buf_copy_t: A helper functor for copying a mini-transaction log.

rb#6282 approved by Jimmy Yang
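
The checkpoint handling of FIX#3 can be modelled in a few lines. This
is a minimal sketch, assuming that recovery only sees records written
since the latest checkpoint. RedoLogModel and its member functions are
hypothetical names, records are plain strings, and pending_ only plays
the role that the message gives to log_sys->append_on_checkpoint. It
does not reproduce the real data structures.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy model: recovery sees only what was written since the last checkpoint.
class RedoLogModel {
public:
    // Write a rename record and remember it until the file is renamed.
    void log_rename(const std::string& rec) {
        visible_.push_back(rec);
        pending_.push_back(rec);
    }

    // The rename has been performed in the file system; stop re-emitting it.
    void rename_completed(const std::string& rec) {
        auto it = std::find(pending_.begin(), pending_.end(), rec);
        assert(it != pending_.end());  // must have been logged before
        if (it != pending_.end()) {
            pending_.erase(it);
        }
    }

    // A checkpoint discards older records for recovery purposes, then
    // re-emits every rename that is still pending.
    void checkpoint() {
        visible_.clear();
        visible_.insert(visible_.end(), pending_.begin(), pending_.end());
        visible_.push_back("CHECKPOINT");
    }

    // Would crash recovery encounter this record?
    bool recovery_sees(const std::string& rec) const {
        return std::find(visible_.begin(), visible_.end(), rec)
               != visible_.end();
    }

private:
    std::vector<std::string> visible_;  // records since the last checkpoint
    std::vector<std::string> pending_;  // renames logged but not yet performed
};
```

In this model a rename that was logged but not yet performed survives
any number of checkpoints, and it disappears from the recovery view
only after rename_completed() followed by a further checkpoint.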
committed by Thirunarayanan Balathandayuthapani
parent 6f4534e622
commit ac2410f6d8
76
mysql-test/suite/innodb/r/alter_kill.result
Normal file
@@ -0,0 +1,76 @@
#
# Bug#16720368 INNODB CRASHES ON BROKEN #SQL*.IBD FILE AT STARTUP
#
SET GLOBAL innodb_file_per_table=1;
CREATE TABLE bug16720368_1 (a INT PRIMARY KEY) ENGINE=InnoDB;
CREATE TABLE bug16720368 (a INT PRIMARY KEY, b INT) ENGINE=InnoDB;
INSERT INTO bug16720368 (a) VALUES (1),(2),(3),(4),(5),(6),(7),(8);
# Cleanly shutdown mysqld
# Corrupt FIL_PAGE_OFFSET in bug16720368.ibd,
# and update the checksum to the "don't care" value.
# Restart mysqld
# This will succeed after a clean shutdown, due to
# fil_open_single_table_tablespace(check_space_id=FALSE).
SELECT COUNT(*) FROM bug16720368;
COUNT(*)
8
INSERT INTO bug16720368_1 VALUES(1);
# Kill the server to do an unclean shutdown.
# The corruption should not prevent startup after crash recovery,
# because there is no redo log for the corrupted tablespace.
# The table is unaccessible, because after a crash we will
# validate the tablespace header.
SELECT COUNT(*) FROM bug16720368;
ERROR 42S02: Table 'test.bug16720368' doesn't exist
INSERT INTO bug16720368 VALUES(0,1);
ERROR 42S02: Table 'test.bug16720368' doesn't exist
# Kill the server to do an unclean shutdown.
# The table is readable thanks to innodb-force-recovery.
SELECT COUNT(*) FROM bug16720368;
COUNT(*)
8
INSERT INTO bug16720368 VALUES(0,1);
ERROR HY000: Operation not allowed when innodb_forced_recovery > 0.
# Shut down the server cleanly to hide the corruption.
# The table is accessible, because after a clean shutdown we will
# NOT validate the tablespace header.
# We can modify the existing pages, but we cannot allocate or free
# any pages, because that would hit the corruption on page 0.
SELECT COUNT(*) FROM bug16720368;
COUNT(*)
8
INSERT INTO bug16720368 VALUES(0,1);
# Shut down the server to uncorrupt the data.
# Restart the server after uncorrupting the file.
INSERT INTO bug16720368 VALUES(9,1);
SELECT COUNT(*) FROM bug16720368;
COUNT(*)
10
DROP TABLE bug16720368, bug16720368_1;
#
# Bug#16735660 ASSERT TABLE2 == NULL, ROLLBACK OF RESURRECTED TXNS,
# DICT_TABLE_ADD_TO_CACHE
#
SET GLOBAL innodb_file_per_table=1;
CREATE TEMPORARY TABLE t1 (a INT PRIMARY KEY) ENGINE=InnoDB;
BEGIN;
INSERT INTO t1 VALUES(42);
CREATE TABLE bug16735660 (a INT PRIMARY KEY) ENGINE=InnoDB;
XA START 'x';
INSERT INTO bug16735660 VALUES(1),(2),(3);
XA END 'x';
XA PREPARE 'x';
# Attempt to start without an *.ibd file.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT * FROM bug16735660;
a
1
2
3
XA RECOVER;
formatID gtrid_length bqual_length data
1 1 0 x
XA ROLLBACK 'x';
SELECT * FROM bug16735660;
a
DROP TABLE bug16735660;