mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-07-01 03:26:54 +03:00

Author	SHA1	Message	Date
Marko Mäkelä	1122ac978e	MDEV-33545: Improve innodb_doublewrite to cover NO_FSYNC In commit `24648768b4` (MDEV-30136) the parameter innodb_flush_method was deprecated, with no direct replacement for innodb_flush_method=O_DIRECT_NO_FSYNC. Let us change innodb_doublewrite from Boolean to ENUM that can be changed while the server is running: OFF: Assume that writes of innodb_page_size are atomic ON: Prevent torn writes (the default) fast: Like ON, but avoid synchronizing writes to data files The deprecated start-up parameter innodb_flush_method=NO_FSYNC will cause innodb_doublewrite=ON to be changed to innodb_doublewrite=fast, which will prevent InnoDB from making any durable writes to data files. This would normally be done right before the log checkpoint LSN is updated. Depending on the file systems being used and their configuration, this may or may not be safe. The value innodb_doublewrite=fast differs from the previous combination of innodb_doublewrite=ON and innodb_flush_method=O_DIRECT_NO_FSYNC by always invoking os_file_flush() on the doublewrite buffer itself in buf_dblwr_t::flush_buffered_writes_completed(). This should be safer when there are multiple doublewrite batches between checkpoints. Typically, once per second, buf_flush_page_cleaner() would write out up to innodb_io_capacity pages and advance the log checkpoint. Also typically, innodb_io_capacity>128, which is the size of the doublewrite buffer in pages. Should os_file_flush_func() not be invoked between doublewrite batches, writes could be reordered in an unsafe way. The setting innodb_doublewrite=fast could be safe when the doublewrite buffer (the first file of the system tablespace) and the data files reside in the same file system. This was tested by running "./mtr --rr innodb.alter_kill". On the first server startup, with innodb_doublewrite=fast, os_file_flush_func() would only be invoked on the ibdata1 file and possibly ib_logfile0. On subsequent startups with innodb_doublewrite=OFF, os_file_flush_func() will be invoked on the individual data files during log_checkpoint(). Note: The setting debug_no_sync (in the code, my_disable_sync) would disable all durable writes to InnoDB files, which would be much less safe. IORequest::Type: Introduce special values WRITE_DBL and PUNCH_DBL for asynchronous writes that are submitted via the doublewrite buffer. In this way, fil_space_t::use_doublewrite() or buf_dblwr.in_use() will only be consulted during buf_page_t::flush() and the doublewrite buffer can be enabled or disabled without any fear of inconsistency. buf_dblwr_t::block_size: Replaces block_size(). buf_dblwr_t::flush_buffered_writes(): If !in_use() and the doublewrite buffer is empty, just invoke fil_flush_file_spaces() and return. The doublewrite buffer could have been disabled while a batch was in progress. innodb_init_params(): If innodb_flush_method=O_DIRECT_NO_FSYNC, set innodb_doublewrite=fast or innodb_doublewrite=fearless. Thanks to Mark Callaghan for reporting this, and Vladislav Vaintroub for feedback.	2024-04-04 08:12:54 +03:00
Marko Mäkelä	d73baa402a	Merge 10.11 into 11.0	2024-02-20 12:02:01 +02:00
Marko Mäkelä	77b4399545	MDEV-33421 innodb.corrupted_during_recovery fails due to error that the table is corrupted This fixes up the merge commit `7e39470e33` dict_table_open_on_name(): Report ER_TABLE_CORRUPT in a consistent fashion, with a pretty-printed table name.	2024-02-08 14:20:42 +02:00
Marko Mäkelä	5b6134b040	Merge 10.11 into 11.0	2023-11-24 11:20:56 +02:00
Marko Mäkelä	f87c7d1772	Merge 10.6 into 10.11	2023-11-21 12:47:51 +02:00
Marko Mäkelä	4c16ec3e77	MDEV-32050 fixup: Stabilize tests In any test that uses wait_all_purged.inc, ensure that InnoDB tables will be created without persistent statistics. This is a follow-up to commit `cd04673a17` after a similar failure was observed in the innodb_zip.blob test.	2023-11-21 12:42:00 +02:00
Oleksandr Byelkin	48af85db21	Merge branch '10.11' into 11.0	2023-11-08 17:09:44 +01:00
Oleksandr Byelkin	04d9a46c41	Merge branch '10.6' into 10.10	2023-11-08 16:23:30 +01:00
Marko Mäkelä	14685b10df	MDEV-32050: Deprecate&ignore innodb_purge_rseg_truncate_frequency The motivation of introducing the parameter innodb_purge_rseg_truncate_frequency in mysql/mysql-server@28bbd66ea5 and mysql/mysql-server@8fc2120fed seems to have been to avoid stalls due to freeing undo log pages or truncating undo log tablespaces. In MariaDB Server, innodb_undo_log_truncate=ON should be a much lighter operation than in MySQL, because it will not involve any log checkpoint. Another source of performance stalls should be trx_purge_truncate_rseg_history(), which is shrinking the history list by freeing the undo log pages whose undo records have been purged. To alleviate that, we will introduce a purge_truncation_task that will offload this from the purge_coordinator_task. In that way, the next innodb_purge_batch_size pages may be parsed and purged while the pages from the previous batch are being freed and the history list being shrunk. The processing of innodb_undo_log_truncate=ON will still remain the responsibility of the purge_coordinator_task. purge_coordinator_state::count: Remove. We will ignore innodb_purge_rseg_truncate_frequency, and act as if it had been set to 1 (the maximum shrinking frequency). purge_coordinator_state::do_purge(): Invoke an asynchronous task purge_truncation_callback() to free the undo log pages. purge_sys_t::iterator::free_history(): Free those undo log pages that have been processed. This used to be a part of trx_purge_truncate_history(). purge_sys_t::clone_end_view(): Take a new value of purge_sys.head as a parameter, so that it will be updated while holding exclusive purge_sys.latch. This is needed for race-free access to the field in purge_truncation_callback(). Reviewed by: Vladislav Lesin	2023-10-25 09:11:58 +03:00
Oleksandr Byelkin	51f9d62005	Merge branch '10.11' into 11.0	2023-08-09 07:53:48 +02:00
Oleksandr Byelkin	34a8e78581	Merge branch '10.6' into 10.9	2023-08-04 08:01:06 +02:00
Oleksandr Byelkin	6bf8483cac	Merge branch '10.5' into 10.6	2023-08-01 15:08:52 +02:00
Lena Startseva	9854fb6fa7	MDEV-31003: Second execution for ps-protocol This patch adds for "--ps-protocol" second execution of queries "SELECT". Also in this patch it is added ability to disable/enable (--disable_ps2_protocol/--enable_ps2_protocol) second execution for "--ps-prototocol" in testcases.	2023-07-26 17:15:00 +07:00
Marko Mäkelä	e581396b7a	MDEV-29983 Deprecate innodb_file_per_table Before commit `6112853cda` in MySQL 4.1.1 introduced the parameter innodb_file_per_table, all InnoDB data was written to the InnoDB system tablespace (often named ibdata1). A serious design problem is that once the system tablespace has grown to some size, it cannot shrink even if the data inside it has been deleted. There are also other design problems, such as the server hang MDEV-29930 that should only be possible when using innodb_file_per_table=0 and innodb_undo_tablespaces=0 (storing both tables and undo logs in the InnoDB system tablespace). The parameter innodb_change_buffering was deprecated in commit `b5852ffbee`. Starting with commit `baf276e6d4` (MDEV-19229) the number of innodb_undo_tablespaces can be increased, so that the undo logs can be moved out of the system tablespace of an existing installation. If all these things (tables, undo logs, and the change buffer) are removed from the InnoDB system tablespace, the only variable-size data structure inside it is the InnoDB data dictionary. DDL operations on .ibd files was optimized in commit `86dc7b4d4c` (MDEV-24626). That should have removed any thinkable performance advantage of using innodb_file_per_table=0. Since there should be no benefit of setting innodb_file_per_table=0, the parameter should be deprecated. Starting with MySQL 5.6 and MariaDB Server 10.0, the default value is innodb_file_per_table=1.	2023-01-11 17:55:56 +02:00
Marko Mäkelä	28b27b96d0	Cleanup: Remove some ib::logger in recovery messages	2021-12-10 12:26:26 +02:00
Vladislav Vaintroub	d20de4d447	Merge branch '10.6' into 10.7	2021-11-04 10:55:05 +01:00
Marko Mäkelä	993b8edf3f	MDEV-26933 InnoDB fails to detect page number mismatch mtr_t::page_lock(): Validate the page number. ibuf_tree_root_get(): Remove assertions that became redundant. The assertions in btr_validate_level() are kind of redundant as well, but because they are ut_a(), they are also present in release builds, while the ones in mtr_t::page_lock() are only present in debug builds. btr_cur_position(): Do not duplicate an assertion that is part of page_cur_position(). dict_load_tablespace(): Introduce a new option DICT_ERR_IGNORE_TABLESPACE that will suppress loading a tablespace when a table is going to be dropped.	2021-11-01 13:35:47 +02:00
Sergei Golubchik	a52cd4aeda	InnoDB: send "corrupted" error to the user, not only to the log	2021-10-27 15:55:14 +02:00
Marko Mäkelä	02d9b048a2	Merge 10.3 into 10.4	2019-04-05 11:41:03 +03:00
Marko Mäkelä	d5a2bc6a0f	Merge 10.2 into 10.3	2019-04-04 19:41:12 +03:00
Marko Mäkelä	cad56fbaba	MDEV-18733 MariaDB slow start after crash recovery If InnoDB crash recovery was needed, the InnoDB function srv_start() would invoke extra validation, reading something from every InnoDB data file. This should be unnecessary now that MDEV-14717 made RENAME operations crash-safe inside InnoDB (which can be disabled in MariaDB 10.2 by setting innodb_safe_truncate=OFF). dict_check_sys_tables(): Skip tables that would be dropped by row_mysql_drop_garbage_tables(). Perform extra validation only if innodb_safe_truncate=OFF, innodb_force_recovery=0 and crash recovery was needed. dict_load_table_one(): Validate the root page of the table. In this way, we can deny access to corrupted or mismatching tables not only after crash recovery, but also after a clean shutdown.	2019-04-03 19:56:03 +03:00
Marko Mäkelä	514b305dfb	Merge 10.3 into 10.4 The MDEV-17262 commit `26432e49d3` was skipped. In Galera 4, the implementation would seem to require changes to the streaming replication. In the tests archive.rnd_pos main.profiling, disable_ps_protocol for SHOW STATUS and SHOW PROFILE commands until MDEV-18974 has been fixed.	2019-03-20 10:41:32 +02:00
Sergei Golubchik	b64fde8f38	Merge branch '10.2' into 10.3	2019-03-17 13:06:41 +01:00
Marko Mäkelä	d3afdb1e8f	Datafile::validate_first_page(): Change some ERROR to Note On startup, if the InnoDB doublewrite buffer can be used to recover a corrupted page, raising an ERROR about a recoverable error seems inappropriate. Issue Note instead, and adjust tests accordingly. Also, correctly validate the tablespace ID in the files.	2019-03-14 10:15:50 +02:00
Thirunarayanan Balathandayuthapani	c0f47a4a58	MDEV-12026: Implement innodb_checksum_algorithm=full_crc32 MariaDB data-at-rest encryption (innodb_encrypt_tables) had repurposed the same unused data field that was repurposed in MySQL 5.7 (and MariaDB 10.2) for the Split Sequence Number (SSN) field of SPATIAL INDEX. Because of this, MariaDB was unable to support encryption on SPATIAL INDEX pages. Furthermore, InnoDB page checksums skipped some bytes, and there are multiple variations and checksum algorithms. By default, InnoDB accepts all variations of all algorithms that ever existed. This unnecessarily weakens the page checksums. We hereby introduce two more innodb_checksum_algorithm variants (full_crc32, strict_full_crc32) that are special in a way: When either setting is active, newly created data files will carry a flag (fil_space_t::full_crc32()) that indicates that all pages of the file will use a full CRC-32C checksum over the entire page contents (excluding the bytes where the checksum is stored, at the very end of the page). Such files will always use that checksum, no matter what the parameter innodb_checksum_algorithm is assigned to. For old files, the old checksum algorithms will continue to be used. The value strict_full_crc32 will be equivalent to strict_crc32 and the value full_crc32 will be equivalent to crc32. ROW_FORMAT=COMPRESSED tables will only use the old format. These tables do not support new features, such as larger innodb_page_size or instant ADD/DROP COLUMN. They may be deprecated in the future. We do not want an unnecessary file format change for them. The new full_crc32() format also cleans up the MariaDB tablespace flags. We will reserve flags to store the page_compressed compression algorithm, and to store the compressed payload length, so that checksum can be computed over the compressed (and possibly encrypted) stream and can be validated without decrypting or decompressing the page. In the full_crc32 format, there no longer are separate before-encryption and after-encryption checksums for pages. The single checksum is computed on the page contents that is written to the file. We do not make the new algorithm the default for two reasons. First, MariaDB 10.4.2 was a beta release, and the default values of parameters should not change after beta. Second, we did not yet implement the full_crc32 format for page_compressed pages. This will be fixed in MDEV-18644. This is joint work with Marko Mäkelä.	2019-02-19 18:50:19 +02:00
Marko Mäkelä	df51dc28f5	Fix tests for innodb_checksum_algorithm=strict_crc32 In tests that directly write InnoDB data file pages, compute the innodb_checksum_algorithm=crc32 checksums, instead of writing the 0xdeadbeef value used by innodb_checksum_algorithm=none. In this way, these tests will not cause failures when executing ./mtr --mysqld=--loose-innodb-checksum-algorithm=strict_crc32	2019-02-16 12:06:52 +02:00
Marko Mäkelä	61b32df931	Merge 10.2 into 10.3	2018-10-10 06:45:19 +03:00
Marko Mäkelä	00b6c7d8fc	MDEV-16946 innodb.alter_kill failed in buildbot with wrong result Ensure that no redo log checkpoint occurs in a critical section of a recovery test.	2018-10-10 06:31:43 +03:00
Marko Mäkelä	2610c26a53	MDEV-16273 innodb.alter_kill fails Unknown storage engine 'InnoDB' The test is shutting down InnoDB, corrupting a file, and finally restarting InnoDB. Before the shutdown, the test created the table and inserted some records. Before MDEV-12288, there would be no access to the table after server restart, but after MDEV-12288 purge would reset the transaction identifier after the INSERT, and this would sometimes happen after the restart. To make the test deterministic, wait for purge to complete before the shutdown.	2018-10-10 06:14:14 +03:00
Thirunarayanan Balathandayuthapani	91659983eb	Adjust the tests for MariaDB. New added test case: alter_kill in innodb suite.	2018-05-16 15:03:09 +05:30
Marko Mäkelä	ac2410f6d8	Bug#19330255 WL#7142 - CRASH DURING ALTER TABLE LEADS TO DATA DICTIONARY INCONSISTENCY The server crashes on a SELECT because of space id mismatch. The mismatch happens if the server crashes during an ALTER TABLE. There are actually two cases of inconsistency, and three fixes needed for the InnoDB problems. We have dictionary data (tablespace or table name) in 3 places: (a) The .frm file is for the old table definition. (b) The InnoDB data dictionary is for the new table definition. (c) The file system did not rename the tablespace files yet. In this fix, we will not care if the .frm file is in sync with the InnoDB data dictionary and file system. We will concentrate on the mismatch between (b) and (c). Two scenarios have been mentioned in this bug report. The simpler one first: 1. The changes to SYS_TABLES were committed, and MLOG_FILE_RENAME2 records were written in a single mini-transaction commit. The files were not yet renamed in the file system. 2a. The server is killed, without making a log checkpoint. 3a. The server refuses to start up, because replaying MLOG_FILE_RENAME2 fails. I failed to repeat this myself. I repeated step 3a with a saved dataset. The problem seems to be that MLOG_FILE_RENAME2 replay is incorrectly being skipped when there is no page-redo log or MLOG_FILE_NAME record for the old name of the tablespace. FIX#1: Recover the id-to-name mapping also from MLOG_FILE_RENAME2 records when scanning the redo log. It is not necessary to write MLOG_FILE_NAME records in addition to MLOG_FILE_RENAME2 records for renaming tablespace files. The scenario in the original Description involves a log checkpoint: 1. The changes to SYS_TABLES were committed, and MLOG_FILE_RENAME2 records were written in a single mini-transaction commit. 2. A log checkpoint and a server kill was injected. 3. Crash recovery will see no records (other than the MLOG_CHECKPOINT). 4. dict_check_tablespaces_and_store_max_id() will emit a message about a non-found table #sql-ib22. 5. A mismatch is triggering the assertion failure. In my test, at step 4 the SYS_TABLES root page (0:8) contains these 3 records right before the page supremum: delete-marked (committed) name=#sql-ib21* record, with space=10. * name=#sql-ib22, space=9. name=t1, space=10. space=10 is the rebuilt table (#sql-ib21.ibd in the file system). space=9 is the old table (t1.ibd in the file system). The function dict_check_tablespaces_and_store_max_id() will enter t1.ibd with space_id=10 into the fil_system cache without noticing that t1.ibd contains space_id=9, because it invokes fil_open_single_table_tablespace() with validate=false. In MySQL 5.6, the space_id from all .ibd files are being read when the redo log checkpoint LSN disagrees with the FIL_PAGE_FILE_FLUSH_LSN in the system tablespace. This field is only updated during a clean shutdown, after performing the final log checkpoint. FIX#2: dict_check_tablespaces_and_store_max_id() should pass validate=true to fil_open_single_table_tablespace() when a non-clean shutdown is detected, forcing the first page of each .ibd file to be read. (We do not want to slow down startup after a normal shutdown.) With FIX#2, the SELECT would fail to find the table. This would introduce a regression, because before WL#7142, a copy of the table was accessible after recovery. FIX#3: Maintain a list of MLOG_FILE_RENAME2 records that have been written to the redo log, but not performed yet in the file system. When performing a checkpoint, re-emit these records to the redo log. In this way, a mismatch between (b) and (c) should be impossible. fil_name_process(): Refactored from fil_name_parse(). Adds an item to the id-to-filename mapping. fil_name_parse(): Parses and applies a MLOG_FILE_NAME, MLOG_FILE_DELETE or MLOG_FILE_RENAME2 record. This implements FIX#1. fil_name_write_rename(): A wrapper function for writing MLOG_FILE_RENAME2 records. fil_op_replay_rename(): Apply MLOG_FILE_RENAME2 records. Replaces fil_op_log_parse_or_replay(), whose logic was moved to fil_name_parse(). fil_tablespace_exists_in_mem(): Return fil_space_t instead of bool. dict_check_tablespaces_and_store_max_id(): Add the parameter "validate" to implement FIX#2. log_sys->append_on_checkpoint: Extra log records to append in case of a checkpoint. Needed for FIX#3. log_append_on_checkpoint(): New function, to update log_sys->append_on_checkpoint. mtr_write_log(): New function, to append mtr_buf_t to the redo log. fil_names_clear(): Append the data from log_sys->append_on_checkpoint if needed. ha_innobase::commit_inplace_alter_table(): Add any MLOG_FILE_RENAME2 records to log_sys->append_on_checkpoint(), and remove them once the files have been renamed in the file system. mtr_buf_copy_t: A helper functor for copying a mini-transaction log. rb#6282 approved by Jimmy Yang	2018-05-16 15:03:09 +05:30

31 Commits