mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-09-09 18:40:27 +03:00

Author	SHA1	Message	Date
Marko Mäkelä	1bd681c8b3	MDEV-25506 (3 of 3): Do not delete .ibd files before commit This is a complete rewrite of DROP TABLE, also as part of other DDL, such as ALTER TABLE, CREATE TABLE...SELECT, TRUNCATE TABLE. The background DROP TABLE queue hack is removed. If a transaction needs to drop and create a table by the same name (like TRUNCATE TABLE does), it must first rename the table to an internal #sql-ib name. No committed version of the data dictionary will include any #sql-ib tables, because whenever a transaction renames a table to a #sql-ib name, it will also drop that table. Either the rename will be rolled back, or the drop will be committed. Data files will be unlinked after the transaction has been committed and a FILE_RENAME record has been durably written. The file will actually be deleted when the detached file handle returned by fil_delete_tablespace() will be closed, after the latches have been released. It is possible that a purge of the delete of the SYS_INDEXES record for the clustered index will execute fil_delete_tablespace() concurrently with the DDL transaction. In that case, the thread that arrives later will wait for the other thread to finish. HTON_TRUNCATE_REQUIRES_EXCLUSIVE_USE: A new handler flag. ha_innobase::truncate() now requires that all other references to the table be released in advance. This was implemented by Monty. ha_innobase::delete_table(): If CREATE TABLE..SELECT is detected, we will "hijack" the current transaction, drop the table in the current transaction and commit the current transaction. This essentially fixes MDEV-21602. There is a FIXME comment about making the check less failure-prone. ha_innobase::truncate(), ha_innobase::delete_table(): Implement a fast path for temporary tables. We will no longer allow temporary tables to use the adaptive hash index. dict_table_t::mdl_name: The original table name for the purpose of acquiring MDL in purge, to prevent a race condition between a DDL transaction that is dropping a table, and purge processing undo log records of DML that had executed before the DDL operation. For #sql-backup- tables during ALTER TABLE...ALGORITHM=COPY, the dict_table_t::mdl_name will differ from dict_table_t::name. dict_table_t::parse_name(): Use mdl_name instead of name. dict_table_rename_in_cache(): Update mdl_name. For the internal FTS_ tables of FULLTEXT INDEX, purge would acquire MDL on the FTS_ table name, but not on the main table, and therefore it would be able to run concurrently with a DDL transaction that is dropping the table. Previously, the DROP TABLE queue hack prevented a race between purge and DDL. For now, we introduce purge_sys.stop_FTS() to prevent purge from opening any table, while a DDL transaction that may drop FTS_ tables is in progress. The function fts_lock_table(), which will be invoked before the dictionary is locked, will wait for purge to release any table handles. trx_t::drop_table_statistics(): Drop statistics for the table. This replaces dict_stats_drop_index(). We will drop or rename persistent statistics atomically as part of DDL transactions. On lock conflict for dropping statistics, we will fail instantly with DB_LOCK_WAIT_TIMEOUT, because we will be holding the exclusive data dictionary latch. trx_t::commit_cleanup(): Separated from trx_t::commit_in_memory(). Relax an assertion around fts_commit() and allow DB_LOCK_WAIT_TIMEOUT in addition to DB_DUPLICATE_KEY. The call to fts_commit() is entirely misplaced here and may obviously break the consistency of transactions that affect FULLTEXT INDEX. It needs to be fixed separately. dict_table_t::n_foreign_key_checks_running: Remove (MDEV-21175). The counter was a work-around for missing meta-data locking (MDL) on the SQL layer, and not really needed in MariaDB. ER_TABLE_IN_FK_CHECK: Replaced with ER_UNUSED_28. HA_ERR_TABLE_IN_FK_CHECK: Remove. row_ins_check_foreign_constraints(): Do not acquire dict_sys.latch either. The SQL-layer MDL will protect us. This was reviewed by Thirunarayanan Balathandayuthapani and tested by Matthias Leich.	2021-06-09 17:06:07 +03:00
Marko Mäkelä	49e2c8f0a6	MDEV-25743: Unnecessary copying of table names in InnoDB dictionary Many InnoDB data dictionary cache operations require that the table name be copied so that it will be NUL terminated. (For example, SYS_TABLES.NAME is not guaranteed to be NUL-terminated.) dict_table_t::is_garbage_name(): Check if a name belongs to the background drop table queue. dict_check_if_system_table_exists(): Remove. dict_sys_t::load_sys_tables(): Load the non-hard-coded system tables SYS_FOREIGN, SYS_FOREIGN_COLS, SYS_VIRTUAL on startup. dict_sys_t::create_or_check_sys_tables(): Replaces dict_create_or_check_foreign_constraint_tables() and dict_create_or_check_sys_virtual(). dict_sys_t::load_table(): Replaces dict_table_get_low() and dict_load_table(). dict_sys_t::find_table(): Renamed from get_table(). dict_sys_t::sys_tables_exist(): Check whether all the non-hard-coded tables SYS_FOREIGN, SYS_FOREIGN_COLS, SYS_VIRTUAL exist. trx_t::has_stats_table_lock(): Moved to dict0stats.cc. Some error messages will now report table names in the internal databasename/tablename format, instead of `databasename`.`tablename`.	2021-05-21 18:03:40 +03:00
Marko Mäkelä	3a427c568b	Cleanup: Remove useless message "If you are installing InnoDB"	2021-05-14 08:26:51 +03:00
Marko Mäkelä	356c149603	Merge 10.5 into 10.6	2021-03-26 11:50:32 +02:00
Marko Mäkelä	2e67b9f665	MDEV-25265: ALTER TABLE...IMPORT TABLESPACE fails after DROP INDEX A side effect of the MDEV-24589 bug fix is that if FLUSH TABLE...FOR EXPORT is initiated before the history of an earlier DROP INDEX operation has been purged, then the data file will contain allocated pages that belonged to the dropped indexes. These pages would never be freed after a subsequent IMPORT TABLESPACE. We will work around this regression by making IMPORT TABLESPACE tolerate pages that refer to an unknown index.	2021-03-26 10:57:26 +02:00
Vladislav Vaintroub	aa2ff62082	MDEV-9077 Use sys schema in bootstrapping, incl. mtr	2021-03-18 08:02:48 +01:00
Marko Mäkelä	7a4fbb55b0	MDEV-25105 Remove innodb_checksum_algorithm values none,innodb,... Historically, InnoDB supported a buggy page checksum algorithm that did not compute a checksum over the full page. Later, well before MySQL 4.1 introduced .ibd files and the innodb_file_per_table option, the algorithm was corrected and the first 4 bytes of each page were redefined to be a checksum. The original checksum was so slow that an option to disable page checksum was introduced for benchmarketing purposes. The Intel Nehalem microarchitecture introduced the SSE4.2 instruction set extension, which includes instructions for faster computation of CRC-32C. In MySQL 5.6 (and MariaDB 10.0), innodb_checksum_algorithm=crc32 was implemented to make of that. As that option was changed to be the default in MySQL 5.7, a bug was found on big-endian platforms and some work-around code was added to weaken that checksum further. MariaDB disables that work-around by default since MDEV-17958. Later, SIMD-accelerated CRC-32C has been implemented in MariaDB for POWER and ARM and also for IA-32/AMD64, making use of carry-less multiplication where available. Long story short, innodb_checksum_algorithm=crc32 is faster and more secure than the pre-MySQL 5.6 checksum, called innodb_checksum_algorithm=innodb. It should have removed any need to use innodb_checksum_algorithm=none. The setting innodb_checksum_algorithm=crc32 is the default in MySQL 5.7 and MariaDB Server 10.2, 10.3, 10.4. In MariaDB 10.5, MDEV-19534 made innodb_checksum_algorithm=full_crc32 the default. It is even faster and more secure. The default settings in MariaDB do allow old data files to be read, no matter if a worse checksum algorithm had been used. (Unfortunately, before innodb_checksum_algorithm=full_crc32, the data files did not identify which checksum algorithm is being used.) The non-default settings innodb_checksum_algorithm=strict_crc32 or innodb_checksum_algorithm=strict_full_crc32 would only allow CRC-32C checksums. The incompatibility with old data files is why they are not the default. The newest server not to support innodb_checksum_algorithm=crc32 were MySQL 5.5 and MariaDB 5.5. Both have reached their end of life. A valid reason for using innodb_checksum_algorithm=innodb could have been the ability to downgrade. If it is really needed, data files can be converted with an older version of the innochecksum utility. Because there is no good reason to allow data files to be written with insecure checksums, we will reject those option values: innodb_checksum_algorithm=none innodb_checksum_algorithm=innodb innodb_checksum_algorithm=strict_none innodb_checksum_algorithm=strict_innodb Furthermore, the following innochecksum options will be removed, because only strict crc32 will be supported: innochecksum --strict-check=crc32 innochecksum -C crc32 innochecksum --write=crc32 innochecksum -w crc32 If a user wishes to convert a data file to use a different checksum (so that it might be used with the no-longer-supported MySQL 5.5 or MariaDB 5.5, which do not support IMPORT TABLESPACE nor system tablespace format changes that were made in MariaDB 10.3), then the innochecksum tool from MariaDB 10.2, 10.3, 10.4, 10.5 or MySQL 5.7 can be used. Reviewed by: Thirunarayanan Balathandayuthapani	2021-03-11 12:46:18 +02:00
Marko Mäkelä	5407117a59	MDEV-22343 Remove SYS_TABLESPACES and SYS_DATAFILES The InnoDB internal tables SYS_TABLESPACES and SYS_DATAFILES as well as the INFORMATION_SCHEMA views INNODB_SYS_TABLESPACES and INNODB_SYS_DATAFILES were introduced in MySQL 5.6 for no good reason in mysql/mysql-server/commit/e9255a22ef16d612a8076bc0b34002bc5a784627 when the InnoDB support for the DATA DIRECTORY attribute was introduced. The file system should be the authoritative source of information on files. Storing information about file system paths in the file system (symlinks, or even the .isl files that were unfortunately chosen as the solution) is sufficient. If information is additionally stored in some hidden tables inside the InnoDB system tablespace, everything unnecessarily becomes more complicated, because more copies of data mean more opportunity for the copies to be out of sync, and because modifying the data in the system tablespace in the desired way might not be possible at all without modifying the InnoDB source code. So, the copy in the system tablespace basically is a redundant, non-authoritative source of information. We will stop creating or accessing the system tables SYS_TABLESPACES and SYS_DATAFILES. We will also remove the view INFORMATION_SCHEMA.INNODB_SYS_DATAFILES along with SYS_DATAFILES. The view INFORMATION_SCHEMA.INNODB_SYS_TABLESPACES will be repurposed to directly reflect fil_system.space_list. The column PAGE_SIZE, which would always contain the value of the GLOBAL read-only variable innodb_page_size, is removed. The column ZIP_PAGE_SIZE, which would actually contain the physical page size of a page, is renamed to PAGE_SIZE. Finally, a new column FILENAME is added, as a replacement of SYS_DATAFILES.PATH. This will also address MDEV-21801 (files that were created before upgrading to MySQL 5.6 or MariaDB 10.0 or later were never registered in SYS_TABLESPACES or SYS_DATAFILES) and MDEV-21801 (information about the system tablespace is not stored in SYS_TABLESPACES or SYS_DATAFILES).	2020-11-11 11:15:11 +02:00
Marko Mäkelä	9bc874a594	MDEV-23497 Make ROW_FORMAT=COMPRESSED read-only by default Let us introduce the parameter innodb_read_only_compressed that is ON by default, making any ROW_FORMAT=COMPRESSED tables read-only. I developed the ROW_FORMAT=COMPRESSED format based on Heikki Tuuri's rough design between 2005 and 2008. It might have been a good idea back then, but no proper benchmarks were ever run to validate the design or the implementation. The format has been more or less obsolete for years. It limits innodb_page_size to 16384 bytes (the default), and instant ALTER TABLE is not supported. This is the first step towards deprecating and removing write support for ROW_FORMAT=COMPRESSED tables.	2020-11-11 11:15:11 +02:00
Marko Mäkelä	0e34bb3e97	Merge 10.5 into 10.6	2020-08-12 14:39:53 +03:00
Marko Mäkelä	6f84150c21	MDEV-23422 innodb_zip.restart fails with extra #sql-ib*.ibd The background DROP TABLE queue may be blocked for some more time due to MDEV-16678. Let us apply similar adjustments as earlier: commit `6af00b2cc6` commit `89633995e4` commit `ccd87d34a4`	2020-08-10 17:03:54 +03:00
Marko Mäkelä	e685809a3b	MDEV-23397 Remove deprecated InnoDB options in 10.6	2020-08-04 12:51:59 +03:00
Marko Mäkelä	5203bc10f1	Merge 10.4 into 10.5	2020-03-21 11:37:10 +02:00
Marko Mäkelä	bd3c8f47cd	Merge 10.3 into 10.4	2020-03-20 22:06:55 +02:00
Marko Mäkelä	44298e4dea	Merge 10.2 into 10.3 Also, clean up the test innodb_gis.geometry a little further.	2020-03-20 18:12:17 +02:00
Marko Mäkelä	b034d708c8	MDEV-21549: Clean up the import/export tests Remove CREATE/DROP database. Remove some unnecessary suppressions, replacements, and SQL statements. Populate tables via have_sequence.inc to avoid the creation of explicit InnoDB record locks in INSERT...SELECT. This will remove some gaps in AUTO_INCREMENT values.	2020-03-20 16:34:15 +02:00
Marko Mäkelä	adb4117631	MDEV-21892: Assertion ...row_get_rec_trx_id... failed on SELECT btr_cur_upd_rec_in_place(): Invoke page_zip_rec_set_deleted() for ROW_FORMAT=COMPRESSED pages, so that the change will be written to the redo log. This part of crash recovery was broken in commit `08ba388713` (MDEV-12353).	2020-03-09 11:38:34 +02:00
Marko Mäkelä	802a6b0a33	Reduce innodb_log_buffer_size	2020-02-18 10:54:56 +02:00
Marko Mäkelä	5bea43f5e0	MDEV-12353: Deprecate and ignore innodb_log_compressed_pages page_zip_compress_write_log_no_data(): Remove. We no longer write the MLOG_ZIP_PAGE_COMPRESS_NO_DATA record. Instead, we will write MLOG_ZIP_PAGE_COMPRESS records.	2020-02-13 18:19:13 +02:00
Marko Mäkelä	28c89b7151	Merge 10.4 into 10.5	2019-12-16 07:47:17 +02:00
Marko Mäkelä	8fa759a576	Merge 10.3 into 10.4 We disable the MDEV-21189 test galera.galera_partition because it times out.	2019-12-13 17:30:37 +02:00
Marko Mäkelä	0a20e5ab77	Merge 10.2 into 10.3	2019-12-12 14:41:51 +02:00
Marko Mäkelä	b1f2d3a8c8	MDEV-21256: Replace the 64-bit LCG with a 32-bit Galois LFSR We should not need anywhere near 32 bits of entropy, so we might just limit ourselves to a 32-bit random number generator. Also, it might be cheaper to use exclusive-or, bit shifting and conditional jumps, instead of multiplication and addition. We use relaxed atomic operations on the global random number generator state in order in an attempt to silence any warnings about race conditions. There is an obvious race condition between the load and store in ut_rnd_gen(), but we do not think that it matters much that the state of the random number generator could 'stutter'. This change seems makes the 'uncompress_ops' nondeterministic in innodb_zip.cmp_per_index after the restart. It looks like there is an inherent race condition in the test, because the table could be opened for InnoDB statistics recalculation already before innodb_cmp_per_index_enabled was set. We might end up having uncompress_ops anywhere between 0 and 9, or perhaps even more. Let us remove that part of the test.	2019-12-10 16:59:34 +02:00
Marko Mäkelä	d3b2625ba0	MDEV-21259 Assertion failed in mtr_t::write() btr_free_externally_stored_field(): Pass w=mtr_t::OPT to note that the BTR_EXTERN_LEN is not necessarily changing when a multi-page ROW_FORMAT=COMPRESSED off-page column is being freed, and to allow redundant writes to the redo log to be optimized away. Ever since commit `56f6dab1d0` the refactored function mtr_t::write() asserts by default that the page contents is being changed.	2019-12-09 21:11:08 +02:00
Marko Mäkelä	ae90f8431b	Merge 10.4 into 10.5	2019-11-14 14:49:20 +02:00
Marko Mäkelä	746ee78535	MDEV-20949: Merge 10.3 into 10.4	2019-11-14 13:22:29 +02:00
Marko Mäkelä	4ded5fb9ac	MDEV-20949: Merge 10.2 into 10.3 In the test innodb.instant_alter,4k we would be flagging an error for too large row size. That error was previously only being reported if the table was being rebuilt. Thus, this merge is fixing a small omission in MDEV-11369 (instant ADD COLUMN).	2019-11-14 11:26:49 +02:00
Eugene Kosov	98694ab0cb	MDEV-20949 Stop issuing 'row size' error on DML Move row size check to early CREATE/ALTER TABLE phase. Stop checking on table open. dict_index_add_to_cache(): remove parameter 'strict', stop checking row size dict_index_t::record_size_info_t: this is a result of row size check operation create_table_info_t::row_size_is_acceptable(): performs row size check. Issues error or warning. Writes first overflow field to InnoDB log. create_table_info_t::create_table(): add row size check dict_index_t::record_size_info(): this is a refactored version of dict_index_t::rec_potentially_too_big(). New version doesn't change global state of a program but return all interesting info. And it's callers who decide how to handle row size overflow. dict_index_t::rec_potentially_too_big(): removed	2019-11-13 22:00:55 +07:00
Marko Mäkelä	b42294bc64	MDEV-19514 Defer change buffer merge until pages are requested We will remove the InnoDB background operation of merging buffered changes to secondary index leaf pages. Changes will only be merged as a result of an operation that accesses a secondary index leaf page, such as a SQL statement that performs a lookup via that index, or is modifying the index. Also ROLLBACK and some background operations, such as purging the history of committed transactions, or computing index cardinality statistics, can cause change buffer merge. Encryption key rotation will not perform change buffer merge. The motivation of this change is to simplify the I/O logic and to allow crash recovery to happen in the background (MDEV-14481). We also hope that this will reduce the number of "mystery" crashes due to corrupted data. Because change buffer merge will typically take place as a result of executing SQL statements, there should be a clearer connection between the crash and the SQL statements that were executed when the server crashed. In many cases, a slight performance improvement was observed. This is joint work with Thirunarayanan Balathandayuthapani and was tested by Axel Schwenke and Matthias Leich. The InnoDB monitor counter innodb_ibuf_merge_usec will be removed. On slow shutdown (innodb_fast_shutdown=0), we will continue to merge all buffered changes (and purge all undo log history). Two InnoDB configuration parameters will be changed as follows: innodb_disable_background_merge: Removed. This parameter existed only in debug builds. All change buffer merges will use synchronous reads. innodb_force_recovery will be changed as follows: * innodb_force_recovery=4 will be the same as innodb_force_recovery=3 (the change buffer merge cannot be disabled; it can only happen as a result of an operation that accesses a secondary index leaf page). The option used to be capable of corrupting secondary index leaf pages. Now that capability is removed, and innodb_force_recovery=4 becomes 'safe'. * innodb_force_recovery=5 (which essentially hard-wires SET GLOBAL TRANSACTION ISOLATION LEVEL READ UNCOMMITTED) becomes safe to use. Bogus data can be returned to SQL, but persistent InnoDB data files will not be corrupted further. * innodb_force_recovery=6 (ignore the redo log files) will be the only option that can potentially cause persistent corruption of InnoDB data files. Code changes: buf_page_t::ibuf_exist: New flag, to indicate whether buffered changes exist for a buffer pool page. Pages with pending changes can be returned by buf_page_get_gen(). Previously, the changes were always merged inside buf_page_get_gen() if needed. ibuf_page_exists(const buf_page_t&): Check if a buffered changes exist for an X-latched or read-fixed page. buf_page_get_gen(): Add the parameter allow_ibuf_merge=false. All callers that know that they may be accessing a secondary index leaf page must pass this parameter as allow_ibuf_merge=true, unless it does not matter for that caller whether all buffered changes have been applied. Assert that whenever allow_ibuf_merge holds, the page actually is a leaf page. Attempt change buffer merge only to secondary B-tree index leaf pages. btr_block_get(): Add parameter 'bool merge'. All callers of btr_block_get() should know whether the page could be a secondary index leaf page. If it is not, we should avoid consulting the change buffer bitmap to even consider a merge. This is the main interface to requesting index pages from the buffer pool. ibuf_merge_or_delete_for_page(), recv_recover_page(): Replace buf_page_get_known_nowait() with much simpler logic, because it is now guaranteed that that the block is x-latched or read-fixed. mlog_init_t::mark_ibuf_exist(): Renamed from mlog_init_t::ibuf_merge(). On crash recovery, we will no longer merge any buffered changes for the pages that we read into the buffer pool during the last batch of applying log records. buf_page_get_gen_known_nowait(), BUF_MAKE_YOUNG, BUF_KEEP_OLD: Remove. btr_search_guess_on_hash(): Merge buf_page_get_gen_known_nowait() to its only remaining caller. buf_page_make_young_if_needed(): Define as an inline function. Add the parameter buf_pool. buf_page_peek_if_young(), buf_page_peek_if_too_old(): Add the parameter buf_pool. fil_space_validate_for_mtr_commit(): Remove a bogus comment about background merge of the change buffer. btr_cur_open_at_rnd_pos_func(), btr_cur_search_to_nth_level_func(), btr_cur_open_at_index_side_func(): Use narrower data types and scopes. ibuf_read_merge_pages(): Replaces buf_read_ibuf_merge_pages(). Merge the change buffer by invoking buf_page_get_gen().	2019-10-11 17:28:15 +03:00
Sergei Golubchik	244f0e6dd8	Merge branch '10.3' into 10.4	2019-09-06 11:53:10 +02:00
Monty	a071e0e029	Merge branch '10.2' into 10.3	2019-09-03 13:17:32 +03:00
Monty	9cba6c5aa3	Updated mtr files to support different compiled in options This allows one to run the test suite even if any of the following options are changed: - character-set-server - collation-server - join-cache-level - log-basename - max-allowed-packet - optimizer-switch - query-cache-size and query-cache-type - skip-name-resolve - table-definition-cache - table-open-cache - Some innodb options etc Changes: - Don't print out the value of system variables as one can't depend on them to being constants. - Don't set global variables to 'default' as the default may not be the same as the test was started with if there was an additional option file. Instead save original value and reset it at end of test. - Test that depends on the latin1 character set should include default_charset.inc or set the character set to latin1 - Test that depends on the original optimizer switch, should include default_optimizer_switch.inc - Test that depends on the value of a specific system variable should set it in the test (like optimizer_use_condition_selectivity) - Split subselect3.test into subselect3.test and subselect3.inc to make it easier to set and reset system variables. - Added .opt files for test that required specfic options that could be changed by external configuration files. - Fixed result files in rockdsb & tokudb that had not been updated for a while.	2019-09-01 19:17:35 +03:00
Marko Mäkelä	7a3d34d645	Merge 10.3 into 10.4	2019-07-02 21:44:58 +03:00
Marko Mäkelä	e82fe21e3a	Merge 10.2 into 10.3	2019-07-02 17:46:22 +03:00
Marko Mäkelä	41e02d149e	Remove the stray test innodb_zip.16k This was accidentally broken in the parent commit.	2019-06-25 15:55:55 +03:00
Marko Mäkelä	a12fb5d0eb	Replace innodb_zip.16k with innodb_zip.page_size Also, move part of the test back to innodb.innodb_mysql and another part to a new test innodb.purge. Last but not least, merge the tests innodb_zip.4k and innodb_zip.8k to innodb_zip.page_size.	2019-06-24 17:07:20 +03:00
Marko Mäkelä	517486963b	Adjust tests after commit `b5615eff0d`	2019-04-02 11:03:28 +03:00
Thirunarayanan Balathandayuthapani	c0f47a4a58	MDEV-12026: Implement innodb_checksum_algorithm=full_crc32 MariaDB data-at-rest encryption (innodb_encrypt_tables) had repurposed the same unused data field that was repurposed in MySQL 5.7 (and MariaDB 10.2) for the Split Sequence Number (SSN) field of SPATIAL INDEX. Because of this, MariaDB was unable to support encryption on SPATIAL INDEX pages. Furthermore, InnoDB page checksums skipped some bytes, and there are multiple variations and checksum algorithms. By default, InnoDB accepts all variations of all algorithms that ever existed. This unnecessarily weakens the page checksums. We hereby introduce two more innodb_checksum_algorithm variants (full_crc32, strict_full_crc32) that are special in a way: When either setting is active, newly created data files will carry a flag (fil_space_t::full_crc32()) that indicates that all pages of the file will use a full CRC-32C checksum over the entire page contents (excluding the bytes where the checksum is stored, at the very end of the page). Such files will always use that checksum, no matter what the parameter innodb_checksum_algorithm is assigned to. For old files, the old checksum algorithms will continue to be used. The value strict_full_crc32 will be equivalent to strict_crc32 and the value full_crc32 will be equivalent to crc32. ROW_FORMAT=COMPRESSED tables will only use the old format. These tables do not support new features, such as larger innodb_page_size or instant ADD/DROP COLUMN. They may be deprecated in the future. We do not want an unnecessary file format change for them. The new full_crc32() format also cleans up the MariaDB tablespace flags. We will reserve flags to store the page_compressed compression algorithm, and to store the compressed payload length, so that checksum can be computed over the compressed (and possibly encrypted) stream and can be validated without decrypting or decompressing the page. In the full_crc32 format, there no longer are separate before-encryption and after-encryption checksums for pages. The single checksum is computed on the page contents that is written to the file. We do not make the new algorithm the default for two reasons. First, MariaDB 10.4.2 was a beta release, and the default values of parameters should not change after beta. Second, we did not yet implement the full_crc32 format for page_compressed pages. This will be fixed in MDEV-18644. This is joint work with Marko Mäkelä.	2019-02-19 18:50:19 +02:00
Marko Mäkelä	e80bcd7f64	Merge 10.3 into 10.4	2019-02-05 12:48:02 +02:00
Marko Mäkelä	36be0a5aef	MDEV-18399 Recognize the deprecated parameters innodb_file_format, innodb_large_prefix The parameters innodb_file_format and innodb_large_prefix were overridden in the Debian-distributed configuration files, because the default values of these parameters between MariaDB 5.5 and MariaDB 10.2 did not make any sense. To allow a more seamless upgrade from MariaDB 10.1 to later versions, let InnoDB recognize the parameters innodb_file_format and innodb_large_prefix and issue deprecation warnings for them if they are specified. A deprecation period of only one major release (one year between the MariaDB 10.2 and 10.3 releases) is insufficient for these widely used parameters.	2019-01-28 17:58:14 +02:00
Marko Mäkelä	6dbc50a376	Merge 10.3 into 10.4	2018-12-13 22:05:34 +02:00
Marko Mäkelä	f6e16bdc62	Merge 10.2 into 10.3	2018-12-13 21:58:35 +02:00
Marko Mäkelä	2e5aea4bab	Merge 10.1 into 10.2	2018-12-13 15:47:38 +02:00
Marko Mäkelä	eea0c3c3e7	MDEV-17750: Remove unnecessary rec_get_offsets() in IMPORT TABLESPACE row_import_set_sys_max_row_id(): Change the return type to void, and access the first column (DB_ROW_ID) directly.	2018-11-16 20:37:00 +02:00
Marko Mäkelä	853a0a4368	MDEV-13564: Set innodb_safe_truncate=ON by default The setting innodb_safe_truncate=ON reduces compatibility with older versions of MariaDB and backup tools in two ways. First, we will be writing TRX_UNDO_RENAME_TABLE records, which older versions do not know about. These records could be misinterpreted if a DDL transaction was recovered and would be rolled back. Such rollback is only possible if the server was killed while an incomplete DDL transaction was persisted. On transaction completion, the insert_undo log pages would only be repurposed for new undo log allocations, and their contents would not matter. So, older versions will not have a problem with innodb_safe_truncate=ON if the server was shut down cleanly. Second, to prevent such recovery failure, innodb_safe_truncate=ON will cause a modification of the redo log format identifier, which will prevent older versions from starting up after a crash. MariaDB Server versions older than 10.2.13 will refuse to start up altogether, even after clean shutdown. A server restart with innodb_safe_truncate=OFF will restore compatibility with older server and backup versions.	2018-10-17 17:35:38 +03:00
Marko Mäkelä	6319c0b541	MDEV-13564: Replace innodb_unsafe_truncate with innodb_safe_truncate Rename the 10.2-specific configuration option innodb_unsafe_truncate to innodb_safe_truncate, and invert its value. The default (for now) is innodb_safe_truncate=OFF, to avoid disrupting users with an undo and redo log format change within a Generally Available (GA) release series.	2018-10-11 15:10:13 +03:00
Marko Mäkelä	3448ceb02a	MDEV-13564: Implement innodb_unsafe_truncate=ON for compatibility While MariaDB Server 10.2 is not really guaranteed to be compatible with Percona XtraBackup 2.4 (for example, the MySQL 5.7 undo log format change that could be present in XtraBackup, but was reverted from MariaDB in MDEV-12289), we do not want to disrupt users who have deployed xtrabackup and MariaDB Server 10.2 in their environments. With this change, MariaDB 10.2 will continue to use the backup-unsafe TRUNCATE TABLE code, so that neither the undo log nor the redo log formats will change in an incompatible way. Undo tablespace truncation will keep using the redo log only. Recovery or backup with old code will fail to shrink the undo tablespace files, but the contents will be recovered just fine. In the MariaDB Server 10.2 series only, we introduce the configuration parameter innodb_unsafe_truncate and make it ON by default. To allow MariaDB Backup (mariabackup) to work properly with TRUNCATE TABLE operations, use loose_innodb_unsafe_truncate=OFF. MariaDB Server 10.3.10 and later releases will always use the backup-safe TRUNCATE TABLE, and this parameter will not be added there. recv_recovery_rollback_active(): Skip row_mysql_drop_garbage_tables() unless innodb_unsafe_truncate=OFF. It is too unsafe to drop orphan tables if RENAME operations are not transactional within InnoDB. LOG_HEADER_FORMAT_10_3: Replaces LOG_HEADER_FORMAT_CURRENT. log_init(), log_group_file_header_flush(), srv_prepare_to_delete_redo_log_files(), innobase_start_or_create_for_mysql(): Choose the redo log format and subformat based on the value of innodb_unsafe_truncate.	2018-10-11 08:17:04 +03:00
Marko Mäkelä	5a1868b58d	MDEV-13564 Mariabackup does not work with TRUNCATE This is a merge from 10.2, but the 10.2 version of this will not be pushed into 10.2 yet, because the 10.2 version would include backports of MDEV-14717 and MDEV-14585, which would introduce a crash recovery regression: Tables could be lost on table-rebuilding DDL operations, such as ALTER TABLE, OPTIMIZE TABLE or this new backup-friendly TRUNCATE TABLE. The test innodb.truncate_crash occasionally loses the table due to the following bug: MDEV-17158 log_write_up_to() sometimes fails	2018-09-07 22:15:06 +03:00
Marko Mäkelä	055a3334ad	MDEV-13564 Mariabackup does not work with TRUNCATE Implement undo tablespace truncation via normal redo logging. Implement TRUNCATE TABLE as a combination of RENAME to #sql-ib name, CREATE, and DROP. Note: Orphan #sql-ib.ibd may be left behind if MariaDB Server 10.2 is killed before the DROP operation is committed. If MariaDB Server 10.2 is killed during TRUNCATE, it is also possible that the old table was renamed to #sql-ib.ibd but the data dictionary will refer to the table using the original name. In MariaDB Server 10.3, RENAME inside InnoDB is transactional, and #sql-* tables will be dropped on startup. So, this new TRUNCATE will be fully crash-safe in 10.3. ha_mroonga::wrapper_truncate(): Pass table options to the underlying storage engine, now that ha_innobase::truncate() will need them. rpl_slave_state::truncate_state_table(): Before truncating mysql.gtid_slave_pos, evict any cached table handles from the table definition cache, so that there will be no stale references to the old table after truncating. == TRUNCATE TABLE == WL#6501 in MySQL 5.7 introduced separate log files for implementing atomic and crash-safe TRUNCATE TABLE, instead of using the InnoDB undo and redo log. Some convoluted logic was added to the InnoDB crash recovery, and some extra synchronization (including a redo log checkpoint) was introduced to make this work. This synchronization has caused performance problems and race conditions, and the extra log files cannot be copied or applied by external backup programs. In order to support crash-upgrade from MariaDB 10.2, we will keep the logic for parsing and applying the extra log files, but we will no longer generate those files in TRUNCATE TABLE. A prerequisite for crash-safe TRUNCATE is a crash-safe RENAME TABLE (with full redo and undo logging and proper rollback). This will be implemented in MDEV-14717. ha_innobase::truncate(): Invoke RENAME, create(), delete_table(). Because RENAME cannot be fully rolled back before MariaDB 10.3 due to missing undo logging, add some explicit rename-back in case the operation fails. ha_innobase::delete(): Introduce a variant that takes sqlcom as a parameter. In TRUNCATE TABLE, we do not want to touch any FOREIGN KEY constraints. ha_innobase::create(): Add the parameters file_per_table, trx. In TRUNCATE, the new table must be created in the same transaction that renames the old table. create_table_info_t::create_table_info_t(): Add the parameters file_per_table, trx. row_drop_table_for_mysql(): Replace a bool parameter with sqlcom. row_drop_table_after_create_fail(): New function, wrapping row_drop_table_for_mysql(). dict_truncate_index_tree_in_mem(), fil_truncate_tablespace(), fil_prepare_for_truncate(), fil_reinit_space_header_for_table(), row_truncate_table_for_mysql(), TruncateLogger, row_truncate_prepare(), row_truncate_rollback(), row_truncate_complete(), row_truncate_fts(), row_truncate_update_system_tables(), row_truncate_foreign_key_checks(), row_truncate_sanity_checks(): Remove. row_upd_check_references_constraints(): Remove a check for TRUNCATE, now that the table is no longer truncated in place. The new test innodb.truncate_foreign uses DEBUG_SYNC to cover some race-condition like scenarios. The test innodb-innodb.truncate does not use any synchronization. We add a redo log subformat to indicate backup-friendly format. MariaDB 10.4 will remove support for the old TRUNCATE logging, so crash-upgrade from old 10.2 or 10.3 to 10.4 will involve limitations. == Undo tablespace truncation == MySQL 5.7 implements undo tablespace truncation. It is only possible when innodb_undo_tablespaces is set to at least 2. The logging is implemented similar to the WL#6501 TRUNCATE, that is, using separate log files and a redo log checkpoint. We can simply implement undo tablespace truncation within a single mini-transaction that reinitializes the undo log tablespace file. Unfortunately, due to the redo log format of some operations, currently, the total redo log written by undo tablespace truncation will be more than the combined size of the truncated undo tablespace. It should be acceptable to have a little more than 1 megabyte of log in a single mini-transaction. This will be fixed in MDEV-17138 in MariaDB Server 10.4. recv_sys_t: Add truncated_undo_spaces[] to remember for which undo tablespaces a MLOG_FILE_CREATE2 record was seen. namespace undo: Remove some unnecessary declarations. fil_space_t::is_being_truncated: Document that this flag now only applies to undo tablespaces. Remove some references. fil_space_t::is_stopping(): Do not refer to is_being_truncated. This check is for tablespaces of tables. Potentially used tablespaces are never truncated any more. buf_dblwr_process(): Suppress the out-of-bounds warning for undo tablespaces. fil_truncate_log(): Write a MLOG_FILE_CREATE2 with a nonzero page number (new size of the tablespace in pages) to inform crash recovery that the undo tablespace size has been reduced. fil_op_write_log(): Relax assertions, so that MLOG_FILE_CREATE2 can be written for undo tablespaces (without .ibd file suffix) for a nonzero page number. os_file_truncate(): Add the parameter allow_shrink=false so that undo tablespaces can actually be shrunk using this function. fil_name_parse(): For undo tablespace truncation, buffer MLOG_FILE_CREATE2 in truncated_undo_spaces[]. recv_read_in_area(): Avoid reading pages for which no redo log records remain buffered, after recv_addr_trim() removed them. trx_rseg_header_create(): Add a FIXME comment that we could write much less redo log. trx_undo_truncate_tablespace(): Reinitialize the undo tablespace in a single mini-transaction, which will be flushed to the redo log before the file size is trimmed. recv_addr_trim(): Discard any redo logs for pages that were logged after the new end of a file, before the truncation LSN. If the rec_list becomes empty, reduce n_addrs. After removing any affected records, actually truncate the file. recv_apply_hashed_log_recs(): Invoke recv_addr_trim() right before applying any log records. The undo tablespace files must be open at this point. buf_flush_or_remove_pages(), buf_flush_dirty_pages(), buf_LRU_flush_or_remove_pages(): Add a parameter for specifying the number of the first page to flush or remove (default 0). trx_purge_initiate_truncate(): Remove the log checkpoints, the extra logging, and some unnecessary crash points. Merge the code from trx_undo_truncate_tablespace(). First, flush all to-be-discarded pages (beyond the new end of the file), then trim the space->size to make the page allocation deterministic. At the only remaining crash injection point, flush the redo log, so that the recovery can be tested.	2018-09-07 22:10:02 +03:00
Marko Mäkelä	d71a8855ee	Merge 10.2 to 10.3 Temporarily disable main.cte_recursive due to hang in an added test related to MDEV-15575.	2018-04-19 15:23:21 +03:00

1 2 3

105 Commits