mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-08-07 00:04:31 +03:00

Author	SHA1	Message	Date
Marko Mäkelä	49a6baec56	Merge 10.11 into 11.4	2025-03-03 11:07:56 +02:00
Marko Mäkelä	0c204bfb87	Merge 10.6 into 10.11	2025-02-25 10:23:24 +02:00
Marko Mäkelä	c07e355c40	MDEV-36015: unrepresentable value in row_parse_int() row_parse_int(): Refactor the code and define the function static in one compilation unit. For any negative values, we must return 0. row_search_get_max_rec(), row_search_max_autoinc(): Moved to the same compilation unit with row_parse_int(). We also remove a work-around of an internal compiler error when targeting ARMv8 on GCC 4.8.5, a compiler that is no longer supported. Reviewed by: Debarun Banerjee	2025-02-13 15:10:53 +01:00
Sergei Golubchik	a8a22b7af2	support 'alter online table t1 page_checksum=0'	2023-08-15 10:16:11 +02:00
Marko Mäkelä	618d820646	Merge 10.7 into 10.8	2022-10-13 10:42:41 +03:00
Marko Mäkelä	6dc157f8a6	Merge 10.5 into 10.6	2022-10-06 09:22:39 +03:00
Marko Mäkelä	f600690c6b	MDEV-29710: Skip some more tests on Valgrind	2022-10-05 20:37:54 +03:00
Marko Mäkelä	358921ce32	MDEV-26938 Support descending indexes internally in InnoDB This is loosely based on the InnoDB changes in mysql/mysql-server@97fd8b1b69 that I had developed in 2015 or 2016. For each B-tree key field, we will allow a flag ASC/DESC to be associated. When PRIMARY KEY fields are internally appended to secondary indexes, the ASC/DESC attribute will be inherited, so that covering index scans will work as expected. Note: Until the subsequent commit, the DESC attribute will be ignored (no HA_REVERSE_SORT flag will be written to .frm files). dict_field_t::descending: A new flag to denote descending order. cmp_data(), cmp_dfield_dfield(): Add a new parameter descending. cmp_dtuple_rec(), cmp_dtuple_rec_with_match(): Add a parameter "index". dtuple_coll_eq(): Replaces dtuple_coll_cmp(). cmp_dfield_dfield_eq_prefix(): Replaces cmp_dfield_dfield_like_prefix(). dict_index_t::is_btree(): Check whether the index is a regular B-tree index (not SPATIAL, FULLTEXT, the ibuf.index, or a corrupted index). btr_cur_search_to_nth_level_func(): Only attempt to use the adaptive hash index if index->is_btree(). This function may also be invoked on ibuf.index, and cmp_dtuple_rec_with_match_bytes() will no longer work on ibuf.index because it assumes that the index and record fields exactly match. The ibuf.index is a special variadic index tree. Thanks to Thirunarayanan Balathandayuthapani for fixing some bugs: MDEV-27439, MDEV-27374/MDEV-27445.	2022-01-26 18:43:05 +01:00
Marko Mäkelä	3cef4f8f0f	MDEV-515 Reduce InnoDB undo logging for insert into empty table We implement an idea that was suggested by Michael 'Monty' Widenius in October 2017: When InnoDB is inserting into an empty table or partition, we can write a single undo log record TRX_UNDO_EMPTY, which will cause ROLLBACK to clear the table. For this to work, the insert into an empty table or partition must be covered by an exclusive table lock that will be held until the transaction has been committed or rolled back, or the INSERT operation has been rolled back (and the table is empty again), in lock_table_x_unlock(). Clustered index records that are covered by the TRX_UNDO_EMPTY record will carry DB_TRX_ID=0 and DB_ROLL_PTR=1<<55, and thus they cannot be distinguished from what MDEV-12288 leaves behind after purging the history of row-logged operations. Concurrent non-locking reads must be adjusted: If the read view was created before the INSERT into an empty table, then we must continue to imagine that the table is empty, and not try to read any records. If the read view was created after the INSERT was committed, then all records must be visible normally. To implement this, we introduce the field dict_table_t::bulk_trx_id. This special handling only applies to the very first INSERT statement of a transaction for the empty table or partition. If a subsequent statement in the transaction is modifying the initially empty table again, we must enable row-level undo logging, so that we will be able to roll back to the start of the statement in case of an error (such as duplicate key). INSERT IGNORE will continue to use row-level logging and locking, because implementing it would require the ability to roll back the latest row. Since the undo log that we write only allows us to roll back the entire statement, we cannot support INSERT IGNORE. We will introduce a handler::extra() parameter HA_EXTRA_IGNORE_INSERT to indicate to storage engines that INSERT IGNORE is being executed. 
In many test cases, we add an extra record to the table, so that during the 'interesting' part of the test, row-level locking and logging will be used. Replicas will continue to use row-level logging and locking until MDEV-24622 has been addressed. Likewise, this optimization will be disabled in Galera cluster until MDEV-24623 enables it. dict_table_t::bulk_trx_id: The latest active or committed transaction that initiated an insert into an empty table or partition. Protected by exclusive table lock and a clustered index leaf page latch. ins_node_t::bulk_insert: Whether bulk insert was initiated. trx_t::mod_tables: Use C++11 style accessors (emplace instead of insert). Unlike earlier, this collection will cover also temporary tables. trx_mod_table_time_t: Add start_bulk_insert(), end_bulk_insert(), is_bulk_insert(), was_bulk_insert(). trx_undo_report_row_operation(): Before accessing any undo log pages, invoke trx->mod_tables.emplace() in order to determine whether undo logging was disabled, or whether this is the first INSERT and we are supposed to write a TRX_UNDO_EMPTY record. row_ins_clust_index_entry_low(): If we are inserting into an empty clustered index leaf page, set the ins_node_t::bulk_insert flag for the subsequent trx_undo_report_row_operation() call. lock_rec_insert_check_and_lock(), lock_prdt_insert_check_and_lock(): Remove the redundant parameter 'flags' that can be checked in the caller. btr_cur_ins_lock_and_undo(): Simplify the logic. Correctly write DB_TRX_ID,DB_ROLL_PTR after invoking trx_undo_report_row_operation(). trx_mark_sql_stat_end(), ha_innobase::extra(HA_EXTRA_IGNORE_INSERT), ha_innobase::external_lock(): Invoke trx_t::end_bulk_insert() so that the next statement will not be covered by table-level undo logging. ReadView::changes_visible(trx_id_t) const: New accessor for the case where the trx_id_t is not read from a potentially corrupted index page but directly from the memory. In this case, we can skip a sanity check. 
row_sel(), row_sel_try_search_shortcut(), row_search_mvcc(), row_sel_try_search_shortcut_for_mysql(), row_merge_read_clustered_index(): Check dict_table_t::bulk_trx_id. row_sel_clust_sees(): Replaces lock_clust_rec_cons_read_sees(). lock_sec_rec_cons_read_sees(): Replaced with lower-level code. btr_root_page_init(): Refactored from btr_create(). dict_index_t::clear(), dict_table_t::clear(): Empty an index or table, for the ROLLBACK of an INSERT operation. ROW_T_EMPTY, ROW_OP_EMPTY: Note a concurrent ROLLBACK of an INSERT into an empty table. This is joint work with Thirunarayanan Balathandayuthapani, who created a working prototype. Thanks to Matthias Leich for extensive testing.	2021-01-25 18:41:27 +02:00
Marko Mäkelä	571d4137bf	Add IMPORT test for MDEV-12123 Page contains nonzero PAGE_MAX_TRX_ID	2017-04-19 08:17:41 +03:00
Sergei Golubchik	d6d994bf42	remove two redundant *.inc files to restart a server namely, restart_mysqld_with_option.inc and kill_and_restart_mysqld.inc - use restart_mysqld.inc instead. Also remove innodb_wl6501_crash_stripped.inc that wasn't used anywhere.	2017-03-31 19:28:58 +02:00
Marko Mäkelä	c64edc6b83	MDEV-6076: Preserve PAGE_ROOT_AUTO_INC when emptying pages. Thanks to Zhangyuan from Alibaba for pointing out this bug. btr_page_empty(): When a clustered index root page is emptied, preserve PAGE_ROOT_AUTO_INC. This would occur during a page split. page_create_empty(): Preserve PAGE_ROOT_AUTO_INC when a clustered index root page becomes empty. Use a faster method for writing the field. page_zip_copy_recs(): Reset PAGE_MAX_TRX_ID when copying clustered index pages. We must clear the field when the root page was a leaf page and it is being split, so that PAGE_MAX_TRX_ID will continue to be 0 in clustered index non-root pages. page_create_zip(): Add debug assertions for validating PAGE_MAX_TRX_ID and PAGE_ROOT_AUTO_INC.	2016-12-16 10:26:41 +02:00
Marko Mäkelä	cb0ce5c2e9	MDEV-6076: Optimize the test. Remove unnecessary restarts by testing multiple tables across a restart. This change almost halves the execution time. Some further restarts could be removed with additional effort.	2016-12-16 10:24:54 +02:00
Marko Mäkelä	8777458a6e	MDEV-6076 Persistent AUTO_INCREMENT for InnoDB This should be functionally equivalent to WL#6204 in MySQL 8.0.0, with the notable difference that the file format changes are limited to repurposing a previously unused data field in B-tree pages. For persistent InnoDB tables, write the last used AUTO_INCREMENT value to the root page of the clustered index, in the previously unused (0) PAGE_MAX_TRX_ID field, now aliased as PAGE_ROOT_AUTO_INC. Unlike some other previously unused InnoDB data fields, this one was actually always zero-initialized, at least since MySQL 3.23.49. The writes to PAGE_ROOT_AUTO_INC are protected by SX or X latch on the root page. The SX latch will allow concurrent read access to the root page. (The field PAGE_ROOT_AUTO_INC will only be read on the first-time call to ha_innobase::open() from the SQL layer. The PAGE_ROOT_AUTO_INC can only be updated when executing SQL, so read/write races are not possible.) During INSERT, the PAGE_ROOT_AUTO_INC is updated by the low-level function btr_cur_search_to_nth_level(), adding no extra page access. [Adaptive hash index lookup will be disabled during INSERT.] If some rare UPDATE modifies an AUTO_INCREMENT column, the PAGE_ROOT_AUTO_INC will be adjusted in a separate mini-transaction in ha_innobase::update_row(). When a page is reorganized, we have to preserve the PAGE_ROOT_AUTO_INC field. During ALTER TABLE, the initial AUTO_INCREMENT value will be copied from the table. ALGORITHM=COPY and online log apply in LOCK=NONE will update PAGE_ROOT_AUTO_INC in real time. innodb_col_no(): Determine the dict_table_t::cols[] element index corresponding to a Field of a non-virtual column. (The MySQL 5.7 implementation of virtual columns breaks the 1:1 relationship between Field::field_index and dict_table_t::cols[]. Virtual columns are omitted from dict_table_t::cols[]. Therefore, we must translate the field_index of AUTO_INCREMENT columns into an index of dict_table_t::cols[].) 
Upgrade from old data files: By default, the AUTO_INCREMENT sequence in old data files would appear to be reset, because PAGE_MAX_TRX_ID or PAGE_ROOT_AUTO_INC would contain the value 0 in each clustered index page. In new data files, PAGE_ROOT_AUTO_INC can only be 0 if the table is empty or does not contain any AUTO_INCREMENT column. For backward compatibility, we use the old method of SELECT MAX(auto_increment_column) for initializing the sequence. btr_read_autoinc(): Read the AUTO_INCREMENT sequence from a new-format data file. btr_read_autoinc_with_fallback(): A variant of btr_read_autoinc() that will resort to reading MAX(auto_increment_column) for data files that did not use AUTO_INCREMENT yet. It was manually tested that during the execution of innodb.autoinc_persist the compatibility logic is not activated (for new files, PAGE_ROOT_AUTO_INC is never 0 in nonempty clustered index root pages). initialize_auto_increment(): Replaces ha_innobase::innobase_initialize_autoinc(). This initializes the AUTO_INCREMENT metadata. Only called from ha_innobase::open(). ha_innobase::info_low(): Do not try to lazily initialize dict_table_t::autoinc. It must already have been initialized by ha_innobase::open() or ha_innobase::create(). Note: The adjustments to class ha_innopart were not tested, because the source code (native InnoDB partitioning) is not being compiled.	2016-12-16 09:19:19 +02:00

14 Commits
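
The upgrade fallback described in the MDEV-6076 commit message (8777458a6e) can be sketched roughly as follows. This is a toy model under stated assumptions, not InnoDB code: ToyClusteredIndex, toy_read_autoinc() and toy_read_autoinc_with_fallback() are hypothetical names that only mirror the roles the message gives to PAGE_ROOT_AUTO_INC, btr_read_autoinc() and btr_read_autoinc_with_fallback(). The real functions work on buffer pool pages and mini-transactions, which are omitted here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a clustered index: the counter persisted in the
// root page, plus the AUTO_INCREMENT column values of the stored rows.
struct ToyClusteredIndex {
  std::uint64_t page_root_auto_inc;           // 0 means "never written"
  std::vector<std::uint64_t> autoinc_values;  // column values of existing rows
};

// Illustrative analogue of btr_read_autoinc(): trust the persisted field.
std::uint64_t toy_read_autoinc(const ToyClusteredIndex& index) {
  return index.page_root_auto_inc;
}

// Illustrative analogue of btr_read_autoinc_with_fallback(): if the persisted
// field is 0 (an old-format file, or an empty table), fall back to the
// equivalent of SELECT MAX(auto_increment_column).
std::uint64_t toy_read_autoinc_with_fallback(const ToyClusteredIndex& index) {
  std::uint64_t autoinc = toy_read_autoinc(index);
  if (autoinc != 0) {
    return autoinc;  // new-format file: the persisted value wins
  }
  for (std::uint64_t value : index.autoinc_values) {
    if (value > autoinc) {
      autoinc = value;  // running maximum over the existing rows
    }
  }
  return autoinc;  // stays 0 for an empty table
}
```

In this sketch the scan over existing rows happens only when the persisted field is 0, which matches the message's statement that for new files PAGE_ROOT_AUTO_INC is never 0 in nonempty clustered index root pages.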