1
0
mirror of https://github.com/MariaDB/server.git synced 2025-08-07 00:04:31 +03:00
Commit Graph

14 Commits

Author SHA1 Message Date
Marko Mäkelä
49a6baec56 Merge 10.11 into 11.4 2025-03-03 11:07:56 +02:00
Marko Mäkelä
0c204bfb87 Merge 10.6 into 10.11 2025-02-25 10:23:24 +02:00
Marko Mäkelä
c07e355c40 MDEV-36015: unrepresentable value in row_parse_int()
row_parse_int(): Refactor the code and define the function static in
one compilation unit. For any negative values, we must return 0.

row_search_get_max_rec(), row_search_max_autoinc(): Moved to the same
compilation unit with row_parse_int().

We also remove a work-around of an internal compiler error when
targeting ARMv8 on GCC 4.8.5, a compiler that is no longer supported.

Reviewed by: Debarun Banerjee
2025-02-13 15:10:53 +01:00
Sergei Golubchik
a8a22b7af2 support 'alter online table t1 page_checksum=0' 2023-08-15 10:16:11 +02:00
Marko Mäkelä
618d820646 Merge 10.7 into 10.8 2022-10-13 10:42:41 +03:00
Marko Mäkelä
6dc157f8a6 Merge 10.5 into 10.6 2022-10-06 09:22:39 +03:00
Marko Mäkelä
f600690c6b MDEV-29710: Skip some more tests on Valgrind 2022-10-05 20:37:54 +03:00
Marko Mäkelä
358921ce32 MDEV-26938 Support descending indexes internally in InnoDB
This is loosely based on the InnoDB changes in
mysql/mysql-server@97fd8b1b69
that I had developed in 2015 or 2016.

For each B-tree key field, we will allow a flag ASC/DESC to be associated.
When PRIMARY KEY fields are internally appended to secondary indexes,
the ASC/DESC attribute will be inherited, so that covering index scans
will work as expected.

Note: Until the subsequent commit, the DESC attribute will be ignored
(no HA_REVERSE_SORT flag will be written to .frm files).

dict_field_t::descending: A new flag to denote descending order.

cmp_data(), cmp_dfield_dfield(): Add a new parameter descending.

cmp_dtuple_rec(), cmp_dtuple_rec_with_match(): Add a parameter "index".

dtuple_coll_eq(): Replaces dtuple_coll_cmp().

cmp_dfield_dfield_eq_prefix(): Replaces cmp_dfield_dfield_like_prefix().

dict_index_t::is_btree(): Check whether the index is a regular
B-tree index (not SPATIAL, FULLTEXT, or the ibuf.index,
or a corrupted index.

btr_cur_search_to_nth_level_func(): Only attempt to use
the adaptive hash index if index->is_btree().
This function may also be invoked on ibuf.index, and
cmp_dtuple_rec_with_match_bytes() will no longer work on ibuf.index
because it assumes that the index and record fields exactly match.
The ibuf.index is a special variadic index tree.

Thanks to Thirunarayanan Balathandayuthapani for fixing some bugs:
MDEV-27439, MDEV-27374/MDEV-27445.
2022-01-26 18:43:05 +01:00
Marko Mäkelä
3cef4f8f0f MDEV-515 Reduce InnoDB undo logging for insert into empty table
We implement an idea that was suggested by Michael 'Monty' Widenius
in October 2017: When InnoDB is inserting into an empty table or partition,
we can write a single undo log record TRX_UNDO_EMPTY, which will cause
ROLLBACK to clear the table.

For this to work, the insert into an empty table or partition must be
covered by an exclusive table lock that will be held until the transaction
has been committed or rolled back, or the INSERT operation has been
rolled back (and the table is empty again), in lock_table_x_unlock().

Clustered index records that are covered by the TRX_UNDO_EMPTY record
will carry DB_TRX_ID=0 and DB_ROLL_PTR=1<<55, and thus they cannot
be distinguished from what MDEV-12288 leaves behind after purging the
history of row-logged operations.

Concurrent non-locking reads must be adjusted: If the read view was
created before the INSERT into an empty table, then we must continue
to imagine that the table is empty, and not try to read any records.
If the read view was created after the INSERT was committed, then
all records must be visible normally. To implement this, we introduce
the field dict_table_t::bulk_trx_id.

This special handling only applies to the very first INSERT statement
of a transaction for the empty table or partition. If a subsequent
statement in the transaction is modifying the initially empty table again,
we must enable row-level undo logging, so that we will be able to
roll back to the start of the statement in case of an error (such as
duplicate key).

INSERT IGNORE will continue to use row-level logging and locking, because
implementing it would require the ability to roll back the latest row.
Since the undo log that we write only allows us to roll back the entire
statement, we cannot support INSERT IGNORE. We will introduce a
handler::extra() parameter HA_EXTRA_IGNORE_INSERT to indicate to storage
engines that INSERT IGNORE is being executed.

In many test cases, we add an extra record to the table, so that during
the 'interesting' part of the test, row-level locking and logging will
be used.

Replicas will continue to use row-level logging and locking until
MDEV-24622 has been addressed. Likewise, this optimization will be
disabled in Galera cluster until MDEV-24623 enables it.

dict_table_t::bulk_trx_id: The latest active or committed transaction
that initiated an insert into an empty table or partition.
Protected by exclusive table lock and a clustered index leaf page latch.

ins_node_t::bulk_insert: Whether bulk insert was initiated.

trx_t::mod_tables: Use C++11 style accessors (emplace instead of insert).
Unlike earlier, this collection will cover also temporary tables.

trx_mod_table_time_t: Add start_bulk_insert(), end_bulk_insert(),
is_bulk_insert(), was_bulk_insert().

trx_undo_report_row_operation(): Before accessing any undo log pages,
invoke trx->mod_tables.emplace() in order to determine whether undo
logging was disabled, or whether this is the first INSERT and we are
supposed to write a TRX_UNDO_EMPTY record.

row_ins_clust_index_entry_low(): If we are inserting into an empty
clustered index leaf page, set the ins_node_t::bulk_insert flag for
the subsequent trx_undo_report_row_operation() call.

lock_rec_insert_check_and_lock(), lock_prdt_insert_check_and_lock():
Remove the redundant parameter 'flags' that can be checked in the caller.

btr_cur_ins_lock_and_undo(): Simplify the logic. Correctly write
DB_TRX_ID,DB_ROLL_PTR after invoking trx_undo_report_row_operation().

trx_mark_sql_stat_end(), ha_innobase::extra(HA_EXTRA_IGNORE_INSERT),
ha_innobase::external_lock(): Invoke trx_t::end_bulk_insert() so that
the next statement will not be covered by table-level undo logging.

ReadView::changes_visible(trx_id_t) const: New accessor for the case
where the trx_id_t is not read from a potentially corrupted index page
but directly from the memory. In this case, we can skip a sanity check.

row_sel(), row_sel_try_search_shortcut(), row_search_mvcc():
row_sel_try_search_shortcut_for_mysql(),
row_merge_read_clustered_index(): Check dict_table_t::bulk_trx_id.

row_sel_clust_sees(): Replaces lock_clust_rec_cons_read_sees().

lock_sec_rec_cons_read_sees(): Replaced with lower-level code.

btr_root_page_init(): Refactored from btr_create().

dict_index_t::clear(), dict_table_t::clear(): Empty an index or table,
for the ROLLBACK of an INSERT operation.

ROW_T_EMPTY, ROW_OP_EMPTY: Note a concurrent ROLLBACK of an INSERT
into an empty table.

This is joint work with Thirunarayanan Balathandayuthapani,
who created a working prototype.
Thanks to Matthias Leich for extensive testing.
2021-01-25 18:41:27 +02:00
Marko Mäkelä
571d4137bf Add IMPORT test for MDEV-12123 Page contains nonzero PAGE_MAX_TRX_ID 2017-04-19 08:17:41 +03:00
Sergei Golubchik
d6d994bf42 remove two redundant *.inc files to restart a server
namely, restart_mysqld_with_option.inc and kill_and_restart_mysqld.inc -
use restart_mysqld.inc instead.

Also remove innodb_wl6501_crash_stripped.inc that wasn't used anywhere.
2017-03-31 19:28:58 +02:00
Marko Mäkelä
c64edc6b83 MDEV-6076: Preserve PAGE_ROOT_AUTO_INC when emptying pages.
Thanks to Zhangyuan from Alibaba for pointing out this bug.

btr_page_empty(): When a clustered index root page is emptied,
preserve PAGE_ROOT_AUTO_INC. This would occur during a page split.

page_create_empty(): Preserve PAGE_ROOT_AUTO_INC when a clustered
index root page becomes empty. Use a faster method for writing
the field.

page_zip_copy_recs(): Reset PAGE_MAX_TRX_ID when copying
clustered index pages. We must clear the field when the root page
was a leaf page and it is being split, so that PAGE_MAX_TRX_ID
will continue to be 0 in clustered index non-root pages.

page_create_zip(): Add debug assertions for validating
PAGE_MAX_TRX_ID and PAGE_ROOT_AUTO_INC.
2016-12-16 10:26:41 +02:00
Marko Mäkelä
cb0ce5c2e9 MDEV-6076: Optimize the test.
Remove unnecessary restarts by testing multiple tables across a restart.
This change almost halves the execution time.
Some further restarts could be removed with additional effort.
2016-12-16 10:24:54 +02:00
Marko Mäkelä
8777458a6e MDEV-6076 Persistent AUTO_INCREMENT for InnoDB
This should be functionally equivalent to WL#6204 in MySQL 8.0.0, with
the notable difference that the file format changes are limited to
repurposing a previously unused data field in B-tree pages.

For persistent InnoDB tables, write the last used AUTO_INCREMENT
value to the root page of the clustered index, in the previously
unused (0) PAGE_MAX_TRX_ID field, now aliased as PAGE_ROOT_AUTO_INC.
Unlike some other previously unused InnoDB data fields, this one was
actually always zero-initialized, at least since MySQL 3.23.49.

The writes to PAGE_ROOT_AUTO_INC are protected by SX or X latch on the
root page. The SX latch will allow concurrent read access to the root
page. (The field PAGE_ROOT_AUTO_INC will only be read on the
first-time call to ha_innobase::open() from the SQL layer. The
PAGE_ROOT_AUTO_INC can only be updated when executing SQL, so
read/write races are not possible.)

During INSERT, the PAGE_ROOT_AUTO_INC is updated by the low-level
function btr_cur_search_to_nth_level(), adding no extra page
access. [Adaptive hash index lookup will be disabled during INSERT.]

If some rare UPDATE modifies an AUTO_INCREMENT column, the
PAGE_ROOT_AUTO_INC will be adjusted in a separate mini-transaction in
ha_innobase::update_row().

When a page is reorganized, we have to preserve the PAGE_ROOT_AUTO_INC
field.

During ALTER TABLE, the initial AUTO_INCREMENT value will be copied
from the table. ALGORITHM=COPY and online log apply in LOCK=NONE will
update PAGE_ROOT_AUTO_INC in real time.

innodb_col_no(): Determine the dict_table_t::cols[] element index
corresponding to a Field of a non-virtual column.
(The MySQL 5.7 implementation of virtual columns breaks the 1:1
relationship between Field::field_index and dict_table_t::cols[].
Virtual columns are omitted from dict_table_t::cols[]. Therefore,
we must translate the field_index of AUTO_INCREMENT columns into
an index of dict_table_t::cols[].)

Upgrade from old data files:

By default, the AUTO_INCREMENT sequence in old data files would appear
to be reset, because PAGE_MAX_TRX_ID or PAGE_ROOT_AUTO_INC would contain
the value 0 in each clustered index page. In new data files,
PAGE_ROOT_AUTO_INC can only be 0 if the table is empty or does not contain
any AUTO_INCREMENT column.

For backward compatibility, we use the old method of
SELECT MAX(auto_increment_column) for initializing the sequence.

btr_read_autoinc(): Read the AUTO_INCREMENT sequence from a new-format
data file.

btr_read_autoinc_with_fallback(): A variant of btr_read_autoinc()
that will resort to reading MAX(auto_increment_column) for data files
that did not use AUTO_INCREMENT yet. It was manually tested that during
the execution of innodb.autoinc_persist the compatibility logic is
not activated (for new files, PAGE_ROOT_AUTO_INC is never 0 in nonempty
clustered index root pages).

initialize_auto_increment(): Replaces
ha_innobase::innobase_initialize_autoinc(). This initializes
the AUTO_INCREMENT metadata. Only called from ha_innobase::open().

ha_innobase::info_low(): Do not try to lazily initialize
dict_table_t::autoinc. It must already have been initialized by
ha_innobase::open() or ha_innobase::create().

Note: The adjustments to class ha_innopart were not tested, because
the source code (native InnoDB partitioning) is not being compiled.
2016-12-16 09:19:19 +02:00