InnoDB tablespace identifiers and page numbers are 32-bit numbers.
Let us use a 32-bit type for them in innochecksum.
The changes in commit 1918bdf32c
broke the build on 32-bit Windows.
Thanks to Vicențiu Ciorbaru for an initial version of this fixup.
Let us simply refuse an upgrade from earlier versions if the
upgrade procedure was not followed. This simplifies the purge,
commit, and rollback of transactions.
Before upgrading to MariaDB 10.3 or later, a clean shutdown
of the server (with innodb_fast_shutdown=1 or 0) is necessary,
to ensure that any incomplete transactions are rolled back.
The undo log format was changed in MDEV-12288. There is only
one persistent undo log for each transaction.
A side effect of the MDEV-24589 bug fix is that if
FLUSH TABLE...FOR EXPORT is initiated before the history of an
earlier DROP INDEX operation has been purged, then the data file
will contain allocated pages that belonged to the dropped indexes.
These pages would never be freed after a subsequent IMPORT TABLESPACE.
We will work around this regression by making IMPORT TABLESPACE
tolerate pages that refer to an unknown index.
* be strict in CREATE TABLE, just like in ALTER TABLE, because
CREATE TABLE, just like ALTER TABLE, can be rolled back for any engine
* but don't auto-convert warnings into errors for engine warnings
(handler::create) - this matches ALTER TABLE behavior
* and not when creating a default record, these errors are handled
specially (and replaced with ER_INVALID_DEFAULT)
* always issue a Note when a non-unique key is truncated, because it's
not a Warning that can be converted to an Error. Before this commit
it was a Note for blobs and a Warning for all other data types.
In main.index_merge_myisam we remove the test that was added in
commit a2d24def8c because
it duplicates the test case that was added in
commit 5af12e4635.
The test innodb.innodb_wl6326 that had been disabled in 10.4 due to
MDEV-21535 is failing on 10.5 due to a different reason: the removal
of the MLOG_COMP_END_COPY_CREATED operations in MDEV-12353
commit 276f996af9 caused PAGE_LAST_INSERT
to be set to something nonzero by the function page_copy_rec_list_end().
This in turn would cause btr_page_get_split_rec_to_right() to behave
differently: we would not attempt to split the page at all, but simply
insert the new record into the new, empty, right leaf page.
Even though the change reduced the sizes of some tables, it is better
to aim for balanced trees.
page_copy_rec_list_end(), PageBulk::finishPage():
Preserve PAGE_LAST_INSERT, PAGE_N_DIRECTION, PAGE_DIRECTION.
PageBulk::finish(): Move some common code from PageBulk::finishPage().
Remove CREATE/DROP database.
Remove some unnecessary suppressions, replacements, and
SQL statements.
Populate tables via have_sequence.inc to avoid the creation of
explicit InnoDB record locks in INSERT...SELECT. This will remove
some gaps in AUTO_INCREMENT values.
When a InnoDB data file page is freed, its contents becomes garbage,
and any storage allocated in the data file is wasted. During flushing,
InnoDB initializes the page with zeros if scrubbing is enabled. If the
tablespace is compressed then InnoDB should punch a hole else ignore the
flushing of the freed page.
buf_page_t:
- Replaced the variable file_page_was_freed, init_on_flush in buf_page_t
with status enum variable.
- Changed all debug assert of file_page_was_freed to DBUG_ASSERT
of buf_page_t::status
Removed buf_page_set_file_page_was_freed(),
buf_page_reset_file_page_was_freed().
buf_page_free(): Newly added function which takes X-lock on the page
before marking the status as FREED. So that InnoDB flush handler can
avoid concurrent flush of the freed page. Also while flushing the page,
InnoDB make sure that redo log which does freeing of the page also written
to the disk. Currently, this function only marks the page as FREED if
it is in buffer pool
buf_flush_freed_page(): Newly added function which initializes zeros
asynchorously if innodb_immediate_scrub_data_uncompressed is enabled.
Punch a hole to the file synchorously if page_compressed is enabled.
Reset the io_fix to NORMAL. Release the block from flush list and
associated mutex before writing zeros or punch a hole to the file.
buf_flush_page(): Removed the unnecessary usage of temporary
variable "flush"
fil_io(): Introduce new parameter called punch_hole. It allows fil_io()
to punch the hole to the file for the given offset.
buf_page_create(): Let the callers assign buf_page_t::status.
Every caller should eventually invoke mtr_t::init().
fsp_page_create(): Remove the unused mtr_t parameter.
In all other callers of buf_page_create() except fsp_page_create(),
before invoking mtr_t::init(), invoke
mtr_t::sx_latch_at_savepoint() or mtr_t::x_latch_at_savepoint().
mtr_t::init(): Initialize buf_page_t::status also for the temporary
tablespace (when redo logging is disabled), to avoid assertion failures.
btr_cur_upd_rec_in_place(): Invoke page_zip_rec_set_deleted()
for ROW_FORMAT=COMPRESSED pages, so that the change will be
written to the redo log.
This part of crash recovery was broken in
commit 08ba388713 (MDEV-12353).
Instead of writing the high-level redo log records
MLOG_LIST_END_COPY_CREATED, MLOG_COMP_LIST_END_COPY_CREATED
write log for each individual insert of a record.
page_copy_rec_list_end_to_created_page(): Remove.
This will improve the fill factor of some pages.
Adjust some tests accordingly.
PageBulk::init(), PageBulk::finish(): Avoid setting bogus limits
to PAGE_HEAP_TOP and PAGE_N_DIR_SLOTS. Avoid accessor functions
that would enforce these limits before the correct ones are set
at the end of PageBulk::finish().
page_zip_compress_write_log_no_data(): Remove.
We no longer write the MLOG_ZIP_PAGE_COMPRESS_NO_DATA record.
Instead, we will write MLOG_ZIP_PAGE_COMPRESS records.
We should not need anywhere near 32 bits of entropy, so we might
just limit ourselves to a 32-bit random number generator.
Also, it might be cheaper to use exclusive-or, bit shifting and
conditional jumps, instead of multiplication and addition.
We use relaxed atomic operations on the global random number generator
state in order in an attempt to silence any warnings about race conditions.
There is an obvious race condition between the load and store in
ut_rnd_gen(), but we do not think that it matters much that the
state of the random number generator could 'stutter'.
This change seems makes the 'uncompress_ops' nondeterministic
in innodb_zip.cmp_per_index after the restart. It looks like
there is an inherent race condition in the test, because the
table could be opened for InnoDB statistics recalculation
already before innodb_cmp_per_index_enabled was set. We might
end up having uncompress_ops anywhere between 0 and 9, or perhaps
even more. Let us remove that part of the test.
btr_free_externally_stored_field(): Pass w=mtr_t::OPT to
note that the BTR_EXTERN_LEN is not necessarily changing
when a multi-page ROW_FORMAT=COMPRESSED off-page column
is being freed, and to allow redundant writes to the redo
log to be optimized away.
Ever since commit 56f6dab1d0
the refactored function mtr_t::write() asserts by default
that the page contents is being changed.
In the test innodb.instant_alter,4k we would be flagging an error
for too large row size. That error was previously only being reported
if the table was being rebuilt. Thus, this merge is fixing a small
omission in MDEV-11369 (instant ADD COLUMN).
Move row size check to early CREATE/ALTER TABLE phase. Stop checking
on table open.
dict_index_add_to_cache(): remove parameter 'strict', stop checking row size
dict_index_t::record_size_info_t: this is a result of row size check operation
create_table_info_t::row_size_is_acceptable(): performs row size check.
Issues error or warning. Writes first overflow field to InnoDB log.
create_table_info_t::create_table(): add row size check
dict_index_t::record_size_info(): this is a refactored version
of dict_index_t::rec_potentially_too_big(). New version doesn't change global
state of a program but return all interesting info. And it's callers who
decide how to handle row size overflow.
dict_index_t::rec_potentially_too_big(): removed
We will remove the InnoDB background operation of merging buffered
changes to secondary index leaf pages. Changes will only be merged as a
result of an operation that accesses a secondary index leaf page,
such as a SQL statement that performs a lookup via that index,
or is modifying the index. Also ROLLBACK and some background operations,
such as purging the history of committed transactions, or computing
index cardinality statistics, can cause change buffer merge.
Encryption key rotation will not perform change buffer merge.
The motivation of this change is to simplify the I/O logic and to
allow crash recovery to happen in the background (MDEV-14481).
We also hope that this will reduce the number of "mystery" crashes
due to corrupted data. Because change buffer merge will typically
take place as a result of executing SQL statements, there should be
a clearer connection between the crash and the SQL statements that
were executed when the server crashed.
In many cases, a slight performance improvement was observed.
This is joint work with Thirunarayanan Balathandayuthapani
and was tested by Axel Schwenke and Matthias Leich.
The InnoDB monitor counter innodb_ibuf_merge_usec will be removed.
On slow shutdown (innodb_fast_shutdown=0), we will continue to
merge all buffered changes (and purge all undo log history).
Two InnoDB configuration parameters will be changed as follows:
innodb_disable_background_merge: Removed.
This parameter existed only in debug builds.
All change buffer merges will use synchronous reads.
innodb_force_recovery will be changed as follows:
* innodb_force_recovery=4 will be the same as innodb_force_recovery=3
(the change buffer merge cannot be disabled; it can only happen as
a result of an operation that accesses a secondary index leaf page).
The option used to be capable of corrupting secondary index leaf pages.
Now that capability is removed, and innodb_force_recovery=4 becomes 'safe'.
* innodb_force_recovery=5 (which essentially hard-wires
SET GLOBAL TRANSACTION ISOLATION LEVEL READ UNCOMMITTED)
becomes safe to use. Bogus data can be returned to SQL, but
persistent InnoDB data files will not be corrupted further.
* innodb_force_recovery=6 (ignore the redo log files)
will be the only option that can potentially cause
persistent corruption of InnoDB data files.
Code changes:
buf_page_t::ibuf_exist: New flag, to indicate whether buffered
changes exist for a buffer pool page. Pages with pending changes
can be returned by buf_page_get_gen(). Previously, the changes
were always merged inside buf_page_get_gen() if needed.
ibuf_page_exists(const buf_page_t&): Check if a buffered changes
exist for an X-latched or read-fixed page.
buf_page_get_gen(): Add the parameter allow_ibuf_merge=false.
All callers that know that they may be accessing a secondary index
leaf page must pass this parameter as allow_ibuf_merge=true,
unless it does not matter for that caller whether all buffered
changes have been applied. Assert that whenever allow_ibuf_merge
holds, the page actually is a leaf page. Attempt change buffer
merge only to secondary B-tree index leaf pages.
btr_block_get(): Add parameter 'bool merge'.
All callers of btr_block_get() should know whether the page could be
a secondary index leaf page. If it is not, we should avoid consulting
the change buffer bitmap to even consider a merge. This is the main
interface to requesting index pages from the buffer pool.
ibuf_merge_or_delete_for_page(), recv_recover_page(): Replace
buf_page_get_known_nowait() with much simpler logic, because
it is now guaranteed that that the block is x-latched or read-fixed.
mlog_init_t::mark_ibuf_exist(): Renamed from mlog_init_t::ibuf_merge().
On crash recovery, we will no longer merge any buffered changes
for the pages that we read into the buffer pool during the last batch
of applying log records.
buf_page_get_gen_known_nowait(), BUF_MAKE_YOUNG, BUF_KEEP_OLD: Remove.
btr_search_guess_on_hash(): Merge buf_page_get_gen_known_nowait()
to its only remaining caller.
buf_page_make_young_if_needed(): Define as an inline function.
Add the parameter buf_pool.
buf_page_peek_if_young(), buf_page_peek_if_too_old(): Add the
parameter buf_pool.
fil_space_validate_for_mtr_commit(): Remove a bogus comment
about background merge of the change buffer.
btr_cur_open_at_rnd_pos_func(), btr_cur_search_to_nth_level_func(),
btr_cur_open_at_index_side_func(): Use narrower data types and scopes.
ibuf_read_merge_pages(): Replaces buf_read_ibuf_merge_pages().
Merge the change buffer by invoking buf_page_get_gen().
This allows one to run the test suite even if any of the following
options are changed:
- character-set-server
- collation-server
- join-cache-level
- log-basename
- max-allowed-packet
- optimizer-switch
- query-cache-size and query-cache-type
- skip-name-resolve
- table-definition-cache
- table-open-cache
- Some innodb options
etc
Changes:
- Don't print out the value of system variables as one can't depend on
them to being constants.
- Don't set global variables to 'default' as the default may not
be the same as the test was started with if there was an additional
option file. Instead save original value and reset it at end of test.
- Test that depends on the latin1 character set should include
default_charset.inc or set the character set to latin1
- Test that depends on the original optimizer switch, should include
default_optimizer_switch.inc
- Test that depends on the value of a specific system variable should
set it in the test (like optimizer_use_condition_selectivity)
- Split subselect3.test into subselect3.test and subselect3.inc to
make it easier to set and reset system variables.
- Added .opt files for test that required specfic options that could
be changed by external configuration files.
- Fixed result files in rockdsb & tokudb that had not been updated for
a while.
Shorten some VARCHAR attributes to a more reasonable length.
INNODB_METRICS: Rename the column STATUS to ENABLED, and make it Boolean.
Replace with INT(1) many Boolean attributes that were declared as VARCHAR
containing 'NO','YES','disabled','enabled','Uninitialized','Initialized'.
Replace some VARCHAR attributes with ENUM.
Replace some BIGINT with INT when 32 bits are sufficient.
Remove INNODB_SYS_TABLESPACES.SPACE_TYPE. The type of a tablespace
can be derived from the tablespace ID. A fixed number is used for
the system tablespace and the temporary tablespace. All other tablespaces
are single-table or single-partition tablespaces.
i_s_locks_row_t::lock_type, lock_get_type_str(): Remove.
This is a redundant field. Table and record locks can be
distinguished by whether i_s_locks_row_t::lock_index is NULL.
fill_trx_row(): Do not unnecessarily copy the constant strings that
trx->op_info is pointing to.
i_s_locks_row_t::lock_mode: Replace string with integer.
lock_get_mode_str(), lock_get_trx_id(), lock_get_trx(): Remove.
field_store_ulint(): Remove.
Also, move part of the test back to innodb.innodb_mysql
and another part to a new test innodb.purge.
Last but not least, merge the tests innodb_zip.4k and innodb_zip.8k
to innodb_zip.page_size.
Reason for the change was that ha_notify_table_changed() was done
after table open when .frm had been replaced, which caused failure
in engines that checks on open if .frm matches the engines table
definition.
Other changes:
- Remove not needed open/close call at end of inline alter table.
Some test that depended on the table beeing in the table cache after
ALTER TABLE had to be updated.