Historically, InnoDB split the redo log into at least 2 files.
MDEV-12061 allowed the minimum to be innodb_log_files_in_group=1,
but it kept the default at innodb_log_files_in_group=2.
Because performance seems to be slightly better with only one log file,
and because implementing an append-only variant of the log would require
a single file, let us define the default to be 1, and have
innodb_log_file_size=96M, to retain the same default total size.
We will remove the InnoDB background operation of merging buffered
changes to secondary index leaf pages. Changes will only be merged as a
result of an operation that accesses a secondary index leaf page,
such as a SQL statement that performs a lookup via that index,
or is modifying the index. Also ROLLBACK and some background operations,
such as purging the history of committed transactions, or computing
index cardinality statistics, can cause change buffer merge.
Encryption key rotation will not perform change buffer merge.
The motivation of this change is to simplify the I/O logic and to
allow crash recovery to happen in the background (MDEV-14481).
We also hope that this will reduce the number of "mystery" crashes
due to corrupted data. Because change buffer merge will typically
take place as a result of executing SQL statements, there should be
a clearer connection between the crash and the SQL statements that
were executed when the server crashed.
In many cases, a slight performance improvement was observed.
This is joint work with Thirunarayanan Balathandayuthapani
and was tested by Axel Schwenke and Matthias Leich.
The InnoDB monitor counter innodb_ibuf_merge_usec will be removed.
On slow shutdown (innodb_fast_shutdown=0), we will continue to
merge all buffered changes (and purge all undo log history).
Two InnoDB configuration parameters will be changed as follows:
innodb_disable_background_merge: Removed.
This parameter existed only in debug builds.
All change buffer merges will use synchronous reads.
innodb_force_recovery will be changed as follows:
* innodb_force_recovery=4 will be the same as innodb_force_recovery=3
(the change buffer merge cannot be disabled; it can only happen as
a result of an operation that accesses a secondary index leaf page).
The option used to be capable of corrupting secondary index leaf pages.
Now that capability is removed, and innodb_force_recovery=4 becomes 'safe'.
* innodb_force_recovery=5 (which essentially hard-wires
SET GLOBAL TRANSACTION ISOLATION LEVEL READ UNCOMMITTED)
becomes safe to use. Bogus data can be returned to SQL, but
persistent InnoDB data files will not be corrupted further.
* innodb_force_recovery=6 (ignore the redo log files)
will be the only option that can potentially cause
persistent corruption of InnoDB data files.
Code changes:
buf_page_t::ibuf_exist: New flag, to indicate whether buffered
changes exist for a buffer pool page. Pages with pending changes
can be returned by buf_page_get_gen(). Previously, the changes
were always merged inside buf_page_get_gen() if needed.
ibuf_page_exists(const buf_page_t&): Check if a buffered changes
exist for an X-latched or read-fixed page.
buf_page_get_gen(): Add the parameter allow_ibuf_merge=false.
All callers that know that they may be accessing a secondary index
leaf page must pass this parameter as allow_ibuf_merge=true,
unless it does not matter for that caller whether all buffered
changes have been applied. Assert that whenever allow_ibuf_merge
holds, the page actually is a leaf page. Attempt change buffer
merge only to secondary B-tree index leaf pages.
btr_block_get(): Add parameter 'bool merge'.
All callers of btr_block_get() should know whether the page could be
a secondary index leaf page. If it is not, we should avoid consulting
the change buffer bitmap to even consider a merge. This is the main
interface to requesting index pages from the buffer pool.
ibuf_merge_or_delete_for_page(), recv_recover_page(): Replace
buf_page_get_known_nowait() with much simpler logic, because
it is now guaranteed that that the block is x-latched or read-fixed.
mlog_init_t::mark_ibuf_exist(): Renamed from mlog_init_t::ibuf_merge().
On crash recovery, we will no longer merge any buffered changes
for the pages that we read into the buffer pool during the last batch
of applying log records.
buf_page_get_gen_known_nowait(), BUF_MAKE_YOUNG, BUF_KEEP_OLD: Remove.
btr_search_guess_on_hash(): Merge buf_page_get_gen_known_nowait()
to its only remaining caller.
buf_page_make_young_if_needed(): Define as an inline function.
Add the parameter buf_pool.
buf_page_peek_if_young(), buf_page_peek_if_too_old(): Add the
parameter buf_pool.
fil_space_validate_for_mtr_commit(): Remove a bogus comment
about background merge of the change buffer.
btr_cur_open_at_rnd_pos_func(), btr_cur_search_to_nth_level_func(),
btr_cur_open_at_index_side_func(): Use narrower data types and scopes.
ibuf_read_merge_pages(): Replaces buf_read_ibuf_merge_pages().
Merge the change buffer by invoking buf_page_get_gen().
mark big_tables deprecated, the server can put temp tables on disk
as needed avoiding "table full" errors.
in case someone would really need to force a tmp table to be created
on disk from the start and for testing allow tmp_memory_table_size
to be set to 0.
fix tests to use that instead (and add a test that it actually
works).
make sure in-memory TREE size limit is never 0 (it's [ab]using
tmp_memory_table_size at the moment)
remove few sys_vars.*_basic tests
Cherry-pick the commits the mysql and some changes.
WL#4618 RBR: extended table metadata in the binary log
This patch extends Table Map Event. It appends some new fields for
more metadata. The new metadata includes:
- Signedness of Numberic Columns
- Character Set of Character Columns and Binary Columns
- Column Name
- String Value of SET Columns
- String Value of ENUM Columns
- Primary Key
- Character Set of SET Columns and ENUM Columns
- Geometry Type
Some of them are optional, the patch introduces a GLOBAL system
variable to control it. It is binlog_row_metadata.
- Scope: GLOBAL
- Dynamic: Yes
- Type: ENUM
- Values: {NO_LOG, MINIMAL, FULL}
- Default: NO_LOG
Only Signedness, character set and geometry type are logged if it is MINIMAL.
Otherwise all of them are logged.
Also add a binlog_type_info() to field, So that we can have extract
relevant binlog info from field.
The setting innodb_change_buffering_debug=2 was supposed to inject
a crash during change buffer merge. There is no public test for
that functionality, and even if there were, it would be better
to use DEBUG_SYNC to halt the thread that does change buffer merge,
force a redo log flush from another thread, and finally kill the
server externally.
This allows one to run the test suite even if any of the following
options are changed:
- character-set-server
- collation-server
- join-cache-level
- log-basename
- max-allowed-packet
- optimizer-switch
- query-cache-size and query-cache-type
- skip-name-resolve
- table-definition-cache
- table-open-cache
- Some innodb options
etc
Changes:
- Don't print out the value of system variables as one can't depend on
them to being constants.
- Don't set global variables to 'default' as the default may not
be the same as the test was started with if there was an additional
option file. Instead save original value and reset it at end of test.
- Test that depends on the latin1 character set should include
default_charset.inc or set the character set to latin1
- Test that depends on the original optimizer switch, should include
default_optimizer_switch.inc
- Test that depends on the value of a specific system variable should
set it in the test (like optimizer_use_condition_selectivity)
- Split subselect3.test into subselect3.test and subselect3.inc to
make it easier to set and reset system variables.
- Added .opt files for test that required specfic options that could
be changed by external configuration files.
- Fixed result files in rockdsb & tokudb that had not been updated for
a while.
Some bugs are detected only after a table definition has been evicted
and then reloaded to the InnoDB data dictionary cache.
For debug builds, introduce the settable Boolean configuration parameter
innodb_evict_tables_on_commit_debug that can be set to request InnoDB
to attempt to evict table definitions from the data dictionary cache
whenever a transaction is committed.
This has been tested on 10.3 and 10.4 with the following:
./mysql-test-run.pl --mysqld=--loose-innodb-evict-tables-on-commit-debug
You can also use the following:
SET GLOBAL innodb_evict_tables_on_commit_debug=ON;
SET GLOBAL innodb_evict_tables_on_commit_debug=OFF;
The parameter affects the commit (or rollback or abort) of
transactions that have modified persistent InnoDB tables.
- mysqltest didn't free read_command_buf
- wait_for_slave_param did write different things to the log if valgrind
was used.
- Table open cache should not write the initial variable value as it
can depend on the configuration or if valgrind is used
- A variable in GetResult was used uninitalized
The option innodb_rollback_segments was deprecated already in
MariaDB Server 10.0. Its misleadingly named replacement innodb_undo_logs
is of very limited use. It makes sense to always create and use the
maximum number of rollback segments.
Let us remove the deprecated parameter innodb_rollback_segments and
deprecate&ignore the parameter innodb_undo_logs (to be removed in a
later major release).
This work involves some cleanup of InnoDB startup. Similar to other
write operations, DROP TABLE will no longer be allowed if
innodb_force_recovery is set to a value larger than 3.
The parameter innodb_stats_sample_pages became an alias for
innodb_stats_transient_sample_pages and was deprecated in
MariaDB Server 10.0. Let us finally remove that alias.
The transaction isolation levels READ COMMITTED and READ UNCOMMITTED
should behave similarly to the old deprecated setting
innodb_locks_unsafe_for_binlog=1, that is, avoid acquiring gap locks.
row_search_mvcc(): Reduce the scope of some variables, and clean up
the initialization and use of the variable set_also_gap_locks.
Fix:
====
1) Combined innodb_ft_result_cache_limit_32.test and
innodb_ft_result_cache_limit_64.test test case in sys_vars suite.
2) Use word_size.inc for combinations of innodb_ft_result_cache_limit test case.