The option innodb_rollback_segments was deprecated already in
MariaDB Server 10.0. Its misleadingly named replacement innodb_undo_logs
is of very limited use. It makes sense to always create and use the
maximum number of rollback segments.
Let us remove the deprecated parameter innodb_rollback_segments and
deprecate&ignore the parameter innodb_undo_logs (to be removed in a
later major release).
This work involves some cleanup of InnoDB startup. Similar to other
write operations, DROP TABLE will no longer be allowed if
innodb_force_recovery is set to a value larger than 3.
The parameter innodb_stats_sample_pages became an alias for
innodb_stats_transient_sample_pages and was deprecated in
MariaDB Server 10.0. Let us finally remove that alias.
The transaction isolation levels READ COMMITTED and READ UNCOMMITTED
should behave similarly to the old deprecated setting
innodb_locks_unsafe_for_binlog=1, that is, avoid acquiring gap locks.
row_search_mvcc(): Reduce the scope of some variables, and clean up
the initialization and use of the variable set_also_gap_locks.
The parameter innodb_log_checksums that was introduced in MariaDB 10.2.2
via mysql/mysql-server@af0acedd88
does not make much sense. The original motivation of introducing this
parameter (initially called innodb_log_checksum_algorithm in
mysql/mysql-server@22ba38218e)
was that the InnoDB redo log used the slow and insecure innodb algorithm.
With hardware or SIMD accelerated CRC-32C, there should be no reason to
allow checksums to be disabled on the redo log.
The parameter innodb_encrypt_log already implies innodb_log_checksums=ON.
Let us deprecate the parameter innodb_log_checksums and always compute
redo log checksums, even if innodb_log_checksums=OFF is specified.
An upgrade from MariaDB 10.2.2 or later will only be possible after
using the default value innodb_log_checksums=ON. If the non-default
value innodb_log_checksums=OFF was in effect when the server was shut down,
a log block checksum mismatch will be reported and the upgraded server
will fail to start up.
MariaDB data-at-rest encryption (innodb_encrypt_tables)
had repurposed the same unused data field that was repurposed
in MySQL 5.7 (and MariaDB 10.2) for the Split Sequence Number (SSN)
field of SPATIAL INDEX. Because of this, MariaDB was unable to
support encryption on SPATIAL INDEX pages.
Furthermore, InnoDB page checksums skipped some bytes, and there
are multiple variations and checksum algorithms. By default,
InnoDB accepts all variations of all algorithms that ever existed.
This unnecessarily weakens the page checksums.
We hereby introduce two more innodb_checksum_algorithm variants
(full_crc32, strict_full_crc32) that are special in a way:
When either setting is active, newly created data files will
carry a flag (fil_space_t::full_crc32()) that indicates that
all pages of the file will use a full CRC-32C checksum over the
entire page contents (excluding the bytes where the checksum
is stored, at the very end of the page). Such files will always
use that checksum, no matter what the parameter
innodb_checksum_algorithm is assigned to.
For old files, the old checksum algorithms will continue to be
used. The value strict_full_crc32 will be equivalent to strict_crc32
and the value full_crc32 will be equivalent to crc32.
ROW_FORMAT=COMPRESSED tables will only use the old format.
These tables do not support new features, such as larger
innodb_page_size or instant ADD/DROP COLUMN. They may be
deprecated in the future. We do not want an unnecessary
file format change for them.
The new full_crc32() format also cleans up the MariaDB tablespace
flags. We will reserve flags to store the page_compressed
compression algorithm, and to store the compressed payload length,
so that checksum can be computed over the compressed (and
possibly encrypted) stream and can be validated without
decrypting or decompressing the page.
In the full_crc32 format, there no longer are separate before-encryption
and after-encryption checksums for pages. The single checksum is
computed on the page contents that is written to the file.
We do not make the new algorithm the default for two reasons.
First, MariaDB 10.4.2 was a beta release, and the default values
of parameters should not change after beta. Second, we did not
yet implement the full_crc32 format for page_compressed pages.
This will be fixed in MDEV-18644.
This is joint work with Marko Mäkelä.
The parameters innodb_file_format and innodb_large_prefix were overridden
in the Debian-distributed configuration files, because the default values
of these parameters between MariaDB 5.5 and MariaDB 10.2
did not make any sense.
To allow a more seamless upgrade from MariaDB 10.1 to later versions,
let InnoDB recognize the parameters innodb_file_format and
innodb_large_prefix and issue deprecation warnings for them if they
are specified. A deprecation period of only one major release
(one year between the MariaDB 10.2 and 10.3 releases) is insufficient
for these widely used parameters.
main.derived_cond_pushdown: Move all 10.3 tests to the end,
trim trailing white space, and add an "End of 10.3 tests" marker.
Add --sorted_result to tests where the ordering is not deterministic.
main.win_percentile: Add --sorted_result to tests where the
ordering is no longer deterministic.
The setting innodb_safe_truncate=ON reduces compatibility with older
versions of MariaDB and backup tools in two ways.
First, we will be writing TRX_UNDO_RENAME_TABLE records, which older
versions do not know about. These records could be misinterpreted if
a DDL transaction was recovered and would be rolled back.
Such rollback is only possible if the server was killed while
an incomplete DDL transaction was persisted. On transaction completion,
the insert_undo log pages would only be repurposed for new undo log
allocations, and their contents would not matter. So, older versions
will not have a problem with innodb_safe_truncate=ON if the server was
shut down cleanly.
Second, to prevent such recovery failure, innodb_safe_truncate=ON will
cause a modification of the redo log format identifier, which will
prevent older versions from starting up after a crash. MariaDB Server
versions older than 10.2.13 will refuse to start up altogether, even
after clean shutdown.
A server restart with innodb_safe_truncate=OFF will restore compatibility
with older server and backup versions.
Rename the 10.2-specific configuration option innodb_unsafe_truncate
to innodb_safe_truncate, and invert its value.
The default (for now) is innodb_safe_truncate=OFF, to avoid
disrupting users with an undo and redo log format change within
a Generally Available (GA) release series.
While MariaDB Server 10.2 is not really guaranteed to be compatible
with Percona XtraBackup 2.4 (for example, the MySQL 5.7 undo log format
change that could be present in XtraBackup, but was reverted from
MariaDB in MDEV-12289), we do not want to disrupt users who have
deployed xtrabackup and MariaDB Server 10.2 in their environments.
With this change, MariaDB 10.2 will continue to use the backup-unsafe
TRUNCATE TABLE code, so that neither the undo log nor the redo log
formats will change in an incompatible way.
Undo tablespace truncation will keep using the redo log only. Recovery
or backup with old code will fail to shrink the undo tablespace files,
but the contents will be recovered just fine.
In the MariaDB Server 10.2 series only, we introduce the configuration
parameter innodb_unsafe_truncate and make it ON by default. To allow
MariaDB Backup (mariabackup) to work properly with TRUNCATE TABLE
operations, use loose_innodb_unsafe_truncate=OFF.
MariaDB Server 10.3.10 and later releases will always use the
backup-safe TRUNCATE TABLE, and this parameter will not be
added there.
recv_recovery_rollback_active(): Skip row_mysql_drop_garbage_tables()
unless innodb_unsafe_truncate=OFF. It is too unsafe to drop orphan
tables if RENAME operations are not transactional within InnoDB.
LOG_HEADER_FORMAT_10_3: Replaces LOG_HEADER_FORMAT_CURRENT.
log_init(), log_group_file_header_flush(),
srv_prepare_to_delete_redo_log_files(),
innobase_start_or_create_for_mysql(): Choose the redo log format
and subformat based on the value of innodb_unsafe_truncate.
This concludes the merge of all applicable InnoDB changes from
MySQL 5.7.23, with the exception of a performance fix, which we
plan to rewrite in MariaDB later in such a way that it does not
involve changing the storage engine API:
MDEV-16849 Extending indexed VARCHAR column should be instantaneous
Introduce the configuration option innodb_log_optimize_ddl
for controlling whether native index creation or table-rebuild
in InnoDB should keep optimizing the redo log
(and writing MLOG_INDEX_LOAD records to ensure that
concurrent backup would fail).
By default, we have innodb_log_optimize_ddl=ON, that is,
the default behaviour that was introduced in MariaDB 10.2.2
(with the merge of InnoDB from MySQL 5.7) will be unchanged.
BtrBulk::m_trx: Replaces m_trx_id. We must be able to check for
KILL QUERY even if !m_flush_observer (innodb_log_optimize_ddl=OFF).
page_cur_insert_rec_write_log(): Declare globally, so that this
can be called from PageBulk::insert().
row_merge_insert_index_tuples(): Remove the unused parameter trx_id.
row_merge_build_indexes(): Enable or disable redo logging based on
the innodb_log_optimize_ddl parameter.
PageBulk::init(), PageBulk::insert(), PageBulk::finish(): Write
redo log records if needed. For ROW_FORMAT=COMPRESSED, redo log
will be written in PageBulk::compress() unless we called
m_mtr.set_log_mode(MTR_LOG_NO_REDO).
The parameter innodb_lock_schedule_algorithm was introduced in
MariaDB Server 10.1.19, 10.2.13, 10.3.4 as part of MDEV-11039.
In MariaDB 10.1, the default value of the parameter is 'fcfs',
that is, the existing algorithm is used by default. But in
later versions of MariaDB Server, the parameter was 'vats',
enabling the new algorithm.
Because the new algorithm is triggering a debug assertion failure
that suggests corruption of the transactional lock data structures,
we will revert to the old algorithm by default until we have
resolved the problem.
Because the InnoDB implementation in MariaDB has diverged from MySQL,
it is not meaningful to report a MySQL version number for InnoDB
any more. Some examples include:
MariaDB 10.1 (which is based on MySQL 5.6) included encryption and
variable-size page compression before MySQL 5.7 introduced them.
MariaDB 10.2 (based on MySQL 5.7) introduced persistent AUTO_INCREMENT
(MDEV-6076) in a GA release before MySQL 8.0.
MariaDB 10.3 (based on MySQL 5.7) introduced instant ADD COLUMN
(MDEV-11369) before MySQL.
All of these features use a different implementation and file format.
Also, some features were never merged from MySQL 5.7, and thus MariaDB
is not affected by related bugs. Examples include CREATE TABLESPACE
and the reimplementation of the partitioning engine.
Bind more InnoDB parameters directly to MYSQL_SYSVAR and
remove "shadow variables".
innodb_change_buffering: Declare as ENUM, not STRING.
innodb_flush_method: Declare as ENUM, not STRING.
innodb_log_buffer_size: Bind directly to srv_log_buffer_size,
without rounding it to a multiple of innodb_page_size.
LOG_BUFFER_SIZE: Remove.
SysTablespace::normalize_size(): Renamed from normalize().
innodb_init_params(): A new function to initialize and validate
InnoDB startup parameters.
innodb_init(): Renamed from innobase_init(). Invoke innodb_init_params()
before actually trying to start up InnoDB.
srv_start(bool): Renamed from innobase_start_or_create_for_mysql().
Added the input parameter create_new_db.
SRV_ALL_O_DIRECT_FSYNC: Define only for _WIN32.
xb_normalize_init_values(): Merge to innodb_init_param().
row_undo_step(): If innodb_fast_shutdown=3 has been requested,
abort the rollback of any non-DDL transactions. Starting with
MDEV-12323, we aborted the rollback of recovered transactions. The
transactions would be rolled back on subsequent server startup.
trx_roll_report_progress(): Renamed from trx_roll_must_shutdown(),
now that the shutdown check has been moved to the only caller.
trx_commit_low(): Allow mtr=NULL for transactions that are aborted
on rollback.
trx_rollback_finish(): Clean up aborted transactions to avoid
assertion failures and memory leaks on shutdown. This code was
previously in trx_rollback_active().
trx_rollback_to_savepoint_low(), trx_rollback_for_mysql_low():
Remove some redundant assertions.
InnoDB in Debian uses utf8mb4 as default character set since
version 10.0.20-2. This leads to major pain due to keys longer
than 767 bytes.
MariaDB 10.2 (and MySQL 5.7) introduced the setting
innodb_default_row_format that is DYNAMIC by default. These
versions also changed the default values of the parameters
innodb_large_prefix=ON and innodb_file_format=Barracuda.
This would allow longer column index prefixes to be created.
The original purpose of these parameters was to allow InnoDB
to be downgraded to MySQL 5.1, which is long out of support.
Every InnoDB version since MySQL 5.5 does support operation
with the relaxed limits.
We backport the parameter innodb_default_row_format to
MariaDB 10.1, but we will keep its default value at COMPACT.
This allows MariaDB 10.1 to be configured so that CREATE TABLE
is less likely to encounter a problem with the limitation:
loose_innodb_large_prefix=ON
loose_innodb_default_row_format=DYNAMIC
(Note that the setting innodb_large_prefix was deprecated in
MariaDB 10.2 and removed in MariaDB 10.3.)
The only observable difference in the behaviour with the default
settings should be that ROW_FORMAT=DYNAMIC tables can be created
both in the system tablespace and in .ibd files, no matter what
innodb_file_format has been assigned to. Unlike MariaDB 10.2,
we are not changing the default value of innodb_file_format,
so ROW_FORMAT=COMPRESSED tables cannot be created without
changing the parameter.
Add innodb debug system variable, innodb_buffer_pool_load_pages_abort, to test
the behaviour of innodb_buffer_pool_load_incomplete.
(innodb_buufer_pool_dump_abort_loads.test)
Two follow-up tasks were filed for MySQL 5.7.21 changes that
were not applied here:
MDEV-15179 performance_schema.file_instances does not reflect RENAME TABLE
MDEV-14222 Unnecessary 'cascade' memory allocation for every updated
row when there is no FOREIGN KEY
InnoDB RNG maintains global state, causing otherwise unnecessary bus
traffic. Even worse this is cross-mutex traffic. That is different
mutexes suffer from contention.
Fixed delay of 4 was verified to give best throughput by OLTP update
index and read-write benchmarks on Intel Broadwell (2/20/40) and
ARM (1/46/46).
This is 10.1 version where no merge error exists.
wsrep_on_check
New check function. Galera can't be enabled
if innodb-lock-schedule-algorithm=VATS.
innobase_kill_query
In Galera async kill we could own lock mutex.
innobase_init
If Variance-Aware-Transaction-Sheduling Algorithm (VATS) is
used on Galera we refuse to start InnoDB.
Changed innodb-lock-schedule-algorithm as read-only parameter
as it was designed to be.
lock_rec_other_has_expl_req,
lock_rec_other_has_conflicting,
lock_rec_lock_slow
lock_table_other_has_incompatible
lock_rec_insert_check_and_lock
Change pointer to conflicting lock to normal pointer as this
pointer contents could be changed later.
New test cases
innodb-page-cleaners
Modified test cases
innodb_page_cleaners_basic
New function buf_flush_set_page_cleaner_thread_cnt
Increase or decrease the amount of page cleaner worker threads.
In case of increase this function creates based on current
abount and requested amount how many new threads should be
created. In case of decrease this function sets up the
requested amount of threads and uses is_requested event
to signal workers. Then we wait until all new treads
are started, old threads that should exit signal
is_finished or shutdown has marked that page cleaner
should finish.
buf_flush_page_cleaner_worker
Store current thread id and thread_no and then signal
event is_finished. If number of used page cleaner threads
decrease we shut down those threads that have thread_no
greater or equal than number of page configured page
cleaners - 1 (note that there will be always page cleaner
coordinator). Before exiting we signal is_finished.
New function innodb_page_cleaners_threads_update
Update function for innodb-page-cleaners system variable.
innobase_start_or_create_for_mysql
If more than one page cleaner threads is configured
we use new function buf_flush_set_page_cleaner_thread_cnt
to set up the requested threads (-1 coordinator).
The InnoDB purge subsystem can be best stopped by opening a read view,
for example by START TRANSACTION WITH CONSISTENT SNAPSHOT.
To ensure that everything is purged, use wait_all_purged.inc,
which waits for the History list length in SHOW ENGINE INNODB STATUS
to reach 0. Setting innodb_purge_run_now never guaranteed this.
Only a relevant subset of the InnoDB changes was merged.
In particular, two follow-up bug fixes for the bugs that
were introduced in 5.7.18 but not MariaDB 10.2.7 were omitted.
Because MariaDB 10.2.7 omitted the risky change
Bug#23481444 OPTIMISER CALL ROW_SEARCH_MVCC() AND READ THE INDEX
APPLIED BY UNCOMMITTED ROWS
we do not need the follow-up fixes that were introduced in
MySQL 5.6.37 and MySQL 5.7.19:
Bug#25175249 ASSERTION: (TEMPL->IS_VIRTUAL && !FIELD) || ...
Bug#25793677 INNODB: FAILING ASSERTION: CLUST_TEMPL_FOR_SEC || LEN
The option innodb_log_compressed_pages was contributed by
Facebook to MySQL 5.6. It was disabled in the 5.6.10 GA release
due to problems that were fixed in 5.6.11, which is when the
option was enabled.
The option was set to innodb_log_compressed_pages=ON by default
(disabling the feature), because safety was considered more
important than speed. The option innodb_log_compressed_pages=OFF
can *CORRUPT* ROW_FORMAT=COMPRESSED tables on crash recovery
if the zlib deflate function is behaving differently (producing
a different amount of compressed data) from how it behaved
when the redo log records were written (prior to the crash recovery).
In MDEV-6935, the default value was changed to
innodb_log_compressed_pages=OFF. This is inherently unsafe, because
there are very many different environments where MariaDB can be
running, using different zlib versions. While zlib can decompress
data just fine, there are no guarantees that different versions will
always compress the same data to the exactly same size. To avoid
problems related to zlib upgrades or version mismatch, we must
use a safe default setting.
This will reduce the write performance for users of
ROW_FORMAT=COMPRESSED tables. If you configure
innodb_log_compressed_pages=ON, please make sure that you will
always cleanly shut down InnoDB before upgrading the server
or zlib.