MDEV-11581: Mariadb starts InnoDB encryption threads
when key has not changed or data scrubbing turned off
Background: Key rotation is based on background threads
(innodb-encryption-threads) periodically going through
all tablespaces on fil_system. For each tablespace
current used key version is compared to max key age
(innodb-encryption-rotate-key-age). This process
naturally takes CPU. Similarly, in same time need for
scrubbing is investigated. Currently, key rotation
is fully supported on Amazon AWS key management plugin
only but InnoDB does not have knowledge what key
management plugin is used.
This patch re-purposes innodb-encryption-rotate-key-age=0
to disable key rotation and background data scrubbing.
All new tables are added to special list for key rotation
and key rotation is based on sending a event to
background encryption threads instead of using periodic
checking (i.e. timeout).
fil0fil.cc: Added functions fil_space_acquire_low()
to acquire a tablespace when it could be dropped concurrently.
This function is used from fil_space_acquire() or
fil_space_acquire_silent() that will not print
any messages if we try to acquire space that does not exist.
fil_space_release() to release a acquired tablespace.
fil_space_next() to iterate tablespaces in fil_system
using fil_space_acquire() and fil_space_release().
Similarly, fil_space_keyrotation_next() to iterate new
list fil_system->rotation_list where new tables.
are added if key rotation is disabled.
Removed unnecessary functions fil_get_first_space_safe()
fil_get_next_space_safe()
fil_node_open_file(): After page 0 is read read also
crypt_info if it is not yet read.
btr_scrub_lock_dict_func()
buf_page_check_corrupt()
buf_page_encrypt_before_write()
buf_merge_or_delete_for_page()
lock_print_info_all_transactions()
row_fts_psort_info_init()
row_truncate_table_for_mysql()
row_drop_table_for_mysql()
Use fil_space_acquire()/release() to access fil_space_t.
buf_page_decrypt_after_read():
Use fil_space_get_crypt_data() because at this point
we might not yet have read page 0.
fil0crypt.cc/fil0fil.h: Lot of changes. Pass fil_space_t* directly
to functions needing it and store fil_space_t* to rotation state.
Use fil_space_acquire()/release() when iterating tablespaces
and removed unnecessary is_closing from fil_crypt_t. Use
fil_space_t::is_stopping() to detect when access to
tablespace should be stopped. Removed unnecessary
fil_space_get_crypt_data().
fil_space_create(): Inform key rotation that there could
be something to do if key rotation is disabled and new
table with encryption enabled is created.
Remove unnecessary functions fil_get_first_space_safe()
and fil_get_next_space_safe(). fil_space_acquire()
and fil_space_release() are used instead. Moved
fil_space_get_crypt_data() and fil_space_set_crypt_data()
to fil0crypt.cc.
fsp_header_init(): Acquire fil_space_t*, write crypt_data
and release space.
check_table_options()
Renamed FIL_SPACE_ENCRYPTION_* TO FIL_ENCRYPTION_*
i_s.cc: Added ROTATING_OR_FLUSHING field to
information_schema.innodb_tablespace_encryption
to show current status of key rotation.
Because the default build configuration of the server will remain
at -DWITH_INNODB_AHI=ON, we want to test the instrumentation.
We make and revert the test adjustments in separate commits on purpose,
so that this commit can be easily reverted later if the default
build configuration is changed to -DWITH_INNODB_AHI=OFF.
The InnoDB adaptive hash index is sometimes degrading the performance of
InnoDB, and it is sometimes disabled to get more consistent performance.
We should have a compile-time option to disable the adaptive hash index.
Let us introduce two options:
OPTION(WITH_INNODB_AHI "Include innodb_adaptive_hash_index" ON)
OPTION(WITH_INNODB_ROOT_GUESS "Cache index root block descriptors" ON)
where WITH_INNODB_AHI always implies WITH_INNODB_ROOT_GUESS.
As part of this change, the misleadingly named function
trx_search_latch_release_if_reserved(trx) will be replaced with the macro
trx_assert_no_search_latch(trx) that will be empty unless
BTR_CUR_HASH_ADAPT is defined (cmake -DWITH_INNODB_AHI=ON).
We will also remove the unused column
INFORMATION_SCHEMA.INNODB_TRX.TRX_ADAPTIVE_HASH_TIMEOUT.
In MariaDB Server 10.1, it used to reflect the value of
trx_t::search_latch_timeout which could be adjusted during
row_search_for_mysql(). In 10.2, there is no such field.
Other than the removal of the unused column TRX_ADAPTIVE_HASH_TIMEOUT,
this is an almost non-functional change to the server when using the
default build options.
Some tests are adjusted so that they will work with both
-DWITH_INNODB_AHI=ON and -DWITH_INNODB_AHI=OFF. The test
innodb.innodb_monitor has been renamed to innodb.monitor
in order to track MySQL 5.7, and the duplicate tests
sys_vars.innodb_monitor_* are removed.
MDEV-7618 introduced configuration parameter innodb_instrument_semaphores
in MariaDB Server 10.1. The parameter seems to only affect the rw-lock
X-latch acquisition. Extra fields are added to rw_lock_t to remember one
X-latch holder or waiter. These fields are not being consulted or reported
anywhere. This is basically only adding code bloat.
If the intention is to debug hangs or deadlocks, we have better tools for
that in the debug server, and for the non-debug server, core dumps can
reveal a lot. For example, the mini-transaction memo records the
currently held buffer block or index rw-locks, to be released at
mtr_t::commit().
The configuration parameter innodb_instrument_semaphores will be
deprecated in 10.2.5 and removed in 10.3.0.
rw_lock_t: Remove the members lock_name, file_name, line, thread_id
which did not affect any output.
The InnoDB redo log consists of a list of files that logically form
a bigger file, as if the individual files were concatenated together.
The first file will always be written on redo log checkpoint, because
the two checkpoint pages are at the start of the single logical
redo log file.
There is no technical reason why InnoDB requires at least 2 files
to exist. Let us reduce the minimum number to 1. In that way,
restoring from backups will become easier, since InnoDB can directly
deal with a single backed-up redo log file.
Oracle introduced a Memcached plugin interface to the InnoDB
storage engine in MySQL 5.6. That interface is essentially a
fork of Memcached development snapshot 1.6.0-beta1 of an old
development branch 'engine-pu'.
To my knowledge, there have not been any updates to the Memcached code
between MySQL 5.6 and 5.7; only bug fixes and extensions related to
the Oracle modifications.
The Memcached plugin is not part of the MariaDB Server. Therefore it
does not make sense to include the InnoDB interfaces for the Memcached
plugin, or to have any related configuration parameters:
innodb_api_bk_commit_interval
innodb_api_disable_rowlock
innodb_api_enable_binlog
innodb_api_enable_mdl
innodb_api_trx_level
Removing this code in one commit makes it possible to easily restore
it, in case it turns out to be needed later.
Remove the debug parameter innodb_force_recovery_crash that was
introduced into MySQL 5.6 by me in WL#6494 which allowed InnoDB
to resize the redo log on startup.
Let innodb.log_file_size actually start up the server, but ensure
that the InnoDB storage engine refuses to start up in each of the
scenarios.
Problem was that implementation merged from 10.1 was incompatible
with InnoDB 5.7.
buf0buf.cc: Add functions to return should we punch hole and
how big.
buf0flu.cc: Add written page to IORequest
fil0fil.cc: Remove unneeded status call and add test is
sparse files and punch hole supported by file system when
tablespace is created. Add call to get file system
block size. Used file node is added to IORequest. Added
functions to check is punch hole supported and setting
punch hole.
ha_innodb.cc: Remove unneeded status variables (trim512-32768)
and trim_op_saved. Deprecate innodb_use_trim and
set it ON by default. Add function to set innodb-use-trim
dynamically.
dberr.h: Add error code DB_IO_NO_PUNCH_HOLE
if punch hole operation fails.
fil0fil.h: Add punch_hole variable to fil_space_t and
block size to fil_node_t.
os0api.h: Header to helper functions on buf0buf.cc and
fil0fil.cc for os0file.h
os0file.h: Remove unneeded m_block_size from IORequest
and add bpage to IORequest to know actual size of
the block and m_fil_node to know tablespace file
system block size and does it support punch hole.
os0file.cc: Add function punch_hole() to IORequest
to do punch_hole operation,
get the file system block size and determine
does file system support sparse files (for punch hole).
page0size.h: remove implicit copy disable and
use this implicit copy to implement copy_from()
function.
buf0dblwr.cc, buf0flu.cc, buf0rea.cc, fil0fil.cc, fil0fil.h,
os0file.h, os0file.cc, log0log.cc, log0recv.cc:
Remove unneeded write_size parameter from fil_io
calls.
srv0mon.h, srv0srv.h, srv0mon.cc: Remove unneeded
trim512-trim32678 status variables. Removed
these from monitor tests.
Change default to zlib, this has effect only if user has
explicitly requested page compression and then user
naturally expects that pages are really compressed
if they can be compressed.
- Atomic writes are enabled by default
- Automatically detect if device supports atomic write and use it if
atomic writes are enabled
- Remove ATOMIC WRITE options from CREATE TABLE
- Atomic write is a device option, not a table options as the table may
crash if the media changes
- Add support for SHANNON SSD cards
Sometimes innodb_data_file_size_debug was reported as INT UNSIGNED
instead of BIGINT UNSIGNED. Make it uint instead of ulong to get
a more deterministic result.
The C preprocessor symbol WITH_NUMA is never defined. Instead, the symbol
HAVE_LIBNUMA is used for checking if the feature is to be used.
If cmake -DWITH_NUMA=OFF is specified, HAVE_LIBNUMA will not be defined
at compilation time even if the library is available.
If cmake -DWITH_NUMA=ON is specified but the library is not available
at configuration time, the compilation will be aborted.
The configuration parameter innodb_use_fallocate, which is mapped to
the variable srv_use_posix_fallocate, has no effect in MariaDB 10.2.2
or MariaDB 10.2.3.
Thus the configuration parameter and the variable should be removed.
fil_space_t::recv_size: New member: recovered tablespace size in pages;
0 if no size change was read from the redo log,
or if the size change was implemented.
fil_space_set_recv_size(): New function for setting space->recv_size.
innodb_data_file_size_debug: A debug parameter for setting the system
tablespace size in recovery even when the redo log does not contain
any size changes. It is hard to write a small test case that would
cause the system tablespace to be extended at the critical moment.
recv_parse_log_rec(): Note those tablespaces whose size is being changed
by the redo log, by invoking fil_space_set_recv_size().
innobase_init(): Correct an error message, and do not require a larger
innodb_buffer_pool_size when starting up with a smaller innodb_page_size.
innobase_start_or_create_for_mysql(): Allow startup with any initial
size of the ibdata1 file if the autoextend attribute is set. Require
the minimum size of fixed-size system tablespaces to be 640 pages,
not 10 megabytes. Implement innodb_data_file_size_debug.
open_or_create_data_files(): Round the system tablespace size down
to pages, not to full megabytes, (Our test truncates the system
tablespace to more than 800 pages with innodb_page_size=4k.
InnoDB should not imagine that it was truncated to 768 pages
and then overwrite good pages in the tablespace.)
fil_flush_low(): Refactored from fil_flush().
fil_space_extend_must_retry(): Refactored from
fil_extend_space_to_desired_size().
fil_mutex_enter_and_prepare_for_io(): Extend the tablespace if
fil_space_set_recv_size() was called.
The test case has been successfully run with all the
innodb_page_size values 4k, 8k, 16k, 32k, 64k.
* remove new InnoDB-specific ER_ and HA_ERR_ codes
* renamed few old ER_ and HA_ERR_ error messages to be less MyISAM-specific
* remove duplicate enum definitions (durability_properties, icp_result)
* move new mysql-test include files to their owner suite
* rename xtradb.rdiff files to *-disabled
* remove mistakenly committed helper perl module
* remove long obsolete handler::ha_statistic_increment() method
* restore the standard C xid_t structure to not have setters and getters
* remove xid_t::reset that was cleaning too much
* move MySQL-5.7 ER_ codes where they belong
* fir innodb to include service_wsrep.h not internal wsrep headers
* update tests and results
Contains also
MDEV-10547: Test multi_update_innodb fails with InnoDB 5.7
The failure happened because 5.7 has changed the signature of
the bool handler::primary_key_is_clustered() const
virtual function ("const" was added). InnoDB was using the old
signature which caused the function not to be used.
MDEV-10550: Parallel replication lock waits/deadlock handling does not work with InnoDB 5.7
Fixed mutexing problem on lock_trx_handle_wait. Note that
rpl_parallel and rpl_optimistic_parallel tests still
fail.
MDEV-10156 : Group commit tests fail on 10.2 InnoDB (branch bb-10.2-jan)
Reason: incorrect merge
MDEV-10550: Parallel replication can't sync with master in InnoDB 5.7 (branch bb-10.2-jan)
Reason: incorrect merge
Analysis: Lengths which are not UNIV_SQL_NULL, but bigger than the following
number indicate that a field contains a reference to an externally
stored part of the field in the tablespace. The length field then
contains the sum of the following flag and the locally stored len.
This was incorrectly set to
define UNIV_EXTERN_STORAGE_FIELD (UNIV_SQL_NULL - UNIV_PAGE_SIZE_MAX)
When it should be
define UNIV_EXTERN_STORAGE_FIELD (UNIV_SQL_NULL - UNIV_PAGE_SIZE_DEF)
Additionally, we need to disable support for > 16K page size for
row compressed tables because a compressed page directory entry
reserves 14 bits for the start offset and 2 bits for flags.
This limits the uncompressed page size to 16k. To support
larger pages page directory entry needs to be larger.