The invariant of write-ahead logging is that before any change to a
page is written to the data file, the corresponding log record
must first have been durably written.
In crash recovery, there were some sloppy checks for this. Let us
implement accurate checks and flag an inconsistency as a hard error,
so that we can avoid further corruption of a corrupted database.
For data extraction from the corrupted database, innodb_force_recovery
can be used.
Before recovery is reading any data pages or invoking
buf_dblwr_t::recover() to recover torn pages from the
doublewrite buffer, InnoDB will have parsed the log until the
final LSN and updated log_sys.lsn to that. So, we can rely on
log_sys.lsn at all times. The doublewrite buffer recovery has been
refactored in such a way that the recv_sys.dblwr.pages may be consulted
while discovering files and their page sizes, but nothing will be
written back to data files before buf_dblwr_t::recover() is invoked.
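
The write-ahead logging condition described above can be sketched as a
comparison between the LSN stamped on a page and the end of the durably
written log. This is only an illustration: the names wal_check and
check_page_lsn() are hypothetical and are not part of InnoDB.

```cpp
#include <cassert>
#include <cstdint>

using lsn_t = std::uint64_t;

// Hypothetical outcome of a write-ahead logging consistency check.
enum class wal_check { OK, FUTURE_LSN };

// A page image is consistent with the log only if the LSN stamped on
// the page does not exceed the end of the durably written log.
inline wal_check check_page_lsn(lsn_t page_lsn, lsn_t log_end_lsn)
{
  return page_lsn <= log_end_lsn ? wal_check::OK : wal_check::FUTURE_LSN;
}
```

With the log fully parsed before any page is read, the second argument
would correspond to log_sys.lsn.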
recv_max_page_lsn, recv_lsn_checks_on: Remove.
recv_sys_t::validate_checkpoint(): Validate the write-ahead-logging
condition at the end of the recovery.
recv_dblwr_t::validate_page(): Keep track of the maximum LSN
(if we are checking a non-doublewrite copy of a page) but
do not complain about the LSN being in the future. The doublewrite buffer
is a special case, because it will be read early during recovery.
Besides, starting with commit 762bcb81b5bf9bbde61fed59afb26417f4ce1e86
the dblwr=true copies of pages may legitimately be "too new".
recv_dblwr_t::find_page(): Find a valid page with the smallest
FIL_PAGE_LSN that is in the valid range for recovery.
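
A minimal sketch of that selection rule, assuming that the usable range
is passed in as a closed interval. The type page_copy and the function
find_copy() are hypothetical stand-ins, not the actual recv_dblwr_t code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using lsn_t = std::uint64_t;

// Hypothetical stand-in for a doublewrite buffer copy of a page.
struct page_copy
{
  std::uint32_t page_no;
  lsn_t lsn;   // the FIL_PAGE_LSN stamped on the copy
  bool valid;  // whether the copy passed page validation
};

// Return the valid copy of page_no with the smallest LSN within
// [min_lsn, max_lsn], or nullptr if no such copy exists.
inline const page_copy *find_copy(const std::vector<page_copy> &copies,
                                  std::uint32_t page_no,
                                  lsn_t min_lsn, lsn_t max_lsn)
{
  const page_copy *best= nullptr;
  for (const page_copy &c : copies)
  {
    if (c.page_no != page_no || !c.valid)
      continue;  // wrong page, or a corrupted copy
    if (c.lsn < min_lsn || c.lsn > max_lsn)
      continue;  // too old or too new to be usable
    if (!best || c.lsn < best->lsn)
      best= &c;
  }
  return best;
}
```

The returned pointer refers to an element of the vector, so it remains
usable only as long as the vector is not modified.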
recv_dblwr_t::restore_first_page(): Replaced by find_page().
Only buf_dblwr_t::recover() will write to data files.
buf_dblwr_t::recover(): Simplify the message output. Do attempt
doublewrite recovery on user page read error. Ignore doublewrite
pages whose FIL_PAGE_LSN is outside the usable bounds. Previously,
we could wrongly recover a too new page from the doublewrite buffer.
It is unlikely that this could have led to an actual error.
Write back all recovered pages from the doublewrite buffer here,
including for the first page of any tablespace.
buf_page_is_corrupted(): Distinguish the return values
CORRUPTED_FUTURE_LSN and CORRUPTED_OTHER.
buf_page_check_corrupt(): Return the error code DB_CORRUPTION
in case the LSN is in the future.
Datafile::read_first_page_flags(): Split from read_first_page().
Take a copy of the first page as a parameter.
recv_sys_t::free_corrupted_page(): Take the file as a parameter
and return whether a message was displayed. This avoids some duplicated
and incomplete error messages.
buf_page_t::read_complete(): Remove some redundant output and always
display the name of the corrupted file. Never return DB_FAIL;
use it only in internal error handling.
IORequest::read_complete(): Assume that buf_page_t::read_complete()
will have reported any error.
fil_space_t::set_corrupted(): Return whether this is the first time
the tablespace has been flagged as corrupted.
Datafile::validate_first_page(), fil_node_open_file_low(),
fil_node_open_file(), fil_space_t::read_page0(),
fil_node_t::read_page0(): Add a parameter for a copy of the
first page, and a parameter to indicate whether the FIL_PAGE_LSN
check should be suppressed. Before buf_dblwr_t::recover() is
invoked, we cannot validate the FIL_PAGE_LSN, but we can trust the
FSP_SPACE_FLAGS and the tablespace ID that may be present in a
potentially too new copy of a page.
Reviewed by: Debarun Banerjee
PageBulk::init(): Unnecessarily reserves the extent before
allocating a page for bulk insert. btr_page_alloc() is
capable of handling the extension of the tablespace.
In any test that uses wait_all_purged.inc, ensure that InnoDB tables
will be created without persistent statistics.
This is a follow-up to commit cd04673a177d40f7c409284d87ead851ec775c36
after a similar failure was observed in the innodb_zip.blob test.
dict_find_max_space_id(): Return SELECT MAX(SPACE) FROM SYS_TABLES.
dict_check_tablespaces_and_store_max_id(): In the normal case
(no encryption plugin has been loaded and the change buffer is empty),
invoke dict_find_max_space_id() and do not open any .ibd files.
If a std::set<uint32_t> has been specified, open the files whose
tablespace ID is mentioned. Else, open all data files that are identified
by SYS_TABLES records.
fil_ibd_open(): Remove a call to os_file_get_last_error() that can
report a misleading error, such as EINVAL inside my_realpath() that is
not an actual error. This could be invoked when a data file is found
but the FSP_SPACE_FLAGS are incorrect, such as is the case for
table test.td in
./mtr --mysqld=--innodb-buffer-pool-dump-at-shutdown=0 innodb.table_flags
buf_load(): If any tablespaces could not be found, invoke
dict_check_tablespaces_and_store_max_id() on the missing tablespaces.
dict_load_tablespace(): Try to load the tablespace unless it was found
to be futile. This fixes failures related to FTS_*.ibd files for
FULLTEXT INDEX.
btr_cur_t::search_leaf(): Prevent a crash when the tablespace
does not exist. This was caught by the test innodb_fts.fts_concurrent_insert
when the change to dict_load_tablespaces() was not present.
We modify a few tests to ensure that tables will not be loaded at startup.
For some fault injection tests this means that the corrupted tables
will not be loaded, because dict_load_tablespace() would perform stricter
checks than dict_check_tablespaces_and_store_max_id().
Tested by: Matthias Leich
Reviewed by: Thirunarayanan Balathandayuthapani
The motivation of introducing the parameter
innodb_purge_rseg_truncate_frequency in
mysql/mysql-server@28bbd66ea5 and
mysql/mysql-server@8fc2120fed
seems to have been to avoid stalls due to freeing undo log pages
or truncating undo log tablespaces. In MariaDB Server,
innodb_undo_log_truncate=ON should be a much lighter operation
than in MySQL, because it will not involve any log checkpoint.
Another source of performance stalls should be
trx_purge_truncate_rseg_history(), which is shrinking the history list
by freeing the undo log pages whose undo records have been purged.
To alleviate that, we will introduce a purge_truncation_task that will
offload this from the purge_coordinator_task. In that way, the next
innodb_purge_batch_size pages may be parsed and purged while the pages
from the previous batch are being freed and the history list being shrunk.
The processing of innodb_undo_log_truncate=ON will still remain the
responsibility of the purge_coordinator_task.
purge_coordinator_state::count: Remove. We will ignore
innodb_purge_rseg_truncate_frequency, and act as if it had been
set to 1 (the maximum shrinking frequency).
purge_coordinator_state::do_purge(): Invoke an asynchronous task
purge_truncation_callback() to free the undo log pages.
purge_sys_t::iterator::free_history(): Free those undo log pages
that have been processed. This used to be a part of
trx_purge_truncate_history().
purge_sys_t::clone_end_view(): Take a new value of purge_sys.head
as a parameter, so that it will be updated while holding exclusive
purge_sys.latch. This is needed for race-free access to the field
in purge_truncation_callback().
Reviewed by: Vladislav Lesin
The MDEV-29693 conflict resolution is from Monty, as is
a bug fix where ANALYZE TABLE wrongly built histograms for
single-column PRIMARY KEY.
Also includes a fix for safe_malloc error reporting.
Other things:
- Copied main.log_slow from 10.4 to avoid mtr issue
Disabled test:
- spider/bugfix.mdev_27239 because we started to get
+Error 1429 Unable to connect to foreign data source: localhost
-Error 1158 Got an error reading communication packets
- main.delayed
- Bug#54332 Deadlock with two connections doing LOCK TABLE+INSERT DELAYED
This part is disabled for now as it fails randomly with different
warnings/errors (no corruption).
Some s390x environments include
https://github.com/madler/zlib/pull/410
and a more pessimistic compressBound: (sourceLen * 16 + 2308) / 8 + 6.
Let us adjust the recently enabled tests accordingly.
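
To see how much the more pessimistic bound differs, the arithmetic can be
written out as follows. The second function is the formula quoted above.
The first is the conventional bound as recalled from stock zlib's
compress.c and should be treated as an assumption. Both function names
are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Conventional bound (assumption: recalled from stock zlib).
inline std::uint64_t compress_bound_stock(std::uint64_t n)
{
  return n + (n >> 12) + (n >> 14) + (n >> 25) + 13;
}

// The more pessimistic bound quoted above for some s390x environments.
inline std::uint64_t compress_bound_s390x(std::uint64_t n)
{
  return (n * 16 + 2308) / 8 + 6;
}
```

For a 16384-byte input, the quoted formula yields 33062 bytes, while the
conventional one yields 16402 bytes, so any test that records a buffer
size derived from compressBound would see roughly twice the value.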
Currently include/have_innodb_4k.inc etc. files only check that the
server is running with the corresponding page size. I think it would
be more convenient if they actually enforced the setting.
The test innodb_zip.index_large_prefix_4k would not run unless it is
invoked as
./mtr --mysqld=--innodb-page-size=4k innodb_zip.index_large_prefix_4k
This test was originally developed to cover an option that was removed
in commit 0c92794db3026cda03218caf4918b996baab6ba6. Starting with
MariaDB Server 10.2, which introduced innodb_default_row_format=dynamic,
the option innodb_large_prefix had become useless.
Let us remove some of the stale tests and adjust the outcome to the
expected behaviour.
btr_cur_need_opposite_intention(): Check also page_zip_available()
so that we will escalate to exclusive index latch when a non-leaf
page may have to be split further due to ROW_FORMAT=COMPRESSED page
overflow.
Tested by: Matthias Leich
In MariaDB, we have a confusing problem where:
* The transaction_isolation option can be set in a configuration file, but it cannot be set dynamically.
* The tx_isolation system variable can be set dynamically, but it cannot be set in a configuration file.
Therefore, we have two different names for the same thing in different contexts. This is needlessly confusing, and it complicates the documentation. The same applies to transaction_read_only.
MySQL 5.7 solved this problem by making them into system variables. https://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-20.html
This commit takes a similar approach by adding new system variables and marking the original ones as deprecated. This commit also resolves some legacy problems related to SET STATEMENT and transaction_isolation.
try to make them less confusing for users.
Hopefully, if the version string will be changed like
- mariadb Ver 15.1 Distrib 10.11.2-MariaDB for Linux (x86_64)
+ mariadb from 10.11.2-MariaDB, client 15.1 for Linux (x86_64)
users will be less inclined to reply "15.1" to the question
"what mariadb version are you using?"
The purpose of the change buffer was to reduce random disk access,
which could be useful on rotational storage, but maybe less so on
solid-state storage.
When we wished to
(1) insert a record into a non-unique secondary index,
(2) delete-mark a secondary index record,
(3) delete a secondary index record as part of purge (but not ROLLBACK),
and the B-tree leaf page to which the record belongs was not in the buffer
pool, we inserted a record into the change buffer B-tree, indexed by
the page identifier. When the page was eventually read into the buffer
pool, we looked up the change buffer B-tree for any modifications to the
page, and applied these upon the completion of the read operation. This
was called the insert buffer merge.
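
The mechanism described above can be illustrated with a toy in-memory
model. This is a sketch under simplifying assumptions, not the removed
implementation: all names are hypothetical, a std::map stands in for the
change buffer B-tree, and a record is just a string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical page identifier: (tablespace ID, page number).
using page_id = std::pair<std::uint32_t, std::uint32_t>;

// The three kinds of buffered operations listed above.
enum class op { INSERT, DELETE_MARK, PURGE };

struct buffered_change { op type; std::string record; };

class change_buffer_sketch
{
  // Pending changes, indexed by the identifier of the absent leaf page.
  std::map<page_id, std::vector<buffered_change>> pending;
public:
  // Buffer a change for a page that is not in the buffer pool.
  void buffer(page_id id, buffered_change c)
  { pending[id].push_back(std::move(c)); }

  // On read completion: hand over the changes for the page in the
  // order they were buffered, and forget them.
  std::vector<buffered_change> merge(page_id id)
  {
    std::vector<buffered_change> out;
    auto it= pending.find(id);
    if (it != pending.end())
    {
      out= std::move(it->second);
      pending.erase(it);
    }
    return out;
  }

  std::size_t pages() const { return pending.size(); }
};
```

The sketch leaves out persistence and crash recovery entirely. The real
change buffer was itself a persistent B-tree.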
We remove the change buffer, because it has been the source of
various hard-to-reproduce corruption bugs, including those fixed in
commit 5b9ee8d8193a8c7a8ebdd35eedcadc3ae78e7fc1 and
commit 165564d3c33ae3d677d70644a83afcb744bdbf65 but not limited to them.
A downgrade will fail with a clear message starting with
commit db14eb16f9977453467ec4765f481bb2f71814ba (MDEV-30106).
buf_page_t::state: Merge IBUF_EXIST to UNFIXED and
WRITE_FIX_IBUF to WRITE_FIX.
buf_pool_t::watch[]: Remove.
trx_t: Move isolation_level, check_foreigns, check_unique_secondary,
bulk_insert into the same bit-field. The only purpose of
trx_t::check_unique_secondary is to enable bulk insert into an
empty table. It no longer enables insert buffering for UNIQUE INDEX.
btr_cur_t::thr: Remove. This field was originally needed for change
buffering. Later, its use was extended to cover SPATIAL INDEX.
Much of the time, rtr_info::thr holds this field. When it does not,
we will add parameters to SPATIAL INDEX specific functions.
ibuf_upgrade_needed(): Check if the change buffer needs to be updated.
ibuf_upgrade(): Merge and upgrade the change buffer after all redo log
has been applied. Free any pages consumed by the change buffer, and
zero out the change buffer root page to mark the upgrade completed,
and to prevent a downgrade to an earlier version.
dict_load_tablespaces(): Renamed from
dict_check_tablespaces_and_store_max_id(). This needs to be invoked
before ibuf_upgrade().
btr_cur_open_at_rnd_pos(): Specialize for use in persistent statistics.
The change buffer merge does not need this function anymore.
btr_page_alloc(): Renamed from btr_page_alloc_low(). We no longer
allocate any change buffer pages.
row_search_index_entry(), btr_lift_page_up(): Add a parameter thr
for the SPATIAL INDEX case.
rtr_page_split_and_insert(): Specialized from btr_page_split_and_insert().
rtr_root_raise_and_insert(): Specialized from btr_root_raise_and_insert().
Note: The support for upgrading from the MySQL 3.23 or MySQL 4.0
change buffer format that predates the MySQL 4.1 introduction of
the option innodb_file_per_table was removed in MySQL 5.6.5
as part of mysql/mysql-server@69b6241a79
and MariaDB 10.0.11 as part of 1d0f70c2f894b27e98773a282871d32802f67964.
In the tests innodb.log_upgrade and innodb.log_corruption, we create
valid (upgraded) change buffer pages.
Tested by: Matthias Leich
Before commit 6112853cdab2770e92f9cfefdfef9c0a14b71cb7 in MySQL 4.1.1
introduced the parameter innodb_file_per_table, all InnoDB data was
written to the InnoDB system tablespace (often named ibdata1).
A serious design problem is that once the system tablespace has grown to
some size, it cannot shrink even if the data inside it has been deleted.
There are also other design problems, such as the server hang MDEV-29930
that should only be possible when using innodb_file_per_table=0 and
innodb_undo_tablespaces=0 (storing both tables and undo logs in the
InnoDB system tablespace).
The parameter innodb_change_buffering was deprecated
in commit b5852ffbeebc3000982988383daeefb0549e058a.
Starting with commit baf276e6d4a44fe7cdf3b435c0153da0a42af2b6
(MDEV-19229) the number of innodb_undo_tablespaces can be increased,
so that the undo logs can be moved out of the system tablespace
of an existing installation.
If all these things (tables, undo logs, and the change buffer) are
removed from the InnoDB system tablespace, the only variable-size
data structure inside it is the InnoDB data dictionary.
DDL operations on .ibd files were optimized in
commit 86dc7b4d4cfe15a2d37f8b5f60c4fce5dba9491d (MDEV-24626).
That should have removed any conceivable performance advantage of
using innodb_file_per_table=0.
Since there should be no benefit of setting innodb_file_per_table=0,
the parameter should be deprecated. Starting with MySQL 5.6 and
MariaDB Server 10.0, the default value is innodb_file_per_table=1.
Per fsp0types.h, the SDI flag is at tablespace flags position 14, where
MariaDB stores its page size. The flag at position 13, also within the
MariaDB page size flags, is a MySQL encryption flag.
These are checked only if fsp_flags_is_valid fails, so valid MariaDB
page sizes don't become errors.
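
The order of the checks can be sketched as follows. The bit positions are
those stated above. The constant names, the enum and the function are
hypothetical, and the result of the validity check is passed in as a
plain boolean rather than computed.

```cpp
#include <cassert>
#include <cstdint>

// Bit positions as stated above; the names are hypothetical.
constexpr std::uint32_t MYSQL_FLAG_ENCRYPTION= 1U << 13;
constexpr std::uint32_t MYSQL_FLAG_SDI= 1U << 14;

enum class mysql_feature { NONE, ENCRYPTION, SDI };

// Interpret bits 13 and 14 as MySQL features only when the flags have
// already failed validation. In valid MariaDB flags, the same bits are
// part of the page size.
inline mysql_feature detect_mysql_feature(std::uint32_t flags,
                                          bool flags_valid)
{
  if (flags_valid)
    return mysql_feature::NONE;
  if (flags & MYSQL_FLAG_SDI)
    return mysql_feature::SDI;
  if (flags & MYSQL_FLAG_ENCRYPTION)
    return mysql_feature::ENCRYPTION;
  return mysql_feature::NONE;
}
```

Checking validity first is what keeps a legitimate MariaDB page size from
being misreported as an unsupported MySQL feature.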
The error message "Cannot reset LSNs in table" was rather specific and
not always true, so it is replaced with a more generic error.
ALTER TABLE tbl IMPORT TABLESPACE now reports Unsupported on MySQL
tablespace (rather than index corrupted) along with a server error
message.
Errors for MySQL InnoDB tablespaces are reported as UNSUPPORTED rather
than CORRUPTED to avoid user anxiety.
Reviewer: Marko Mäkelä