1
0
mirror of https://github.com/MariaDB/server.git synced 2025-12-01 17:39:21 +03:00
Commit Graph

554 Commits

Author SHA1 Message Date
Sergei Golubchik
ba01c2aaf0 Merge branch '11.4' into 11.7
* rpl.rpl_system_versioning_partitions updated for MDEV-32188
* innodb.row_size_error_log_warnings_3 changed error for MDEV-33658
  (checks are done in a different order)
2025-02-06 16:46:36 +01:00
Sergei Golubchik
7d657fda64 Merge branch '10.11 into 11.4 2025-01-30 12:01:11 +01:00
Sergei Golubchik
e69f8cae1a Merge branch '10.6' into 10.11 2025-01-30 11:55:13 +01:00
Thirunarayanan Balathandayuthapani
a6ab0e6c0b MDEV-34898 Doublewrite recovery of innodb_checksum_algorithm=full_crc32 encrypted pages does not work
- Use file_key_management encryption plugin instead of
example_key_management_plugin for the encryption.doublewrite_debug
test case
2025-01-16 18:14:49 +05:30
Marko Mäkelä
bca4cc0bd6 Merge 10.11 into 11.4 2025-01-09 14:02:19 +02:00
Marko Mäkelä
ea19a6b38c MDEV-35699 Multi-batch recovery occasionally fails
recv_sys_t::parse<storing=NO>(): Do invoke
fil_space_set_recv_size_and_flags() and do parse enough of page 0
to facilitate that.

This fixes a regression that had been introduced in
commit b249a059da (MDEV-34850).
In a multi-batch crash recovery, we would fail to invoke
fil_space_set_recv_size_and_flags() while parsing the remaining log,
before starting the first recovery batch.

Reviewed by: Debarun Banerjee
Tested by: Matthias Leich
2025-01-09 13:18:30 +02:00
Marko Mäkelä
15700f54c2 Merge 11.4 into 11.7 2025-01-09 09:41:38 +02:00
Marko Mäkelä
17f01186f5 Merge 10.11 into 11.4 2025-01-09 07:58:08 +02:00
Marko Mäkelä
420d9eb27f Merge 10.6 into 10.11 2025-01-08 12:51:26 +02:00
Thirunarayanan Balathandayuthapani
f8cf493290 MDEV-34898 Doublewrite recovery of innodb_checksum_algorithm=full_crc32 encrypted pages does not work
- InnoDB fails to recover the full crc32 encrypted page from
doublewrite buffer. The reason is that buf_dblwr_t::recover()
fails to identify the space id from the page because the page has
been encrypted from FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION bytes.

Fix:
===
buf_dblwr_t::recover(): preserve any pages whose space_id
does not match a known tablespace. These could be encrypted pages
of tablespaces that had been created with
innodb_checksum_algorithm=full_crc32.

buf_page_t::read_complete(): If the page looks corrupted and the
tablespace is encrypted and in full_crc32 format, try to
restore the page from doublewrite buffer.

recv_dblwr_t::recover_encrypted_page(): Find the page which
has the same page number and try to decrypt the page using
space->crypt_data. After decryption, compare the space id.
Write the recovered page back to the file.
2025-01-07 19:33:56 +05:30
Oleksandr Byelkin
b12ff287ec Merge branch '11.6' into 11.7 2024-11-10 19:22:21 +01:00
Sergei Golubchik
7feec30939 relax the XA recovery error
it's just a suggestion anyway, not a bullet-proof check,
let's not act as if it is
2024-11-05 14:00:52 -08:00
Oleksandr Byelkin
c770bce898 Merge branch '11.2' into 11.4 2024-10-30 15:11:17 +01:00
Marko Mäkelä
ebefef658e Merge 10.11 into 11.2 2024-10-18 11:32:22 +03:00
Marko Mäkelä
eca552a1a4 MDEV-34830: LSN in the future is not being treated as serious corruption
The invariant of write-ahead logging is that before any change to a
page is written to the data file, the corresponding log record must
must first have been durably written.

In crash recovery, there were some sloppy checks for this. Let us
implement accurate checks and flag an inconsistency as a hard error,
so that we can avoid further corruption of a corrupted database.
For data extraction from the corrupted database, innodb_force_recovery
can be used.

Before recovery is reading any data pages or invoking
buf_dblwr_t::recover() to recover torn pages from the
doublewrite buffer, InnoDB will have parsed the log until the
final LSN and updated log_sys.lsn to that. So, we can rely on
log_sys.lsn at all times. The doublewrite buffer recovery has been
refactored in such a way that the recv_sys.dblwr.pages may be consulted
while discovering files and their page sizes, but nothing will be
written back to data files before buf_dblwr_t::recover() is invoked.

recv_max_page_lsn, recv_lsn_checks_on: Remove.

recv_sys_t::validate_checkpoint(): Validate the write-ahead-logging
condition at the end of the recovery.

recv_dblwr_t::validate_page(): Keep track of the maximum LSN
(if we are checking a non-doublewrite copy of a page) but
do not complain LSN being in the future. The doublewrite buffer
is a special case, because it will be read early during recovery.
Besides, starting with commit 762bcb81b5
the dblwr=true copies of pages may legitimately be "too new".

recv_dblwr_t::find_page(): Find a valid page with the smallest
FIL_PAGE_LSN that is in the valid range for recovery.

recv_dblwr_t::restore_first_page(): Replaced by find_page().
Only buf_dblwr_t::recover() will write to data files.

buf_dblwr_t::recover(): Simplify the message output. Do attempt
doublewrite recovery on user page read error. Ignore doublewrite
pages whose FIL_PAGE_LSN is outside the usable bounds. Previously,
we could wrongly recover a too new page from the doublewrite buffer.
It is unlikely that this could have lead to an actual error.
Write back all recovered pages from the doublewrite buffer here,
including for the first page of any tablespace.

buf_page_is_corrupted(): Distinguish the return values
CORRUPTED_FUTURE_LSN and CORRUPTED_OTHER.

buf_page_check_corrupt(): Return the error code DB_CORRUPTION
in case the LSN is in the future.

Datafile::read_first_page_flags(): Split from read_first_page().
Take a copy of the first page as a parameter.

recv_sys_t::free_corrupted_page(): Take the file as a parameter
and return whether a message was displayed. This avoids some duplicated
and incomplete error messages.

buf_page_t::read_complete(): Remove some redundant output and always
display the name of the corrupted file. Never return DB_FAIL;
use it only in internal error handling.

IORequest::read_complete(): Assume that buf_page_t::read_complete()
will have reported any error.

fil_space_t::set_corrupted(): Return whether this is the first time
the tablespace had been flagged as corrupted.

Datafile::validate_first_page(), fil_node_open_file_low(),
fil_node_open_file(), fil_space_t::read_page0(),
fil_node_t::read_page0(): Add a parameter for a copy of the
first page, and a parameter to indicate whether the FIL_PAGE_LSN
check should be suppressed. Before buf_dblwr_t::recover() is
invoked, we cannot validate the FIL_PAGE_LSN, but we can trust the
FSP_SPACE_FLAGS and the tablespace ID that may be present in a
potentially too new copy of a page.

Reviewed by: Debarun Banerjee
2024-10-18 10:12:47 +03:00
Marko Mäkelä
bb47e575de MDEV-34830: LSN in the future is not being treated as serious corruption
The invariant of write-ahead logging is that before any change to a
page is written to the data file, the corresponding log record must
must first have been durably written.

On crash recovery, there were some sloppy checks for this. Let us
implement accurate checks and flag an inconsistency as a hard error,
so that we can avoid further corruption of a corrupted database.
For data extraction from the corrupted database, innodb_force_recovery
can be used.

Before recovery is reading any data pages or invoking
buf_dblwr_t::recover() to recover torn pages from the
doublewrite buffer, InnoDB will have parsed the log until the
final LSN and updated log_sys.lsn to that. So, we can rely on
log_sys.lsn at all times. The doublewrite buffer recovery has been
refactored in such a way that the recv_sys.dblwr.pages may be consulted
while discovering files and their page sizes, but nothing will be
written back to data files before buf_dblwr_t::recover() is invoked.

A section of the test mariabackup.innodb_redo_overwrite
that is parsing some mariadb-backup --backup output has
been removed, because that output "redo log block is overwritten"
would often be missing in a Microsoft Windows environment
as a result of these changes.

recv_max_page_lsn, recv_lsn_checks_on: Remove.

recv_sys_t::validate_checkpoint(): Validate the write-ahead-logging
condition at the end of the recovery.

recv_dblwr_t::validate_page(): Keep track of the maximum LSN
(if we are checking a non-doublewrite copy of a page) but
do not complain LSN being in the future. The doublewrite buffer
is a special case, because it will be read early during recovery.
Besides, starting with commit 762bcb81b5
the dblwr=true copies of pages may legitimately be "too new".

recv_dblwr_t::find_page(): Find a valid page with the smallest
FIL_PAGE_LSN that is in the valid range for recovery.

recv_dblwr_t::restore_first_page(): Replaced by find_page().
Only buf_dblwr_t::recover() will write to data files.

buf_dblwr_t::recover(): Simplify the message output. Do attempt
doublewrite recovery on user page read error. Ignore doublewrite
pages whose FIL_PAGE_LSN is outside the usable bounds. Previously,
we could wrongly recover a too new page from the doublewrite buffer.
It is unlikely that this could have lead to an actual error.
Write back all recovered pages from the doublewrite buffer here,
including for the first page of any tablespace.

buf_page_is_corrupted(): Distinguish the return values
CORRUPTED_FUTURE_LSN and CORRUPTED_OTHER.

buf_page_check_corrupt(): Return the error code DB_CORRUPTION
in case the LSN is in the future.

Datafile::read_first_page(): Handle FSP_SPACE_FLAGS=0xffffffff
in the same way on both 32-bit and 64-bit architectures.

Datafile::read_first_page_flags(): Split from read_first_page().
Take a copy of the first page as a parameter.

recv_sys_t::free_corrupted_page(): Take the file as a parameter
and return whether a message was displayed. This avoids some duplicated
and incomplete error messages.

buf_page_t::read_complete(): Remove some redundant output and always
display the name of the corrupted file. Never return DB_FAIL;
use it only in internal error handling.

IORequest::read_complete(): Assume that buf_page_t::read_complete()
will have reported any error.

fil_space_t::set_corrupted(): Return whether this is the first time
the tablespace had been flagged as corrupted.

Datafile::validate_first_page(), fil_node_open_file_low(),
fil_node_open_file(), fil_space_t::read_page0(),
fil_node_t::read_page0(): Add a parameter for a copy of the
first page, and a parameter to indicate whether the FIL_PAGE_LSN
check should be suppressed. Before buf_dblwr_t::recover() is
invoked, we cannot validate the FIL_PAGE_LSN, but we can trust the
FSP_SPACE_FLAGS and the tablespace ID that may be present in a
potentially too new copy of a page.

Reviewed by: Debarun Banerjee
2024-10-17 17:24:20 +03:00
Libing Song
72cc58bb71 MDEV-32014 Rename binlog cache temporary file to binlog file
for large transaction

Description
===========
When a transaction commits, it copies the binlog events from
binlog cache to binlog file. Very large transactions
(eg. gigabytes) can stall other transactions for a long time
because the data is copied while holding LOCK_log, which blocks
other commits from binlogging.

The solution in this patch is to rename the binlog cache file to
a binlog file instead of copy, if the commiting transaction has
large binlog cache. Rename is a very fast operation, it doesn't
block other transactions a long time.

Design
======
* binlog_large_commit_threshold
  type: ulonglong
  scope: global
  dynamic: yes
  default: 128MB

  Only the binlog cache temporary files large than 128MB are
  renamed to binlog file.

* #binlog_cache_files directory
  To support rename, all binlog cache temporary files are managed
  as normal files now. `#binlog_cache_files` directory is in the same
  directory with binlog files. It is created at server startup if it doesn't
  exist. Otherwise, all files in the directory is deleted at startup.

  The temporary files are named with ML_ prefix and the memorary address
  of the binlog_cache_data object which guarantees it is unique.

* Reserve space
  To supprot rename feature, It must reserve enough space at the
  begin of the binlog cache file. The space is required for
  Format description, Gtid list, checkpoint and Gtid events when
  renaming it to a binlog file.

  Since binlog_cache_data's cache_log is directly accessed by binlog log,
  online alter and wsrep. It is not easy to update all the code. Thus
  binlog cache will not reserve space if it is not session binlog cache or
  wsrep session is enabled.

  - m_file_reserved_bytes
    Stores the bytes reserved at the begin of the cache file.
    It is initialized in write_prepare() and cleared by reset().

    The reserved file header is hide to callers. Thus there is no
    change for callers. E.g.
    - get_byte_position() still get the length of binlog data
      written to the cache, but not the file length.
    - truncate(0) will truncate the file to m_file_reserved_bytes but not 0.

  - write_prepare()
    write_prepare() is called everytime when anything is being written
    into the cache. It will call init_file_reserved_bytes() to  create
    the cache file (if it doesn't exist) and reserve suitable space if
    the data written exceeds buffer's size.

* Binlog_commit_by_rotate
  It is used to encapsulate the code for remaing a binlog cache
  tempoary file to binlog file.
  - should_commit_by_rotate()
    it is called by write_transaction_to_binlog_events() to check if
    a binlog cache should be rename to a binlog file.
  - commit()
    That is the entry to rename a binlog cache and commit the
    transaction. Both rename and commit are protected by LOCK_log,
    Thus not other transactions can write anything into the renamed
    binlog before it.

    Rename happens in a rotation. After the new binlog file is generated,
    replace_binlog_file() is called to:
    - copy data from the new binlog file to its binlog cache file.
    - write gtid event.
    - rename the binlog cache file to binlog file.

    After that the rotation will continue to succeed. Then the transaction
    is committed in a seperated group itself. Its cache file will be
    detached and cache log will be reset before calling
    trx_group_commit_with_engines(). Thus only Xid event be written.
2024-10-17 07:53:59 -06:00
Marko Mäkelä
b53b81e937 Merge 11.2 into 11.4 2024-10-03 14:32:14 +03:00
Marko Mäkelä
12a91b57e2 Merge 10.11 into 11.2 2024-10-03 13:24:43 +03:00
Marko Mäkelä
63913ce5af Merge 10.6 into 10.11 2024-10-03 10:55:08 +03:00
Marko Mäkelä
7e0afb1c73 Merge 10.5 into 10.6 2024-10-03 09:31:39 +03:00
Lena Startseva
0a5e4a0191 MDEV-31005: Make working cursor-protocol
Updated tests: cases with bugs or which cannot be run
with the cursor-protocol were excluded with
"--disable_cursor_protocol"/"--enable_cursor_protocol"

Fix for v.10.5
2024-09-18 18:39:26 +07:00
Marko Mäkelä
7ea9e1358f Merge 11.2 into 11.4 2024-09-18 08:07:22 +03:00
Marko Mäkelä
e782e416ac Merge 10.11 into 11.2 2024-09-18 07:38:49 +03:00
Marko Mäkelä
b187414764 Merge 10.6 into 10.11 2024-09-16 10:58:40 +03:00
Marko Mäkelä
a74bea7ba9 MDEV-34879 InnoDB fails to merge the change buffer to ROW_FORMAT=COMPRESSED tables
buf_page_t::read_complete(): Fix an incorrect condition that had been
added in commit aaef2e1d8c (MDEV-27058).
Also for compressed-only pages we must remember that buffered changes
may exist.

buf_read_page(): Correct the function comment; this is for a synchronous
and not asynchronous read. Pass the parameter unzip=true to
buf_read_page_low(), because each of our callers will be interested in
the uncompressed page frame. This will cause the test
encryption.innodb-compressed-blob to emit more errors when the
correct keys for decrypting the clustered index root page are unavailable.

Reviewed by: Debarun Banerjee
2024-09-12 10:52:55 +03:00
Oleksandr Byelkin
1640c9b06e Merge branch '11.2' into 11.4 2024-08-04 17:27:48 +02:00
Oleksandr Byelkin
dced6cbdb6 Merge branch '11.1' into 11.2 2024-08-03 09:50:16 +02:00
Oleksandr Byelkin
80abd847da Merge branch '10.11' into 11.1 2024-08-03 09:32:42 +02:00
Oleksandr Byelkin
0e8fb977b0 Merge branch '10.6' into 10.11 2024-08-03 09:15:40 +02:00
Oleksandr Byelkin
8f020508c8 Merge branch '10.5' into 10.6 2024-08-03 09:04:24 +02:00
Thirunarayanan Balathandayuthapani
533e6d5d13 MDEV-34670 IMPORT TABLESPACE unnecessary traverses tablespace list
Problem:
========
- After the commit ada1074bb1 (MDEV-14398)
fil_crypt_set_encrypt_tables() iterates through all tablespaces to
fill the default_encrypt tables list. This was a trigger to
encrypt or decrypt when key rotation age is set to 0. But import
tablespace does call fil_crypt_set_encrypt_tables() unnecessarily.
The motivation for the call is to signal the encryption threads.

Fix:
====
ha_innobase::discard_or_import_tablespace: Remove the
fil_crypt_set_encrypt_tables() and add the import tablespace
to the default encrypt list if necessary
2024-07-31 14:13:38 +05:30
Oleksandr Byelkin
99b370e023 Merge branch '11.2' into 11.4 2024-05-21 19:38:51 +02:00
Sergei Golubchik
bf5da43e50 Merge branch '11.1' into 11.2 2024-05-13 10:00:26 +02:00
Sergei Golubchik
f0a5412037 Merge branch '11.0' into 11.1 2024-05-13 09:52:30 +02:00
Sergei Golubchik
f9807aadef Merge branch '10.11' into 11.0 2024-05-12 12:18:28 +02:00
Sergei Golubchik
018d537ec1 Merge branch '10.6' into 10.11 2024-04-22 15:23:10 +02:00
Marko Mäkelä
bb2e125d07 Merge 10.5 into 10.6
This excludes commit 040069f4ba
because it is specific to innodb_sync_debug, which had been removed
in commit ff5d306e29.
2024-04-18 07:14:56 +03:00
Vladislav Vaintroub
061adae9a2 MDEV-16944 Fix file sharing issues on Windows in mysqltest
On Windows systems, occurrences of ERROR_SHARING_VIOLATION due to
conflicting share modes between processes accessing the same file can
result in CreateFile failures.

mysys' my_open() already incorporates a workaround by implementing
wait/retry logic on Windows.

But this does not help if files are opened using shell redirection like
mysqltest traditionally did it, i.e via

--echo exec "some text" > output_file

In such cases, it is cmd.exe, that opens the output_file, and it
won't do any sharing-violation retries.

This commit addresses the issue by introducing a new built-in command,
'write_line', in mysqltest. This new command serves as a brief alternative
to 'write_file', with a single line output, that also resolves variables
like "exec" would.

Internally, this command will use my_open(), and therefore retry-on-error
logic.

Hopefully this will eliminate the very sporadic "can't open file because
it is used by another process" error on CI.
2024-04-17 16:52:37 +02:00
Oleksandr Byelkin
cd28b2479c Merge branch '11.1' into 11.2 2024-04-09 12:12:33 +02:00
Marko Mäkelä
683fbced6b Merge 11.0 into 11.1 2024-03-28 12:15:36 +02:00
Marko Mäkelä
fec2fd6add Merge 10.11 into 11.0 2024-03-28 10:51:36 +02:00
Marko Mäkelä
788953463d Merge 10.6 into 10.11
Some fixes related to commit f838b2d799 and
Rows_log_event::do_apply_event() and Update_rows_log_event::do_exec_row()
for system-versioned tables were provided by Nikita Malyavin.
This was required by test versioning.rpl,trx_id,row.
2024-03-28 09:16:57 +02:00
Sergei Golubchik
f71d7f2f0f Merge branch '10.5' into 10.6 2024-03-13 21:02:34 +01:00
Marko Mäkelä
f703e72bd8 Merge 10.4 into 10.5 2024-03-11 10:08:20 +02:00
Thirunarayanan Balathandayuthapani
8532dd82f1 MDEV-13765 encryption.encrypt_and_grep failed in buildbot with wrong result
- Adjust the test case to check whether all tablespaces
are encrypted by comparing it with existing table count.
2024-03-06 11:57:09 +05:30
Marko Mäkelä
d73baa402a Merge 10.11 into 11.0 2024-02-20 12:02:01 +02:00
Marko Mäkelä
86c2c89743 Merge 10.6 into 10.11 2024-02-08 15:04:46 +02:00
Marko Mäkelä
77b4399545 MDEV-33421 innodb.corrupted_during_recovery fails due to error that the table is corrupted
This fixes up the merge commit 7e39470e33

dict_table_open_on_name(): Report ER_TABLE_CORRUPT in a consistent
fashion, with a pretty-printed table name.
2024-02-08 14:20:42 +02:00
Marko Mäkelä
466069b184 Merge 10.5 into 10.6 2024-02-08 10:38:53 +02:00