1
0
mirror of https://github.com/MariaDB/server.git synced 2025-08-08 11:22:35 +03:00
Commit Graph

642 Commits

Author SHA1 Message Date
Marko Mäkelä
65f1a42788 Merge 10.5 into 10.6 2021-06-09 16:50:58 +03:00
Sergei Petrunia
c872125a66 MDEV-25630: Crash with window function in left expr of IN subquery
* Make Item_in_optimizer::fix_fields inherit the with_window_func
  attribute of the subquery's left expression (the subquery itself
  cannot have window functions that are aggregated in this select)

* Make Item_cache_wrapper::Item_cache_wrapper() inherit
  with_window_func attribute of the item it is caching.
2021-06-09 15:52:13 +03:00
Marko Mäkelä
f4425d3a3d Merge 10.4 into 10.5 2021-06-08 16:03:53 +03:00
Anel Husakovic
ddddfc33c7 Fix mtr tests with file_key_managment extension for Windows
Commit b5615eff0d introduced comment in result file during shutdown.
In case of Windows for the tests involving `file_key_managment.so` as plugin-load-add the tests will be overwritten with .dll extension.
The same happens with environment variable `$FILE_KEY_MANAGMENT_SO`.
So the patch is removing the extension to be extension agnostic.

Reviewed by: wlad@mariadb.com
2021-06-04 14:57:13 +02:00
Marko Mäkelä
1e1073b810 Merge 10.5 into 10.6 2021-05-04 07:37:38 +03:00
Monty
25fa2a6831 MDEV-25507 CHECK on encrypted Aria table complains about "Wrong LSN"
This happens during repair when a temporary table is opened
with HA_OPEN_COPY, which resets 'share->born_transactional', which
the encryption code did not like.

Fixed by resetting just share->now_transactional.
2021-04-30 15:45:07 +03:00
Marko Mäkelä
1900c2ede5 Merge 10.5 into 10.6 2021-04-08 10:11:36 +03:00
Monty
81258f1432 MDEV-17913 Encrypted transactional Aria tables remain corrupt after crash recovery, automatic repairment does not work
This was because of a wrong test in encryption code that wrote random
numbers over the LSN for pages for transactional Aria tables during repair.
The effect was that after an ALTER TABLE ENABLE KEYS of a encrypted
recovery of the tables would not work.

Fixed by changing testing of !share->now_transactional to
!share->base.born_transactional.

Other things:
- Extended Aria check_table() to check for wrong (= too big) LSN numbers.
- If check_table() failed just because of wrong LSN or TRN numbers,
  a following repair table will just do a zerofill which is much faster.
- Limit number of LSN errors in one check table to MAX_LSN_ERROR (10).
- Removed old obsolete test of 'if (error_count & 2)'. Changed error_count
  and warning_count from bits to numbers of errors/warnings as this is
  more useful.
2021-04-06 14:57:22 +03:00
Marko Mäkelä
a4b7232b2c Merge 10.4 into 10.5 2021-03-11 20:09:34 +02:00
Marko Mäkelä
1ea6ac3c95 Merge 10.3 into 10.4 2021-03-11 19:33:45 +02:00
Marko Mäkelä
08e8ad7c71 MDEV-25106 Deprecation warning for innodb_checksum_algorithm=none,innodb,...
MDEV-25105 (commit 7a4fbb55b0)
in MariaDB 10.6 will refuse the innodb_checksum_algorithm
values none, innodb, strict_none, strict_innodb.

We will issue a deprecation warning if innodb_checksum_algorithm
is set to any of these non-default unsafe values.

innodb_checksum_algorithm=crc32 was made the default in
MySQL 5.7 and MariaDB Server 10.2, and given that older versions
of the server have reached their end of life, there is no valid
reason to use anything else than innodb_checksum_algorithm=crc32
or innodb_checksum_algorithm=strict_crc32 in MariaDB 10.3.

Reviewed by: Sergei Golubchik
2021-03-11 12:50:04 +02:00
Marko Mäkelä
7a4fbb55b0 MDEV-25105 Remove innodb_checksum_algorithm values none,innodb,...
Historically, InnoDB supported a buggy page checksum algorithm that did not
compute a checksum over the full page. Later, well before MySQL 4.1
introduced .ibd files and the innodb_file_per_table option, the algorithm
was corrected and the first 4 bytes of each page were redefined to be
a checksum.

The original checksum was so slow that an option to disable page checksum
was introduced for benchmarketing purposes.

The Intel Nehalem microarchitecture introduced the SSE4.2 instruction set
extension, which includes instructions for faster computation of CRC-32C.
In MySQL 5.6 (and MariaDB 10.0), innodb_checksum_algorithm=crc32 was
implemented to make of that. As that option was changed to be the default
in MySQL 5.7, a bug was found on big-endian platforms and some work-around
code was added to weaken that checksum further. MariaDB disables that
work-around by default since MDEV-17958.

Later, SIMD-accelerated CRC-32C has been implemented in MariaDB for POWER
and ARM and also for IA-32/AMD64, making use of carry-less multiplication
where available.

Long story short, innodb_checksum_algorithm=crc32 is faster and more secure
than the pre-MySQL 5.6 checksum, called innodb_checksum_algorithm=innodb.
It should have removed any need to use innodb_checksum_algorithm=none.

The setting innodb_checksum_algorithm=crc32 is the default in
MySQL 5.7 and MariaDB Server 10.2, 10.3, 10.4. In MariaDB 10.5,
MDEV-19534 made innodb_checksum_algorithm=full_crc32 the default.
It is even faster and more secure.

The default settings in MariaDB do allow old data files to be read,
no matter if a worse checksum algorithm had been used.
(Unfortunately, before innodb_checksum_algorithm=full_crc32,
the data files did not identify which checksum algorithm is being used.)

The non-default settings innodb_checksum_algorithm=strict_crc32 or
innodb_checksum_algorithm=strict_full_crc32 would only allow CRC-32C
checksums. The incompatibility with old data files is why they are
not the default.

The newest server not to support innodb_checksum_algorithm=crc32
were MySQL 5.5 and MariaDB 5.5. Both have reached their end of life.
A valid reason for using innodb_checksum_algorithm=innodb could have
been the ability to downgrade. If it is really needed, data files
can be converted with an older version of the innochecksum utility.

Because there is no good reason to allow data files to be written
with insecure checksums, we will reject those option values:

    innodb_checksum_algorithm=none
    innodb_checksum_algorithm=innodb
    innodb_checksum_algorithm=strict_none
    innodb_checksum_algorithm=strict_innodb

Furthermore, the following innochecksum options will be removed,
because only strict crc32 will be supported:

    innochecksum --strict-check=crc32
    innochecksum -C crc32
    innochecksum --write=crc32
    innochecksum -w crc32

If a user wishes to convert a data file to use a different checksum
(so that it might be used with the no-longer-supported
MySQL 5.5 or MariaDB 5.5, which do not support IMPORT TABLESPACE
nor system tablespace format changes that were made in MariaDB 10.3),
then the innochecksum tool from MariaDB 10.2, 10.3, 10.4, 10.5 or
MySQL 5.7 can be used.

Reviewed by: Thirunarayanan Balathandayuthapani
2021-03-11 12:46:18 +02:00
Marko Mäkelä
d346763479 Merge 10.5 into 10.6 2021-03-08 10:51:31 +02:00
Marko Mäkelä
a5d3c1c819 Merge 10.4 into 10.5 2021-03-08 10:16:20 +02:00
Marko Mäkelä
a26e7a3726 Merge 10.3 into 10.4 2021-03-08 09:39:54 +02:00
Vicențiu Ciorbaru
e9b8b76f47 Merge branch '10.2' into 10.3 2021-03-04 16:04:30 +02:00
Vicențiu Ciorbaru
5da6ffe227 MDEV-25032: Window functions without column references get removed from ORDER BY
row_number() over () window function can be used without any column in the OVER
clause. Additionally, the item doesn't reference any tables, as it's not
effectively referencing any table. Rather it is specifically built based
on the end temporary table used for window function computation.

This caused remove_const function to wrongly drop it from the ORDER
list. Effectively, we shouldn't be dropping any window function from the
ORDER clause, so adjust remove_const to account for that.

Reviewed by: Sergei Petrunia sergey@mariadb.com
2021-03-04 15:37:47 +02:00
Vlad Lesin
23833dce05 MDEV-24792 Assertion `!newest_lsn || fil_page_get_type(page)' failed upon MariaBackup prepare in buf_flush_init_for_writing with innodb_log_optimize_ddl=off
fsp_free_page() writes MLOG_INIT_FREE_PAGE, but does not update page
type. But fil_crypt_rotate_page() checks the type to understand if the
page is freshly initialized, and writes dummy record(updates space id)
to force rotation during recovery. This dummy record causes assertion
crash when the page is flushed after recovery, as it's supposed
that pages LSN is 0 for freshly initialized pages.

The bug is similiar to MDEV-24695, the difference is that in 10.5 the
assertion crashes during log record applying, but in 10.4 it crashes
during page flushing.

The fix could be in marking page as freed and not writing dummy record
during keys rotation procedure for such marked pages. But
bpage->file_page_was_freed is not consistent enough for release builds in
10.4, and the issue is fixed in 10.5 and does not exist in 10.[23] as
MLOG_INIT_FREE_PAGE was introduced since 10.4.

So the better solution is just to relax the assertion and implement some
additional property for freshly allocated pages, and check this property
during pages flushing.

The test is copied from MDEV-24695, the only change is in forcing pages
flushing after each server start to cause crash in non-fixed code.

There is no need to merge it to 10.5+, as the bug is already fixed by
MDEV-24695.
2021-02-14 10:11:03 +03:00
Marko Mäkelä
1110beccd4 Merge 10.5 into 10.6 2021-02-02 15:15:53 +02:00
Thirunarayanan Balathandayuthapani
6e80a34d68 MDEV-24695 Encryption modifies a freed page
During recovery, InnoDB fails if it tries to apply a FREE_PAGE
and WRITE record to the page. InnoDB encryption thread accesses
the freed page and writes redo log for it.

This is similar to commit deadec4e68 (MDEV-24569)
InnoDB is missing buf_page_free() while freeing the segment.
To avoid accessing of freed page in buffer pool, InnoDB should
mark the pages as FREED while freeing the segment. Also to
avoid reading of freed page, InnoDB should check the
allocation bitmap page.

fseg_free_step(): Mark the page in buffer pool as FREED

fseg_free_step_not_header(): Mark the page in buffer pool as FREED

buf_dump(): Ignore the freed pages while dumping the buffer pool content

fil_crypt_get_page_throttle_func(): Skip the rotation for FREED page
to avoid the assert failure during recovery

fil_crypt_rotate_page(): Skip the rotation for the freed page

Reviewed-by: Marko Mäkelä
2021-01-28 13:26:34 +05:30
Marko Mäkelä
09a1f0075a Merge 10.5 into 10.6 2020-11-02 12:49:19 +02:00
Marko Mäkelä
898521e2dd Merge 10.4 into 10.5 2020-10-30 11:15:30 +02:00
Marko Mäkelä
7b2bb67113 Merge 10.3 into 10.4 2020-10-29 13:38:38 +02:00
Marko Mäkelä
a8de8f261d Merge 10.2 into 10.3 2020-10-28 10:01:50 +02:00
Varun Gupta
b94e8e4b25 MDEV-23867: insert... select crash in compute_window_func
There are 2 issues here:

Issue #1: memory allocation.
An IO_CACHE that uses encryption uses a larger buffer (it needs space for the encrypted data,
decrypted data, IO_CACHE_CRYPT struct to describe encryption parameters etc).

Issue #2: IO_CACHE::seek_not_done
When IO_CACHE objects are cloned, they still share the file descriptor.
This means, operation on one IO_CACHE may change the file read position
which will confuse other IO_CACHEs using it.

The fix of these issues would be:
Allocate the buffer to also include the extra size needed for encryption.
Perform seek again after one IO_CACHE reads the file.
2020-10-23 22:36:47 +05:30
Marko Mäkelä
1657b7a583 Merge 10.4 to 10.5 2020-10-22 17:08:49 +03:00
Marko Mäkelä
46957a6a77 Merge 10.3 into 10.4 2020-10-22 13:27:18 +03:00
Marko Mäkelä
e3d692aa09 Merge 10.2 into 10.3 2020-10-22 08:26:28 +03:00
Marko Mäkelä
db02c458c9 Clean up some encryption tests
Instead of pointlessly waiting for a page flush to occur, take
the matter into our own hands and request an explicit flush.
Also, test with the minimum necessary amount of data (0 or 1 rows)
so that both page encryption and decryption will be exercised.
2020-10-17 13:13:01 +03:00
Marko Mäkelä
6ce0a6f9ad Merge 10.5 into 10.6 2020-09-24 10:21:26 +03:00
Marko Mäkelä
882ce206db Merge 10.4 into 10.5 2020-09-23 11:32:43 +03:00
Marko Mäkelä
55e48b7722 Merge 10.3 into 10.4 2020-09-22 15:12:59 +03:00
Marko Mäkelä
fde3d895d9 Merge 10.2 into 10.3 2020-09-22 14:33:03 +03:00
Marko Mäkelä
2af8f712de MDEV-23776: Re-apply the fix and make the test more robust
The test that was added in commit e05650e686
would break a subsequent run of a test encryption.innodb-bad-key-change
because some pages in the system tablespace would be encrypted with
a different key.

The failure was repeatable with the following invocation:

./mtr --no-reorder \
encryption.create_or_replace,cbc \
encryption.innodb-bad-key-change,cbc

Because the crash was unrelated to the code changes that we reverted
in commit eb38b1f703
we can safely re-apply those fixes.
2020-09-22 13:08:09 +03:00
Marko Mäkelä
952a028a52 Merge 10.3 into 10.4
We omit commit a3bdce8f1e
and commit a0e2a293bc
because they would make the test galera_3nodes.galera_gtid_2_cluster
fail and disable it.
2020-09-21 17:42:02 +03:00
Marko Mäkelä
2cf489d430 Merge 10.2 into 10.3 2020-09-21 16:39:23 +03:00
Marko Mäkelä
e05650e686 MDEV-23776: Add the reduced encryption.create_or_replace test
This was forgotten in commit a9d8f5c1a5.
2020-09-21 16:35:28 +03:00
Marko Mäkelä
a9d8f5c1a5 MDEV-23776: Split encryption.create_or_replace
Let us shrink the test encryption.create_or_replace so that it can
run on the CI system, also on the embedded server.

encryption.create_or_replace_big: Renamed from the original test,
with the subset of encryption.create_or_replace omitted.
2020-09-21 16:14:35 +03:00
Marko Mäkelä
2fa9f8c53a Merge 10.3 into 10.4 2020-08-20 11:01:47 +03:00
Marko Mäkelä
de0e7cd72a Merge 10.2 into 10.3 2020-08-20 09:12:16 +03:00
Marko Mäkelä
4c50120d14 MDEV-23474 InnoDB fails to restart after SET GLOBAL innodb_log_checksums=OFF
Regretfully, the parameter innodb_log_checksums was introduced
in MySQL 5.7.9 (the first GA release of that series) by
mysql/mysql-server@af0acedd88
which partly replaced a parameter that had been introduced in 5.7.8
mysql/mysql-server@22ba38218e
as innodb_log_checksum_algorithm.

Given that the CRC-32C operations are accelerated on many processor
implementations (AMD64 with SSE4.2; since MDEV-22669 also on IA-32
with SSE4.2, POWER 8 and later, ARMv8 with some extensions)
and by lookup tables when only generic SISD instructions are available,
there should be no valid reason to disable checksums.

In MariaDB 10.5.2, as a preparation for MDEV-12353, MDEV-19543 deprecated
and ignored the parameter innodb_log_checksums altogether. This should
imply that after a clean shutdown with innodb_log_checksums=OFF one
cannot upgrade to MariaDB Server 10.5 at all.

Due to these problems, let us deprecate the parameter innodb_log_checksums
and honor it only during server startup.
The command SET GLOBAL innodb_log_checksums will always set the
parameter to ON.
2020-08-18 16:46:07 +03:00
Marko Mäkelä
e685809a3b MDEV-23397 Remove deprecated InnoDB options in 10.6 2020-08-04 12:51:59 +03:00
Marko Mäkelä
50a11f396a Merge 10.4 into 10.5 2020-08-01 14:42:51 +03:00
Marko Mäkelä
9216114ce7 Merge 10.3 into 10.4 2020-07-31 18:09:08 +03:00
Marko Mäkelä
66ec3a770f Merge 10.2 into 10.3 2020-07-31 13:51:28 +03:00
Marko Mäkelä
6307b17aa1 MDEV-20142 encryption.innodb_encrypt_temporary_tables failed in buildbot with wrong result
Let us read both encrypted temporary tables to increase the changes of
page flushing and eviction.
2020-07-28 13:00:59 +03:00
Thirunarayanan Balathandayuthapani
c92f7e287f MDEV-8139 Fix Scrubbing
fil_space_t::freed_ranges: Store ranges of freed page numbers.

fil_space_t::last_freed_lsn: Store the most recent LSN of
freeing a page.

fil_space_t::freed_mutex: Protects freed_ranges, last_freed_lsn.

fil_space_create(): Initialize the freed_range mutex.

fil_space_free_low(): Frees the freed_range mutex.

range_set: Ranges of page numbers.

buf_page_create(): Removes the page from freed_ranges when page
is being reused.

btr_free_root(): Remove the PAGE_INDEX_ID invalidation. Because
btr_free_root() and dict_drop_index_tree() are executed in
the same atomic mini-transaction, there is no need to
invalidate the root page.

buf_release_freed_page(): Split from buf_flush_freed_page().
Skip any I/O

buf_flush_freed_pages(): Get the freed ranges from tablespace and
Write punch-hole or zeroes of the freed ranges.

buf_flush_try_neighbors(): Handles the flushing of freed ranges.

mtr_t::freed_pages: Variable to store the list of freed pages.

mtr_t::add_freed_pages(): To add freed pages.

mtr_t::clear_freed_pages(): To clear the freed pages.

mtr_t::m_freed_in_system_tablespace: Variable to indicate whether page has
been freed in system tablespace.

mtr_t::m_trim_pages: Variable to indicate whether the space has been trimmed.

mtr_t::commit(): Add the freed page and update the last freed lsn
in the tablespace and clear the tablespace freed range if space is
trimmed.

file_name_t::freed_pages: Store the freed pages during recovery.

file_name_t::add_freed_page(), file_name_t::remove_freed_page(): To
add and remove freed page during recovery.

store_freed_or_init_rec(): Store or remove the freed pages while
encountering FREE_PAGE or INIT_PAGE redo log record.

recv_init_crash_recovery_spaces(): Add the freed page encountered
during recovery to respective tablespace.
2020-06-12 09:17:51 +05:30
Marko Mäkelä
b1ab211dee MDEV-15053 Reduce buf_pool_t::mutex contention
User-visible changes: The INFORMATION_SCHEMA views INNODB_BUFFER_PAGE
and INNODB_BUFFER_PAGE_LRU will report a dummy value FLUSH_TYPE=0
and will no longer report the PAGE_STATE value READY_FOR_USE.

We will remove some fields from buf_page_t and move much code to
member functions of buf_pool_t and buf_page_t, so that the access
rules of data members can be enforced consistently.

Evicting or adding pages in buf_pool.LRU will remain covered by
buf_pool.mutex.

Evicting or adding pages in buf_pool.page_hash will remain
covered by both buf_pool.mutex and the buf_pool.page_hash X-latch.

After this fix, buf_pool.page_hash lookups can entirely
avoid acquiring buf_pool.mutex, only relying on
buf_pool.hash_lock_get() S-latch.

Similarly, buf_flush_check_neighbors() can will rely solely on
buf_pool.mutex, no buf_pool.page_hash latch at all.

The buf_pool.mutex is rather contended in I/O heavy benchmarks,
especially when the workload does not fit in the buffer pool.

The first attempt to alleviate the contention was the
buf_pool_t::mutex split in
commit 4ed7082eef
which introduced buf_block_t::mutex, which we are now removing.

Later, multiple instances of buf_pool_t were introduced
in commit c18084f71b
and recently removed by us in
commit 1a6f708ec5 (MDEV-15058).

UNIV_BUF_DEBUG: Remove. This option to enable some buffer pool
related debugging in otherwise non-debug builds has not been used
for years. Instead, we have been using UNIV_DEBUG, which is enabled
in CMAKE_BUILD_TYPE=Debug.

buf_block_t::mutex, buf_pool_t::zip_mutex: Remove. We can mainly rely on
std::atomic and the buf_pool.page_hash latches, and in some cases
depend on buf_pool.mutex or buf_pool.flush_list_mutex just like before.
We must always release buf_block_t::lock before invoking
unfix() or io_unfix(), to prevent a glitch where a block that was
added to the buf_pool.free list would apper X-latched. See
commit c5883debd6 how this glitch
was finally caught in a debug environment.

We move some buf_pool_t::page_hash specific code from the
ha and hash modules to buf_pool, for improved readability.

buf_pool_t::close(): Assert that all blocks are clean, except
on aborted startup or crash-like shutdown.

buf_pool_t::validate(): No longer attempt to validate
n_flush[] against the number of BUF_IO_WRITE fixed blocks,
because buf_page_t::flush_type no longer exists.

buf_pool_t::watch_set(): Replaces buf_pool_watch_set().
Reduce mutex contention by separating the buf_pool.watch[]
allocation and the insert into buf_pool.page_hash.

buf_pool_t::page_hash_lock<bool exclusive>(): Acquire a
buf_pool.page_hash latch.
Replaces and extends buf_page_hash_lock_s_confirm()
and buf_page_hash_lock_x_confirm().

buf_pool_t::READ_AHEAD_PAGES: Renamed from BUF_READ_AHEAD_PAGES.

buf_pool_t::curr_size, old_size, read_ahead_area, n_pend_reads:
Use Atomic_counter.

buf_pool_t::running_out(): Replaces buf_LRU_buf_pool_running_out().

buf_pool_t::LRU_remove(): Remove a block from the LRU list
and return its predecessor. Incorporates buf_LRU_adjust_hp(),
which was removed.

buf_page_get_gen(): Remove a redundant call of fsp_is_system_temporary(),
for mode == BUF_GET_IF_IN_POOL_OR_WATCH, which is only used by
BTR_DELETE_OP (purge), which is never invoked on temporary tables.

buf_free_from_unzip_LRU_list_batch(): Avoid redundant assignments.

buf_LRU_free_from_unzip_LRU_list(): Simplify the loop condition.

buf_LRU_free_page(): Clarify the function comment.

buf_flush_check_neighbor(), buf_flush_check_neighbors():
Rewrite the construction of the page hash range. We will hold
the buf_pool.mutex for up to buf_pool.read_ahead_area (at most 64)
consecutive lookups of buf_pool.page_hash.

buf_flush_page_and_try_neighbors(): Remove.
Merge to its only callers, and remove redundant operations in
buf_flush_LRU_list_batch().

buf_read_ahead_random(), buf_read_ahead_linear(): Rewrite.
Do not acquire buf_pool.mutex, and iterate directly with page_id_t.

ut_2_power_up(): Remove. my_round_up_to_next_power() is inlined
and avoids any loops.

fil_page_get_prev(), fil_page_get_next(), fil_addr_is_null(): Remove.

buf_flush_page(): Add a fil_space_t* parameter. Minimize the
buf_pool.mutex hold time. buf_pool.n_flush[] is no longer updated
atomically with the io_fix, and we will protect most buf_block_t
fields with buf_block_t::lock. The function
buf_flush_write_block_low() is removed and merged here.

buf_page_init_for_read(): Use static linkage. Initialize the newly
allocated block and acquire the exclusive buf_block_t::lock while not
holding any mutex.

IORequest::IORequest(): Remove the body. We only need to invoke
set_punch_hole() in buf_flush_page() and nowhere else.

buf_page_t::flush_type: Remove. Replaced by IORequest::flush_type.
This field is only used during a fil_io() call.
That function already takes IORequest as a parameter, so we had
better introduce  for the rarely changing field.

buf_block_t::init(): Replaces buf_page_init().

buf_page_t::init(): Replaces buf_page_init_low().

buf_block_t::initialise(): Initialise many fields, but
keep the buf_page_t::state(). Both buf_pool_t::validate() and
buf_page_optimistic_get() requires that buf_page_t::in_file()
be protected atomically with buf_page_t::in_page_hash
and buf_page_t::in_LRU_list.

buf_page_optimistic_get(): Now that buf_block_t::mutex
no longer exists, we must check buf_page_t::io_fix()
after acquiring the buf_pool.page_hash lock, to detect
whether buf_page_init_for_read() has been initiated.
We will also check the io_fix() before acquiring hash_lock
in order to avoid unnecessary computation.
The field buf_block_t::modify_clock (protected by buf_block_t::lock)
allows buf_page_optimistic_get() to validate the block.

buf_page_t::real_size: Remove. It was only used while flushing
pages of page_compressed tables.

buf_page_encrypt(): Add an output parameter that allows us ot eliminate
buf_page_t::real_size. Replace a condition with debug assertion.

buf_page_should_punch_hole(): Remove.

buf_dblwr_t::add_to_batch(): Replaces buf_dblwr_add_to_batch().
Add the parameter size (to replace buf_page_t::real_size).

buf_dblwr_t::write_single_page(): Replaces buf_dblwr_write_single_page().
Add the parameter size (to replace buf_page_t::real_size).

fil_system_t::detach(): Replaces fil_space_detach().
Ensure that fil_validate() will not be violated even if
fil_system.mutex is released and reacquired.

fil_node_t::complete_io(): Renamed from fil_node_complete_io().

fil_node_t::close_to_free(): Replaces fil_node_close_to_free().
Avoid invoking fil_node_t::close() because fil_system.n_open
has already been decremented in fil_space_t::detach().

BUF_BLOCK_READY_FOR_USE: Remove. Directly use BUF_BLOCK_MEMORY.

BUF_BLOCK_ZIP_DIRTY: Remove. Directly use BUF_BLOCK_ZIP_PAGE,
and distinguish dirty pages by buf_page_t::oldest_modification().

BUF_BLOCK_POOL_WATCH: Remove. Use BUF_BLOCK_NOT_USED instead.
This state was only being used for buf_page_t that are in
buf_pool.watch.

buf_pool_t::watch[]: Remove pointer indirection.

buf_page_t::in_flush_list: Remove. It was set if and only if
buf_page_t::oldest_modification() is nonzero.

buf_page_decrypt_after_read(), buf_corrupt_page_release(),
buf_page_check_corrupt(): Change the const fil_space_t* parameter
to const fil_node_t& so that we can report the correct file name.

buf_page_monitor(): Declare as an ATTRIBUTE_COLD global function.

buf_page_io_complete(): Split to buf_page_read_complete() and
buf_page_write_complete().

buf_dblwr_t::in_use: Remove.

buf_dblwr_t::buf_block_array: Add IORequest::flush_t.

buf_dblwr_sync_datafiles(): Remove. It was a useless wrapper of
os_aio_wait_until_no_pending_writes().

buf_flush_write_complete(): Declare static, not global.
Add the parameter IORequest::flush_t.

buf_flush_freed_page(): Simplify the code.

recv_sys_t::flush_lru: Renamed from flush_type and changed to bool.

fil_read(), fil_write(): Replaced with direct use of fil_io().

fil_buffering_disabled(): Remove. Check srv_file_flush_method directly.

fil_mutex_enter_and_prepare_for_io(): Return the resolved
fil_space_t* to avoid a duplicated lookup in the caller.

fil_report_invalid_page_access(): Clean up the parameters.

fil_io(): Return fil_io_t, which comprises fil_node_t and error code.
Always invoke fil_space_t::acquire_for_io() and let either the
sync=true caller or fil_aio_callback() invoke
fil_space_t::release_for_io().

fil_aio_callback(): Rewrite to replace buf_page_io_complete().

fil_check_pending_operations(): Remove a parameter, and remove some
redundant lookups.

fil_node_close_to_free(): Wait for n_pending==0. Because we no longer
do an extra lookup of the tablespace between fil_io() and the
completion of the operation, we must give fil_node_t::complete_io() a
chance to decrement the counter.

fil_close_tablespace(): Remove unused parameter trx, and document
that this is only invoked during the error handling of IMPORT TABLESPACE.

row_import_discard_changes(): Merged with the only caller,
row_import_cleanup(). Do not lock up the data dictionary while
invoking fil_close_tablespace().

logs_empty_and_mark_files_at_shutdown(): Do not invoke
fil_close_all_files(), to avoid a !needs_flush assertion failure
on fil_node_t::close().

innodb_shutdown(): Invoke os_aio_free() before fil_close_all_files().

fil_close_all_files(): Invoke fil_flush_file_spaces()
to ensure proper durability.

thread_pool::unbind(): Fix a crash that would occur on Windows
after srv_thread_pool->disable_aio() and os_file_close().
This fix was submitted by Vladislav Vaintroub.

Thanks to Matthias Leich and Axel Schwenke for extensive testing,
Vladislav Vaintroub for helpful comments, and Eugene Kosov for a review.
2020-06-05 12:35:46 +03:00
Marko Mäkelä
23047d3ed4 Merge 10.4 into 10.5 2020-05-18 17:30:02 +03:00
Varun Gupta
0a5668f512 MDEV-22556: Incorrect result for window function when using encrypt-tmp-files=ON
The issue here is that end_of_file for encrypted temporary IO_CACHE (used by filesort) is updated
using lseek.
Encryption adds storage overhead and hides it from the caller by recalculating offsets and lengths.
Two different IO_CACHE cannot possibly modify the same file
because the encryption key is randomly generated and stored in the IO_CACHE.
So when the tempfiles are encrypted DO NOT use lseek to change end_of_file.

Further observations about updating end_of_file using lseek
1) The end_of_file update is only used for binlog index files
2) The whole point is to update file length when the file was modified via a different file descriptor.
3) The temporary IO_CACHE files can never be modified via a different file descriptor.
4) For encrypted temporary IO_CACHE, end_of_file should not be updated with lseek
2020-05-17 15:26:18 +05:30