1
0
mirror of https://github.com/MariaDB/server.git synced 2025-06-03 07:02:23 +03:00

201915 Commits

Author SHA1 Message Date
Nikita Malyavin
044d8c50c6 ed25519: support empty password 2024-11-08 07:17:54 +01:00
Nikita Malyavin
583a5a79c9 MDEV-34854 Parsec sends garbage when using an empty password
When an empty password is set, the server doesn't call
st_mysql_auth::hash_password and leaves MYSQL_SERVER_AUTH_INFO::auth_string
empty.

Fix:
generate hashes by calling hash_password for empty passwords as well. This
changes the api behavior slightly, but since even old plugins support it,
we can ignore this.

Some empty passwords could be already stored with no salt, though. The user
will have to call SET PASSWORD once again, anyway the authentication wouldn't
have worked for such password.
2024-11-08 07:17:44 +01:00
Oleksandr Byelkin
9e1fb104a3 MariaDB 11.4.4 release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEF39AEP5WyjM2MAMF8WVvJMdM0dgFAmck77AACgkQ8WVvJMdM
 0dgccQ/+Lls8fWt4D+gMPP7x+drJSO/IE/gZFt3ugbWF+/p3B2xXAs5AAE83wxEh
 QSbp4DCkb/9PnuakhLmzg0lFbxMUlh4rsJ1YyiuLB2J+YgKbAc36eQQf+rtYSipd
 DT5uRk36c9wOcOXo/mMv4APEvpPXBIBdIL4VvpKFbIOE7xT24Sp767zWXdXqrB1f
 JgOQdM2ct+bvSPC55oZ5p1kqyxwvd6K6+3RB3CIpwW9zrVSLg7enT3maLjj/761s
 jvlRae+Cv+r+Hit9XpmEH6n2FYVgIJ3o3WhdAHwN0kxKabXYTg7OCB7QxDZiUHI9
 C/5goKmKaPB1PCQyuTQyLSyyK9a8nPfgn6tqw/p/ZKDQhKT9sWJv/5bSWecrVndx
 LLYifSTrFC/eXLzgPvCnNv/U8SjsZaAdMIKS681+qDJ0P5abghUIlGnMYTjYXuX1
 1B6Vrr0bdrQ3V1CLB3tpkRjpUvicrsabtuAUAP65QnEG2G9UJXklOer+DE291Gsl
 f1I0o6C1zVGAOkUUD3QEYaHD8w7hlvyfKme5oXKUm3DOjaAar5UUKLdr6prxRZL4
 ebhmGEy42Mf8fBYoeohIxmxgvv6h2Xd9xCukgPp8hFpqJGw8abg7JNZTTKH4h2IY
 J51RpD10h4eoi6WRn3opEcjexTGvZ+xNR7yYO5WxWw6VIre9IUA=
 =s+WW
 -----END PGP SIGNATURE-----

Merge tag '11.4' into 11.6

MariaDB 11.4.4 release
2024-11-08 07:17:00 +01:00
Brandon Nesterenko
e9a502df08 Testing fix for rpl_semi_sync_cond_var_per_thd failure mariadb-11.4.4 2024-10-30 08:32:19 -06:00
Oleksandr Byelkin
c770bce898 Merge branch '11.2' into 11.4 2024-10-30 15:11:17 +01:00
Alexander Barkov
4c7cfd2624 MDEV-34817 perfschema.lowercase_fs_off fails on buildbot
I pushed this patch in a mistake to 11.7 instead of 11.6.
Backporting from 11.7.

This is a workaround patch to make buildbot green.

Renaming databases from db1/DB2 to m33020_db1/m33020_DB1
to make them unique. So the garbage left by other tests
does not show up any more.

The real problem will be fixed under terms of:
  MDEV-35282 Performance schema does not clear package routines
2024-10-30 13:47:19 +04:00
Marko Mäkelä
67c0fd2a41 MDEV-35289 innodb_fast_shutdown=0 might corrupt the system tablespace on 32-bit systems
inode_info::get_unused(): Do not try to assign 64 bits
into a potentially narrower variable.
mariadb-11.2.6
2024-10-30 09:36:12 +02:00
Oleksandr Byelkin
45e1ddc00f Revert "don't disable lto in DEB builds"
This reverts commit 99178311ac6091eff092c44005db06c0bcbe7c66.

ColumnStore does not build with LTO. See MCOL-5819
2024-10-29 16:42:52 +01:00
Oleksandr Byelkin
69d033d165 Merge branch '10.11' into 11.2 2024-10-29 16:42:46 +01:00
Oleksandr Byelkin
3d0fb15028 Merge branch '10.6' into 10.11 mariadb-10.11.10 2024-10-29 15:24:38 +01:00
Oleksandr Byelkin
f00711bba2 Merge branch '10.5' into 10.6 mariadb-10.6.20 2024-10-29 14:20:03 +01:00
Oleksandr Byelkin
6aa47fae30 MDEV-35276 Assertion failure in find_producing_item upon a query from a view
Two problem solved:

1) Item_default_value makes a shallow copy so the copy
 should not delete field belong to the Item

2) Item_default_value should not inherit
 derived_field_transformer_for_having and
 derived_field_transformer_for_where (in this variant
 pushing DEFAULT(f) is prohibited (return NULL) but
 if return "this" it will be allowed (should go with
 a lot of tests))
mariadb-10.5.27
2024-10-29 11:44:43 +01:00
Thirunarayanan Balathandayuthapani
db3be9b434 MDEV-35237 Bulk insert fails to apply buffered operation during CREATE..SELECT statement
Problem:
=======
- InnoDB fails to write the buffered insert operation during
create..select operation. This happens when bulk_insert
in transaction is reset to false while unlocking a source table.

Fix:
===
- InnoDB should apply the previous buffered changes to
all tables if we encounter any statement other than
pure INSERT or INSERT..SELECT statement in
ha_innobase::external_lock() and start_stmt().

- Remove the function bulk_insert_apply_for_table()

start_stmt(), external_lock(): Assert that trx->duplicates
should be enabled during bulk insert operation
2024-10-29 15:03:23 +05:30
Sergei Golubchik
953f847aed Post-fix for MDEV-35236
when the option_list is initially empty, its value doesn't need
to be restored, as it'll be shallow-copied every time.
Furthermore, the CREATE_INFO is allocated on the stack, so it's
even wrong to restore it after its frame was left.

followup for 3cd706b107d
2024-10-28 15:29:10 +01:00
Alexander Barkov
a88c71b294 MDEV-35041 Simple comparison causes "Illegal mix of collations" even with default server settings
The task "MDEV-25829 Change default Unicode collation to uca1400_ai_ci"
previously changed collation derivation for string user variables
from DERIVATION_EXPLICIT to DERIVATION_COERCIBLE, to resolve illegal
collation mix conflicts between table columns and user variables
when they have different collations.

However, DERIVATION_COERCIBLE was a wrong choice because it caused
conflicts between string literals and user variables when they have
different collations.

Adding a new collation derivation level DERIVATION_USERVAR.
This makes the collation of a user variable:
- weaker than a table column (like it was intended by MDEV-25829)
- but stronger than a literal (like it was in pre-MDEV-25829)

Cleanup in sql_type.h:
  Removing the line "- BINARY(expr)" from the before-DERIVATION_CAST
  comment, as it was on a wrong place. It's also listed on the correct
  place before DERIVATION_IMPLICIT.
2024-10-28 16:30:49 +04:00
Oleg Smirnov
52723ec09a MDEV-34880 Incorrect result for query with derived table having TEXT field
Fixup: check key flags only in the case of successful index initialization
2024-10-28 18:49:01 +07:00
Vladislav Vaintroub
4b068b7fcb MDEV-32387 - prevent output during temporary changes to STDOUT/STDERR
create_process() temporarily changes STDOUT/STDERR output to error log file

This might redirect mtr output on Windows, so avoid it by holding
flush_lock.
2024-10-28 10:45:50 +01:00
Marko Mäkelä
63a7e4c96b MDEV-35257 Backup fails during an ALTER TABLE with FULLTEXT INDEX
In commit 1c55b845e0fe337e647ba230288ed13e966cb7c7 (MDEV-32932) the
test mariabackup.innodb_ddl_on_intermediate_table was introduced but
disabled.

xb_load_single_table_tablespace(): Properly handle missing FTS_ tables.

backup_file_op_fail(): Properly handle FILE_DELETE records.
2024-10-28 07:44:18 +02:00
Sergei Petrunia
284593413f MDEV-35253: xa_prepare_unlock_unmodified fails: shift exponent 32 is too large
The code in best_access_path() uses PREV_BITS(uint, N) to
compute a bitmap of all keyparts: {keypart0, ... keypart{N-1}).

The problem is that PREV_BITS($type, N) macro code can't handle the case
when N=<number of bits in $type).
Also, why use PREV_BITS(uint, ...) for key part map computations when
we could have used PREV_BITS(key_part_map) ?

Fixed both:
- Change PREV_BITS(type, N) to handle any N in [0; n_bits(type)].
- Change PREV_BITS() to use key_part_map when computing key_part_map bitmaps.
2024-10-25 18:02:14 +03:00
Yuchen Pei
b8c2bd9f69
MDEV-35249 Fix regression caused by MDEV-34447
MDEV-34447 Removed setting first_cond_optimization to 0 in update and
delete when leaf_tables_saved. This can cause problems when two ps
executions of an update go through different paths, where the first ps
execution goes through single table update only and the second ps
execution also goes through multi table update. When this happens, the
first_cond_optimization of the outer query is not set to false during
the first ps execution because optimize() is not called for the outer
query. But then the second ps execution will call optimize() on the
outer query, which with first_cond_optimization==true trips the 2nd ps
mem leak detection.

This is not a problem in higher version as both executions go through
multi table updates, possibly due to MDEV-28883.

We fix this problem by restoring the FALSE assignments to
first_cond_optimization.
2024-10-25 18:03:40 +11:00
Marko Mäkelä
decdd4bf49 MDEV-29015/MDEV-29260/MDEV-34938: os_file_get_size() WSL work-around
When MariaDB Server is run in a container under
Windows Subsystem for Linux, the fstat(2) system calls that InnoDB
invokes in os_file_set_size() or os_file_get_size() are causing a
failure in case the file had been renamed in the past while the file
handle was open. This affects at least ALTER TABLE and OPTIMIZE TABLE.

os_file_get_size(): Invoke lseek(2) instead of fstat(2). We do not mind
if the file pointer is moving to the end of the file, because InnoDB
exclusively invokes positioned reads and writes, or in some rare cases,
appends to an existing file.

os_file_set_size(): Invoke os_file_get_size() instead of fstat(2).
Define the POSIX and Windows versions separately. Formerly, the
Windows version was called os_file_change_size_win32().

fil_node_t::read_page0(): Use os_file_get_size() to determine the
size, and do not crash on error.

fil_node_t::read_metadata(): Remove the non-Windows stat* parameter
and always invoke fstat(2) outside Windows, but do tolerate errors.
Because fstat(2) is more likely to fail than lseek(2), and this is
not time critical code, we can afford the extra lseek(2) system call.

Reviewed by: Vladislav Vaintroub
2024-10-24 16:08:56 +03:00
Sergei Golubchik
3cd706b107 MDEV-35236 Assertion `(mem_root->flags & 4) == 0' failed in safe_lexcstrdup_root
Post-fix for MDEV-35144.

Cannot allocate options values on the statement arena, because
HA_CREATE_INFO is shallow-copied for every execution, so if the
option_list was initially empty, it will be reset for every execution
and any values allocated on the  statement arena will be lost.

Cannot allocate option values on the execution arena, because
HA_CREATE_INFO is shallow-copied for every execution, so if the
option_list was  initially NOT empty, any values appended to the
end will be preserved and if they're on the execution arena their
content will be destroyed.

Let's use thd->change_item_tree() to save and restore necessary pointers
for every execution.

followup for 3da565c41d87
2024-10-23 14:58:57 +02:00
Sergei Golubchik
eac33a23da MDEV-32022 ERROR 1054 (42S22): Unknown column 'X' in 'NEW' in trigger
add missing do_get_copy/do_build_clone
2024-10-23 14:58:57 +02:00
Oleg Smirnov
6bd1cb0ea0 MDEV-34880 Incorrect result for query with derived table having TEXT field
When a derived table which has distinct values and BLOB fields is
materialized, an index is created over all columns to ensure only
unique values are placed to the result.
This index is created in a special mode HA_UNIQUE_HASH to support BLOBs.
Later the optimizer may incorrectly choose this index to retrieve values
from the derived table, although such type of index cannot be used
for data retrieval.

This commit excludes HA_UNIQUE_HASH indexes from adding to
`JOIN::keyuse` array thus preventing their subsequent usage for
data retrieval
2024-10-23 17:55:00 +07:00
Vlad Lesin
8c7786e7d5 MDEV-34690 lock_rec_unlock_unmodified() causes deadlock
lock_rec_unlock_unmodified() is executed either under lock_sys.wr_lock()
or under a combination of lock_sys.rd_lock() + record locks hash table
cell latch. It also requests page latch to check if locked records were
changed by the current transaction or not.

Usually InnoDB requests page latch to find the certain record on the
page, and then requests lock_sys and/or record lock hash cell latch to
request record lock. lock_rec_unlock_unmodified() requests the latches
in the opposite order, what causes deadlocks. One of the possible
scenario for the deadlock is the following:

thread 1 - lock_rec_unlock_unmodified() is invoked under locks hash table
           cell latch, the latch is acquired;
thread 2 - purge thread acquires page latch and tries to remove
           delete-marked record, it invokes lock_update_delete(), which
           requests locks hash table cell latch, held by thread 1;
thread 1 - requests page latch, held by thread 2.

To fix it we need to release lock_sys.latch and/or lock hash cell latch,
acquire page latch and re-acquire lock_sys related latches.

When lock_sys.latch and/or lock hash cell latch are released in
lock_release_on_prepare() and lock_release_on_prepare_try(), the page on
which the current lock is held, can be merged. In this case the bitmap
of the current lock must be cleared, and the new lock must be added to
the end of trx->lock.trx_locks list, or bitmap of already existing lock
must be changed.

The new field trx_lock_t::set_nth_bit_calls indicates if new locks
(bits in existing lock bitmaps or new lock objects) were created during
the period when lock_sys was released in trx->lock.trx_locks list
iteration loop in lock_release_on_prepare() or
lock_release_on_prepare_try(). And, if so, we traverse the list again.

The block can be freed during pages merging, what causes assertion
failure in buf_page_get_gen(), as btr_block_get() passes BUF_GET as page
get mode to it. That's why page_get_mode parameter was added to
btr_block_get() to pass BUF_GET_POSSIBLY_FREED from
lock_release_on_prepare() and lock_release_on_prepare_try() to
buf_page_get_gen().

As searching for id of trx, which modified secondary index record, is
quite expensive operation, restrict its usage for master. System variable
was added to remove the restriction for testing simplifying. The
variable exists only either for debug build or for build with
-DINNODB_ENABLE_XAP_UNLOCK_UNMODIFIED_FOR_PRIMARY option to increase the
probability of catching bugs for release build with RQG.

Note that the code, which does primary index lookup to find out what
transaction modified secondary index record, is necessary only when
there is no primary key and no unique secondary key on replica with row
based replication, because only in this case extra X locks on unmodified
records can be set during scan phase.

Reviewed by Marko Mäkelä.
2024-10-23 12:36:17 +03:00
Vlad Lesin
92180ad513 MDEV-34466 XA prepare don't release unmodified records for some cases
There is no need to exclude exclusive non-gap locks from the procedure
of locks releasing on XA PREPARE execution in
lock_release_on_prepare_try() after commit
17e59ed3aad481918f1a01d8afbc071c316d5930 (MDEV-33454), because
lock_rec_unlock_unmodified() should check if the record was modified
with the XA, and release the lock if it was not.

lock_release_on_prepare_try(): don't skip X-locks, let
lock_rec_unlock_unmodified() to process them.

lock_sec_rec_some_has_impl(): add template parameter for not acquiring
trx_t::mutex for the case if a caller already holds the mutex, don't
crash if lock's bitmap is clean.

row_vers_impl_x_locked(), row_vers_impl_x_locked_low(): add new argument
to skip trx_t::mutex acquiring.

rw_trx_hash_t::validate_element(): don't acquire trx_t::mutex if the
current thread already holds it.

Thanks to Andrei Elkin for finding the bug.
Reviewed by Marko Mäkelä, Debarun Banerjee.
2024-10-23 12:36:17 +03:00
Oleksandr Byelkin
83db978271 new CC 3.1 2024-10-23 08:40:22 +02:00
Marko Mäkelä
1cad1dbde6 MDEV-35235 innodb_snapshot_isolation=ON fails to signal transaction rollback
convert_error_code_to_mysql(): Treat DB_DEADLOCK and DB_RECORD_CHANGED
in the same way, that is, signal to the SQL layer that the transaction
had been rolled back.
2024-10-23 07:55:22 +03:00
Jan Lindström
b3be3c2157 MDEV-30653 : With wsrep_mode=REPLICATE_ARIA only part of mixed-engine transactions is replicated
Replication of non-transactional engines is experimental and
uses TOI. This naturally means that if there is open transaction
with transactional engine it's changes will be rolled back.

Fixed by adding error message if non-transactional engine
is part of multi-engine transaction with warning.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-10-23 04:00:52 +02:00
Julius Goryavsky
2339f15a00 galera: wsrep-lib submodule update 2024-10-23 03:47:52 +02:00
Jan Lindström
7ffa7b6b01 MDEV-31888 : galera.galera_wan, galera.galera_vote_rejoin_* fail
Clean up configuration and tests. Add wait conditions to make
sure test continues from clean state.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-10-23 03:47:08 +02:00
Marko Mäkelä
b38edd09ff MDEV-34830 fixup: Relax an assertion
This follows up 1067046b7f4337080e5b59c364c0a53cbeca715e
2024-10-22 11:35:33 +03:00
Kristian Nielsen
45537939e7 MDEV-34859: Pass thorugh -DWITH_BOOST_CONTEXT to libmariadb
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-10-22 09:37:13 +02:00
Oleksandr Byelkin
9b3413c71f MDEV-8578: fix galera test 2024-10-22 09:23:56 +02:00
Oleksandr Byelkin
d29611afa1 MDEV-15497 fixed outdated syntax 2024-10-22 09:12:23 +02:00
Marko Mäkelä
1067046b7f MDEV-34830 fixup: Relax an assertion
It is possible that recv_sys.scanned_lsn is ahead of recv_sys.recovered_lsn
by a few 512-byte log blocks in case the last mini-transaction in the log
had not been written out completely before the server was killed.
This is occasionally the case when running the test
innodb.innodb-32k-crash.
2024-10-22 09:09:11 +03:00
Marko Mäkelä
bea4adcb5a MDEV-35225 Bogus debug assertion failures in innodb.innodb-32k-crash
log_sort_flush_list(): Correct some debug assertions that had been added in
commit 0d175968d1181a0308ce6caccc2e4fbc972ca6c6 (MDEV-31354).
The writes of some blocks may be completed and the oldest_modification()
set to 1 at any time.

The bogus assertion failures led to occasional failures of the test
innodb.innodb-32k-crash.
2024-10-22 09:07:57 +03:00
Brandon Nesterenko
1ed30e08af MDEV-34122: Assertion `entry' failed in Active_tranx::assert_thd_is_waiter
If semi-sync is switched off then on while a transaction is
in-between binlogging and waiting for an ACK, the semi-sync state of
the transaction is removed, leading to a debug assertion that
indicates the transaction tried to wait, but cannot receive an ACK
signal. More specifically, when semi-sync is switched off, the
Active_tranx list is cleared (where a transaction adds an entry to
this list during binlogging), and each entry in this list saves the
thread which will wait for an ACK, and the thread has the COND
variable to signal to wake itself. So if the entry is lost, the
Ack_receiver thread won’t be able to find the thread to wake up when
an ACK comes in

The fix is to ensure that the entry exists before awaiting the ACK,
and if there is no entry, skip the wait. In debug builds, an
informative message is written explaining that the transaction is
skipping its wait. Additional debug-build only logic is added to
ensure that the cause of the missing entry is due to semi-sync being
turned off and on

Reviewed By:
============
Kristian Nielsen <knielsen@knielsen-hq.org>
2024-10-21 15:35:54 -06:00
Vladislav Vaintroub
e8db5c8760 MDEV-35171 OS_FILE_NORMAL and OS_FILE_AIO are misleading
Removed 'purpose' parameter from os_file_create() and related functions.
Always use FILE_FLAG_OVERLAPPED when opening Windows files.

No performance regression was measured, nor there is any measurable
improvement.
2024-10-21 15:31:32 +02:00
Alexander Barkov
855c21eb99 Recording ctype_gbk_export_import.result according to MDEV-34883 2024-10-21 14:08:00 +04:00
Marko Mäkelä
7701ccb72d MDEV-35149 Race condition around SET GLOBAL innodb_lru_scan_depth
A debug assertion in buf_LRU_get_free_block() could fail if
SET GLOBAL innodb_lru_scan_depth is being executed during a workload
that involves allocating buffer pool pages.

buf_pool_t::LRU_scan_depth: Replaces srv_LRU_scan_depth.

buf_pool_t::flush_neighbors: Replaces srv_flush_neighbors.

innodb_buf_pool_update<T>(): Update a parameter of buf_pool
while holding buf_pool.mutex.
2024-10-21 10:08:58 +03:00
Thirunarayanan Balathandayuthapani
7f7d78bc18 MDEV-35183 ADD FULLTEXT INDEX unnecessarily DROPS FTS COMMON TABLES
- InnoDB fulltext rebuilds the FTS COMMON table while adding the
new fulltext index. This can be optimized by avoiding rebuilding
the FTS COMMON table in case of FTS COMMON TABLE already exists.

Reviewed-by: Marko Mäkelä <marko.makela@mariadb.com>
2024-10-21 12:27:09 +05:30
Sergei Golubchik
70d8ce63c7 update C/C 2024-10-20 17:13:20 +02:00
Kristian Nielsen
abc46259c6 MDEV-34753 memory pressure - erroneous termination condition
Fix race condition in test case by waiting for the expected state to occur.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-10-19 17:20:27 +11:00
Daniel Black
eb29190398 MDEV-34753 memory pressure - erroneous termination condition
The 'if (!m_abort) break' condition was inverted by accident.

Constrain the test case to environments where there is cgroupv2
runtime environment which is the same case that will pass a memory
pressure initialization.

Remove the explicit garbage_collection trigger as it hides the abnormal
termination error on the event loop for memory pressure. This
also means there is no support in non-cgroupv2 environments
(possibly some container environments).

As the trigger to memory pressure is via a different thread we
need to wait until a "[mM]emory pressure" log message is there to
know it has succeeded or failed.

Thanks Kristian Nielsen for noticing and review.
2024-10-19 17:20:27 +11:00
Sergei Petrunia
a68e74b5a4 MDEV-35164: optimizer_join_limit_pref_ratio: assertion when the ORDER BY table becomes constant
Assertion failure has happened due to this scenario:

A query was ran with optimizer_join_limit_pref_ratio=1.
The query had "ORDER BY t1.col LIMIT N".
The optimizer set join->limit_shortcut_applicable=1.
Then, table t1 was marked as constant.
The code in choose_query_plan() still set join->limit_optimization_mode=1
which caused the optimizer to only consider t1 as the first non-const table.
But t1 was already put into the join prefix as the constant table.
The optimizer couldn't produce any join order at all and crashed.

Fixed by not searching for shortcut plan if ORDER BY table is a constant.
We will not try to do sorting anyway in this case (and LIMIT short-cutting
will be done for any join order).
2024-10-18 15:42:05 +03:00
Rucha Deodhar
e14d2b7974 MDEV-8578: Wrong error code/message with enforce_storage_engine and
NO_ENGINE_SUBSTITUTION

Analysis:
When the error is hit, wrong error code is passed in my_error
Fix:
Pass a better error code.
2024-10-18 16:42:52 +05:30
Sergei Petrunia
0540eac05c MDEV-35180: ref_to_range rewrite causes poor query plan
(Variant 2: only allow rewrite for ref(const))

make_join_select() has a "ref_to_range" rewrite: it would rewrite
any ref access to a range access on the same index if the latter uses
more keyparts.
It seems, he initial intent of this was to fix poor query plan choice
in cases like

  t.keypart1=const AND t.keypart2 < 'foo'

Due to deficiency in cost model, ref access could be picked while range
would enumerate fewer rows and be cheaper.
However, the condition also forces a rewrite in cases like:

  t.keypart1=prev_table.col AND t.keypart1<='foo' AND t.keypart2<'bar'

Here, it can be that
* keypart1=prev_table.col is highly selective
* (keypart1, keypart2) <= ('foo', 'bar') is not at all selective.

Still, the rewrite would be made and poor query plan chosen.
Fixed this by only doing the rewrite if ref access was ref(const)
so we can be certain that quick select also used these restrictions
and will scan a subset of rows that ref access would scan.
2024-10-18 13:37:04 +03:00
Marko Mäkelä
ebefef658e Merge 10.11 into 11.2 2024-10-18 11:32:22 +03:00
Marko Mäkelä
eca552a1a4 MDEV-34830: LSN in the future is not being treated as serious corruption
The invariant of write-ahead logging is that before any change to a
page is written to the data file, the corresponding log record must
must first have been durably written.

In crash recovery, there were some sloppy checks for this. Let us
implement accurate checks and flag an inconsistency as a hard error,
so that we can avoid further corruption of a corrupted database.
For data extraction from the corrupted database, innodb_force_recovery
can be used.

Before recovery is reading any data pages or invoking
buf_dblwr_t::recover() to recover torn pages from the
doublewrite buffer, InnoDB will have parsed the log until the
final LSN and updated log_sys.lsn to that. So, we can rely on
log_sys.lsn at all times. The doublewrite buffer recovery has been
refactored in such a way that the recv_sys.dblwr.pages may be consulted
while discovering files and their page sizes, but nothing will be
written back to data files before buf_dblwr_t::recover() is invoked.

recv_max_page_lsn, recv_lsn_checks_on: Remove.

recv_sys_t::validate_checkpoint(): Validate the write-ahead-logging
condition at the end of the recovery.

recv_dblwr_t::validate_page(): Keep track of the maximum LSN
(if we are checking a non-doublewrite copy of a page) but
do not complain LSN being in the future. The doublewrite buffer
is a special case, because it will be read early during recovery.
Besides, starting with commit 762bcb81b5bf9bbde61fed59afb26417f4ce1e86
the dblwr=true copies of pages may legitimately be "too new".

recv_dblwr_t::find_page(): Find a valid page with the smallest
FIL_PAGE_LSN that is in the valid range for recovery.

recv_dblwr_t::restore_first_page(): Replaced by find_page().
Only buf_dblwr_t::recover() will write to data files.

buf_dblwr_t::recover(): Simplify the message output. Do attempt
doublewrite recovery on user page read error. Ignore doublewrite
pages whose FIL_PAGE_LSN is outside the usable bounds. Previously,
we could wrongly recover a too new page from the doublewrite buffer.
It is unlikely that this could have lead to an actual error.
Write back all recovered pages from the doublewrite buffer here,
including for the first page of any tablespace.

buf_page_is_corrupted(): Distinguish the return values
CORRUPTED_FUTURE_LSN and CORRUPTED_OTHER.

buf_page_check_corrupt(): Return the error code DB_CORRUPTION
in case the LSN is in the future.

Datafile::read_first_page_flags(): Split from read_first_page().
Take a copy of the first page as a parameter.

recv_sys_t::free_corrupted_page(): Take the file as a parameter
and return whether a message was displayed. This avoids some duplicated
and incomplete error messages.

buf_page_t::read_complete(): Remove some redundant output and always
display the name of the corrupted file. Never return DB_FAIL;
use it only in internal error handling.

IORequest::read_complete(): Assume that buf_page_t::read_complete()
will have reported any error.

fil_space_t::set_corrupted(): Return whether this is the first time
the tablespace had been flagged as corrupted.

Datafile::validate_first_page(), fil_node_open_file_low(),
fil_node_open_file(), fil_space_t::read_page0(),
fil_node_t::read_page0(): Add a parameter for a copy of the
first page, and a parameter to indicate whether the FIL_PAGE_LSN
check should be suppressed. Before buf_dblwr_t::recover() is
invoked, we cannot validate the FIL_PAGE_LSN, but we can trust the
FSP_SPACE_FLAGS and the tablespace ID that may be present in a
potentially too new copy of a page.

Reviewed by: Debarun Banerjee
2024-10-18 10:12:47 +03:00