1
0
mirror of https://github.com/MariaDB/server.git synced 2025-11-09 11:41:36 +03:00
Commit Graph

3687 Commits

Author SHA1 Message Date
Thirunarayanan Balathandayuthapani
8a4d3a044f MDEV-36017 Alter table aborts when temporary directory is full
Problem:
=======
- During inplace algorithm, concurrent DML fails to write
the log operation into the temporary file. InnoDB fail to
mark the error for the online log.

- ddl_log_write() releases the global ddl lock prematurely before
release the log memory entry

Fix:
===
row_log_online_op(): Mark the error in online log when
InnoDB ran out of temporary space

fil_space_extend_must_retry(): Mark the os_has_said_disk_full
as true if os_file_set_size() fails

btr_cur_pessimistic_update(): Return error code when
btr_cur_pessimistic_insert() fails

ddl_log_write(): Release the global ddl lock after releasing the
log memory entry when error was encountered

btr_cur_optimistic_update(): Relax the assertion that
blob pointer can be null during rollback because InnoDB can
ran out of space while allocating the external page

row_undo_mod_upd_exist_sec(): Remove the assertion which says
that InnoDB should fail to build index entry when rollbacking
an incomplete transaction after crash recovery. This scenario
can happen when InnoDB ran out of space.

row_upd_changes_ord_field_binary_func(): Relax the assertion to
make that externally stored field can be null when InnoDB ran out
of space.
2025-05-25 09:11:41 +05:30
Marko Mäkelä
82d7419e06 MDEV-34388: Stack overflow on Alpine Linux
page_is_corrupted(): Do not allocate the buffers from stack,
but from the heap, in xb_fil_cur_open().

row_quiesce_write_cfg(): Issue one type of message when we
fail to create the .cfg file.

update_statistics_for_table(), read_statistics_for_table(),
delete_statistics_for_table(), rename_table_in_stat_tables():
Use a common stack buffer for Index_stat, Column_stat, Table_stat.

ha_connect::FileExists(): Invoke push_warning_printf() so that
we can avoid allocating a buffer for snprintf().

translog_init_with_table(): Do not duplicate TRANSLOG_PAGE_SIZE_BUFF.

Let us also globally enable the GCC 4.4 and clang 3.0 option
-Wframe-larger-than=16384 to reduce the possibility of introducing
such stack overflow in the future.  For RocksDB and Mroonga we relax
these limits.

Reviewed by: Vladislav Lesin
2025-05-20 17:27:05 +03:00
Vlad Lesin
47e687b109 MDEV-36639 innodb_snapshot_isolation=1 gives error for not committed row changes
Set solution is to check if transaction, which modified a record, is
still active in lock_clust_rec_read_check_and_lock(). if yes, then just
request a lock. If no, then, depending on if the current transaction read
view can see the changes, return eighter DB_RECORD_CHANGED or request a
lock.

We can do the check in lock_clust_rec_read_check_and_lock() because
transaction tries to set a lock on the record which cursor points to after
transaction resuming and cursor position restoring. If the lock already
exists, then we don't request the lock again. But for the current commit
it's important that lock_clust_rec_read_check_and_lock() will be invoked
again for the same record, so we can do the check again after
transaction, which modified a record, was committed or rolled back.

MDEV-33802(4aa9291) is partially reverted. If some transaction holds
implicit lock on some record and transaction with snapshot isolation level
requests conflicting lock on the same record, it should be blocked instead
of returning DB_RECORD_CHANGED to have ability to continue execution when
implicit lock owner is rolled back.

The construction
--------------------------------------------------------------------------
let $wait_condition=
  select count(*) = 1 from information_schema.processlist
  where state = 'Updating' and info = 'UPDATE t SET b = 2 WHERE a';
--source include/wait_condition.inc
--------------------------------------------------------------------------

is not reliable enought to make sure transaction is blocked in test
case, the test failed sporadically with
--------------------------------------------------------------------------
./mtr --max-test-fail=1 --parallel=96 lock_isolation{,,,,,,,}{,,,}{,,} \
--repeat=500
--------------------------------------------------------------------------

command. That's why it was replaced with debug sync-points.

Reviewed by: Marko Mäkelä
2025-04-22 20:41:43 +03:00
Thirunarayanan Balathandayuthapani
dac3d702f7 MDEV-36649 dict_acquire_mdl_shared() aborts when table mode is DICT_TABLE_OP_OPEN_ONLY_IF_CACHED
- InnoDB fails to check the table is being dropped or evicted
while acquiring the MDL for the table when table open operation
mode is DICT_TABLE_OP_OPEN_ONLY_IF_CACHED. This is caused by
the commit 337bf8ac4b (MDEV-36122)

Fix:
===
dict_acquire_mdl_shared(): If the table is evicted or dropped when
table operation mode is DICT_TABLE_OP_OPEN_IF_CACHED then return
nullptr
2025-04-22 15:17:29 +05:30
Marko Mäkelä
db4763a0d1 Fix a slow test
When we expect a lock wait timeout, let us override
the default innodb_lock_wait_timeout=50
with the minimum timeout of 1 second.
2025-04-07 10:25:34 +03:00
Thirunarayanan Balathandayuthapani
b11772d9a5 MDEV-33167 ASAN errors in dict_sys_t::load_table / get_foreign_key_info after failing to load a table
Problem:
=======
- While loading the foreign key constraints for the parent table,
if child table wasn't open then InnoDB uses the parent table heap
to store the child table name in fk_tables list. If the consecutive
foreign key relation for the parent table fails with error,
InnoDB evicts the parent table from memory. But InnoDB accesses the
evicted table memory again in dict_sys.load_table()

Solution:
========
dict_load_table_one(): In case of error, remove the child table
names which was added during dict_load_foreigns()
2025-04-03 17:39:40 +05:30
Thirunarayanan Balathandayuthapani
0d7ef4f478 MDEV-36236 Instant alter aborts when InnoDB fails to rollback instant operation
Problem:
========
- InnoDB does consecutive instant alter operation, first instant DDL
fails, it fails to reset the old instant information in table during
rollback. This lead to consecutive instant alter to have wrong
assumption about the exisitng instant column information.

Fix:
====
dict_table_t::instant_column(): Duplicate the instant information
field of the table. By doing this, InnoDB alter retains the old
instant information and reset it during rollback operation
2025-04-03 13:09:08 +05:30
Marko Mäkelä
4c0e2f1aca MDEV-35813: even more robust test case
The test in commit 1756b0f37d
is occasionally failing if there are unexpectedly many page cleaner
batches that are updating the log checkpoint by small amounts.
This occurs in particular when running the server under Valgrind.

Let us insert the same number of records with a larger number of
statements in a hope that the test would then be more likely to pass.
2025-04-02 08:12:29 +03:00
Marko Mäkelä
191209d8ab Merge 10.5 into 10.6 2025-03-26 17:09:57 +02:00
Thirunarayanan Balathandayuthapani
1f4a901576 MDEV-36281 DML aborts during online virtual index
Reason:
=======
- InnoDB DML commit aborts the server when InnoDB does online
virtual index. During online DDL, concurrent DML commit
operation does read the undo log record and their related
current version of the clustered index record. Based on the
operation, InnoDB do build the old tuple and new tuple for
the table. If the concurrent online index can be affected
by the operation, InnoDB does build the entry
for the index and log the operation.

Problematic case is update operation, InnoDB does build the
update vector. But while building the old row, InnoDB fails
to fill the non-affected virtual column. This lead to
server abort while build the entry for index.

Fix:
===
- First, fill the virtual column entries for the new row.
Duplicate the old row based on new row and change only the
affected fields in old row based on the update vector.
2025-03-26 12:48:39 +01:00
Marko Mäkelä
1756b0f37d MDEV-35813: more robust test case
Let us integrate the test case with innodb.page_cleaner so that there
will be less interference from log writes due to checkpoints.
Also, make the test compatible with ./mtr --cursor-protocol.
2025-03-18 10:41:38 +02:00
Marko Mäkelä
0e8e0065d6 MDEV-35813 test case 2025-03-17 16:21:09 +02:00
Kristian Nielsen
acaf07daed Add --source include/long_test.inc to some tests
This will make mysql-test-run.pl try to schedule these long-running
(> 60 seconds) tests early in --parallel runs, which helps avoid that
the testsuite gets stuck with a few long-running tests at the end
while most other test workers are idle.

This speed up mtr --parallel=96 with 25 seconds for me.

Reviewed-by: Brandon Nesterenko <brandon.nesterenko@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-03-15 11:15:54 +01:00
Marko Mäkelä
c07e355c40 MDEV-36015: unrepresentable value in row_parse_int()
row_parse_int(): Refactor the code and define the function static in
one compilation unit. For any negative values, we must return 0.

row_search_get_max_rec(), row_search_max_autoinc(): Moved to the same
compilation unit with row_parse_int().

We also remove a work-around of an internal compiler error when
targeting ARMv8 on GCC 4.8.5, a compiler that is no longer supported.

Reviewed by: Debarun Banerjee
2025-02-13 15:10:53 +01:00
Vlad Lesin
6e6fcf4d43 MDEV-34489 innodb.innodb_row_lock_time_ms fails
The test fails trying to compare (innodb/lock)_row_lock_time_avg with
some limit. We can't predict (innodb/lock)_row_lock_time_avg value,
because it's counted as the whole waiting time divided by the amount of
waits. Both waiting time and amount of waits depend on the previous
tests execution. The corresponding counters in lock_sys can't be reset
with any query. Remove (innodb/lock)_row_lock_time_avg comparision from
the test.

information_schema.global_status.innodb_row_lock_time can't be reset,
compare its difference instead of absolute value.

Reviewed by: Marko Mäkelä
2025-02-04 19:14:41 +03:00
Marko Mäkelä
900bbbe4a8 MDEV-33295 innodb.doublewrite occasionally fails
When the first attempt of XA ROLLBACK is expected to fail,
some recovered changes could be written back through the doublewrite
buffer. Should that happen, the next recovery attempt (after mangling
the data file t1.ibd further) could fail because no copy of the
affected pages would be available in the doublewrite buffer.
To prevent this from happening, ensure that the doublewrite buffer
will not be used and no log checkpoint occurs during the previous
failed recovery attempt.

Also, let a successful XA ROLLBACK serve the additional purpose of
freeing a BLOB page and therefore rewriting page 0, which we must then
be able to recover despite induced corruption. In the last restart step,
we will tolerate an unexpected checkpoint, because one is frequently
occurring on FreeBSD and AIX, despite our efforts to force a buffer pool
flush before each "no checkpoint" section.
2025-02-03 08:11:43 +02:00
Sergei Golubchik
066e8d6aea Merge branch '10.5' into 10.6 2025-01-29 11:17:38 +01:00
Marko Mäkelä
3cfffb4de6 MDEV-35962 CREATE INDEX fails to heal a FOREIGN KEY constraint
commit_cache_norebuild(): Replace any newly added indexes in
the attached foreign key constraints.
2025-01-29 09:04:50 +02:00
Nikita Malyavin
ecaedbe299 MDEV-33658 1/2 Refactoring: extract Key length initialization
mysql_prepare_create_table: Extract a Key initialization part that
relates to length calculation and long unique index designation.

append_system_key_parts call also moves there.

Move this initialization before the duplicate elimination.

Extract WITHOUT OVERPLAPS check into a separate function. It had to be moved
earlier in the code to preserve the order of the error checks, as in the tests.
2025-01-26 16:15:46 +01:00
Marko Mäkelä
d4da659b43 MDEV-35854: Simplify dict_get_referenced_table()
innodb_convert_name(): Convert a schema or table name to
my_charset_filename compatible format.

dict_table_lookup(): Replaces dict_get_referenced_table().
Make the callers responsible for invoking innodb_convert_name().

innobase_casedn_str(): Remove. Let us invoke my_casedn_str() directly.

dict_table_rename_in_cache(): Do not duplicate a call to
dict_mem_foreign_table_name_lookup_set().

innobase_convert_to_filename_charset(): Defined static in the only
compilation unit that needs it.

dict_scan_id(): Remove the constant parameters
table_id=FALSE, accept_also_dot=TRUE. Invoke strconvert() directly.

innobase_convert_from_id(): Remove; only called from dict_scan_id().

innobase_convert_from_table_id(): Remove (dead code).

table_name_t::dblen(), table_name_t::basename(): In non-debug builds,
tolerate names that may miss a '/' separator.

Reviewed by: Debarun Banerjee
2025-01-23 14:38:08 +02:00
Marko Mäkelä
82310f926b MDEV-29182 Assertion fld->field_no < table->n_v_def failed on cascade
row_ins_cascade_calc_update_vec(): Skip any virtual columns in the
update vector of the parent table.

Based on mysql/mysql-server@0ac176453b

Reviewed by: Debarun Banerjee
2025-01-22 17:22:07 +02:00
Thirunarayanan Balathandayuthapani
0301ef38b4 MDEV-35445 Disable foreign key column nullability check for strict sql mode
- MDEV-34392(commit cc810e64d4) adds
the check for nullability of foreign key column when foreign key
relation is of UPDATE_CASCADE or UPDATE SET NULL. This check
makes DDL fail when it violates foreign key nullability.
This patch basically does the nullability check for foreign key
column only for strict sql mode
2025-01-21 18:52:33 +05:30
Marko Mäkelä
98dbe3bfaf Merge 10.5 into 10.6 2025-01-20 09:57:37 +02:00
Marko Mäkelä
f521b8ac21 MDEV-35723: applying non-zero offset to null pointer in INSERT
row_mysql_read_blob_ref(): Correctly handle what Field_blob::store()
generates for length=0.
2025-01-17 12:34:03 +02:00
Marko Mäkelä
b82abc7163 MDEV-35701 trx_t::autoinc_locks causes unnecessary dynamic memory allocation
trx_t::autoinc_locks: Use small_vector<lock_t*,4> in order to avoid any
dynamic memory allocation in the most common case (a statement is
holding AUTO_INCREMENT locks on at most 4 tables or partitions).

lock_cancel_waiting_and_release(): Instead of removing elements from
the middle, simply assign nullptr, like lock_table_remove_autoinc_lock().

The added test innodb.auto_increment_lock_mode covers the dynamic memory
allocation as well as nondeterministically (occasionally) covers
the out-of-order lock release in lock_table_remove_autoinc_lock().

Reviewed by: Debarun Banerjee
2025-01-15 16:55:01 +02:00
Sergei Golubchik
c478b1ba08 MDEV-35598 foreign key error is unnecessary truncated
truncate it at 512 bytes (max allowed by the protocol), not 192
2025-01-09 10:00:36 +01:00
Sergei Golubchik
0706c01b88 cleanup: innodb.innodb_information_schema
don't disable query/result log unless the output is unstable.

and even then don't, but replace away unstable parts.
2025-01-09 10:00:35 +01:00
Marko Mäkelä
6d4841ae26 MDEV-35647 Possible hang during CREATE TABLE…SELECT error handling
ha_innobase::delete_table(): Clear trx->dict_operation_lock_mode
after, not before invoking trx->rollback(), so that
row_undo_mod_parse_undo_rec() will be invoked with dict_locked=true
and dict_sys_t::freeze() will not be invoked for loading a table
definition. Inside dict_sys_t::freeze(), an assertion !have_any()
would fail when the current thread is already holding the latch.

This fixes up commit c5fd9aa562 (MDEV-25919).

Reviewed by: Debarun Banerjee
2025-01-08 13:29:16 +02:00
Monty
88d9348dfc Remove dates from all rdiff files 2025-01-05 16:40:11 +02:00
Marko Mäkelä
e5c4c0842d MDEV-35443: opt_search_plan_for_table() may degrade to full table scan
opt_calc_index_goodness(): Correct an inaccurate condition.
We can very well use a clustered index of a table that is subject
to online rebuild. But we must not choose an index that has not been
committed (it is a secondary index that was not fully created)
or that is corrupted or not a normal B-tree index.

opt_search_plan_for_table(): Remove some redundant code, now that
opt_calc_index_goodness() checks against corrupted indexes.

The test case allows this code to be exercised. The main observation
in the following:
	./mtr --rr innodb.stats_persistent
	rr replay var/log/mysqld.1.rr/latest-trace
should be that when opt_search_plan_for_table() is being invoked by
dict_stats_update_persistent() on the being-altered statistics table
in the 2nd call after ha_innobase::inplace_alter_table(),
and the fix in opt_calc_index_goodness() is absent,
it would choose the code path if (n_fields == 0), that is, a full
table scan, instead of searching for the record. The GDB commands to
execute in "rr replay" would be as follows:
	break ha_innobase::inplace_alter_table
	continue
	break opt_search_plan_for_table
	continue
	continue
	next
	next
	…

Reviewed by: Vladislav Lesin
2024-12-19 14:05:16 +02:00
Marko Mäkelä
ddd7d5d8e3 MDEV-24035 Failing assertion: UT_LIST_GET_LEN(lock.trx_locks) == 0 causing disruption and replication failure
Under unknown circumstances, the SQL layer may wrongly disregard an
invocation of thd_mark_transaction_to_rollback() when an InnoDB
transaction had been aborted (rolled back) due to one of the following errors:
* HA_ERR_LOCK_DEADLOCK
* HA_ERR_RECORD_CHANGED (if innodb_snapshot_isolation=ON)
* HA_ERR_LOCK_WAIT_TIMEOUT (if innodb_rollback_on_timeout=ON)

Such an error used to cause a crash of InnoDB during transaction commit.
These changes aim to catch and report the error earlier, so that not only
this crash can be avoided but also the original root cause be found and
fixed more easily later.

The idea of this fix is from Michael 'Monty' Widenius.

HA_ERR_ROLLBACK: A new error code that will be translated into
ER_ROLLBACK_ONLY, signalling that the current transaction
has been aborted and the only allowed action is ROLLBACK.

trx_t::state: Add TRX_STATE_ABORTED that is like
TRX_STATE_NOT_STARTED, but noting that the transaction had been
rolled back and aborted.

trx_t::is_started(): Replaces trx_is_started().

ha_innobase: Check the transaction state in various places.
Simplify the logic around SAVEPOINT.

ha_innobase::is_valid_trx(): Replaces ha_innobase::is_read_only().

The InnoDB logic around transaction savepoints, commit, and rollback
was unnecessarily complex and might have contributed to this
inconsistency. So, we are simplifying that logic as well.

trx_savept_t: Replace with const undo_no_t*. When we rollback to
a savepoint, all we need to know is the number of undo log records
that must survive.

trx_named_savept_t, DB_NO_SAVEPOINT: Remove. We can store undo_no_t
directly in the space allocated at innobase_hton->savepoint_offset.

fts_trx_create(): Do not copy previous savepoints.

fts_savepoint_rollback(): If a savepoint was not found, roll back
everything after the default savepoint of fts_trx_create().
The test innodb_fts.savepoint is extended to cover this code.

Reviewed by: Vladislav Lesin
Tested by: Matthias Leich
2024-12-12 18:02:00 +02:00
Marko Mäkelä
19acb0257e MDEV-35508 Race condition between purge and secondary index INSERT or UPDATE
row_purge_remove_sec_if_poss_leaf(): If there is an active transaction
that is not newer than PAGE_MAX_TRX_ID, return the bogus value 1
so that row_purge_remove_sec_if_poss_tree() is guaranteed to recheck if
the record needs to be purged. It could be the case that an active
transaction would insert this record between the time this check
completed and row_purge_remove_sec_if_poss_tree() acquired a latch
on the secondary index leaf page again.

row_purge_del_mark_error(), row_purge_check(): Some unlikely code
refactored into separate non-inline functions.

trx_sys_t::find_same_or_older_low(): Move the unlikely and bulky
part of trx_sys_t::find_same_or_older() to a non-inline function.

trx_sys_t::find_same_or_older_in_purge(): A variant of
trx_sys_t::find_same_or_older() for use in the purge subsystem,
with potential concurrent access of the same trx_t object from
multiple threads.

trx_t::max_inactive_id_atomic: An Atomic_relaxed alias of the
regular data field trx_t::max_inactive_id, which we
use on systems that have native 64-bit loads or stores.
On any 64-bit system that seems to be supported by GCC, Clang or MSVC,
relaxed atomic loads and stores use the regular load and store
instructions. On -march=i686 the 64-bit atomic loads and stores
would use an XMM register.

This fixes a regression that had been introduced in
commit b7b9f3ce82 (MDEV-34515).
There would be messages
[ERROR] InnoDB: tried to purge non-delete-marked record in index
in the server error log, and an assertion ut_ad(0) would cause a
crash of debug instrumented builds. This could also cause incorrect
results for MVCC reads and corrupted secondary indexes.

The debug instrumented test case was written by Debarun Banerjee.

Reviewed by: Debarun Banerjee
2024-11-29 10:44:38 +02:00
Thirunarayanan Balathandayuthapani
9ba18d1aa0 MDEV-35394 Innochecksum misinterprets freed pages
- Innochecksum misinterprets the freed pages as active one.
This leads the user to think there are too many valid
pages exist.

- To avoid this confusion, innochecksum introduced one
more option --skip-freed-pages and -r to avoid the freed
pages while dumping or printing the summary of the tablespace.

- Innochecksum can safely assume the page is freed if
the respective extent doesn't belong to a segment and marked as
freed in XDES_BITMAP in extent descriptor page.

- Innochecksum shouldn't assume that zero-filled page as extent
descriptor page.

Reviewed-by: Marko Mäkelä
2024-11-27 13:00:51 +05:30
Thirunarayanan Balathandayuthapani
8e1cf078a0 MDEV-35363 Avoid cloning of table statistics while saving the InnoDB table stats
- Remove the test case added as a part of the
commit 98d57719e2 (MDEV-32667)
2024-11-14 15:32:55 +05:30
Thirunarayanan Balathandayuthapani
074831ec61 Merge branch 10.5 into 10.6 2024-11-08 18:17:15 +05:30
Thirunarayanan Balathandayuthapani
7afee25b08 MDEV-35115 Inconsistent Replace behaviour when multiple unique index exist
- Replace statement fails with duplicate key error when multiple
unique key conflict happens. Reason is that Server expects the
InnoDB engine to store the confliciting keys in ascending order.
But the InnoDB doesn't store the conflicting keys
in ascending order.

Fix:
===
- Enable HA_DUPLICATE_KEY_NOT_IN_ORDER for InnoDB storage engine
only when unique index order is different in .frm and innodb dictionary.
2024-11-08 16:46:41 +05:30
Marko Mäkelä
ba4541ba7f MDEV-29015/MDEV-29260/MDEV-34938 test fixup
The merge f00711bba2 included a change
of the test innodb.log_file_name, which would try to ensure that
in the presence of the code fix decdd4bf49
we would get an error on Linux when invoking lseek() on a directory.
It turns out that this is not the case in at least one Linux based
cloud environment.
2024-11-08 09:55:47 +02:00
Thirunarayanan Balathandayuthapani
98d57719e2 MDEV-32667 dict_stats_save_index_stat() reads uninitialized index->stats_error_printed
Problem:
========
- dict_stats_table_clone_create() does not initialize the
flag stats_error_printed in either dict_table_t or dict_index_t.
Because dict_stats_save_index_stat() is operating on a copy
of a dict_index_t object, it appears that
dict_index_t::stats_error_printed will always be false
for actual metadata objects, and uninitialized in
dict_stats_save_index_stat().

Solution:
=========
dict_stats_table_clone_create(): Assign stats_error_printed
for table and index while copying the statistics
2024-11-08 11:35:19 +05:30
Oleksandr Byelkin
f00711bba2 Merge branch '10.5' into 10.6 2024-10-29 14:20:03 +01:00
Sergei Golubchik
3cd706b107 MDEV-35236 Assertion `(mem_root->flags & 4) == 0' failed in safe_lexcstrdup_root
Post-fix for MDEV-35144.

Cannot allocate options values on the statement arena, because
HA_CREATE_INFO is shallow-copied for every execution, so if the
option_list was initially empty, it will be reset for every execution
and any values allocated on the  statement arena will be lost.

Cannot allocate option values on the execution arena, because
HA_CREATE_INFO is shallow-copied for every execution, so if the
option_list was  initially NOT empty, any values appended to the
end will be preserved and if they're on the execution arena their
content will be destroyed.

Let's use thd->change_item_tree() to save and restore necessary pointers
for every execution.

followup for 3da565c41d
2024-10-23 14:58:57 +02:00
Vlad Lesin
8c7786e7d5 MDEV-34690 lock_rec_unlock_unmodified() causes deadlock
lock_rec_unlock_unmodified() is executed either under lock_sys.wr_lock()
or under a combination of lock_sys.rd_lock() + record locks hash table
cell latch. It also requests page latch to check if locked records were
changed by the current transaction or not.

Usually InnoDB requests page latch to find the certain record on the
page, and then requests lock_sys and/or record lock hash cell latch to
request record lock. lock_rec_unlock_unmodified() requests the latches
in the opposite order, what causes deadlocks. One of the possible
scenario for the deadlock is the following:

thread 1 - lock_rec_unlock_unmodified() is invoked under locks hash table
           cell latch, the latch is acquired;
thread 2 - purge thread acquires page latch and tries to remove
           delete-marked record, it invokes lock_update_delete(), which
           requests locks hash table cell latch, held by thread 1;
thread 1 - requests page latch, held by thread 2.

To fix it we need to release lock_sys.latch and/or lock hash cell latch,
acquire page latch and re-acquire lock_sys related latches.

When lock_sys.latch and/or lock hash cell latch are released in
lock_release_on_prepare() and lock_release_on_prepare_try(), the page on
which the current lock is held, can be merged. In this case the bitmap
of the current lock must be cleared, and the new lock must be added to
the end of trx->lock.trx_locks list, or bitmap of already existing lock
must be changed.

The new field trx_lock_t::set_nth_bit_calls indicates if new locks
(bits in existing lock bitmaps or new lock objects) were created during
the period when lock_sys was released in trx->lock.trx_locks list
iteration loop in lock_release_on_prepare() or
lock_release_on_prepare_try(). And, if so, we traverse the list again.

The block can be freed during pages merging, what causes assertion
failure in buf_page_get_gen(), as btr_block_get() passes BUF_GET as page
get mode to it. That's why page_get_mode parameter was added to
btr_block_get() to pass BUF_GET_POSSIBLY_FREED from
lock_release_on_prepare() and lock_release_on_prepare_try() to
buf_page_get_gen().

As searching for id of trx, which modified secondary index record, is
quite expensive operation, restrict its usage for master. System variable
was added to remove the restriction for testing simplifying. The
variable exists only either for debug build or for build with
-DINNODB_ENABLE_XAP_UNLOCK_UNMODIFIED_FOR_PRIMARY option to increase the
probability of catching bugs for release build with RQG.

Note that the code, which does primary index lookup to find out what
transaction modified secondary index record, is necessary only when
there is no primary key and no unique secondary key on replica with row
based replication, because only in this case extra X locks on unmodified
records can be set during scan phase.

Reviewed by Marko Mäkelä.
2024-10-23 12:36:17 +03:00
Vlad Lesin
92180ad513 MDEV-34466 XA prepare don't release unmodified records for some cases
There is no need to exclude exclusive non-gap locks from the procedure
of locks releasing on XA PREPARE execution in
lock_release_on_prepare_try() after commit
17e59ed3aa (MDEV-33454), because
lock_rec_unlock_unmodified() should check if the record was modified
with the XA, and release the lock if it was not.

lock_release_on_prepare_try(): don't skip X-locks, let
lock_rec_unlock_unmodified() to process them.

lock_sec_rec_some_has_impl(): add template parameter for not acquiring
trx_t::mutex for the case if a caller already holds the mutex, don't
crash if lock's bitmap is clean.

row_vers_impl_x_locked(), row_vers_impl_x_locked_low(): add new argument
to skip trx_t::mutex acquiring.

rw_trx_hash_t::validate_element(): don't acquire trx_t::mutex if the
current thread already holds it.

Thanks to Andrei Elkin for finding the bug.
Reviewed by Marko Mäkelä, Debarun Banerjee.
2024-10-23 12:36:17 +03:00
Marko Mäkelä
1cad1dbde6 MDEV-35235 innodb_snapshot_isolation=ON fails to signal transaction rollback
convert_error_code_to_mysql(): Treat DB_DEADLOCK and DB_RECORD_CHANGED
in the same way, that is, signal to the SQL layer that the transaction
had been rolled back.
2024-10-23 07:55:22 +03:00
Sergei Golubchik
3a1cf2c85b MDEV-34679 ER_BAD_FIELD uses non-localizable substrings 2024-10-17 21:37:37 +02:00
Sergei Golubchik
3da565c41d MDEV-35144 CREATE TABLE ... LIKE uses current innodb_compression_default instead of the create value
When adding a column or index that uses plugin-defined
sysvar-based options with CREATE ... LIKE the server
was using the current value of the sysvar, not the default one.

Because parse_option_list() function was used both in create
and open and it tried to guess when it's create (need to use
current sysvar value and add a new name=value pair to the list)
or open (need to use default, without extending the list).

Let's move the list extending functionality into a separate
function and call it explicitly when needed. Operations that
add new objects (CREATE, ALTER ... ADD) will extend the list,
other operations (ALTER, CREATE ... LIKE, open) will not.
2024-10-17 16:28:39 +02:00
Marko Mäkelä
bb47e575de MDEV-34830: LSN in the future is not being treated as serious corruption
The invariant of write-ahead logging is that before any change to a
page is written to the data file, the corresponding log record must
must first have been durably written.

On crash recovery, there were some sloppy checks for this. Let us
implement accurate checks and flag an inconsistency as a hard error,
so that we can avoid further corruption of a corrupted database.
For data extraction from the corrupted database, innodb_force_recovery
can be used.

Before recovery is reading any data pages or invoking
buf_dblwr_t::recover() to recover torn pages from the
doublewrite buffer, InnoDB will have parsed the log until the
final LSN and updated log_sys.lsn to that. So, we can rely on
log_sys.lsn at all times. The doublewrite buffer recovery has been
refactored in such a way that the recv_sys.dblwr.pages may be consulted
while discovering files and their page sizes, but nothing will be
written back to data files before buf_dblwr_t::recover() is invoked.

A section of the test mariabackup.innodb_redo_overwrite
that is parsing some mariadb-backup --backup output has
been removed, because that output "redo log block is overwritten"
would often be missing in a Microsoft Windows environment
as a result of these changes.

recv_max_page_lsn, recv_lsn_checks_on: Remove.

recv_sys_t::validate_checkpoint(): Validate the write-ahead-logging
condition at the end of the recovery.

recv_dblwr_t::validate_page(): Keep track of the maximum LSN
(if we are checking a non-doublewrite copy of a page) but
do not complain LSN being in the future. The doublewrite buffer
is a special case, because it will be read early during recovery.
Besides, starting with commit 762bcb81b5
the dblwr=true copies of pages may legitimately be "too new".

recv_dblwr_t::find_page(): Find a valid page with the smallest
FIL_PAGE_LSN that is in the valid range for recovery.

recv_dblwr_t::restore_first_page(): Replaced by find_page().
Only buf_dblwr_t::recover() will write to data files.

buf_dblwr_t::recover(): Simplify the message output. Do attempt
doublewrite recovery on user page read error. Ignore doublewrite
pages whose FIL_PAGE_LSN is outside the usable bounds. Previously,
we could wrongly recover a too new page from the doublewrite buffer.
It is unlikely that this could have lead to an actual error.
Write back all recovered pages from the doublewrite buffer here,
including for the first page of any tablespace.

buf_page_is_corrupted(): Distinguish the return values
CORRUPTED_FUTURE_LSN and CORRUPTED_OTHER.

buf_page_check_corrupt(): Return the error code DB_CORRUPTION
in case the LSN is in the future.

Datafile::read_first_page(): Handle FSP_SPACE_FLAGS=0xffffffff
in the same way on both 32-bit and 64-bit architectures.

Datafile::read_first_page_flags(): Split from read_first_page().
Take a copy of the first page as a parameter.

recv_sys_t::free_corrupted_page(): Take the file as a parameter
and return whether a message was displayed. This avoids some duplicated
and incomplete error messages.

buf_page_t::read_complete(): Remove some redundant output and always
display the name of the corrupted file. Never return DB_FAIL;
use it only in internal error handling.

IORequest::read_complete(): Assume that buf_page_t::read_complete()
will have reported any error.

fil_space_t::set_corrupted(): Return whether this is the first time
the tablespace had been flagged as corrupted.

Datafile::validate_first_page(), fil_node_open_file_low(),
fil_node_open_file(), fil_space_t::read_page0(),
fil_node_t::read_page0(): Add a parameter for a copy of the
first page, and a parameter to indicate whether the FIL_PAGE_LSN
check should be suppressed. Before buf_dblwr_t::recover() is
invoked, we cannot validate the FIL_PAGE_LSN, but we can trust the
FSP_SPACE_FLAGS and the tablespace ID that may be present in a
potentially too new copy of a page.

Reviewed by: Debarun Banerjee
2024-10-17 17:24:20 +03:00
Vladislav Vaintroub
c1fc59277a MDEV-34929 page-compressed tables do not work on Windows
Remove workaround for MDEV-13941, it served for 5 years,and all affected
pre-release 10.2 installation should have been already fixed in between.

Apparently Innodb is using is_sparse parameter in os_file_set_size()
inconsistently, and it passes is_sparse=false now during first file
extension. With MDEV-13941 workaround in place, it would unsparse
the file, which is makes compression not to work at all anymore.
2024-10-16 16:02:13 +02:00
Thirunarayanan Balathandayuthapani
6aaae4c03b MDEV-35122 Incorrect NULL value handling for instantly dropped BLOB columns
Problem:
=======
- Redundant table fails to insert into the table after
instant drop blob column. Instant drop column only marking
the column as hidden and consecutive insert statement tries
to insert NULL value for the dropped BLOB column and returns
the fixed length of the blob type as 65535. This lead to
row size too large error.

Fix:
====
 For redundant table, if the non-fixed dropped column can be null
then set the length of the field type as 0.
2024-10-15 12:04:37 +05:30
Yuchen Pei
cd5577ba4a Merge branch '10.5' into 10.6 2024-10-15 16:00:44 +11:00
Thirunarayanan Balathandayuthapani
5777d9f282 MDEV-35116 InnoDB fails to set error index for HA_ERR_NULL_IN_SPATIAL
- InnoDB fails to set the index information or index number
for the spatial index error HA_ERR_NULL_IN_SPATIAL.

row_build_spatial_index_key(): Initialize the tmp_mbr array completely.

check_if_supported_inplace_alter(): Fix the spelling mistake of alter
2024-10-14 14:28:24 +05:30