innobase_commit(): When Galera is used with SET SQL_LOG_BIN=OFF,
some debug assertions that had been added in
commit ddd7d5d8e3 (MDEV-24035)
would fail. Let us relax those assertions for Galera transactions,
to allow an implicit commit after an internally executed XA PREPARE.
Note that trx_undo_report_row_operation() only allows undo log records
to be added to ACTIVE transactions (not after XA PREPARE). Hence, this
relaxation should be safe with respect to writes.
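A minimal sketch of the shape of such a relaxation (the state names and the
helper function are illustrative, not the actual InnoDB code):

  #include <cassert>

  enum trx_state_t { TRX_STATE_ACTIVE, TRX_STATE_PREPARED };

  struct trx_t {
    trx_state_t state;
    bool wsrep;                        // true for Galera transactions
    bool is_wsrep() const { return wsrep; }
  };

  // Before: assert(trx->state == TRX_STATE_ACTIVE);
  // After: additionally tolerate an implicit commit after an internally
  // executed XA PREPARE, but only for Galera (wsrep) transactions.
  inline void assert_trx_commit_state(const trx_t *trx)
  {
    assert(trx->state == TRX_STATE_ACTIVE ||
           (trx->is_wsrep() && trx->state == TRX_STATE_PREPARED));
  }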
Problem:
=======
- Importing a tablespace fails to check the descending property
of index fields while matching the schema given in the .cfg file
against the table schema.
Fix:
===
row_quiesce_write_index_fields(): Write the descending
property of the field into the field fixed length.
Since the field fixed length uses only 10 bits,
InnoDB can use the 0th bit of the field fixed length
to store the descending property of the field.
row_import_cfg_read_index_fields(): Read the field
descending information from the field fixed length.
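A hedged sketch of the bit packing described above (the constant names and
the exact spare-bit position are illustrative):

  #include <cstdint>

  constexpr uint32_t FIXED_LEN_MASK = (1u << 10) - 1;  // 10-bit fixed length
  constexpr uint32_t DESC_FLAG      = 1u << 10;        // a spare bit

  // row_quiesce_write_index_fields(): encode before writing to the .cfg
  inline uint32_t cfg_encode_field(uint32_t fixed_len, bool descending)
  {
    return (fixed_len & FIXED_LEN_MASK) | (descending ? DESC_FLAG : 0);
  }

  // row_import_cfg_read_index_fields(): decode when reading the .cfg
  inline void cfg_decode_field(uint32_t v, uint32_t &fixed_len,
                               bool &descending)
  {
    fixed_len  = v & FIXED_LEN_MASK;
    descending = (v & DESC_FLAG) != 0;
  }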
page_encrypt_thread_key: The key for fil_crypt_thread().
All other InnoDB threads should already have been registered for
performance_schema ever since
commit a2f510fccf
When converting an IN-list to a subquery, a temporary table stores the IN-list
values and participates in join optimization. The problem is that the bitmap
of usable keys for the temporary table is initialized only after the
optimization phase, during execution. It happens when the table is opened
via `ha_heap::open()` and the subroutine `set_keys_for_scanning()`
is called. Trying to access the bitmap earlier, during optimization, leads
to MSAN/Valgrind errors.
This fix removes the dependency on `set_keys_for_scanning()`. The key bitmap
is now dynamically composed on demand in `keys_to_use_for_scanning()`,
ensuring correctness without imposing strict call-order constraints.
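A simplified sketch of composing the bitmap on demand (the types and the
64-key limit are stand-ins for the server's real structures):

  #include <bitset>

  struct KeyDef { bool is_btree; };

  struct ha_heap_like {
    const KeyDef *keydef;
    unsigned keys;

    std::bitset<64> keys_to_use_for_scanning() const
    {
      std::bitset<64> usable;          // composed fresh on every call,
      for (unsigned i = 0; i < keys; i++)
        if (keydef[i].is_btree)        // so it is valid even before
          usable.set(i);               // ha_heap::open() runs
      return usable;
    }
  };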
Reviewer: Oleksandr Byelkin <sanja@mariadb.com>
This commit implements the functionality of
mysql/mysql-server@7037a0bdc8
If some transaction 't' requests a not-gap X-lock 'Xt' on record 'r', and
the list of locks on record 'r' contains a granted not-gap S-lock 'St' of
transaction 't', followed by not-gap waiting locks WB={Wb1,
Wb2, ..., Wbn} conflicting with 'Xt', and 'Xt' does not conflict with any
other lock located in the list after 'St', then 'Xt' is granted. Note that
a not-gap request also implies not insert-intention, because
insert-intention locks are gap locks.
MySQL's commit contains the following explanation of why insert-intention
locks must not overtake waiting ordinary or gap locks:
"It is important that this decision rule doesn't allow
INSERT_INTENTION locks to overtake WAITING locks on gaps (`S`, `S|GAP`,
`X`, `X|GAP`), as inserting a record into a gap would split such WAITING
lock, violating the invariant that each transaction can have at most
single WAITING lock at any time."
I would add the following to the explanation. Suppose trx 1 holds an
ordinary X-lock on some record. Then trx 2 executes "DELETE FROM t"
or "SELECT * FOR UPDATE" in RR (see lock_delete_updated.test and
MDEV-27992), i.e. it creates a waiting ordinary X-lock on the same record.
Then trx 1 wants to insert some record just before the locked record.
It requests an insert-intention lock, and if that lock overtakes the trx 2
lock, there will be phantom records for trx 2 in RR. lock_delete_updated.test
shows how "DELETE" lets records be inserted into an already scanned gap
and misses some records it should have deleted.
The current implementation differs from the MySQL implementation. There are
two key differences:
1. Lock queue ordering. In MySQL all waiting locks precede all granted
locks. A new waiting lock is added to the head of the queue, a new
granted lock is added to the end of the queue, and when a waiting lock
is granted, it is moved to the end of the queue. In MariaDB any new
lock is added to the end of the queue, and a waiting lock does not change
its position in the queue when it is granted. The rule is that a
blocking lock must be located before the blocked lock in the lock queue.
We maintain the rule by inserting the bypassing lock just before the
bypassed one.
2. The MySQL implementation uses an object (locksys::Trx_locks_cache) which
can be passed to consecutive calls to rec_lock_has_to_wait() for the
same trx and heap_no to cache the result of checking whether trx has a
granted lock which is blocking the waiting lock (see
locksys::Trx_locks_cache::has_granted_blocker()). The current
implementation does not use such an object, because it looks for such a
granted lock at the level of lock_rec_other_has_conflicting() and
lock_rec_has_to_wait_in_queue(). I.e. there is no need for an additional
lock queue iteration in
locksys::Trx_locks_cache::has_granted_blocker(), as we already iterate
the queue in lock_rec_other_has_conflicting() and
lock_rec_has_to_wait_in_queue().
During testing, the following case was found. Suppose we have a
delete-marked record and are going to do an inplace insert into
that delete-marked record. Usually we don't create an explicit lock if
there are no locks conflicting with a not-gap X-lock (see
lock_clust_rec_modify_check_and_lock(), btr_cur_update_in_place()). The
implicit lock will be converted to an explicit one on demand.
That can happen during INSERT: a not-gap S-lock can
be acquired while searching for duplicates (see
row_ins_duplicate_error_in_clust()), and, if a delete-marked record is
found, the inplace insert (see btr_cur_upd_rec_in_place()) modifies the
record, which is treated as an implicit lock.
But there can be a case when some transaction trx1 holds a not-gap S-lock,
another transaction trx2 creates a waiting X-lock, and then trx1 tries to
do an inplace insert. Before the fix, the waiting X-lock of trx2 would be a
conflicting lock, and trx1 would try to create an explicit X-lock, which
would cause a deadlock, and one of the transactions would be rolled back.
But after the fix, the waiting X-lock of trx2 is no longer treated as
conflicting with the trx1 X-lock, as trx1 already holds the S-lock. If we
don't create an explicit lock, then some other transaction trx3 can create
it during implicit-to-explicit lock conversion and place it at the end of
the queue. So there can be the following lock order in the queue:
S1(granted) X2(waiting) X1(granted)
The above queue is not valid, because all granted trx1 locks must be
placed before the waiting trx2 lock. Besides, lock_rec_release_try() can
remove the S1(granted, trx1) lock and grant the X lock to trx2, and then
there can be two granted X-locks on the same record:
X2(granted) X1(granted)
Taking into account that lock_rec_release_try() can release the cell and
lock_sys latches leaving some locks unreleased, the queue validation
function can fail in any unexpected place.
This can be fixed in two ways:
1) Place the explicit X(granted, trx1) lock before the X(waiting, trx2) lock
during implicit-to-explicit lock conversion. This option is implemented
in MySQL, as a granted lock is always placed at the top of the lock queue,
and waiting locks are placed at the bottom of the queue. MariaDB does
not do this, and implementing this variant would require searching for
conflicting locks before converting an implicit lock to an explicit one,
which, in turn, would require acquiring the cell and/or lock_sys latch.
2) Create and place the X(granted, trx1) lock before the X(waiting, trx2)
lock during the inplace INSERT, i.e. when lock_rec_lock() is invoked from
lock_clust_rec_modify_check_and_lock() or
lock_sec_rec_modify_check_and_lock(), if X(waiting, trx2) is
bypassed. This way we don't need an additional conflicting-lock
search, as conflicting locks are searched anyway in lock_rec_low().
This fix implements the second variant (see the changes around
c_lock_info.insert_after in lock_rec_lock()). I.e. if some record was
delete-marked and we do an inplace insert into such a record, and some lock
to bypass was found, create an explicit lock to avoid a conflicting-lock
search on each implicit-to-explicit lock conversion. We can remove this if
MDEV-35624 is implemented.
lock_rec_other_has_conflicting(), lock_rec_has_to_wait_in_queue():
search for locks to bypass along with searching for conflicting locks in the
same loop. The result is returned in a conflicting_lock_info object.
There can be several locks to bypass; only the first one is returned, to
limit lock_rec_find_similar_on_page() to the first bypassed lock and thus
preserve the "blocking before blocked" invariant. conflicting_lock_info also
contains a pointer to the lock after which we can insert the bypassing
lock. This lock precedes the bypassed one.
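A hypothetical outline of that result object (the actual field names may
differ):

  struct lock_t;   // InnoDB lock object, opaque in this sketch

  struct conflicting_lock_info_t {
    lock_t *conflicting;    // first conflicting lock the request waits for
    lock_t *first_bypassed; // first waiting lock the request may bypass
    lock_t *insert_after;   // lock preceding the bypassed one; the
                            // bypassing lock is inserted right after it
  };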
A bypassing lock can be a next-key lock, and the following cases are
possible:
1. S1(not-gap, granted) II2(granted) X3(waiting for S1).
When a new X1(ordinary) lock is acquired, the locks queue will be:
S1(not-gap, granted) II2(granted) X1(ordinary, granted) X3(waiting for
S1)
If we had inserted the new X1 lock just after S1, and S1 had been
released on transaction commit or rollback, we would have the following
sequence in the locks queue:
X1(ordinary, granted) II2(granted) X3(waiting for X1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is not a real issue, as an II lock can be ignored once granted,
but it could possibly hit some assert (taking into account that
lock_release_try() can release the lock_sys latch, and other threads
can acquire the latch and validate the lock queue), as it breaks our
design constraint that any granted lock in the queue must not conflict
with locks ahead of it in the queue. But lock_rec_queue_validate() does
not check the above constraint. We place the new bypassing lock just
before the bypassed one, but there can still be the case when a lock
bitmap is used instead of creating a new lock object (see
lock_rec_add_to_queue() and lock_rec_find_similar_on_page()), and the
lock which owns the bitmap can precede II2(granted). We can either
disable the lock_rec_find_similar_on_page() space optimization for
bypassing locks or treat the "X1(ordinary, granted) II2(granted)"
sequence as valid. As we don't currently have a function which would
fail on the above sequence, let us treat it as valid for the case when
lock_release() execution is in progress.
2. S1(ordinary, granted) II2(waiting for S1) X3(waiting for S1)
When a new X1(ordinary) lock is acquired, the locks queue will be:
S1(ordinary, granted) II2(waiting for S1) X1(ordinary, granted)
X3(waiting for S1).
After S1 is released there will be:
II2(granted) X1(ordinary, granted) X3(waiting for X1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The above queue is valid because an ordinary lock does not conflict with
an II-lock (see lock_rec_has_to_wait()).
lock_rec_create_low(): insert the new lock at the position that
lock_rec_other_has_conflicting() or lock_rec_has_to_wait_in_queue()
returned, if the lock is a bypassing one.
lock_rec_find_similar_on_page(): add the ability to limit the similar-lock
search to a certain lock, to preserve the "blocking before blocked"
invariant for all bypassed locks.
lock_rec_add_to_queue(): don't treat bypassed locks as waiting ones, to
allow lock bitmap reuse for bypassing locks.
lock_rec_lock(): fix the inplace-insert case explained above.
lock_rec_dequeue_from_page(), lock_rec_rebuild_waiting_queue(): move the
bypassing lock to the correct place to preserve the "blocking before
blocked" invariant.
Reviewed by: Debarun Banerjee, Marko Mäkelä.
enum rename_fk: Replaces the "bool use_fk" parameter of
row_rename_table_for_mysql() and innobase_rename_table():
RENAME_IGNORE_FK: Replaces use_fk=false when the operation cannot
involve any FOREIGN KEY constraints, that is, it is a partitioned
table or an internal table for FULLTEXT INDEX.
RENAME_REBUILD: Replaces use_fk=false when the table may contain
FOREIGN KEY constraints, which must not be modified in the data
dictionary tables SYS_FOREIGN and SYS_FOREIGN_COLS.
RENAME_ALTER_COPY: Replaces use_fk=true. This is only specified
in ha_innobase::rename_table(), which may be invoked as part of
ALTER TABLE…ALGORITHM=COPY, but also during RENAME TABLE.
An alternative value RENAME_FK could be useful to specify in
ha_innobase::rename_table() when it is executed as part of
CREATE OR REPLACE TABLE, which currently is not an atomic operation.
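In outline, the enum as described above might look like this (a sketch, not
the verbatim declaration):

  enum rename_fk {
    RENAME_IGNORE_FK, // was use_fk=false: no FOREIGN KEY constraints are
                      // possible (partitioned table or internal table
                      // for FULLTEXT INDEX)
    RENAME_REBUILD,   // was use_fk=false: constraints may exist, but must
                      // not be modified in SYS_FOREIGN/SYS_FOREIGN_COLS
    RENAME_ALTER_COPY // was use_fk=true: ha_innobase::rename_table(), as
                      // part of ALTER TABLE...ALGORITHM=COPY or
                      // RENAME TABLE
  };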
Reviewed by: Debarun Banerjee
innodb_convert_name(): Convert a schema or table name to
my_charset_filename compatible format.
dict_table_lookup(): Replaces dict_get_referenced_table().
Make the callers responsible for invoking innodb_convert_name().
innobase_casedn_str(): Remove. Let us invoke my_casedn_str() directly.
dict_table_rename_in_cache(): Do not duplicate a call to
dict_mem_foreign_table_name_lookup_set().
innobase_convert_to_filename_charset(): Defined static in the only
compilation unit that needs it.
dict_scan_id(): Remove the constant parameters
table_id=FALSE, accept_also_dot=TRUE. Invoke strconvert() directly.
innobase_convert_from_id(): Remove; only called from dict_scan_id().
innobase_convert_from_table_id(): Remove (dead code).
table_name_t::dblen(), table_name_t::basename(): In non-debug builds,
tolerate names that may lack a '/' separator.
Reviewed by: Debarun Banerjee
row_ins_cascade_calc_update_vec(): Skip any virtual columns in the
update vector of the parent table.
Based on mysql/mysql-server@0ac176453b
Reviewed by: Debarun Banerjee
Update conn->queued_connect_share in spider_check_trx_and_get_conn to
avoid a use-after-free.
There are two branches in spider_check_trx_and_get_conn, which is often
called at the beginning of a spider DML, depending on whether an update of
various spider fields is needed. If the update is determined to be needed,
it may NULL the connections associated with the spider handler, which
subsequently causes a call to spider_get_conn() that updates
conn->queued_connect_share with the SPIDER_SHARE associated with the
spider handler.
We make it so that conn->queued_connect_share is updated regardless of
which branch is entered, so that it is never stale and potentially
already freed.
Each spider connection is identified by a connection key, which is
an encoding of the backend parameters.
The first byte of the key is 0 by default, and in rare circumstances
it is changed to a different value: when semi_table_lock is set to 1,
and when using casual read. When this happens, often a new connection
is created with the new key. Neither case is useful: the semantics
of semi_table_lock have nothing to do with the creation of new
connections, and the parameter itself was deprecated in 10.7+ (MDEV-28829)
and marked for deletion (MDEV-28830); while the new threads created by
non-zero spider_casual_read only end up idle, thus not
achieving any gain (see MDEV-26151), and that parameter has also been
deprecated in 11.5+ (MDEV-31789). The relevant code adds unnecessary
complexity to the spider code. This change does not reduce
parallelism, because when bgs mode is on, a background thread is
already created per partition, and there is no evidence that spider
creates multiple threads for one partition. If the need for such cases
arises, it will be handled as a separate issue.
The conn_kind, which stands for "connection kind", is no longer useful
because the HandlerSocket support is deleted and Spider now has only
one connection kind, SPIDER_CONN_KIND_MYSQL. Remove conn_kind and
related code.
Signed-off-by: Yuchen Pei <yuchen.pei@mariadb.com>
Reviewed-by: Nayuta Yanagisawa <nayuta.yanagisawa@mariadb.com>
- MDEV-34392 (commit cc810e64d4) adds
a check for the nullability of a foreign key column when the foreign key
relation is ON UPDATE CASCADE or ON UPDATE SET NULL. This check
makes DDL fail when it violates foreign key nullability.
This patch does the nullability check for the foreign key
column only in strict SQL mode.
The function is supposed to get the previous lock set on a record.
But if there are several locks set on the record, it would return only the
first one. Continue the lock list iteration up to the given lock, even if
the corresponding bit in a lock bitmap is set.
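A simplified sketch of the fixed iteration (toy types; the real code walks
the record lock list of a page):

  #include <cstdint>

  struct lock_t {
    lock_t *next;
    uint64_t bitmap;   // toy stand-in for the per-record lock bitmap
    bool bit_is_set(unsigned heap_no) const { return bitmap >> heap_no & 1; }
  };

  // Return the lock preceding in_lock that is set on heap_no: iterate
  // until in_lock itself is reached instead of returning the first match.
  inline lock_t *lock_rec_get_prev_sketch(lock_t *first,
                                          const lock_t *in_lock,
                                          unsigned heap_no)
  {
    lock_t *prev = nullptr;
    for (lock_t *l = first; l && l != in_lock; l = l->next)
      if (l->bit_is_set(heap_no))
        prev = l;          // remember the latest match; don't stop early
    return prev;
  }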
Item::print_for_table_def() uses QT_TO_SYSTEM_CHARSET to print
the DEFAULT expression into the FRM file during CREATE TABLE.
Therefore, the expression is encoded in utf8 in the FRM file.
get_field_default_value() erroneously used field->charset() to
print the DEFAULT expression at SHOW CREATE TABLE time.
Fixing get_field_default_value() to use &my_charset_utf8mb4_general_ci instead.
This makes DEFAULT work in the same way as:
- virtual column expressions:
if (field->vcol_info)
{
StringBuffer<MAX_FIELD_WIDTH> str(&my_charset_utf8mb4_general_ci);
field->vcol_info->print(&str);
- check constraint expressions:
if (field->check_constraint)
{
StringBuffer<MAX_FIELD_WIDTH> str(&my_charset_utf8mb4_general_ci);
field->check_constraint->print(&str);
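- and, by analogy, DEFAULT expressions after this fix (a sketch; the exact
code in get_field_default_value() may differ):
if (field->default_value)
{
StringBuffer<MAX_FIELD_WIDTH> str(&my_charset_utf8mb4_general_ci);
field->default_value->print(&str);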
Additional cleanup:
Fixing system_charset_info to &my_charset_utf8mb4_general_ci in a few
places to make non-BMP characters work in DEFAULT, virtual column,
check constraint expressions.
Problem:
=======
- During a LOAD statement, the InnoDB bulk operation relies on the
temporary directory and crashes when tmpdir is exhausted.
Solution:
========
During bulk insert, the LOAD statement builds the clustered
index one record at a time instead of one page at a time. By doing
this, InnoDB does the following:
1) Avoids the creation of a temporary file for the clustered index.
2) Writes the undo log for the first insert operation only.
Let us make some member functions of lock_sys_t non-static
to avoid some shuffling of function parameter registers.
lock_cancel_waiting_and_release(): Declare static, because there
are no external callers.
Reviewed by: Debarun Banerjee
trx_t::autoinc_locks: Use small_vector<lock_t*,4> in order to avoid any
dynamic memory allocation in the most common case (a statement is
holding AUTO_INCREMENT locks on at most 4 tables or partitions).
lock_cancel_waiting_and_release(): Instead of removing elements from
the middle, simply assign nullptr, like lock_table_remove_autoinc_lock().
The added test innodb.auto_increment_lock_mode covers the dynamic memory
allocation as well as nondeterministically (occasionally) covers
the out-of-order lock release in lock_table_remove_autoinc_lock().
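A simplified illustration of the release pattern (std::vector stands in for
small_vector<lock_t*,4>):

  #include <vector>

  struct lock_t;   // opaque in this sketch

  // Instead of erasing from the middle (which shifts elements), clear the
  // slot, as lock_table_remove_autoinc_lock() does for out-of-order release.
  inline void release_autoinc_lock(std::vector<lock_t*> &autoinc_locks,
                                   const lock_t *lock)
  {
    for (auto &slot : autoinc_locks)
      if (slot == lock)
      {
        slot = nullptr;   // no erase(): other locks keep their positions
        return;
      }
  }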
Reviewed by: Debarun Banerjee
Quick read record uses a different handler (H1) for finding records. It
cannot use the ha_delete_row() handler (H2), as that is in a different
search mode: inited == INDEX for H1, inited == RND for H2. So the read
handler H1 uses the index while the write handler H2 uses random access.
For going to the next record in H1 there is the info->last_pos optimization
for stepping through the index via tree_search_next(). This optimization
can work with deleted rows only if the delete is conducted in the same
handler; there is:
int hp_rb_delete_key(HP_INFO *info, register HP_KEYDEF *keyinfo,
                     const uchar *record, uchar *recpos, int flag)
{
...
  if (flag)
    info->last_pos= NULL; /* For heap_rnext/heap_rprev */
But this cannot work for a different handler. So after a delete in H2,
last_pos in H1 refers to a stale info->parents array, and last_pos points
into those parents. In the specific test case, last_pos' parent is an
already freed node, and tree_search_next() steps into it.
The fix invalidates the locally saved info->parents and info->last_pos
based on key_version. Record deletion increments share->key_version in
H2, so in H1 we know the tree might have changed.
Another good measure would be to use H1 for the delete as well, but that
would be a bigger refactoring than just fixing the bug.
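A sketch of the key_version check (field names follow the description
above; the actual heap code differs in detail):

  struct heap_share_t { unsigned key_version; }; // incremented on delete (H2)

  struct heap_info_t {
    heap_share_t *share;
    unsigned last_key_version;  // snapshot taken when last_pos was saved
    void *last_pos;             // cached tree position; may become stale

    void *valid_last_pos()
    {
      if (last_key_version != share->key_version)
      {
        last_pos = nullptr;                    // tree may have changed
        last_key_version = share->key_version;
      }
      return last_pos;
    }
  };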
* Innobase `os0file.cc`: use `PRIu64` over `llu`
* These came after I prepared #3485.
* MyISAM `mi_check.c`: in impossible block length warning
* I missed this one in #3485 (and #3360 too?).
InnoDB transactions may be reused after being committed:
- when taken from the transaction pool
- during a DDL operation execution
In this case the wsrep flag on the trx object is cleared, which may cause
wrong execution logic afterwards (wsrep-related hooks are not run).
Initialize the trx->wsrep flag from the THD object only once, at InnoDB
transaction start, and don't change it throughout the transaction's
lifetime. The flag is reset at commit time as before.
Unconditionally set wsrep=OFF for THD objects that represent InnoDB
background threads.
Make Wsrep_schema::store_view() operate in its own transaction.
Fix the rollback of streaming replication transactions' fragments so that
it does not switch the THD->wsrep value during the transaction's execution
(use THD->wsrep_ignore_table as a workaround).
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
recv_sys_t::parse(): Correctly handle the storing==BACKUP case,
and simplify some logic around storing==YES as well.
The added test mariabackup.undo_truncate is based on an idea of
Thirunarayanan Balathandayuthapani. It nondeterministically (not on
every run) covers this logic, including the function backup_undo_trunc(),
for both innodb_encrypt_log=ON and innodb_encrypt_log=OFF.
Reviewed by: Debarun Banerjee
The problems were that:
1) resources were freed asymmetrically: in send_eof during normal
execution, but in the destructor in case of error.
2) the destructor was not called for result objects in case of an SP,
so if the last SP execution ended with an error, resources were not
freed on reinit before the next execution (cleanup() is called before
the next execution), and the destructor was also not called due to the
lack of a delete call for the object.
Result cleanup() is renamed to reset_for_next_ps_execution() to better
reflect its function.
All result methods were revised and resource freeing made symmetric.
The destructor of the result object is called for SPs.
Added previously skipped invalidation in case of error in INSERT.
Removed the misleading naming of reset(thd) (it could be confused with
reset()).
log_t::resize_start(): If the ib_logfile101 cannot be created,
be sure to reset log_sys.resize_lsn.
log_t::resize_abort(): In case SET GLOBAL innodb_log_file_size is
aborted, delete the ib_logfile101.
Let us enable pmem_persist() on RISC-V and LoongArch, because those are
available in the Debian CI.
In commit 3f9f5ca48e these were initially
disabled by default.
According to the available documentation, these instructions are
available in all ISA versions. On LoongArch there would also be
__builtin_loongarch_dbar() that generates the same code.
innodb_log_file_mmap: Use a constant documentation string that
refers to persistent memory also when it is not available in the build.
HAVE_INNODB_MMAP: Remove, and unconditionally enable this code.
log_mmap(): On 32-bit systems, ensure that the size fits in 32 bits.
log_t::resize_start(), log_t::resize_abort(): Only handle memory-mapping
if HAVE_PMEM is defined. The generic memory-mapped interface is only for
reading the log in recovery. Writable memory mappings are only for
persistent memory, that is, Linux file systems with mount -o dax.
Reviewed by: Debarun Banerjee, Otto Kekäläinen
The mismatch occurs in the function calls, as in sql/sql_udf.h the
types of "error" and "is_null" are unsigned char rather than char.
This is corrected for the following UDF functions:
* spider_direct_sql
* spider_direct_bg_sql
* spider_flush_table_mon_cache
* spider_copy_tables
* spider_ping_table
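For illustration, a corrected row-function signature following the
sql/sql_udf.h parameter types (a simplified sketch; UDF_INIT and UDF_ARGS
come from the server headers):

  typedef long long longlong;
  struct UDF_INIT;
  struct UDF_ARGS;

  // before: longlong spider_direct_sql(UDF_INIT*, UDF_ARGS*,
  //                                    char *is_null, char *error);
  // after, matching the unsigned char parameters in sql/sql_udf.h:
  longlong spider_direct_sql(UDF_INIT *initid, UDF_ARGS *args,
                             unsigned char *is_null, unsigned char *error);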
Reviewer: Yuchen Pei
-DCMAKE_BUILD_TYPE=xxx sets some C compiler flags according to the build type.
-DBUILD_CONFIG was completely overwriting them in some compiler / arch
combinations and not in others. Make it consistently "append-only", never
overwriting.
Also, enforce the same set of flags for Release and RelWithDebInfo.
This reverts ff1f611a0d as it is no longer
necessary.
Avoid assert()
By default, CMAKE_BUILD_TYPE RelWithDebInfo or Release implies
-DNDEBUG, which disables the assert() macro. MariaDB is deviating
from that. Let us be explicit to use assert() only in debug builds.
log_t::persist(): Remove the parameter holding_latch, and assert
latch_holding_any(). We used to avoid acquiring a latch when log
resizing was not in progress. That allowed a race condition to occur
where log_t::write_checkpoint() has just completed log resizing.
In that case, we could wrongly invoke pmem_persist() on the old
log_sys.buf instead of the new one, which was shortly before known
as log_sys.resize_buf.
log_write_persist(): A non-inline wrapper function that will
invoke log_sys.persist() while holding a shared log_sys.latch.
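A sketch of the wrapper's shape (std::shared_mutex stands in for
log_sys.latch; in the real code, persist() asserts latch_holding_any()):

  #include <shared_mutex>

  struct log_sketch_t {
    std::shared_mutex latch;        // stand-in for log_sys.latch
    void persist() { /* pmem_persist() on the current log buffer */ }
  };

  log_sketch_t log_sys_sketch;

  // Always holds the shared latch, so persist() cannot race with
  // write_checkpoint() swapping in log_sys.resize_buf.
  void log_write_persist_sketch()
  {
    std::shared_lock<std::shared_mutex> guard(log_sys_sketch.latch);
    log_sys_sketch.persist();
  }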
By default, CMAKE_BUILD_TYPE RelWithDebInfo or Release implies -DNDEBUG,
which disables the assert() macro. MariaDB is deviating from that.
Let us be explicit to use assert() only in debug builds.
This fixes up commit 1b8358d943.
The macros ut_ad() and DBUG_ASSERT() can evaluate their argument twice.
That is wrong for any read-modify-write arguments.
Thanks to Nikita Malyavin for pointing this out.
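An illustration of the hazard (TWICE_ASSERT is a made-up macro that, like
the culprits, mentions its argument twice):

  #include <cassert>
  #include <cstdio>

  #define TWICE_ASSERT(c) \
    do { if (!(c)) std::fputs("failed: " #c "\n", stderr); assert(c); } \
    while (0)

  int main()
  {
    int credits = 1;
    TWICE_ASSERT(credits-- > -5);  // intended a single decrement...
    // prints -1, not 0: the read-modify-write argument ran twice
    // (and only once in an NDEBUG build, where assert() vanishes)
    std::printf("credits = %d\n", credits);
    return 0;
  }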
recv_sys_t::parse(): When parsing an OPTION record, invoke
l.copy_if_needed() before checking if the payload is OPT_PAGE_CHECKSUM
followed by a 32-bit page checksum.
This fixes up the merge 57d4a242da of
commit 4179f93d28 (MDEV-18976).
The impact of this can be observed by running a debug instrumented
build on the test encryption.recovery_memory. There should be over
5,000 invocations of log_phys_t::page_checksum(). Without this fix,
there would be fewer than 100 of them (when the OPT_PAGE_CHECKSUM
byte happens to encrypt to itself).
Reviewed by: Debarun Banerjee
Tested by: Matthias Leich
This fixes another regression that had been introduced in
commit b249a059da (MDEV-34850).
This should prevent failures of mariadb-backup --backup of
the following type:
mariabackup: Failed to read undo log tablespace space id …
and there is no undo tablespace truncation redo record.
This error has not been hit by our internal testing, and we
currently have no regression test to cover this.