row_vers_impl_x_locked_low(): If a secondary index record points to
a clustered index record that carries the current transaction identifier,
then there cannot possibly be any implicit locks on that secondary index
record, because those would have been checked before the current
transaction got the implicit lock (modified the clustered index record)
in the first place.
This fix avoids unnecessary access to the undo log and possibly to
BLOB pages, which may already have been freed in a purge operation.
buf_page_get_zip(): Assert that the page is not marked as freed
in the tablespace. This assertion could fire in a scenario like the
test case when the table is created in ROW_FORMAT=COMPRESSED.
Add include/have_lowercase{0,1}.inc where test cases require them.
Reason: the lower_case_table_names setting in the corresponding .opt
files is insufficient if it is overridden with a --mysqld option in MTR.
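Such an include file is typically just a guard that skips the test unless the server is running with the expected setting. A minimal sketch of what have_lowercase1.inc could contain, following the usual MTR include-file idiom (the actual file contents may differ):

```
if (`SELECT @@lower_case_table_names != 1`) {
  --skip Test requires lower_case_table_names=1
}
```

A test would then pull it in with `--source include/have_lowercase1.inc` before its first statement.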
When there is a column length mismatch in the InnoDB
statistics tables (innodb_table_stats or innodb_index_stats),
every subsequent access to the statistics tables raises an error
message and falls back to transient statistics.
This change makes it easier for users to understand and
resolve the issue when the statistics tables have been
modified or corrupted.
A comment in the test says:
# do not clean up - we do not know which of the three has been released
# so the --reap command may hang because the command that is being executed
# in that connection is still running/waiting
Let us remove the thread-local variable mariadb_stats and introduce
trx_t::pages_accessed, trx_t::active_handler_stats for more
efficiently maintaining some statistics inside InnoDB.
buf_pool.stat.n_page_gets: Reimplemented as Atomic_counter<ulint>.
This will no longer track some accesses in the background where
!current_thd() || !thd_to_trx(current_thd).
trx_t::free(), trx_t::commit_cleanup(): Apply pages_accessed
to buf_pool.stat.n_page_gets.
buf_read_ahead_report(): Report a completed read-ahead batch.
ha_innobase::estimate_rows_upper_bound(): Do not bother updating
trx_t::op_info around some quick arithmetic.
ha_innobase::records_in_range(): Do invoke mariadb_set_stats.
This will change some ANALYZE FORMAT=JSON SELECT results of the test
main.rowid_filter_innodb.
Reviewed by: Vladislav Lesin
Tested by: Saahil Alam
Problem:
=======
- By default, InnoDB statistics for a table are recalculated
every 10 seconds in the background thread dict_stats_thread().
- Multiple ALTER TABLE..ALGORITHM=COPY operations cause
dict_stats_thread() to lag behind, so the calculation of statistics
for the newly created intermediate table is delayed.
Fix:
====
- The statistics calculation for the newly created intermediate table
is made independent of the background thread. After the copy is
complete, the statistics for the new table are calculated as part of
ALTER TABLE ... ALGORITHM=COPY.
dict_stats_rename_table(): Rename the table statistics from the
intermediate table to the new table.
alter_stats_rebuild(): Remove the table name from the warning,
because the warning can be issued for the intermediate table as well.
Alter table using copy algorithm now calls alter_stats_rebuild()
under a shared MDL lock on a temporary #sql-alter- table,
differing from its previous use only during ALGORITHM=INPLACE
operations on user-visible tables.
dict_stats_schema_check(): Added a separate check for table
readability before checking for tablespace existence.
This allows the existence of the persistent statistics storage
to be detected earlier, with a fallback to transient statistics.
This is a cherry-pick fix of mysql commit@cfe5f287ae99d004e8532a30003a7e8e77d379e3
Instead of using DBUG_EXECUTE_IF fault injection, let us construct
a minimal corrupted log file that will produce an OPT_PAGE_CHECKSUM
mismatch without depending on CMAKE_BUILD_TYPE=Debug.
Problem:
=======
When InnoDB encounters a corrupted page during crash recovery,
the server would abort due to improper handling of page locks
and space references. The recovery process was not properly
cleaning up resources when corruption was detected,
leading to an inconsistent state and server termination.
Solution:
=========
recover_low(): Move page lock recursive acquisition
after deferred/non-deferred page creation logic to
ensure consistent locking behavior for both code paths.
Ensure proper block recursive unlock for non-deferred tablespaces
recv_recover_page(): Simplify corrupted page cleanup by
removing redundant space reference handling.
The function row_purge_reset_trx_id() that had been introduced in
commit 3c09f148f3 (MDEV-12288)
introduces some extra buffer pool and redo log activity that will
cause a significant performance regression under some workloads.
This is currently the most significant performance issue, after
commit acd071f599 (MDEV-21923)
fixed the InnoDB LSN allocation and MDEV-19749 the MDL bottleneck in 12.1.
The purpose of row_purge_reset_trx_id() was to ensure that we can
easily identify records for which no history exists. If DB_TRX_ID
is 0, we could avoid looking up the transaction to see if the
history is accessible or the record is implicitly locked.
To avoid trx_sys_t::find() for stale DB_TRX_ID values, we can refer
to trx_t::max_inactive_id, which was introduced in
commit 4105017a58 (MDEV-30357).
Instead of comparing DB_TRX_ID to 0, we may compare it to this
cached value. The cache would be updated by
trx_sys_t::find_same_or_older(), which is invoked for some operations
on secondary indexes.
row_purge_reset_trx_id(): Remove. We will no longer reset the
DB_TRX_ID to 0 after an INSERT. We will retain a single undo log
for all operations, though. Before MDEV-12288, there had been
separate insert_undo and update_undo logs.
row_check_index(): No longer warn
"InnoDB: Clustered index record with stale history in table".
lock_rec_queue_validate(), lock_rec_convert_impl_to_expl(),
row_vers_impl_x_locked_low(): Instead of comparing the DB_TRX_ID
to 0, compare it to trx_t::max_inactive_id.
In dict0load.cc we will not spend any effort to avoid extra
trx_sys.find() calls for stale DB_TRX_ID in dictionary tables.
This code does not currently use trx_t objects, and therefore
we cannot easily access trx_t::max_inactive_id. Loading table
definitions into the InnoDB data dictionary cache (dict_sys)
should be a very rare operation.
Reviewed by: Vladislav Lesin
Problem:
=======
- During the copy algorithm, InnoDB fails to detect the duplicate
key error for a unique hash key blob index. A unique HASH index
is treated as a virtual index inside InnoDB.
When a table has a unique hash key, the server searches the
hash key before doing any insert operation and finds the
duplicate value in check_duplicate_long_entry_key().
Bulk insert applies all inserts together once the copy of the
intermediate table is finished. This leaves the duplicate key
error undetected while building the index.
Solution:
========
- Avoid the bulk insert operation when the table has a unique
hash key blob index.
dict_table_t::can_bulk_insert(): Check whether the table
is eligible for the bulk insert operation during the ALTER TABLE
copy algorithm. A virtual column name starting with DB_ROW_HASH_
indicates that a blob column has a unique index on it.
Problem:
=======
- InnoDB modifies the PAGE_ROOT_AUTO_INC value on the clustered index
root page. But before committing the mini-transaction that changes
PAGE_ROOT_AUTO_INC, InnoDB performs the bulk insert operation,
calculates the page checksum and stores it as part of the redo log
in the mini-transaction. During recovery, InnoDB fails to validate
the page checksum.
Solution:
========
- Avoid writing the persistent auto-increment value before the
bulk insert operation.
- For the bulk insert operation, the persistent auto-increment value
is written via btr_write_autoinc() while applying the buffered
insert operations.
buf_pool_t::shrink(): If we run out of pages to evict from buf_pool.LRU,
abort the operation. Also, do not leak the spare block that we may have
allocated.
buf_pool_t::shrink(): When relocating a dirty page of the temporary
tablespace, reset the oldest_modification() on the discarded block,
like we do for persistent pages in buf_flush_relocate_on_flush_list().
buf_pool_t::resize(): Add debug assertions to catch this error earlier.
This bug does not seem to affect non-debug builds.
Reviewed by: Thirunarayanan Balathandayuthapani
* automatically disable ps2 and cursor protocol when the
select statement returns no result set
* remove manual {disable|enable}_{ps2|cursor}_protocol from around
`select ... into` in tests
* other misc collateral test cleanups
Cherry-pick from 11.8
ha_innobase::store_lock(): Set also trx->will_lock when starting
a transaction at SERIALIZABLE isolation level. This fixes up
commit 7fbbbc983f (MDEV-36330).
At TRANSACTION ISOLATION LEVEL SERIALIZABLE, InnoDB would fail to flag
a write/read conflict, which would be a violation already at the more
relaxed REPEATABLE READ level when innodb_snapshot_isolation=ON.
Fix: Create a read view and start the transaction at the same time.
Thus, lock checks will be able to consult the correct read view
to flag ER_CHECKREAD if we are about to lock a record that was committed
after the start of our transaction.
innobase_start_trx_and_assign_read_view(): At any other isolation level
than READ UNCOMMITTED, do create a read view. This is needed for the
correct operation of START TRANSACTION WITH CONSISTENT SNAPSHOT.
ha_innobase::store_lock(): At SERIALIZABLE isolation level, if the
transaction was not started yet, start it and open a read view.
An alternative way to achieve this would be to make trans_begin()
treat START TRANSACTION (or BEGIN) in the same way as
START TRANSACTION WITH CONSISTENT SNAPSHOT when the isolation level
is SERIALIZABLE.
innodb_isolation_level(const THD*): A simpler version of
innobase_map_isolation_level(). Compared to earlier, we will return
READ UNCOMMITTED also if the :newraw option is set for the
InnoDB system tablespace.
Reviewed by: Vladislav Lesin
page_delete_rec_list_end(): Do not attempt to scrub the data of
an empty record.
The test case would reproduce a debug assertion failure in branches
where commit 358921ce32 (MDEV-26938)
is present. MariaDB Server 10.6 only supports ascending indexes,
and in those, the empty string would always be sorted first, never
last in a page.
Nevertheless, we fix the bug also in 10.6, in case it would be
reproducible in a slightly different scenario.
Reviewed by: Thirunarayanan Balathandayuthapani
While updating the persistent defragmentation statistics
for a table, InnoDB opens the table only if it is in the cache.
If dict_table_open_on_id() fails to find the table in the cache,
it fails to unfreeze dict_sys.latch. This led to a crash.
Problem:
=======
- Two failures occur in this test case:
(1) SET GLOBAL innodb_buf_flush_list_now=1 does not ensure that pages
are being flushed.
(2) The InnoDB page cleaner thread aborts while writing the checkpoint
information.
The problem is that when InnoDB startup aborts, InnoDB changes the
shutdown state to SRV_SHUTDOWN_EXIT_THREADS. In this shutdown state,
InnoDB does not advance log_sys.lsn (avoiding fil_names_clear()).
After InnoDB shutdown (innodb_shutdown()) is initiated, the shutdown
state is changed again, to SRV_SHUTDOWN_INITIATED. This leads the
page cleaner thread to fail with the assertion
ut_ad(srv_shutdown_state > SRV_SHUTDOWN_INITIATED)
in log_write_checkpoint_info().
Solution:
=========
(1) To avoid failure (1), set the variables
innodb_max_dirty_pages_pct_lwm and innodb_max_dirty_pages_pct to 0.
Also make sure that InnoDB does not have any dirty pages in the
buffer pool by adding a wait_condition.
(2) Avoid changing srv_shutdown_state to SRV_SHUTDOWN_EXIT_THREADS
when InnoDB startup aborts.
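In mysqltest terms, part (1) could look roughly like this, a sketch following the common wait_condition.inc idiom; the actual test may differ:

```
SET GLOBAL innodb_max_dirty_pages_pct_lwm = 0;
SET GLOBAL innodb_max_dirty_pages_pct = 0;
let $wait_condition =
  SELECT variable_value = 0
  FROM information_schema.global_status
  WHERE variable_name = 'INNODB_BUFFER_POOL_PAGES_DIRTY';
--source include/wait_condition.inc
```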
Issue: When an XA transaction is implicitly rolled back, we keep the
XA state XA_ACTIVE and set rm_error to ER_LOCK_DEADLOCK. Except for
XA commands we do not check rm_error, so DML statements and queries
are executed in a new transaction.
Fix: One way to fix this issue is to set the XA state to
XA_ROLLBACK_ONLY, which is checked while opening tables in
open_tables(), so that ER_XAER_RMFAIL is returned for any DML
statement or query.
A deadlock forces the ongoing transaction to roll back implicitly.
Within a transaction block started with START TRANSACTION / BEGIN,
the implicit rollback does not reset the OPTION_BEGIN flag. As a
result, a new implicit transaction starts when the next statement
is executed. This behaviour is unexpected and should be fixed;
note, however, that there is no issue with the rollback itself.
We fix the issue by making the behaviour of the implicit rollback
(on deadlock) similar to explicit COMMIT and ROLLBACK, i.e. the next
statement after the deadlock error does not start a transaction block
implicitly unless autocommit is set to zero.
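An illustration of the intended behaviour; the table, statements, and the point at which the deadlock strikes are hypothetical:

```sql
SET autocommit = 1;
BEGIN;
UPDATE t SET a = 1 WHERE id = 1;  -- suppose this fails with
                                  -- ER_LOCK_DEADLOCK; InnoDB rolls the
                                  -- transaction back implicitly
UPDATE t SET a = 2 WHERE id = 2;  -- before the fix: still starts inside
                                  -- a transaction block (OPTION_BEGIN)
                                  -- after the fix: runs in autocommit
                                  -- mode, as after COMMIT or ROLLBACK
```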
innodb.doublewrite: Skip the test case if we get an unexpected
checkpoint. This could happen because page cleaner thread
could be active after reading the initial checkpoint information.
buf_pool_t::resize(): After successfully shrinking the buffer pool,
announce the success. The size had already been updated in shrunk().
After failing to shrink the buffer pool, re-enable the adaptive
hash index if it had been enabled.
Reviewed by: Debarun Banerjee
Problem:
=======
- In 10.11, during the copy algorithm, InnoDB uses bulk insert
for the row-by-row insert operations. When the temporary directory
runs out of space, row_mysql_handle_errors() fails to handle
DB_TEMP_FILE_WRITE_FAIL.
- During the inplace algorithm, concurrent DML fails to write
the log operation into the temporary file, and InnoDB fails to
mark the error for the online log.
- ddl_log_write() releases the global ddl lock prematurely, before
releasing the log memory entry.
Fix:
===
row_mysql_handle_errors(): Roll back the transaction when
InnoDB encounters DB_TEMP_FILE_WRITE_FAIL.
convert_error_code_to_mysql(): Report an aborted transaction
when InnoDB encounters DB_TEMP_FILE_WRITE_FAIL during
ALTER TABLE ... ALGORITHM=COPY or an InnoDB bulk insert operation.
row_log_online_op(): Mark the error in the online log when
InnoDB runs out of temporary space.
fil_space_extend_must_retry(): Set os_has_said_disk_full
to true if os_file_set_size() fails.
btr_cur_pessimistic_update(): Return the error code when
btr_cur_pessimistic_insert() fails.
ddl_log_write(): On error, release the global ddl lock after
releasing the log memory entry.
btr_cur_optimistic_update(): Relax the assertion so that the
BLOB pointer can be null during rollback, because InnoDB can
run out of space while allocating the external page.
ha_innobase::extra(): Roll back the transaction during DDL before
calling convert_error_code_to_mysql().
row_undo_mod_upd_exist_sec(): Remove the assertion which says
that InnoDB should fail to build the index entry when rolling back
an incomplete transaction after crash recovery. This scenario
can happen when InnoDB has run out of space.
row_upd_changes_ord_field_binary_func(): Relax the assertion so
that an externally stored field can be null when InnoDB has run
out of space.
Problem:
=======
- During the inplace algorithm, concurrent DML fails to write
the log operation into the temporary file, and InnoDB fails to
mark the error for the online log.
- ddl_log_write() releases the global ddl lock prematurely, before
releasing the log memory entry.
Fix:
===
row_log_online_op(): Mark the error in the online log when
InnoDB runs out of temporary space.
fil_space_extend_must_retry(): Set os_has_said_disk_full
to true if os_file_set_size() fails.
btr_cur_pessimistic_update(): Return the error code when
btr_cur_pessimistic_insert() fails.
ddl_log_write(): On error, release the global ddl lock after
releasing the log memory entry.
btr_cur_optimistic_update(): Relax the assertion so that the
BLOB pointer can be null during rollback, because InnoDB can
run out of space while allocating the external page.
row_undo_mod_upd_exist_sec(): Remove the assertion which says
that InnoDB should fail to build the index entry when rolling back
an incomplete transaction after crash recovery. This scenario
can happen when InnoDB has run out of space.
row_upd_changes_ord_field_binary_func(): Relax the assertion so
that an externally stored field can be null when InnoDB has run
out of space.
page_is_corrupted(): Do not allocate the buffers from the stack,
but from the heap, in xb_fil_cur_open().
row_quiesce_write_cfg(): Issue one type of message when we
fail to create the .cfg file.
update_statistics_for_table(), read_statistics_for_table(),
delete_statistics_for_table(), rename_table_in_stat_tables():
Use a common stack buffer for Index_stat, Column_stat, Table_stat.
ha_connect::FileExists(): Invoke push_warning_printf() so that
we can avoid allocating a buffer for snprintf().
translog_init_with_table(): Do not duplicate TRANSLOG_PAGE_SIZE_BUFF.
Let us also globally enable the GCC 4.4 and clang 3.0 option
-Wframe-larger-than=16384 to reduce the possibility of introducing
such stack overflow in the future. For RocksDB and Mroonga we relax
these limits.
Reviewed by: Vladislav Lesin
buf_buddy_shrink(): Properly cover the case when KEY_BLOCK_SIZE
corresponds to the innodb_page_size, that is, the ROW_FORMAT=COMPRESSED
page frame is directly allocated from the buffer pool, not via the
binary buddy allocator.
buf_LRU_check_size_of_non_data_objects(): Avoid a crash when the
buffer pool is being shrunk.
buf_pool_t::shrink(): Abort if over 95% of the shrunk buffer pool
would be occupied by the adaptive hash index or record locks.