In commit ab38b7511b
an added "goto err" would seemingly cause a read of
an uninitialized variable old_info if errpos>=5.
However, because we would have errpos=0 at that point,
there was no real error.
buf_flush_freed_pages(): Assert that neither buf_pool.mutex
nor buf_pool.flush_list_mutex are held. Simplify the loops.
Return the tablespace and the number of pages written or punched.
buf_flush_LRU_list_batch(), buf_do_flush_list_batch():
Release buf_pool.mutex before invoking buf_flush_space().
buf_flush_list_space(): Acquire the mutexes only after invoking
buf_flush_freed_pages().
Reviewed by: Thirunarayanan Balathandayuthapani
fil_space_t::try_to_close(): Tolerate a tablespace that has no
data files attached. The function fil_ibd_create() initially
creates and attaches a tablespace with no files, and invokes
fil_space_t::add() later.
fil_node_open_file(): After releasing and reacquiring fil_system.mutex,
check if the file was already opened by another thread. This avoids
an assertion failure !node->is_open() in fil_node_open_file_low().
These failures were reproduced with the test
innodb.table_definition_cache_debug and the fix of MDEV-27985.
- InnoDB fails to skip newly created column while checking for
change column when table is in redundant row format. This issue
is caused the MDEV-18035 (ccb1acbd3c)
For some reason, the tests of the MemorySanitizer build on 10.5 failed
with both clang 13 and clang 14 with SIGSEGV. On 10.6 where it worked
better, some more places to work around were identified.
The MemorySanitizer implementation in clang includes some built-in
instrumentation (interceptors) for GNU libc. In GNU libc 2.33, the
interface to the stat() family of functions was changed. Until the
MemorySanitizer interceptors are adjusted, any MSAN code builds
will act as if that the stat() family of functions failed to initialize
the struct stat.
A fix was applied in
https://reviews.llvm.org/rG4e1a6c07052b466a2a1cd0c3ff150e4e89a6d87a
but it fails to cover the 64-bit variants of the calls.
For now, let us work around the MemorySanitizer bug by defining
and using the macro MSAN_STAT_WORKAROUND().
As btrfs showed, a partial read of data in AIO /O_DIRECT circumstances can
really confuse MariaDB.
Filipe Manana (SuSE)[1] showed how database programmers can assume
O_DIRECT is all or nothing.
While a fix was done in the kernel side, we can do better in our code by
requesting that the rest of the block be read/written synchronously if
we do only get a partial read/write.
Per the APIs, a partial read/write can occur before an error, so
reattempting the request will leave the caller with a concrete error to
handle.
[1] https://lore.kernel.org/linux-btrfs/CABVffENfbsC6HjGbskRZGR2NvxbnQi17gAuW65eOM+QRzsr8Bg@mail.gmail.com/T/#mb2738e675e48e0e0778a2e8d1537dec5ec0d3d3a
Also spell synchronously correctly in other files.
The first step for deprecating innodb_autoinc_lock_mode(see MDEV-27844) is:
- to switch statement binlog format to ROW if binlog format is MIXED and
the statement changes autoincremented fields
- issue warnings if innodb_autoinc_lock_mode == 2 and binlog format is
STATEMENT
MDEV-27025 allows to insert records before the record on which DELETE is
locked, as a result the DELETE misses those records, what causes serious ACID
violation.
Revert MDEV-27025, MDEV-27550. The test which shows the scenario of ACID
violation is added.
The virtual member function handler::reset_auto_increment(ulonglong)
is only ever invoked by the default implementation of the virtual
member function handler::truncate().
Because ha_innobase::truncate() overrides handler::truncate() without
ever invoking handler::truncate(), some InnoDB member functions are
never called.
ha_innobase::innobase_reset_autoinc(), ha_innobase::reset_auto_increment():
Removed (unreachable code).
ha_innobase::delete_all_rows(): Removed. The default implementation
handler::delete_all_rows() works just as fine.
- InnoDB FTS DDL decrements the FTS_DOC_ID when there is a
deleted marked record involved. FTS_DOC_ID must never be
reused. The purpose of FTS_DOC_ID is to be a unique row
identifier that will be changed whenever a fulltext indexed
column is updated.
btr_cur_optimistic_insert(): Disregard DEBUG_DBUG injection to
invoke btr_page_reorganize() if the page (and the table) is empty.
Otherwise, an assertion would fail in btr_page_reorganize_low()
because PAGE_MAX_TRX_ID is 0 in an empty secondary index leaf page.
The test innodb.log_file_size would occasionally crash in
a debug assertion failure in srv_prepare_to_delete_redo_log_file().
In the core dump, the assertion would hold.
buf_pool_t::any_io_pending(): Avoid dirty reads by acquiring
buf_pool.mutex. This function is only being invoked in debug builds,
so we do not care about any performance impact.
In commit c7d0448797 (MDEV-15132)
MariaDB Server 10.3 stopped writing the latest transaction identifier
to the TRX_SYS page. Instead, the transaction identifier will be
recovered from undo log pages.
Unfortunately, before commit 3926673ce7
and mysql/mysql-server@dc29792ff2
(MySQL 5.1.48 or MariaDB 5.1.48) InnoDB did not always initialize all
data fields, but some garbage could be left behind in unused parts
of data pages.
In undo log pages that are essentially free, but added to a list for
reuse (TRX_UNDO_CACHED) the TRX_UNDO_TRX_NO fields could contain garbage,
instead of 0. As long as such undo pages are being reused and never
marked completely free, the garbage contents may remain forever.
In fact, the function trx_undo_header_create() and the record
MLOG_UNDO_HDR_CREATE will only initialize TRX_UNDO_TRX_ID, but leave
TRX_UNDO_TRX_NO uninitialized.
trx_undo_mem_create_at_db_start(): Only read the TRX_UNDO_TRX_NO
fields of TRX_UNDO_CACHED pages if the TRX_UNDO_PAGE_TYPE is 0,
that is, the page was updated by MariaDB Server 10.3. Earlier versions
would always write the TRX_UNDO_PAGE_TYPE as 1 or 2.
trx_undo_header_create(): Zero out the TRX_UNDO_TRX_NO field.
Strictly speaking, this will change the semantics of the
MLOG_UNDO_HDR_CREATE record, but it should not do any harm to
overwrite a potentially garbage field with zeroes.
Note: This fix will only help future upgrades straight from
MariaDB Server 10.2 or MySQL 5.6 or earlier. If such an upgrade has
already been made, then an earlier server startup could have
fast-forwarded the transaction ID sequence to a large value.
If this large value cannot be represented in 48 bits (the size of
the DB_TRX_ID column in clustered index records), then various
strange things can happen.
- Make innodb_ft_cache_size & innodb_ft_total_cache_size are dynamic
variable and increase the maximum value of innodb_ft_cache_size to
512MB for 32-bit system and 1 TB for 64-bit system and set
innodb_ft_total_cache_size maximum value to 1 TB for 64-bit system.
- Print warning if the fts cache exceeds the innodb_ft_cache_size
and also unlock the cache if fts cache memory reduces less than
innodb_ft_cache_size.
When recovery is rolling back an incomplete instant DROP COLUMN
operation, it may access non-existing fields. Let us avoid
invoking std::find_if() outside the valid bounds of the array.
This bug was reproduced with the Random Query Generator, using a
combination of instant DROP, ADD...FIRST, CHANGE (renaming a column).
Unfortunately, we were unable to create an mtr test case for
reproducing this, despite spending considerable effort on it.
Backported from 10.5 20e9e804c1 and
5948d7602e.
sel_restore_position_for_mysql() moves forward persistent cursor
position after btr_pcur_restore_position() call if cursor relative position
is BTR_PCUR_ON and the cursor points to the record with NOT the same field
values as in a stored record(and some other not important for this case
conditions).
It was done because btr_pcur_restore_position() sets
page_cur_mode_t mode to PAGE_CUR_LE for cursor->rel_pos == BTR_PCUR_ON
before opening cursor. So we are searching for the record less or equal
to stored one. And if the found record is not equal to stored one, then
it is less and we need to move cursor forward.
But there can be a situation when the stored record was purged, but the
new one with the same key but different value was inserted while
row_search_mvcc() was suspended. In this case, when the thread is
awaken, it will invoke sel_restore_position_for_mysql(), which, in turns,
invoke btr_pcur_restore_position(), which will return false because found
record don't match stored record, and
sel_restore_position_for_mysql() will move forward cursor position.
The above can lead to the case when awaken row_search_mvcc() do not see
records inserted by other transactions while it slept. The mtr test case
shows the example how it can be.
The fix is to return special value from persistent cursor restoring
function which would notify its caller that uniq fields of restored
record and stored record are the same, and in this case
sel_restore_position_for_mysql() don't move cursor forward.
Delete-marked records are correctly processed in row_search_mvcc().
Non-unique secondary indexes are "uniquified" by adding the PK, the
index->n_uniq should then be index->n_fields. So there is no need in
additional checks in the fix.
If transaction's readview can't see the changes made in secondary index
record, it requests clustered index record in row_search_mvcc() to check
its transaction id and get the correspondent record version. After this
row_search_mvcc() commits mtr to preserve clustered index latching
order, and starts mtr. Between those mtr commit and start secondary
index pages are unlatched, and purge has the ability to remove stored in
the cursor record, what causes rows duplication in result set for
non-locking reads, as cursor position is restored to the previously
visited record.
To solve this the changes are just switched off for non-locking reads,
it's quite simple solution, besides the changes don't make sense for
non-locking reads.
The more complex and effective from performance perspective solution is
to create mtr savepoint before clustered record requesting and rolling
back to that savepoint after that. See MDEV-27557.
One more solution is to have per-record transaction id for secondary
indexes. See MDEV-17598.
If any of those is implemented, just remove select_lock_type argument in
sel_restore_position_for_mysql().
The result is not used anywhere but in the output of Innodb information
schema, but this can take as much as 7%CPU (only) on a benchmark.
Fix to move fs blocksize calculate to where it is used.
The sys_var class has the deprecation_substitute member to mark the
deprecated variables. As it's set, the server produces warnings when
these variables are used. However, the plugin has no means to utilize
that functionality.
So, the PLUGIN_VAR_DEPRECATED flag is introduced to set the
deprecation_substitute with the empty string. A non-empty string can
make the warning more informative, but there's no nice way seen to
specify it, and not that needed at the moment.