Problem:
=======
- During alter rebuild, document read from old table is tokenzied
parallelly by innodb_ft_sort_pll_degree threads and stores it
in respective merge files. While doing the parallel merge, InnoDB
wrongly skips the root level selection of merging buffer records.
So it leads to insertion of merge records in non-ascending order.
Solution:
==========
Build selection tree for the root level also. So that root of
selection tree can always contain sorted buffer.
The -Wconversion in GCC seems to be stricter than in clang.
GCC at least since version 4.4.7 issues truncation warnings for
assignments to bitfields, while clang 10 appears to only issue
warnings when the sizes in bytes rounded to the nearest integer
powers of 2 are different.
Before GCC 10.0.0, -Wconversion required more casts and would not
allow some operations, such as x<<=1 or x+=1 on a data type that
is narrower than int.
GCC 5 (but not GCC 4, GCC 6, or any later version) is complaining
about x|=y even when x and y are compatible types that are narrower
than int. Hence, we must rewrite some x|=y as
x=static_cast<byte>(x|y) or similar, or we must disable -Wconversion.
In GCC 6 and later, the warning for assigning wider to bitfields
that are narrower than 8, 16, or 32 bits can be suppressed by
applying a bitwise & with the exact bitmask of the bitfield.
For older GCC, we must disable -Wconversion for GCC 4 or 5 in such
cases.
The bitwise negation operator appears to promote short integers
to a wider type, and hence we must add explicit truncation casts
around them. Microsoft Visual C does not allow a static_cast to
truncate a constant, such as static_cast<byte>(1) truncating int.
Hence, we will use the constructor-style cast byte(~1) for such cases.
This has been tested at least with GCC 4.8.5, 5.4.0, 7.4.0, 9.2.1, 10.0.0,
clang 9.0.1, 10.0.0, and MSVC 14.22.27905 (Microsoft Visual Studio 2019)
on 64-bit and 32-bit targets (IA-32, AMD64, POWER 8, POWER 9, ARMv8).
During native table rebuild or index creation, InnoDB used to skip
redo logging and write MLOG_INDEX_LOAD records to inform crash recovery
and Mariabackup of the gaps in redo log. This is fragile and prohibits
some optimizations, such as skipping the doublewrite buffer for
newly (re)initialized pages (MDEV-19738).
row_merge_write_redo(): Remove. We do not write MLOG_INDEX_LOAD
records any more. Instead, we write full redo log.
FlushObserver: Remove.
fseg_free_page_func(): Remove the parameter log. Redo logging
cannot be disabled.
fil_space_t::redo_skipped_count: Remove.
We cannot remove buf_block_t::skip_flush_check, because PageBulk
will temporarily generate invalid B-tree pages in the buffer pool.
Now that we will be invoking dtuple_get_n_ext() instead of
letting btr_push_update_extern_fields() update an already
calculated value, it is unnecessary to calculate the n_ext
upfront.
row_rec_to_index_entry(), row_rec_to_index_entry_low():
Remove the output parameter n_ext.
offset_t: this is a type which represents one record offset.
It's unsigned short int.
a lot of functions: replace ulint with offset_t
btr_pcur_restore_position_func(),
page_validate(),
row_ins_scan_sec_index_for_duplicate(),
row_upd_clust_rec_by_insert_inherit_func(),
row_vers_impl_x_locked_low(),
trx_undo_prev_version_build():
allocate record offsets on the stack instead of waiting for rec_get_offsets()
to allocate it from mem_heap_t. So, reducing memory allocations.
RECORD_OFFSET, INDEX_OFFSET:
now it's less convenient to store pointers in offset_t*
array. One pointer occupies now several offset_t. And those constant are start
indexes into array to places where to store pointer values
REC_OFFS_HEADER_SIZE: adjusted for the new reality
REC_OFFS_NORMAL_SIZE:
increase size from 100 to 300 which means less heap allocations.
And sizeof(offset_t[REC_OFFS_NORMAL_SIZE]) now is 600 bytes which
is smaller than previous 800 bytes.
REC_OFFS_SEC_INDEX_SIZE: adjusted for the new reality
rem0rec.h, rem0rec.ic, rem0rec.cc:
various arguments, return values and local variables types were changed to
fix numerous integer conversions issues.
enum field_type_t:
offset types concept was introduces which replaces old offset flags stuff.
Like in earlier version, 2 upper bits are used to store offset type.
And this enum represents those types.
REC_OFFS_SQL_NULL, REC_OFFS_MASK: removed
get_type(), set_type(), get_value(), combine():
these are convenience functions to work with offsets and it's types
rec_offs_base()[0]:
still uses an old scheme with flags REC_OFFS_COMPACT and REC_OFFS_EXTERNAL
rec_offs_base()[i]:
these have type offset_t now. Two upper bits contains type.
Before commit 90c52e5291 introduced
aligned_malloc(), InnoDB always used a pattern of over-allocating
memory and invoking ut_align() to guarantee the desired alignment.
It is cleaner to invoke aligned_malloc() and aligned_free() directly.
ut_align(): Remove. In assertions, ut_align_down() can be used instead.
Almost all threads have gone
- the "ticking" threads, that sleep a while then do some work)
(srv_monitor_thread, srv_error_monitor_thread, srv_master_thread)
were replaced with timers. Some timers are periodic,
e.g the "master" timer.
- The btr_defragment_thread is also replaced by a timer , which
reschedules it self when current defragment "item" needs throttling
- the buf_resize_thread and buf_dump_threads are substitutes with tasks
Ditto with page cleaner workers.
- purge workers threads are not tasks as well, and purge cleaner
coordinator is a combination of a task and timer.
- All AIO is outsourced to tpool, Innodb just calls thread_pool::submit_io()
and provides the callback.
- The srv_slot_t was removed, and innodb_debug_sync used in purge
is currently not working, and needs reimplementation.
Problem:
=======
During online alter, fts tokenization thread uses new table page size
to read the externally stored page from old table. If the alter changes
the page size then it leads to failure of alter table.
Solution:
=========
fts tokenization thread should use old table page size to read the
externally stored page from old table.
InnoDB duplicates file descriptor returned by create_temp_file() to
workaround further inconsistent use of this descriptor.
Use mysys file descriptors consistently for innobase_mysql_tmpfile(path).
Mostly close it by appropriate mysys wrappers.
fts_table_t::parent: Remove the redundant field. Refer to
table->name.m_name instead.
fts_update_sync_doc_id(), fts_update_next_doc_id(): Remove
the redundant parameter table_name.
fts_get_table_name_prefix(): Access the dict_table_t::name.
FIXME: Ensure that this access is always covered by
dict_sys->mutex.
The record MLOG_INDEX_LOAD is supposed to be written to indicate that
some page modifications bypassed redo logging, and that redo logging
is now re-enabled. It was not written for fulltext indexes during
ALTER TABLE.
row_merge_write_redo(): Declare globally. Assert that the index
is neither a spatial nor fulltext index.
recv_mlog_index_load(): Observe a MLOG_INDEX_LOAD operation.
recv_parse_log_recs(): Handle MLOG_INDEX_LOAD also in multi-record
mini-transactions. Because of this omission, we should keep writing
MLOG_INDEX_LOAD in single-record mini-transactions, because older
versions of Mariabackup would fail.
row_fts_merge_insert(): Write MLOG_INDEX_LOAD for the auxiliary
tables of fulltext indexes.
1) Avoid writing of MLOG_INDEX_LOAD redo log record during inplace
alter table when the table is empty and also for spatial index.
2) Avoid creation of temporary merge file for spatial index during
index creation process.
row_merge_create_fts_sort_index(): Initialize dict_col_t in
an unambiguous way. GCC 6 and later appear to be able to optimize
away the memset() that is part of mem_heap_zalloc() in the
placement new call. Let us avoid using placement new in order
to ensure that the objects will actually be initialized.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71388https://gcc.gnu.org/ml/gcc/2016-02/msg00207.html
While the latter reference hints that the optimization is only
applicable to non-POD types (and dict_col_t does not define
any member functions before 10.2), it is most consistent to
use the same initialization across all versions.
row_merge_create_fts_sort_index(): Initialize dict_col_t.
This fixes an access to uninitialized dict_col_t::ind when a debug
assertion in MariaDB 10.4 invokes is_dropped() in
rec_get_converted_size_comp_prefix_low(). Older MariaDB versions
seem to be unaffected by the uninitialized values, but it should
not hurt to initialize everything.
MySQL 5.7 introduced the class page_size_t and increased the size of
buffer pool page descriptors by introducing this object to them.
Maybe the intention of this exercise was to prepare for a future
where the buffer pool could accommodate multiple page sizes.
But that future never arrived, not even in MySQL 8.0. It is much
easier to manage a pool of a single page size, and typically all
storage devices of an InnoDB instance benefit from using the same
page size.
Let us remove page_size_t from MariaDB Server. This will make it
easier to remove support for ROW_FORMAT=COMPRESSED (or make it a
compile-time option) in the future, just by removing various
occurrences of zip_size.