This was the last abuse of trx_sys.mutex, which is now exclusively
protecting trx_sys.trx_list.
This global acquisition was also potential scalability bottleneck for
oltp_read_write benchmark. Although it didn't expose itself as such due
to larger scalability issues.
Replaced trx_sys.mutex based synchronisation between ReadView creator
thread and purge coordinator thread performing latest view clone with
ReadView::m_mutex.
It also allowed to simplify tri-state view m_state down to boolean
m_open flag.
For performance reasons trx->read_view.close() is left as atomic relaxed
store, so that we don't have to waste resources for otherwise meaningless
mutex acquisition.
In the merge 9e6e43551f
we made Atomic_counter a more generic wrapper of std::atomic
so that dict_index_t would support the implicit assignment operator.
It is better to revert the changes to Atomic_counter and
instead introduce Atomic_relaxed as a generic wrapper to std::atomic.
Unlike Atomic_counter, we will not define operator++, operator+=
or similar, because we want to make the operations more explicit
in the users of Atomic_wrapper, because unlike loads and stores,
atomic read-modify-write operations always incur some overhead.
- There are multiple inconsistency and incorrect way in which rw-lock
stats are calculated.
- shared rw-lock stats:
"rounds" counter is incremented only once for N rounds done
in spin-cycle.
- all rw-lock stats:
If the spin-cycle is short-circuited then attempts are re-counted.
[If spin-cycle is interrupted, before it completes
srv_n_spin_wait_rounds (default 30) rounds, spin_count is incremented
to consider this. If thread resumes spin-cycle (due to unavailability
of the locks) and is again interrupted or completed, spin_count
is again incremented with the total count, failing to adjust the
previous attempt increment].
- s/x rw-lock stats:
spin_loop counter is not incremented at-all instead it is projected
as 0 (in show engine output) and division to calculate spin-round per
spin-loop is adjusted.
As per the original semantics spin_loop counter should be incremented
once per spin_loop execution.
- sx rw-lock stats:
sx locks increments spin_loop counter but instead of incrementing it
once for a spin_loop invocation it does it multiple times based on how
many time spin_loop flow is repeated for same instance post os-wait.
As part of the SPATIAL INDEX implementation in InnoDB,
dict_index_t was expanded by a rtr_ssn_t field. There are only
3 operations for this field, all protected by rtr_ssn_t::mutex:
* btr_cur_search_to_nth_level() stores the least significant 32 bits
of the 64-bit value that is stored in the index root page.
(This would better be done when the table is opened for the
very first time.)
* rtr_get_new_ssn_id() increments the value by 1.
* rtr_get_current_ssn_id() reads the current value.
All these operations can be implemented equally safely by using
atomic memory access operations.
Thanks to MDEV-15058, there is only one InnoDB buffer pool.
Allocating buf_pool statically removes one level of pointer indirection
and makes code more readable, and removes the awkward initialization of
some buf_pool members.
While doing this, we will also declare some buf_pool_t data members
private and replace some functions with member functions. This is
mostly affecting buffer pool resizing.
This is not aiming to be a complete rewrite of buf_pool_t to
a proper class. Most of the buffer pool interface, such as
buf_page_get_gen(), will remain in the C programming style
for now.
buf_pool_t::withdrawing: Replaces buf_pool_withdrawing.
buf_pool_t::withdraw_clock_: Replaces buf_withdraw_clock.
buf_pool_t::create(): Repalces buf_pool_init().
buf_pool_t::close(): Replaces buf_pool_free().
buf_bool_t::will_be_withdrawn(): Replaces buf_block_will_be_withdrawn(),
buf_frame_will_be_withdrawn().
buf_pool_t::clear_hash_index(): Replaces buf_pool_clear_hash_index().
buf_pool_t::get_n_pages(): Replaces buf_pool_get_n_pages().
buf_pool_t::validate(): Replaces buf_validate().
buf_pool_t::print(): Replaces buf_print().
buf_pool_t::block_from_ahi(): Replaces buf_block_from_ahi().
buf_pool_t::is_block_field(): Replaces buf_pointer_is_block_field().
buf_pool_t::is_block_mutex(): Replaces buf_pool_is_block_mutex().
buf_pool_t::is_block_lock(): Replaces buf_pool_is_block_lock().
buf_pool_t::is_obsolete(): Replaces buf_pool_is_obsolete().
buf_pool_t::io_buf: Make default-constructible.
buf_pool_t::io_buf::create(): Delayed 'constructor'
buf_pool_t::io_buf::close(): Early 'destructor'
HazardPointer: Make default-constructible. Define all member functions
inline, also for derived classes.
The -Wconversion in GCC seems to be stricter than in clang.
GCC at least since version 4.4.7 issues truncation warnings for
assignments to bitfields, while clang 10 appears to only issue
warnings when the sizes in bytes rounded to the nearest integer
powers of 2 are different.
Before GCC 10.0.0, -Wconversion required more casts and would not
allow some operations, such as x<<=1 or x+=1 on a data type that
is narrower than int.
GCC 5 (but not GCC 4, GCC 6, or any later version) is complaining
about x|=y even when x and y are compatible types that are narrower
than int. Hence, we must rewrite some x|=y as
x=static_cast<byte>(x|y) or similar, or we must disable -Wconversion.
In GCC 6 and later, the warning for assigning wider to bitfields
that are narrower than 8, 16, or 32 bits can be suppressed by
applying a bitwise & with the exact bitmask of the bitfield.
For older GCC, we must disable -Wconversion for GCC 4 or 5 in such
cases.
The bitwise negation operator appears to promote short integers
to a wider type, and hence we must add explicit truncation casts
around them. Microsoft Visual C does not allow a static_cast to
truncate a constant, such as static_cast<byte>(1) truncating int.
Hence, we will use the constructor-style cast byte(~1) for such cases.
This has been tested at least with GCC 4.8.5, 5.4.0, 7.4.0, 9.2.1, 10.0.0,
clang 9.0.1, 10.0.0, and MSVC 14.22.27905 (Microsoft Visual Studio 2019)
on 64-bit and 32-bit targets (IA-32, AMD64, POWER 8, POWER 9, ARMv8).
InnoDB RNG maintains global state, causing otherwise unnecessary bus
traffic. Even worse, this is cross-mutex traffic. That is, different
mutexes suffer from contention.
Fixed delay of 4 was verified to give best throughput by OLTP update
index and read-write benchmarks on Intel Broadwell (2/20/40) and
ARM (1/46/46).
This is a backport of ce04790065 from
MariaDB Server 10.3.
ut_rnd_interval(): Remove the first parameter, which was mostly
passed as 0. Implement as a simple wrapper around ut_rnd_gen().
Trivially return 0 if the size of the interval is smaller than 2.
ut_rnd_ulint_counter, ut_rnd_gen_next_ulint(), ut_rnd_gen_ulint(): Remove.
Almost all threads have gone
- the "ticking" threads, that sleep a while then do some work)
(srv_monitor_thread, srv_error_monitor_thread, srv_master_thread)
were replaced with timers. Some timers are periodic,
e.g the "master" timer.
- The btr_defragment_thread is also replaced by a timer , which
reschedules it self when current defragment "item" needs throttling
- the buf_resize_thread and buf_dump_threads are substitutes with tasks
Ditto with page cleaner workers.
- purge workers threads are not tasks as well, and purge cleaner
coordinator is a combination of a task and timer.
- All AIO is outsourced to tpool, Innodb just calls thread_pool::submit_io()
and provides the callback.
- The srv_slot_t was removed, and innodb_debug_sync used in purge
is currently not working, and needs reimplementation.
TODO: do not use fil_* functions for redo log files.
log_t::checkpoint_lock: remove this lock which was used to wait for
async I/O completion.
checkpoint_lock_key
checkpoint_lock: remove now unneeded globals
log_write_checkpoint_info(): remove sync argument because all checkpoint
writes are synchronous now
log_write_checkpoint_info(): remove sync argument
log_group_checkpoint(): merge with the only caller
log_complete_checkpoint(): merge with the only caller
log_t::complete_checkpoint(): remove by merging with the only caller.
srv_slot_t::suspend_time, os_aio_slot_t::reservation_time,
sync_cell_t::reservation_time: Explain what could happen
if the system time has is being adjusted.
fts_sync_t::start_time: Document that the field is mostly unused.
Shorten some VARCHAR attributes to a more reasonable length.
INNODB_METRICS: Rename the column STATUS to ENABLED, and make it Boolean.
Replace with INT(1) many Boolean attributes that were declared as VARCHAR
containing 'NO','YES','disabled','enabled','Uninitialized','Initialized'.
Replace some VARCHAR attributes with ENUM.
Replace some BIGINT with INT when 32 bits are sufficient.
Remove INNODB_SYS_TABLESPACES.SPACE_TYPE. The type of a tablespace
can be derived from the tablespace ID. A fixed number is used for
the system tablespace and the temporary tablespace. All other tablespaces
are single-table or single-partition tablespaces.
i_s_locks_row_t::lock_type, lock_get_type_str(): Remove.
This is a redundant field. Table and record locks can be
distinguished by whether i_s_locks_row_t::lock_index is NULL.
fill_trx_row(): Do not unnecessarily copy the constant strings that
trx->op_info is pointing to.
i_s_locks_row_t::lock_mode: Replace string with integer.
lock_get_mode_str(), lock_get_trx_id(), lock_get_trx(): Remove.
field_store_ulint(): Remove.
In MySQL 5.7.8 an extra level of pointer indirection was added to
dict_operation_lock and some other rw_lock_t without solid justification,
in mysql/mysql-server@52720f1772.
Let us revert that change and remove the rather useless rw_lock_t
constructor and destructor and the magic_n field. In this way,
some unnecessary pointer dereferences and heap allocation will be avoided
and debugging might be a little easier.
Some places didn't match the previous rules, making the Floor
address wrong.
Additional sed rules:
sed -i -e 's/Place.*Suite .*, Boston/Street, Fifth Floor, Boston/g'
sed -i -e 's/Suite .*, Boston/Fifth Floor, Boston/g'
Almost trivial rw_lock_t::waiters transition. Since C++11 doesn't
seem to allow mixed (atomic and non-atomic) access to atomic variables,
we have to perform atomic initialisation.
Almost trivial rw_lock_t::lock_word transition. Since C++11 doesn't
seem to allow mixed (atomic and non-atomic) access to atomic variables,
we have to perform atomic initialisation.
Also made previously broken code in gis0sea.cc even more broken. It is
unclear how it was supposed to work and what exactly it was supposed to
do.
Also, related to MDEV-15522, MDEV-17304, MDEV-17835,
remove the Galera xtrabackup tests, because xtrabackup never worked
with MariaDB Server 10.3 due to InnoDB redo log format changes.