mirror of https://github.com/MariaDB/server.git synced 2025-08-31 22:22:30 +03:00
Commit Graph

554 Commits

Author SHA1 Message Date
Marko Mäkelä
03ca6495df MDEV-24142: Replace InnoDB rw_lock_t with sux_lock
InnoDB buffer pool block and index tree latches depend on a
special kind of read-update-write lock that allows reentrant
(recursive) acquisition of the 'update' and 'write' locks
as well as an upgrade from 'update' lock to 'write' lock.
The 'update' lock allows any number of reader locks from
other threads, but no concurrent 'update' or 'write' lock.

If there were no requirement to support an upgrade from 'update'
to 'write', we could compose the lock out of two srw_lock
(implemented as any type of native rw-lock, such as SRWLOCK on
Microsoft Windows). Removing this requirement is very difficult,
so in commit f7e7f487d4b06695f91f6fbeb0396b9d87fc7bbf we
implemented an 'update' mode to our srw_lock.

Re-entrant or recursive locking is mostly needed when writing or
freeing BLOB pages, but also in crash recovery or when merging
buffered changes to an index page. The re-entrancy allows us to
attach a previously acquired page to a sub-mini-transaction that
will be committed before whatever else is holding the page latch.

The SUX lock supports Shared ('read'), Update, and eXclusive ('write')
locking modes. The S latches are not re-entrant, but a single S latch
may be acquired even if the thread already holds an U latch.

The idea of the U latch is to allow a write of something that concurrent
readers do not care about (such as the contents of BTR_SEG_LEAF,
BTR_SEG_TOP and other page allocation metadata structures, or
the MDEV-6076 PAGE_ROOT_AUTO_INC). (The PAGE_ROOT_AUTO_INC field
is only updated when a dict_table_t for the table exists, and only
read when a dict_table_t for the table is being added to dict_sys.)

block_lock::u_lock_try(bool for_io=true) is used in buf_flush_page()
to allow concurrent readers but no concurrent modifications while the
page is being written to the data file. That latch will be released
by buf_page_write_complete() in a different thread. Hence, we use
the special lock owner value FOR_IO.

The index_lock::u_lock() improves concurrency on operations that
involve non-leaf index pages.

The interface has been cleaned up a little. We will use
x_lock_recursive() instead of x_lock() when we know that a
lock is already held by the current thread. Similarly,
a lock upgrade from U to X is only allowed via u_x_upgrade()
or x_lock_upgraded() but not via x_lock().
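
For illustration only, the semantics described above can be modelled with
standard C++ primitives. This is a rough sketch with made-up names; the real
sux_lock is built on srw_lock rather than std::mutex and handles details
(fairness, spinning, the FOR_IO owner) that are omitted here:

  #include <condition_variable>
  #include <mutex>
  #include <thread>

  // Sketch of shared/update/exclusive semantics: U allows concurrent S
  // but excludes other U and X, U can be upgraded to X, and U/X are
  // re-entrant for the owning thread. No fairness or spinning here.
  class sux_lock_sketch
  {
    std::mutex m;
    std::condition_variable cv;
    unsigned readers= 0;            // S holders
    std::thread::id owner{};        // U or X holder, if any
    unsigned recursion= 0;          // re-entrant U/X acquisitions
    bool exclusive= false;          // owner holds X, which blocks S too
  public:
    void s_lock()                   // not re-entrant; allowed under own U
    {
      std::unique_lock<std::mutex> g{m};
      cv.wait(g, [&]{ return !exclusive; });
      readers++;
    }
    void s_unlock()
    {
      std::lock_guard<std::mutex> g{m};
      if (!--readers) cv.notify_all();
    }
    void u_lock()
    {
      std::unique_lock<std::mutex> g{m};
      const auto self= std::this_thread::get_id();
      if (owner == self) { recursion++; return; }   // recursive U
      cv.wait(g, [&]{ return owner == std::thread::id{}; });
      owner= self; recursion= 1;
    }
    void u_x_upgrade()              // only the U holder may call this
    {
      std::unique_lock<std::mutex> g{m};
      exclusive= true;                          // stop new S
      cv.wait(g, [&]{ return readers == 0; });  // drain existing S
    }
    void x_lock()
    {
      std::unique_lock<std::mutex> g{m};
      cv.wait(g, [&]{ return owner == std::thread::id{} && readers == 0; });
      owner= std::this_thread::get_id(); recursion= 1; exclusive= true;
    }
    void x_lock_recursive()         // caller already holds X
    {
      std::lock_guard<std::mutex> g{m};
      recursion++;
    }
    void u_or_x_unlock()
    {
      std::lock_guard<std::mutex> g{m};
      if (--recursion) return;
      owner= std::thread::id{}; exclusive= false;
      cv.notify_all();
    }
  };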

We will disable the LatchDebug and sync_array interfaces to
InnoDB rw-locks.

The SEMAPHORES section of SHOW ENGINE INNODB STATUS output
will no longer include any information about InnoDB rw-locks,
only TTASEventMutex (cmake -DMUTEXTYPE=event) waits.
This will make a part of the 'innotop' script dead code.

The block_lock buf_block_t::lock will not be covered by any
PERFORMANCE_SCHEMA instrumentation.

SHOW ENGINE INNODB MUTEX and INFORMATION_SCHEMA.INNODB_MUTEXES
will no longer output source code file names or line numbers.
The dict_index_t::lock will be identified by index and table names,
which should be much more useful. PERFORMANCE_SCHEMA is lumping
information about all dict_index_t::lock together as
event_name='wait/synch/sxlock/innodb/index_tree_rw_lock'.

buf_page_free(): Remove the file,line parameters. The sux_lock will
not store such diagnostic information.

buf_block_dbg_add_level(): Define as empty macro, to be removed
in a subsequent commit.

Unless the build was configured with cmake -DPLUGIN_PERFSCHEMA=NO
the index_lock dict_index_t::lock will be instrumented via
PERFORMANCE_SCHEMA. Similar to
commit 1669c8890c
we will distinguish lock waits by registering shared_lock,exclusive_lock
events instead of try_shared_lock,try_exclusive_lock.
Actual 'try' operations will not be instrumented at all.

rw_lock_list: Remove. After MDEV-24167, this only covered
buf_block_t::lock and dict_index_t::lock. We will output their
information by traversing buf_pool or dict_sys.
2020-12-03 15:19:49 +02:00
Marko Mäkelä
3872e585f3 MDEV-24167: Stabilize perfschema.sxlock_func
The extension of the test perfschema.sxlock_func in
commit 1669c8890c
turned out to be unstable.

Let us filter out purge_sys.latch (trx_purge_latch) from the output,
because it may happen that the purge tasks are not executed
during the test run.
2020-12-03 15:15:47 +02:00
Marko Mäkelä
1669c8890c MDEV-24167 fixup: Improve the PERFORMANCE_SCHEMA instrumentation
Let us try to avoid code bloat for the common case that
performance_schema is disabled at runtime, and use
ATTRIBUTE_NOINLINE member functions for instrumented latch acquisition.

Also, let us distinguish lock waits from non-contended lock requests
by using write_lock,read_lock for the requests that lead to waits,
and try_write_lock,try_read_lock for the wait-free lock acquisitions.
Actual 'try' operations are not being instrumented at all.
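
A hedged sketch of that split, with made-up names (printf stands in for the
PSI calls, and ATTRIBUTE_NOINLINE falls back to a GCC/Clang attribute):

  #include <atomic>
  #include <cstdio>

  #ifndef ATTRIBUTE_NOINLINE
  # define ATTRIBUTE_NOINLINE __attribute__((noinline))
  #endif

  // Keep the common, uncontended path inline and cheap; move the
  // PERFORMANCE_SCHEMA bookkeeping into a cold out-of-line function
  // that is only entered on contention.
  struct demo_lock
  {
    std::atomic<bool> locked{false};

    bool try_lock()                 // wait-free grab: a try_write_lock event
    { return !locked.exchange(true, std::memory_order_acquire); }

    ATTRIBUTE_NOINLINE void lock_contended()
    {
      std::printf("pfs: start write_lock wait\n"); // stand-in for PSI calls
      while (!try_lock()) {}                       // spin for the sketch
      std::printf("pfs: end write_lock wait\n");
    }

    void lock()
    {
      if (!try_lock())              // only the contended case pays for
        lock_contended();           // the instrumentation
    }

    void unlock() { locked.store(false, std::memory_order_release); }
  };
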
2020-12-03 09:55:53 +02:00
Marko Mäkelä
e28d9c15c3 MDEV-24167 fixup: Improve perfschema.sxlock_func test 2020-12-01 17:19:53 +02:00
Marko Mäkelä
edbde4a11f MDEV-24167: Replace fil_space::latch
We must avoid acquiring a latch while we are already holding one.
The tablespace latch was being acquired recursively in some
operations that allocate or free pages.
2020-11-24 15:43:12 +02:00
Marko Mäkelä
bdd88cfa34 MDEV-24167: Replace fts_cache_rw_lock, fts_cache_init_rw_lock with mutex
fts_cache_t::init_lock: Replace with mutex. This was only acquired
in exclusive mode.

fts_cache_t::lock: Replace with mutex. The only read-lock user was
i_s_fts_index_cache_fill() for producing content for the view
INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE.
2020-11-24 15:43:12 +02:00
Marko Mäkelä
1a1b7a6f16 MDEV-24167: Replace dict_operation_lock (dict_sys.latch) 2020-11-24 15:43:11 +02:00
Marko Mäkelä
06ef5509d0 MDEV-24167: Replace trx_purge_latch 2020-11-24 15:43:10 +02:00
Marko Mäkelä
63dd2a97e4 MDEV-24167: Replace trx_i_s_cache_lock 2020-11-24 15:43:09 +02:00
Marko Mäkelä
c561f9e6e8 MDEV-24167: Use lightweight srw_lock for btr_search_latch
Many InnoDB rw-locks unnecessarily depend on the complex
InnoDB rw_lock_t implementation that supports the SX lock mode
as well as recursive acquisition of X or SX locks.
One of them is the bunch of adaptive hash index search latches,
instrumented as btr_search_latch in PERFORMANCE_SCHEMA.
Let us introduce a simpler lock for those in order to
reduce overhead.

srw_lock: A simple read-write lock that does not support recursion.
On Microsoft Windows, this wraps SRWLOCK, only adding
runtime overhead if PERFORMANCE_SCHEMA is enabled.
On Linux (all architectures), this is implemented with
std::atomic<uint32_t> and the futex system call.
On other platforms, we will wrap mysql_rwlock_t with
zero runtime overhead.

The PERFORMANCE_SCHEMA instrumentation differs
from InnoDB rw_lock_t in that we will only invoke
PSI_RWLOCK_CALL(start_rwlock_wrwait) or
PSI_RWLOCK_CALL(start_rwlock_rdwait)
if there is an actual conflict.
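
For illustration, the zero-overhead Windows case might look roughly like this
(a sketch with simplified names; the real srw_lock additionally carries the
optional PERFORMANCE_SCHEMA state):

  #ifdef _WIN32
  # include <windows.h>

  // Thin wrapper: each operation maps 1:1 to an SRWLOCK call, so there
  // is no extra runtime cost when instrumentation is disabled.
  class srw_lock_sketch
  {
    SRWLOCK lk= SRWLOCK_INIT;
  public:
    void rd_lock()   { AcquireSRWLockShared(&lk); }
    void rd_unlock() { ReleaseSRWLockShared(&lk); }
    void wr_lock()   { AcquireSRWLockExclusive(&lk); }
    void wr_unlock() { ReleaseSRWLockExclusive(&lk); }
  };
  #endif
  // On Linux the same interface would sit on a std::atomic<uint32_t>
  // plus futex waits; elsewhere it would wrap mysql_rwlock_t.
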
2020-11-24 15:41:03 +02:00
Marko Mäkelä
a62a675fd2 Merge 10.5 into 10.6 2020-11-23 17:57:58 +02:00
Marko Mäkelä
1e5d989d2a MDEV-24167: Remove PFS instrumentation of buf_block_t
We always defined PFS_SKIP_BUFFER_MUTEX_RWLOCK, that is,
the latches of the buffer pool blocks were never instrumented
in PERFORMANCE_SCHEMA.

For some reason, the debug_latch (which enforces proper usage of
buffer-fixing in debug builds) was instrumented.
2020-11-20 08:55:41 +02:00
Marko Mäkelä
fabdad6857 Merge 10.4 into 10.5 2020-11-17 18:15:13 +02:00
Marko Mäkelä
bbf0b55c17 Work around MDEV-24232: Skip perfschema.nesting if WITH_WSREP=OFF 2020-11-17 18:13:47 +02:00
Oleksandr Byelkin
10aa576483 Merge branch '10.5' into 10.6 2020-11-14 20:05:35 +01:00
Marko Mäkelä
d7a5824899 Merge 10.4 into 10.5 2020-11-13 21:54:21 +02:00
Marko Mäkelä
09a1f0075a Merge 10.5 into 10.6 2020-11-02 12:49:19 +02:00
Marko Mäkelä
898521e2dd Merge 10.4 into 10.5 2020-10-30 11:15:30 +02:00
Sergei Golubchik
e764d11829 fix occasional test failures: SELECT without ORDER BY 2020-10-24 11:15:51 +02:00
Marko Mäkelä
7cffb5f6e8 MDEV-23399: Performance regression with write workloads
The buffer pool refactoring in MDEV-15053 and MDEV-22871 shifted
the performance bottleneck to the page flushing.

The configuration parameters will be changed as follows:

innodb_lru_flush_size=32 (new: how many pages to flush on LRU eviction)
innodb_lru_scan_depth=1536 (old: 1024)
innodb_max_dirty_pages_pct=90 (old: 75)
innodb_max_dirty_pages_pct_lwm=75 (old: 0)

Note: The parameter innodb_lru_scan_depth will only affect LRU
eviction of buffer pool pages when a new page is being allocated. The
page cleaner thread will no longer evict any pages. It used to
guarantee that some pages will remain free in the buffer pool. Now, we
perform that eviction 'on demand' in buf_LRU_get_free_block().
The parameter innodb_lru_scan_depth (srv_LRU_scan_depth) is used as follows:
 * When the buffer pool is being shrunk in buf_pool_t::withdraw_blocks()
 * As a buf_pool.free limit in buf_LRU_list_batch() for terminating
   the flushing that is initiated e.g., by buf_LRU_get_free_block()
The parameter also used to serve as an initial limit for unzip_LRU
eviction (evicting uncompressed page frames while retaining
ROW_FORMAT=COMPRESSED pages), but now we will use a hard-coded limit
of 100 or unlimited for invoking buf_LRU_scan_and_free_block().

The status variables will be changed as follows:

innodb_buffer_pool_pages_flushed: This also includes the count of
innodb_buffer_pool_pages_LRU_flushed and should work reliably, being
updated one by one in buf_flush_page() to give more real-time
statistics. The function buf_flush_stats(), which we are removing,
was not called in every code path. For both counters, we will use
regular variables that are incremented in a critical section of
buf_pool.mutex. Note that show_innodb_vars() directly links to the
variables, and reads of the counters will *not* be protected by
buf_pool.mutex, so you cannot get a consistent snapshot of both variables.
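
For illustration only (placeholder names, not the actual InnoDB variables),
the update and read sides behave roughly like this:

  #include <mutex>

  // Made-up stand-ins for buf_pool.mutex and the two counters.
  static std::mutex buf_pool_mutex_demo;
  static unsigned long pages_flushed_demo= 0;
  static unsigned long pages_LRU_flushed_demo= 0;

  // Writer side: incremented one page at a time inside the critical
  // section, as described for buf_flush_page().
  void on_page_flushed_demo(bool lru_flush)
  {
    std::lock_guard<std::mutex> g{buf_pool_mutex_demo};
    ++pages_flushed_demo;
    if (lru_flush) ++pages_LRU_flushed_demo;
  }

  // Reader side (like show_innodb_vars()): reads without the mutex, so
  // the two values need not be a consistent snapshot of one instant.
  void report_demo(unsigned long &flushed, unsigned long &lru_flushed)
  {
    flushed= pages_flushed_demo;
    lru_flushed= pages_LRU_flushed_demo;
  }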

The following INFORMATION_SCHEMA.INNODB_METRICS counters will be
removed, because the page cleaner no longer deals with writing or
evicting least recently used pages, and because the single-page writes
have been removed:
* buffer_LRU_batch_flush_avg_time_slot
* buffer_LRU_batch_flush_avg_time_thread
* buffer_LRU_batch_flush_avg_time_est
* buffer_LRU_batch_flush_avg_pass
* buffer_LRU_single_flush_scanned
* buffer_LRU_single_flush_num_scan
* buffer_LRU_single_flush_scanned_per_call

When moving to a single buffer pool instance in MDEV-15058, we missed
some opportunity to simplify the buf_flush_page_cleaner thread. It was
unnecessarily using a mutex and some complex data structures, even
though we always have a single page cleaner thread.

Furthermore, the buf_flush_page_cleaner thread had separate 'recovery'
and 'shutdown' modes where it was waiting to be triggered by some
other thread, adding unnecessary latency and potential for hangs in
relatively rarely executed startup or shutdown code.

The page cleaner was also running two kinds of batches in an
interleaved fashion: "LRU flush" (writing out some least recently used
pages and evicting them on write completion) and the normal batches
that aim to increase the MIN(oldest_modification) in the buffer pool,
to help the log checkpoint advance.

The buf_pool.flush_list flushing was being blocked by
buf_block_t::lock for no good reason. Furthermore, if the FIL_PAGE_LSN
of a page is ahead of log_sys.get_flushed_lsn(), that is, what has
been persistently written to the redo log, we would trigger a log
flush and then resume the page flushing. This would unnecessarily
limit the performance of the page cleaner thread and trigger the
infamous messages "InnoDB: page_cleaner: 1000ms intended loop took 4450ms.
The settings might not be optimal" that were suppressed in
commit d1ab89037a unless log_warnings>2.

Our revised algorithm will make log_sys.get_flushed_lsn() advance at
the start of buf_flush_lists(), and then execute a 'best effort' to
write out all pages. The flush batches will skip pages that were modified
since the log was written, or are currently exclusively locked.
The MDEV-13670 message "page_cleaner: 1000ms intended loop took"
will be removed, because by design, the buf_flush_page_cleaner() should
not be blocked during a batch for extended periods of time.

We will remove the single-page flushing altogether. Related to this,
the debug parameter innodb_doublewrite_batch_size will be removed,
because all of the doublewrite buffer will be used for flushing
batches. If a page needs to be evicted from the buffer pool and all
100 least recently used pages in the buffer pool have unflushed
changes, buf_LRU_get_free_block() will execute buf_flush_lists() to
write out and evict innodb_lru_flush_size pages. At most one thread
will execute buf_flush_lists() in buf_LRU_get_free_block(); other
threads will wait for that LRU flushing batch to finish.

To improve concurrency, we will replace the InnoDB ib_mutex_t and
os_event_t native mutexes and condition variables in this area of code.
Most notably, this means that the buffer pool mutex (buf_pool.mutex)
is no longer instrumented via any InnoDB interfaces. It will continue
to be instrumented via PERFORMANCE_SCHEMA.

For now, both buf_pool.flush_list_mutex and buf_pool.mutex will be
declared with MY_MUTEX_INIT_FAST (PTHREAD_MUTEX_ADAPTIVE_NP). The critical
sections of buf_pool.flush_list_mutex should be shorter than those for
buf_pool.mutex, because in the worst case, they cover a linear scan of
buf_pool.flush_list, while the worst case of a critical section of
buf_pool.mutex covers a linear scan of the potentially much longer
buf_pool.LRU list.

mysql_mutex_is_owner(), safe_mutex_is_owner(): New predicate, usable
with SAFE_MUTEX. Some InnoDB debug assertions need this predicate
instead of mysql_mutex_assert_owner() or mysql_mutex_assert_not_owner().
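
A self-contained analogue of the pattern (not the actual SAFE_MUTEX code)
that shows why an is-owner predicate is needed:

  #include <atomic>
  #include <cassert>
  #include <mutex>
  #include <thread>

  // A mutex that remembers its owner, so debug code can ask "do I hold
  // this?" instead of only asserting ownership or non-ownership.
  struct owner_mutex_demo
  {
    std::mutex m;
    std::atomic<std::thread::id> owner{};
    void lock()   { m.lock(); owner= std::this_thread::get_id(); }
    void unlock() { owner= std::thread::id{}; m.unlock(); }
    bool is_owner() const { return owner == std::this_thread::get_id(); }
  };

  owner_mutex_demo demo_mutex, demo_flush_list_mutex;

  // The kind of assertion the commit describes: the caller must hold at
  // least one of two mutexes, which assert_owner() alone cannot express.
  void demo_check()
  {
    assert(demo_mutex.is_owner() || demo_flush_list_mutex.is_owner());
  }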

buf_pool_t::n_flush_LRU, buf_pool_t::n_flush_list:
Replaces buf_pool_t::init_flush[] and buf_pool_t::n_flush[].
The number of active flush operations.

buf_pool_t::mutex, buf_pool_t::flush_list_mutex: Use mysql_mutex_t
instead of ib_mutex_t, to have native mutexes with PERFORMANCE_SCHEMA
and SAFE_MUTEX instrumentation.

buf_pool_t::done_flush_LRU: Condition variable for !n_flush_LRU.

buf_pool_t::done_flush_list: Condition variable for !n_flush_list.

buf_pool_t::do_flush_list: Condition variable to wake up the
buf_flush_page_cleaner when a log checkpoint needs to be written
or the server is being shut down. Replaces buf_flush_event.
We will keep using timed waits (the page cleaner thread will wake
_at least_ once per second), because the calculations for
innodb_adaptive_flushing depend on fixed time intervals.
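
A rough sketch of such a timed wait, using standard C++ primitives as
stand-ins for mysql_mutex_t/mysql_cond_t and made-up flag names:

  #include <chrono>
  #include <condition_variable>
  #include <mutex>

  // Made-up stand-ins for buf_pool.flush_list_mutex and do_flush_list.
  std::mutex flush_list_mutex_demo;
  std::condition_variable do_flush_list_demo;
  bool checkpoint_requested_demo= false, shutting_down_demo= false;

  // The cleaner wakes at least once per second even without a signal,
  // because the adaptive flushing calculations use fixed time intervals.
  void page_cleaner_loop_demo()
  {
    std::unique_lock<std::mutex> g{flush_list_mutex_demo};
    while (!shutting_down_demo)
    {
      do_flush_list_demo.wait_for(g, std::chrono::seconds(1), []
      { return checkpoint_requested_demo || shutting_down_demo; });
      checkpoint_requested_demo= false;
      g.unlock();
      // ... run a buf_flush_lists() style batch here ...
      g.lock();
    }
  }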

buf_dblwr: Allocate statically, and move all code to member functions.
Use a native mutex and condition variable. Remove code to deal with
single-page flushing.

buf_dblwr_check_block(): Make the check debug-only. We were spending
a significant amount of execution time in page_simple_validate_new().

flush_counters_t::unzip_LRU_evicted: Remove.

IORequest: Make more members const. FIXME: m_fil_node should be removed.

buf_flush_sync_lsn: Protect by std::atomic, not page_cleaner.mutex
(which we are removing).

page_cleaner_slot_t, page_cleaner_t: Remove many redundant members.

pc_request_flush_slot(): Replaces pc_request() and pc_flush_slot().

recv_writer_thread: Remove. Recovery works just fine without it, if we
simply invoke buf_flush_sync() at the end of each batch in
recv_sys_t::apply().

recv_recovery_from_checkpoint_finish(): Remove. We can simply call
recv_sys.debug_free() directly.

srv_started_redo: Replaces srv_start_state.

SRV_SHUTDOWN_FLUSH_PHASE: Remove. logs_empty_and_mark_files_at_shutdown()
can communicate with the normal page cleaner loop via the new function
flush_buffer_pool().

buf_flush_remove(): Assert that the calling thread is holding
buf_pool.flush_list_mutex. This removes unnecessary mutex operations
from buf_flush_remove_pages() and buf_flush_dirty_pages(),
which replace buf_LRU_flush_or_remove_pages().

buf_flush_lists(): Renamed from buf_flush_batch(), with simplified
interface. Return the number of flushed pages. Clarified comments and
renamed min_n to max_n. Identify LRU batch by lsn=0. Merge all the functions
buf_flush_start(), buf_flush_batch(), buf_flush_end() directly to this
function, which was their only caller, and remove two unnecessary
buf_pool.mutex release/re-acquisition cycles that we used to perform around
the buf_flush_batch() call. At the start, if not all log has been
durably written, wait for a background task to do it, or start a new
task to do it. This allows the log write to run concurrently with our
page flushing batch. Any pages that were skipped due to too recent
FIL_PAGE_LSN or due to them being latched by a writer should be flushed
during the next batch, unless there are further modifications to those
pages. It is possible that a page that we must flush due to small
oldest_modification also carries a recent FIL_PAGE_LSN or is being
constantly modified. In the worst case, all writers would then end up
waiting in log_free_check() to allow the flushing and the checkpoint
to complete.

buf_do_flush_list_batch(): Clarify comments, and rename min_n to max_n.
Cache the last looked up tablespace. If neighbor flushing is not applicable,
invoke buf_flush_page() directly, avoiding a page lookup in between.

buf_flush_space(): Auxiliary function to look up a tablespace for
page flushing.

buf_flush_page(): Defer the computation of space->full_crc32(). Never
call log_write_up_to(), but instead skip persistent pages whose latest
modification (FIL_PAGE_LSN) is newer than the redo log. Also skip
pages on which we cannot acquire a shared latch without waiting.

buf_flush_try_neighbors(): Do not bother checking buf_fix_count
because buf_flush_page() will no longer wait for the page latch.
Take the tablespace as a parameter, and only execute this function
when innodb_flush_neighbors>0. Avoid repeated calls of page_id_t::fold().

buf_flush_relocate_on_flush_list(): Declare as cold, and push down
a condition from the callers.

buf_flush_check_neighbor(): Take id.fold() as a parameter.

buf_flush_sync(): Ensure that the buf_pool.flush_list is empty,
because the flushing batch will skip pages whose modifications have
not yet been written to the log or were latched for modification.

buf_free_from_unzip_LRU_list_batch(): Remove redundant local variables.

buf_flush_LRU_list_batch(): Let the caller buf_do_LRU_batch() initialize
the counters, and report n->evicted.
Cache the last looked up tablespace. If neighbor flushing is not applicable,
invoke buf_flush_page() directly, avoiding a page lookup in between.

buf_do_LRU_batch(): Return the number of pages flushed.

buf_LRU_free_page(): Only release and re-acquire buf_pool.mutex if
adaptive hash index entries are pointing to the block.

buf_LRU_get_free_block(): Do not wake up the page cleaner, because it
will no longer perform any useful work for us, and we do not want it
to compete for I/O while buf_flush_lists(innodb_lru_flush_size, 0)
writes out and evicts at most innodb_lru_flush_size pages. (The
function buf_do_LRU_batch() may complete after writing fewer pages if
more than innodb_lru_scan_depth pages end up in buf_pool.free list.)
Eliminate some mutex release-acquire cycles, and wait for the LRU
flush batch to complete before rescanning.

buf_LRU_check_size_of_non_data_objects(): Simplify the code.

buf_page_write_complete(): Remove the parameter evict, and always
evict pages that were part of an LRU flush.

buf_page_create(): Take a pre-allocated page as a parameter.

buf_pool_t::free_block(): Free a pre-allocated block.

recv_sys_t::recover_low(), recv_sys_t::apply(): Preallocate the block
while not holding recv_sys.mutex. During page allocation, we may
initiate a page flush, which in turn may initiate a log flush, which
would require acquiring log_sys.mutex, which should always be acquired
before recv_sys.mutex in order to avoid deadlocks. Therefore, we must
not be holding recv_sys.mutex while allocating a buffer pool block.
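
The ordering rule can be illustrated with placeholder names (a sketch, not
the actual recovery code):

  #include <memory>
  #include <mutex>

  std::mutex log_sys_mutex_demo;   // must always be acquired before ...
  std::mutex recv_sys_mutex_demo;  // ... this one, to avoid deadlocks

  struct block_demo { char frame[16384]; };  // stand-in for a buffer block

  // Stand-in for block allocation; in the real code this may trigger a
  // page flush and thus a log write that needs the log mutex.
  std::unique_ptr<block_demo> allocate_block_demo()
  { return std::make_unique<block_demo>(); }

  void recover_page_demo()
  {
    // Allocate first, while holding neither mutex, so anything the
    // allocation triggers can still take the log mutex before recv's.
    auto block= allocate_block_demo();

    std::lock_guard<std::mutex> g{recv_sys_mutex_demo};
    block->frame[0]= '\0';   // placeholder for applying redo records
  }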

BtrBulk::logFreeCheck(): Skip a redundant condition.

row_undo_step(): Do not invoke srv_inc_activity_count() for every row
that is being rolled back. It should suffice to invoke the function in
trx_flush_log_if_needed() during trx_t::commit_in_memory() when the
rollback completes.

sync_check_enable(): Remove. We will enable innodb_sync_debug from the
very beginning.

Reviewed by: Vladislav Vaintroub
2020-10-15 17:04:56 +03:00
Marko Mäkelä
82bc007ffb Cleanup: Remove non-existing parameters 2020-10-02 07:11:53 +03:00
Marko Mäkelä
32798e154f MDEV-23397 fixup: Remove removed options from tests 2020-08-12 14:43:15 +03:00
Marko Mäkelä
e685809a3b MDEV-23397 Remove deprecated InnoDB options in 10.6 2020-08-04 12:51:59 +03:00
Marko Mäkelä
9a7948e3f6 Merge 10.5 into 10.6 2020-08-04 07:55:16 +03:00
sjaakola
7bffe468b2 MDEV-21910 Deadlock between BF abort and manual KILL command
When high priority replication slave applier encounters lock conflict in innodb,
it will force the conflicting lock holder transaction (victim) to rollback.
This is a must in multi-master sychronous replication model to avoid cluster lock-up.
This high priority victim abort (aka "brute force" (BF) abort), is started
from innodb lock manager while holding the victim's transaction's (trx) mutex.
Depending on the execution state of the victim transaction, it may happen that the
BF abort will call for THD::awake() to wake up the victim transaction for the rollback.
Now, if BF abort requires THD::awake() to be called, then the applier thread executed
locking protocol of: victim trx mutex -> victim THD::LOCK_thd_data

If, at the same time another DBMS super user issues KILL command to abort the same victim,
it will execute locking protocol of: victim THD::LOCK_thd_data  -> victim trx mutex.
These two locking protocol acquire mutexes in opposite order, hence unresolvable mutex locking
deadlock may occur.
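
Reduced to placeholder mutexes, the two orders form the classic ABBA deadlock
(sketch only; running both paths concurrently is exactly the bug):

  #include <mutex>

  std::mutex victim_trx_mutex_demo;      // stand-in for the trx mutex
  std::mutex victim_LOCK_thd_data_demo;  // stand-in for THD::LOCK_thd_data

  void bf_abort_path_demo()              // applier: trx mutex first
  {
    std::lock_guard<std::mutex> a{victim_trx_mutex_demo};
    std::lock_guard<std::mutex> b{victim_LOCK_thd_data_demo}; // THD::awake()
  }

  void kill_command_path_demo()          // KILL: LOCK_thd_data first
  {
    std::lock_guard<std::mutex> a{victim_LOCK_thd_data_demo};
    std::lock_guard<std::mutex> b{victim_trx_mutex_demo};
  }
  // Run from two threads at the same time, each path can end up holding
  // one mutex while waiting for the other: an unresolvable deadlock.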

The fix in this commit adds a THD::wsrep_aborter flag to synchronize who can kill the victim.
This flag is set both when a BF abort is requested from InnoDB and by the KILL command.
Either victim-killing path will bail out if the victim's wsrep_killed is already
set, to avoid mutex conflicts with the other aborter's execution. THD::wsrep_aborter
records the aborter THD's ID. This is needed to preserve the right to kill
the victim from different locations for the same aborter thread.
It also improves error logging, by showing who is responsible for the abort.

A new test case was added in galera.galera_bf_kill_debug.test for the scenario where
a wsrep applier thread and a manual KILL command try to kill the same idle victim.
2020-07-22 08:20:10 +03:00
Vladislav Vaintroub
1a2b494100 MDEV-23124 Eliminate the overhead of system call access() on every USE(or connection)
Make check_db_dir_existence() use a cache of existing directories and
clear the cache whenever any directory is removed.

With this, the cost of check_db_dir_existence() will usually be the cost of
acquiring/releasing a non-contended read-write lock in shared mode,
much less than going to the kernel and filesystem to check for file existence.
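
A hedged sketch of such a cache, using std::shared_mutex and POSIX stat() as
stand-ins for the server's own hash table and rw-lock primitives:

  #include <string>
  #include <unordered_set>
  #include <shared_mutex>
  #include <sys/stat.h>

  // Made-up cache of database directories known to exist.
  static std::shared_mutex dir_cache_lock_demo;
  static std::unordered_set<std::string> existing_dirs_demo;

  bool db_dir_exists_demo(const std::string &path)
  {
    {
      std::shared_lock<std::shared_mutex> r{dir_cache_lock_demo};
      if (existing_dirs_demo.count(path))
        return true;                    // common case: no system call
    }
    struct stat st;
    if (stat(path.c_str(), &st) || !S_ISDIR(st.st_mode))
      return false;
    std::unique_lock<std::shared_mutex> w{dir_cache_lock_demo};
    existing_dirs_demo.insert(path);
    return true;
  }

  void on_directory_removed_demo()      // e.g. DROP DATABASE
  {
    std::unique_lock<std::shared_mutex> w{dir_cache_lock_demo};
    existing_dirs_demo.clear();         // invalidate the whole cache
  }
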
2020-07-14 11:16:24 +02:00
Jan Lindström
6a1713ac17 Fix perfschema.nesting test case after fix. 2020-06-26 21:40:13 +03:00
Vladislav Vaintroub
98333883ee MDEV-22933 - remove ---source include/not_threadpool.inc from tests 2020-06-18 23:12:54 +02:00
Marko Mäkelä
5155a300fa MDEV-22871: Reduce InnoDB buf_pool.page_hash contention
The rw_lock_s_lock() calls for the buf_pool.page_hash became a
clear bottleneck after MDEV-15053 reduced the contention on
buf_pool.mutex. We will replace that use of rw_lock_t with a
special implementation that is optimized for memory bus traffic.

The hash_table_locks instrumentation will be removed.

buf_pool_t::page_hash: Use a special implementation whose API is
compatible with hash_table_t, and store the custom rw-locks
directly in buf_pool.page_hash.array, intentionally sharing
cache lines with the hash table pointers.

rw_lock: A low-level rw-lock implementation based on std::atomic<uint32_t>
where read_trylock() becomes a simple fetch_add(1).
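
A hedged sketch of that read path (simplified; the actual rw_lock has more
states, e.g. for waiting writers):

  #include <atomic>
  #include <cstdint>

  // One 32-bit word: a writer announces itself with a high bit, a reader
  // enters with a single fetch_add(1) and backs out if a writer is there.
  class rw_lock_sketch
  {
    static constexpr uint32_t WRITER= 1U << 31;
    std::atomic<uint32_t> word{0};
  public:
    bool read_trylock()
    {
      if (!(word.fetch_add(1, std::memory_order_acquire) & WRITER))
        return true;
      word.fetch_sub(1, std::memory_order_relaxed);   // back out
      return false;
    }
    void read_unlock() { word.fetch_sub(1, std::memory_order_release); }

    bool write_trylock()
    {
      uint32_t expected= 0;
      return word.compare_exchange_strong(expected, WRITER,
                                          std::memory_order_acquire);
    }
    void write_unlock() { word.fetch_sub(WRITER, std::memory_order_release); }
  };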

buf_pool_t::page_hash_latch: The specialization of rw_lock for the page_hash.

buf_pool_t::page_hash_latch::read_lock(): Assert that buf_pool.mutex
is not being held by the caller.

buf_pool_t::page_hash_latch::write_lock() may be called while not holding
buf_pool.mutex. buf_pool_t::watch_set() is such a caller.

buf_pool_t::page_hash_latch::read_lock_wait(),
page_hash_latch::write_lock_wait(): The spin loops.
These will obey the global parameters innodb_sync_spin_loops and
innodb_sync_spin_wait_delay.

buf_pool_t::freed_page_hash: A singly linked list of copies of
buf_pool.page_hash that ever existed. The fact that we never
free any buf_pool.page_hash.array guarantees that all
page_hash_latch that ever existed will remain valid until shutdown.

buf_pool_t::resize_hash(): Replaces buf_pool_resize_hash().
Prepend a shallow copy of the old page_hash to freed_page_hash.

buf_pool_t::page_hash_table::n_cells: Declare as Atomic_relaxed.

buf_pool_t::page_hash_table::lock(): Explain what prevents a
race condition with buf_pool_t::resize_hash().
2020-06-18 14:16:01 +03:00
Marko Mäkelä
5579c38991 MDEV-22884: Adjust the test for PLUGIN_PERFSCHEMA=NO 2020-06-14 18:58:55 +03:00
Sergei Golubchik
9ed08f3576 MDEV-22884 Assertion `grant_table || grant_table_role' failed on perfschema
when allowing access via perfschema callbacks, update
the cached GRANT_INFO to match
2020-06-13 21:22:07 +02:00
Monty
4102f1589c Aria will now register its transactions
MDEV-22531 Remove maria::implicit_commit()
MDEV-22607 Assertion `ha_info->ht() != binlog_hton' failed in
           MYSQL_BIN_LOG::unlog_xa_prepare

From the handler point of view, Aria now looks like a transactional
engine. One effect of this is that we don't need to call
maria::implicit_commit() anymore.

This change also forces the server to call trans_commit_stmt() after doing
any reads or writes to system tables. This work will also make it easier
to later allow users to have system tables in other engines than Aria.

To handle the case that Aria doesn't support rollback, a new
handlerton flag, HTON_NO_ROLLBACK, was added for engines that have
transactions without rollback (for the moment only binlog and Aria).

Other things
- Moved freeing of MARIA_SHARE to a separate function, as the MARIA_SHARE
  can still be part of a transaction even if the table has been closed.
- Changed Aria checkpoint to use the new MARIA_SHARE free function. This
  fixes a possible memory leak when using S3 tables
- Changed testing of binlog_hton to instead test for HTON_NO_ROLLBACK
- Removed checking of has_transaction_manager() in handler.cc, as we can
  assume that, since the transaction was started by the engine, it does
  support transactions.
- Added new class 'start_new_trans' that can be used to start independent
  sub transactions, for example while reading mysql.proc, using help or
  status tables etc.
- open_system_tables...() and open_proc_table_for_read() no longer take
  an Open_tables_backup list. This is now handled by 'start_new_trans'.
- Split thd::has_transactions() to thd::has_transactions() and
  thd::has_transactions_and_rollback()
- Added handlerton code to free cached transactions objects.
  Needed by InnoDB.

squash! 2ed35999f2a2d84f1c786a21ade5db716b6f1bbc
2020-05-23 12:29:10 +03:00
Sergei Golubchik
13038e4705 Merge branch '10.4' into 10.5 2020-05-09 20:43:36 +02:00
Oleksandr Byelkin
0253ea7f22 MDEV-19650: Privilege bug on MariaDB 10.4
Also fixes:
MDEV-21487: Implement option for mysql_upgrade that allows root@localhost to be replaced
MDEV-21486: Implement option for mysql_install_db that allows root@localhost to be replaced

Add the user mariadb.sys to be the definer of the user view
(it has the rights on the underlying table global_priv needed
for the required operations over global_priv
(SELECT, UPDATE, DELETE)).

Also changed definer of gis functions in case of creation,
but they work with any definer, so the upgrade script does not try
to push this change.
2020-05-07 10:54:56 +02:00
Monty
78357796e8 Added support for VISIBLE attribute for indexes in CREATE TABLE
MDEV-22199 Add VISIBLE attribute for indexes in CREATE TABLE

This was done to make it easier to read in dumps from MySQL 8.0 generated
with MySQL Workbench.
2020-04-19 17:33:51 +03:00
Rasmus Johansson
9075973dbf MDEV-17812 Use MariaDB in error messages instead of MySQL
Changed the wording in error messages from MySQL to MariaDB. In
cases where the word 'server' could be used instead, that was done.

Tests that have these errors recorded were updated.
2020-04-08 06:09:42 +00:00
Nikita Malyavin
259fb1cbed MDEV-16978 Application-time periods: WITHOUT OVERLAPS
* The overlaps check is implemented on a handler level per row command.
  It creates a separate cursor (actually, another handler instance) and
  caches it inside the original handler, when ha_update_row or
  ha_insert_row is issued. Cursor closes on unlocking the handler.

* Containing the same key in the index means a unique constraint violation
  even in the usual sense. So we fetch the left and right neighbours and check
  that they have the same key prefix, excluding from the key only the period part.
  If it doesn't match, then there is no such neighbour, and the check passes.
  Otherwise, we check whether this neighbour intersects with the considered key.

* The check does not introduce a new error and fails with the ER_DUPP_KEY error.
  This might break the REPLACE workflow and should be fixed separately.
2020-03-31 17:42:34 +02:00
Monty
6736f152f4 Added FLUSH THREADS 2020-03-24 21:00:04 +02:00
Sergei Golubchik
02fe997505 fix a nondeterminism in perfschema.statement_program_non_nested test
when selecting from perfschema, filter out statements
used by the test itself in wait_condition.inc, because they,
by design, can be repeated an unpredictable number of times.
2020-03-23 10:45:08 +01:00
Alexander Barkov
e0eacbee77 MDEV-21975 Add BINLOG REPLAY privilege and bind new privileges to gtid_seq_no, pseudo_thread_id, server_id, gtid_domain_id 2020-03-18 20:16:34 +04:00
Sergei Golubchik
b55d960e43 fix a nondeterminism in perfschema.statement_program_nesting_event_check test
when selecting from perfschema, filter out statements
used by the test itself in wait_condition.inc, because they,
by design, can be repeated an unpredictable number of times.
2020-03-16 10:33:38 +01:00
Sergei Golubchik
c4f3e37b0c fix a race condition in the perfschema.transaction_nested_events 2020-03-14 09:51:52 +01:00
Sergei Golubchik
df25d67d5f perfschema test formatting. Use --echo # 2020-03-14 09:51:36 +01:00
Marko Mäkelä
4f4fccecb2 Fix perfschema.statement_program_concurrency 2020-03-11 08:29:48 +02:00
Alexander Barkov
0e04beb28f Re-recording suite/perfschema/r/start_server_low_digest_sql_length.result
Fixing a test failure introduced by MDEV-21743
2020-03-11 00:35:30 +04:00
Alexander Barkov
a1e330de5a MDEV-21743 Split up SUPER privilege to smaller privileges 2020-03-10 23:49:47 +04:00
Sergei Golubchik
7180afa094 fix perfschema for pool-of-threads 2020-03-10 19:24:24 +01:00
Sergei Golubchik
7af733a5a2 perfschema compilation, test and misc fixes 2020-03-10 19:24:23 +01:00
Sergei Golubchik
0ea717f51a P_S 5.7.28 2020-03-10 19:24:22 +01:00
Oleksandr Byelkin
4b087e1754 Merge branch '10.4' into 10.5 2020-02-12 08:55:17 +01:00