1
0
mirror of https://github.com/MariaDB/server.git synced 2025-09-02 09:41:40 +03:00
Commit Graph

14718 Commits

Author SHA1 Message Date
Marko Mäkelä
db006a9a43 MDEV-21452: Remove os_event_t, MUTEX_EVENT, TTASEventMutex, sync_array
We will default to MUTEXTYPE=sys (using OSTrackMutex) for those
ib_mutex_t that have not been replaced yet.

The view INFORMATION_SCHEMA.INNODB_SYS_SEMAPHORE_WAITS is removed.

The parameter innodb_sync_array_size is removed.

FIXME: innodb_fatal_semaphore_wait_threshold will no longer be enforced.
We should enforce it for lock_sys.mutex and dict_sys.mutex somehow!

innodb_sync_debug=ON might still cover ib_mutex_t.
2020-12-15 17:56:17 +02:00
Marko Mäkelä
38fd7b7d91 MDEV-21452: Replace all direct use of os_event_t
Let us replace os_event_t with mysql_cond_t, and replace the
necessary ib_mutex_t with mysql_mutex_t so that they can be
used with condition variables.

Also, let us replace polling (os_thread_sleep() or timed waits)
with plain mysql_cond_wait() wherever possible.

Furthermore, we will use the lightweight srw_mutex for trx_t::mutex,
to hopefully reduce contention on lock_sys.mutex.

FIXME: Add test coverage of
mariabackup --backup --kill-long-queries-timeout
2020-12-15 17:56:17 +02:00
Stepan Patryshev
1c660211bb MDEV-23659 Update Galera disabled.def file 2020-12-14 19:50:42 +02:00
Marko Mäkelä
9ecd766526 Merge 10.5 into 10.6 2020-12-14 18:02:40 +02:00
Marko Mäkelä
f24b738318 MDEV-24313 (2 of 2): Silently ignored innodb_use_native_aio=1
In commit 5e62b6a5e0 (MDEV-16264)
the logic of os_aio_init() was changed so that it will never fail,
but instead automatically disable innodb_use_native_aio (which is
enabled by default) if the io_setup() system call would fail due
to resource limits being exceeded. This is questionable, especially
because falling back to simulated AIO may lead to significantly
reduced performance.

srv_n_file_io_threads, srv_n_read_io_threads, srv_n_write_io_threads:
Change the data type from ulong to uint.

os_aio_init(): Remove the parameters, and actually return an error code.

thread_pool::configure_aio(): Do not silently fall back to simulated AIO.

Reviewed by: Vladislav Vaintroub
2020-12-14 15:27:03 +02:00
Marko Mäkelä
17d3f8560b MDEV-24313 (1 of 2): Hang with innodb_write_io_threads=1
After commit a5a2ef079c (part of MDEV-23855)
implemented asynchronous doublewrite, it is possible that the server will
hang when the following parametes are in effect:

    innodb_doublewrite=1 (default)
    innodb_write_io_threads=1
    innodb_use_native_aio=0

Note: In commit 5e62b6a5e0 (MDEV-16264)
the logic of os_aio_init() was changed so that it will never fail,
but instead automatically disable innodb_use_native_aio (which is
enabled by default) if the io_setup() system call would fail due
to resource limits being exceeded.

Before commit a5a2ef079c, we used
a synchronous write for the doublewrite buffer batches, always at
most 64 pages at a time. So, upon completing a doublewrite batch,
a single thread would submit at most 64 page writes (for the
individual pages that were first written to the doublewrite buffer).
With that commit, we may submit up to 128 page writes at a time.

The maximum number of outstanding requests per thread is 256.
Because the maximum number of asynchronous write submissions per
thread was roughly doubled, it is now possible that
buf_dblwr_t::flush_buffered_writes_completed() will hang in
io_slots::acquire(), called via os_aio() and fil_space_t::io(),
when submitting writes of the individual blocks.

We will prevent this type of hang by increasing the minimum number
of innodb_write_io_threads from 1 to 2, so that this type of hang
would only become possible when 512 outstanding write requests
are exceeded.
2020-12-14 13:11:44 +02:00
Marko Mäkelä
ca821692af Merge 10.5 into 10.6 2020-12-09 14:36:38 +02:00
Marko Mäkelä
5eb539555b MDEV-12227 Defer writes to the InnoDB temporary tablespace
The flushing of the InnoDB temporary tablespace is unnecessarily
tied to the write-ahead redo logging and redo log checkpoints,
which must be tied to the page writes of persistent tablespaces.

Let us simply omit any pages of temporary tables from buf_pool.flush_list.
In this way, log checkpoints will never incur any 'collateral damage' of
writing out unmodified changes for temporary tables.

After this change, pages of the temporary tablespace can only be written
out by buf_flush_lists(n_pages,0) as part of LRU eviction. Hopefully,
most of the time, that code will never be executed, and instead, the
temporary pages will be evicted by buf_release_freed_page() without
ever being written back to the temporary tablespace file.

This should improve the efficiency of the checkpoint flushing and
the buf_flush_page_cleaner thread.

Reviewed by: Vladislav Vaintroub
2020-12-09 09:22:13 +02:00
Sergei Petrunia
6859e80df7 MDEV-24351: S3, same-backend replication: Dropping a table on master...
..causes error on slave.
Cause: if the master doesn't have the frm file for the table,
DROP TABLE code will call ha_delete_table_force() to drop the table
in all available storage engines.
The issue was that this code path didn't check for
HTON_TABLE_MAY_NOT_EXIST_ON_SLAVE flag for the storage engine,
and so did not add "... IF EXISTS" to the statement that's written
to the binary log.  This can cause error on the slave when it tries to
drop a table that's already gone.
2020-12-08 17:58:22 +03:00
Marko Mäkelä
aa0e380568 MDEV-24348 InnoDB shutdown hang with innodb_flush_sync=0
This hang was caused by MDEV-23855, and we failed to fix it in
MDEV-24109 (commit 4cbfdeca84).

When buf_flush_ahead() is invoked soon before server shutdown
and the non-default setting innodb_flush_sync=OFF is in effect
and the buffer pool contains dirty pages of temporary tables,
the page cleaner thread may remain in an infinite loop
without completing its work, thus causing the shutdown to hang.

buf_flush_page_cleaner(): If the buffer pool contains no
unmodified persistent pages, ensure that buf_flush_sync_lsn= 0
will be assigned, so that shutdown will proceed.

The test case is not deterministic. On my system, it reproduced
the hang with 95% probability when running multiple instances
of the test in parallel, and 4% when running single-threaded.

Thanks to Eugene Kosov for debugging and testing this.
2020-12-04 14:11:48 +02:00
Marko Mäkelä
ba2d45dc54 MDEV-24142: Remove INFORMATION_SCHEMA.INNODB_MUTEXES
Let us remove sux_lock::waits and the associated bookkeeping.
Starting with commit 1669c8890c
the PERFORMANCE_SCHEMA instrumentation interface is keeping
track of lock waits.

The view INFORMATION_SCHEMA.INNODB_MUTEXES only exported counts
of rw-lock waits.

Also, SHOW ENGINE INNODB MUTEX will no longer export any information
about rw-locks.
2020-12-03 15:28:53 +02:00
Marko Mäkelä
9702be2c73 MDEV-24142: Remove __FILE__,__LINE__ related to buf_block_t::lock 2020-12-03 15:28:53 +02:00
Marko Mäkelä
03ca6495df MDEV-24142: Replace InnoDB rw_lock_t with sux_lock
InnoDB buffer pool block and index tree latches depend on a
special kind of read-update-write lock that allows reentrant
(recursive) acquisition of the 'update' and 'write' locks
as well as an upgrade from 'update' lock to 'write' lock.
The 'update' lock allows any number of reader locks from
other threads, but no concurrent 'update' or 'write' lock.

If there were no requirement to support an upgrade from 'update'
to 'write', we could compose the lock out of two srw_lock
(implemented as any type of native rw-lock, such as SRWLOCK on
Microsoft Windows). Removing this requirement is very difficult,
so in commit f7e7f487d4b06695f91f6fbeb0396b9d87fc7bbf we
implemented an 'update' mode to our srw_lock.

Re-entrant or recursive locking is mostly needed when writing or
freeing BLOB pages, but also in crash recovery or when merging
buffered changes to an index page. The re-entrancy allows us to
attach a previously acquired page to a sub-mini-transaction that
will be committed before whatever else is holding the page latch.

The SUX lock supports Shared ('read'), Update, and eXclusive ('write')
locking modes. The S latches are not re-entrant, but a single S latch
may be acquired even if the thread already holds an U latch.

The idea of the U latch is to allow a write of something that concurrent
readers do not care about (such as the contents of BTR_SEG_LEAF,
BTR_SEG_TOP and other page allocation metadata structures, or
the MDEV-6076 PAGE_ROOT_AUTO_INC). (The PAGE_ROOT_AUTO_INC field
is only updated when a dict_table_t for the table exists, and only
read when a dict_table_t for the table is being added to dict_sys.)

block_lock::u_lock_try(bool for_io=true) is used in buf_flush_page()
to allow concurrent readers but no concurrent modifications while the
page is being written to the data file. That latch will be released
by buf_page_write_complete() in a different thread. Hence, we use
the special lock owner value FOR_IO.

The index_lock::u_lock() improves concurrency on operations that
involve non-leaf index pages.

The interface has been cleaned up a little. We will use
x_lock_recursive() instead of x_lock() when we know that a
lock is already held by the current thread. Similarly,
a lock upgrade from U to X is only allowed via u_x_upgrade()
or x_lock_upgraded() but not via x_lock().

We will disable the LatchDebug and sync_array interfaces to
InnoDB rw-locks.

The SEMAPHORES section of SHOW ENGINE INNODB STATUS output
will no longer include any information about InnoDB rw-locks,
only TTASEventMutex (cmake -DMUTEXTYPE=event) waits.
This will make a part of the 'innotop' script dead code.

The block_lock buf_block_t::lock will not be covered by any
PERFORMANCE_SCHEMA instrumentation.

SHOW ENGINE INNODB MUTEX and INFORMATION_SCHEMA.INNODB_MUTEXES
will no longer output source code file names or line numbers.
The dict_index_t::lock will be identified by index and table names,
which should be much more useful. PERFORMANCE_SCHEMA is lumping
information about all dict_index_t::lock together as
event_name='wait/synch/sxlock/innodb/index_tree_rw_lock'.

buf_page_free(): Remove the file,line parameters. The sux_lock will
not store such diagnostic information.

buf_block_dbg_add_level(): Define as empty macro, to be removed
in a subsequent commit.

Unless the build was configured with cmake -DPLUGIN_PERFSCHEMA=NO
the index_lock dict_index_t::lock will be instrumented via
PERFORMANCE_SCHEMA. Similar to
commit 1669c8890c
we will distinguish lock waits by registering shared_lock,exclusive_lock
events instead of try_shared_lock,try_exclusive_lock.
Actual 'try' operations will not be instrumented at all.

rw_lock_list: Remove. After MDEV-24167, this only covered
buf_block_t::lock and dict_index_t::lock. We will output their
information by traversing buf_pool or dict_sys.
2020-12-03 15:19:49 +02:00
Marko Mäkelä
3872e585f3 MDEV-24167: Stabilize perfschema.sxlock_func
The extension of the test perfschema.sxlock_func in
commit 1669c8890c
turned out to be unstable.

Let us filter out purge_sys.latch (trx_purge_latch) from the output,
because it might happen that the purge tasks will not be executed
during the test execution.
2020-12-03 15:15:47 +02:00
Marko Mäkelä
1669c8890c MDEV-24167 fixup: Improve the PERFORMANCE_SCHEMA instrumentation
Let us try to avoid code bloat for the common case that
performance_schema is disabled at runtime, and use
ATTRIBUTE_NOINLINE member functions for instrumented latch acquisition.

Also, let us distinguish lock waits from non-contended lock requests
by using write_lock,read_lock for the requests that lead to waits,
and try_write_lock,try_read_lock for the wait-free lock acquisitions.
Actual 'try' operations are not being instrumented at all.
2020-12-03 09:55:53 +02:00
Marko Mäkelä
a13fac9eee Merge 10.5 into 10.6 2020-12-03 08:12:47 +02:00
Marko Mäkelä
9b725f9aef MDEV-20051 fixup: Correct galera.galera_defaults result
For some reason, the test was never adjusted for
commit e6a50e41da.
2020-12-02 21:28:18 +02:00
Marko Mäkelä
6a1e655cb0 Merge 10.4 into 10.5 2020-12-02 18:29:49 +02:00
Marko Mäkelä
589cf8dbf3 Merge 10.3 into 10.4 2020-12-01 19:51:14 +02:00
Marko Mäkelä
e28d9c15c3 MDEV-24167 fixup: Improve perfschema.sxlock_func test 2020-12-01 17:19:53 +02:00
Marko Mäkelä
73f34336e3 MDEV-24323 Crash on recovery after kill during instant ADD COLUMN
row_undo_ins_parse_undo_rec(): Do not try to read non-existing
virtual column information for the metadata record.
2020-12-01 15:24:49 +02:00
Marko Mäkelä
81ab9ea63f Merge 10.2 into 10.3 2020-12-01 14:55:46 +02:00
Vlad Lesin
e6b3e38d62 MDEV-22929 MariaBackup option to report and/or continue when corruption is encountered
The new option --log-innodb-page-corruption is introduced.

When this option is set, backup is not interrupted if innodb corrupted
page is detected. Instead it logs all found corrupted pages in
innodb_corrupted_pages file in backup directory and finishes with error.

For incremental backup corrupted pages are also copied to .delta file,
because we can't do LSN check for such pages during backup,
innodb_corrupted_pages will also be created in incremental backup
directory.

During --prepare, corrupted pages list is read from the file just after
redo log is applied, and each page from the list is checked if it is allocated
in it's tablespace or not. If it is not allocated, then it is zeroed out,
flushed to the tablespace and removed from the list. If all pages are removed
from the list, then --prepare is finished successfully and
innodb_corrupted_pages file is removed from backup directory. Otherwise
--prepare is finished with error message and innodb_corrupted_pages contains
the list of the pages, which are detected as corrupted during backup, and are
allocated in their tablespaces, what means backup directory contains corrupted
innodb pages, and backup can not be considered as consistent.

For incremental --prepare corrupted pages from .delta files are applied
to the base backup, innodb_corrupted_pages is read from both base in
incremental directories, and the same action is proceded for corrupted
pages list as for full --prepare. innodb_corrupted_pages file is
modified or removed only in base directory.

If DDL happens during backup, it is also processed at the end of backup
to have correct tablespace names in innodb_corrupted_pages.
2020-12-01 08:08:57 +03:00
Monty
828471cbf8 MDEV 15532 Assertion `!log->same_pk' failed in row_log_table_apply_delete
The reason for the failure is that
thd->mdl_context.release_transactional_locks()
was called after commit & rollback even in cases where the current
transaction is still active.

For 10.2, 10.3 and 10.4 the fix is simple:
- Replace all calls to thd->mdl_context.release_transactional_locks() with
  thd->release_transactional_locks(). The thd function will only call
  the mdl_context function if there are no active transactional locks.
  In 10.6 we will better fix where we will change the return value for
  some trans_xxx() functions to indicate if transaction did close the
  transaction or not. This will avoid the need of the indirect call.

Other things:
- trans_xa_commit() and trans_xa_rollback() will automatically
  call release_transactional_locks() if the transaction is closed.
- We can't do that for the other functions as the caller of many of these
  are doing additional work (like close_thread_tables) before calling
  release_transactional_locks().
- Added missing abort_result_set() and missing DBUG_RETURN in
  select_create::send_eof()
- Fixed wrong indentation in injector::transaction::commit()
2020-11-30 22:21:43 +02:00
Monty
c537576495 Fixed maria.create test 2020-11-30 19:57:50 +02:00
Monty
6261b1f45e Fixed maria.create test 2020-11-30 19:49:06 +02:00
Anel Husakovic
1ccd1daaff MDEV-24289: show grants missing with grant option
Reviewed by:serg@mariadb.com
2020-11-26 18:10:40 +01:00
Marko Mäkelä
edbde4a11f MDEV-24167: Replace fil_space::latch
We must avoid acquiring a latch while we are already holding one.
The tablespace latch was being acquired recursively in some
operations that allocate or free pages.
2020-11-24 15:43:12 +02:00
Marko Mäkelä
bdd88cfa34 MDEV-24167: Replace fts_cache_rw_lock, fts_cache_init_rw_lock with mutex
fts_cache_t::init_lock: Replace with mutex. This was only acquired
in exclusive mode.

fts_cache_t:🔒 Replace with mutex. The only read-lock user was
i_s_fts_index_cache_fill() for producing content for the view
INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE.
2020-11-24 15:43:12 +02:00
Marko Mäkelä
1a1b7a6f16 MDEV-24167: Replace dict_operation_lock (dict_sys.latch) 2020-11-24 15:43:11 +02:00
Marko Mäkelä
06ef5509d0 MDEV-24167: Replace trx_purge_latch 2020-11-24 15:43:10 +02:00
Marko Mäkelä
63dd2a97e4 MDEV-24167: Replace trx_i_s_cache_lock 2020-11-24 15:43:09 +02:00
Marko Mäkelä
c561f9e6e8 MDEV-24167: Use lightweight srw_lock for btr_search_latch
Many InnoDB rw-locks unnecessarily depend on the complex
InnoDB rw_lock_t implementation that support the SX lock mode
as well as recursive acquisition of X or SX locks.
One of them is the bunch of adaptive hash index search latches,
instrumented as btr_search_latch in PERFORMANCE_SCHEMA.
Let us introduce a simpler lock for those in order to
reduce overhead.

srw_lock: A simple read-write lock that does not support recursion.
On Microsoft Windows, this wraps SRWLOCK, only adding
runtime overhead if PERFORMANCE_SCHEMA is enabled.
On Linux (all architectures), this is implemented with
std::atomic<uint32_t> and the futex system call.
On other platforms, we will wrap mysql_rwlock_t with
zero runtime overhead.

The PERFORMANCE_SCHEMA instrumentation differs
from InnoDB rw_lock_t in that we will only invoke
PSI_RWLOCK_CALL(start_rwlock_wrwait) or
PSI_RWLOCK_CALL(start_rwlock_rdwait)
if there is an actual conflict.
2020-11-24 15:41:03 +02:00
Marko Mäkelä
a62a675fd2 Merge 10.5 into 10.6 2020-11-23 17:57:58 +02:00
Sujatha
7effcb8ed6 MDEV-23846: O_TMPFILE error in mysqlbinlog stream output breaks restore
Problem:
========
When O_TMPFILE is not supported mysqlbinlog outputs the error to standard
stream as a warning which breaks PITR:

ERROR 1064 (42000) at line 382: You have an error in your SQL syntax; check
the manual that corresponds to your MariaDB server version for the right
syntax to use near 'mysqlbinlog: O_TMPFILE is not supported on /tmp (disabling
    future attempts)

Analysis:
=========
'mysqlbinlog' utility is used to perform point-in-time-recovery based on binary
log. It converts the events in the binary log files, from binary format to text
so that they can be viewed or applied. This output can be saved to a file and
it can be sourced back to mysql client. The mysqlbinlog utility stores the
text output into IO_CACHE and when it is full the data is written to a temp
file.  The temporary file creation is attempted using 'O_TMPFILE' flag. If the
underlying filesystem doesn't support this operation, a note is printed on to
standard error and file creation is done without O_TMPFILE' flag. If standard
error is redirected to standard output the note gets written to the sql file
as shown below.

/bld/client/mysqlbinlog: O_TMPFILE is not supported on /tmp (disabling future
    attempts)
table id 32

When the sql file is used for PITR, it leads to a syntax error as it is not a
valid sql command.

Fix:
====
Make 'my_message_stderr' to ignore messages which are flagged as ME_NOTE and
ME_ERROR_LOG_ONLY. ME_ERROR_LOG_ONLY flag is applicable to server. In order to
print an informational note to stderr stream, ME_NOTE flag without
ME_ERROR_LOG_ONLY flag should be specified. 'my_message_stderr' should print
messages flagged with ME_WARNING or ME_FATAL to stderr stream.
2020-11-23 12:16:45 +05:30
Jan Lindström
fe56e0e342 MDEV-22136 : wsrep_restart_slave = 1 does not always work
Add test case
2020-11-23 07:39:23 +02:00
Marko Mäkelä
1e5d989d2a MDEV-24167: Remove PFS instrumentation of buf_block_t
We always defined PFS_SKIP_BUFFER_MUTEX_RWLOCK, that is,
the latches of the buffer pool blocks were never instrumented
in PERFORMANCE_SCHEMA.

For some reason, the debug_latch (which enforce proper usage of
buffer-fixing in debug builds) was instrumented.
2020-11-20 08:55:41 +02:00
Marko Mäkelä
3c8ecb5bbd Run innodb_wl6326_big only in debug builds
The test seems to deterministically fail on RelWithDebInfo builds
due to a timeout in wait_condition.inc.

According to Matthias Leich (the original author of the test),
the failure rate would reduce if we disabled the purge of
transaction history by setting innodb_force_recovery=2.

For now, let us run this stress test on debug builds only.
2020-11-20 08:06:24 +02:00
Jan Lindström
031e1427ed MDEV-23659: Update Galera disabled.def file
Removed
* lp1376747-4
* MDEV-16509
* galera_defaults
2020-11-19 12:42:54 +02:00
Jan Lindström
6f50f51e60 MDEV-21494 : Galera test sporadic failure on galera.galera_defaults
Make sure that we operate with correct Galera library version
and do not print wsrep_provider_options field.
2020-11-19 12:42:54 +02:00
Daniele Sciascia
60035bd2f1 Make test galera_parallel_apply_3nodes deterministic
Test galera_parallel_apply_3nodes started to failed occasionally.
The test assumes that one round of autocommit retry is sufficient in
order to avoid a deadlock error when two conflicting UPDATE statements
run concurrently.
This assumption no longer holds after galera library has changed
last_committed() to return the seqno of the last transaction that left
apply monitor, rather than commit monitor. So it is possible that
after a BF abort, a command is re-executed before it's BF abortee has
left the apply monitor. Thus causing another retry or a deadlock error.

Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
2020-11-19 12:42:54 +02:00
Jan Lindström
3897ce2396 MDEV-21523 : galera.MDEV-16509 MTR failed: timeout after 900 seconds: Can't connect to local MySQL server
Add requirement for debug-build.
2020-11-19 12:42:54 +02:00
Jan Lindström
b04be43ead MDEV-24165 : Galera test failure on galera_var_ignore_apply_errors
Add wait conditions.
2020-11-19 12:42:54 +02:00
Marko Mäkelä
33d41167c5 MDEV-24224 Gap lock on delete in 10.5 using READ COMMITTED
When MDEV-19544 (commit 1a6f470464)
simplified the initialization of the local variable
set_also_gap_locks, an inadvertent change was included.
Essentially, all code branches that are executed when
set_also_gap_locks hold must also ensure that
trx->isolation_level > TRX_ISO_READ_COMMITTED holds.
This was being violated in a few code paths.

It turns out that there is an even simpler fix: Remove the test
of thd_is_select() completely. In that way, the first part of
UPDATE or DELETE should work exactly like SELECT...FOR UPDATE.

thd_is_select(): Remove.
2020-11-18 10:27:18 +02:00
Oleksandr Byelkin
c3d4e57182 MDEV-19162 anonymous derived tables part
The idea of Alexander Barkov: to add name derived as a slect id.
2020-11-17 20:11:39 +01:00
Marko Mäkelä
fabdad6857 Merge 10.4 into 10.5 2020-11-17 18:15:13 +02:00
Marko Mäkelä
bbf0b55c17 Work around MDEV-24232: Skip perfschema.nesting if WITH_WSREP=OFF 2020-11-17 18:13:47 +02:00
Marko Mäkelä
f0c9903795 Merge 10.3 into 10.4 2020-11-17 13:56:20 +02:00
Daniele Sciascia
694926a4f7 Fix suppression in MTR test galera_3nodes.inconsistency_shutdown 2020-11-17 13:00:44 +02:00
Jan Lindström
ab4f743610 MDEV-24169 : Galera test failure on galera_rsu_simple
Add wait_condition.
2020-11-17 08:52:47 +02:00