1
0
mirror of https://github.com/MariaDB/server.git synced 2025-08-30 11:22:14 +03:00
Commit Graph

516 Commits

Author SHA1 Message Date
Vladislav Vaintroub
f0fa40efad MDEV-25785 Add support for OpenSSL 3.0
Summary of changes

- MD_CTX_SIZE is increased

- EVP_CIPHER_CTX_buf_noconst(ctx) does not work anymore, points
  to nobody knows where. The assumption made previously was that
  (since the function does not seem to be documented)
  was that it points to the last partial source block.
  Add own partial block buffer for NOPAD encryption instead

- SECLEVEL in CipherString in openssl.cnf
  had been downgraded to 0, from 1, to make TLSv1.0 and TLSv1.1 possible
   (according to https://github.com/openssl/openssl/blob/openssl-3.0.0/NEWS.md
   even though the manual for SSL_CTX_get_security_level claims that it
   should not be necessary)

- Workaround Ssl_cipher_list issue, it now returns TLSv1.3 ciphers,
  in addition to what was set in --ssl-cipher

- ctx_buf buffer now must be aligned to 16 bytes with openssl(
  previously with WolfSSL only), ot crashes will happen

- updated aes-t , to be better debuggable
  using function, rather than a huge multiline macro
  added test that does "nopad" encryption piece-wise, to test
  replacement of EVP_CIPHER_CTX_buf_noconst

part of MDEV-28133
2022-05-23 15:27:51 +02:00
Marko Mäkelä
92f79a22e6 Merge 10.5 into 10.6 2022-02-22 12:12:49 +02:00
Vlad Lesin
a112a80b47 Merge 10.4 into 10.5 2022-02-22 10:35:16 +03:00
Vlad Lesin
f6f055a191 Merge 10.3 into 10.4 2022-02-21 14:10:27 +03:00
Nayuta Yanagisawa
66f55a018b MDEV-27730 Add PLUGIN_VAR_DEPRECATED flag to plugin variables
The sys_var class has the deprecation_substitute member to mark the
deprecated variables. As it's set, the server produces warnings when
these variables are used. However, the plugin has no means to utilize
that functionality.

So, the PLUGIN_VAR_DEPRECATED flag is introduced to set the
deprecation_substitute with the empty string. A non-empty string can
make the warning more informative, but there's no nice way seen to
specify it, and not that needed at the moment.
2022-02-18 13:10:20 +09:00
Oleksandr Byelkin
f5c5f8e41e Merge branch '10.5' into 10.6 2022-02-03 17:01:31 +01:00
Oleksandr Byelkin
cf63eecef4 Merge branch '10.4' into 10.5 2022-02-01 20:33:04 +01:00
Oleksandr Byelkin
880d543554 Merge branch 'merge-perfschema-5.7' into 10.5 2022-01-28 11:57:52 +01:00
Sergei Golubchik
7b555ff2c5 MDEV-27341 Use SET PASSWORD to change PAM service
SET PASSWORD = PASSWORD('foo') would fail for pam plugin with

ERROR HY000: SET PASSWORD is ignored for users authenticating via pam plugin

but SET PASSWORD = 'foo' would not.

Now it will.
2022-01-17 18:19:29 +01:00
Marko Mäkelä
3f5726768f Merge 10.5 into 10.6 2022-01-04 09:26:38 +02:00
Julius Goryavsky
55bb933a88 Merge branch 10.4 into 10.5 2021-12-26 12:51:04 +01:00
sjaakola
c1846c4fcf MDEV-26803 PA unsafety with FK cascade delete operation
This commit has a mtr test where two two transactions delete a row from
two separate tables, which will cascade a FK delete for the same row in
a third table. Second replica node is configured with 2 applier threads,
and the test will fail if these two transactions are applied in parallel.

The actual fix, in this commit, is to mark a transaction as unsafe for
parallel applying when it traverses into cascade delete operation.

Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
2021-12-17 09:38:23 +02:00
Marko Mäkelä
e94172c2a0 Merge 10.5 into 10.6 2021-08-31 11:00:41 +03:00
Marko Mäkelä
e62120cec7 Merge 10.4 into 10.5 2021-08-31 10:04:56 +03:00
Marko Mäkelä
0464761126 Merge 10.3 into 10.4 2021-08-31 09:22:21 +03:00
Marko Mäkelä
e835cc851e Merge 10.2 into 10.3 2021-08-31 08:36:59 +03:00
Marko Mäkelä
fda704c82c Fix GCC 11 -Wmaybe-uninitialized for PLUGIN_PERFSCHEMA
init_mutex_v1_t: Stop lying that the mutex parameter is const.
GCC 11.2.0 assumes that it is and could complain about any mysql_mutex_t
being uninitialized even after mysql_mutex_init() as long as
PLUGIN_PERFSCHEMA is enabled.

init_rwlock_v1_t, init_cond_v1_t: Remove untruthful const qualifiers.

Note: init_socket_v1_t is expecting that the socket fd has already
been created before PSI_SOCKET_CALL(init_socket), and therefore that
parameter really is being treated as a pointer to const.
2021-08-30 11:52:59 +03:00
Oleksandr Byelkin
6efb5e9f5e Merge branch '10.5' into 10.6 2021-08-02 10:11:41 +02:00
Oleksandr Byelkin
ae6bdc6769 Merge branch '10.4' into 10.5 2021-07-31 23:19:51 +02:00
Michael Okoko
6cd3588f0e Improve documentation of json parser functions
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2021-07-22 21:51:49 +03:00
Marko Mäkelä
4dfec8b230 Merge 10.5 into 10.6 2021-06-21 17:49:33 +03:00
Marko Mäkelä
a42c80bd48 Merge 10.4 into 10.5 2021-06-21 14:22:22 +03:00
Monty
af33202af7 Added DDL_options_st *thd_ddl_options(const MYSQL_THD thd)
This is used by InnoDB to detect if CREATE...SELECT is used

Other things:
- Changed InnoDB to use thd_ddl_options()
- Removed lock checking code for create...select (Approved by Marko)
2021-06-14 17:03:19 +03:00
Vladislav Vaintroub
b81803f065 MDEV-22221: MariaDB with WolfSSL doesn't support AES-GCM cipher for SSL
Enable AES-GCM for SSL (only).

AES-GCM for encryption plugins remains disabled (aes-t fails, on some bug
in GCM or CTR padding)
2021-06-09 15:44:55 +02:00
Sergei Golubchik
0b116d160a 5.7.34 2021-05-03 11:22:07 +02:00
Daniel Black
460d480c74 MDEV-5536: add systemd socket activation
Systemd has a socket activation feature where a mariadb.socket
definition defines the sockets to listen to, and passes those
file descriptors directly to mariadbd to use when a connection
occurs.

The new functionality is utilized when starting as follows:

  systemctl start mariadb.socket

The mariadb.socket definition only needs to contain the network
information, ListenStream= directives, the mariadb.service
definition is still used for service instigation.

When mariadbd is started in this way, the socket, port, bind-address
backlog are all assumed to be self contained in the mariadb.socket
definition and as such the mariadb settings and command line
arguments of these network settings are ignored.
See man systemd.socket for how to limit this to specific ports.

Extra ports, those specified with extra_port in socket activation
mode, are those with a FileDescriptorName=extra. These need
to be in a separate service name like mariadb-extra.socket and
these require a Service={mariadb.service} directive to map to the
original service. Extra ports need systemd v227 or greater
(not RHEL/Centos7 - v219) when FileDescriptorName= was added,
otherwise the extra ports are treated like ordinary ports.

The number of sockets isn't limited when using systemd socket activation
(except by operating system limits on file descriptors and a minimal
amount of memory used per file descriptor). The systemd sockets passed
can include any ownership or permissions, including those the
mariadbd process wouldn't normally have the permission to create.

This implementation is compatible with mariadb.service definitions.
Those services started with:

  systemctl start mariadb.service

does actually start the mariadb.service and used all the my.cnf
settings of sockets and ports like it previously did.
2021-03-28 13:53:55 +11:00
Marko Mäkelä
00528a0445 Merge 10.5 into 10.6 2021-03-19 13:35:18 +02:00
Marko Mäkelä
be881ec457 Merge 10.4 into 10.5 2021-03-19 13:09:21 +02:00
Marko Mäkelä
44d70c01f0 Merge 10.3 into 10.4 2021-03-19 11:42:44 +02:00
Marko Mäkelä
19052b6deb Merge 10.2 into 10.3 2021-03-18 12:34:48 +02:00
Julius Goryavsky
7345d37141 MDEV-24853: Duplicate key generated during cluster configuration change
Incorrect processing of an auto-incrementing field in the
WSREP-related code during applying transactions results in
a duplicate key being created. This is due to the fact that
at the beginning of the write_row() and update_row() functions,
the values of the auto-increment parameters are used, which
are read from the parameters of the current thread, but further
along the code other values are used, which are read from global
variables (when applying a transaction). This can happen when
the cluster configuration has changed while applying a transaction
(for example in the high_priority_service mode for Galera 4).
Further during IST processing duplicating key is detected, and
processing of the DB_DUPLICATE_KEY return code (inside innodb,
in the write_row() handler) results in a call to the
wsrep_thd_self_abort() function.
2021-03-08 11:15:08 +01:00
Rinat Ibragimov
b3abcf80a1 MDEV-6536: make --bind=hostname to listen on both IPv6 and IPv4 addresses
Binding to a hostname now makes MariaDB server to listen on all addresses
that hostname resolves to.

Rebased to 10.6 by Daniel Black

Closes: #1668
2021-03-05 08:25:52 +11:00
Marko Mäkelä
80ac9ec1cc MDEV-24973 Performance schema duplicates rarely executed code for mutex operations
The PERFORMANCE_SCHEMA wrapper for mutex and rw-lock operations is
causing a lot of unlikely code to be inlined in each invocation.
The impact of this may have been emphasized in MariaDB 10.6, because
InnoDB now uses the common implementation of mutexes and condition
variables (MDEV-21452).

By default, we build with cmake -DPLUGIN_PERFSCHEMA enabled,
but at runtime no instrumentation will be enabled. Similar to
commit eba2d10ac5
we had better avoid inlining the rarely executed code in order to reduce
the code size and to improve the efficiency of the instruction cache.

This change was extensively tested by Axel Schwenke with and without
--enable-performance-schema (with no individual instruments enabled).
Removing the inline functions did not cause any performance regression
in either case. There seemed to be a tiny improvement, possibly due
to reduced code size and better instruction cache hit rate.
2021-03-02 14:32:37 +02:00
Sergei Golubchik
25d9d2e37f Merge branch 'bb-10.4-release' into bb-10.5-release 2021-02-15 16:43:15 +01:00
Sergei Golubchik
00a313ecf3 Merge branch 'bb-10.3-release' into bb-10.4-release
Note, the fix for "MDEV-23328 Server hang due to Galera lock conflict resolution"
was null-merged. 10.4 version of the fix is coming up separately
2021-02-12 17:44:22 +01:00
Monty
bd5ac03896 Make maria_data_root const char*
This allow one to remove some casts like:
maria_data_root= (char *)".";

It also removes warnings from icc.
2021-02-08 12:16:29 +02:00
Sergei Golubchik
2676c9aad7 galera fixes related to THD::LOCK_thd_kill
Since 2017 (c2118a08b1) THD::awake() no longer requires LOCK_thd_data.
It uses LOCK_thd_kill, and this latter mutex is used to prevent
a thread of dying, not LOCK_thd_data as before.
2021-02-02 10:02:17 +01:00
Oleksandr Byelkin
02e7bff882 Merge commit '10.4' into 10.5 2021-01-06 10:53:00 +01:00
Oleksandr Byelkin
478b83032b Merge branch '10.3' into 10.4 2020-12-25 09:13:28 +01:00
Oleksandr Byelkin
25561435e0 Merge branch '10.2' into 10.3 2020-12-23 19:28:02 +01:00
Etienne Guesnet
2c7247622a AIX workaround for GCC TOC bug 2020-12-16 08:07:04 +11:00
Sergei Golubchik
e189faf0b3 document that a fulltext parser plugin can replace mysql_add_word callback 2020-12-10 08:45:20 +01:00
Marko Mäkelä
7cffb5f6e8 MDEV-23399: Performance regression with write workloads
The buffer pool refactoring in MDEV-15053 and MDEV-22871 shifted
the performance bottleneck to the page flushing.

The configuration parameters will be changed as follows:

innodb_lru_flush_size=32 (new: how many pages to flush on LRU eviction)
innodb_lru_scan_depth=1536 (old: 1024)
innodb_max_dirty_pages_pct=90 (old: 75)
innodb_max_dirty_pages_pct_lwm=75 (old: 0)

Note: The parameter innodb_lru_scan_depth will only affect LRU
eviction of buffer pool pages when a new page is being allocated. The
page cleaner thread will no longer evict any pages. It used to
guarantee that some pages will remain free in the buffer pool. Now, we
perform that eviction 'on demand' in buf_LRU_get_free_block().
The parameter innodb_lru_scan_depth(srv_LRU_scan_depth) is used as follows:
 * When the buffer pool is being shrunk in buf_pool_t::withdraw_blocks()
 * As a buf_pool.free limit in buf_LRU_list_batch() for terminating
   the flushing that is initiated e.g., by buf_LRU_get_free_block()
The parameter also used to serve as an initial limit for unzip_LRU
eviction (evicting uncompressed page frames while retaining
ROW_FORMAT=COMPRESSED pages), but now we will use a hard-coded limit
of 100 or unlimited for invoking buf_LRU_scan_and_free_block().

The status variables will be changed as follows:

innodb_buffer_pool_pages_flushed: This includes also the count of
innodb_buffer_pool_pages_LRU_flushed and should work reliably,
updated one by one in buf_flush_page() to give more real-time
statistics. The function buf_flush_stats(), which we are removing,
was not called in every code path. For both counters, we will use
regular variables that are incremented in a critical section of
buf_pool.mutex. Note that show_innodb_vars() directly links to the
variables, and reads of the counters will *not* be protected by
buf_pool.mutex, so you cannot get a consistent snapshot of both variables.

The following INFORMATION_SCHEMA.INNODB_METRICS counters will be
removed, because the page cleaner no longer deals with writing or
evicting least recently used pages, and because the single-page writes
have been removed:
* buffer_LRU_batch_flush_avg_time_slot
* buffer_LRU_batch_flush_avg_time_thread
* buffer_LRU_batch_flush_avg_time_est
* buffer_LRU_batch_flush_avg_pass
* buffer_LRU_single_flush_scanned
* buffer_LRU_single_flush_num_scan
* buffer_LRU_single_flush_scanned_per_call

When moving to a single buffer pool instance in MDEV-15058, we missed
some opportunity to simplify the buf_flush_page_cleaner thread. It was
unnecessarily using a mutex and some complex data structures, even
though we always have a single page cleaner thread.

Furthermore, the buf_flush_page_cleaner thread had separate 'recovery'
and 'shutdown' modes where it was waiting to be triggered by some
other thread, adding unnecessary latency and potential for hangs in
relatively rarely executed startup or shutdown code.

The page cleaner was also running two kinds of batches in an
interleaved fashion: "LRU flush" (writing out some least recently used
pages and evicting them on write completion) and the normal batches
that aim to increase the MIN(oldest_modification) in the buffer pool,
to help the log checkpoint advance.

The buf_pool.flush_list flushing was being blocked by
buf_block_t::lock for no good reason. Furthermore, if the FIL_PAGE_LSN
of a page is ahead of log_sys.get_flushed_lsn(), that is, what has
been persistently written to the redo log, we would trigger a log
flush and then resume the page flushing. This would unnecessarily
limit the performance of the page cleaner thread and trigger the
infamous messages "InnoDB: page_cleaner: 1000ms intended loop took 4450ms.
The settings might not be optimal" that were suppressed in
commit d1ab89037a unless log_warnings>2.

Our revised algorithm will make log_sys.get_flushed_lsn() advance at
the start of buf_flush_lists(), and then execute a 'best effort' to
write out all pages. The flush batches will skip pages that were modified
since the log was written, or are are currently exclusively locked.
The MDEV-13670 message "page_cleaner: 1000ms intended loop took" message
will be removed, because by design, the buf_flush_page_cleaner() should
not be blocked during a batch for extended periods of time.

We will remove the single-page flushing altogether. Related to this,
the debug parameter innodb_doublewrite_batch_size will be removed,
because all of the doublewrite buffer will be used for flushing
batches. If a page needs to be evicted from the buffer pool and all
100 least recently used pages in the buffer pool have unflushed
changes, buf_LRU_get_free_block() will execute buf_flush_lists() to
write out and evict innodb_lru_flush_size pages. At most one thread
will execute buf_flush_lists() in buf_LRU_get_free_block(); other
threads will wait for that LRU flushing batch to finish.

To improve concurrency, we will replace the InnoDB ib_mutex_t and
os_event_t native mutexes and condition variables in this area of code.
Most notably, this means that the buffer pool mutex (buf_pool.mutex)
is no longer instrumented via any InnoDB interfaces. It will continue
to be instrumented via PERFORMANCE_SCHEMA.

For now, both buf_pool.flush_list_mutex and buf_pool.mutex will be
declared with MY_MUTEX_INIT_FAST (PTHREAD_MUTEX_ADAPTIVE_NP). The critical
sections of buf_pool.flush_list_mutex should be shorter than those for
buf_pool.mutex, because in the worst case, they cover a linear scan of
buf_pool.flush_list, while the worst case of a critical section of
buf_pool.mutex covers a linear scan of the potentially much longer
buf_pool.LRU list.

mysql_mutex_is_owner(), safe_mutex_is_owner(): New predicate, usable
with SAFE_MUTEX. Some InnoDB debug assertions need this predicate
instead of mysql_mutex_assert_owner() or mysql_mutex_assert_not_owner().

buf_pool_t::n_flush_LRU, buf_pool_t::n_flush_list:
Replaces buf_pool_t::init_flush[] and buf_pool_t::n_flush[].
The number of active flush operations.

buf_pool_t::mutex, buf_pool_t::flush_list_mutex: Use mysql_mutex_t
instead of ib_mutex_t, to have native mutexes with PERFORMANCE_SCHEMA
and SAFE_MUTEX instrumentation.

buf_pool_t::done_flush_LRU: Condition variable for !n_flush_LRU.

buf_pool_t::done_flush_list: Condition variable for !n_flush_list.

buf_pool_t::do_flush_list: Condition variable to wake up the
buf_flush_page_cleaner when a log checkpoint needs to be written
or the server is being shut down. Replaces buf_flush_event.
We will keep using timed waits (the page cleaner thread will wake
_at least_ once per second), because the calculations for
innodb_adaptive_flushing depend on fixed time intervals.

buf_dblwr: Allocate statically, and move all code to member functions.
Use a native mutex and condition variable. Remove code to deal with
single-page flushing.

buf_dblwr_check_block(): Make the check debug-only. We were spending
a significant amount of execution time in page_simple_validate_new().

flush_counters_t::unzip_LRU_evicted: Remove.

IORequest: Make more members const. FIXME: m_fil_node should be removed.

buf_flush_sync_lsn: Protect by std::atomic, not page_cleaner.mutex
(which we are removing).

page_cleaner_slot_t, page_cleaner_t: Remove many redundant members.

pc_request_flush_slot(): Replaces pc_request() and pc_flush_slot().

recv_writer_thread: Remove. Recovery works just fine without it, if we
simply invoke buf_flush_sync() at the end of each batch in
recv_sys_t::apply().

recv_recovery_from_checkpoint_finish(): Remove. We can simply call
recv_sys.debug_free() directly.

srv_started_redo: Replaces srv_start_state.

SRV_SHUTDOWN_FLUSH_PHASE: Remove. logs_empty_and_mark_files_at_shutdown()
can communicate with the normal page cleaner loop via the new function
flush_buffer_pool().

buf_flush_remove(): Assert that the calling thread is holding
buf_pool.flush_list_mutex. This removes unnecessary mutex operations
from buf_flush_remove_pages() and buf_flush_dirty_pages(),
which replace buf_LRU_flush_or_remove_pages().

buf_flush_lists(): Renamed from buf_flush_batch(), with simplified
interface. Return the number of flushed pages. Clarified comments and
renamed min_n to max_n. Identify LRU batch by lsn=0. Merge all the functions
buf_flush_start(), buf_flush_batch(), buf_flush_end() directly to this
function, which was their only caller, and remove 2 unnecessary
buf_pool.mutex release/re-acquisition that we used to perform around
the buf_flush_batch() call. At the start, if not all log has been
durably written, wait for a background task to do it, or start a new
task to do it. This allows the log write to run concurrently with our
page flushing batch. Any pages that were skipped due to too recent
FIL_PAGE_LSN or due to them being latched by a writer should be flushed
during the next batch, unless there are further modifications to those
pages. It is possible that a page that we must flush due to small
oldest_modification also carries a recent FIL_PAGE_LSN or is being
constantly modified. In the worst case, all writers would then end up
waiting in log_free_check() to allow the flushing and the checkpoint
to complete.

buf_do_flush_list_batch(): Clarify comments, and rename min_n to max_n.
Cache the last looked up tablespace. If neighbor flushing is not applicable,
invoke buf_flush_page() directly, avoiding a page lookup in between.

buf_flush_space(): Auxiliary function to look up a tablespace for
page flushing.

buf_flush_page(): Defer the computation of space->full_crc32(). Never
call log_write_up_to(), but instead skip persistent pages whose latest
modification (FIL_PAGE_LSN) is newer than the redo log. Also skip
pages on which we cannot acquire a shared latch without waiting.

buf_flush_try_neighbors(): Do not bother checking buf_fix_count
because buf_flush_page() will no longer wait for the page latch.
Take the tablespace as a parameter, and only execute this function
when innodb_flush_neighbors>0. Avoid repeated calls of page_id_t::fold().

buf_flush_relocate_on_flush_list(): Declare as cold, and push down
a condition from the callers.

buf_flush_check_neighbor(): Take id.fold() as a parameter.

buf_flush_sync(): Ensure that the buf_pool.flush_list is empty,
because the flushing batch will skip pages whose modifications have
not yet been written to the log or were latched for modification.

buf_free_from_unzip_LRU_list_batch(): Remove redundant local variables.

buf_flush_LRU_list_batch(): Let the caller buf_do_LRU_batch() initialize
the counters, and report n->evicted.
Cache the last looked up tablespace. If neighbor flushing is not applicable,
invoke buf_flush_page() directly, avoiding a page lookup in between.

buf_do_LRU_batch(): Return the number of pages flushed.

buf_LRU_free_page(): Only release and re-acquire buf_pool.mutex if
adaptive hash index entries are pointing to the block.

buf_LRU_get_free_block(): Do not wake up the page cleaner, because it
will no longer perform any useful work for us, and we do not want it
to compete for I/O while buf_flush_lists(innodb_lru_flush_size, 0)
writes out and evicts at most innodb_lru_flush_size pages. (The
function buf_do_LRU_batch() may complete after writing fewer pages if
more than innodb_lru_scan_depth pages end up in buf_pool.free list.)
Eliminate some mutex release-acquire cycles, and wait for the LRU
flush batch to complete before rescanning.

buf_LRU_check_size_of_non_data_objects(): Simplify the code.

buf_page_write_complete(): Remove the parameter evict, and always
evict pages that were part of an LRU flush.

buf_page_create(): Take a pre-allocated page as a parameter.

buf_pool_t::free_block(): Free a pre-allocated block.

recv_sys_t::recover_low(), recv_sys_t::apply(): Preallocate the block
while not holding recv_sys.mutex. During page allocation, we may
initiate a page flush, which in turn may initiate a log flush, which
would require acquiring log_sys.mutex, which should always be acquired
before recv_sys.mutex in order to avoid deadlocks. Therefore, we must
not be holding recv_sys.mutex while allocating a buffer pool block.

BtrBulk::logFreeCheck(): Skip a redundant condition.

row_undo_step(): Do not invoke srv_inc_activity_count() for every row
that is being rolled back. It should suffice to invoke the function in
trx_flush_log_if_needed() during trx_t::commit_in_memory() when the
rollback completes.

sync_check_enable(): Remove. We will enable innodb_sync_debug from the
very beginning.

Reviewed by: Vladislav Vaintroub
2020-10-15 17:04:56 +03:00
Marko Mäkelä
882ce206db Merge 10.4 into 10.5 2020-09-23 11:32:43 +03:00
Marko Mäkelä
3a423088ac Merge 10.3 into 10.4 2020-09-21 12:29:00 +03:00
Marko Mäkelä
cbcb4ecabb Merge 10.2 into 10.3 2020-09-21 11:04:04 +03:00
Jan Lindström
224c950462 MDEV-23101 : SIGSEGV in lock_rec_unlock() when Galera is enabled
Remove incorrect BF (brute force) handling from lock_rec_has_to_wait_in_queue
and move condition to correct callers. Add a function to report
BF lock waits and assert if incorrect BF-BF lock wait happens.

wsrep_report_bf_lock_wait
	Add a new function to report BF lock wait.

wsrep_assert_no_bf_bf_wait
	Add a new function to check do we have a
	BF-BF wait and if we have report this case
	and assert as it is a bug.

lock_rec_has_to_wait
	Use new wsrep_assert_bf_wait to check BF-BF wait.

lock_rec_create_low
lock_table_create
	Use new function to report BF lock waits.

lock_rec_insert_by_trx_age
lock_grant_and_move_on_page
lock_grant_and_move_on_rec
	Assert that trx is not Galera as VATS is not compatible
	with Galera.

lock_rec_add_to_queue
	If there is conflicting lock in a queue make sure that
	transaction is BF.

lock_rec_has_to_wait_in_queue
	Remove incorrect BF handling. If there is conflicting
	locks in a queue all transactions must wait.

lock_rec_dequeue_from_page
lock_rec_unlock
	If there is conflicting lock make sure it is not
	BF-BF case.

lock_rec_queue_validate
	Add Galera record locking rules comment and use
	new function to report BF lock waits.

All attempts to reproduce the original assertion have been
failed. Therefore, there is no test case on this commit.
2020-09-10 13:18:12 +03:00
sjaakola
7bffe468b2 MDEV-21910 Deadlock between BF abort and manual KILL command
When high priority replication slave applier encounters lock conflict in innodb,
it will force the conflicting lock holder transaction (victim) to rollback.
This is a must in multi-master sychronous replication model to avoid cluster lock-up.
This high priority victim abort (aka "brute force" (BF) abort), is started
from innodb lock manager while holding the victim's transaction's (trx) mutex.
Depending on the execution state of the victim transaction, it may happen that the
BF abort will call for THD::awake() to wake up the victim transaction for the rollback.
Now, if BF abort requires THD::awake() to be called, then the applier thread executed
locking protocol of: victim trx mutex -> victim THD::LOCK_thd_data

If, at the same time another DBMS super user issues KILL command to abort the same victim,
it will execute locking protocol of: victim THD::LOCK_thd_data  -> victim trx mutex.
These two locking protocol acquire mutexes in opposite order, hence unresolvable mutex locking
deadlock may occur.

The fix in this commit adds THD::wsrep_aborter flag to synchronize who can kill the victim
This flag is set both when BF is called for from innodb and by KILL command.
Either path of victim killing will bail out if victim's wsrep_killed is already
set to avoid mutex conflicts with the other aborter execution. THD::wsrep_aborter
records the aborter THD's ID. This is needed to preserve the right to kill
the victim from different locations for the same aborter thread.
It is also good error logging, to see who is reponsible for the abort.

A new test case was added in galera.galera_bf_kill_debug.test for scenario where
wsrep applier thread and manual KILL command try to kill same idle victim
2020-07-22 08:20:10 +03:00
Julius Goryavsky
956f21c3b0 Merge remote-tracking branch 'origin/bb-10.4-MDEV-21910' into 10.4 2020-07-16 13:03:29 +02:00
Marko Mäkelä
e67daa5653 Merge 10.4 into 10.5 2020-07-15 14:51:22 +03:00