mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-08-30 11:22:14 +03:00

Author	SHA1	Message	Date
Vladislav Vaintroub	f0fa40efad	MDEV-25785 Add support for OpenSSL 3.0 Summary of changes - MD_CTX_SIZE is increased - EVP_CIPHER_CTX_buf_noconst(ctx) does not work anymore, points to nobody knows where. The assumption made previously (since the function does not seem to be documented) was that it points to the last partial source block. Add own partial block buffer for NOPAD encryption instead - SECLEVEL in CipherString in openssl.cnf had been downgraded to 0, from 1, to make TLSv1.0 and TLSv1.1 possible (according to https://github.com/openssl/openssl/blob/openssl-3.0.0/NEWS.md even though the manual for SSL_CTX_get_security_level claims that it should not be necessary) - Workaround Ssl_cipher_list issue, it now returns TLSv1.3 ciphers, in addition to what was set in --ssl-cipher - ctx_buf buffer now must be aligned to 16 bytes with OpenSSL (previously with WolfSSL only), or crashes will happen - updated aes-t, to be better debuggable using function, rather than a huge multiline macro; added test that does "nopad" encryption piece-wise, to test replacement of EVP_CIPHER_CTX_buf_noconst. Part of MDEV-28133	2022-05-23 15:27:51 +02:00
Marko Mäkelä	92f79a22e6	Merge 10.5 into 10.6	2022-02-22 12:12:49 +02:00
Vlad Lesin	a112a80b47	Merge 10.4 into 10.5	2022-02-22 10:35:16 +03:00
Vlad Lesin	f6f055a191	Merge 10.3 into 10.4	2022-02-21 14:10:27 +03:00
Nayuta Yanagisawa	66f55a018b	MDEV-27730 Add PLUGIN_VAR_DEPRECATED flag to plugin variables The sys_var class has the deprecation_substitute member to mark the deprecated variables. As it's set, the server produces warnings when these variables are used. However, the plugin has no means to utilize that functionality. So, the PLUGIN_VAR_DEPRECATED flag is introduced to set the deprecation_substitute with the empty string. A non-empty string can make the warning more informative, but there's no nice way seen to specify it, and not that needed at the moment.	2022-02-18 13:10:20 +09:00
Oleksandr Byelkin	f5c5f8e41e	Merge branch '10.5' into 10.6	2022-02-03 17:01:31 +01:00
Oleksandr Byelkin	cf63eecef4	Merge branch '10.4' into 10.5	2022-02-01 20:33:04 +01:00
Oleksandr Byelkin	880d543554	Merge branch 'merge-perfschema-5.7' into 10.5	2022-01-28 11:57:52 +01:00
Sergei Golubchik	7b555ff2c5	MDEV-27341 Use SET PASSWORD to change PAM service SET PASSWORD = PASSWORD('foo') would fail for pam plugin with ERROR HY000: SET PASSWORD is ignored for users authenticating via pam plugin but SET PASSWORD = 'foo' would not. Now it will.	2022-01-17 18:19:29 +01:00
Marko Mäkelä	3f5726768f	Merge 10.5 into 10.6	2022-01-04 09:26:38 +02:00
Julius Goryavsky	55bb933a88	Merge branch 10.4 into 10.5	2021-12-26 12:51:04 +01:00
sjaakola	c1846c4fcf	MDEV-26803 PA unsafety with FK cascade delete operation This commit has an mtr test where two transactions delete a row from two separate tables, which will cascade a FK delete for the same row in a third table. Second replica node is configured with 2 applier threads, and the test will fail if these two transactions are applied in parallel. The actual fix, in this commit, is to mark a transaction as unsafe for parallel applying when it traverses into cascade delete operation. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2021-12-17 09:38:23 +02:00
Marko Mäkelä	e94172c2a0	Merge 10.5 into 10.6	2021-08-31 11:00:41 +03:00
Marko Mäkelä	e62120cec7	Merge 10.4 into 10.5	2021-08-31 10:04:56 +03:00
Marko Mäkelä	0464761126	Merge 10.3 into 10.4	2021-08-31 09:22:21 +03:00
Marko Mäkelä	e835cc851e	Merge 10.2 into 10.3	2021-08-31 08:36:59 +03:00
Marko Mäkelä	fda704c82c	Fix GCC 11 -Wmaybe-uninitialized for PLUGIN_PERFSCHEMA init_mutex_v1_t: Stop lying that the mutex parameter is const. GCC 11.2.0 assumes that it is and could complain about any mysql_mutex_t being uninitialized even after mysql_mutex_init() as long as PLUGIN_PERFSCHEMA is enabled. init_rwlock_v1_t, init_cond_v1_t: Remove untruthful const qualifiers. Note: init_socket_v1_t is expecting that the socket fd has already been created before PSI_SOCKET_CALL(init_socket), and therefore that parameter really is being treated as a pointer to const.	2021-08-30 11:52:59 +03:00
Oleksandr Byelkin	6efb5e9f5e	Merge branch '10.5' into 10.6	2021-08-02 10:11:41 +02:00
Oleksandr Byelkin	ae6bdc6769	Merge branch '10.4' into 10.5	2021-07-31 23:19:51 +02:00
Michael Okoko	6cd3588f0e	Improve documentation of json parser functions Signed-off-by: Michael Okoko <okokomichaels@outlook.com>	2021-07-22 21:51:49 +03:00
Marko Mäkelä	4dfec8b230	Merge 10.5 into 10.6	2021-06-21 17:49:33 +03:00
Marko Mäkelä	a42c80bd48	Merge 10.4 into 10.5	2021-06-21 14:22:22 +03:00
Monty	af33202af7	Added DDL_options_st *thd_ddl_options(const MYSQL_THD thd) This is used by InnoDB to detect if CREATE...SELECT is used Other things: - Changed InnoDB to use thd_ddl_options() - Removed lock checking code for create...select (Approved by Marko)	2021-06-14 17:03:19 +03:00
Vladislav Vaintroub	b81803f065	MDEV-22221: MariaDB with WolfSSL doesn't support AES-GCM cipher for SSL Enable AES-GCM for SSL (only). AES-GCM for encryption plugins remains disabled (aes-t fails, on some bug in GCM or CTR padding)	2021-06-09 15:44:55 +02:00
Sergei Golubchik	0b116d160a	5.7.34	2021-05-03 11:22:07 +02:00
Daniel Black	460d480c74	MDEV-5536: add systemd socket activation Systemd has a socket activation feature where a mariadb.socket definition defines the sockets to listen to, and passes those file descriptors directly to mariadbd to use when a connection occurs. The new functionality is utilized when starting as follows: systemctl start mariadb.socket The mariadb.socket definition only needs to contain the network information, ListenStream= directives, the mariadb.service definition is still used for service instigation. When mariadbd is started in this way, the socket, port, bind-address and backlog are all assumed to be self contained in the mariadb.socket definition and as such the mariadb settings and command line arguments of these network settings are ignored. See man systemd.socket for how to limit this to specific ports. Extra ports, those specified with extra_port in socket activation mode, are those with a FileDescriptorName=extra. These need to be in a separate service name like mariadb-extra.socket and these require a Service={mariadb.service} directive to map to the original service. Extra ports need systemd v227 or greater (not RHEL/Centos7 - v219), when FileDescriptorName= was added, otherwise the extra ports are treated like ordinary ports. The number of sockets isn't limited when using systemd socket activation (except by operating system limits on file descriptors and a minimal amount of memory used per file descriptor). The systemd sockets passed can include any ownership or permissions, including those the mariadbd process wouldn't normally have the permission to create. This implementation is compatible with mariadb.service definitions. Services started with: systemctl start mariadb.service do actually start the mariadb.service and use all the my.cnf settings of sockets and ports as they previously did.	2021-03-28 13:53:55 +11:00
Marko Mäkelä	00528a0445	Merge 10.5 into 10.6	2021-03-19 13:35:18 +02:00
Marko Mäkelä	be881ec457	Merge 10.4 into 10.5	2021-03-19 13:09:21 +02:00
Marko Mäkelä	44d70c01f0	Merge 10.3 into 10.4	2021-03-19 11:42:44 +02:00
Marko Mäkelä	19052b6deb	Merge 10.2 into 10.3	2021-03-18 12:34:48 +02:00
Julius Goryavsky	7345d37141	MDEV-24853: Duplicate key generated during cluster configuration change Incorrect processing of an auto-incrementing field in the WSREP-related code while applying transactions results in a duplicate key being created. This is due to the fact that at the beginning of the write_row() and update_row() functions, the values of the auto-increment parameters are used, which are read from the parameters of the current thread, but further along the code other values are used, which are read from global variables (when applying a transaction). This can happen when the cluster configuration has changed while applying a transaction (for example in the high_priority_service mode for Galera 4). Later, during IST processing, a duplicate key is detected, and processing of the DB_DUPLICATE_KEY return code (inside innodb, in the write_row() handler) results in a call to the wsrep_thd_self_abort() function.	2021-03-08 11:15:08 +01:00
Rinat Ibragimov	b3abcf80a1	MDEV-6536: make --bind=hostname to listen on both IPv6 and IPv4 addresses Binding to a hostname now makes MariaDB server listen on all addresses that hostname resolves to. Rebased to 10.6 by Daniel Black Closes: #1668	2021-03-05 08:25:52 +11:00
Marko Mäkelä	80ac9ec1cc	MDEV-24973 Performance schema duplicates rarely executed code for mutex operations The PERFORMANCE_SCHEMA wrapper for mutex and rw-lock operations is causing a lot of unlikely code to be inlined in each invocation. The impact of this may have been emphasized in MariaDB 10.6, because InnoDB now uses the common implementation of mutexes and condition variables (MDEV-21452). By default, we build with cmake -DPLUGIN_PERFSCHEMA enabled, but at runtime no instrumentation will be enabled. Similar to commit `eba2d10ac5` we had better avoid inlining the rarely executed code in order to reduce the code size and to improve the efficiency of the instruction cache. This change was extensively tested by Axel Schwenke with and without --enable-performance-schema (with no individual instruments enabled). Removing the inline functions did not cause any performance regression in either case. There seemed to be a tiny improvement, possibly due to reduced code size and better instruction cache hit rate.	2021-03-02 14:32:37 +02:00
Sergei Golubchik	25d9d2e37f	Merge branch 'bb-10.4-release' into bb-10.5-release	2021-02-15 16:43:15 +01:00
Sergei Golubchik	00a313ecf3	Merge branch 'bb-10.3-release' into bb-10.4-release Note, the fix for "MDEV-23328 Server hang due to Galera lock conflict resolution" was null-merged. 10.4 version of the fix is coming up separately	2021-02-12 17:44:22 +01:00
Monty	bd5ac03896	Make maria_data_root const char* This allows one to remove some casts like: maria_data_root= (char *)"."; It also removes warnings from icc.	2021-02-08 12:16:29 +02:00
Sergei Golubchik	2676c9aad7	galera fixes related to THD::LOCK_thd_kill Since 2017 (`c2118a08b1`) THD::awake() no longer requires LOCK_thd_data. It uses LOCK_thd_kill, and this latter mutex is used to prevent a thread from dying, not LOCK_thd_data as before.	2021-02-02 10:02:17 +01:00
Oleksandr Byelkin	02e7bff882	Merge commit '10.4' into 10.5	2021-01-06 10:53:00 +01:00
Oleksandr Byelkin	478b83032b	Merge branch '10.3' into 10.4	2020-12-25 09:13:28 +01:00
Oleksandr Byelkin	25561435e0	Merge branch '10.2' into 10.3	2020-12-23 19:28:02 +01:00
Etienne Guesnet	2c7247622a	AIX workaround for GCC TOC bug	2020-12-16 08:07:04 +11:00
Sergei Golubchik	e189faf0b3	document that a fulltext parser plugin can replace mysql_add_word callback	2020-12-10 08:45:20 +01:00
Marko Mäkelä	7cffb5f6e8	MDEV-23399: Performance regression with write workloads The buffer pool refactoring in MDEV-15053 and MDEV-22871 shifted the performance bottleneck to the page flushing. The configuration parameters will be changed as follows: innodb_lru_flush_size=32 (new: how many pages to flush on LRU eviction) innodb_lru_scan_depth=1536 (old: 1024) innodb_max_dirty_pages_pct=90 (old: 75) innodb_max_dirty_pages_pct_lwm=75 (old: 0) Note: The parameter innodb_lru_scan_depth will only affect LRU eviction of buffer pool pages when a new page is being allocated. The page cleaner thread will no longer evict any pages. It used to guarantee that some pages will remain free in the buffer pool. Now, we perform that eviction 'on demand' in buf_LRU_get_free_block(). The parameter innodb_lru_scan_depth(srv_LRU_scan_depth) is used as follows: * When the buffer pool is being shrunk in buf_pool_t::withdraw_blocks() * As a buf_pool.free limit in buf_LRU_list_batch() for terminating the flushing that is initiated e.g., by buf_LRU_get_free_block() The parameter also used to serve as an initial limit for unzip_LRU eviction (evicting uncompressed page frames while retaining ROW_FORMAT=COMPRESSED pages), but now we will use a hard-coded limit of 100 or unlimited for invoking buf_LRU_scan_and_free_block(). The status variables will be changed as follows: innodb_buffer_pool_pages_flushed: This includes also the count of innodb_buffer_pool_pages_LRU_flushed and should work reliably, updated one by one in buf_flush_page() to give more real-time statistics. The function buf_flush_stats(), which we are removing, was not called in every code path. For both counters, we will use regular variables that are incremented in a critical section of buf_pool.mutex. Note that show_innodb_vars() directly links to the variables, and reads of the counters will not be protected by buf_pool.mutex, so you cannot get a consistent snapshot of both variables. 
The following INFORMATION_SCHEMA.INNODB_METRICS counters will be removed, because the page cleaner no longer deals with writing or evicting least recently used pages, and because the single-page writes have been removed: * buffer_LRU_batch_flush_avg_time_slot * buffer_LRU_batch_flush_avg_time_thread * buffer_LRU_batch_flush_avg_time_est * buffer_LRU_batch_flush_avg_pass * buffer_LRU_single_flush_scanned * buffer_LRU_single_flush_num_scan * buffer_LRU_single_flush_scanned_per_call When moving to a single buffer pool instance in MDEV-15058, we missed some opportunity to simplify the buf_flush_page_cleaner thread. It was unnecessarily using a mutex and some complex data structures, even though we always have a single page cleaner thread. Furthermore, the buf_flush_page_cleaner thread had separate 'recovery' and 'shutdown' modes where it was waiting to be triggered by some other thread, adding unnecessary latency and potential for hangs in relatively rarely executed startup or shutdown code. The page cleaner was also running two kinds of batches in an interleaved fashion: "LRU flush" (writing out some least recently used pages and evicting them on write completion) and the normal batches that aim to increase the MIN(oldest_modification) in the buffer pool, to help the log checkpoint advance. The buf_pool.flush_list flushing was being blocked by buf_block_t::lock for no good reason. Furthermore, if the FIL_PAGE_LSN of a page is ahead of log_sys.get_flushed_lsn(), that is, what has been persistently written to the redo log, we would trigger a log flush and then resume the page flushing. This would unnecessarily limit the performance of the page cleaner thread and trigger the infamous messages "InnoDB: page_cleaner: 1000ms intended loop took 4450ms. The settings might not be optimal" that were suppressed in commit `d1ab89037a` unless log_warnings>2. 
Our revised algorithm will make log_sys.get_flushed_lsn() advance at the start of buf_flush_lists(), and then execute a 'best effort' to write out all pages. The flush batches will skip pages that were modified since the log was written, or are currently exclusively locked. The MDEV-13670 message "page_cleaner: 1000ms intended loop took" will be removed, because by design, the buf_flush_page_cleaner() should not be blocked during a batch for extended periods of time. We will remove the single-page flushing altogether. Related to this, the debug parameter innodb_doublewrite_batch_size will be removed, because all of the doublewrite buffer will be used for flushing batches. If a page needs to be evicted from the buffer pool and all 100 least recently used pages in the buffer pool have unflushed changes, buf_LRU_get_free_block() will execute buf_flush_lists() to write out and evict innodb_lru_flush_size pages. At most one thread will execute buf_flush_lists() in buf_LRU_get_free_block(); other threads will wait for that LRU flushing batch to finish. To improve concurrency, we will replace the InnoDB ib_mutex_t and os_event_t with native mutexes and condition variables in this area of code. Most notably, this means that the buffer pool mutex (buf_pool.mutex) is no longer instrumented via any InnoDB interfaces. It will continue to be instrumented via PERFORMANCE_SCHEMA. For now, both buf_pool.flush_list_mutex and buf_pool.mutex will be declared with MY_MUTEX_INIT_FAST (PTHREAD_MUTEX_ADAPTIVE_NP). The critical sections of buf_pool.flush_list_mutex should be shorter than those for buf_pool.mutex, because in the worst case, they cover a linear scan of buf_pool.flush_list, while the worst case of a critical section of buf_pool.mutex covers a linear scan of the potentially much longer buf_pool.LRU list. mysql_mutex_is_owner(), safe_mutex_is_owner(): New predicate, usable with SAFE_MUTEX. 
Some InnoDB debug assertions need this predicate instead of mysql_mutex_assert_owner() or mysql_mutex_assert_not_owner(). buf_pool_t::n_flush_LRU, buf_pool_t::n_flush_list: Replaces buf_pool_t::init_flush[] and buf_pool_t::n_flush[]. The number of active flush operations. buf_pool_t::mutex, buf_pool_t::flush_list_mutex: Use mysql_mutex_t instead of ib_mutex_t, to have native mutexes with PERFORMANCE_SCHEMA and SAFE_MUTEX instrumentation. buf_pool_t::done_flush_LRU: Condition variable for !n_flush_LRU. buf_pool_t::done_flush_list: Condition variable for !n_flush_list. buf_pool_t::do_flush_list: Condition variable to wake up the buf_flush_page_cleaner when a log checkpoint needs to be written or the server is being shut down. Replaces buf_flush_event. We will keep using timed waits (the page cleaner thread will wake _at least_ once per second), because the calculations for innodb_adaptive_flushing depend on fixed time intervals. buf_dblwr: Allocate statically, and move all code to member functions. Use a native mutex and condition variable. Remove code to deal with single-page flushing. buf_dblwr_check_block(): Make the check debug-only. We were spending a significant amount of execution time in page_simple_validate_new(). flush_counters_t::unzip_LRU_evicted: Remove. IORequest: Make more members const. FIXME: m_fil_node should be removed. buf_flush_sync_lsn: Protect by std::atomic, not page_cleaner.mutex (which we are removing). page_cleaner_slot_t, page_cleaner_t: Remove many redundant members. pc_request_flush_slot(): Replaces pc_request() and pc_flush_slot(). recv_writer_thread: Remove. Recovery works just fine without it, if we simply invoke buf_flush_sync() at the end of each batch in recv_sys_t::apply(). recv_recovery_from_checkpoint_finish(): Remove. We can simply call recv_sys.debug_free() directly. srv_started_redo: Replaces srv_start_state. SRV_SHUTDOWN_FLUSH_PHASE: Remove. 
logs_empty_and_mark_files_at_shutdown() can communicate with the normal page cleaner loop via the new function flush_buffer_pool(). buf_flush_remove(): Assert that the calling thread is holding buf_pool.flush_list_mutex. This removes unnecessary mutex operations from buf_flush_remove_pages() and buf_flush_dirty_pages(), which replace buf_LRU_flush_or_remove_pages(). buf_flush_lists(): Renamed from buf_flush_batch(), with simplified interface. Return the number of flushed pages. Clarified comments and renamed min_n to max_n. Identify LRU batch by lsn=0. Merge all the functions buf_flush_start(), buf_flush_batch(), buf_flush_end() directly to this function, which was their only caller, and remove 2 unnecessary buf_pool.mutex release/re-acquisition that we used to perform around the buf_flush_batch() call. At the start, if not all log has been durably written, wait for a background task to do it, or start a new task to do it. This allows the log write to run concurrently with our page flushing batch. Any pages that were skipped due to too recent FIL_PAGE_LSN or due to them being latched by a writer should be flushed during the next batch, unless there are further modifications to those pages. It is possible that a page that we must flush due to small oldest_modification also carries a recent FIL_PAGE_LSN or is being constantly modified. In the worst case, all writers would then end up waiting in log_free_check() to allow the flushing and the checkpoint to complete. buf_do_flush_list_batch(): Clarify comments, and rename min_n to max_n. Cache the last looked up tablespace. If neighbor flushing is not applicable, invoke buf_flush_page() directly, avoiding a page lookup in between. buf_flush_space(): Auxiliary function to look up a tablespace for page flushing. buf_flush_page(): Defer the computation of space->full_crc32(). Never call log_write_up_to(), but instead skip persistent pages whose latest modification (FIL_PAGE_LSN) is newer than the redo log. 
Also skip pages on which we cannot acquire a shared latch without waiting. buf_flush_try_neighbors(): Do not bother checking buf_fix_count because buf_flush_page() will no longer wait for the page latch. Take the tablespace as a parameter, and only execute this function when innodb_flush_neighbors>0. Avoid repeated calls of page_id_t::fold(). buf_flush_relocate_on_flush_list(): Declare as cold, and push down a condition from the callers. buf_flush_check_neighbor(): Take id.fold() as a parameter. buf_flush_sync(): Ensure that the buf_pool.flush_list is empty, because the flushing batch will skip pages whose modifications have not yet been written to the log or were latched for modification. buf_free_from_unzip_LRU_list_batch(): Remove redundant local variables. buf_flush_LRU_list_batch(): Let the caller buf_do_LRU_batch() initialize the counters, and report n->evicted. Cache the last looked up tablespace. If neighbor flushing is not applicable, invoke buf_flush_page() directly, avoiding a page lookup in between. buf_do_LRU_batch(): Return the number of pages flushed. buf_LRU_free_page(): Only release and re-acquire buf_pool.mutex if adaptive hash index entries are pointing to the block. buf_LRU_get_free_block(): Do not wake up the page cleaner, because it will no longer perform any useful work for us, and we do not want it to compete for I/O while buf_flush_lists(innodb_lru_flush_size, 0) writes out and evicts at most innodb_lru_flush_size pages. (The function buf_do_LRU_batch() may complete after writing fewer pages if more than innodb_lru_scan_depth pages end up in buf_pool.free list.) Eliminate some mutex release-acquire cycles, and wait for the LRU flush batch to complete before rescanning. buf_LRU_check_size_of_non_data_objects(): Simplify the code. buf_page_write_complete(): Remove the parameter evict, and always evict pages that were part of an LRU flush. buf_page_create(): Take a pre-allocated page as a parameter. 
buf_pool_t::free_block(): Free a pre-allocated block. recv_sys_t::recover_low(), recv_sys_t::apply(): Preallocate the block while not holding recv_sys.mutex. During page allocation, we may initiate a page flush, which in turn may initiate a log flush, which would require acquiring log_sys.mutex, which should always be acquired before recv_sys.mutex in order to avoid deadlocks. Therefore, we must not be holding recv_sys.mutex while allocating a buffer pool block. BtrBulk::logFreeCheck(): Skip a redundant condition. row_undo_step(): Do not invoke srv_inc_activity_count() for every row that is being rolled back. It should suffice to invoke the function in trx_flush_log_if_needed() during trx_t::commit_in_memory() when the rollback completes. sync_check_enable(): Remove. We will enable innodb_sync_debug from the very beginning. Reviewed by: Vladislav Vaintroub	2020-10-15 17:04:56 +03:00
Marko Mäkelä	882ce206db	Merge 10.4 into 10.5	2020-09-23 11:32:43 +03:00
Marko Mäkelä	3a423088ac	Merge 10.3 into 10.4	2020-09-21 12:29:00 +03:00
Marko Mäkelä	cbcb4ecabb	Merge 10.2 into 10.3	2020-09-21 11:04:04 +03:00
Jan Lindström	224c950462	MDEV-23101 : SIGSEGV in lock_rec_unlock() when Galera is enabled Remove incorrect BF (brute force) handling from lock_rec_has_to_wait_in_queue and move condition to correct callers. Add a function to report BF lock waits and assert if incorrect BF-BF lock wait happens. wsrep_report_bf_lock_wait Add a new function to report BF lock wait. wsrep_assert_no_bf_bf_wait Add a new function to check whether we have a BF-BF wait and, if so, report this case and assert, as it is a bug. lock_rec_has_to_wait Use new wsrep_assert_bf_wait to check BF-BF wait. lock_rec_create_low lock_table_create Use new function to report BF lock waits. lock_rec_insert_by_trx_age lock_grant_and_move_on_page lock_grant_and_move_on_rec Assert that trx is not Galera as VATS is not compatible with Galera. lock_rec_add_to_queue If there is a conflicting lock in a queue make sure that transaction is BF. lock_rec_has_to_wait_in_queue Remove incorrect BF handling. If there are conflicting locks in a queue all transactions must wait. lock_rec_dequeue_from_page lock_rec_unlock If there is a conflicting lock make sure it is not BF-BF case. lock_rec_queue_validate Add Galera record locking rules comment and use new function to report BF lock waits. All attempts to reproduce the original assertion have failed. Therefore, there is no test case on this commit.	2020-09-10 13:18:12 +03:00
sjaakola	7bffe468b2	MDEV-21910 Deadlock between BF abort and manual KILL command When high priority replication slave applier encounters lock conflict in innodb, it will force the conflicting lock holder transaction (victim) to rollback. This is a must in multi-master synchronous replication model to avoid cluster lock-up. This high priority victim abort (aka "brute force" (BF) abort), is started from innodb lock manager while holding the victim's transaction's (trx) mutex. Depending on the execution state of the victim transaction, it may happen that the BF abort will call for THD::awake() to wake up the victim transaction for the rollback. Now, if BF abort requires THD::awake() to be called, then the applier thread executes the locking protocol of: victim trx mutex -> victim THD::LOCK_thd_data. If, at the same time, another DBMS super user issues KILL command to abort the same victim, it will execute the locking protocol of: victim THD::LOCK_thd_data -> victim trx mutex. These two locking protocols acquire mutexes in opposite order, hence unresolvable mutex locking deadlock may occur. The fix in this commit adds THD::wsrep_aborter flag to synchronize who can kill the victim. This flag is set both when BF is called for from innodb and by KILL command. Either path of victim killing will bail out if victim's wsrep_killed is already set to avoid mutex conflicts with the other aborter execution. THD::wsrep_aborter records the aborter THD's ID. This is needed to preserve the right to kill the victim from different locations for the same aborter thread. It is also good error logging, to see who is responsible for the abort. A new test case was added in galera.galera_bf_kill_debug.test for scenario where wsrep applier thread and manual KILL command try to kill same idle victim	2020-07-22 08:20:10 +03:00
Julius Goryavsky	956f21c3b0	Merge remote-tracking branch 'origin/bb-10.4-MDEV-21910' into 10.4	2020-07-16 13:03:29 +02:00
Marko Mäkelä	e67daa5653	Merge 10.4 into 10.5	2020-07-15 14:51:22 +03:00
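
Note on MDEV-21910 (`7bffe468b2`): the aborter-flag idea described in that commit message can be illustrated with a minimal Python sketch. This is not MariaDB code. The `Victim` class, the `try_claim` method, and the numeric aborter IDs are hypothetical names chosen for illustration. The sketch assumes only what the message states: the first aborter records its ID, a different aborter bails out, and the same aborter may proceed again from another location.

```python
import threading


class Victim:
    """Hypothetical stand-in for a victim session (not MariaDB's THD)."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the aborter field only
        self.aborter = None             # ID of the thread that claimed the kill

    def try_claim(self, aborter_id):
        """Return True if aborter_id may proceed with killing this victim."""
        with self._guard:
            if self.aborter is None:
                # First aborter wins and records its ID.
                self.aborter = aborter_id
                return True
            # The same aborter keeps its right; any other aborter bails out.
            return self.aborter == aborter_id


victim = Victim()
bf_may_kill = victim.try_claim(aborter_id=1)       # BF abort path claims first
kill_may_proceed = victim.try_claim(aborter_id=2)  # manual KILL path arrives later
bf_again = victim.try_claim(aborter_id=1)          # same aborter, another location
```

Because the losing path returns before taking the second mutex, the two opposite lock orders never run against each other for the same victim in this simplified model.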

516 Commits