mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-07-24 19:42:23 +03:00

Author	SHA1	Message	Date
Vladislav Vaintroub	061adae9a2	MDEV-16944 Fix file sharing issues on Windows in mysqltest On Windows systems, occurrences of ERROR_SHARING_VIOLATION due to conflicting share modes between processes accessing the same file can result in CreateFile failures. mysys' my_open() already incorporates a workaround by implementing wait/retry logic on Windows. But this does not help if files are opened using shell redirection like mysqltest traditionally did it, i.e via --echo exec "some text" > output_file In such cases, it is cmd.exe, that opens the output_file, and it won't do any sharing-violation retries. This commit addresses the issue by introducing a new built-in command, 'write_line', in mysqltest. This new command serves as a brief alternative to 'write_file', with a single line output, that also resolves variables like "exec" would. Internally, this command will use my_open(), and therefore retry-on-error logic. Hopefully this will eliminate the very sporadic "can't open file because it is used by another process" error on CI.	2024-04-17 16:52:37 +02:00
Oleksandr Byelkin	ac5a534a4c	Merge remote-tracking branch '10.4' into 10.5	2023-03-31 21:32:41 +02:00
Yuchen Pei	7c91082e39	MDEV-27912 Fixing inconsistency w.r.t. expect files in tests. mtr uses group suffix, but some existing inc and test files use server_id for expect files. This patch aims to fix that. For spider: With this change we will not have to maintain a separate version of restart_mysqld.inc for spider, that duplicates code, just because spider tests use different names for expect files, and shutdown_mysqld requires magical names for them. With this change spider tests will also be able to use other features provided by restart_mysqld.inc without code duplication, like the parameter $restart_parameters (see e.g. the testcase mdev_29904.test in commit ef1161e5d4f). Tests run after this change: default, spider, rocksdb, galera, using the following command mtr --parallel=auto --force --max-test-fail=0 --skip-core-file mtr --suite spider,spider/,spider//* \ --skip-test="spider/oracle.\|./t\..*" --parallel=auto --big-test \ --force --max-test-fail=0 --skip-core-file mtr --suite galera --parallel=auto mtr --suite rocksdb --parallel=auto	2023-03-22 11:55:57 +11:00
Marko Mäkelä	5c46751f23	MDEV-27734 Set innodb_change_buffering=none by default The aim of the InnoDB change buffer is to avoid delays when a leaf page of a secondary index is not present in the buffer pool, and a record needs to be inserted, delete-marked, or purged. Instead of reading the page into the buffer pool for making such a modification, we may insert a record to the change buffer (a special index tree in the InnoDB system tablespace). The buffered changes are guaranteed to be merged if the index page actually needs to be read later. The change buffer could be useful when the database is stored on a rotational medium (hard disk) where random seeks are slower than sequential reads or writes. Obviously, the change buffer will cause write amplification, due to potentially large amount of metadata that is being written to the change buffer. We will have to write redo log records for modifying the change buffer tree as well as the user tablespace. Furthermore, in the user tablespace, we must maintain a change buffer bitmap page that uses 2 bits for estimating the amount of free space in pages, and 1 bit to specify whether buffered changes exist. This bitmap needs to be updated on every operation, which could reduce performance. Even if the change buffer were free of bugs such as MDEV-24449 (potentially causing the corruption of any page in the system tablespace) or MDEV-26977 (corruption of secondary indexes due to a currently unknown reason), it will make diagnosis of other data corruption harder. Because of all this, it is best to disable the change buffer by default.	2022-02-09 08:36:41 +02:00
Marko Mäkelä	7cffb5f6e8	MDEV-23399: Performance regression with write workloads The buffer pool refactoring in MDEV-15053 and MDEV-22871 shifted the performance bottleneck to the page flushing. The configuration parameters will be changed as follows: innodb_lru_flush_size=32 (new: how many pages to flush on LRU eviction) innodb_lru_scan_depth=1536 (old: 1024) innodb_max_dirty_pages_pct=90 (old: 75) innodb_max_dirty_pages_pct_lwm=75 (old: 0) Note: The parameter innodb_lru_scan_depth will only affect LRU eviction of buffer pool pages when a new page is being allocated. The page cleaner thread will no longer evict any pages. It used to guarantee that some pages will remain free in the buffer pool. Now, we perform that eviction 'on demand' in buf_LRU_get_free_block(). The parameter innodb_lru_scan_depth(srv_LRU_scan_depth) is used as follows: * When the buffer pool is being shrunk in buf_pool_t::withdraw_blocks() * As a buf_pool.free limit in buf_LRU_list_batch() for terminating the flushing that is initiated e.g., by buf_LRU_get_free_block() The parameter also used to serve as an initial limit for unzip_LRU eviction (evicting uncompressed page frames while retaining ROW_FORMAT=COMPRESSED pages), but now we will use a hard-coded limit of 100 or unlimited for invoking buf_LRU_scan_and_free_block(). The status variables will be changed as follows: innodb_buffer_pool_pages_flushed: This includes also the count of innodb_buffer_pool_pages_LRU_flushed and should work reliably, updated one by one in buf_flush_page() to give more real-time statistics. The function buf_flush_stats(), which we are removing, was not called in every code path. For both counters, we will use regular variables that are incremented in a critical section of buf_pool.mutex. Note that show_innodb_vars() directly links to the variables, and reads of the counters will not be protected by buf_pool.mutex, so you cannot get a consistent snapshot of both variables. The following INFORMATION_SCHEMA.INNODB_METRICS counters will be removed, because the page cleaner no longer deals with writing or evicting least recently used pages, and because the single-page writes have been removed: * buffer_LRU_batch_flush_avg_time_slot * buffer_LRU_batch_flush_avg_time_thread * buffer_LRU_batch_flush_avg_time_est * buffer_LRU_batch_flush_avg_pass * buffer_LRU_single_flush_scanned * buffer_LRU_single_flush_num_scan * buffer_LRU_single_flush_scanned_per_call When moving to a single buffer pool instance in MDEV-15058, we missed some opportunity to simplify the buf_flush_page_cleaner thread. It was unnecessarily using a mutex and some complex data structures, even though we always have a single page cleaner thread. Furthermore, the buf_flush_page_cleaner thread had separate 'recovery' and 'shutdown' modes where it was waiting to be triggered by some other thread, adding unnecessary latency and potential for hangs in relatively rarely executed startup or shutdown code. The page cleaner was also running two kinds of batches in an interleaved fashion: "LRU flush" (writing out some least recently used pages and evicting them on write completion) and the normal batches that aim to increase the MIN(oldest_modification) in the buffer pool, to help the log checkpoint advance. The buf_pool.flush_list flushing was being blocked by buf_block_t::lock for no good reason. Furthermore, if the FIL_PAGE_LSN of a page is ahead of log_sys.get_flushed_lsn(), that is, what has been persistently written to the redo log, we would trigger a log flush and then resume the page flushing. This would unnecessarily limit the performance of the page cleaner thread and trigger the infamous messages "InnoDB: page_cleaner: 1000ms intended loop took 4450ms. The settings might not be optimal" that were suppressed in commit `d1ab89037a` unless log_warnings>2. Our revised algorithm will make log_sys.get_flushed_lsn() advance at the start of buf_flush_lists(), and then execute a 'best effort' to write out all pages. The flush batches will skip pages that were modified since the log was written, or are are currently exclusively locked. The MDEV-13670 message "page_cleaner: 1000ms intended loop took" message will be removed, because by design, the buf_flush_page_cleaner() should not be blocked during a batch for extended periods of time. We will remove the single-page flushing altogether. Related to this, the debug parameter innodb_doublewrite_batch_size will be removed, because all of the doublewrite buffer will be used for flushing batches. If a page needs to be evicted from the buffer pool and all 100 least recently used pages in the buffer pool have unflushed changes, buf_LRU_get_free_block() will execute buf_flush_lists() to write out and evict innodb_lru_flush_size pages. At most one thread will execute buf_flush_lists() in buf_LRU_get_free_block(); other threads will wait for that LRU flushing batch to finish. To improve concurrency, we will replace the InnoDB ib_mutex_t and os_event_t native mutexes and condition variables in this area of code. Most notably, this means that the buffer pool mutex (buf_pool.mutex) is no longer instrumented via any InnoDB interfaces. It will continue to be instrumented via PERFORMANCE_SCHEMA. For now, both buf_pool.flush_list_mutex and buf_pool.mutex will be declared with MY_MUTEX_INIT_FAST (PTHREAD_MUTEX_ADAPTIVE_NP). The critical sections of buf_pool.flush_list_mutex should be shorter than those for buf_pool.mutex, because in the worst case, they cover a linear scan of buf_pool.flush_list, while the worst case of a critical section of buf_pool.mutex covers a linear scan of the potentially much longer buf_pool.LRU list. mysql_mutex_is_owner(), safe_mutex_is_owner(): New predicate, usable with SAFE_MUTEX. Some InnoDB debug assertions need this predicate instead of mysql_mutex_assert_owner() or mysql_mutex_assert_not_owner(). buf_pool_t::n_flush_LRU, buf_pool_t::n_flush_list: Replaces buf_pool_t::init_flush[] and buf_pool_t::n_flush[]. The number of active flush operations. buf_pool_t::mutex, buf_pool_t::flush_list_mutex: Use mysql_mutex_t instead of ib_mutex_t, to have native mutexes with PERFORMANCE_SCHEMA and SAFE_MUTEX instrumentation. buf_pool_t::done_flush_LRU: Condition variable for !n_flush_LRU. buf_pool_t::done_flush_list: Condition variable for !n_flush_list. buf_pool_t::do_flush_list: Condition variable to wake up the buf_flush_page_cleaner when a log checkpoint needs to be written or the server is being shut down. Replaces buf_flush_event. We will keep using timed waits (the page cleaner thread will wake _at least_ once per second), because the calculations for innodb_adaptive_flushing depend on fixed time intervals. buf_dblwr: Allocate statically, and move all code to member functions. Use a native mutex and condition variable. Remove code to deal with single-page flushing. buf_dblwr_check_block(): Make the check debug-only. We were spending a significant amount of execution time in page_simple_validate_new(). flush_counters_t::unzip_LRU_evicted: Remove. IORequest: Make more members const. FIXME: m_fil_node should be removed. buf_flush_sync_lsn: Protect by std::atomic, not page_cleaner.mutex (which we are removing). page_cleaner_slot_t, page_cleaner_t: Remove many redundant members. pc_request_flush_slot(): Replaces pc_request() and pc_flush_slot(). recv_writer_thread: Remove. Recovery works just fine without it, if we simply invoke buf_flush_sync() at the end of each batch in recv_sys_t::apply(). recv_recovery_from_checkpoint_finish(): Remove. We can simply call recv_sys.debug_free() directly. srv_started_redo: Replaces srv_start_state. SRV_SHUTDOWN_FLUSH_PHASE: Remove. logs_empty_and_mark_files_at_shutdown() can communicate with the normal page cleaner loop via the new function flush_buffer_pool(). buf_flush_remove(): Assert that the calling thread is holding buf_pool.flush_list_mutex. This removes unnecessary mutex operations from buf_flush_remove_pages() and buf_flush_dirty_pages(), which replace buf_LRU_flush_or_remove_pages(). buf_flush_lists(): Renamed from buf_flush_batch(), with simplified interface. Return the number of flushed pages. Clarified comments and renamed min_n to max_n. Identify LRU batch by lsn=0. Merge all the functions buf_flush_start(), buf_flush_batch(), buf_flush_end() directly to this function, which was their only caller, and remove 2 unnecessary buf_pool.mutex release/re-acquisition that we used to perform around the buf_flush_batch() call. At the start, if not all log has been durably written, wait for a background task to do it, or start a new task to do it. This allows the log write to run concurrently with our page flushing batch. Any pages that were skipped due to too recent FIL_PAGE_LSN or due to them being latched by a writer should be flushed during the next batch, unless there are further modifications to those pages. It is possible that a page that we must flush due to small oldest_modification also carries a recent FIL_PAGE_LSN or is being constantly modified. In the worst case, all writers would then end up waiting in log_free_check() to allow the flushing and the checkpoint to complete. buf_do_flush_list_batch(): Clarify comments, and rename min_n to max_n. Cache the last looked up tablespace. If neighbor flushing is not applicable, invoke buf_flush_page() directly, avoiding a page lookup in between. buf_flush_space(): Auxiliary function to look up a tablespace for page flushing. buf_flush_page(): Defer the computation of space->full_crc32(). Never call log_write_up_to(), but instead skip persistent pages whose latest modification (FIL_PAGE_LSN) is newer than the redo log. Also skip pages on which we cannot acquire a shared latch without waiting. buf_flush_try_neighbors(): Do not bother checking buf_fix_count because buf_flush_page() will no longer wait for the page latch. Take the tablespace as a parameter, and only execute this function when innodb_flush_neighbors>0. Avoid repeated calls of page_id_t::fold(). buf_flush_relocate_on_flush_list(): Declare as cold, and push down a condition from the callers. buf_flush_check_neighbor(): Take id.fold() as a parameter. buf_flush_sync(): Ensure that the buf_pool.flush_list is empty, because the flushing batch will skip pages whose modifications have not yet been written to the log or were latched for modification. buf_free_from_unzip_LRU_list_batch(): Remove redundant local variables. buf_flush_LRU_list_batch(): Let the caller buf_do_LRU_batch() initialize the counters, and report n->evicted. Cache the last looked up tablespace. If neighbor flushing is not applicable, invoke buf_flush_page() directly, avoiding a page lookup in between. buf_do_LRU_batch(): Return the number of pages flushed. buf_LRU_free_page(): Only release and re-acquire buf_pool.mutex if adaptive hash index entries are pointing to the block. buf_LRU_get_free_block(): Do not wake up the page cleaner, because it will no longer perform any useful work for us, and we do not want it to compete for I/O while buf_flush_lists(innodb_lru_flush_size, 0) writes out and evicts at most innodb_lru_flush_size pages. (The function buf_do_LRU_batch() may complete after writing fewer pages if more than innodb_lru_scan_depth pages end up in buf_pool.free list.) Eliminate some mutex release-acquire cycles, and wait for the LRU flush batch to complete before rescanning. buf_LRU_check_size_of_non_data_objects(): Simplify the code. buf_page_write_complete(): Remove the parameter evict, and always evict pages that were part of an LRU flush. buf_page_create(): Take a pre-allocated page as a parameter. buf_pool_t::free_block(): Free a pre-allocated block. recv_sys_t::recover_low(), recv_sys_t::apply(): Preallocate the block while not holding recv_sys.mutex. During page allocation, we may initiate a page flush, which in turn may initiate a log flush, which would require acquiring log_sys.mutex, which should always be acquired before recv_sys.mutex in order to avoid deadlocks. Therefore, we must not be holding recv_sys.mutex while allocating a buffer pool block. BtrBulk::logFreeCheck(): Skip a redundant condition. row_undo_step(): Do not invoke srv_inc_activity_count() for every row that is being rolled back. It should suffice to invoke the function in trx_flush_log_if_needed() during trx_t::commit_in_memory() when the rollback completes. sync_check_enable(): Remove. We will enable innodb_sync_debug from the very beginning. Reviewed by: Vladislav Vaintroub	2020-10-15 17:04:56 +03:00
Marko Mäkelä	2f7d91bb6c	MDEV-22242 B-trees can become extremely skewed The test innodb.innodb_wl6326 that had been disabled in 10.4 due to MDEV-21535 is failing on 10.5 due to a different reason: the removal of the MLOG_COMP_END_COPY_CREATED operations in MDEV-12353 commit `276f996af9` caused PAGE_LAST_INSERT to be set to something nonzero by the function page_copy_rec_list_end(). This in turn would cause btr_page_get_split_rec_to_right() to behave differently: we would not attempt to split the page at all, but simply insert the new record into the new, empty, right leaf page. Even though the change reduced the sizes of some tables, it is better to aim for balanced trees. page_copy_rec_list_end(), PageBulk::finishPage(): Preserve PAGE_LAST_INSERT, PAGE_N_DIRECTION, PAGE_DIRECTION. PageBulk::finish(): Move some common code from PageBulk::finishPage().	2020-04-14 18:43:03 +03:00
Marko Mäkelä	276f996af9	MDEV-12353: Replace MLOG_*_END_COPY_CREATED Instead of writing the high-level redo log records MLOG_LIST_END_COPY_CREATED, MLOG_COMP_LIST_END_COPY_CREATED write log for each individual insert of a record. page_copy_rec_list_end_to_created_page(): Remove. This will improve the fill factor of some pages. Adjust some tests accordingly. PageBulk::init(), PageBulk::finish(): Avoid setting bogus limits to PAGE_HEAP_TOP and PAGE_N_DIR_SLOTS. Avoid accessor functions that would enforce these limits before the correct ones are set at the end of PageBulk::finish().	2020-02-13 18:19:14 +02:00
Marko Mäkelä	2d6719d7ee	MDEV-19514 preparation: Extend innodb.innodb-change-buffer-recovery Test innodb_read_only startup (which will be refused after a crash), and test also innodb_force_recovery=5, and extract some change buffer merge statistics. Omit any statistics about delete (purge) buffering, because purge could happen at any time. Use the sequence storage engine for populating the table.	2019-09-26 13:03:40 +03:00
Marko Mäkelä	eeee1832d7	Speed up buildbot by requiring --big-test for some slow tests	2019-05-29 08:28:15 +03:00
Marko Mäkelä	3d46768da2	Merge 10.1 into 10.2	2017-01-09 09:47:12 +02:00
Marko Mäkelä	59ea6456c3	Minor cleanup of innodb.innodb-change-buffer-recovery This should be a non-functional change. I was unable to repeat MDEV-11626 innodb.innodb-change-buffer-recovery fails for xtradb and cannot determine the reason for the failure without having access to the files. The repeatability of MDEV-11626 should not be affected by these changes.	2017-01-09 09:28:00 +02:00
Jan Lindström	2e814d4702	Merge InnoDB 5.7 from mysql-5.7.9. Contains also MDEV-10547: Test multi_update_innodb fails with InnoDB 5.7 The failure happened because 5.7 has changed the signature of the bool handler::primary_key_is_clustered() const virtual function ("const" was added). InnoDB was using the old signature which caused the function not to be used. MDEV-10550: Parallel replication lock waits/deadlock handling does not work with InnoDB 5.7 Fixed mutexing problem on lock_trx_handle_wait. Note that rpl_parallel and rpl_optimistic_parallel tests still fail. MDEV-10156 : Group commit tests fail on 10.2 InnoDB (branch bb-10.2-jan) Reason: incorrect merge MDEV-10550: Parallel replication can't sync with master in InnoDB 5.7 (branch bb-10.2-jan) Reason: incorrect merge	2016-09-02 13:22:28 +03:00
Jan Lindström	25f8738112	MDEV-9040: 10.1.8 fails after upgrade from 10.0.21 Analysis: Lengths which are not UNIV_SQL_NULL, but bigger than the following number indicate that a field contains a reference to an externally stored part of the field in the tablespace. The length field then contains the sum of the following flag and the locally stored len. This was incorrectly set to define UNIV_EXTERN_STORAGE_FIELD (UNIV_SQL_NULL - UNIV_PAGE_SIZE_MAX) When it should be define UNIV_EXTERN_STORAGE_FIELD (UNIV_SQL_NULL - UNIV_PAGE_SIZE_DEF) Additionally, we need to disable support for > 16K page size for row compressed tables because a compressed page directory entry reserves 14 bits for the start offset and 2 bits for flags. This limits the uncompressed page size to 16k. To support larger pages page directory entry needs to be larger.	2015-11-05 10:30:48 +02:00
Sergei Golubchik	d9c01e4b4a	5.5 merge	2015-01-21 12:03:02 +01:00
Jan Lindström	e544bcd16d	MDEV-7243: innodb-change-buffer-recovery fails on windows Problem is that on Windows command "perl" failed with error: 255 my_errno: 0 errno: 0. Do not run on Windows.	2014-12-02 12:19:29 +02:00
Sergei Golubchik	0dc23679c8	10.0-base merge	2014-02-26 15:28:07 +01:00
Sergei Golubchik	84651126c0	MySQL-5.5.36 merge (without few incorrect bugfixes and with 1250 files where only a copyright year was changed)	2014-02-17 11:00:51 +01:00
Satya Bodapati	e8232b1d95	BUG#16752251 - INNODB DOESN'T REDO-LOG INSERT BUFFER MERGE OPERATION IF IT IS DONE IN-PLACE Add testcase as innodb-change-buffer-recovery.test	2013-12-26 14:33:52 +05:30

18 Commits