- Introduced the variable "innodb_truncate_temporary_tablespace_now"
to shrink the temporary tablespace.
Steps for shrinking the temporary tablespace (see the sketch after
this list):
1) Find the last used extent in the temporary tablespace
by iterating through the bitmap in the extent descriptor pages
2) If the last used extent is less than the user-specified size,
then set the desired target size to the user-specified size
3) Store the page contents of the "to be modified" extent
descriptor pages, latch those pages, and check for buffer
pool memory availability
4) Update the FSP_SIZE and FSP_FREE_LIMIT in the header page
5) Remove the "to be truncated" pages from the FSP_FREE and
FSP_FREE_FRAG lists
6) Reset the bitmap in the last descriptor pages for the
"to be truncated" pages.
7) Clear the freed ranges in the temporary tablespace which
are to be truncated.
8) Evict the "to be truncated" temporary tablespace pages
from the LRU list.
9) In case of multiple files, calculate the truncated size of
the last file and truncate the last file
10) Commit the mini-transaction for shrinking the tablespace
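A minimal standalone sketch of steps 1 and 2, assuming a simplified
one-bit-per-extent bitmap instead of the actual XDES page format
(all names here are hypothetical, not the real implementation):

  #include <cstdint>
  #include <vector>

  // Simplified model: one bit per extent, 1 = extent is in use.
  // The real code walks the bitmaps in the extent descriptor (XDES)
  // pages; this only illustrates the backward scan of step 1.
  static uint32_t last_used_extent(const std::vector<uint8_t> &bitmap)
  {
    for (uint32_t i= uint32_t(bitmap.size()) * 8; i-- > 0; )
      if (bitmap[i / 8] & (1U << (i % 8)))
        return i;                 // highest extent still in use
    return UINT32_MAX;            // the tablespace is completely free
  }

  // Step 2: never shrink below the user-specified size (in extents).
  static uint32_t target_size(uint32_t last_used, uint32_t user_size)
  {
    uint32_t needed= last_used == UINT32_MAX ? 0 : last_used + 1;
    return needed < user_size ? user_size : needed;
  }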
buf_flush_LRU_list_batch(): Do not skip pages that are actually clean
but in buf_pool.flush_list due to the "lazy removal" optimization of
commit 22b62edaed, but try to evict them.
After acquiring buf_pool.flush_list_mutex, reread oldest_modification
to ensure that the block still remains in buf_pool.flush_list.
In addition to server hangs, this bug could also cause
InnoDB: Failing assertion: list.count > 0
in invocations of UT_LIST_REMOVE(flush_list, ...).
This fixes a regression that was caused by
commit a55b951e60
and possibly made more likely to hit due to
commit aa719b5010.
InnoDB was violating the write-ahead-logging protocol when a file
was being deleted, like this:
1. fil_delete_tablespace() set the fil_space_t::STOPPING flag
2. The buf_flush_page_cleaner() thread discards some changed pages for
this tablespace and advances the log checkpoint a little.
3. The server process is killed before fil_delete_tablespace() wrote
a FILE_DELETE record.
4. Recovery will try to apply log to pages of the tablespace, because
there was no FILE_DELETE record. This will fail, because some pages
that had been modified since the latest checkpoint had not been written
by the page cleaner.
Page writes must not be stopped before a FILE_DELETE record has been
durably written.
fil_space_t::drop(): Replaces fil_space_t::check_pending_operations().
Add the parameter detached_handle, and return a tablespace pointer
if this thread was the first one to stop I/O on the tablespace.
mtr_t::commit_file(): Remove the parameter detached_handle, and
move some handling to fil_space_t::drop().
fil_space_t: STOPPING_READS, STOPPING_WRITES: Separate flags for STOPPING.
We want to stop reads (and encryption) before stopping page writes.
fil_space_t::is_stopping_writes(), fil_space_t::get_for_write():
Special accessors for the write path.
fil_space_t::flush_low(): Ignore the STOPPING_READS flag and only
stop if STOPPING_WRITES is set, to avoid an infinite loop in
fil_flush_file_spaces(), which was occasionally reproduced by
running the test encryption.create_or_replace.
Reviewed by: Vladislav Lesin
Tested by: Matthias Leich
buf_page_free(): Flag the freed page as modified if it is found in
the buffer pool.
buf_flush_page(): If the page has been freed, ensure that the log
for it has been durably written, before removing the page
from buf_pool.flush_list.
FindBlockX: Find also MTR_MEMO_PAGE_X_MODIFY in order to avoid an
occasional failure of innodb.innodb_defrag_concurrent, which involves
freeing and reallocating pages in the same mini-transaction.
This fixes a regression that was introduced in
commit a35b4ae898 (MDEV-15528).
This logic was tested by commenting out the $shutdown_timeout line
from a test and running the following:
./mtr --rr innodb.scrub
rr replay var/log/mysqld.1.rr/mariadbd-0
A breakpoint in the modified buf_flush_page() was hit, and the
FIL_PAGE_LSN of that page had been last modified during the
mtr_t::commit() of a mini-transaction where buf_page_free()
had been executed on that page.
Problem:
========
- InnoDB fails to open the undo tablespace when page0 is corrupted,
and fails to report an error.
Solution:
=========
- InnoDB throws a DB_CORRUPTION error when it encounters
page0 corruption of an undo tablespace.
- InnoDB restores page0 of the undo tablespace from the
doublewrite buffer if it encounters page corruption.
- Moved Datafile::restore_from_doublewrite() to
recv_dblwr_t::restore_first_page(), so that the undo
tablespace and the system tablespace can use this function
instead of duplicating the code.
srv_undo_tablespace_open(): Returns 0 if the file doesn't exist,
or ULINT_UNDEFINED if page0 is corrupted.
The MDEV-29693 conflict resolution is from Monty, as is a bug fix
where ANALYZE TABLE wrongly built histograms for a
single-column PRIMARY KEY.
Also includes a fix for safe_malloc error reporting.
Other things:
- Copied main.log_slow from 10.4 to avoid an mtr issue
Disabled test:
- spider/bugfix.mdev_27239 because we started to get
+Error 1429 Unable to connect to foreign data source: localhost
-Error 1158 Got an error reading communication packets
- main.delayed
- Bug#54332 Deadlock with two connections doing LOCK TABLE+INSERT DELAYED
This part is disabled for now as it fails randomly with different
warnings/errors (no corruption).
Add threadpool functionality to restrict concurrency during "batch"
periods (where tasks are added in rapid succession).
This will throttle thread creation more aggressively than usual, while
keeping performance at least on par.
One of these cases is buffer pool load, where async read IOs are
executed without any throttling. There can be as many as 650K read IOs
for loading a 10GB buffer pool.
Another one is recovery, where "fake read" IOs are executed.
Why are there more threads than we expect?
Worker threads are not recognized as idle until they return to the
standby list, and to return to that list, they need to acquire a
mutex currently held in submit_task(). In those cases, submit_task()
has no worker to wake, and would create threads until the default
concurrency level (2*ncpus) is reached. Only after that would
throttling happen.
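An illustrative sketch of the throttling decision, with hypothetical
names (the actual tpool logic is more involved):

  #include <algorithm>
  #include <cstddef>

  // Decide in submit_task() whether to spawn another worker thread.
  // During a "batch" period (tasks arriving in rapid succession while
  // no worker has returned to the standby list yet), cap concurrency
  // well below the default 2*ncpus instead of creating a thread for
  // every submission that finds no idle worker.
  static bool should_create_worker(size_t active_threads,
                                   size_t idle_threads,
                                   size_t ncpus,
                                   bool in_batch_period)
  {
    if (idle_threads > 0)
      return false;              // an existing worker can take the task
    size_t limit= in_batch_period
      ? std::max<size_t>(1, ncpus / 2)  // throttled "batch" concurrency
      : 2 * ncpus;                      // default concurrency level
    return active_threads < limit;
  }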
buf_read_page_low(): Use 64-bit arithmetic when computing the
file byte offset. In other calls to fil_space_t::io() the offset
was being computed correctly, for example by
buf_page_t::physical_offset().
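For illustration only (not the actual buf_read_page_low() code): the
pitfall is 32-bit overflow when multiplying the page number by the
page size, so an operand must be widened first:

  #include <cstdint>

  // With 16 KiB pages, page_no >= 262144 (a file beyond 4 GiB)
  // overflows a 32-bit product; widen before multiplying.
  static uint64_t file_offset(uint32_t page_no, uint32_t page_size)
  {
    return uint64_t{page_no} * page_size;  // 64-bit arithmetic
    // return page_no * page_size;         // would wrap around at 4 GiB
  }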
- The lifetime of temporary tables is expected to be short, so it
would seem to make sense to assume that all temporary tablespace
pages will remain in the buffer pool. It does not make sense to have
read-ahead for pages of the temporary tablespace.
buf_flush_page_cleaner(): Before finishing a batch, wake up any threads
that are waiting for buf_pool.done_flush_LRU.
This should fix a hung shutdown that we observed
after SET GLOBAL innodb_buffer_pool_size was executed
to shrink the InnoDB buffer pool.
In commit 0d175968d1 (MDEV-31354)
we only waited until no buf_pool.flush_list writes were in progress.
The buf_flush_page_cleaner() thread could still initiate page writes
from the buf_pool.LRU list while only holding buf_pool.mutex, not
buf_pool.flush_list_mutex. This is something that was changed in
commit a55b951e60 (MDEV-26827).
log_sort_flush_list(): Wait for the buf_flush_page_cleaner() thread to
be completely idle, including LRU flushing.
buf_flush_page_cleaner(): Always broadcast buf_pool.done_flush_list
when becoming idle, so that log_sort_flush_list() will be woken up.
Also, ensure that buf_pool.n_flush_inc() or
buf_pool.flush_list_set_active() has been invoked before any page
writes are initiated.
buf_flush_try_neighbors(): Release buf_pool.mutex here and not in the
callers, to avoid code duplication. Make innodb_flush_neighbors=ON
obey the innodb_io_capacity limit.
buf_page_init_for_read(): Test a condition before acquiring a latch,
not while holding it.
buf_read_ahead_linear(): Do not use a memory transaction, because it
could be too large, leading to frequent retries.
Release the hash_lock as early as possible.
buf_LRU_block_remove_hashed(): Test for "not ROW_FORMAT=COMPRESSED" first,
because in that case we can assume that an uncompressed page exists.
This removes a condition from the likely code branch.
In commit 03ca6495df and
commit ff5d306e29
we forgot to remove some Google copyright notices related to
a contribution of using atomic memory access in the old InnoDB
mutex_t and rw_lock_t implementation.
The copyright notices had been mostly added in
commit c6232c06fa
due to commit a1bb700fd2.
The following Google contributions remain:
* some logic related to the parameter innodb_io_capacity
* innodb_encrypt_tables, added in MariaDB Server 10.1
buf_LRU_block_remove_hashed(): Remove a comment that had been added
in mysql/mysql-server@aad1c7d0dd
and apparently referring to buf_LRU_invalidate_tablespace(),
which was later replaced with buf_LRU_flush_or_remove_pages() and
ultimately with buf_flush_remove_pages() and buf_flush_list_space().
All that code is covered by buf_pool.mutex. The note about releasing
the hash_lock for the buf_pool.page_hash slice would actually apply to
the last reference to hash_lock in buf_LRU_free_page(), for the
case zip=false (retaining a ROW_FORMAT=COMPRESSED page while
discarding the uncompressed one).
buf_page_t::write_complete(), buf_page_write_complete(),
IORequest::write_complete(): Add a parameter for passing
an error code. If an error occurred, we will release the
io-fix, buffer-fix and page latch but not reset the
oldest_modification field. The block would remain in
buf_pool.LRU and possibly buf_pool.flush_list, to be written
again later, by buf_flush_page_cleaner(). If all page writes
start consistently failing, all write threads should eventually
hang in log_free_check() because the log checkpoint cannot
be advanced to make room in the circular write-ahead-log ib_logfile0.
IORequest::read_complete(): Add a parameter for passing
an error code. If a read operation fails, we report the error
and discard the page, just like we would do if the page checksum
was not validated or the page could not be decrypted.
This only affects asynchronous reads, due to linear or random read-ahead
or crash recovery. When buf_page_get_low() invokes buf_read_page(),
that will be a synchronous read, not involving this code.
This was tested by randomly injecting errors in
write_io_callback() and read_io_callback(), like this:
  if (!ut_rnd_interval(100))
    cb->m_err= 42;
buf_LRU_free_page(): When we are discarding the uncompressed copy of a
ROW_FORMAT=COMPRESSED page, buf_page_t::can_relocate() must have ensured
that the block descriptor state is one of FREED, UNFIXED, REINIT.
Do not overwrite the state with UNFIXED. We do not want to write back
pages that were actually freed, and we want to avoid doublewrite for
pages that were (re)initialized by log records written since the latest
checkpoint. Last but not least, we do not want crashes like those that
commit dc1bd1802a (MDEV-31386)
was supposed to fix.
The test innodb_zip.wl5522_zip should typically cover all 3 states.
This bug is a regression due to
commit aaef2e1d8c (MDEV-27058).
The new statistics are enabled by adding the "engine", "innodb" or
"full" option to --log-slow-verbosity.
Example output:
# Pages_accessed: 184 Pages_read: 95 Pages_updated: 0 Old_rows_read: 1
# Pages_read_time: 17.0204 Engine_time: 248.1297
Pages_read_time is the time doing physical reads inside a storage engine.
(Writes cannot be tracked as these are usually done in the background).
Engine_time is the time spent inside the storage engine for the full
duration of the read/write/update calls. It uses the same code as
'analyze statement' for calculating the time spent.
The engine statistics are collected with a generic interface that
should be easy for any engine to use. It can also easily be extended
to provide even more statistics.
Currently only InnoDB has counters for Pages_% and Undo_% status.
Engine_time works for all engines.
Implementation details:
class ha_handler_stats holds all engine stats. This class is included
in the handler and THD classes.
While a query is running, all statistics are updated in the handler.
In close_thread_tables() the statistics are added to the THD.
handler::handler_stats is a pointer to where statistics should be
collected. This is set to point to handler::active_handler_stats if
stats are requested. If not, it is set to 0.
handler_stats also has an element, 'active', that is 1 if stats are
requested. This is to allow engines to avoid doing any 'if's while
updating the statistics.
Cloned or partition tables have the pointer set to the base table if
stats are requested.
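A minimal sketch of this pointer pattern, with hypothetical and
simplified names (the real ha_handler_stats has more members):

  #include <cstdint>

  struct example_handler_stats      // stand-in for ha_handler_stats
  {
    uint64_t pages_accessed= 0;
    uint64_t pages_read= 0;
    uint64_t engine_time= 0;
    uint32_t active= 0;             // 1 if statistics were requested
  };

  struct example_handler            // stand-in for class handler
  {
    example_handler_stats active_handler_stats;
    example_handler_stats *handler_stats= nullptr; // 0 if not requested

    void start_statement(bool collect_stats)
    {
      if (collect_stats)
      {
        active_handler_stats= example_handler_stats(); // reset ~40 bytes
        active_handler_stats.active= 1;
        handler_stats= &active_handler_stats;
      }
      else
        handler_stats= nullptr;
    }

    // Engine hot path: at most one pointer test per update; an engine
    // that caches the pointer can instead test handler_stats->active
    // once per call and update a local copy of the counters.
    void on_page_read()
    {
      if (handler_stats)
        handler_stats->pages_read++;
    }
  };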
There is a small performance impact when using --log-slow-verbosity=engine:
- All engine calls in 'select' will be timed.
- IO calls for InnoDB reads will be timed.
- Counters are incremented on local variables and accesses are
inlined, so these should have very little impact.
- Statistics have to be reset for each statement for the THD and each
used handler. This is only 40 bytes, which should be negligible.
- For partition tables we have to loop over all partitions to update
the handler_status as part of table_init(). This can be optimized in
the future to do it only if log-slow-verbosity changes. For this to
work we have to update handler_status for all opened partitions and
also for all partitions opened in the future.
Other things:
- Added options 'engine' and 'full' to log-slow-verbosity.
- Some of the new files in the test suite come from Percona Server,
which has similar status information.
- buf_page_optimistic_get(): Do not increment any counter, since we are
only validating a pointer, not performing any buf_pool.page_hash lookup.
- Added THD argument to save_explain_data_intern().
- Switched arguments for save_explain_.*_data() to always have THD
first (generates better code as other functions also have THD
first).
log_t::write_checkpoint(), log_t::resize_start(): Invoke buf_flush_ahead()
with buf_pool.get_oldest_modification(0)+1 so that another checkpoint will
be invoked, to complete the log resizing.
Tested by: Oleg Smirnov (on Microsoft Windows, where this hung most often)
buf_flush_page_cleaner(): Whenever buf_pool.ran_out(), invoke
buf_pool.get_oldest_modification(0) so that all clean blocks
will be removed from buf_pool.flush_list and buf_flush_LRU_list_batch()
will be able to evict some pages.
This fixes a regression that was likely caused by
commit a55b951e60 (MDEV-26827).
This is a 10.6 port of commit 2f9e264781
from MariaDB Server 10.9 that is missing some optimization due to a
more complex redo log format and recovery logic
(which was simplified in commit 685d958e38).
The progress reporting of InnoDB crash recovery was rather intermittent.
Nothing was reported during the single-threaded log record parsing, which
could consume minutes when parsing a large log. During log application,
there was only progress reporting in background threads that would be
invoked on data page read completion.
The progress reporting here will be detailed like this:
InnoDB: Starting crash recovery from checkpoint LSN=628599973,5653727799
InnoDB: Read redo log up to LSN=1963895808
InnoDB: Multi-batch recovery needed at LSN 2534560930
InnoDB: Read redo log up to LSN=3312233472
InnoDB: Read redo log up to LSN=1599646720
InnoDB: Read redo log up to LSN=2160831488
InnoDB: To recover: LSN 2806789376/2806819840; 195082 pages
InnoDB: To recover: LSN 2806789376/2806819840; 63507 pages
InnoDB: Read redo log up to LSN=3195776000
InnoDB: Read redo log up to LSN=3687099392
InnoDB: Read redo log up to LSN=4165315584
InnoDB: To recover: LSN 4374395699/4374440960; 241454 pages
InnoDB: To recover: LSN 4374395699/4374440960; 123701 pages
InnoDB: Read redo log up to LSN=4508724224
InnoDB: Read redo log up to LSN=5094550528
InnoDB: To recover: 205230 pages
The previous messages "Starting a batch to recover" or
"Starting a final batch to recover" will be replaced by
"To recover: ... pages" messages.
If a batch lasts longer than 15 seconds, then there will be
progress reports every 15 seconds, showing the number of remaining pages.
For the non-final batch, the "To recover:" message includes two end
LSNs: that of the batch and that of the recovered log. This is the
primary measure
of progress. The batch will end once the number of pages to recover
reaches 0.
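A sketch of the time-based reporting pattern, with hypothetical names
(the actual recovery code reports more detail, as shown above):

  #include <chrono>
  #include <cstddef>
  #include <cstdio>

  // Print the number of remaining pages at most once per 15 seconds.
  static void report_progress(size_t pages_to_recover)
  {
    using steady= std::chrono::steady_clock;
    static steady::time_point last_report;
    const steady::time_point now= steady::now();
    if (now - last_report < std::chrono::seconds(15))
      return;
    last_report= now;
    std::printf("InnoDB: To recover: %zu pages\n", pages_to_recover);
  }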
If recovery is possible in a single batch, the output will look like this,
with a shorter "To recover:" message that counts only the remaining pages:
InnoDB: Starting crash recovery from checkpoint LSN=628599973,5653727799
InnoDB: Read redo log up to LSN=1984539648
InnoDB: Read redo log up to LSN=2710875136
InnoDB: Read redo log up to LSN=3358895104
InnoDB: Read redo log up to LSN=3965299712
InnoDB: Read redo log up to LSN=4557417472
InnoDB: Read redo log up to LSN=5219527680
InnoDB: To recover: 450915 pages
We will also speed up recovery by improving the memory management and
implementing multi-threaded recovery of data pages that will not need
to be read into the buffer pool ("fake read"). Log application in the
"fake read" threads will be protected by an atomic being_recovered field
and exclusive buf_page_t::lock.
Recovery will reserve for data pages two thirds of the buffer pool,
or 256 pages, whichever is smaller. Previously, we could only use at most
one third of the buffer pool for buffered log records. This would typically
mean that with large buffer pools, recovery unnecessarily consisted of
multiple batches.
If recovery runs out of memory, it will "roll back" or "rewind" the current
mini-transaction. The recv_sys.recovered_lsn and recv_sys.pages
will correspond to the "out of memory LSN", at the end of the previous
complete mini-transaction.
If recovery runs out of memory while executing the final recovery batch,
we can simply invoke recv_sys.apply(false) to make room, and resume
parsing.
If recovery runs out of memory before the final batch, we will
scan the redo log to the end and check for any missing or inconsistent
files. In this version of the patch, we will throw away any previously
buffered recv_sys.pages and rescan the log from the checkpoint onwards.
recv_sys_t::pages_it: A cached iterator to recv_sys.pages.
recv_sys_t::is_memory_exhausted(): Remove. We will have out-of-memory
handling deep inside recv_sys_t::parse().
recv_sys_t::rewind(), page_recv_t::recs_t::rewind():
Remove all log starting with a specific LSN.
IORequest::write_complete(), IORequest::read_complete():
Replaces fil_aio_callback().
read_io_callback(), write_io_callback(): Replaces io_callback().
IORequest::fake_read_complete(), fake_io_callback(), os_fake_read():
Process a "fake read" request for concurrent recovery.
recv_sys_t::apply_batch(): Choose a number of successive pages
for a recovery batch.
recv_sys_t::erase(recv_sys_t::map::iterator): Remove log records for a
page whose recovery is not in progress. Log application threads
will not invoke this; they will only set being_recovered=-1 to indicate
that the entry is no longer needed.
recv_sys_t::garbage_collect(): Remove all being_recovered=-1 entries.
recv_sys_t::wait_for_pool(): Wait for some space to become available
in the buffer pool.
mlog_init_t::mark_ibuf_exist(): Avoid calls to
recv_sys::recover_low() via ibuf_page_exists() and buf_page_get_low().
Such calls would lead to double locking of recv_sys.mutex, which
depending on implementation could cause a deadlock. We will use
lower-level calls to look up index pages.
buf_LRU_block_remove_hashed(): Disable consistency checks for freed
ROW_FORMAT=COMPRESSED pages. Their contents could be uninitialized garbage.
This fixes an occasional failure of the test
innodb.innodb_bulk_create_index_debug.
Tested by: Matthias Leich
The progress reporting of InnoDB crash recovery was rather intermittent.
Nothing was reported during the single-threaded log record parsing, which
could consume minutes when parsing a large log. During log application,
there was only progress reporting in background threads that would be
invoked on data page read completion.
The progress reporting here will be detailed like this:
InnoDB: Starting crash recovery from checkpoint LSN=503549688
InnoDB: Parsed redo log up to LSN=1990840177; to recover: 124806 pages
InnoDB: Parsed redo log up to LSN=2729777071; to recover: 186123 pages
InnoDB: Parsed redo log up to LSN=3488599173; to recover: 248397 pages
InnoDB: Parsed redo log up to LSN=4177856618; to recover: 306469 pages
InnoDB: Multi-batch recovery needed at LSN 4189599815
InnoDB: End of log at LSN=4483551634
InnoDB: To recover: LSN 4189599815/4483551634; 307490 pages
InnoDB: To recover: LSN 4189599815/4483551634; 197159 pages
InnoDB: To recover: LSN 4189599815/4483551634; 67623 pages
InnoDB: Parsed redo log up to LSN=4353924218; to recover: 102083 pages
...
InnoDB: log sequence number 4483551634 ...
The previous messages "Starting a batch to recover" or
"Starting a final batch to recover" will be replaced by
"To recover: ... pages" messages.
If a batch lasts longer than 15 seconds, then there will be
progress reports every 15 seconds, showing the number of remaining pages.
For the non-final batch, the "To recover:" message includes two end
LSNs: that of the batch and that of the recovered log. This is the
primary measure
of progress. The batch will end once the number of pages to recover
reaches 0.
If recovery is possible in a single batch, the output will look like this,
with a shorter "To recover:" message that counts only the remaining pages:
InnoDB: Starting crash recovery from checkpoint LSN=503549688
InnoDB: Parsed redo log up to LSN=1998701027; to recover: 125560 pages
InnoDB: Parsed redo log up to LSN=2734136874; to recover: 186446 pages
InnoDB: Parsed redo log up to LSN=3499505504; to recover: 249378 pages
InnoDB: Parsed redo log up to LSN=4183247844; to recover: 306964 pages
InnoDB: End of log at LSN=4483551634
...
InnoDB: To recover: 331797 pages
...
InnoDB: log sequence number 4483551634 ...
We will also speed up recovery by improving the memory management and
implementing multi-threaded recovery of data pages that will not need
to be read into the buffer pool ("fake read"). Log application in the
"fake read" threads will be protected by an atomic being_recovered field
and exclusive buf_page_t::latch.
Recovery will reserve for data pages two thirds of the buffer pool,
or 256 pages, whichever is smaller. Previously, we could only use at most
one third of the buffer pool for buffered log records. This would typically
mean that with large buffer pools, recovery unnecessarily consisted of
multiple batches.
If recovery runs out of memory, it will "roll back" or "rewind" the current
mini-transaction. The recv_sys.lsn and recv_sys.pages will correspond
to the "out of memory LSN", at the end of the previous complete
mini-transaction.
If recovery runs out of memory while executing the final recovery batch,
we can simply invoke recv_sys.apply(false) to make room, and resume
parsing.
If recovery runs out of memory before the final batch, we will scan
the redo log to the end (recv_sys.scanned_lsn) and check for any missing
or inconsistent files. If recv_init_crash_recovery_spaces() does not
report any potentially missing tablespaces, we can make use of the
already stored recv_sys.pages and only rewind to the "out of memory LSN".
Else, we must keep parsing and invoking recv_validate_tablespace()
until an error has been found or everything has been resolved, and
ultimately rewind to the checkpoint LSN.
recv_sys_t::pages_it: A cached iterator to recv_sys.pages
recv_sys_t::parse_mtr(): Remove an ATTRIBUTE_NOINLINE that would
prevent tail call optimization in recv_sys_t::parse_pmem().
recv_sys_t::parse(), recv_sys_t::parse_mtr(), recv_sys_t::parse_pmem():
Add template<bool store> parameter. Redo log record parsing
(store=false) is better specialized from store=true
(with bool if_exists) so that we can avoid some conditional branches
in frequently invoked low-level code.
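A generic illustration of how a template<bool> parameter lets the
compiler drop the storage path entirely when store=false (hypothetical
code, unrelated to the actual recv_sys_t::parse() signature):

  #include <cstddef>
  #include <vector>

  template<bool store>
  size_t parse_records(const unsigned char *buf, size_t len,
                       std::vector<unsigned char> *out)
  {
    size_t n_records= 0;
    for (size_t i= 0; i < len; i++)
    {
      if (store)                // constant; dead code when store=false
        out->push_back(buf[i]); // "buffer the record" in this sketch
      n_records++;
    }
    return n_records;
  }

  // Two specialized copies; the store=false one contains no branch.
  template size_t parse_records<false>(const unsigned char*, size_t,
                                       std::vector<unsigned char>*);
  template size_t parse_records<true>(const unsigned char*, size_t,
                                      std::vector<unsigned char>*);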
recv_sys_t::is_memory_exhausted(): Remove. The special parse() status
GOT_OOM will report out-of-memory situation at the low level.
recv_sys_t::rewind(), page_recv_t::recs_t::rewind():
Remove all log starting with a specific LSN.
recv_scan_log(): Separate some code for only parsing, not storing log.
In rewound_lsn, remember the LSN at which last_phase=false recovery
ran out of memory. This is where the next call to recv_scan_log()
will resume storing the log. This replaces recv_sys.last_stored_lsn.
recv_sys_t::parse(): Evaluate the template parameter store in a few more
cases, to allow dead code to be eliminated at compile time.
recv_sys_t::scanned_lsn: The end of the log found by recv_scan_log().
The special value 1 means that recv_sys has been initialized but
no log has been parsed.
IORequest::write_complete(), IORequest::read_complete():
Replaces fil_aio_callback().
read_io_callback(), write_io_callback(): Replaces io_callback().
IORequest::fake_read_complete(), fake_io_callback(), os_fake_read():
Process a "fake read" request for concurrent recovery.
recv_sys_t::apply_batch(): Choose a number of successive pages
for a recovery batch.
recv_sys_t::erase(recv_sys_t::map::iterator): Remove log records for a
page whose recovery is not in progress. Log application threads
will not invoke this; they will only set being_recovered=-1 to indicate
that the entry is no longer needed.
recv_sys_t::garbage_collect(): Remove all being_recovered=-1 entries.
recv_sys_t::wait_for_pool(): Wait for some space to become available
in the buffer pool.
mlog_init_t::mark_ibuf_exist(): Avoid calls to
recv_sys::recover_low() via ibuf_page_exists() and buf_page_get_low().
Such calls would lead to double locking of recv_sys.mutex, which
depending on implementation could cause a deadlock. We will use
lower-level calls to look up index pages.
buf_LRU_block_remove_hashed(): Disable consistency checks for freed
ROW_FORMAT=COMPRESSED pages. Their contents could be uninitialized garbage.
This fixes an occasional failure of the test
innodb.innodb_bulk_create_index_debug.
Tested by: Matthias Leich