mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-09-15 05:41:27 +03:00

Author	SHA1	Message	Date
Marko Mäkelä	b3ca7fa089	Merge 10.6 into 10.11	2024-01-22 08:49:04 +02:00
Marko Mäkelä	495e7f1b3d	MDEV-33053 fixup: Correct a condition before a message	2024-01-22 08:24:08 +02:00
Daniel Black	0c23f84d8d	MDEV-32983 cosmetic improvement on path separator near ib_buffer_pool A mix of path separators looks odd. InnoDB: Loading buffer pool(s) from C:\xampp\mysql\data/ib_buffer_pool This was changed in `cf552f5886` Both forward slashes and backward slashes work on Windows. We do not use \\?\ names. So we improve the consistent look of it so it doesn't look like a bug. Normalize, in this case, the path separator to \ for making the filename. Reported thanks to Github user @celestinoxp. Closes: https://github.com/ApacheFriends/xampp-build/issues/33 Reviewed by: Marko Mäkelä and Vladislav Vaintroub	2024-01-22 16:56:00 +11:00
Marko Mäkelä	7e65e3027e	MDEV-33275 buf_flush_LRU(): mysql_mutex_assert_owner(&buf_pool.mutex) failed In commit `a55b951e60` (MDEV-26827) an error was introduced in a rarely executed code path of the buf_flush_page_cleaner() thread. As a result, the function buf_flush_LRU() could be invoked while not holding buf_pool.mutex. Reviewed by: Debarun Banerjee	2024-01-19 12:40:32 +02:00
Marko Mäkelä	d34479dc66	MDEV-33053 InnoDB LRU flushing does not run before running out of buffer pool buf_flush_LRU(): Display a warning if no pages could be evicted and no writes initiated. buf_pool_t::need_LRU_eviction(): Renamed from buf_pool_t::ran_out(). Check if the amount of free pages is smaller than innodb_lru_scan_depth instead of checking if it is 0. buf_flush_page_cleaner(): For the final LRU flush after a checkpoint flush, use a "budget" of innodb_io_capacity_max, like we do in the case when we are not in "furious" checkpoint flushing. Co-developed by: Debarun Banerjee Reviewed by: Debarun Banerjee Tested by: Matthias Leich	2024-01-19 12:40:16 +02:00
Marko Mäkelä	9374772ecd	Merge 10.11 into 11.0	2024-01-19 09:07:48 +02:00
Marko Mäkelä	9d20853c74	Merge 10.6 into 10.11	2024-01-18 19:22:23 +02:00
Marko Mäkelä	3a96eba25f	Merge 10.5 into 10.6	2024-01-17 13:35:05 +02:00
Thirunarayanan Balathandayuthapani	caad34df54	MDEV-32968 InnoDB fails to restore tablespace first page from doublewrite buffer when page is empty - InnoDB fails to find the space id from the page0 of the tablespace. In that case, InnoDB can use doublewrite buffer to recover the page0 and write into the file. - buf_dblwr_t::init_or_load_pages(): Loads only the pages which are valid.(page lsn >= checkpoint). To do that, InnoDB has to open the redo log before system tablespace, read the latest checkpoint information. recv_dblwr_t::find_first_page(): 1) Iterate the doublewrite buffer pages and find the 0th page 2) Read the tablespace flags, space id from the 0th page. 3) Read the 1st, 2nd and 3rd page from tablespace file and compare the space id with the space id which is stored in doublewrite buffer. 4) If it matches then we can write into the file. 5) Return space which matches the pages from the file. SysTablespace::read_lsn_and_check_flags(): Remove the retry logic for validating the first page. After restoring the first page from doublewrite buffer, assign tablespace flags by reading the first page. recv_recovery_read_max_checkpoint(): Reads the maximum checkpoint information from log file recv_recovery_from_checkpoint_start(): Avoid reading the checkpoint header information from log file Datafile::validate_first_page(): Throw error in case of first page validation fails.	2024-01-15 14:08:27 +05:30
Marko Mäkelä	af4f9daeb8	Merge 11.2 into 11.3	2024-01-10 15:30:21 +02:00
Marko Mäkelä	c2da55ac01	Merge 10.11 into 11.0	2024-01-10 12:42:56 +02:00
Marko Mäkelä	1eb11da3e5	Merge 10.6 into 10.11	2024-01-10 12:37:19 +02:00
Marko Mäkelä	3613fb2aa8	MDEV-33112 innodb_undo_log_truncate=ON is blocking page write When innodb_undo_log_truncate=ON causes an InnoDB undo tablespace to be truncated, we must guarantee that the undo tablespace will be rebuilt atomically: After mtr_t::commit_shrink() has durably written the mini-transaction that rebuilds the undo tablespace, we must not write any old pages to the tablespace. To guarantee this, in trx_purge_truncate_history() we used to traverse the entire buf_pool.flush_list in order to acquire exclusive latches on all pages for the undo tablespace that reside in the buffer pool, so that those pages cannot be written and will be evicted during mtr_t::commit_shrink(). But, this traversal may interfere with the page writing activity of buf_flush_page_cleaner(). It would be better to lazily discard the old pages of the truncated undo tablespace. fil_space_t::is_being_truncated, fil_space_t::clear_stopping(): Remove. fil_space_t::create_lsn: A new field, identifying the LSN of the latest rebuild of a tablespace. buf_page_t::flush(), buf_flush_try_neighbors(): Evict pages whose FIL_PAGE_LSN is below fil_space_t::create_lsn. mtr_t::commit_shrink(): Update fil_space_t::create_lsn and fil_space_t::size right before the log is durably written and the tablespace file is being truncated. fsp_page_create(), trx_purge_truncate_history(): Simplify the logic. Reviewed by: Thirunarayanan Balathandayuthapani, Vladislav Lesin Performance tested by: Axel Schwenke Correctness tested by: Matthias Leich	2024-01-10 11:53:00 +02:00
Marko Mäkelä	6538a91e94	Merge 10.5 into 10.6	2024-01-08 14:39:56 +02:00
Marko Mäkelä	0b612619ef	MDEV-33098: Fix some instrumentation for innodb.doublewrite_debug buf_flush_page_cleaner(): A continue or break inside DBUG_EXECUTE_IF actually is a no-op. Use an explicit call to _db_keyword_() to actually avoid advancing the checkpoint. buf_flush_list_now_set(): Invoke os_aio_wait_until_no_pending_writes() to ensure that the page write to the system tablespace is completed.	2024-01-08 14:36:54 +02:00
Marko Mäkelä	193b22d822	Merge 11.2 into 11.3	2024-01-05 14:20:35 +02:00
Marko Mäkelä	d8b8adce49	Merge 10.11 into 11.0	2023-12-21 14:38:53 +02:00
Daniel Black	c8d52c895d	MDEV-24670 memory pressure - warnings/notes Silence a few more un-actionable warnings. Use ignoring result on write to make sure compilers don't complain.	2023-12-21 20:05:05 +11:00
Sergei Golubchik	7f0094aac8	Merge branch '11.2' into 11.3	2023-12-21 02:14:59 +01:00
Marko Mäkelä	590036b021	Merge 10.11 into 11.0	2023-12-20 16:05:20 +02:00
Marko Mäkelä	2b99e5f7ef	Merge 10.6 into 10.11	2023-12-20 15:58:36 +02:00
Daniel Black	9d2c3d388e	MDEV-24670 memory pressure - warnings/notes Errors outputted as notes are weird, and when a user cannot do anything about them, why output them at all. Point taken, removed these, and left positive message on initialization (from Marko). Thanks Elena Stepanova.	2023-12-20 17:55:50 +11:00
Marko Mäkelä	a057a6e41f	MDEV-24670 memory pressure - mariadb-backup postfix mariadb-backup wasn't meant to have memory pressure sensors so restrict the operation to SRV_OPERATIONAL_NORMAL mode. Reviewed by Daniel Black.	2023-12-20 09:53:46 +11:00
Marko Mäkelä	2b01e5103d	Merge 10.5 into 10.6	2023-12-19 18:41:42 +02:00
Sergei Golubchik	8c8bce05d2	Merge branch '10.11' into 11.0	2023-12-19 15:53:18 +01:00
Sergei Golubchik	fd0b47f9d6	Merge branch '10.6' into 10.11	2023-12-18 11:19:04 +01:00
Marko Mäkelä	4c2e971841	MDEV-33052 MSAN use-of-uninitialized-value in buf_read_ahead_linear() buf_read_ahead_linear(): Suppress a warning of comparing potentially uninitialized FIL_PAGE_PREV and FIL_PAGE_NEXT fields.	2023-12-18 10:37:11 +02:00
Sergei Golubchik	e95bba9c58	Merge branch '10.5' into 10.6	2023-12-17 11:20:43 +01:00
Marko Mäkelä	c8346c0bac	MDEV-31939 Adaptive flush recommendation ignores dirty ratio and checkpoint age buf_flush_page_cleaner(): Pass pct_lwm=srv_max_dirty_pages_pct_lwm (innodb_max_dirty_pages_pct_lwm) to page_cleaner_flush_pages_recommendation() unless the dirty page ratio of the buffer pool is below that. Starting with commit `d4265fbde5` we used to always pass pct_lwm=0.0, which was not intended. Reviewed by: Vladislav Vaintroub	2023-12-08 09:31:34 +02:00
Marko Mäkelä	f074223ae7	MDEV-32068 Some calls to buf_read_ahead_linear() seem to be useless The linear read-ahead (enabled by nonzero innodb_read_ahead_threshold) works best if index leaf pages or undo log pages have been allocated on adjacent page numbers. The read-ahead is assumed not to be helpful in other types of page accesses, such as non-leaf index pages. buf_page_get_low(): Do not invoke buf_page_t::set_accessed(), buf_page_make_young_if_needed(), or buf_read_ahead_linear(). We will invoke them in those callers of buf_page_get_gen() or buf_page_get() where it makes sense: the access is not one-time-on-startup and the page and not going to be freed soon. btr_copy_blob_prefix(), btr_pcur_move_to_next_page(), trx_undo_get_prev_rec_from_prev_page(), trx_undo_get_first_rec(), btr_cur_t::search_leaf(), btr_cur_t::open_leaf(): Invoke buf_read_ahead_linear(). We will not invoke linear read-ahead in functions that would essentially allocate or free pages, because pages that are freshly allocated are expected to be initialized by buf_page_create() and not read from the data file. Likewise, freeing pages should not involve accessing any sibling pages, except for freeing singly-linked lists of BLOB pages. We will not invoke read-ahead in btr_cur_t::pessimistic_search_leaf() or in a pessimistic operation of btr_cur_t::open_leaf(), because it is assumed that pessimistic operations should be preceded by optimistic operations, which should already have invoked read-ahead. buf_page_make_young_if_needed(): Invoke also buf_page_t::set_accessed() and return the result. btr_cur_nonleaf_make_young(): Like buf_page_make_young_if_needed(), but do not invoke buf_page_t::set_accessed(). Reviewed by: Vladislav Lesin Tested by: Matthias Leich	2023-12-05 12:31:29 +02:00
Marko Mäkelä	850d61736d	MDEV-32042 Simplify buf_page_get_gen() buf_page_get_low(): Rename to buf_page_get_gen(), and assume that no crash recovery is needed. recv_sys_t::recover(): Replaces the old buf_page_get_gen(). Read a page while crash recovery is in progress. trx_rseg_get_n_undo_tablespaces(), ibuf_upgrade_needed(): Invoke recv_sys.recover() instead of buf_page_get_gen(). dict_boot(): Invoke recv_sys.recover() instead of buf_page_get_gen(). Do not load the system tables. srv_start(): Load the system tables and the undo logs after all redo log has been applied in recv_sys.apply(true) and we can safely invoke the regular buf_page_get_gen().	2023-12-04 09:45:53 +02:00
Marko Mäkelä	0fb897b081	Merge 10.11 into 11.0	2023-11-30 16:20:47 +02:00
Marko Mäkelä	6d0bcfc4b9	Merge 10.6 into 10.11	2023-11-30 13:03:59 +02:00
Marko Mäkelä	bb511def1d	MDEV-32371 Deadlock between buf_page_get_zip() and buf_pool_t::corrupted_evict() buf_page_get_zip(): Do not wait for the page latch while holding hash_lock. If the latch is not available, ensure that any concurrent buf_pool_t::corrupted_evict() will be able to acquire the hash_lock, and then retry the lookup. If the page was corrupted, we will finally "goto must_read_page", retry the read once more, and then report an error. Reviewed by: Thirunarayanan Balathandayuthapani	2023-11-30 10:35:53 +02:00
Marko Mäkelä	02701a8430	Merge 11.2 into 11.3	2023-11-28 11:19:50 +02:00
Marko Mäkelä	5b6134b040	Merge 10.11 into 11.0	2023-11-24 11:20:56 +02:00
Daniel Black	a48c1b89c8	MDEV-24670 memory pressure - eventfd rather than pipe Eventfds have a simplier interface and are one file descriptor rather than two. Reuse the patten of the accepting socket connections by testing for abort after a poll returns. This way the same event descriptor can be used for Quit and debugging trigger. Also correct the registration of mem pressure file descriptors.	2023-11-23 18:39:38 +11:00
Marko Mäkelä	f2bd662f6c	Merge 10.6 into 10.11	2023-11-22 18:14:11 +02:00
Marko Mäkelä	d963584d4c	Merge 10.5 into 10.6	2023-11-22 16:56:47 +02:00
Marko Mäkelä	78c9a12c8f	MDEV-32861 InnoDB hangs when running out of I/O slots When the constant OS_AIO_N_PENDING_IOS_PER_THREAD is changed from 256 to 1 and the server is run with the minimum parameters innodb_read_io_threads=1 and innodb_write_io_threads=2, two hangs were observed. tpool::cache<T>::put(T*): Ensure that get() in io_slots::acquire() will be woken up when the cache previously was empty. buf_pool_t::io_buf_t::reserve(): Schedule a possibly partial doublewrite batch so that os_aio_wait_until_no_pending_writes() has a chance of returning. Add a Boolean parameter and pass wait_for_reads=false inside buf_page_decrypt_after_read(), because those calls will be executed inside a read completion callback, and therefore os_aio_wait_until_no_pending_reads() would block indefinitely.	2023-11-22 16:54:41 +02:00
Marko Mäkelä	7443ad1c8a	MDEV-32374 log_sys.lsn_lock is a performance hog The log_sys.lsn_lock that was introduced in commit `a635c40648` had better be located in the same cache line with log_sys.latch so that log_t::append_prepare() needs to modify only two first cache lines where log_sys is stored. log_t::lsn_lock: On Linux, change the type from pthread_mutex_t to something that may be as small as 32 bits, to pack more data members in the same cache line. On Microsoft Windows, CRITICAL_SECTION works better. log_t::check_flush_or_checkpoint_: Renamed to need_checkpoint. There is no need to pause all writer threads in log_free_check() when we only need to write log_sys.buf to ib_logfile0. That will be done in mtr_t::commit(). log_t::append_prepare_wait(): Make the member function non-static to simplify the call interface, and add a parameter for the LSN. log_t::append_prepare(): Invoke append_prepare_wait() at most once. Only set_check_for_checkpoint() if a log checkpoint needs to be written. If the log buffer needs to be written, we will take care of it ourselves later in our caller. This will reduce interference with log_free_check() in other threads. mtr_t::commit(): Call log_write_up_to() if needed. log_t::get_write_target(): Return a log_write_up_to() target to mtr_t::commit(). buf_flush_ahead(): If we are in furious flushing, call log_sys.set_check_for_checkpoint() so that all writers will wait in log_free_check() until the checkpoint is done. Otherwise, the test innodb.insert_into_empty could occasionally report an error "Crash recovery is broken". log_check_margins(): Replaced by log_free_check(). log_flush_margin(): Removed. This is part of mtr_t::commit() and other operations that write log. log_t::create(), log_t::attach(): Guarantee that buf_free < max_buf_free will always hold on PMEM, to satisfy an assumption of log_t::get_write_target(). log_write_up_to(): Assert lsn!=0. Such calls are not incorrect, but it is cheaper to test that single unlikely condition in mtr_t::commit() rather than test several conditions in log_write_up_to(). innodb_drop_database(), unlock_and_close_files(): Check the LSN before calling log_write_up_to(). ha_innobase::commit_inplace_alter_table(): Remove redundant calls to log_write_up_to() after calling unlock_and_close_files(). Reviewed by: Vladislav Vaintroub Stress tested by: Matthias Leich Performance tested by: Steve Shaw	2023-11-21 14:38:35 +02:00
Marko Mäkelä	90d968dab9	Merge 10.6 into 10.11	2023-11-20 10:08:19 +02:00
Marko Mäkelä	2323483528	MDEV-31953 madvise(..., MADV_FREE) is causing a performance regression buf_page_t::set_os_unused(): Remove the system call that had been added in commit `16c9718758` and revised in commit `c1fd082e9c` for Microsoft Windows. buf_pool_t::garbage_collect(): A new function to collect any garbage from the InnoDB buffer pool that can be removed without writing any log or data files. This will also invoke madvise() for all of buf_pool.free. To trigger this the following MDEV is implemented: MDEV-24670 avoid OOM by linux kernel co-operative memory management To avoid frequent triggers that caused the MDEV-31953 regression, while still preserving the 10.11 functionality of non-greedy kernel memory usage, memory triggers are used. On the triggering of memory pressure, if supported in the Linux kernel, trigger the garbage collection of the innodb buffer pool. The hard coded triggers occur where there is: * some memory pressure in 5 of the last 10 seconds * a full stall on memory pressure for 10ms in the last 2 seconds The kernel will trigger only one in each of these time windows. To avoid mariadb being in a constant state of memory garbage collection, this has been limited to once per minute. For a small set of kernels in 2023 (6.5, 6.6), there was a limit requiring CAP_SYS_RESOURCE that was lifted[1] to support the use case of user memory pressure. It not currently possible to set CAP_SYS_RESOURCES in a systemd service as its setting a capability inside a usernamespace. Running under systemd v254+ requires the default MemoryPressureWatch=auto (or alternately "on"). Functionality was tested in a 6.4 kernel Fedora successfully under a systemd service. Running in a container requires that (unmask=)/sys/fs/cgroup be writable by the mariadbd process. To aid testing, the buf_pool_resize was a convient trigger point on which to trigger garbage collection. ref [1]: https://lore.kernel.org/all/CAMw=ZnQ56cm4Txgy5EhGYvR+Jt4s-KVgoA9_65HKWVMOXp7a9A@mail.gmail.com/T/#m3bd2a73c5ee49965cb73a830b1ccaa37ccf4e427 Co-Author: Daniel Black (on memory pressure trigger) Reviewed by: Marko Mäkelä, Vladislav Vaintroub, Vladislav Lesin, Thirunarayanan Balathandayuthapani Tested by: Matthias Leich	2023-11-18 20:12:33 +11:00
Marko Mäkelä	eb1f8b2919	MDEV-32027 Opening all .ibd files on InnoDB startup can be slow dict_find_max_space_id(): Return SELECT MAX(SPACE) FROM SYS_TABLES. dict_check_tablespaces_and_store_max_id(): In the normal case (no encryption plugin has been loaded and the change buffer is empty), invoke dict_find_max_space_id() and do not open any .ibd files. If a std::set<uint32_t> has been specified, open the files whose tablespace ID is mentioned. Else, open all data files that are identified by SYS_TABLES records. fil_ibd_open(): Remove a call to os_file_get_last_error() that can report a misleading error, such as EINVAL inside my_realpath() that is not an actual error. This could be invoked when a data file is found but the FSP_SPACE_FLAGS are incorrect, such as is the case for table test.td in ./mtr --mysqld=--innodb-buffer-pool-dump-at-shutdown=0 innodb.table_flags buf_load(): If any tablespaces could not be found, invoke dict_check_tablespaces_and_store_max_id() on the missing tablespaces. dict_load_tablespace(): Try to load the tablespace unless it was found to be futile. This fixes failures related to FTS_*.ibd files for FULLTEXT INDEX. btr_cur_t::search_leaf(): Prevent a crash when the tablespace does not exist. This was caught by the test innodb_fts.fts_concurrent_insert when the change to dict_load_tablespaces() was not present. We modify a few tests to ensure that tables will not be loaded at startup. For some fault injection tests this means that the corrupted tables will not be loaded, because dict_load_tablespace() would perform stricter checks than dict_check_tablespaces_and_store_max_id(). Tested by: Matthias Leich Reviewed by: Thirunarayanan Balathandayuthapani	2023-11-17 15:07:51 +02:00
Marko Mäkelä	9a545eb67c	MDEV-26055: Correct the formula for adaptive flushing This is a 10.5 backport of 10.6 commit `d4265fbde5`. page_cleaner_flush_pages_recommendation(): If dirty_pct is between innodb_max_dirty_pages_pct_lwm and innodb_max_dirty_pages_pct, scale the effort relative to how close we are to innodb_max_dirty_pages_pct. The previous formula was missing a multiplication by 100.	2023-11-16 17:45:37 +02:00
Marko Mäkelä	a3d0d5fc33	MDEV-26055: Improve adaptive flushing This is a 10.5 backport from 10.6 commit `9593cccf28`. Adaptive flushing is enabled by setting innodb_max_dirty_pages_pct_lwm>0 (not default) and innodb_adaptive_flushing=ON (default). There is also the parameter innodb_adaptive_flushing_lwm (default: 10 per cent of the log capacity). It should enable some adaptive flushing even when innodb_max_dirty_pages_pct_lwm=0. That is not being changed here. This idea was first presented by Inaam Rana several years ago, and I discussed it with Jean-François Gagné at FOSDEM 2023. buf_flush_page_cleaner(): When we are not near the log capacity limit (neither buf_flush_async_lsn nor buf_flush_sync_lsn are set), also try to move clean blocks from the buf_pool.LRU list to buf_pool.free or initiate writes (but not the eviction) of dirty blocks, until the remaining I/O capacity has been consumed. buf_flush_LRU_list_batch(): Add the parameter bool evict, to specify whether dirty least recently used pages (from buf_pool.LRU) should be evicted immediately after they have been written out. Callers outside buf_flush_page_cleaner() will pass evict=true, to retain the existing behaviour. buf_do_LRU_batch(): Add the parameter bool evict. Return counts of evicted and flushed pages. buf_flush_LRU(): Add the parameter bool evict. Assume that the caller holds buf_pool.mutex and will invoke buf_dblwr.flush_buffered_writes() afterwards. buf_flush_list_holding_mutex(): A low-level variant of buf_flush_list() whose caller must hold buf_pool.mutex and invoke buf_dblwr.flush_buffered_writes() afterwards. buf_flush_wait_batch_end_acquiring_mutex(): Remove. It is enough to have buf_flush_wait_batch_end(). page_cleaner_flush_pages_recommendation(): Avoid some floating-point arithmetics. buf_flush_page(), buf_flush_check_neighbor(), buf_flush_check_neighbors(), buf_flush_try_neighbors(): Rename the parameter "bool lru" to "bool evict". buf_free_from_unzip_LRU_list_batch(): Remove the parameter. Only actual page writes will contribute towards the limit. buf_LRU_free_page(): Evict freed pages of temporary tables. buf_pool.done_free: Broadcast whenever a block is freed (and buf_pool.try_LRU_scan is set). buf_pool_t::io_buf_t::reserve(): Retry indefinitely. During the test encryption.innochecksum we easily run out of these buffers for PAGE_COMPRESSED or ENCRYPTED pages. Tested by Matthias Leich and Axel Schwenke	2023-11-16 17:45:18 +02:00
Oleksandr Byelkin	34272bd6a5	Merge branch '11.2' into 11.3	2023-11-14 18:33:03 +01:00
Oleksandr Byelkin	48af85db21	Merge branch '10.11' into 11.0	2023-11-08 17:09:44 +01:00
Oleksandr Byelkin	fecd78b837	Merge branch '10.10' into 10.11	2023-11-08 16:46:47 +01:00
Oleksandr Byelkin	04d9a46c41	Merge branch '10.6' into 10.10	2023-11-08 16:23:30 +01:00

... 2 3 4 5 6 ...

1771 Commits