mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-09-11 05:52:26 +03:00

Author	SHA1	Message	Date
Oleksandr Byelkin	89c7e2b9c7	Merge branch '10.11' into 11.4	2025-06-17 09:50:22 +02:00
Daniel Black	5cd982c51b	MDEV-34863 RAM Usage Changed Significantly Between 10.11 Releases (postfix) Correct error message to use the correct system variable name - innodb_buffer_pool_size_auto_min	2025-06-03 15:02:18 +10:00
Marko Mäkelä	4d37b1c4b9	MDEV-36886 log_t::get_lsn_approx() isn't lower bound If the execution of the two reads in log_t::get_lsn_approx() is interleaved with concurrent writes of those fields in log_t::write_buf() or log_t::persist(), the returned approximation will be an upper bound. If log_t::append_prepare_wait() is pending, the approximation could be a lower bound. We must adjust each caller of log_t::get_lsn_approx() for the possibility that the return value is larger than MAX(oldest_modification) in buf_pool.flush_list. af_needed_for_redo(): Add a comment that explains why the glitch is not a problem. page_cleaner_flush_pages_recommendation(): Revise the logic for the unlikely case cur_lsn < oldest_lsn. The original logic would have invoked af_get_pct_for_lsn() with a very large age value, which would likely cause an overflow of the local variable lsn_age_factor, and make pct_for_lsn a "random number". Based on that value, total_ratio would be normalized to something between 0.0 and 1.0. Nothing extremely bad should have happened in this case; the innodb_io_capacity_max should not be exceeded.	2025-05-28 14:44:43 +03:00
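The overflow risk described for MDEV-36886 above comes from unsigned subtraction: if cur_lsn < oldest_lsn, the difference wraps around to a huge value. The sketch below only illustrates that arithmetic. The function names are hypothetical, and clamping the age to 0 is an assumption made for illustration; the commit message does not say how the server's revised logic handles the case.

```cpp
#include <cstdint>

// Naive age: with unsigned 64-bit arithmetic this wraps around to a
// very large value whenever cur_lsn < oldest_lsn.
constexpr std::uint64_t naive_lsn_age(std::uint64_t cur_lsn,
                                      std::uint64_t oldest_lsn)
{
  return cur_lsn - oldest_lsn;
}

// Clamped age (an assumption for illustration, not the server's code):
// treat an over-estimated oldest_lsn as "no age" instead of wrapping.
constexpr std::uint64_t clamped_lsn_age(std::uint64_t cur_lsn,
                                        std::uint64_t oldest_lsn)
{
  return cur_lsn > oldest_lsn ? cur_lsn - oldest_lsn : 0;
}

static_assert(naive_lsn_age(100, 150) == UINT64_MAX - 49,
              "the naive difference wraps around");
static_assert(clamped_lsn_age(100, 150) == 0,
              "the clamped difference stays small");
```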
Marko Mäkelä	7b4b759f13	MDEV-36868: Inconsistency when shrinking innodb_buffer_pool_size buf_pool_t::resize(): After successfully shrinking the buffer pool, announce the success. The size had already been updated in shrunk(). After failing to shrink the buffer pool, re-enable the adaptive hash index if it had been enabled. Reviewed by: Debarun Banerjee	2025-05-28 13:33:06 +03:00
Marko Mäkelä	118cfcf821	Merge 10.11 into 11.4	2025-05-13 13:44:58 +03:00
Marko Mäkelä	bb48d7bc81	MDEV-36781: Assertion i < BUF_BUDDY_SIZES failed in buf_buddy_shrink() buf_buddy_shrink(): Properly cover the case when KEY_BLOCK_SIZE corresponds to the innodb_page_size, that is, the ROW_FORMAT=COMPRESSED page frame is directly allocated from the buffer pool, not via the binary buddy allocator. buf_LRU_check_size_of_non_data_objects(): Avoid a crash when the buffer pool is being shrunk. buf_pool_t::shrink(): Abort if over 95% of the shrunk buffer pool would be occupied by the adaptive hash index or record locks.	2025-05-13 12:27:46 +03:00
Marko Mäkelä	56e0be34bc	MDEV-36780: InnoDB buffer pool reserves all assigned memory In commit `b6923420f3` (MDEV-29445) we started to specify the MAP_POPULATE flag for allocating the InnoDB buffer pool. This would cause a lot of time to be spent on __mm_populate() inside the Linux kernel, such as 16 seconds to pre-fault or commit innodb_buffer_pool_size=64G. Let us revert to the previous way of allocating the buffer pool at startup. Note: An attempt to increase the buffer pool size by SET GLOBAL innodb_buffer_pool_size (up to innodb_buffer_pool_size_max) will invoke my_virtual_mem_commit(), which will use MAP_POPULATE to zero-fill and prefault the requested additional memory area, blocking buf_pool.mutex. Before MDEV-29445 we allocated the InnoDB buffer pool by invoking mmap(2) once (via my_large_malloc()). After the change, we would invoke mmap(2) twice, first via my_virtual_mem_reserve() and then via my_virtual_mem_commit(). Outside Microsoft Windows, we are reverting back to my_large_malloc() like allocation. my_virtual_mem_reserve(): Define only for Microsoft Windows. Other platforms should invoke my_large_virtual_alloc() and update_malloc_size() instead of my_virtual_mem_reserve() and my_virtual_mem_commit(). my_large_virtual_alloc(): Define only outside Microsoft Windows. Do not specify MAP_NORESERVE nor MAP_POPULATE, to preserve compatibility with my_large_malloc(). Were MAP_POPULATE specified, the mmap() system call would be significantly slower, for example 18 seconds to reserve 64 GiB upfront.	2025-05-13 12:27:42 +03:00
Oleksandr Byelkin	a8d4642375	Merge branch '10.11' into 11.4	2025-04-26 10:53:02 +02:00
Marko Mäkelä	f1a8b7fe95	MDEV-36646: innodb_buffer_pool_size change aborted A statement SET GLOBAL innodb_buffer_pool_size=... could fail for no good reason when the buffer pool contains many pages that can actually be evicted. buf_flush_LRU_list_batch(): Keep evicting as long as the buffer pool is being shrunk, for at most innodb_lru_scan_depth extra blocks. Disregard the flush limit for pages that are marked as freed in files. buf_flush_LRU_to_withdraw(): Update the to_withdraw target during buf_flush_LRU_list_batch(). buf_pool_t::will_be_withdrawn(): Allow also ptr=nullptr (the condition will not hold for it). This fixes a regression that was introduced in commit `b6923420f3` (MDEV-29445) and caught by the test innodb.temp_truncate_freed in MariaDB Server 11.4. Tested by: Thirunarayanan Balathandayuthapani Reviewed by: Thirunarayanan Balathandayuthapani	2025-04-23 15:42:12 +03:00
Marko Mäkelä	a096f12ff7	MDEV-29445 fixup: debug assertion buf_buddy_alloc_from(): Pass the correct argument to buf_pool.contains_zip(). This fixes a failure of the test encryption.innochecksum when the code is built with cmake -DWITH_UBSAN=ON -DCMAKE_BUILD_TYPE=Debug	2025-04-14 08:33:10 +03:00
Marko Mäkelä	acd071f599	MDEV-21923: LSN allocation is a bottleneck The parameter innodb_log_spin_wait_delay will be deprecated and ignored, because there is no spin loop anymore. Thanks to commit `685d958e38` and commit `a635c40648` multiple mtr_t::commit() can concurrently copy their slice of mtr_t::m_log to the shared log_sys.buf. Each writer would allocate their own log sequence number by invoking log_t::append_prepare() while holding a shared log_sys.latch. This function was too heavy, because it would invoke a minimum of 4 atomic read-modify-write operations as well as system calls in the supposedly fast code path. It turns out that with a simpler data structure, instead of having several data fields that needed to be kept consistent with each other, we only need one Atomic_relaxed<uint64_t> write_lsn_offset, on which we can operate using fetch_add(), fetch_sub() as well as a single-bit fetch_or(), which reasonably modern compilers (GCC 7, Clang 15 or later) can translate into loop-free code on AMD64. Before anything can be written to the log, log_sys.clear_mmap() must be invoked. log_t::base_lsn: The LSN of the last write_buf() or persist(). This is a rough approximation of log_sys.lsn, which will be removed. log_t::write_lsn_offset: An Atomic_relaxed<uint64_t> that buffers updates of write_to_buf and base_lsn. log_t::buf_free, log_t::max_buf_free, log_t::lsn. Remove. Replaced by base_lsn and write_lsn_offset. log_t::buf_size: Always reflects the usable size in append_prepare(). log_t::lsn_lock: Remove. For the memory-mapped log in resize_write(), there will be a resize_wrap_mutex. log_t::get_lsn_approx(): Return a lower bound of get_lsn(). This should be exact unless append_prepare_wait() is pending. log_get_lsn(): A wrapper for log_sys.get_lsn(), which must be invoked while holding an exclusive log_sys.latch. recv_recovery_from_checkpoint_start(): Do not invoke fil_names_clear(); it would seem to be unnecessary. 
In many places, references to log_sys.get_lsn() are replaced with log_sys.get_flushed_lsn(), which remains a simple std::atomic::load(). Reviewed by: Debarun Banerjee	2025-04-10 13:02:17 +03:00
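The core idea of MDEV-21923 above, reserving a slice of the log with a single atomic read-modify-write, can be sketched with std::atomic. This is a minimal illustration and not the server's implementation: the class and member function names are hypothetical, std::atomic with relaxed ordering stands in for Atomic_relaxed, and buffer wrap-around, the single-bit fetch_or() flag and append_prepare_wait() are all omitted.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: one atomic offset replaces several fields that
// would otherwise have to be kept consistent with each other.
class LsnAllocatorSketch
{
  std::uint64_t base_lsn_;                     // LSN of the last write
  std::atomic<std::uint64_t> write_lsn_offset_{0};
public:
  explicit LsnAllocatorSketch(std::uint64_t base_lsn) : base_lsn_(base_lsn) {}

  // Reserve `size` bytes of log; returns the start LSN of the slice.
  // fetch_add() returns the previous offset, so concurrent callers
  // receive disjoint, contiguous ranges without any spin loop.
  std::uint64_t reserve(std::size_t size)
  {
    return base_lsn_ +
      write_lsn_offset_.fetch_add(size, std::memory_order_relaxed);
  }

  // The LSN up to which space has been reserved so far.
  std::uint64_t current_lsn() const
  {
    return base_lsn_ + write_lsn_offset_.load(std::memory_order_relaxed);
  }
};
```

A caller would invoke reserve(n) once per mini-transaction commit and then copy its n bytes into the shared buffer at the returned position.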
Marko Mäkelä	669f719cc2	MDEV-36489 10.11 crashes during bootstrap on macOS buf_block_t::initialise(): Remove a redundant call to page.lock.init() that was already executed in buf_pool_t::create() or buf_pool_t::resize(). This fixes a regression that was introduced in commit `b6923420f3` (MDEV-29445).	2025-04-07 11:01:17 +03:00
Marko Mäkelä	58a3677309	MDEV-29445 fixup: Do not skip a test	2025-04-02 15:56:22 +03:00
Marko Mäkelä	373071b956	Merge 10.6 into 10.11	2025-04-02 15:55:46 +03:00
Marko Mäkelä	e7442e5eb9	MDEV-36226 fixup: format mismatch	2025-04-02 12:53:21 +03:00
Marko Mäkelä	3ae8f114e2	Merge 10.11 into 11.4	2025-04-02 10:15:08 +03:00
Marko Mäkelä	4d1484f045	Merge fixup This fixes up `74f0b99edf`	2025-04-02 09:33:09 +03:00
Julius Goryavsky	74f0b99edf	Merge branch '10.6' into '10.11'	2025-04-02 06:33:39 +02:00
mariadb-DebarunBanerjee	77bebe9eb0	MDEV-36226 Stall and crash when page cleaner fails to generate free pages during Async flush During regular iteration the page cleaner does flush from flush list with some flush target and then goes for generating free pages from LRU tail. When asynchronous flush is triggered, i.e. when 7/8 of the LSN margin is filled in the redo log, the flush target for flush list is set to innodb_io_capacity_max. If it could flush all, the flush bandwidth for LRU flush is currently set to zero. If the LRU tail has dirty pages, page cleaner ends up freeing no pages in one iteration. The scenario could repeat across multiple iterations till async flush target is reached. During this time the DB system is starved of free pages resulting in apparent stall and in some cases dict_sys latch fatal error. Fix: In page cleaner iteration, before LRU flush, ensure we provide enough flush limit so that freeing pages is not blocked by dirty pages in LRU tail. Log IO and flush state if double write flush wait is long. Reviewed by: Marko Mäkelä	2025-03-31 19:09:23 +05:30
Marko Mäkelä	f5bd250f5b	Merge 10.11 into 11.4	2025-03-28 13:55:21 +02:00
Marko Mäkelä	7335e9b8ef	Merge 10.6 into 10.11	2025-03-28 10:55:40 +02:00
Marko Mäkelä	1b9d5cdb83	MDEV-35813: Valgrind test fixup log_checkpoint(): In cmake -DWITH_VALGRIND=ON builds, let us wait for all outstanding writes to complete, in order to avoid an unexpectedly large number of innodb_log_writes in the test innodb.page_cleaner.	2025-03-28 08:38:04 +02:00
Marko Mäkelä	027d815546	MDEV-29445 fixup: Make Valgrind fair again recv_sys_t::wait_for_pool(): Also wait for pending writes, so that previously written blocks can be evicted and reused. buf_flush_sync_for_checkpoint(): Wait for pending writes, in order to guarantee progress even if the scheduler is unfair.	2025-03-27 14:52:07 +02:00
Marko Mäkelä	ab0f2a00b6	Merge 10.6 into 10.11	2025-03-27 08:01:47 +02:00
Marko Mäkelä	ba81009f63	MDEV-34863 RAM Usage Changed Significantly Between 10.11 Releases innodb_buffer_pool_size_auto_min: A minimum innodb_buffer_pool_size that a Linux memory pressure event can lead to shrinking the buffer pool to. On a memory pressure event, we will attempt to shrink innodb_buffer_pool_size halfway between its current value and innodb_buffer_pool_size_auto_min. If innodb_buffer_pool_size_auto_min is specified as 0 or not specified on startup, its default value will be adjusted to innodb_buffer_pool_size_max, that is, memory pressure events will be disregarded by default. buf_pool_t::garbage_collect(): For up to 15 seconds, attempt to shrink the buffer pool in response to a memory pressure event. Reviewed by: Debarun Banerjee	2025-03-26 17:05:48 +02:00
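The "halfway" rule in MDEV-34863 above is simple arithmetic. The sketch below is a hedged illustration: the function name is hypothetical, and any rounding to allocation units that the server may apply is not modelled.

```cpp
#include <cstdint>

// Hypothetical helper: target size after one memory pressure event,
// halfway between the current size and innodb_buffer_pool_size_auto_min.
// If auto_min is not below the current size (for example when it
// defaults to innodb_buffer_pool_size_max), nothing is shrunk.
constexpr std::uint64_t shrink_target(std::uint64_t current,
                                      std::uint64_t auto_min)
{
  return current <= auto_min ? current
                             : auto_min + (current - auto_min) / 2;
}

constexpr std::uint64_t GiB = 1ULL << 30;

// 8 GiB pool with a 2 GiB floor: the first event targets 5 GiB.
static_assert(shrink_target(8 * GiB, 2 * GiB) == 5 * GiB, "halfway");
// Default case (auto_min == size_max >= current): events are ignored.
static_assert(shrink_target(8 * GiB, 8 * GiB) == 8 * GiB, "no shrink");
```

Repeated events would approach the floor geometrically under this rule: 8 GiB, 5 GiB, 3.5 GiB, and so on.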
Marko Mäkelä	b6923420f3	MDEV-29445: Reimplement SET GLOBAL innodb_buffer_pool_size We deprecate and ignore the parameter innodb_buffer_pool_chunk_size and allow the buffer pool size to be changed in arbitrary 1-megabyte increments. innodb_buffer_pool_size_max: A new read-only startup parameter that specifies the maximum innodb_buffer_pool_size. If 0 or unspecified, it will default to the specified innodb_buffer_pool_size rounded up to the allocation unit (2 MiB or 8 MiB). The maximum value is 4GiB-2MiB on 32-bit systems and 16EiB-8MiB on 64-bit systems. This maximum is very likely to be limited further by the operating system. The status variable Innodb_buffer_pool_resize_status will reflect the status of shrinking the buffer pool. When no shrinking is in progress, the string will be empty. Unlike before, the execution of SET GLOBAL innodb_buffer_pool_size will block until the requested buffer pool size change has been implemented, or the execution is interrupted by a KILL statement, a client disconnect, or server shutdown. If the buf_flush_page_cleaner() thread notices that we are running out of memory, the operation may fail with ER_WRONG_USAGE. SET GLOBAL innodb_buffer_pool_size will be refused if the server was started with --large-pages (even if no HugeTLB pages were successfully allocated). This functionality is somewhat exercised by the test main.large_pages, which now runs also on Microsoft Windows. On Linux, explicit HugeTLB mappings are apparently excluded from the reported Resident Set Size (RSS), and apparently unshrinkable between mmap(2) and munmap(2). The buffer pool will be mapped to a contiguous virtual memory area that will be aligned and partitioned into extents of 8 MiB on 64-bit systems and 2 MiB on 32-bit systems. Within an extent, the first few innodb_page_size blocks contain buf_block_t objects that will cover the page frames in the rest of the extent. 
The number of such frames is precomputed in the array first_page_in_extent[] for each innodb_page_size. In this way, there is a trivial mapping between page frames and block descriptors and we do not need any lookup tables like buf_pool.zip_hash or buf_pool_t::chunk_t::map. We will always allocate the same number of block descriptors for an extent, even if we do not need all the buf_block_t in the last extent in case the innodb_buffer_pool_size is not an integer multiple of the extent size. The minimum innodb_buffer_pool_size is 256*5/4 pages. At the default innodb_page_size=16k this corresponds to 5 MiB. However, now that the innodb_buffer_pool_size includes the memory allocated for the block descriptors, the minimum would be innodb_buffer_pool_size=6m. my_large_virtual_alloc(): A new function, similar to my_large_malloc(). my_virtual_mem_reserve(), my_virtual_mem_commit(), my_virtual_mem_decommit(), my_virtual_mem_release(): New interface mostly by Vladislav Vaintroub, to separately reserve and release virtual address space, as well as to commit and decommit memory within it. After my_virtual_mem_decommit(), the virtual memory range will be read-only or unaccessible, depending on whether the build option cmake -DHAVE_UNACCESSIBLE_AFTER_MEM_DECOMMIT=1 has been specified. This option is hard-coded on Microsoft Windows, where VirtualMemory(MEM_DECOMMIT) will make the memory unaccessible. On IBM AIX, Linux, Illumos and possibly Apple macOS, the virtual memory will be zeroed out immediately. On other POSIX-like systems, madvise(MADV_FREE) will be used if available, to give the operating system kernel a permission to zero out the virtual memory range. We prefer immediate freeing so that the reported resident set size (RSS) of the process will reflect the current innodb_buffer_pool_size. Shrinking the buffer pool is a rarely executed resource intensive operation, and the immediate configuration of the MMU mappings should not incur significant additional penalty. 
opt_super_large_pages: Declare only on Solaris. Actually, this is specific to the SPARC implementation of Solaris, but because we lack access to a Solaris development environment, we will not revise this for other MMU and ISA. buf_pool_t::chunk_t::create(): Remove. buf_pool_t::create(): Initialize all n_blocks of the buf_pool.free list. buf_pool_t::allocate(): Renamed from buf_LRU_get_free_only(). buf_pool_t::LRU_warned: Changed to Atomic_relaxed<bool>, only to be modified by the buf_flush_page_cleaner() thread. buf_pool_t::shrink(): Attempt to shrink the buffer pool. There are 3 possible outcomes: SHRINK_DONE (success), SHRINK_IN_PROGRESS (the caller may keep trying), and SHRINK_ABORT (we seem to be running out of buffer pool). While traversing buf_pool.LRU, release the contended buf_pool.mutex once in every 32 iterations in order to reduce starvation. Use lru_scan_itr for efficient traversal, similar to buf_LRU_free_from_common_LRU_list(). buf_pool_t::shrunk(): Update the reduced size of the buffer pool in a way that is compatible with buf_pool_t::page_guess(), and invoke my_virtual_mem_decommit(). buf_pool_t::resize(): Before invoking shrink(), run one batch of buf_flush_page_cleaner() in order to prevent LRU_warn(). Abort if shrink() recommends it, or no blocks were withdrawn in the past 15 seconds, or the execution of the statement SET GLOBAL innodb_buffer_pool_size was interrupted. buf_pool_t::first_to_withdraw: The first block descriptor that is out of the bounds of the shrunk buffer pool. buf_pool_t::withdrawn: The list of withdrawn blocks. If buf_pool_t::resize() is aborted before shrink() completes, we must be able to resurrect the withdrawn blocks in the free list. buf_pool_t::contains_zip(): Added a parameter for the number of least significant pointer bits to disregard, so that we can find any pointers to within a block that is supposed to be free. buf_pool_t::is_shrinking(): Return the total number of blocks that were withdrawn or are to be withdrawn. 
buf_pool_t::to_withdraw(): Return the number of blocks that will need to be withdrawn. buf_pool_t::usable_size(): Number of usable pages, considering possible in-progress attempt at shrinking the buffer pool. buf_pool_t::page_guess(): Try to buffer-fix a guessed block pointer. If HAVE_UNACCESSIBLE_AFTER_MEM_DECOMMIT is set, the pointer will be validated before being dereferenced. buf_pool_t::get_info(): Replaces buf_stats_get_pool_info(). innodb_init_param(): Refactored. We must first compute srv_page_size_shift and then determine the valid bounds of innodb_buffer_pool_size. buf_buddy_shrink(): Replaces buf_buddy_realloc(). Part of the work is deferred to buf_buddy_condense_free(), which is being executed when we are not holding any buf_pool.page_hash latch. buf_buddy_condense_free(): Do not relocate blocks. buf_buddy_free_low(): Do not care about buffer pool shrinking. This will be handled by buf_buddy_shrink() and buf_buddy_condense_free(). buf_buddy_alloc_zip(): Assert !buf_pool.contains_zip() when we are allocating from the binary buddy system. Previously we were asserting this on multiple recursion levels. buf_buddy_block_free(), buf_buddy_free_low(): Assert !buf_pool.contains_zip(). buf_buddy_alloc_from(): Remove the redundant parameter j. buf_flush_LRU_list_batch(): Add the parameter to_withdraw to keep track of buf_pool.n_blocks_to_withdraw. buf_do_LRU_batch(): Skip buf_free_from_unzip_LRU_list_batch() if we are shrinking the buffer pool. In that case, we want to minimize the page relocations and just finish as quickly as possible. trx_purge_attach_undo_recs(): Limit purge_sys.n_pages_handled() in every iteration, in case the buffer pool is being shrunk in the middle of a purge batch. Reviewed by: Debarun Banerjee	2025-03-26 17:05:44 +02:00
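The size arithmetic quoted in the MDEV-29445 message above (a minimum of 256*5/4 pages, and rounding up to an allocation unit of 2 MiB or 8 MiB) can be checked with a few lines. The constant and function names below are hypothetical; only the numbers come from the commit message.

```cpp
#include <cstdint>

constexpr std::uint64_t MiB = 1ULL << 20;

// Minimum number of pages stated in the commit message: 256*5/4 = 320.
constexpr std::uint64_t min_pool_pages = 256 * 5 / 4;

// Minimum pool size in bytes for a given innodb_page_size.
constexpr std::uint64_t min_pool_bytes(std::uint64_t page_size)
{
  return min_pool_pages * page_size;
}

// Round a requested size up to a multiple of the extent size.
constexpr std::uint64_t round_up_to_extent(std::uint64_t size,
                                           std::uint64_t extent)
{
  return (size + extent - 1) / extent * extent;
}

// At innodb_page_size=16k the minimum is 320 * 16 KiB = 5 MiB.
static_assert(min_pool_bytes(16384) == 5 * MiB, "5 MiB minimum");
// A 100 MiB request rounds up to 104 MiB with 8 MiB extents ...
static_assert(round_up_to_extent(100 * MiB, 8 * MiB) == 104 * MiB, "64-bit");
// ... and stays at 100 MiB with 2 MiB extents.
static_assert(round_up_to_extent(100 * MiB, 2 * MiB) == 100 * MiB, "32-bit");
```

The 6 MiB practical minimum mentioned in the message additionally accounts for block descriptors, which this sketch does not model.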
Thirunarayanan Balathandayuthapani	a390aaaf23	MDEV-36180 Doublewrite recovery of innodb_checksum_algorithm=full_crc32 page_compressed pages does not work - InnoDB fails to recover the full crc32 page_compressed page from doublewrite buffer. The reason is that buf_dblwr_t::recover() fails to identify the space id from the page because the page has been compressed from FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION bytes. Fix: recv_dblwr_t::find_deferred_page(): Find the page which has the same page number and try to decompress/decrypt the page based on the tablespace metadata. After the decompression/decryption, compare the space id and write the recovered page back to the file. buf_page_t::read_complete(): If the page read from disk is corrupted, try to read the page from the deferred pages in the doublewrite buffer.	2025-03-26 12:03:44 +01:00
mariadb-DebarunBanerjee	a8e35a1cc6	MDEV-36149 UBSAN in X is outside the range of representable values of type 'unsigned long' \| page_cleaner_flush_pages_recommendation Currently it is allowed to set innodb_io_capacity to very large value up to unsigned 8 byte maximum value 18446744073709551615. While calculating the number of pages to flush, we could sometime go beyond innodb_io_capacity. Specifically, MDEV-24369 has introduced a logic for aggressive flushing when dirty page percentage in buffer pool exceeds innodb_max_dirty_pages_pct. So, when innodb_io_capacity is set to very large value and dirty page percentage exceeds the threshold, there is a multiplication overflow in Innodb page cleaner. Fix: We should prevent setting io_capacity to unrealistic values and define a practical limit to it. The patch introduces limits for innodb_io_capacity_max and innodb_io_capacity to the maximum of 4 byte unsigned integer i.e. 4294967295 (2^32-1). For 16k page size this limit translates to 64 TiB/sec write IO speed which looks sufficient. Reviewed by: Marko Mäkelä	2025-03-17 11:44:09 +05:30
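The figures in MDEV-36149 above can be verified directly: 2^32-1 pages per second at 16 KiB per page is one page short of 64 TiB per second, and the product fits comfortably in 64 bits, whereas the old 64-bit maximum overflows as soon as it is multiplied by anything larger than 1. The names below are hypothetical, and the factor 2 merely stands in for whatever multiplier the page cleaner applies.

```cpp
#include <cstdint>

constexpr std::uint64_t TiB = 1ULL << 40;
constexpr std::uint64_t io_capacity_limit = UINT32_MAX; // 4294967295
constexpr std::uint64_t page_size_16k = 16384;

// Write bandwidth in bytes per second implied by an io_capacity value.
constexpr std::uint64_t bytes_per_sec(std::uint64_t io_capacity,
                                      std::uint64_t page_size)
{
  return io_capacity * page_size;
}

// True if a * b can be represented in 64 bits without wrapping.
constexpr bool product_fits(std::uint64_t a, std::uint64_t b)
{
  return b == 0 || a <= UINT64_MAX / b;
}

// (2^32 - 1) * 2^14 = 2^46 - 2^14, i.e. one page less than 64 TiB.
static_assert(bytes_per_sec(io_capacity_limit, page_size_16k)
              == 64 * TiB - page_size_16k, "just under 64 TiB/s");
static_assert(product_fits(io_capacity_limit, page_size_16k),
              "the new limit cannot overflow here");
static_assert(!product_fits(UINT64_MAX, 2),
              "the old maximum overflows on any scaling");
```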
Marko Mäkelä	49a6baec56	Merge 10.11 into 11.4	2025-03-03 11:07:56 +02:00
Marko Mäkelä	7e001b2a3c	MDEV-36082 Race condition between log_t::resize_start() and log_t::resize_abort() log_t::writer_update(): Add the parameter bool resizing, to indicate whether log resizing is in progress. We must enable log_writer_resizing only if resize_lsn>1, to ensure that log_t::resize_abort() will not choose the wrong log_sys.log_writer. log_t::resize_initiator: The thread that successfully invoked resize_start(). log_t::resize_start(): Simplify some logic, and assign resize_initiator if we successfully started log resizing. log_t::resize_abort(): Abort log resizing if we are the resize_initiator. innodb_log_file_size_update(): Clean up some logic. Reviewed by: Debarun Banerjee	2025-02-17 15:55:58 +02:00
Sergei Golubchik	f1a7693bc0	Merge branch '10.11' into 11.4	2025-01-14 23:45:41 +01:00
Sergei Golubchik	221aa5e08f	Merge branch '10.6' into 10.11	2025-01-10 13:14:42 +01:00
Marko Mäkelä	17f01186f5	Merge 10.11 into 11.4	2025-01-09 07:58:08 +02:00
Marko Mäkelä	990b010b09	MDEV-35438 Annotate InnoDB I/O functions with noexcept Most InnoDB functions do not throw any exceptions, not even indirectly std::bad_alloc, which could be thrown by a C++ memory allocation function. Let us annotate many functions with noexcept in order to reduce the code footprint related to exception handling. Reviewed by: Thirunarayanan Balathandayuthapani	2025-01-09 07:43:24 +02:00
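As a small illustration of the annotation discussed in MDEV-35438 above, the noexcept operator lets the compiler confirm at build time whether a call is declared non-throwing. The two functions are hypothetical and unrelated to any InnoDB function.

```cpp
#include <cstddef>

// Declared noexcept: callers need no exception-handling paths for it.
std::size_t pages_to_bytes(std::size_t pages, std::size_t page_size) noexcept
{
  return pages * page_size;
}

// Not annotated: the compiler must assume that this may throw.
// Only declared, because it is used in an unevaluated context below.
std::size_t pages_to_bytes_unannotated(std::size_t pages,
                                       std::size_t page_size);

static_assert(noexcept(pages_to_bytes(1, 1)),
              "annotated function is known not to throw");
static_assert(!noexcept(pages_to_bytes_unannotated(1, 1)),
              "unannotated function is assumed to be able to throw");
```

If an exception does escape a noexcept function, std::terminate() is called, so the annotation only suits code that genuinely cannot throw.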
Marko Mäkelä	420d9eb27f	Merge 10.6 into 10.11	2025-01-08 12:51:26 +02:00
Thirunarayanan Balathandayuthapani	f8cf493290	MDEV-34898 Doublewrite recovery of innodb_checksum_algorithm=full_crc32 encrypted pages does not work - InnoDB fails to recover the full crc32 encrypted page from doublewrite buffer. The reason is that buf_dblwr_t::recover() fails to identify the space id from the page because the page has been encrypted from FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION bytes. Fix: === buf_dblwr_t::recover(): preserve any pages whose space_id does not match a known tablespace. These could be encrypted pages of tablespaces that had been created with innodb_checksum_algorithm=full_crc32. buf_page_t::read_complete(): If the page looks corrupted and the tablespace is encrypted and in full_crc32 format, try to restore the page from doublewrite buffer. recv_dblwr_t::recover_encrypted_page(): Find the page which has the same page number and try to decrypt the page using space->crypt_data. After decryption, compare the space id. Write the recovered page back to the file.	2025-01-07 19:33:56 +05:30
Marko Mäkelä	3f914afd3a	Merge 10.6 into 10.11	2025-01-02 12:39:56 +02:00
Marko Mäkelä	a54d151fc1	Merge 10.6 into 10.11	2024-12-19 15:38:53 +02:00
mariadb-DebarunBanerjee	3f22f5f2fe	MDEV-35679 Potential issue in Secondary Index with ROW_FORMAT=COMPRESSED and Change buffering enabled In function buf_page_create_low(), remove duplicate code that incorrectly overwrites the ibuf_exist variable when only the compressed page is loaded in the buffer pool. This helps remove any old change buffer record immediately before the page is reused.	2024-12-18 20:46:26 +05:30
Marko Mäkelä	c391fb1ff1	MDEV-35577 Broken recovery after SET GLOBAL innodb_log_file_size If InnoDB is killed in such a way that there had been no writes to a newly resized ib_logfile101 after it replaced ib_logfile0 in log_t::write_checkpoint(), it is possible that recovery will accidentally interpret some garbage at the end of the log as valid. log_t::write_buf(): To prevent the corruption, write an extra NUL byte at the end of log_sys.resize_buf, like we always did for the main log_sys.buf. To remove some conditional branches from a time critical code path, we instantiate a separate template for the rare case that the log is being resized. Define as __attribute__((always_inline)) so that this will be inlined also in the rare case the log is being resized. log_t::writer: Pointer to the current implementation of log_t::write_buf(). For quick access, this is located in the same cache line with log_sys.latch, which protects it. log_t::writer_update(): Update log_sys.writer. log_t::resize_write_buf(): Remove ATTRIBUTE_NOINLINE ATTRIBUTE_COLD. Now that log_t::write_buf() will be instantiated separately for the rare case of log resizing being in progress, there is no need to forbid this code from being inlined. Thanks to Thirunarayanan Balathandayuthapani for finding the root cause of this bug and suggesting the fix of writing an extra NUL byte. Reviewed by: Debarun Banerjee	2024-12-16 11:50:00 +02:00
mariadb-DebarunBanerjee	c7698a0b70	MDEV-35626 Race condition between buf_page_create_low() and read completion This regression was introduced in 10.6 by the following commit: commit `35d477dd1d` (MDEV-34453 Trying to read 16384 bytes at 70368744161280). The page state could change after the page is buffer-fixed, so it needs to be read again after locking the page.	2024-12-13 18:36:47 +05:30
Marko Mäkelä	ddd7d5d8e3	MDEV-24035 Failing assertion: UT_LIST_GET_LEN(lock.trx_locks) == 0 causing disruption and replication failure Under unknown circumstances, the SQL layer may wrongly disregard an invocation of thd_mark_transaction_to_rollback() when an InnoDB transaction had been aborted (rolled back) due to one of the following errors: * HA_ERR_LOCK_DEADLOCK * HA_ERR_RECORD_CHANGED (if innodb_snapshot_isolation=ON) * HA_ERR_LOCK_WAIT_TIMEOUT (if innodb_rollback_on_timeout=ON) Such an error used to cause a crash of InnoDB during transaction commit. These changes aim to catch and report the error earlier, so that not only this crash can be avoided but also the original root cause be found and fixed more easily later. The idea of this fix is from Michael 'Monty' Widenius. HA_ERR_ROLLBACK: A new error code that will be translated into ER_ROLLBACK_ONLY, signalling that the current transaction has been aborted and the only allowed action is ROLLBACK. trx_t::state: Add TRX_STATE_ABORTED that is like TRX_STATE_NOT_STARTED, but noting that the transaction had been rolled back and aborted. trx_t::is_started(): Replaces trx_is_started(). ha_innobase: Check the transaction state in various places. Simplify the logic around SAVEPOINT. ha_innobase::is_valid_trx(): Replaces ha_innobase::is_read_only(). The InnoDB logic around transaction savepoints, commit, and rollback was unnecessarily complex and might have contributed to this inconsistency. So, we are simplifying that logic as well. trx_savept_t: Replace with const undo_no_t*. When we rollback to a savepoint, all we need to know is the number of undo log records that must survive. trx_named_savept_t, DB_NO_SAVEPOINT: Remove. We can store undo_no_t directly in the space allocated at innobase_hton->savepoint_offset. fts_trx_create(): Do not copy previous savepoints. fts_savepoint_rollback(): If a savepoint was not found, roll back everything after the default savepoint of fts_trx_create(). 
The test innodb_fts.savepoint is extended to cover this code. Reviewed by: Vladislav Lesin Tested by: Matthias Leich	2024-12-12 18:02:00 +02:00
Marko Mäkelä	7bcd6c610a	MDEV-35618 Bogus assertion failure 'recv_sys.scanned_lsn < max_lsn + 32 * 512U' during recovery buf_dblwr_t::recover(): Correct a debug assertion failure that had been added in commit `bb47e575de` (MDEV-34830). The server may have been killed while a log write was in progress, and therefore recv_sys.scanned_lsn may be up to RECV_PARSING_BUF_SIZE bytes ahead of recv_sys.recovered_lsn. Thanks to Matthias Leich for providing "rr replay" traces and testing this.	2024-12-11 14:47:39 +02:00
Marko Mäkelä	bfe7c8ff0a	MDEV-35494 fil_space_t::fil_space_t() may be unsafe with GCC -flifetime-dse fil_space_t::create(): Instead of invoking the default fil_space_t constructor on a zero-filled buffer, allocate an uninitialized buffer and invoke an explicitly defined constructor on it. Also, specify initializer expressions for all constant data members, so that all of them will be initialized in the constructor. fil_space_t::being_imported: Replaces part of fil_space_t::purpose. fil_space_t::is_being_imported(), fil_space_t::is_temporary(): Replaces fil_space_t::purpose. fil_space_t::id: Changed the type from ulint to uint32_t to reduce incompatibility with later branches that include commit `ca501ffb04` (MDEV-26195). fil_space_t::try_to_close(): Do not attempt to close files that are in an I/O bound phase of ALTER TABLE…IMPORT TABLESPACE. log_file_op, first_page_init: recv_spaces_t: Use uint32_t for the tablespace id. Reviewed by: Debarun Banerjee	2024-12-11 14:44:42 +02:00
Kristian Nielsen	0f47db8525	Merge 10.11 -> 11.4 Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-12-05 11:01:42 +01:00
Kristian Nielsen	e7c6cdd842	Merge 10.6 -> 10.11 Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-12-05 10:11:58 +01:00
Marko Mäkelä	2719cc4925	Merge 10.11 into 11.4	2024-12-02 11:35:34 +02:00
Marko Mäkelä	507323abe6	Cleanup: Remove duplicated code buf_block_alloc(): Define as an alias in buf0lru.h, which defines the underlying buf_LRU_get_free_block(). buf_block_free(): Define as an alias of the non-inline function buf_pool.free_block(block). Reviewed by: Vladislav Lesin	2024-11-29 14:16:34 +02:00
Marko Mäkelä	3d23adb766	Merge 10.6 into 10.11	2024-11-29 13:43:17 +02:00
Marko Mäkelä	26597b91b3	MDEV-35413 InnoDB: Cannot load compressed BLOB A race condition was observed between two buf_page_get_zip() for a page. One of them had proceeded to buf_read_page(), allocating and x-latching a buf_block_t that initially comprises only an uncompressed page frame. While that thread was waiting inside buf_block_alloc(), another thread would try to access the same page. Without acquiring a page latch, it would wrongly conclude that there is corruption because no compressed page frame exists for the block. buf_page_get_zip(): Simplify the logic and correct the documentation. Always acquire a shared latch to prevent any race condition with a concurrent read operation. No longer increment a buffer-fix; the latch is sufficient for preventing page relocation or eviction. buf_read_page(): Add the parameter bool unzip=true. In buf_page_get_zip() there is no need to allocate an uncompressed page frame for reading a compressed BLOB page. We only need that for other ROW_FORMAT=COMPRESSED pages, or for writing compressed BLOB pages. btr_copy_zblob_prefix(): Remove the message "Cannot load compressed BLOB" because buf_page_get_zip() will already have reported a more specific error whenever it returns nullptr. row_merge_buf_add(): Do not crash on BLOB corruption, but return an error instead. (In debug builds, an assertion will fail if this corruption is noticed.) Reviewed by: Debarun Banerjee	2024-11-22 08:33:03 +02:00

1771 Commits