mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-08-01 03:47:19 +03:00

Author	SHA1	Message	Date
Sergei Golubchik	e3d9369774	cleanup: disconnect before DROP USER let's always disconnect a user connection before dropping the said user. MariaDB is traditionally very tolerant to active connections of the dropped user, which isn't the case for most other databases. Let's avoid unintentionally spreading incompatible behavior and disconnect before drop. Except in cases when the test specifically tests such a behavior.	2025-07-16 09:14:33 +07:00
Sergei Golubchik	bead24b7f3	mariadb-test: wait on disconnect Remove one of the major sources of race conditions in mariadb-test. Normally, mariadb_close() sends COM_QUIT to the server and immediately disconnects. In mariadb-test it means the test can switch to another connection and send queries to the server before the server has even started parsing the COM_QUIT packet, and these queries can see the connection as fully active, as it didn't reach dispatch_command yet. This is a major source of instability in tests, and many tests (though still fewer than half) employ workarounds. The correct one is the pair count_sessions.inc/wait_until_count_sessions.inc. Also very popular was wait_until_disconnected.inc, which was completely useless, because it verifies that the connection is closed, and after disconnect it always is; it didn't verify whether the server processed COM_QUIT. Sadly the placebo was as widely used as the real thing. Let's fix this by making the mariadb-test `disconnect` command _wait_ for the server to confirm. This makes almost all workarounds redundant. In some cases count_sessions.inc/wait_until_count_sessions.inc is still needed, though, as only the `disconnect` command is changed: * after external tools, like `exec $MYSQL` * after failed `connect` command * replication, after `STOP SLAVE` * Federated/CONNECT/SPIDER/etc after `DROP TABLE` and also in some XA tests, because an XA transaction is dissociated from the THD very late, after the server has closed the client connection. Collateral cleanups: fix comments, remove some redundant statements: * DROP IF EXISTS if nothing is known to exist * DROP table/view before DROP DATABASE * REVOKE privileges before DROP USER etc	2025-07-16 09:14:33 +07:00
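The workaround that the message for bead24b7f3 calls the correct one records a session count and later waits for the server to return to it. A minimal sketch of that polling idea follows; it is a generic illustration and not the contents of the include files. The function name, the `get_count` callable, and the timeout values are assumptions made for this example.

```python
import time


def wait_until_count_sessions(get_count, baseline, timeout=30.0, interval=0.1):
    """Poll until the number of server sessions drops back to `baseline`.

    get_count is a hypothetical callable returning the current session
    count; in a real test it would query the server.
    Returns True on success and False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_count() <= baseline:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Simulated server: the closed connection disappears on the third poll.
samples = iter([3, 2, 1])
done = wait_until_count_sessions(lambda: next(samples), baseline=1,
                                 timeout=1.0, interval=0.0)
```

Checking only that the client side is closed would return immediately, which is why the message describes wait_until_disconnected.inc as a placebo.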
Marko Mäkelä	cffbb17480	MDEV-28933: Per-table unique FOREIGN KEY constraint names Before MySQL 4.0.18, user-specified constraint names were ignored. Starting with MySQL 4.0.18, the specified constraint name was prepended with the schema name and '/'. Now we are transforming into a format where the constraint name is prepended with the dict_table_t::name and the impossible UTF-8 sequence 0xff. Generated constraint names will be ASCII decimal numbers. On upgrade, old FOREIGN KEY constraint names will be displayed without any schema name prefix. They will be updated to the new format on DDL operations. dict_foreign_t::sql_id(): Return the SQL constraint name without any schemaname/tablename\377 or schemaname/ prefix. row_rename_table_for_mysql(), dict_table_rename_in_cache(): Simplify the logic: Just rename constraints to the new format. dict_table_get_foreign_id(): Replaces dict_table_get_highest_foreign_id(). innobase_get_foreign_key_info(): Let my_error() refer to erroneous anonymous constraints as "(null)". row_delete_constraint(): Try to drop all 3 constraint name variants. Reviewed by: Thirunarayanan Balathandayuthapani Tested by: Matthias Leich	2025-07-08 12:30:27 +03:00
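The prefix stripping that the message for cffbb17480 attributes to dict_foreign_t::sql_id() can be sketched as below. Only the two prefix shapes (`schemaname/tablename\377` and `schemaname/`) come from the message; the Python function, its fallback for unprefixed input, and the sample names are assumptions for illustration.

```python
SEPARATOR = b"\xff"  # the "impossible UTF-8 sequence" named in the message


def sql_id(internal_id: bytes) -> bytes:
    """Return the constraint name without its internal prefix.

    New format (assumed layout): b"schema/table" + b"\xff" + b"name"
    Old format (assumed layout): b"schema/" + b"name"
    """
    pos = internal_id.find(SEPARATOR)
    if pos != -1:
        # New format: everything after the 0xff byte is the SQL name.
        return internal_id[pos + 1:]
    pos = internal_id.find(b"/")
    if pos != -1:
        # Old format: drop the schema name and the slash.
        return internal_id[pos + 1:]
    # Assumption: an identifier without any prefix is returned unchanged.
    return internal_id


new_style = sql_id(b"test/t1" + SEPARATOR + b"fk_parent")
old_style = sql_id(b"test/fk_parent")
generated = sql_id(b"test/t1" + SEPARATOR + b"1")
```

Because 0xff never occurs in valid UTF-8, the separator cannot collide with any byte of a user-specified name.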
Oleksandr Byelkin	e653666368	Merge branch '12.0' into 12.1	2025-06-18 09:27:49 +02:00
Oleksandr Byelkin	dfcb5c91e0	Merge branch '11.8' into 12.0	2025-06-18 07:50:39 +02:00
Oleksandr Byelkin	a65f7dc71d	Merge branch '11.4' into 11.8	2025-06-18 07:43:24 +02:00
Oleksandr Byelkin	89c7e2b9c7	Merge branch '10.11' into 11.4	2025-06-17 09:50:22 +02:00
Thirunarayanan Balathandayuthapani	6a2afb42ba	MDEV-36487 Fix ha_innobase::check() for sequences InnoDB performs the following checks on a sequence table during CHECK TABLE: - Exactly one index should exist on the sequence table - Exactly one row should exist on the sequence table - The leaf page must be the root page of the sequence table - No delete-marked record should exist - DB_TRX_ID and DB_ROLL_PTR of the record should be 0 and 1U << 55	2025-06-09 13:52:44 +05:30
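The five conditions listed for 6a2afb42ba can be written as a small validator. This is a sketch under stated assumptions: the `SequenceTableState` record and its field names are hypothetical stand-ins for whatever ha_innobase::check() actually inspects; only the conditions themselves come from the message.

```python
from dataclasses import dataclass

EXPECTED_TRX_ID = 0
EXPECTED_ROLL_PTR = 1 << 55  # "1U << 55" in the commit message


@dataclass
class SequenceTableState:
    """Hypothetical summary of what a check would observe."""
    n_indexes: int
    n_rows: int
    root_is_leaf: bool
    has_delete_marked: bool
    db_trx_id: int
    db_roll_ptr: int


def check_sequence(state: SequenceTableState) -> list:
    """Return a list of violated conditions; empty means the table passes."""
    problems = []
    if state.n_indexes != 1:
        problems.append("exactly one index expected")
    if state.n_rows != 1:
        problems.append("exactly one row expected")
    if not state.root_is_leaf:
        problems.append("root page must be the leaf page")
    if state.has_delete_marked:
        problems.append("delete-marked record found")
    if state.db_trx_id != EXPECTED_TRX_ID:
        problems.append("DB_TRX_ID must be 0")
    if state.db_roll_ptr != EXPECTED_ROLL_PTR:
        problems.append("DB_ROLL_PTR must be 1 << 55")
    return problems


healthy = SequenceTableState(1, 1, True, False, 0, 1 << 55)
broken = SequenceTableState(1, 2, True, False, 0, 1 << 55)
```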
Oleksandr Byelkin	7dcdb2c876	Merge branch '11.8' into 11.8 release	2025-05-30 13:28:41 +02:00
Marko Mäkelä	d953f2c810	MDEV-36868: Inconsistency when shrinking innodb_buffer_pool_size buf_pool_t::resize(): After successfully shrinking the buffer pool, announce the success. The size had already been updated in shrunk(). After failing to shrink the buffer pool, re-enable the adaptive hash index if it had been enabled. Reviewed by: Debarun Banerjee	2025-05-28 14:33:20 +03:00
Marko Mäkelä	7b4b759f13	MDEV-36868: Inconsistency when shrinking innodb_buffer_pool_size buf_pool_t::resize(): After successfully shrinking the buffer pool, announce the success. The size had already been updated in shrunk(). After failing to shrink the buffer pool, re-enable the adaptive hash index if it had been enabled. Reviewed by: Debarun Banerjee	2025-05-28 13:33:06 +03:00
Thirunarayanan Balathandayuthapani	db188083c3	MDEV-36771 Assertion 'bulk_insert == TRX_NO_BULK' failed in trx_t::assert_freed - InnoDB fails to reset bulk_insert of a transaction while freeing the transaction during shutting down of a server.	2025-05-26 12:13:01 +05:30
Marko Mäkelä	3da36fa130	Merge 10.6 into 10.11	2025-05-26 08:10:47 +03:00
Thirunarayanan Balathandayuthapani	d8962d138f	MDEV-36017 Alter table aborts when temporary directory is full Problem: ======= - In 10.11, During Copy algorithm, InnoDB does use bulk insert for row by row insert operation. When temporary directory ran out of memory, row_mysql_handle_errors() fails to handle DB_TEMP_FILE_WRITE_FAIL. - During inplace algorithm, concurrent DML fails to write the log operation into the temporary file. InnoDB fail to mark the error for the online log. - ddl_log_write() releases the global ddl lock prematurely before release the log memory entry Fix: === row_mysql_handle_errors(): Rollback the transaction when InnoDB encounters DB_TEMP_FILE_WRITE_FAIL convert_error_code_to_mysql(): Report an aborted transaction when InnoDB encounters DB_TEMP_FILE_WRITE_FAIL during alter table algorithm=copy or innodb bulk insert operation row_log_online_op(): Mark the error in online log when InnoDB ran out of temporary space fil_space_extend_must_retry(): Mark the os_has_said_disk_full as true if os_file_set_size() fails btr_cur_pessimistic_update(): Return error code when btr_cur_pessimistic_insert() fails ddl_log_write(): Release the global ddl lock after releasing the log memory entry when error was encountered btr_cur_optimistic_update(): Relax the assertion that blob pointer can be null during rollback because InnoDB can ran out of space while allocating the external page ha_innobase::extra(): Rollback the transaction during DDL before calling convert_error_code_to_mysql(). row_undo_mod_upd_exist_sec(): Remove the assertion which says that InnoDB should fail to build index entry when rollbacking an incomplete transaction after crash recovery. This scenario can happen when InnoDB ran out of space. row_upd_changes_ord_field_binary_func(): Relax the assertion to make that externally stored field can be null when InnoDB ran out of space.	2025-05-26 10:12:14 +05:30
Thirunarayanan Balathandayuthapani	8a4d3a044f	MDEV-36017 Alter table aborts when temporary directory is full Problem: ======= - During inplace algorithm, concurrent DML fails to write the log operation into the temporary file. InnoDB fail to mark the error for the online log. - ddl_log_write() releases the global ddl lock prematurely before release the log memory entry Fix: === row_log_online_op(): Mark the error in online log when InnoDB ran out of temporary space fil_space_extend_must_retry(): Mark the os_has_said_disk_full as true if os_file_set_size() fails btr_cur_pessimistic_update(): Return error code when btr_cur_pessimistic_insert() fails ddl_log_write(): Release the global ddl lock after releasing the log memory entry when error was encountered btr_cur_optimistic_update(): Relax the assertion that blob pointer can be null during rollback because InnoDB can ran out of space while allocating the external page row_undo_mod_upd_exist_sec(): Remove the assertion which says that InnoDB should fail to build index entry when rollbacking an incomplete transaction after crash recovery. This scenario can happen when InnoDB ran out of space. row_upd_changes_ord_field_binary_func(): Relax the assertion to make that externally stored field can be null when InnoDB ran out of space.	2025-05-25 09:11:41 +05:30
Oleksandr Byelkin	f1102da37a	Merge branch '11.8' into 12.0	2025-05-22 09:22:55 +02:00
Oleksandr Byelkin	8d36cafe4f	Merge branch '11.4' into 11.8	2025-05-21 15:57:16 +02:00
Marko Mäkelä	118cfcf821	Merge 10.11 into 11.4	2025-05-13 13:44:58 +03:00
Marko Mäkelä	bb48d7bc81	MDEV-36781: Assertion i < BUF_BUDDY_SIZES failed in buf_buddy_shrink() buf_buddy_shrink(): Properly cover the case when KEY_BLOCK_SIZE corresponds to the innodb_page_size, that is, the ROW_FORMAT=COMPRESSED page frame is directly allocated from the buffer pool, not via the binary buddy allocator. buf_LRU_check_size_of_non_data_objects(): Avoid a crash when the buffer pool is being shrunk. buf_pool_t::shrink(): Abort if over 95% of the shrunk buffer pool would be occupied by the adaptive hash index or record locks.	2025-05-13 12:27:46 +03:00
Sergei Golubchik	11f6b9d12a	remove features that were deprecated in 10.5 --big-tables --large-page-size --storage-engine performance_schema.setup_timers (WL#10986)	2025-04-29 16:53:02 +02:00
Vasilii Lakhin	1b95e46524	Fix typos in mysql-test/	2025-04-29 13:53:16 +10:00
Monty	f8ba5ced55	MDEV-36099 Ensure that creation and usage of temporary tables in replication is predictable MDEV-36563 Assertion `!mysql_bin_log.is_open()' failed in THD::mark_tmp_table_as_free_for_reuse The purpose of this commit is to ensure that creation and changes of temporary tables are properly and predicable logged to the binary log. It also fixes some bugs where ROW logging was used in MIXED mode, when STATEMENT would be a better (and expected) choice. In this comment STATEMENT stands for logging to binary log in STATEMENT format, MIXED stands for MIXED binlog format and ROW for ROW binlog format. New rules for logging of temporary tables - CREATE of temporary tables are now by default binlogged only if STATEMENT binlog format is used. If it is binlogged, 1 is stored in TABLE_SHARE->table_creation_was_logged. The user can change this behavior by setting create_temporary_table_binlog_formats to MIXED,STATEMENT in which case the create is logged in statement format also in MIXED mode (as before). - Changes to temporary tables are only binlogged if and only if the CREATE was logged. The logging happens under STATEMENT or MIXED. If binlog_format=ROW, temporary table changes are not binlogged. A temporary table that are changed under ROW are marked as 'not up to date in binlog' and no future row changes are logged. Any usage of this temporary table will force row logging of other tables in any future statements using the temporary table to be row logged. - DROP TEMPORARY is binlogged only of the CREATE was binlogged. Changes done: - Row logging is forced for any statement using temporary tables that are not up to date in the binary log. (Before the row logging was forced if the user has a temporary table) - If there is any changes to the temporary table that is not binlogged, the table is marked as not up to date. 
- TABLE_SHARE->table_creation_was_logged has a new definition for temporary tables: 0 Table creating was not logged to binary log 1 Table creating was logged to binary log and table is up to date. 2 Table creating was logged to binary log but some changes where not logged to binary log. Table is not up to date in binary log is defined as value 0 or 2. - If a multi-table-update or multi-table-delete fails then all updated temporary tables are marked as not up to date. - Enforce row logging if the query is using temporary tables that are not up to date. Before row logging was enforced if the user had any temporary tables. - When dropping temporary tables use IF EXISTS. This ensures that slave will not stop if it had crashed and lost the temporary tables. - Remove comment and version from DROP /*!4000 TEMPORARY.. generated when a connection closes that has open temporary tables. Added 'generated by server' at the end of the DROP. Bugs fixed: - When using temporary tables with commands that forced row based, like INSERT INTO temporary_table VALUES (UUID()), this was never logged which causes the temporary table to be inconsistent on master and slave. - Used binlog format is now clearly defined. It is now only depending on the current binlog_format and the tables used. Before it was depending on the user had ANY temporary tables and the state of 'current_stmt_binlog_format' set by previous queries. This also caused temporary tables to be logged to binary log in some cases. - CREATE TABLE t1 LIKE not_logged_temporary_table caused replication to stop. - Rename of not binlogged temporary tables where binlogged to binary log which caused replication to stop. Changes in behavior: - By default create_temporary_table_binlog_formats=STATEMENT, which means that CREATE TEMPORARY is not logged to binary log under MIXED binary logging. This can be changed by setting create_temporary_table_binlog_formats to MIXED,STATEMENT. 
- Using temporary tables that were not logged to the binary log will cause any query using them for updating other tables to be logged in ROW format. Before, all queries were logged in ROW format if the user had any temporary tables, even if they were not used by the query. - Generated DROP TEMPORARY TABLE is now always using IF EXISTS and has a "generated by server" comment in the binary log. The consequence of the above is that manipulations of a lot of rows through temporary tables will by default be slower in mixed mode. For example: BEGIN; CREATE TEMPORARY TABLE tmp AS SELECT a, b, c FROM large_table1 JOIN large_table2 ON ...; INSERT INTO other_table SELECT b, c FROM tmp WHERE a < 100; DROP TEMPORARY TABLE tmp; COMMIT; By default this will create a huge entry in the binary log, compared to just a few hundred bytes in statement mode. However, the change in this commit will make usage of temporary tables more reliable and predictable and is thus worth it. Using statement mode or create_temporary_table_binlog_formats can be used to avoid this issue.	2025-04-28 12:59:38 +03:00
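The three values that the message for f8ba5ced55 assigns to TABLE_SHARE->table_creation_was_logged, and the rule that forces row logging, can be modelled as follows. The numeric values 0, 1 and 2 and the "not up to date" definition come from the message; the enum member names and the helper functions are assumptions made for this sketch.

```python
from enum import IntEnum


class CreationLogged(IntEnum):
    """Values described for temporary tables; member names are illustrative."""
    NOT_LOGGED = 0             # creation was not written to the binary log
    LOGGED_UP_TO_DATE = 1      # creation logged, all changes logged
    LOGGED_NOT_UP_TO_DATE = 2  # creation logged, some changes not logged


def is_up_to_date(state: CreationLogged) -> bool:
    # "Not up to date in binary log" is defined as value 0 or 2.
    return state == CreationLogged.LOGGED_UP_TO_DATE


def must_force_row_logging(tmp_tables_used) -> bool:
    """Row logging is forced only if the statement itself uses a temporary
    table that is not up to date; unused temporary tables do not matter."""
    return any(not is_up_to_date(s) for s in tmp_tables_used)


no_tmp_tables = must_force_row_logging([])
stale_table = must_force_row_logging([CreationLogged.LOGGED_UP_TO_DATE,
                                      CreationLogged.LOGGED_NOT_UP_TO_DATE])
```

Passing only the tables used by the statement reflects the behavior change in the message: previously the mere existence of any temporary table in the session forced row logging.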
Sergei Golubchik	237e24497b	Merge remote-tracking branch 'github/bb-11.4-release' into bb-11.8-serg	2025-04-27 19:40:00 +02:00
Oleksandr Byelkin	a8d4642375	Merge branch '10.11' into 11.4	2025-04-26 10:53:02 +02:00
Marko Mäkelä	75ad1e9f00	Merge 10.6 into 10.11	2025-04-23 08:53:53 +03:00
Vlad Lesin	47e687b109	MDEV-36639 innodb_snapshot_isolation=1 gives error for not committed row changes The solution is to check in lock_clust_rec_read_check_and_lock() whether the transaction that modified a record is still active. If yes, then just request a lock. If no, then, depending on whether the current transaction's read view can see the changes, either return DB_RECORD_CHANGED or request a lock. We can do the check in lock_clust_rec_read_check_and_lock() because the transaction tries to set a lock on the record which the cursor points to after the transaction resumes and the cursor position is restored. If the lock already exists, then we don't request the lock again. But for the current commit it's important that lock_clust_rec_read_check_and_lock() will be invoked again for the same record, so we can do the check again after the transaction that modified the record was committed or rolled back. MDEV-33802(`4aa9291`) is partially reverted. If some transaction holds an implicit lock on some record and a transaction with snapshot isolation level requests a conflicting lock on the same record, it should be blocked instead of returning DB_RECORD_CHANGED, to be able to continue execution when the implicit lock owner is rolled back. The construction -------------------------------------------------------------------------- let $wait_condition= select count(*) = 1 from information_schema.processlist where state = 'Updating' and info = 'UPDATE t SET b = 2 WHERE a'; --source include/wait_condition.inc -------------------------------------------------------------------------- is not reliable enough to make sure the transaction is blocked in the test case; the test failed sporadically with the -------------------------------------------------------------------------- ./mtr --max-test-fail=1 --parallel=96 lock_isolation{,,,,,,,}{,,,}{,,} \ --repeat=500 -------------------------------------------------------------------------- command. That's why it was replaced with debug sync-points. Reviewed by: Marko Mäkelä	2025-04-22 20:41:43 +03:00
Thirunarayanan Balathandayuthapani	dac3d702f7	MDEV-36649 dict_acquire_mdl_shared() aborts when table mode is DICT_TABLE_OP_OPEN_ONLY_IF_CACHED - InnoDB fails to check the table is being dropped or evicted while acquiring the MDL for the table when table open operation mode is DICT_TABLE_OP_OPEN_ONLY_IF_CACHED. This is caused by the commit `337bf8ac4b` (MDEV-36122) Fix: === dict_acquire_mdl_shared(): If the table is evicted or dropped when table operation mode is DICT_TABLE_OP_OPEN_IF_CACHED then return nullptr	2025-04-22 15:17:29 +05:30
Sergei Golubchik	1a85ae444a	MDEV-36050 DATA/INDEX DIRECTORY handling is inconsistent consistently issue a Note 1618 DATA DIRECTORY option ignored Note 1618 INDEX DIRECTORY option ignored in archive/csv/innodb/rocksdb whenever an option is ignored. Note that csv doesn't say "INDEX DIRECTORY option ignored" because it does not create index files at all anywhere. Other engines don't say "INDEX DIRECTORY option ignored" if the table has no indexes. additionally InnoDB doesn't say that if INDEX DIRECTORY is the same as DATA DIRECTORY, because in that case indexes are technically stored in INDEX DIRECTORY. collateral fix: use strmake to zero-terminate the string	2025-04-18 09:41:23 +02:00
Thirunarayanan Balathandayuthapani	f388222d49	MDEV-36504 Memory leak after CREATE TABLE..SELECT Problem: ======== - After commit `cc8eefb0dc` (MDEV-33087), InnoDB does use bulk insert operation for ALTER TABLE.. ALGORITHM=COPY and CREATE TABLE..SELECT as well. InnoDB fails to clear the bulk buffer when it encounters error during CREATE..SELECT. Problem is that while transaction cleanup, InnoDB fails to identify the bulk insert for DDL operation. Fix: ==== - Represent bulk_insert in trx by 2 bits. By doing that, InnoDB can distinguish between TRX_DML_BULK, TRX_DDL_BULK. During DDL, set bulk insert value for transaction to TRX_DDL_BULK. - Introduce a parameter HA_EXTRA_ABORT_ALTER_COPY which rollbacks only TRX_DDL_BULK transaction. - bulk_insert_apply() happens for TRX_DDL_BULK transaction happens only during HA_EXTRA_END_ALTER_COPY extra() call.	2025-04-17 12:04:09 +05:30
Julius Goryavsky	1a013cea95	Merge branch '10.6' into '10.11'	2025-04-16 03:34:40 +02:00
Thirunarayanan Balathandayuthapani	c6de1267dd	MDEV-35689 InnoDB system tables cannot be optimized or defragmented - With the help of MDEV-14795, InnoDB implemented a way to shrink the InnoDB system tablespace after undo tablespaces have been moved to separate files (MDEV-29986). There is no way to defragment any pages of InnoDB system tables. By doing that, shrinking of the system tablespace can be more effective. This patch deals with defragmentation of system tables inside ibdata1. The following steps are done to defragment the system tablespace: 1) Make sure that no user tables exist in ibdata1 2) Iterate through all extent descriptor pages in the system tablespace and note their states. 3) Find the earliest free extent to replace the last used extents in the system tablespace. 4) Iterate through all indexes of the system tablespace and defragment the tree level by level. 5) Iterate the level from left page to right page and find out whether the page falls under the extent to be replaced. If it does, then do step (6), else step (4) 6) Prepare the allocation of the new extent by latching the necessary pages. If any error happens, then no page has been modified up to step (5). 7) Allocate the new page from the new extent 8) Prepare the associated pages for the block to be modified 9) Prepare the step of freeing the page 10) If any error happens while preparing the associated pages or freeing the page, then restore the page which was modified during new page allocation 11) Copy the old page content to the new page 12) Change the associated pages like the left, right and parent page 13) Complete the freeing of the old page Allocation of a page from the new extent, changing of related pages, and freeing of a page are each done in 2 steps. One is prepare, which latches the pages to be modified and checks their validity. The other is complete(), which does the operation. fseg_validate(): Validate the lists that exist in the inode segment Defragmentation is enabled only when :autoextend exists in the innodb_data_file_path variable.	2025-04-10 17:13:34 +05:30
Marko Mäkelä	669f719cc2	MDEV-36489 10.11 crashes during bootstrap on macOS buf_block_t::initialise(): Remove a redundant call to page.lock.init() that was already executed in buf_pool_t::create() or buf_pool_t::resize(). This fixes a regression that was introduced in commit `b6923420f3` (MDEV-29445).	2025-04-07 11:01:17 +03:00
Marko Mäkelä	db4763a0d1	Fix a slow test When we expect a lock wait timeout, let us override the default innodb_lock_wait_timeout=50 with the minimum timeout of 1 second.	2025-04-07 10:25:34 +03:00
Thirunarayanan Balathandayuthapani	b11772d9a5	MDEV-33167 ASAN errors in dict_sys_t::load_table / get_foreign_key_info after failing to load a table Problem: ======= - While loading the foreign key constraints for the parent table, if child table wasn't open then InnoDB uses the parent table heap to store the child table name in fk_tables list. If the consecutive foreign key relation for the parent table fails with error, InnoDB evicts the parent table from memory. But InnoDB accesses the evicted table memory again in dict_sys.load_table() Solution: ======== dict_load_table_one(): In case of error, remove the child table names which was added during dict_load_foreigns()	2025-04-03 17:39:40 +05:30
Thirunarayanan Balathandayuthapani	0d7ef4f478	MDEV-36236 Instant alter aborts when InnoDB fails to rollback instant operation Problem: ======== - When InnoDB does consecutive instant alter operations and the first instant DDL fails, it fails to reset the old instant information in the table during rollback. This leads the subsequent instant alter to make wrong assumptions about the existing instant column information. Fix: ==== dict_table_t::instant_column(): Duplicate the instant information field of the table. By doing this, InnoDB alter retains the old instant information and resets it during the rollback operation	2025-04-03 13:09:08 +05:30
Marko Mäkelä	58a3677309	MDEV-29445 fixup: Do not skip a test	2025-04-02 15:56:22 +03:00
Marko Mäkelä	bb1d88b6dc	Merge 11.4 into 11.8	2025-04-02 14:07:01 +03:00
Marko Mäkelä	3ae8f114e2	Merge 10.11 into 11.4	2025-04-02 10:15:08 +03:00
Marko Mäkelä	aaec841865	Merge 10.6 into 10.11	2025-04-02 09:33:20 +03:00
Marko Mäkelä	4c0e2f1aca	MDEV-35813: even more robust test case The test in commit `1756b0f37d` is occasionally failing if there are unexpectedly many page cleaner batches that are updating the log checkpoint by small amounts. This occurs in particular when running the server under Valgrind. Let us insert the same number of records with a larger number of statements in a hope that the test would then be more likely to pass.	2025-04-02 08:12:29 +03:00
Marko Mäkelä	f56099a95d	Fix some tests mainly on Valgrind	2025-03-29 10:14:56 +02:00
Marko Mäkelä	f5bd250f5b	Merge 10.11 into 11.4	2025-03-28 13:55:21 +02:00
Marko Mäkelä	ab0f2a00b6	Merge 10.6 into 10.11	2025-03-27 08:01:47 +02:00
Marko Mäkelä	191209d8ab	Merge 10.5 into 10.6	2025-03-26 17:09:57 +02:00
Marko Mäkelä	ba81009f63	MDEV-34863 RAM Usage Changed Significantly Between 10.11 Releases innodb_buffer_pool_size_auto_min: A minimum innodb_buffer_pool_size that a Linux memory pressure event can lead to shrinking the buffer pool to. On a memory pressure event, we will attempt to shrink innodb_buffer_pool_size halfway between its current value and innodb_buffer_pool_size_auto_min. If innodb_buffer_pool_size_auto_min is specified as 0 or not specified on startup, its default value will be adjusted to innodb_buffer_pool_size_max, that is, memory pressure events will be disregarded by default. buf_pool_t::garbage_collect(): For up to 15 seconds, attempt to shrink the buffer pool in response to a memory pressure event. Reviewed by: Debarun Banerjee	2025-03-26 17:05:48 +02:00
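The shrink target described for ba81009f63 is simple arithmetic: halfway between the current size and innodb_buffer_pool_size_auto_min. A minimal sketch follows; the function name and the integer rounding are assumptions, and any alignment the server applies to the result is not modelled.

```python
MiB = 1 << 20


def memory_pressure_target(current: int, auto_min: int) -> int:
    """Return the size to shrink to on a memory pressure event.

    If auto_min is not below the current size, nothing is shrunk. That
    covers the default described in the message, where auto_min equals
    innodb_buffer_pool_size_max and events are effectively disregarded.
    """
    if auto_min >= current:
        return current
    return auto_min + (current - auto_min) // 2


first_event = memory_pressure_target(1024 * MiB, 512 * MiB)
second_event = memory_pressure_target(first_event, 512 * MiB)
default_config = memory_pressure_target(1024 * MiB, 2048 * MiB)
```

Under this model repeated events approach the minimum geometrically: 1024 MiB, then 768 MiB, then 640 MiB, and so on.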
Marko Mäkelä	b6923420f3	MDEV-29445: Reimplement SET GLOBAL innodb_buffer_pool_size We deprecate and ignore the parameter innodb_buffer_pool_chunk_size and let the buffer pool size be changed in arbitrary 1-megabyte increments. innodb_buffer_pool_size_max: A new read-only startup parameter that specifies the maximum innodb_buffer_pool_size. If 0 or unspecified, it will default to the specified innodb_buffer_pool_size rounded up to the allocation unit (2 MiB or 8 MiB). The maximum value is 4GiB-2MiB on 32-bit systems and 16EiB-8MiB on 64-bit systems. This maximum is very likely to be limited further by the operating system. The status variable Innodb_buffer_pool_resize_status will reflect the status of shrinking the buffer pool. When no shrinking is in progress, the string will be empty. Unlike before, the execution of SET GLOBAL innodb_buffer_pool_size will block until the requested buffer pool size change has been implemented, or the execution is interrupted by a KILL statement, a client disconnect, or server shutdown. If the buf_flush_page_cleaner() thread notices that we are running out of memory, the operation may fail with ER_WRONG_USAGE. SET GLOBAL innodb_buffer_pool_size will be refused if the server was started with --large-pages (even if no HugeTLB pages were successfully allocated). This functionality is somewhat exercised by the test main.large_pages, which now runs also on Microsoft Windows. On Linux, explicit HugeTLB mappings are apparently excluded from the reported Resident Set Size (RSS), and apparently unshrinkable between mmap(2) and munmap(2). The buffer pool will be mapped to a contiguous virtual memory area that will be aligned and partitioned into extents of 8 MiB on 64-bit systems and 2 MiB on 32-bit systems. Within an extent, the first few innodb_page_size blocks contain buf_block_t objects that will cover the page frames in the rest of the extent. 
The number of such frames is precomputed in the array first_page_in_extent[] for each innodb_page_size. In this way, there is a trivial mapping between page frames and block descriptors and we do not need any lookup tables like buf_pool.zip_hash or buf_pool_t::chunk_t::map. We will always allocate the same number of block descriptors for an extent, even if we do not need all the buf_block_t in the last extent in case the innodb_buffer_pool_size is not an integer multiple of the of extents size. The minimum innodb_buffer_pool_size is 256*5/4 pages. At the default innodb_page_size=16k this corresponds to 5 MiB. However, now that the innodb_buffer_pool_size includes the memory allocated for the block descriptors, the minimum would be innodb_buffer_pool_size=6m. my_large_virtual_alloc(): A new function, similar to my_large_malloc(). my_virtual_mem_reserve(), my_virtual_mem_commit(), my_virtual_mem_decommit(), my_virtual_mem_release(): New interface mostly by Vladislav Vaintroub, to separately reserve and release virtual address space, as well as to commit and decommit memory within it. After my_virtual_mem_decommit(), the virtual memory range will be read-only or unaccessible, depending on whether the build option cmake -DHAVE_UNACCESSIBLE_AFTER_MEM_DECOMMIT=1 has been specified. This option is hard-coded on Microsoft Windows, where VirtualMemory(MEM_DECOMMIT) will make the memory unaccessible. On IBM AIX, Linux, Illumos and possibly Apple macOS, the virtual memory will be zeroed out immediately. On other POSIX-like systems, madvise(MADV_FREE) will be used if available, to give the operating system kernel a permission to zero out the virtual memory range. We prefer immediate freeing so that the reported resident set size (RSS) of the process will reflect the current innodb_buffer_pool_size. Shrinking the buffer pool is a rarely executed resource intensive operation, and the immediate configuration of the MMU mappings should not incur significant additional penalty. 
opt_super_large_pages: Declare only on Solaris. Actually, this is specific to the SPARC implementation of Solaris, but because we lack access to a Solaris development environment, we will not revise this for other MMU and ISA. buf_pool_t::chunk_t::create(): Remove. buf_pool_t::create(): Initialize all n_blocks of the buf_pool.free list. buf_pool_t::allocate(): Renamed from buf_LRU_get_free_only(). buf_pool_t::LRU_warned: Changed to Atomic_relaxed<bool>, only to be modified by the buf_flush_page_cleaner() thread. buf_pool_t::shrink(): Attempt to shrink the buffer pool. There are 3 possible outcomes: SHRINK_DONE (success), SHRINK_IN_PROGRESS (the caller may keep trying), and SHRINK_ABORT (we seem to be running out of buffer pool). While traversing buf_pool.LRU, release the contended buf_pool.mutex once in every 32 iterations in order to reduce starvation. Use lru_scan_itr for efficient traversal, similar to buf_LRU_free_from_common_LRU_list(). buf_pool_t::shrunk(): Update the reduced size of the buffer pool in a way that is compatible with buf_pool_t::page_guess(), and invoke my_virtual_mem_decommit(). buf_pool_t::resize(): Before invoking shrink(), run one batch of buf_flush_page_cleaner() in order to prevent LRU_warn(). Abort if shrink() recommends it, or no blocks were withdrawn in the past 15 seconds, or the execution of the statement SET GLOBAL innodb_buffer_pool_size was interrupted. buf_pool_t::first_to_withdraw: The first block descriptor that is out of the bounds of the shrunk buffer pool. buf_pool_t::withdrawn: The list of withdrawn blocks. If buf_pool_t::resize() is aborted before shrink() completes, we must be able to resurrect the withdrawn blocks in the free list. buf_pool_t::contains_zip(): Added a parameter for the number of least significant pointer bits to disregard, so that we can find any pointers to within a block that is supposed to be free. buf_pool_t::is_shrinking(): Return the total number or blocks that were withdrawn or are to be withdrawn. 
buf_pool_t::to_withdraw(): Return the number of blocks that will need to be withdrawn. buf_pool_t::usable_size(): Number of usable pages, considering possible in-progress attempt at shrinking the buffer pool. buf_pool_t::page_guess(): Try to buffer-fix a guessed block pointer. If HAVE_UNACCESSIBLE_AFTER_MEM_DECOMMIT is set, the pointer will be validated before being dereferenced. buf_pool_t::get_info(): Replaces buf_stats_get_pool_info(). innodb_init_param(): Refactored. We must first compute srv_page_size_shift and then determine the valid bounds of innodb_buffer_pool_size. buf_buddy_shrink(): Replaces buf_buddy_realloc(). Part of the work is deferred to buf_buddy_condense_free(), which is being executed when we are not holding any buf_pool.page_hash latch. buf_buddy_condense_free(): Do not relocate blocks. buf_buddy_free_low(): Do not care about buffer pool shrinking. This will be handled by buf_buddy_shrink() and buf_buddy_condense_free(). buf_buddy_alloc_zip(): Assert !buf_pool.contains_zip() when we are allocating from the binary buddy system. Previously we were asserting this on multiple recursion levels. buf_buddy_block_free(), buf_buddy_free_low(): Assert !buf_pool.contains_zip(). buf_buddy_alloc_from(): Remove the redundant parameter j. buf_flush_LRU_list_batch(): Add the parameter to_withdraw to keep track of buf_pool.n_blocks_to_withdraw. buf_do_LRU_batch(): Skip buf_free_from_unzip_LRU_list_batch() if we are shrinking the buffer pool. In that case, we want to minimize the page relocations and just finish as quickly as possible. trx_purge_attach_undo_recs(): Limit purge_sys.n_pages_handled() in every iteration, in case the buffer pool is being shrunk in the middle of a purge batch. Reviewed by: Debarun Banerjee	2025-03-26 17:05:44 +02:00
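Two size computations in the message for b6923420f3 can be checked with a few lines of arithmetic: the minimum of 256*5/4 pages, and the default innodb_buffer_pool_size_max being the requested size rounded up to the allocation unit. The function names below are assumptions for illustration; the constants are taken from the message.

```python
KiB = 1 << 10
MiB = 1 << 20

EXTENT_64BIT = 8 * MiB  # allocation unit on 64-bit systems
EXTENT_32BIT = 2 * MiB  # allocation unit on 32-bit systems


def min_pool_bytes(page_size: int) -> int:
    """Minimum buffer pool size: 256*5/4 = 320 pages of page_size bytes."""
    return 256 * 5 // 4 * page_size


def default_size_max(requested: int, extent: int) -> int:
    """Round the requested size up to a whole number of extents."""
    return -(-requested // extent) * extent


min_at_16k = min_pool_bytes(16 * KiB)
rounded_64 = default_size_max(100 * MiB + 1, EXTENT_64BIT)
rounded_32 = default_size_max(100 * MiB + 1, EXTENT_32BIT)
```

At the default page size the first result is exactly 5 MiB, matching the message. The message adds that the practical minimum is innodb_buffer_pool_size=6m because the size now also includes the block descriptors, which this sketch does not model.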
Thirunarayanan Balathandayuthapani	1f4a901576	MDEV-36281 DML aborts during online virtual index Reason: ======= - InnoDB DML commit aborts the server when InnoDB does online virtual index. During online DDL, concurrent DML commit operation does read the undo log record and their related current version of the clustered index record. Based on the operation, InnoDB do build the old tuple and new tuple for the table. If the concurrent online index can be affected by the operation, InnoDB does build the entry for the index and log the operation. Problematic case is update operation, InnoDB does build the update vector. But while building the old row, InnoDB fails to fill the non-affected virtual column. This lead to server abort while build the entry for index. Fix: === - First, fill the virtual column entries for the new row. Duplicate the old row based on new row and change only the affected fields in old row based on the update vector.	2025-03-26 12:48:39 +01:00
Marko Mäkelä	33a462e0b1	MDEV-36373 Bogus Warning: ... storage is corrupted ha_innobase::statistics_init(), ha_innobase::info_low(): Correctly handle a DB_READ_ONLY return value from dict_stats_save(). Fixes up commit `6e6a1b316c` (MDEV-35000)	2025-03-25 08:48:08 +02:00
Marko Mäkelä	1756b0f37d	MDEV-35813: more robust test case Let us integrate the test case with innodb.page_cleaner so that there will be less interference from log writes due to checkpoints. Also, make the test compatible with ./mtr --cursor-protocol.	2025-03-18 10:41:38 +02:00
Marko Mäkelä	0e8e0065d6	MDEV-35813 test case	2025-03-17 16:21:09 +02:00


3815 Commits