1
0
mirror of https://github.com/MariaDB/server.git synced 2025-08-01 03:47:19 +03:00

3815 Commits

Author SHA1 Message Date
e3d9369774 cleanup: disconnect before DROP USER
let's always disconnect a user connection before dropping the said user.
MariaDB is traditionally very tolerant to active connections
of the dropped user, which isn't the case for most other databases.

Let's avoid unintentionally spreading incompatible behavior
and disconnect before drop.

Except in cases when the test specifically tests such a behavior.
2025-07-16 09:14:33 +07:00
bead24b7f3 mariadb-test: wait on disconnect
Remove one of the major sources of race condiitons in mariadb-test.
Normally, mariadb_close() sends COM_QUIT to the server and immediately
disconnects. In mariadb-test it means the test can switch to another
connection and sends queries to the server before the server even
started parsing the COM_QUIT packet and these queries can see the
connection as fully active, as it didn't reach dispatch_command yet.

This is a major source of instability in tests and many - but not all,
still less than a half - tests employ workarounds. The correct one
is a pair count_sessions.inc/wait_until_count_sessions.inc.
Also very popular was wait_until_disconnected.inc, which was completely
useless, because it verifies that the connection is closed, and after
disconnect it always is, it didn't verify whether the server processed
COM_QUIT. Sadly the placebo was as widely used as the real thing.

Let's fix this by making mariadb-test `disconnect` command _to wait_ for
the server to confirm. This makes almost all workarounds redundant.

In some cases count_sessions.inc/wait_until_count_sessions.inc is still
needed, though, as only `disconnect` command is changed:

 * after external tools, like `exec $MYSQL`
 * after failed `connect` command
 * replication, after `STOP SLAVE`
 * Federated/CONNECT/SPIDER/etc after `DROP TABLE`

and also in some XA tests, because an XA transaction is dissociated from
the THD very late, after the server has closed the client connection.

Collateral cleanups: fix comments, remove some redundant statements:
 * DROP IF EXISTS if nothing is known to exist
 * DROP table/view before DROP DATABASE
 * REVOKE privileges before DROP USER
 etc
2025-07-16 09:14:33 +07:00
cffbb17480 MDEV-28933: Per-table unique FOREIGN KEY constraint names
Before MySQL 4.0.18, user-specified constraint names were ignored.
Starting with MySQL 4.0.18, the specified constraint name was
prepended with the schema name and '/'.  Now we are transforming
into a format where the constraint name is prepended with the
dict_table_t::name and the impossible UTF-8 sequence 0xff.
Generated constraint names will be ASCII decimal numbers.

On upgrade, old FOREIGN KEY constraint names will be displayed
without any schema name prefix. They will be updated to the new
format on DDL operations.

dict_foreign_t::sql_id(): Return the SQL constraint name
without any schemaname/tablename\377 or schemaname/ prefix.

row_rename_table_for_mysql(), dict_table_rename_in_cache():
Simplify the logic: Just rename constraints to the new format.

dict_table_get_foreign_id(): Replaces dict_table_get_highest_foreign_id().

innobase_get_foreign_key_info(): Let my_error() refer to erroneous
anonymous constraints as "(null)".

row_delete_constraint(): Try to drop all 3 constraint name variants.

Reviewed by: Thirunarayanan Balathandayuthapani
Tested by: Matthias Leich
2025-07-08 12:30:27 +03:00
e653666368 Merge branch '12.0' into 12.1 2025-06-18 09:27:49 +02:00
dfcb5c91e0 Merge branch '11.8' into 12.0 2025-06-18 07:50:39 +02:00
a65f7dc71d Merge branch '11.4' into 11.8 2025-06-18 07:43:24 +02:00
89c7e2b9c7 Merge branch '10.11' into 11.4 2025-06-17 09:50:22 +02:00
6a2afb42ba MDEV-36487 Fix ha_innobase::check() for sequences
InnoDB does the following check for sequence table during check
table command:
- There should be only one index should exist on sequence table
- There should be only one row should exist on sequence table
- The leaf page must be the root page for the sequence table
- Delete marked record should not exist
- DB_TRX_ID and DB_ROLL_PTR of the record should be 0 and 1U << 55
2025-06-09 13:52:44 +05:30
7dcdb2c876 Merge branch '11.8' into 11.8 release 2025-05-30 13:28:41 +02:00
d953f2c810 MDEV-36868: Inconsistency when shrinking innodb_buffer_pool_size
buf_pool_t::resize(): After successfully shrinking the buffer pool,
announce the success. The size had already been updated in shrunk().
After failing to shrink the buffer pool, re-enable the adaptive
hash index if it had been enabled.

Reviewed by: Debarun Banerjee
2025-05-28 14:33:20 +03:00
7b4b759f13 MDEV-36868: Inconsistency when shrinking innodb_buffer_pool_size
buf_pool_t::resize(): After successfully shrinking the buffer pool,
announce the success. The size had already been updated in shrunk().
After failing to shrink the buffer pool, re-enable the adaptive
hash index if it had been enabled.

Reviewed by: Debarun Banerjee
2025-05-28 13:33:06 +03:00
db188083c3 MDEV-36771 Assertion 'bulk_insert == TRX_NO_BULK' failed in trx_t::assert_freed
- InnoDB fails to reset bulk_insert of a transaction while freeing
the transaction during shutting down of a server.
2025-05-26 12:13:01 +05:30
3da36fa130 Merge 10.6 into 10.11 2025-05-26 08:10:47 +03:00
d8962d138f MDEV-36017 Alter table aborts when temporary directory is full
Problem:
=======
- In 10.11, During Copy algorithm, InnoDB does use bulk insert
for row by row insert operation. When temporary directory
ran out of memory, row_mysql_handle_errors() fails to handle
DB_TEMP_FILE_WRITE_FAIL.

- During inplace algorithm, concurrent DML fails to write
the log operation into the temporary file. InnoDB fail to
mark the error for the online log.

- ddl_log_write() releases the global ddl lock prematurely before
release the log memory entry

Fix:
===
row_mysql_handle_errors(): Rollback the transaction when
InnoDB encounters DB_TEMP_FILE_WRITE_FAIL

convert_error_code_to_mysql(): Report an aborted transaction
when InnoDB encounters DB_TEMP_FILE_WRITE_FAIL during
alter table algorithm=copy or innodb bulk insert operation

row_log_online_op(): Mark the error in online log when
InnoDB ran out of temporary space

fil_space_extend_must_retry(): Mark the os_has_said_disk_full
as true if os_file_set_size() fails

btr_cur_pessimistic_update(): Return error code when
btr_cur_pessimistic_insert() fails

ddl_log_write(): Release the global ddl lock after releasing
the log memory entry when error was encountered

btr_cur_optimistic_update(): Relax the assertion that
blob pointer can be null during rollback because InnoDB can
ran out of space while allocating the external page

ha_innobase::extra(): Rollback the transaction during DDL before
calling convert_error_code_to_mysql().

row_undo_mod_upd_exist_sec(): Remove the assertion which says
that InnoDB should fail to build index entry when rollbacking
an incomplete transaction after crash recovery. This scenario
can happen when InnoDB ran out of space.

row_upd_changes_ord_field_binary_func(): Relax the assertion to
make that externally stored field can be null when InnoDB ran out
of space.
2025-05-26 10:12:14 +05:30
8a4d3a044f MDEV-36017 Alter table aborts when temporary directory is full
Problem:
=======
- During inplace algorithm, concurrent DML fails to write
the log operation into the temporary file. InnoDB fail to
mark the error for the online log.

- ddl_log_write() releases the global ddl lock prematurely before
release the log memory entry

Fix:
===
row_log_online_op(): Mark the error in online log when
InnoDB ran out of temporary space

fil_space_extend_must_retry(): Mark the os_has_said_disk_full
as true if os_file_set_size() fails

btr_cur_pessimistic_update(): Return error code when
btr_cur_pessimistic_insert() fails

ddl_log_write(): Release the global ddl lock after releasing the
log memory entry when error was encountered

btr_cur_optimistic_update(): Relax the assertion that
blob pointer can be null during rollback because InnoDB can
ran out of space while allocating the external page

row_undo_mod_upd_exist_sec(): Remove the assertion which says
that InnoDB should fail to build index entry when rollbacking
an incomplete transaction after crash recovery. This scenario
can happen when InnoDB ran out of space.

row_upd_changes_ord_field_binary_func(): Relax the assertion to
make that externally stored field can be null when InnoDB ran out
of space.
2025-05-25 09:11:41 +05:30
f1102da37a Merge branch '11.8' into 12.0 2025-05-22 09:22:55 +02:00
8d36cafe4f Merge branch '11.4' into 11.8 2025-05-21 15:57:16 +02:00
118cfcf821 Merge 10.11 into 11.4 2025-05-13 13:44:58 +03:00
bb48d7bc81 MDEV-36781: Assertion i < BUF_BUDDY_SIZES failed in buf_buddy_shrink()
buf_buddy_shrink(): Properly cover the case when KEY_BLOCK_SIZE
corresponds to the innodb_page_size, that is, the ROW_FORMAT=COMPRESSED
page frame is directly allocated from the buffer pool, not via the
binary buddy allocator.

buf_LRU_check_size_of_non_data_objects(): Avoid a crash when the
buffer pool is being shrunk.

buf_pool_t::shrink(): Abort if over 95% of the shrunk buffer pool
would be occupied by the adaptive hash index or record locks.
2025-05-13 12:27:46 +03:00
11f6b9d12a remove features that were deprecated in 10.5
--big-tables
--large-page-size
--storage-engine

performance_schema.setup_timers (WL#10986)
2025-04-29 16:53:02 +02:00
1b95e46524 Fix typos in mysql-test/ 2025-04-29 13:53:16 +10:00
f8ba5ced55 MDEV-36099 Ensure that creation and usage of temporary tables in replication is predictable
MDEV-36563 Assertion `!mysql_bin_log.is_open()' failed in
           THD::mark_tmp_table_as_free_for_reuse

The purpose of this commit is to ensure that creation and changes of
temporary tables are properly and predicable logged to the binary
log.  It also fixes some bugs where ROW logging was used in MIXED mode,
when STATEMENT would be a better (and expected) choice.

In this comment STATEMENT stands for logging to binary log in
STATEMENT format, MIXED stands for MIXED binlog format and ROW for ROW
binlog format.

New rules for logging of temporary tables
- CREATE of temporary tables are now by default binlogged only if
  STATEMENT binlog format is used. If it is binlogged, 1 is stored in
  TABLE_SHARE->table_creation_was_logged. The user can change this
  behavior by setting create_temporary_table_binlog_formats to
  MIXED,STATEMENT in which case the create is logged in statement
  format also in MIXED mode (as before).
- Changes to temporary tables are only binlogged if and only if
  the CREATE was logged. The logging happens under STATEMENT or MIXED.
  If binlog_format=ROW, temporary table changes are not binlogged. A
  temporary table that are changed under ROW are marked as 'not up to
  date in binlog' and no future row changes are logged.  Any usage of
  this temporary table will force row logging of other tables in any
  future statements using the temporary table to be row logged.
- DROP TEMPORARY is binlogged only of the CREATE was binlogged.

Changes done:
- Row logging is forced for any statement using temporary tables that
  are not up to date in the binary log.
  (Before the row logging was forced if the user has a temporary table)
- If there is any changes to the temporary table that is not binlogged,
  the table is marked as not up to date.
- TABLE_SHARE->table_creation_was_logged has a new definition for
  temporary tables:
  0  Table creating was not logged to binary log
  1  Table creating was logged to binary log and table is up to date.
  2  Table creating was logged to binary log but some changes where
     not logged to binary log.
  Table is not up to date in binary log is defined as value 0 or 2.
- If a multi-table-update or multi-table-delete fails then
  all updated temporary tables are marked as not up to date.
- Enforce row logging if the query is using temporary tables
  that are not up to date.
  Before row logging was enforced if the user had any
  temporary tables.
- When dropping temporary tables use IF EXISTS. This ensures
  that slave will not stop if it had crashed and lost the
  temporary tables.
- Remove comment and version from DROP /*!4000 TEMPORARY.. generated when
  a connection closes that has open temporary tables. Added 'generated by
  server' at the end of the DROP.

Bugs fixed:
- When using temporary tables with commands that forced row based,
  like INSERT INTO temporary_table VALUES (UUID()), this was never
  logged which causes the temporary table to be inconsistent on
  master and slave.
- Used binlog format is now clearly defined. It is now only depending
  on the current binlog_format and the tables used.
  Before it was depending on the user had ANY temporary tables and
  the state of 'current_stmt_binlog_format' set by previous queries.
  This also caused temporary tables to be logged to binary log in
  some cases.
- CREATE TABLE t1 LIKE not_logged_temporary_table caused replication
  to stop.
- Rename of not binlogged temporary tables where binlogged to binary log
  which caused replication to stop.

Changes in behavior:

- By default create_temporary_table_binlog_formats=STATEMENT, which
  means that CREATE TEMPORARY is not logged to binary log under MIXED
  binary logging. This can be changed by setting
  create_temporary_table_binlog_formats to MIXED,STATEMENT.
- Using temporary tables that was not logged to the binary log will
  cause any query using them for updating other tables to be logged in
  ROW format. Before all queries was logged in ROW format if the user had
  any temporary tables, even if they were not used by the query.
- Generated DROP TEMPORARY TABLE is now always using IF EXISTS and
  has a "generated by server" comment in the binary log.

The consequences of the above is that manipulations of a lot of rows
through temporary tables will by default be be slower in mixed mode.

For example:
  BEGIN;
  CREATE TEMPORARY TABLE tmp AS SELECT a, b, c FROM
  large_table1 JOIN large_table2 ON ...;
  INSERT INTO other_table SELECT b, c FROM tmp WHERE a <100;
  DROP TEMPORARY TABLE tmp;
  COMMIT;

By default this will create a huge entry in the binary log, compared
to just a few hundred bytes in statement mode. However the change in
this commit will make usage of temporary tables more reliable and
predicable and is thus worth it. Using statement mode or
create_temporary_table_binlog_formats can be used to avoid this issue.
2025-04-28 12:59:38 +03:00
237e24497b Merge remote-tracking branch 'github/bb-11.4-release' into bb-11.8-serg 2025-04-27 19:40:00 +02:00
a8d4642375 Merge branch '10.11' into 11.4 2025-04-26 10:53:02 +02:00
75ad1e9f00 Merge 10.6 into 10.11 2025-04-23 08:53:53 +03:00
47e687b109 MDEV-36639 innodb_snapshot_isolation=1 gives error for not committed row changes
Set solution is to check if transaction, which modified a record, is
still active in lock_clust_rec_read_check_and_lock(). if yes, then just
request a lock. If no, then, depending on if the current transaction read
view can see the changes, return eighter DB_RECORD_CHANGED or request a
lock.

We can do the check in lock_clust_rec_read_check_and_lock() because
transaction tries to set a lock on the record which cursor points to after
transaction resuming and cursor position restoring. If the lock already
exists, then we don't request the lock again. But for the current commit
it's important that lock_clust_rec_read_check_and_lock() will be invoked
again for the same record, so we can do the check again after
transaction, which modified a record, was committed or rolled back.

MDEV-33802(4aa9291) is partially reverted. If some transaction holds
implicit lock on some record and transaction with snapshot isolation level
requests conflicting lock on the same record, it should be blocked instead
of returning DB_RECORD_CHANGED to have ability to continue execution when
implicit lock owner is rolled back.

The construction
--------------------------------------------------------------------------
let $wait_condition=
  select count(*) = 1 from information_schema.processlist
  where state = 'Updating' and info = 'UPDATE t SET b = 2 WHERE a';
--source include/wait_condition.inc
--------------------------------------------------------------------------

is not reliable enought to make sure transaction is blocked in test
case, the test failed sporadically with
--------------------------------------------------------------------------
./mtr --max-test-fail=1 --parallel=96 lock_isolation{,,,,,,,}{,,,}{,,} \
--repeat=500
--------------------------------------------------------------------------

command. That's why it was replaced with debug sync-points.

Reviewed by: Marko Mäkelä
2025-04-22 20:41:43 +03:00
dac3d702f7 MDEV-36649 dict_acquire_mdl_shared() aborts when table mode is DICT_TABLE_OP_OPEN_ONLY_IF_CACHED
- InnoDB fails to check the table is being dropped or evicted
while acquiring the MDL for the table when table open operation
mode is DICT_TABLE_OP_OPEN_ONLY_IF_CACHED. This is caused by
the commit 337bf8ac4b (MDEV-36122)

Fix:
===
dict_acquire_mdl_shared(): If the table is evicted or dropped when
table operation mode is DICT_TABLE_OP_OPEN_IF_CACHED then return
nullptr
2025-04-22 15:17:29 +05:30
1a85ae444a MDEV-36050 DATA/INDEX DIRECTORY handling is inconsistent
consistently issue a

Note 1618 DATA DIRECTORY option ignored
Note 1618 INDEX DIRECTORY option ignored

in archive/csv/innodb/rocksdb whenever an option is ignored.

Note that csv doesn't say "INDEX DIRECTORY option ignored"
because it does not create index files at all anywhere.

Other engines don't say "INDEX DIRECTORY option ignored"
if the table has no indexes.

additionally InnoDB doesn't say that if INDEX DIRECTORY is
the same as DATA DIRECTORY, because in that case indexes are
technically stored in INDEX DIRECTORY.

collateral fix: use strmake to zero-terminate the string
2025-04-18 09:41:23 +02:00
f388222d49 MDEV-36504 Memory leak after CREATE TABLE..SELECT
Problem:
========
- After commit cc8eefb0dc (MDEV-33087),
InnoDB does use bulk insert operation for ALTER TABLE.. ALGORITHM=COPY
and CREATE TABLE..SELECT as well. InnoDB fails to clear the bulk
buffer when it encounters error during CREATE..SELECT. Problem
is that while transaction cleanup, InnoDB fails to identify
the bulk insert for DDL operation.

Fix:
====
- Represent bulk_insert in trx by 2 bits. By doing that, InnoDB
can distinguish between TRX_DML_BULK, TRX_DDL_BULK. During DDL,
set bulk insert value for transaction to TRX_DDL_BULK.

- Introduce a parameter HA_EXTRA_ABORT_ALTER_COPY which rollbacks
only TRX_DDL_BULK transaction.

- bulk_insert_apply() happens for TRX_DDL_BULK transaction happens
only during HA_EXTRA_END_ALTER_COPY extra() call.
2025-04-17 12:04:09 +05:30
1a013cea95 Merge branch '10.6' into '10.11' 2025-04-16 03:34:40 +02:00
c6de1267dd MDEV-35689 InnoDB system tables cannot be optimized or defragmented
- With the help of MDEV-14795, InnoDB implemented a way to shrink
the InnoDB system tablespace after undo tablespaces have been moved
to separate files (MDEV-29986). There is no way to defragment any
pages of InnoDB system tables. By doing that, shrinking of
system tablespace can be more effective. This patch deals with
defragment of system tables inside ibdata1.

Following steps are done to do the defragmentation of system
tablespace:
1) Make sure that there is no user tables exist in ibdata1

2) Iterate through all extent descriptor pages in system tablespace
and note their states.

3) Find the free earlier extent to replace the lastly used
extents in the system tablespace.

4) Iterate through all indexes of system tablespace and defragment
the tree level by level.

5) Iterate the level from left page to right page and find out
the page comes under the extent to be replaced. If it is then
do step (6) else step(4)

6) Prepare the allocation of new extent by latching necessary
pages. If any error happens then there is no modification of
page happened till step (5).

7) Allocate the new page from the new extent

8) Prepare the associated pages for the block to be modified

9) Prepare the step of freeing of page

10) If any error happens during preparing of associated pages,
freeing of page then restore the page which was modified
during new page allocation

11) Copy the old page content to new page

12) Change the associative pages like left, right and parent page

13) Complete the freeing of old page

Allocation of page from new extent, changing of relative pages,
freeing of page are done by 2 steps. one is prepare which
latches the to be modified pages and checks their validation.
Other is complete(), Do the operation

fseg_validate(): Validate the list exist in inode segment

Defragmentation is enabled only when :autoextend exist in
innodb_data_file_path variable.
2025-04-10 17:13:34 +05:30
669f719cc2 MDEV-36489 10.11 crashes during bootstrap on macOS
buf_block_t::initialise(): Remove a redundant call to page.lock.init()
that was already executed in buf_pool_t::create() or
buf_pool_t::resize().

This fixes a regression that was introduced in
commit b6923420f3 (MDEV-29445).
2025-04-07 11:01:17 +03:00
db4763a0d1 Fix a slow test
When we expect a lock wait timeout, let us override
the default innodb_lock_wait_timeout=50
with the minimum timeout of 1 second.
2025-04-07 10:25:34 +03:00
b11772d9a5 MDEV-33167 ASAN errors in dict_sys_t::load_table / get_foreign_key_info after failing to load a table
Problem:
=======
- While loading the foreign key constraints for the parent table,
if child table wasn't open then InnoDB uses the parent table heap
to store the child table name in fk_tables list. If the consecutive
foreign key relation for the parent table fails with error,
InnoDB evicts the parent table from memory. But InnoDB accesses the
evicted table memory again in dict_sys.load_table()

Solution:
========
dict_load_table_one(): In case of error, remove the child table
names which was added during dict_load_foreigns()
2025-04-03 17:39:40 +05:30
0d7ef4f478 MDEV-36236 Instant alter aborts when InnoDB fails to rollback instant operation
Problem:
========
- InnoDB does consecutive instant alter operation, first instant DDL
fails, it fails to reset the old instant information in table during
rollback. This lead to consecutive instant alter to have wrong
assumption about the exisitng instant column information.

Fix:
====
dict_table_t::instant_column(): Duplicate the instant information
field of the table. By doing this, InnoDB alter retains the old
instant information and reset it during rollback operation
2025-04-03 13:09:08 +05:30
58a3677309 MDEV-29445 fixup: Do not skip a test 2025-04-02 15:56:22 +03:00
bb1d88b6dc Merge 11.4 into 11.8 2025-04-02 14:07:01 +03:00
3ae8f114e2 Merge 10.11 into 11.4 2025-04-02 10:15:08 +03:00
aaec841865 Merge 10.6 into 10.11 2025-04-02 09:33:20 +03:00
4c0e2f1aca MDEV-35813: even more robust test case
The test in commit 1756b0f37d
is occasionally failing if there are unexpectedly many page cleaner
batches that are updating the log checkpoint by small amounts.
This occurs in particular when running the server under Valgrind.

Let us insert the same number of records with a larger number of
statements in a hope that the test would then be more likely to pass.
2025-04-02 08:12:29 +03:00
f56099a95d Fix some tests mainly on Valgrind 2025-03-29 10:14:56 +02:00
f5bd250f5b Merge 10.11 into 11.4 2025-03-28 13:55:21 +02:00
ab0f2a00b6 Merge 10.6 into 10.11 2025-03-27 08:01:47 +02:00
191209d8ab Merge 10.5 into 10.6 2025-03-26 17:09:57 +02:00
ba81009f63 MDEV-34863 RAM Usage Changed Significantly Between 10.11 Releases
innodb_buffer_pool_size_auto_min: A minimum innodb_buffer_pool_size that
a Linux memory pressure event can lead to shrinking the buffer pool to.
On a memory pressure event, we will attempt to shrink
innodb_buffer_pool_size halfway between its current value and
innodb_buffer_pool_size_auto_min. If innodb_buffer_pool_size_auto_min is
specified as 0 or not specified on startup, its default value will be
adjusted to innodb_buffer_pool_size_max, that is, memory pressure events
will be disregarded by default.

buf_pool_t::garbage_collect(): For up to 15 seconds, attempt to shrink
the buffer pool in response to a memory pressure event.

Reviewed by: Debarun Banerjee
2025-03-26 17:05:48 +02:00
b6923420f3 MDEV-29445: Reimplement SET GLOBAL innodb_buffer_pool_size
We deprecate and ignore the parameter innodb_buffer_pool_chunk_size
and let the buffer pool size to be changed in arbitrary 1-megabyte
increments.

innodb_buffer_pool_size_max: A new read-only startup parameter
that specifies the maximum innodb_buffer_pool_size.  If 0 or
unspecified, it will default to the specified innodb_buffer_pool_size
rounded up to the allocation unit (2 MiB or 8 MiB).  The maximum value
is 4GiB-2MiB on 32-bit systems and 16EiB-8MiB on 64-bit systems.
This maximum is very likely to be limited further by the operating system.

The status variable Innodb_buffer_pool_resize_status will reflect
the status of shrinking the buffer pool. When no shrinking is in
progress, the string will be empty.

Unlike before, the execution of SET GLOBAL innodb_buffer_pool_size
will block until the requested buffer pool size change has been
implemented, or the execution is interrupted by a KILL statement
a client disconnect, or server shutdown.  If the
buf_flush_page_cleaner() thread notices that we are running out of
memory, the operation may fail with ER_WRONG_USAGE.

SET GLOBAL innodb_buffer_pool_size will be refused
if the server was started with --large-pages (even if
no HugeTLB pages were successfully allocated). This functionality
is somewhat exercised by the test main.large_pages, which now runs
also on Microsoft Windows.  On Linux, explicit HugeTLB mappings are
apparently excluded from the reported Redident Set Size (RSS), and
apparently unshrinkable between mmap(2) and munmap(2).

The buffer pool will be mapped to a contiguous virtual memory area
that will be aligned and partitioned into extents of 8 MiB on
64-bit systems and 2 MiB on 32-bit systems.

Within an extent, the first few innodb_page_size blocks contain
buf_block_t objects that will cover the page frames in the rest
of the extent.  The number of such frames is precomputed in the
array first_page_in_extent[] for each innodb_page_size.
In this way, there is a trivial mapping between
page frames and block descriptors and we do not need any
lookup tables like buf_pool.zip_hash or buf_pool_t::chunk_t::map.

We will always allocate the same number of block descriptors for
an extent, even if we do not need all the buf_block_t in the last
extent in case the innodb_buffer_pool_size is not an integer multiple
of the of extents size.

The minimum innodb_buffer_pool_size is 256*5/4 pages.  At the default
innodb_page_size=16k this corresponds to 5 MiB.  However, now that the
innodb_buffer_pool_size includes the memory allocated for the block
descriptors, the minimum would be innodb_buffer_pool_size=6m.

my_large_virtual_alloc(): A new function, similar to my_large_malloc().

my_virtual_mem_reserve(), my_virtual_mem_commit(),
my_virtual_mem_decommit(), my_virtual_mem_release():
New interface mostly by Vladislav Vaintroub, to separately
reserve and release virtual address space, as well as to
commit and decommit memory within it.

After my_virtual_mem_decommit(), the virtual memory range will be
read-only or unaccessible, depending on whether the build option
cmake -DHAVE_UNACCESSIBLE_AFTER_MEM_DECOMMIT=1
has been specified.  This option is hard-coded on Microsoft Windows,
where VirtualMemory(MEM_DECOMMIT) will make the memory unaccessible.
On IBM AIX, Linux, Illumos and possibly Apple macOS, the virtual memory
will be zeroed out immediately.  On other POSIX-like systems,
madvise(MADV_FREE) will be used if available, to give the operating
system kernel a permission to zero out the virtual memory range.
We prefer immediate freeing so that the reported
resident set size (RSS) of the process will reflect the current
innodb_buffer_pool_size.  Shrinking the buffer pool is a rarely
executed resource intensive operation, and the immediate configuration
of the MMU mappings should not incur significant additional penalty.

opt_super_large_pages: Declare only on Solaris. Actually, this is
specific to the SPARC implementation of Solaris, but because we
lack access to a Solaris development environment, we will not revise
this for other MMU and ISA.

buf_pool_t::chunk_t::create(): Remove.

buf_pool_t::create(): Initialize all n_blocks of the buf_pool.free list.

buf_pool_t::allocate(): Renamed from buf_LRU_get_free_only().

buf_pool_t::LRU_warned: Changed to Atomic_relaxed<bool>,
only to be modified by the buf_flush_page_cleaner() thread.

buf_pool_t::shrink(): Attempt to shrink the buffer pool.
There are 3 possible outcomes: SHRINK_DONE (success),
SHRINK_IN_PROGRESS (the caller may keep trying),
and SHRINK_ABORT (we seem to be running out of buffer pool).
While traversing buf_pool.LRU, release the contended
buf_pool.mutex once in every 32 iterations in order to
reduce starvation. Use lru_scan_itr for efficient traversal,
similar to buf_LRU_free_from_common_LRU_list().

buf_pool_t::shrunk(): Update the reduced size of the buffer pool
in a way that is compatible with buf_pool_t::page_guess(),
and invoke my_virtual_mem_decommit().

buf_pool_t::resize(): Before invoking shrink(), run one batch of
buf_flush_page_cleaner() in order to prevent LRU_warn().
Abort if shrink() recommends it, or no blocks were withdrawn in
the past 15 seconds, or the execution of the statement
SET GLOBAL innodb_buffer_pool_size was interrupted.

buf_pool_t::first_to_withdraw: The first block descriptor that is
out of the bounds of the shrunk buffer pool.

buf_pool_t::withdrawn: The list of withdrawn blocks.
If buf_pool_t::resize() is aborted before shrink() completes,
we must be able to resurrect the withdrawn blocks in the free list.

buf_pool_t::contains_zip(): Added a parameter for the
number of least significant pointer bits to disregard,
so that we can find any pointers to within a block
that is supposed to be free.

buf_pool_t::is_shrinking(): Return the total number or blocks that
were withdrawn or are to be withdrawn.

buf_pool_t::to_withdraw(): Return the number of blocks that will need to
be withdrawn.

buf_pool_t::usable_size(): Number of usable pages, considering possible
in-progress attempt at shrinking the buffer pool.

buf_pool_t::page_guess(): Try to buffer-fix a guessed block pointer.
If HAVE_UNACCESSIBLE_AFTER_MEM_DECOMMIT is set, the pointer will
be validated before being dereferenced.

buf_pool_t::get_info(): Replaces buf_stats_get_pool_info().

innodb_init_param(): Refactored. We must first compute
srv_page_size_shift and then determine the valid bounds of
innodb_buffer_pool_size.

buf_buddy_shrink(): Replaces buf_buddy_realloc().
Part of the work is deferred to buf_buddy_condense_free(),
which is being executed when we are not holding any
buf_pool.page_hash latch.

buf_buddy_condense_free(): Do not relocate blocks.

buf_buddy_free_low(): Do not care about buffer pool shrinking.
This will be handled by buf_buddy_shrink() and
buf_buddy_condense_free().

buf_buddy_alloc_zip(): Assert !buf_pool.contains_zip()
when we are allocating from the binary buddy system.
Previously we were asserting this on multiple recursion levels.

buf_buddy_block_free(), buf_buddy_free_low():
Assert !buf_pool.contains_zip().

buf_buddy_alloc_from(): Remove the redundant parameter j.

buf_flush_LRU_list_batch(): Add the parameter to_withdraw
to keep track of buf_pool.n_blocks_to_withdraw.

buf_do_LRU_batch(): Skip buf_free_from_unzip_LRU_list_batch()
if we are shrinking the buffer pool. In that case, we want
to minimize the page relocations and just finish as quickly
as possible.

trx_purge_attach_undo_recs(): Limit purge_sys.n_pages_handled()
in every iteration, in case the buffer pool is being shrunk
in the middle of a purge batch.

Reviewed by: Debarun Banerjee
2025-03-26 17:05:44 +02:00
1f4a901576 MDEV-36281 DML aborts during online virtual index
Reason:
=======
- InnoDB DML commit aborts the server when InnoDB does online
virtual index. During online DDL, concurrent DML commit
operation does read the undo log record and their related
current version of the clustered index record. Based on the
operation, InnoDB do build the old tuple and new tuple for
the table. If the concurrent online index can be affected
by the operation, InnoDB does build the entry
for the index and log the operation.

Problematic case is update operation, InnoDB does build the
update vector. But while building the old row, InnoDB fails
to fill the non-affected virtual column. This lead to
server abort while build the entry for index.

Fix:
===
- First, fill the virtual column entries for the new row.
Duplicate the old row based on new row and change only the
affected fields in old row based on the update vector.
2025-03-26 12:48:39 +01:00
33a462e0b1 MDEV-36373 Bogus Warning: ... storage is corrupted
ha_innobase::statistics_init(), ha_innobase::info_low():
Correctly handle a DB_READ_ONLY return value from dict_stats_save().
Fixes up commit 6e6a1b316c (MDEV-35000)
2025-03-25 08:48:08 +02:00
1756b0f37d MDEV-35813: more robust test case
Let us integrate the test case with innodb.page_cleaner so that there
will be less interference from log writes due to checkpoints.
Also, make the test compatible with ./mtr --cursor-protocol.
2025-03-18 10:41:38 +02:00
0e8e0065d6 MDEV-35813 test case 2025-03-17 16:21:09 +02:00