
MDEV-19514 Defer change buffer merge until pages are requested

We will remove the InnoDB background operation of merging buffered
changes to secondary index leaf pages. Changes will only be merged as a
result of an operation that accesses a secondary index leaf page,
such as a SQL statement that performs a lookup via that index
or modifies the index. ROLLBACK and some background operations,
such as purging the history of committed transactions or computing
index cardinality statistics, can also cause a change buffer merge.
Encryption key rotation will not perform change buffer merge.

The motivation of this change is to simplify the I/O logic and to
allow crash recovery to happen in the background (MDEV-14481).
We also hope that this will reduce the number of "mystery" crashes
due to corrupted data. Because change buffer merge will typically
take place as a result of executing SQL statements, there should be
a clearer connection between the crash and the SQL statements that
were executed when the server crashed.

In many cases, a slight performance improvement was observed.

This is joint work with Thirunarayanan Balathandayuthapani
and was tested by Axel Schwenke and Matthias Leich.

The InnoDB monitor counter innodb_ibuf_merge_usec will be removed.

On slow shutdown (innodb_fast_shutdown=0), we will continue to
merge all buffered changes (and purge all undo log history).
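
For example (the tests updated by this commit do the same before
restarting the server):

    SET GLOBAL innodb_fast_shutdown = 0;
    -- then stop the server normally; InnoDB will merge all buffered
    -- changes and purge all undo log history before shutting down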

Two InnoDB configuration parameters will be changed as follows:

innodb_disable_background_merge: Removed.
This parameter existed only in debug builds.
All change buffer merges will use synchronous reads.

innodb_force_recovery will be changed as follows:
* innodb_force_recovery=4 will be the same as innodb_force_recovery=3
(the change buffer merge cannot be disabled; it can only happen as
a result of an operation that accesses a secondary index leaf page).
The option used to be capable of corrupting secondary index leaf pages.
That capability is now removed, and innodb_force_recovery=4 becomes 'safe'.
* innodb_force_recovery=5 (which essentially hard-wires
SET GLOBAL TRANSACTION ISOLATION LEVEL READ UNCOMMITTED)
becomes safe to use. Bogus data can be returned to SQL, but
persistent InnoDB data files will not be corrupted further.
* innodb_force_recovery=6 (ignore the redo log files)
will be the only option that can potentially cause
persistent corruption of InnoDB data files.
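
To illustrate the 'safe' levels, the recovery test updated in this
commit inspects a damaged instance roughly as follows:

    -- server started with --innodb-force-recovery=5
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
    SELECT * FROM t2; -- may return bogus data, but cannot corrupt
                      -- the persistent InnoDB data files any further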

Code changes:

buf_page_t::ibuf_exist: New flag, to indicate whether buffered
changes exist for a buffer pool page. Pages with pending changes
can be returned by buf_page_get_gen(). Previously, the changes
were always merged inside buf_page_get_gen() if needed.
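
A minimal standalone sketch of the idea (simplified names and types;
not the actual InnoDB code):

    #include <cstdint>
    #include <map>
    #include <set>

    using page_id_t = uint64_t;

    struct buf_page_t {
        page_id_t id = 0;
        bool ibuf_exist = false; /* buffered changes pending merge */
    };

    static std::set<page_id_t> change_buffer;        /* pages with buffered changes */
    static std::map<page_id_t, buf_page_t> buf_pool; /* toy buffer pool */

    /* On read completion, only record that buffered changes exist;
    do not merge them yet (cf. buf_page_io_complete()). */
    static buf_page_t& read_page(page_id_t id)
    {
        buf_page_t& bpage = buf_pool[id];
        bpage.id = id;
        bpage.ibuf_exist = change_buffer.count(id) != 0;
        return bpage;
    }

    /* cf. buf_page_get_gen(): merge only when the caller allows it */
    static buf_page_t& get_page(page_id_t id, bool allow_ibuf_merge)
    {
        buf_page_t& bpage = read_page(id);
        if (allow_ibuf_merge && bpage.ibuf_exist) {
            bpage.ibuf_exist = false;
            change_buffer.erase(id); /* apply the buffered changes */
        }
        return bpage;
    }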

ibuf_page_exists(const buf_page_t&): Check if buffered changes
exist for an X-latched or read-fixed page.
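
For example, the read completion code (condensed from the buf0buf.cc
hunk below) uses it to set the new flag instead of merging immediately:

    if (/* ...change buffering applies to this tablespace... */
        fil_page_get_type(frame) == FIL_PAGE_INDEX
        && page_is_leaf(frame)
        && ibuf_page_exists(*bpage)) {
        bpage->ibuf_exist = true;
    }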

buf_page_get_gen(): Add the parameter allow_ibuf_merge=false.
All callers that know that they may be accessing a secondary index
leaf page must pass this parameter as allow_ibuf_merge=true,
unless it does not matter for that caller whether all buffered
changes have been applied. Assert that whenever allow_ibuf_merge
holds, the page actually is a leaf page. Attempt change buffer
merge only to secondary B-tree index leaf pages.
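
For example, btr_cur_search_to_nth_level_func() (see the btr0cur.cc
hunk below) sets the flag only when fetching a leaf page of a
non-clustered index:

    block = buf_page_get_gen(page_id, zip_size, rw_latch, guess,
                             buf_mode, file, line, mtr, &err,
                             height == 0 && !index->is_clust());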

btr_block_get(): Add parameter 'bool merge'.
All callers of btr_block_get() should know whether the page could be
a secondary index leaf page. If it is not, we should avoid consulting
the change buffer bitmap to even consider a merge. This is the main
interface for requesting index pages from the buffer pool.
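
For example, btr_node_ptr_get_child() (see the btr0btr.cc hunk below)
derives the flag from the node pointer's level; a node pointer at
level 1 points to a leaf page:

    return btr_block_get(
        *index, btr_node_ptr_get_child_page_no(node_ptr, offsets),
        RW_SX_LATCH, btr_page_get_level(page_align(node_ptr)) == 1,
        mtr);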

ibuf_merge_or_delete_for_page(), recv_recover_page(): Replace
buf_page_get_known_nowait() with much simpler logic, because
it is now guaranteed that the block is x-latched or read-fixed.

mlog_init_t::mark_ibuf_exist(): Renamed from mlog_init_t::ibuf_merge().
On crash recovery, we will no longer merge any buffered changes
for the pages that we read into the buffer pool during the last batch
of applying log records.

buf_page_get_known_nowait(), BUF_MAKE_YOUNG, BUF_KEEP_OLD: Remove.

btr_search_guess_on_hash(): Merge buf_page_get_known_nowait()
into its only remaining caller.

buf_page_make_young_if_needed(): Define as an inline function.
Add the parameter buf_pool.
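
Callers now pass the buf_pool instance explicitly, as in
buf_page_optimistic_get() (see the buf0buf.cc hunk below):

    buf_pool_t* buf_pool = buf_pool_from_block(block);
    buf_page_make_young_if_needed(buf_pool, &block->page);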

buf_page_peek_if_young(), buf_page_peek_if_too_old(): Add the
parameter buf_pool.

fil_space_validate_for_mtr_commit(): Remove a bogus comment
about background merge of the change buffer.

btr_cur_open_at_rnd_pos_func(), btr_cur_search_to_nth_level_func(),
btr_cur_open_at_index_side_func(): Use narrower data types and scopes.

ibuf_read_merge_pages(): Replaces buf_read_ibuf_merge_pages().
Merge the change buffer by invoking buf_page_get_gen().
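
A minimal sketch of the new approach, with a hypothetical signature
(the function body is not part of the hunks below): merging reduces
to requesting each page with allow_ibuf_merge=true:

    static void ibuf_read_merge_pages(const page_id_t* page_ids, ulint n)
    {
        for (ulint i = 0; i < n; i++) {
            mtr_t mtr;
            mtr.start();
            /* allow_ibuf_merge=true triggers the merge on access */
            buf_page_get_gen(page_ids[i], 0, RW_X_LATCH, NULL, BUF_GET,
                             __FILE__, __LINE__, &mtr, NULL, true);
            mtr.commit();
        }
    }
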
Marko Mäkelä
2019-10-11 17:28:15 +03:00
parent b5fae7f743
commit b42294bc64
43 changed files with 549 additions and 911 deletions

View File

@@ -49,7 +49,7 @@ SELECT * FROM t1;
# TODO: MDEV-12700 Allow innodb_read_only startup without prior slow shutdown.
--source include/kill_mysqld.inc
--error 1
--exec $MYSQLD_LAST_CMD --log-bin=master-bin --binlog-format=mixed --core-file --loose-debug-sync-timeout=300 --innodb-force-recovery=4
--exec $MYSQLD_LAST_CMD --log-bin=master-bin --binlog-format=mixed --core-file --loose-debug-sync-timeout=300 --debug_dbug="+d,innobase_xa_fail"
--let SEARCH_PATTERN= was in the XA prepared state
--source include/search_pattern_in_file.inc
@@ -59,7 +59,7 @@ SELECT * FROM t1;
--source include/search_pattern_in_file.inc
--error 1
--exec $MYSQLD_LAST_CMD --log-bin=master-bin --binlog-format=mixed --core-file --loose-debug-sync-timeout=300 --innodb-force-recovery=4 --tc-heuristic-recover=COMMIT
--exec $MYSQLD_LAST_CMD --log-bin=master-bin --binlog-format=mixed --core-file --loose-debug-sync-timeout=300 --debug_dbug="+d,innobase_xa_fail" --tc-heuristic-recover=COMMIT
--let SEARCH_PATTERN= was in the XA prepared state
--source include/search_pattern_in_file.inc
--let SEARCH_PATTERN= Found 1 prepared transactions!

View File

@@ -21,7 +21,6 @@ INSERT INTO t1 SELECT 0,b,c FROM t1;
# restart: --innodb-force-recovery=6
check table t1;
Table Op Msg_type Msg_text
test.t1 check Warning InnoDB: Index 'b' contains #### entries, should be 4096.
test.t1 check error Corrupt
test.t1 check status OK
# restart
DROP TABLE t1;

View File

@@ -491,7 +491,6 @@ INDEX idx3(c4(512))) Engine=InnoDB;
connect purge_control,localhost,root;
START TRANSACTION WITH CONSISTENT SNAPSHOT;
connection default;
SET GLOBAL innodb_disable_background_merge=ON;
SET GLOBAL innodb_monitor_reset = ibuf_merges;
SET GLOBAL innodb_monitor_reset = ibuf_merges_insert;
INSERT INTO test_wl5522.t1(c2, c3, c4) VALUES
@@ -642,6 +641,7 @@ SELECT name
FROM information_schema.innodb_metrics
WHERE name = 'ibuf_merges_insert' AND count = 0;
name
ibuf_merges_insert
FLUSH TABLES test_wl5522.t1 FOR EXPORT;
backup: t1
UNLOCK TABLES;
@@ -649,12 +649,10 @@ SELECT name
FROM information_schema.innodb_metrics
WHERE name = 'ibuf_merges' AND count > 0;
name
ibuf_merges
SELECT name
FROM information_schema.innodb_metrics
WHERE name = 'ibuf_merges_inserts' AND count > 0;
name
SET GLOBAL innodb_disable_background_merge=OFF;
connection purge_control;
COMMIT;
disconnect purge_control;

View File

@@ -3,38 +3,26 @@ create table t2(f1 int primary key, f2 int, index idx(f2))engine=innodb;
insert into t1 values(1, 2);
insert into t2 values(1, 2);
SET GLOBAL innodb_fast_shutdown = 0;
# Restart the server with innodb_force_recovery as 4.
# restart: --innodb-force-recovery=4
select * from t1;
f1 f2
1 2
begin;
insert into t1 values(2, 3);
ERROR HY000: Running in read-only mode
rollback;
alter table t1 add f3 int not null, algorithm=copy;
ERROR HY000: Can't create table `test`.`t1` (errno: 165 "Table is read only")
alter table t1 add f3 int not null, algorithm=inplace;
ERROR 0A000: ALGORITHM=INPLACE is not supported. Reason: Running in read-only mode. Try ALGORITHM=COPY
alter table t1 add f4 int not null, algorithm=inplace;
drop index idx on t1;
ERROR HY000: Can't create table `test`.`t1` (errno: 165 "Table is read only")
alter table t1 drop index idx, algorithm=inplace;
ERROR 0A000: ALGORITHM=INPLACE is not supported. Reason: Running in read-only mode. Try ALGORITHM=COPY
update t1 set f1=3 where f2=2;
ERROR HY000: Running in read-only mode
create table t3(f1 int not null)engine=innodb;
ERROR HY000: Can't create table `test`.`t3` (errno: 165 "Table is read only")
drop table t3;
ERROR 42S02: Unknown table 'test.t3'
rename table t1 to t3;
ERROR HY000: Error on rename of './test/t1' to './test/t3' (errno: 165 "Table is read only")
rename table t3 to t1;
truncate table t1;
ERROR HY000: Table 't1' is read only
drop table t1;
ERROR HY000: Table 't1' is read only
show tables;
Tables_in_test
t1
t2
# Restart the server with innodb_force_recovery as 5.
# restart: --innodb-force-recovery=5
select * from t2;
f1 f2
@@ -65,7 +53,6 @@ show tables;
Tables_in_test
t1
t2
# Restart the server with innodb_force_recovery as 6.
# restart: --innodb-force-recovery=6
select * from t2;
f1 f2
@@ -94,7 +81,6 @@ show tables;
Tables_in_test
t1
t2
# Restart the server with innodb_force_recovery=2
# restart: --innodb-force-recovery=2
select * from t2;
f1 f2
@@ -108,7 +94,6 @@ drop table t1;
disconnect con1;
connection default;
# Kill the server
# Restart the server with innodb_force_recovery=3
# restart: --innodb-force-recovery=3
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
select * from t2;

View File

@@ -248,7 +248,6 @@ innodb_activity_count server 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NU
innodb_master_active_loops server 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of times master thread performs its tasks when server is active
innodb_master_idle_loops server 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of times master thread performs its tasks when server is idle
innodb_background_drop_table_usec server 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Time (in microseconds) spent to process drop table list
innodb_ibuf_merge_usec server 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Time (in microseconds) spent to process change buffer merge
innodb_log_flush_usec server 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Time (in microseconds) spent to flush log records
innodb_mem_validate_usec server 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Time (in microseconds) spent to do memory validation
innodb_master_purge_usec server 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Time (in microseconds) spent by master thread to purge records

View File

@@ -214,7 +214,6 @@ innodb_activity_count disabled
innodb_master_active_loops disabled
innodb_master_idle_loops disabled
innodb_background_drop_table_usec disabled
innodb_ibuf_merge_usec disabled
innodb_log_flush_usec disabled
innodb_mem_validate_usec disabled
innodb_master_purge_usec disabled

View File

@@ -1047,9 +1047,6 @@ connect (purge_control,localhost,root);
START TRANSACTION WITH CONSISTENT SNAPSHOT;
connection default;
# Disable change buffer merge from the master thread, additionally
# enable aggressive flushing so that more changes are buffered.
SET GLOBAL innodb_disable_background_merge=ON;
SET GLOBAL innodb_monitor_reset = ibuf_merges;
SET GLOBAL innodb_monitor_reset = ibuf_merges_insert;
@@ -1112,8 +1109,6 @@ SELECT name
FROM information_schema.innodb_metrics
WHERE name = 'ibuf_merges_inserts' AND count > 0;
SET GLOBAL innodb_disable_background_merge=OFF;
# Enable normal operation
connection purge_control;
COMMIT;

View File

@@ -17,47 +17,33 @@ insert into t2 values(1, 2);
SET GLOBAL innodb_fast_shutdown = 0;
--echo # Restart the server with innodb_force_recovery as 4.
--let $restart_parameters= --innodb-force-recovery=4
--source include/restart_mysqld.inc
let $status=`SHOW ENGINE INNODB STATUS`;
select * from t1;
--error ER_READ_ONLY_MODE
begin;
insert into t1 values(2, 3);
rollback;
--error ER_CANT_CREATE_TABLE
alter table t1 add f3 int not null, algorithm=copy;
--error ER_ALTER_OPERATION_NOT_SUPPORTED_REASON
alter table t1 add f3 int not null, algorithm=inplace;
alter table t1 add f4 int not null, algorithm=inplace;
--error ER_CANT_CREATE_TABLE
drop index idx on t1;
--error ER_ALTER_OPERATION_NOT_SUPPORTED_REASON
alter table t1 drop index idx, algorithm=inplace;
--error ER_READ_ONLY_MODE
update t1 set f1=3 where f2=2;
--error ER_CANT_CREATE_TABLE
create table t3(f1 int not null)engine=innodb;
--error ER_BAD_TABLE_ERROR
drop table t3;
--error ER_ERROR_ON_RENAME
rename table t1 to t3;
--error ER_OPEN_AS_READONLY
rename table t3 to t1;
truncate table t1;
--error ER_OPEN_AS_READONLY
drop table t1;
show tables;
--echo # Restart the server with innodb_force_recovery as 5.
--let $restart_parameters= --innodb-force-recovery=5
--source include/restart_mysqld.inc
let $status=`SHOW ENGINE INNODB STATUS`;
@@ -98,7 +84,6 @@ create schema db;
drop schema db;
show tables;
--echo # Restart the server with innodb_force_recovery as 6.
--let $restart_parameters= --innodb-force-recovery=6
--source include/restart_mysqld.inc
let $status=`SHOW ENGINE INNODB STATUS`;
@@ -136,7 +121,6 @@ truncate table t2;
drop table t2;
show tables;
--echo # Restart the server with innodb_force_recovery=2
--let $restart_parameters= --innodb-force-recovery=2
--source include/restart_mysqld.inc
let $status=`SHOW ENGINE INNODB STATUS`;
@@ -154,7 +138,6 @@ disconnect con1;
connection default;
--source include/kill_mysqld.inc
--echo # Restart the server with innodb_force_recovery=3
--let $restart_parameters= --innodb-force-recovery=3
--source include/start_mysqld.inc
let $status=`SHOW ENGINE INNODB STATUS`;

View File

@@ -120,7 +120,6 @@ ROW_FORMAT=COMPRESSED;
connect purge_control,localhost,root;
START TRANSACTION WITH CONSISTENT SNAPSHOT;
connection default;
SET GLOBAL innodb_disable_background_merge=ON;
SET GLOBAL innodb_monitor_reset = ibuf_merges;
SET GLOBAL innodb_monitor_reset = ibuf_merges_insert;
INSERT INTO test_wl5522.t1(c2, c3, c4) VALUES
@@ -271,6 +270,7 @@ SELECT name
FROM information_schema.innodb_metrics
WHERE name = 'ibuf_merges_insert' AND count = 0;
name
ibuf_merges_insert
FLUSH TABLES test_wl5522.t1 FOR EXPORT;
backup: t1
UNLOCK TABLES;
@@ -278,12 +278,10 @@ SELECT name
FROM information_schema.innodb_metrics
WHERE name = 'ibuf_merges' AND count > 0;
name
ibuf_merges
SELECT name
FROM information_schema.innodb_metrics
WHERE name = 'ibuf_merges_inserts' AND count > 0;
name
SET GLOBAL innodb_disable_background_merge=OFF;
connection purge_control;
COMMIT;
disconnect purge_control;

View File

@@ -312,9 +312,6 @@ connect (purge_control,localhost,root);
START TRANSACTION WITH CONSISTENT SNAPSHOT;
connection default;
# Disable change buffer merge from the master thread, additionally
# enable aggressive flushing so that more changes are buffered.
SET GLOBAL innodb_disable_background_merge=ON;
SET GLOBAL innodb_monitor_reset = ibuf_merges;
SET GLOBAL innodb_monitor_reset = ibuf_merges_insert;
@@ -377,8 +374,6 @@ SELECT name
FROM information_schema.innodb_metrics
WHERE name = 'ibuf_merges_inserts' AND count > 0;
SET GLOBAL innodb_disable_background_merge=OFF;
# Enable normal operation
connection purge_control;
COMMIT;

View File

@@ -1,4 +0,0 @@
SET @orig = @@global.innodb_disable_background_merge;
SELECT @orig;
@orig
0

View File

@@ -176,7 +176,7 @@
VARIABLE_SCOPE GLOBAL
-VARIABLE_TYPE BIGINT UNSIGNED
+VARIABLE_TYPE INT UNSIGNED
VARIABLE_COMMENT Helps to save your data in case the disk image of the database becomes corrupt.
VARIABLE_COMMENT Helps to save your data in case the disk image of the database becomes corrupt. Value 5 can return bogus data, and 6 can permanently corrupt data.
NUMERIC_MIN_VALUE 0
NUMERIC_MAX_VALUE 6
@@ -949,7 +949,7 @@

View File

@@ -621,18 +621,6 @@ NUMERIC_BLOCK_SIZE NULL
ENUM_VALUE_LIST OFF,ON
READ_ONLY NO
COMMAND_LINE_ARGUMENT OPTIONAL
VARIABLE_NAME INNODB_DISABLE_BACKGROUND_MERGE
SESSION_VALUE NULL
DEFAULT_VALUE OFF
VARIABLE_SCOPE GLOBAL
VARIABLE_TYPE BOOLEAN
VARIABLE_COMMENT Disable change buffering merges by the master thread
NUMERIC_MIN_VALUE NULL
NUMERIC_MAX_VALUE NULL
NUMERIC_BLOCK_SIZE NULL
ENUM_VALUE_LIST OFF,ON
READ_ONLY NO
COMMAND_LINE_ARGUMENT NONE
VARIABLE_NAME INNODB_DISABLE_RESIZE_BUFFER_POOL_DEBUG
SESSION_VALUE NULL
DEFAULT_VALUE ON
@@ -926,7 +914,7 @@ SESSION_VALUE NULL
DEFAULT_VALUE 0
VARIABLE_SCOPE GLOBAL
VARIABLE_TYPE BIGINT UNSIGNED
VARIABLE_COMMENT Helps to save your data in case the disk image of the database becomes corrupt.
VARIABLE_COMMENT Helps to save your data in case the disk image of the database becomes corrupt. Value 5 can return bogus data, and 6 can permanently corrupt data.
NUMERIC_MIN_VALUE 0
NUMERIC_MAX_VALUE 6
NUMERIC_BLOCK_SIZE 0

View File

@@ -1,12 +0,0 @@
#
# Basic test for innodb_disable_background_merge.
#
-- source include/have_innodb.inc
# The config variable is a debug variable
-- source include/have_debug.inc
# Check the default value
SET @orig = @@global.innodb_disable_background_merge;
SELECT @orig;

View File

@@ -223,7 +223,8 @@ btr_root_block_get(
return NULL;
}
buf_block_t* block = btr_block_get(*index, index->page, mode, mtr);
buf_block_t* block = btr_block_get(*index, index->page, mode, false,
mtr);
if (!block) {
index->table->file_unreadable = true;
@@ -833,7 +834,8 @@ btr_node_ptr_get_child(
return btr_block_get(
*index, btr_node_ptr_get_child_page_no(node_ptr, offsets),
RW_SX_LATCH, mtr);
RW_SX_LATCH, btr_page_get_level(page_align(node_ptr)) == 1,
mtr);
}
/************************************************************//**
@@ -2498,7 +2500,6 @@ btr_attach_half_pages(
{
ulint prev_page_no;
ulint next_page_no;
ulint level;
page_t* page = buf_block_get_frame(block);
page_t* lower_page;
page_t* upper_page;
@@ -2551,6 +2552,10 @@ btr_attach_half_pages(
upper_page_zip = buf_block_get_page_zip(new_block);
}
/* Get the level of the split pages */
const ulint level = btr_page_get_level(buf_block_get_frame(block));
ut_ad(level == btr_page_get_level(buf_block_get_frame(new_block)));
/* Get the previous and next pages of page */
prev_page_no = btr_page_get_prev(page, mtr);
next_page_no = btr_page_get_next(page, mtr);
@@ -2558,17 +2563,13 @@ btr_attach_half_pages(
/* for consistency, both blocks should be locked, before change */
if (prev_page_no != FIL_NULL && direction == FSP_DOWN) {
prev_block = btr_block_get(*index, prev_page_no, RW_X_LATCH,
mtr);
!level, mtr);
}
if (next_page_no != FIL_NULL && direction != FSP_DOWN) {
next_block = btr_block_get(*index, next_page_no, RW_X_LATCH,
mtr);
!level, mtr);
}
/* Get the level of the split pages */
level = btr_page_get_level(buf_block_get_frame(block));
ut_ad(level == btr_page_get_level(buf_block_get_frame(new_block)));
/* Build the node pointer (= node key and page address) for the upper
half */
@@ -2709,7 +2710,7 @@ btr_insert_into_right_sibling(
ulint max_size;
next_block = btr_block_get(*cursor->index, next_page_no, RW_X_LATCH,
mtr);
page_is_leaf(page), mtr);
if (UNIV_UNLIKELY(!next_block)) {
return NULL;
}
@@ -3218,7 +3219,8 @@ void btr_level_list_remove(const buf_block_t& block, const dict_index_t& index,
if (prev_page_no != FIL_NULL) {
buf_block_t* prev_block = btr_block_get(
index, prev_page_no, RW_X_LATCH, mtr);
index, prev_page_no, RW_X_LATCH, page_is_leaf(page),
mtr);
page_t* prev_page
= buf_block_get_frame(prev_block);
#ifdef UNIV_BTR_DEBUG
@@ -3234,7 +3236,8 @@ void btr_level_list_remove(const buf_block_t& block, const dict_index_t& index,
if (next_page_no != FIL_NULL) {
buf_block_t* next_block = btr_block_get(
index, next_page_no, RW_X_LATCH, mtr);
index, next_page_no, RW_X_LATCH, page_is_leaf(page),
mtr);
page_t* next_page
= buf_block_get_frame(next_block);
#ifdef UNIV_BTR_DEBUG
@@ -4199,7 +4202,7 @@ btr_discard_page(
ut_d(bool parent_is_different = false);
if (left_page_no != FIL_NULL) {
merge_block = btr_block_get(*index, left_page_no, RW_X_LATCH,
mtr);
true, mtr);
merge_page = buf_block_get_frame(merge_block);
#ifdef UNIV_BTR_DEBUG
ut_a(btr_page_get_next(merge_page, mtr)
@@ -4213,7 +4216,7 @@ btr_discard_page(
== btr_cur_get_rec(&parent_cursor)));
} else if (right_page_no != FIL_NULL) {
merge_block = btr_block_get(*index, right_page_no, RW_X_LATCH,
mtr);
true, mtr);
merge_page = buf_block_get_frame(merge_block);
#ifdef UNIV_BTR_DEBUG
ut_a(btr_page_get_prev(merge_page, mtr)
@@ -4866,7 +4869,8 @@ btr_validate_level(
savepoint2 = mtr_set_savepoint(&mtr);
block = btr_block_get(*index, left_page_no,
RW_SX_LATCH, &mtr);
RW_SX_LATCH, false,
&mtr);
page = buf_block_get_frame(block);
left_page_no = btr_page_get_prev(page, &mtr);
}
@@ -4935,7 +4939,7 @@ loop:
savepoint = mtr_set_savepoint(&mtr);
right_block = btr_block_get(*index, right_page_no, RW_SX_LATCH,
&mtr);
!level, &mtr);
right_page = buf_block_get_frame(right_block);
if (btr_page_get_prev(right_page, &mtr)
@@ -5109,10 +5113,11 @@ loop:
&mtr, savepoint, right_block);
btr_block_get(*index, parent_right_page_no,
RW_SX_LATCH, &mtr);
RW_SX_LATCH, false, &mtr);
right_block = btr_block_get(*index,
right_page_no,
RW_SX_LATCH, &mtr);
RW_SX_LATCH,
!level, &mtr);
}
btr_cur_position(
@@ -5187,16 +5192,17 @@ node_ptr_fails:
if (parent_right_page_no != FIL_NULL) {
btr_block_get(*index,
parent_right_page_no,
RW_SX_LATCH, &mtr);
RW_SX_LATCH, false,
&mtr);
}
} else if (parent_page_no != FIL_NULL) {
btr_block_get(*index, parent_page_no,
RW_SX_LATCH, &mtr);
RW_SX_LATCH, false, &mtr);
}
}
block = btr_block_get(*index, right_page_no, RW_SX_LATCH,
&mtr);
!level, &mtr);
page = buf_block_get_frame(block);
goto loop;
@@ -5299,7 +5305,8 @@ btr_can_merge_with_page(
index = btr_cur_get_index(cursor);
page = btr_cur_get_page(cursor);
mblock = btr_block_get(*index, page_no, RW_X_LATCH, mtr);
mblock = btr_block_get(*index, page_no, RW_X_LATCH, page_is_leaf(page),
mtr);
mpage = buf_block_get_frame(mblock);
n_recs = page_get_n_recs(page);

View File

@@ -120,7 +120,7 @@ PageBulk::init()
}
} else {
new_block = btr_block_get(*m_index, m_page_no, RW_X_LATCH,
&m_mtr);
false, &m_mtr);
new_page = buf_block_get_frame(new_block);
new_page_zip = buf_block_get_page_zip(new_block);
@@ -1014,7 +1014,7 @@ BtrBulk::finish(dberr_t err)
ut_ad(last_page_no != FIL_NULL);
last_block = btr_block_get(*m_index, last_page_no, RW_X_LATCH,
&mtr);
false, &mtr);
first_rec = page_rec_get_next(
page_get_infimum_rec(last_block->frame));
ut_ad(page_rec_is_user_rec(first_rec));

View File

@@ -248,7 +248,8 @@ btr_cur_latch_leaves(
mode = latch_mode == BTR_MODIFY_LEAF ? RW_X_LATCH : RW_S_LATCH;
latch_leaves.savepoints[1] = mtr_set_savepoint(mtr);
get_block = btr_block_get(*cursor->index,
block->page.id.page_no(), mode, mtr);
block->page.id.page_no(), mode,
true, mtr);
latch_leaves.blocks[1] = get_block;
#ifdef UNIV_BTR_DEBUG
ut_a(page_is_comp(get_block->frame) == page_is_comp(page));
@@ -278,7 +279,8 @@ btr_cur_latch_leaves(
latch_leaves.savepoints[0] = mtr_set_savepoint(mtr);
get_block = btr_block_get(
*cursor->index, left_page_no, RW_X_LATCH, mtr);
*cursor->index, left_page_no, RW_X_LATCH,
true, mtr);
latch_leaves.blocks[0] = get_block;
if (spatial) {
@@ -295,7 +297,7 @@ btr_cur_latch_leaves(
latch_leaves.savepoints[1] = mtr_set_savepoint(mtr);
get_block = btr_block_get(
*cursor->index, block->page.id.page_no(),
RW_X_LATCH, mtr);
RW_X_LATCH, true, mtr);
latch_leaves.blocks[1] = get_block;
#ifdef UNIV_BTR_DEBUG
@@ -326,7 +328,7 @@ btr_cur_latch_leaves(
latch_leaves.savepoints[2] = mtr_set_savepoint(mtr);
get_block = btr_block_get(*cursor->index,
right_page_no, RW_X_LATCH,
mtr);
true, mtr);
latch_leaves.blocks[2] = get_block;
#ifdef UNIV_BTR_DEBUG
ut_a(page_is_comp(get_block->frame)
@@ -353,7 +355,8 @@ btr_cur_latch_leaves(
if (left_page_no != FIL_NULL) {
latch_leaves.savepoints[0] = mtr_set_savepoint(mtr);
get_block = btr_block_get(
*cursor->index, left_page_no, mode, mtr);
*cursor->index, left_page_no, mode,
true, mtr);
latch_leaves.blocks[0] = get_block;
cursor->left_block = get_block;
#ifdef UNIV_BTR_DEBUG
@@ -366,7 +369,8 @@ btr_cur_latch_leaves(
latch_leaves.savepoints[1] = mtr_set_savepoint(mtr);
get_block = btr_block_get(*cursor->index,
block->page.id.page_no(), mode, mtr);
block->page.id.page_no(), mode,
true, mtr);
latch_leaves.blocks[1] = get_block;
#ifdef UNIV_BTR_DEBUG
ut_a(page_is_comp(get_block->frame) == page_is_comp(page));
@@ -752,18 +756,17 @@ btr_cur_optimistic_latch_leaves(
goto unpin_failed;
}
left_page_no = btr_page_get_prev(
buf_block_get_frame(block), mtr);
left_page_no = btr_page_get_prev(block->frame, mtr);
rw_lock_s_unlock(&block->lock);
cursor->left_block = left_page_no != FIL_NULL
? btr_block_get(*cursor->index, left_page_no, mode,
mtr)
page_is_leaf(block->frame), mtr)
: NULL;
if (buf_page_optimistic_get(mode, block, modify_clock,
file, line, mtr)) {
if (btr_page_get_prev(buf_block_get_frame(block), mtr)
if (btr_page_get_prev(block->frame, mtr)
== left_page_no) {
buf_block_buf_fix_dec(block);
*latch_mode = mode;
@@ -1185,7 +1188,6 @@ btr_cur_search_to_nth_level_func(
ulint up_bytes;
ulint low_match;
ulint low_bytes;
ulint savepoint;
ulint rw_latch;
page_cur_mode_t page_mode;
page_cur_mode_t search_mode = PAGE_CUR_UNSUPP;
@@ -1197,7 +1199,6 @@ btr_cur_search_to_nth_level_func(
ulint root_height = 0; /* remove warning */
dberr_t err = DB_SUCCESS;
ulint upper_rw_latch, root_leaf_rw_latch;
btr_intention_t lock_intention;
bool modify_external;
buf_block_t* tree_blocks[BTR_MAX_LEVELS];
@@ -1387,7 +1388,9 @@ btr_cur_search_to_nth_level_func(
/* Store the position of the tree latch we push to mtr so that we
know how to release it when we have latched leaf node(s) */
savepoint = mtr_set_savepoint(mtr);
ulint savepoint = mtr_set_savepoint(mtr);
rw_lock_type_t upper_rw_latch;
switch (latch_mode) {
case BTR_MODIFY_TREE:
@@ -1448,7 +1451,8 @@ btr_cur_search_to_nth_level_func(
upper_rw_latch = RW_NO_LATCH;
}
}
root_leaf_rw_latch = btr_cur_latch_for_root_leaf(latch_mode);
const rw_lock_type_t root_leaf_rw_latch = btr_cur_latch_for_root_leaf(
latch_mode);
page_cursor = btr_cur_get_page_cur(cursor);
@@ -1536,7 +1540,8 @@ retry_page_get:
ut_ad(n_blocks < BTR_MAX_LEVELS);
tree_savepoints[n_blocks] = mtr_set_savepoint(mtr);
block = buf_page_get_gen(page_id, zip_size, rw_latch, guess,
buf_mode, file, line, mtr, &err);
buf_mode, file, line, mtr, &err,
height == 0 && !index->is_clust());
tree_blocks[n_blocks] = block;
/* Note that block==NULL signifies either an error or change
@@ -1681,8 +1686,9 @@ retry_page_get:
tree_blocks[n_blocks]);
tree_savepoints[n_blocks] = mtr_set_savepoint(mtr);
block = buf_page_get_gen(page_id, zip_size, rw_latch, NULL,
buf_mode, file, line, mtr, &err);
block = buf_page_get_gen(page_id, zip_size,
rw_latch, NULL, buf_mode,
file, line, mtr, &err);
tree_blocks[n_blocks] = block;
if (err != DB_SUCCESS) {
@@ -2339,7 +2345,7 @@ need_opposite_intention:
buf_block_t* child_block = btr_block_get(
*index, page_id.page_no(),
latch_mode == BTR_CONT_MODIFY_TREE
? RW_X_LATCH : RW_SX_LATCH, mtr);
? RW_X_LATCH : RW_SX_LATCH, false, mtr);
btr_assert_not_corrupted(child_block, index);
} else {
ut_ad(mtr_memo_contains(mtr, block, upper_rw_latch));
@@ -2471,8 +2477,6 @@ btr_cur_open_at_index_side_func(
ulint root_height = 0; /* remove warning */
rec_t* node_ptr;
ulint estimate;
ulint savepoint;
ulint upper_rw_latch, root_leaf_rw_latch;
btr_intention_t lock_intention;
buf_block_t* tree_blocks[BTR_MAX_LEVELS];
ulint tree_savepoints[BTR_MAX_LEVELS];
@@ -2509,7 +2513,9 @@ btr_cur_open_at_index_side_func(
/* Store the position of the tree latch we push to mtr so that we
know how to release it when we have latched the leaf node */
savepoint = mtr_set_savepoint(mtr);
ulint savepoint = mtr_set_savepoint(mtr);
rw_lock_type_t upper_rw_latch;
switch (latch_mode) {
case BTR_CONT_MODIFY_TREE:
@@ -2548,7 +2554,9 @@ btr_cur_open_at_index_side_func(
upper_rw_latch = RW_NO_LATCH;
}
}
root_leaf_rw_latch = btr_cur_latch_for_root_leaf(latch_mode);
const rw_lock_type_t root_leaf_rw_latch = btr_cur_latch_for_root_leaf(
latch_mode);
page_cursor = btr_cur_get_page_cur(cursor);
cursor->index = index;
@@ -2563,22 +2571,17 @@ btr_cur_open_at_index_side_func(
height = ULINT_UNDEFINED;
for (;;) {
buf_block_t* block;
ulint rw_latch;
ut_ad(n_blocks < BTR_MAX_LEVELS);
if (height != 0
&& (latch_mode != BTR_MODIFY_TREE
|| height == level)) {
rw_latch = upper_rw_latch;
} else {
rw_latch = RW_NO_LATCH;
}
tree_savepoints[n_blocks] = mtr_set_savepoint(mtr);
block = buf_page_get_gen(page_id, zip_size, rw_latch, NULL,
BUF_GET, file, line, mtr, &err);
const ulint rw_latch = height
&& (latch_mode != BTR_MODIFY_TREE || height == level)
? upper_rw_latch : RW_NO_LATCH;
buf_block_t* block = buf_page_get_gen(page_id, zip_size,
rw_latch, NULL, BUF_GET,
file, line, mtr, &err,
height == 0
&& !index->is_clust());
ut_ad((block != NULL) == (err == DB_SUCCESS));
tree_blocks[n_blocks] = block;
@@ -2630,75 +2633,62 @@ btr_cur_open_at_index_side_func(
ut_ad(height == btr_page_get_level(page));
}
if (height == level) {
if (srv_read_only_mode) {
btr_cur_latch_leaves(
block, latch_mode, cursor, mtr);
} else if (height == 0) {
if (rw_latch == RW_NO_LATCH) {
btr_cur_latch_leaves(block, latch_mode,
cursor, mtr);
}
/* In versions <= 3.23.52 we had
forgotten to release the tree latch
here. If in an index scan we had to
scan far to find a record visible to
the current transaction, that could
starve others waiting for the tree
latch. */
if (height == 0) {
if (rw_latch == RW_NO_LATCH) {
btr_cur_latch_leaves(block, latch_mode,
cursor, mtr);
}
switch (latch_mode) {
case BTR_MODIFY_TREE:
case BTR_CONT_MODIFY_TREE:
case BTR_CONT_SEARCH_TREE:
/* In versions <= 3.23.52 we had forgotten to
release the tree latch here. If in an index
scan we had to scan far to find a record
visible to the current transaction, that could
starve others waiting for the tree latch. */
switch (latch_mode) {
case BTR_MODIFY_TREE:
case BTR_CONT_MODIFY_TREE:
case BTR_CONT_SEARCH_TREE:
break;
default:
if (UNIV_UNLIKELY(srv_read_only_mode)) {
break;
default:
if (!s_latch_by_caller) {
/* Release the tree s-latch */
mtr_release_s_latch_at_savepoint(
mtr, savepoint,
dict_index_get_lock(
index));
}
/* release upper blocks */
for (; n_releases < n_blocks;
n_releases++) {
mtr_release_block_at_savepoint(
mtr,
tree_savepoints[
n_releases],
tree_blocks[
n_releases]);
}
}
} else { /* height != 0 */
/* We already have the block latched. */
ut_ad(latch_mode == BTR_SEARCH_TREE);
ut_ad(s_latch_by_caller);
ut_ad(upper_rw_latch == RW_S_LATCH);
if (!s_latch_by_caller) {
/* Release the tree s-latch */
mtr_release_s_latch_at_savepoint(
mtr, savepoint, &index->lock);
}
ut_ad(mtr_memo_contains(mtr, block,
upper_rw_latch));
if (s_latch_by_caller) {
/* to exclude modifying tree operations
should sx-latch the index. */
ut_ad(mtr_memo_contains(
/* release upper blocks */
for (; n_releases < n_blocks; n_releases++) {
mtr_release_block_at_savepoint(
mtr,
dict_index_get_lock(index),
MTR_MEMO_SX_LOCK));
/* because has sx-latch of index,
can release upper blocks. */
for (; n_releases < n_blocks;
n_releases++) {
mtr_release_block_at_savepoint(
mtr,
tree_savepoints[
n_releases],
tree_blocks[
n_releases]);
}
tree_savepoints[n_releases],
tree_blocks[n_releases]);
}
}
} else if (height == level /* height != 0 */
&& UNIV_LIKELY(!srv_read_only_mode)) {
/* We already have the block latched. */
ut_ad(latch_mode == BTR_SEARCH_TREE);
ut_ad(s_latch_by_caller);
ut_ad(upper_rw_latch == RW_S_LATCH);
ut_ad(mtr_memo_contains(mtr, block, upper_rw_latch));
if (s_latch_by_caller) {
/* to exclude modifying tree operations
should sx-latch the index. */
ut_ad(mtr_memo_contains(mtr, &index->lock,
MTR_MEMO_SX_LOCK));
/* because has sx-latch of index,
can release upper blocks. */
for (; n_releases < n_blocks; n_releases++) {
mtr_release_block_at_savepoint(
mtr,
tree_savepoints[n_releases],
tree_blocks[n_releases]);
}
}
}
@@ -2838,8 +2828,6 @@ btr_cur_open_at_rnd_pos_func(
ulint node_ptr_max_size = srv_page_size / 2;
ulint height;
rec_t* node_ptr;
ulint savepoint;
ulint upper_rw_latch, root_leaf_rw_latch;
btr_intention_t lock_intention;
buf_block_t* tree_blocks[BTR_MAX_LEVELS];
ulint tree_savepoints[BTR_MAX_LEVELS];
@@ -2856,7 +2844,9 @@ btr_cur_open_at_rnd_pos_func(
ut_ad(!(latch_mode & BTR_MODIFY_EXTERNAL));
savepoint = mtr_set_savepoint(mtr);
ulint savepoint = mtr_set_savepoint(mtr);
rw_lock_type_t upper_rw_latch;
switch (latch_mode) {
case BTR_MODIFY_TREE:
@@ -2903,7 +2893,8 @@ btr_cur_open_at_rnd_pos_func(
return(false);
}
root_leaf_rw_latch = btr_cur_latch_for_root_leaf(latch_mode);
const rw_lock_type_t root_leaf_rw_latch = btr_cur_latch_for_root_leaf(
latch_mode);
page_cursor = btr_cur_get_page_cur(cursor);
cursor->index = index;
@@ -2919,22 +2910,19 @@ btr_cur_open_at_rnd_pos_func(
height = ULINT_UNDEFINED;
for (;;) {
buf_block_t* block;
page_t* page;
ulint rw_latch;
ut_ad(n_blocks < BTR_MAX_LEVELS);
if (height != 0
&& latch_mode != BTR_MODIFY_TREE) {
rw_latch = upper_rw_latch;
} else {
rw_latch = RW_NO_LATCH;
}
tree_savepoints[n_blocks] = mtr_set_savepoint(mtr);
block = buf_page_get_gen(page_id, zip_size, rw_latch, NULL,
BUF_GET, file, line, mtr, &err);
const rw_lock_type_t rw_latch = height
&& latch_mode != BTR_MODIFY_TREE
? upper_rw_latch : RW_NO_LATCH;
buf_block_t* block = buf_page_get_gen(page_id, zip_size,
rw_latch, NULL, BUF_GET,
file, line, mtr, &err,
height == 0
&& !index->is_clust());
tree_blocks[n_blocks] = block;
ut_ad((block != NULL) == (err == DB_SUCCESS));
@@ -7453,7 +7441,7 @@ struct btr_blob_log_check_t {
if (m_op == BTR_STORE_INSERT_BULK) {
mtr_x_lock(dict_index_get_lock(index), m_mtr);
m_pcur->btr_cur.page_cur.block = btr_block_get(
*index, page_no, RW_X_LATCH, m_mtr);
*index, page_no, RW_X_LATCH, false, m_mtr);
m_pcur->btr_cur.page_cur.rec
= m_pcur->btr_cur.page_cur.block->frame
+ offs;

View File

@@ -585,7 +585,8 @@ btr_defragment_n_pages(
break;
}
blocks[i] = btr_block_get(*index, page_no, RW_X_LATCH, mtr);
blocks[i] = btr_block_get(*index, page_no, RW_X_LATCH, true,
mtr);
}
if (n_pages == 1) {

View File

@@ -465,7 +465,8 @@ btr_pcur_move_to_next_page(
}
buf_block_t* next_block = btr_block_get(
*btr_pcur_get_btr_cur(cursor)->index, next_page_no, mode, mtr);
*btr_pcur_get_btr_cur(cursor)->index, next_page_no, mode,
page_is_leaf(page), mtr);
if (UNIV_UNLIKELY(!next_block)) {
return;

View File

@@ -431,7 +431,7 @@ btr_pessimistic_scrub(
}
/* read block variables */
const ulint page_no = mach_read_from_4(page + FIL_PAGE_OFFSET);
const ulint page_no = block->page.id.page_no();
const ulint left_page_no = mach_read_from_4(page + FIL_PAGE_PREV);
const ulint right_page_no = mach_read_from_4(page + FIL_PAGE_NEXT);
@@ -448,12 +448,14 @@ btr_pessimistic_scrub(
*/
mtr->release_block_at_savepoint(scrub_data->savepoint, block);
btr_block_get(*index, left_page_no, RW_X_LATCH, mtr);
btr_block_get(*index, left_page_no, RW_X_LATCH,
page_is_leaf(page), mtr);
/**
* Refetch block and re-initialize page
*/
block = btr_block_get(*index, page_no, RW_X_LATCH, mtr);
block = btr_block_get(*index, page_no, RW_X_LATCH,
page_is_leaf(page), mtr);
page = buf_block_get_frame(block);
@@ -465,7 +467,8 @@ btr_pessimistic_scrub(
}
if (right_page_no != FIL_NULL) {
btr_block_get(*index, right_page_no, RW_X_LATCH, mtr);
btr_block_get(*index, right_page_no, RW_X_LATCH,
page_is_leaf(page), mtr);
}
/* arguments to btr_page_split_and_insert */

View File

@@ -882,10 +882,8 @@ btr_search_guess_on_hash(
const rec_t* rec;
ulint fold;
index_id_t index_id;
#ifdef notdefined
btr_cur_t cursor2;
btr_pcur_t pcur;
#endif
ut_ad(mtr->is_active());
ut_ad(!ahi_latch || rw_lock_own_flagged(
ahi_latch, RW_LOCK_FLAG_X | RW_LOCK_FLAG_S));
@@ -893,11 +891,12 @@ btr_search_guess_on_hash(
return(FALSE);
}
ut_ad(index && info && tuple && cursor && mtr);
ut_ad(!dict_index_is_ibuf(index));
ut_ad(!index->is_ibuf());
ut_ad(!ahi_latch || ahi_latch == btr_get_search_latch(index));
ut_ad((latch_mode == BTR_SEARCH_LEAF)
|| (latch_mode == BTR_MODIFY_LEAF));
compile_time_assert(ulint{BTR_SEARCH_LEAF} == ulint{RW_S_LATCH});
compile_time_assert(ulint{BTR_MODIFY_LEAF} == ulint{RW_X_LATCH});
/* Not supported for spatial index */
ut_ad(!dict_index_is_spatial(index));
@@ -955,16 +954,47 @@ fail:
return(FALSE);
}
buf_block_t* block = buf_block_from_ahi(rec);
buf_block_t* block = buf_block_from_ahi(rec);
buf_pool_t* buf_pool = buf_pool_from_block(block);
if (use_latch) {
mutex_enter(&block->mutex);
if (!buf_page_get_known_nowait(
latch_mode, block, BUF_MAKE_YOUNG,
__FILE__, __LINE__, mtr)) {
if (buf_block_get_state(block) == BUF_BLOCK_REMOVE_HASH) {
/* Another thread is just freeing the block
from the LRU list of the buffer pool: do not
try to access this page. */
mutex_exit(&block->mutex);
goto fail;
}
ut_ad(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
ut_ad(!block->page.file_page_was_freed);
buf_page_set_accessed(&block->page);
buf_block_buf_fix_inc(block, __FILE__, __LINE__);
mutex_exit(&block->mutex);
buf_page_make_young_if_needed(buf_pool, &block->page);
mtr_memo_type_t fix_type;
if (latch_mode == BTR_SEARCH_LEAF) {
if (!rw_lock_s_lock_nowait(&block->lock,
__FILE__, __LINE__)) {
got_no_latch:
buf_block_buf_fix_dec(block);
goto fail;
}
fix_type = MTR_MEMO_PAGE_S_FIX;
} else {
if (!rw_lock_x_lock_func_nowait_inline(
&block->lock, __FILE__, __LINE__)) {
goto got_no_latch;
}
fix_type = MTR_MEMO_PAGE_X_FIX;
}
mtr->memo_push(block, fix_type);
buf_pool->stat.n_page_gets++;
rw_lock_s_unlock(use_latch);
buf_block_dbg_add_level(block, SYNC_TREE_NODE_FROM_HASH);
@@ -1052,20 +1082,15 @@ fail:
#ifdef UNIV_SEARCH_PERF_STAT
btr_search_n_succ++;
#endif
if (!ahi_latch && buf_page_peek_if_too_old(&block->page)) {
buf_page_make_young(&block->page);
}
/* Increment the page get statistics though we did not really
fix the page: for user info only */
{
buf_pool_t* buf_pool = buf_pool_from_bpage(&block->page);
++buf_pool->stat.n_page_gets;
++buf_pool->stat.n_page_gets;
if (!ahi_latch) {
buf_page_make_young_if_needed(buf_pool, &block->page);
}
return(TRUE);
return true;
}
/** Drop any adaptive hash index entries that point to an index page.

View File

@@ -3708,28 +3708,6 @@ buf_page_make_young(
buf_pool_mutex_exit(buf_pool);
}
/********************************************************************//**
Moves a page to the start of the buffer pool LRU list if it is too old.
This high-level function can be used to prevent an important page from
slipping out of the buffer pool. */
static
void
buf_page_make_young_if_needed(
/*==========================*/
buf_page_t* bpage) /*!< in/out: buffer block of a
file page */
{
#ifdef UNIV_DEBUG
buf_pool_t* buf_pool = buf_pool_from_bpage(bpage);
ut_ad(!buf_pool_mutex_own(buf_pool));
#endif /* UNIV_DEBUG */
ut_a(buf_page_in_file(bpage));
if (buf_page_peek_if_too_old(bpage)) {
buf_page_make_young(bpage);
}
}
#ifdef UNIV_DEBUG
/** Sets file_page_was_freed TRUE if the page is found in the buffer pool.
@@ -3913,7 +3891,7 @@ got_block:
mutex_exit(block_mutex);
buf_page_make_young_if_needed(bpage);
buf_page_make_young_if_needed(buf_pool, bpage);
#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
ut_a(++buf_dbg_counter % 5771 || buf_validate());
@@ -4193,14 +4171,6 @@ buf_debug_execute_is_force_flush()
/*==============================*/
{
DBUG_EXECUTE_IF("ib_buf_force_flush", return(true); );
/* This is used during queisce testing, we want to ensure maximum
buffering by the change buffer. */
if (srv_ibuf_disable_background_merge) {
return(true);
}
return(false);
}
#endif /* UNIV_DEBUG || UNIV_IBUF_DEBUG */
@@ -4247,16 +4217,20 @@ buf_wait_for_read(
}
/** This is the general function used to get access to a database page.
@param[in] page_id page id
@param[in] zip_size ROW_FORMAT=COMPRESSED page size, or 0
@param[in] rw_latch RW_S_LATCH, RW_X_LATCH, RW_NO_LATCH
@param[in] guess guessed block or NULL
@param[in] mode BUF_GET, BUF_GET_IF_IN_POOL,
@param[in] page_id page id
@param[in] zip_size ROW_FORMAT=COMPRESSED page size, or 0
@param[in] rw_latch RW_S_LATCH, RW_X_LATCH, RW_NO_LATCH
@param[in] guess guessed block or NULL
@param[in] mode BUF_GET, BUF_GET_IF_IN_POOL,
BUF_PEEK_IF_IN_POOL, BUF_GET_NO_LATCH, or BUF_GET_IF_IN_POOL_OR_WATCH
@param[in] file file name
@param[in] line line where called
@param[in] mtr mini-transaction
@param[out] err DB_SUCCESS or error code
@param[in] file file name
@param[in] line line where called
@param[in] mtr mini-transaction
@param[out] err DB_SUCCESS or error code
@param[in] allow_ibuf_merge Allow change buffer merge to happen
while reading the page from file
@return pointer to the block or NULL */
buf_block_t*
buf_page_get_gen(
@@ -4268,7 +4242,8 @@ buf_page_get_gen(
const char* file,
unsigned line,
mtr_t* mtr,
dberr_t* err)
dberr_t* err,
bool allow_ibuf_merge)
{
buf_block_t* block;
unsigned access_time;
@@ -4283,6 +4258,10 @@ buf_page_get_gen(
|| (rw_latch == RW_X_LATCH)
|| (rw_latch == RW_SX_LATCH)
|| (rw_latch == RW_NO_LATCH));
ut_ad(!allow_ibuf_merge
|| mode == BUF_GET
|| mode == BUF_GET_IF_IN_POOL
|| mode == BUF_GET_IF_IN_POOL_OR_WATCH);
if (err) {
*err = DB_SUCCESS;
@@ -4499,11 +4478,11 @@ loop:
if (fsp_is_system_temporary(page_id.space())) {
/* For temporary tablespace, the mutex is being used
for synchronization between user thread and flush
thread, instead of block->lock. See buf_flush_page()
for the flush thread counterpart. */
for synchronization between user thread and flush thread,
instead of block->lock. See buf_flush_page() for the flush
thread counterpart. */
BPageMutex* fix_mutex = buf_page_get_mutex(
&fix_block->page);
&fix_block->page);
mutex_enter(fix_mutex);
fix_block->fix();
mutex_exit(fix_mutex);
@@ -4539,13 +4518,11 @@ got_block:
}
}
switch (buf_block_get_state(fix_block)) {
buf_page_t* bpage;
switch (UNIV_EXPECT(buf_block_get_state(fix_block),
BUF_BLOCK_FILE_PAGE)) {
case BUF_BLOCK_FILE_PAGE:
bpage = &block->page;
if (fsp_is_system_temporary(page_id.space())
&& buf_page_get_io_fix(bpage) != BUF_IO_NONE) {
&& buf_block_get_io_fix(block) != BUF_IO_NONE) {
/* This suggests that the page is being flushed.
Avoid returning reference to this page.
Instead wait for the flush action to complete. */
@@ -4568,9 +4545,16 @@ evict_from_pool:
return(NULL);
}
break;
default:
ut_error;
break;
case BUF_BLOCK_ZIP_PAGE:
case BUF_BLOCK_ZIP_DIRTY:
if (UNIV_UNLIKELY(mode == BUF_EVICT_IF_IN_POOL)) {
goto evict_from_pool;
}
if (mode == BUF_PEEK_IF_IN_POOL) {
/* This mode is only used for dropping an
adaptive hash index. There cannot be an
@@ -4581,7 +4565,7 @@ evict_from_pool:
return(NULL);
}
bpage = &block->page;
buf_page_t* bpage = &block->page;
/* Note: We have already buffer fixed this block. */
if (bpage->buf_fix_count > 1
@@ -4599,10 +4583,6 @@ evict_from_pool:
goto loop;
}
if (UNIV_UNLIKELY(mode == BUF_EVICT_IF_IN_POOL)) {
goto evict_from_pool;
}
/* Buffer-fix the block so that it cannot be evicted
or relocated while we are attempting to allocate an
uncompressed page. */
@@ -4696,35 +4676,31 @@ evict_from_pool:
buf_page_mutex_exit(block);
if (!access_time && !recv_no_ibuf_operations
&& ibuf_page_exists(block->page)) {
block->page.ibuf_exist = true;
}
buf_page_free_descriptor(bpage);
/* Decompress the page while not holding
buf_pool->mutex or block->mutex. */
{
bool success = buf_zip_decompress(block, TRUE);
if (!buf_zip_decompress(block, TRUE)) {
buf_pool_mutex_enter(buf_pool);
buf_page_mutex_enter(fix_block);
buf_block_set_io_fix(fix_block, BUF_IO_NONE);
buf_page_mutex_exit(fix_block);
if (!success) {
buf_pool_mutex_enter(buf_pool);
buf_page_mutex_enter(fix_block);
buf_block_set_io_fix(fix_block, BUF_IO_NONE);
buf_page_mutex_exit(fix_block);
--buf_pool->n_pend_unzip;
fix_block->unfix();
buf_pool_mutex_exit(buf_pool);
rw_lock_x_unlock(&fix_block->lock);
--buf_pool->n_pend_unzip;
fix_block->unfix();
buf_pool_mutex_exit(buf_pool);
rw_lock_x_unlock(&fix_block->lock);
if (err) {
*err = DB_PAGE_CORRUPTED;
}
return NULL;
if (err) {
*err = DB_PAGE_CORRUPTED;
}
}
if (!access_time && !recv_no_ibuf_operations) {
ibuf_merge_or_delete_for_page(
block, block->page.id, zip_size, true);
return NULL;
}
buf_pool_mutex_enter(buf_pool);
@@ -4742,14 +4718,6 @@ evict_from_pool:
rw_lock_x_unlock(&block->lock);
break;
case BUF_BLOCK_POOL_WATCH:
case BUF_BLOCK_NOT_USED:
case BUF_BLOCK_READY_FOR_USE:
case BUF_BLOCK_MEMORY:
case BUF_BLOCK_REMOVE_HASH:
ut_error;
break;
}
ut_ad(block == fix_block);
@@ -4876,7 +4844,7 @@ evict_from_pool:
}
if (mode != BUF_PEEK_IF_IN_POOL) {
buf_page_make_young_if_needed(&fix_block->page);
buf_page_make_young_if_needed(buf_pool, &fix_block->page);
}
#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
@@ -4905,35 +4873,49 @@ evict_from_pool:
return NULL;
}
mtr_memo_type_t fix_type;
switch (rw_latch) {
case RW_NO_LATCH:
fix_type = MTR_MEMO_BUF_FIX;
break;
case RW_S_LATCH:
rw_lock_s_lock_inline(&fix_block->lock, 0, file, line);
fix_type = MTR_MEMO_PAGE_S_FIX;
break;
case RW_SX_LATCH:
rw_lock_sx_lock_inline(&fix_block->lock, 0, file, line);
fix_type = MTR_MEMO_PAGE_SX_FIX;
break;
default:
ut_ad(rw_latch == RW_X_LATCH);
if (allow_ibuf_merge
&& mach_read_from_2(fix_block->frame + FIL_PAGE_TYPE)
== FIL_PAGE_INDEX
&& page_is_leaf(fix_block->frame)) {
rw_lock_x_lock_inline(&fix_block->lock, 0, file, line);
fix_type = MTR_MEMO_PAGE_X_FIX;
break;
}
if (fix_block->page.ibuf_exist) {
fix_block->page.ibuf_exist = false;
ibuf_merge_or_delete_for_page(fix_block, page_id,
zip_size, true);
}
mtr_memo_push(mtr, fix_block, fix_type);
if (rw_latch == RW_X_LATCH) {
mtr->memo_push(fix_block, MTR_MEMO_PAGE_X_FIX);
} else {
rw_lock_x_unlock(&fix_block->lock);
goto get_latch;
}
} else {
get_latch:
mtr_memo_type_t fix_type;
switch (rw_latch) {
case RW_NO_LATCH:
fix_type = MTR_MEMO_BUF_FIX;
break;
case RW_S_LATCH:
rw_lock_s_lock_inline(&fix_block->lock, 0, file, line);
fix_type = MTR_MEMO_PAGE_S_FIX;
break;
case RW_SX_LATCH:
rw_lock_sx_lock_inline(&fix_block->lock, 0, file, line);
fix_type = MTR_MEMO_PAGE_SX_FIX;
break;
default:
ut_ad(rw_latch == RW_X_LATCH);
rw_lock_x_lock_inline(&fix_block->lock, 0, file, line);
fix_type = MTR_MEMO_PAGE_X_FIX;
break;
}
mtr->memo_push(block, fix_type);
}
if (mode != BUF_PEEK_IF_IN_POOL && !access_time) {
/* In the case of a first access, try to apply linear
@@ -4962,7 +4944,6 @@ buf_page_optimistic_get(
unsigned line, /*!< in: line where called */
mtr_t* mtr) /*!< in: mini-transaction */
{
buf_pool_t* buf_pool;
unsigned access_time;
ibool success;
@@ -4988,7 +4969,8 @@ buf_page_optimistic_get(
buf_page_mutex_exit(block);
buf_page_make_young_if_needed(&block->page);
buf_pool_t* buf_pool = buf_pool_from_block(block);
buf_page_make_young_if_needed(buf_pool, &block->page);
ut_ad(!ibuf_inside(mtr)
|| ibuf_page(block->page.id, block->zip_size(), NULL));
@@ -5049,109 +5031,6 @@ buf_page_optimistic_get(
ibuf_inside(mtr));
}
buf_pool = buf_pool_from_block(block);
buf_pool->stat.n_page_gets++;
return(TRUE);
}
/********************************************************************//**
This is used to get access to a known database page, when no waiting can be
done. For example, if a search in an adaptive hash index leads us to this
frame.
@return TRUE if success */
ibool
buf_page_get_known_nowait(
/*======================*/
ulint rw_latch,/*!< in: RW_S_LATCH, RW_X_LATCH */
buf_block_t* block, /*!< in: the known page */
ulint mode, /*!< in: BUF_MAKE_YOUNG or BUF_KEEP_OLD */
const char* file, /*!< in: file name */
unsigned line, /*!< in: line where called */
mtr_t* mtr) /*!< in: mini-transaction */
{
buf_pool_t* buf_pool;
ibool success;
ut_ad(mtr->is_active());
ut_ad((rw_latch == RW_S_LATCH) || (rw_latch == RW_X_LATCH));
buf_page_mutex_enter(block);
if (buf_block_get_state(block) == BUF_BLOCK_REMOVE_HASH) {
/* Another thread is just freeing the block from the LRU list
of the buffer pool: do not try to access this page; this
attempt to access the page can only come through the hash
index because when the buffer block state is ..._REMOVE_HASH,
we have already removed it from the page address hash table
of the buffer pool. */
buf_page_mutex_exit(block);
return(FALSE);
}
ut_a(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
buf_block_buf_fix_inc(block, file, line);
buf_page_set_accessed(&block->page);
buf_page_mutex_exit(block);
buf_pool = buf_pool_from_block(block);
if (mode == BUF_MAKE_YOUNG) {
buf_page_make_young_if_needed(&block->page);
}
ut_ad(!ibuf_inside(mtr) || mode == BUF_KEEP_OLD);
mtr_memo_type_t fix_type;
switch (rw_latch) {
case RW_S_LATCH:
success = rw_lock_s_lock_nowait(&block->lock, file, line);
fix_type = MTR_MEMO_PAGE_S_FIX;
break;
case RW_X_LATCH:
success = rw_lock_x_lock_func_nowait_inline(
&block->lock, file, line);
fix_type = MTR_MEMO_PAGE_X_FIX;
break;
default:
ut_error; /* RW_SX_LATCH is not implemented yet */
}
if (!success) {
buf_block_buf_fix_dec(block);
return(FALSE);
}
mtr_memo_push(mtr, block, fix_type);
#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
ut_a(++buf_dbg_counter % 5771 || buf_validate());
ut_a(block->page.buf_fix_count > 0);
ut_a(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */
#ifdef UNIV_DEBUG
if (mode != BUF_KEEP_OLD) {
/* If mode == BUF_KEEP_OLD, we are executing an I/O
completion routine. Avoid a bogus assertion failure
when ibuf_merge_or_delete_for_page() is processing a
page that was just freed due to DROP INDEX, or
deleting a record from SYS_INDEXES. This check will be
skipped in recv_recover_page() as well. */
buf_page_mutex_enter(block);
ut_a(!block->page.file_page_was_freed);
buf_page_mutex_exit(block);
}
#endif /* UNIV_DEBUG */
buf_pool->stat.n_page_gets++;
return(TRUE);
@@ -5258,6 +5137,7 @@ buf_page_init_low(
bpage->write_size = 0;
bpage->real_size = 0;
bpage->slot = NULL;
bpage->ibuf_exist = false;
HASH_INVALIDATE(bpage, hash);
@@ -6004,9 +5884,9 @@ static dberr_t buf_page_check_corrupt(buf_page_t* bpage, fil_space_t* space)
}
/** Complete a read or write request of a file page to or from the buffer pool.
@param[in,out] bpage page to complete
@param[in] dblwr whether the doublewrite buffer was used (on write)
@param[in] evict whether or not to evict the page from LRU list
@param[in,out] bpage page to complete
@param[in] dblwr whether the doublewrite buffer was used (on write)
@param[in] evict whether or not to evict the page from LRU list
@return whether the operation succeeded
@retval DB_SUCCESS always when writing, or if a read page was OK
@retval DB_TABLESPACE_DELETED if the tablespace does not exist
@@ -6201,10 +6081,9 @@ release_page:
&& (bpage->id.space() == 0
|| !is_predefined_tablespace(bpage->id.space()))
&& fil_page_get_type(frame) == FIL_PAGE_INDEX
&& page_is_leaf(frame)) {
ibuf_merge_or_delete_for_page(
reinterpret_cast<buf_block_t*>(bpage),
bpage->id, bpage->zip_size(), true);
&& page_is_leaf(frame)
&& ibuf_page_exists(*bpage)) {
bpage->ibuf_exist = true;
}
space->release_for_io();

View File

@@ -312,7 +312,7 @@ buf_read_ahead_random(const page_id_t page_id, ulint zip_size, bool ibuf)
if (bpage != NULL
&& buf_page_is_accessed(bpage)
&& buf_page_peek_if_young(bpage)) {
&& buf_page_peek_if_young(buf_pool, bpage)) {
recent_blocks++;
@@ -754,89 +754,6 @@ buf_read_ahead_linear(const page_id_t page_id, ulint zip_size, bool ibuf)
return(count);
}
/********************************************************************//**
Issues read requests for pages which the ibuf module wants to read in, in
order to contract the insert buffer tree. Technically, this function is like
a read-ahead function. */
void
buf_read_ibuf_merge_pages(
/*======================*/
bool sync, /*!< in: true if the caller
wants this function to wait
for the highest address page
to get read in, before this
function returns */
const ulint* space_ids, /*!< in: array of space ids */
const ulint* page_nos, /*!< in: array of page numbers
to read, with the highest page
number the last in the
array */
ulint n_stored) /*!< in: number of elements
in the arrays */
{
#ifdef UNIV_IBUF_DEBUG
ut_a(n_stored < srv_page_size);
#endif
for (ulint i = 0; i < n_stored; i++) {
fil_space_t* s = fil_space_acquire_for_io(space_ids[i]);
if (!s) {
tablespace_deleted:
/* The tablespace was not found: remove all
entries for it */
ibuf_delete_for_discarded_space(space_ids[i]);
while (i + 1 < n_stored
&& space_ids[i + 1] == space_ids[i]) {
i++;
}
continue;
}
const ulint zip_size = s->zip_size();
s->release_for_io();
const page_id_t page_id(space_ids[i], page_nos[i]);
buf_pool_t* buf_pool = buf_pool_get(page_id);
while (buf_pool->n_pend_reads
> buf_pool->curr_size / BUF_READ_AHEAD_PEND_LIMIT) {
os_thread_sleep(500000);
}
dberr_t err;
buf_read_page_low(&err,
sync && (i + 1 == n_stored),
0,
BUF_READ_ANY_PAGE, page_id, zip_size,
true, true /* ignore_missing_space */);
switch(err) {
case DB_SUCCESS:
case DB_ERROR:
break;
case DB_TABLESPACE_DELETED:
goto tablespace_deleted;
case DB_PAGE_CORRUPTED:
case DB_DECRYPTION_FAILED:
ib::error() << "Failed to read or decrypt " << page_id
<< " for change buffer merge";
break;
default:
ut_error;
}
}
os_aio_simulated_wake_handler_threads();
if (n_stored) {
DBUG_PRINT("ib_buf",
("ibuf merge read-ahead %u pages, space %u",
unsigned(n_stored), unsigned(space_ids[0])));
}
}
/** Issues read requests for pages which recovery wants to read in.
@param[in] sync true if the caller wants this function to wait
for the highest address page to get read in, before this function returns

View File

@@ -1,7 +1,7 @@
/*****************************************************************************
Copyright (c) 1996, 2017, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 2016, 2018, MariaDB Corporation.
Copyright (c) 2016, 2019, MariaDB Corporation.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
@@ -450,36 +450,15 @@ dict_boot(void)
/* Initialize the insert buffer table and index for each tablespace */
dberr_t err = DB_SUCCESS;
err = ibuf_init_at_db_start();
dberr_t err = ibuf_init_at_db_start();
if (err == DB_SUCCESS) {
if (srv_read_only_mode
&& srv_force_recovery != SRV_FORCE_NO_LOG_REDO
&& !ibuf_is_empty()) {
/* Load definitions of other indexes on system tables */
if (srv_force_recovery < SRV_FORCE_NO_IBUF_MERGE) {
ib::error() << "Change buffer must be empty when"
" --innodb-read-only is set!"
"You can try to recover the database with innodb_force_recovery=5";
err = DB_ERROR;
} else {
ib::warn() << "Change buffer not empty when --innodb-read-only "
"is set! but srv_force_recovery = " << srv_force_recovery
<< " , ignoring.";
}
}
if (err == DB_SUCCESS) {
/* Load definitions of other indexes on system tables */
dict_load_sys_table(dict_sys.sys_tables);
dict_load_sys_table(dict_sys.sys_columns);
dict_load_sys_table(dict_sys.sys_indexes);
dict_load_sys_table(dict_sys.sys_fields);
}
dict_load_sys_table(dict_sys.sys_tables);
dict_load_sys_table(dict_sys.sys_columns);
dict_load_sys_table(dict_sys.sys_indexes);
dict_load_sys_table(dict_sys.sys_fields);
}
mutex_exit(&dict_sys.mutex);

View File

@@ -1493,6 +1493,7 @@ dict_stats_analyze_index_below_cur(
rec_offs_set_n_alloc(offsets2, size);
rec = btr_cur_get_rec(cur);
page = page_align(rec);
ut_ad(!page_rec_is_leaf(rec));
offsets_rec = rec_get_offsets(rec, index, offsets1, false,
@@ -1514,9 +1515,11 @@ dict_stats_analyze_index_below_cur(
dberr_t err = DB_SUCCESS;
block = buf_page_get_gen(page_id, zip_size, RW_S_LATCH,
NULL /* no guessed block */,
BUF_GET, __FILE__, __LINE__, &mtr, &err);
block = buf_page_get_gen(page_id, zip_size,
RW_S_LATCH, NULL, BUF_GET,
__FILE__, __LINE__, &mtr, &err,
!index->is_clust()
&& 1 == btr_page_get_level(page));
page = buf_block_get_frame(block);
@@ -3143,7 +3146,7 @@ dict_stats_update(
if (!table->is_readable()) {
return (dict_stats_report_error(table));
} else if (srv_force_recovery >= SRV_FORCE_NO_IBUF_MERGE) {
} else if (srv_force_recovery > SRV_FORCE_NO_IBUF_MERGE) {
/* If we have set a high innodb_force_recovery level, do
not calculate statistics, as a badly corrupted index can
cause a crash in it. */

View File

@@ -4783,15 +4783,7 @@ fil_space_validate_for_mtr_commit(
/* We are serving mtr_commit(). While there is an active
mini-transaction, we should have !space->stop_new_ops. This is
guaranteed by meta-data locks or transactional locks, or
dict_sys.latch (X-lock in DROP, S-lock in purge).
However, a file I/O thread can invoke change buffer merge
while fil_check_pending_operations() is waiting for operations
to quiesce. This is not a problem, because
ibuf_merge_or_delete_for_page() would call
fil_space_acquire() before mtr_start() and
fil_space_t::release() after mtr_commit(). This is why
n_pending_ops should not be zero if stop_new_ops is set. */
dict_sys.latch (X-lock in DROP, S-lock in purge). */
ut_ad(!space->stop_new_ops
|| space->is_being_truncated /* fil_truncate_prepare() */
|| space->referenced());

View File

@@ -756,7 +756,7 @@ rtr_adjust_upper_level(
/* Update page links of the level */
if (prev_page_no != FIL_NULL) {
buf_block_t* prev_block = btr_block_get(
*index, prev_page_no, RW_X_LATCH, mtr);
*index, prev_page_no, RW_X_LATCH, false, mtr);
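/* merge=false here and in the next-page fetch below: these are
R-tree pages, and change buffering is never attempted for spatial
indexes (see ibuf_should_try()), so consulting the change buffer
bitmap would be wasted work. */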
#ifdef UNIV_BTR_DEBUG
ut_a(page_is_comp(prev_block->frame) == page_is_comp(page));
ut_a(btr_page_get_next(prev_block->frame, mtr)
@@ -770,7 +770,7 @@ rtr_adjust_upper_level(
if (next_page_no != FIL_NULL) {
buf_block_t* next_block = btr_block_get(
*index, next_page_no, RW_X_LATCH, mtr);
*index, next_page_no, RW_X_LATCH, false, mtr);
#ifdef UNIV_BTR_DEBUG
ut_a(page_is_comp(next_block->frame) == page_is_comp(page));
ut_a(btr_page_get_prev(next_block->frame, mtr)

@@ -5901,7 +5901,7 @@ initialize_auto_increment(dict_table_t* table, const Field* field)
table->persistent_autoinc without
autoinc_mutex protection, and there might be multiple
ha_innobase::open() executing concurrently. */
} else if (srv_force_recovery >= SRV_FORCE_NO_IBUF_MERGE) {
} else if (srv_force_recovery > SRV_FORCE_NO_IBUF_MERGE) {
/* If the recovery level is set so high that writes
are disabled, we force the AUTOINC counter to 0,
effectively disabling writes to the table.
@@ -14037,7 +14037,7 @@ ha_innobase::info_low(
}
}
if (srv_force_recovery >= SRV_FORCE_NO_IBUF_MERGE) {
if (srv_force_recovery > SRV_FORCE_NO_IBUF_MERGE) {
goto func_exit;
@@ -16683,6 +16683,9 @@ innobase_commit_by_xid(
{
DBUG_ASSERT(hton == innodb_hton_ptr);
DBUG_EXECUTE_IF("innobase_xa_fail",
return XAER_RMFAIL;);
if (high_level_read_only) {
return(XAER_RMFAIL);
}
@@ -16715,6 +16718,9 @@ innobase_rollback_by_xid(
{
DBUG_ASSERT(hton == innodb_hton_ptr);
DBUG_EXECUTE_IF("innobase_xa_fail",
return XAER_RMFAIL;);
if (high_level_read_only) {
return(XAER_RMFAIL);
}
@@ -19039,7 +19045,7 @@ static MYSQL_SYSVAR_ULONG(write_io_threads, srv_n_write_io_threads,
static MYSQL_SYSVAR_ULONG(force_recovery, srv_force_recovery,
PLUGIN_VAR_RQCMDARG | PLUGIN_VAR_READONLY,
"Helps to save your data in case the disk image of the database becomes corrupt.",
"Helps to save your data in case the disk image of the database becomes corrupt. Value 5 can return bogus data, and 6 can permanently corrupt data.",
NULL, NULL, 0, 0, 6, 0);
static MYSQL_SYSVAR_ULONG(page_size, srv_page_size,
@@ -19227,12 +19233,6 @@ static MYSQL_SYSVAR_UINT(change_buffering_debug, ibuf_debug,
PLUGIN_VAR_RQCMDARG,
"Debug flags for InnoDB change buffering (0=none, 1=try to buffer)",
NULL, NULL, 0, 0, 1, 0);
static MYSQL_SYSVAR_BOOL(disable_background_merge,
srv_ibuf_disable_background_merge,
PLUGIN_VAR_NOCMDARG | PLUGIN_VAR_RQCMDARG,
"Disable change buffering merges by the master thread",
NULL, NULL, FALSE);
#endif /* UNIV_DEBUG || UNIV_IBUF_DEBUG */
static MYSQL_SYSVAR_ULONG(buf_dump_status_frequency, srv_buf_dump_status_frequency,
@@ -19694,7 +19694,6 @@ static struct st_mysql_sys_var* innobase_system_variables[]= {
MYSQL_SYSVAR(change_buffer_max_size),
#if defined UNIV_DEBUG || defined UNIV_IBUF_DEBUG
MYSQL_SYSVAR(change_buffering_debug),
MYSQL_SYSVAR(disable_background_merge),
#endif /* UNIV_DEBUG || UNIV_IBUF_DEBUG */
#ifdef WITH_INNODB_DISALLOW_WRITES
MYSQL_SYSVAR(disallow_writes),

@@ -28,10 +28,6 @@ Created 7/19/1997 Heikki Tuuri
#include "sync0sync.h"
#include "btr0sea.h"
#if defined UNIV_DEBUG || defined UNIV_IBUF_DEBUG
my_bool srv_ibuf_disable_background_merge;
#endif /* UNIV_DEBUG || UNIV_IBUF_DEBUG */
/** Number of bits describing a single page */
#define IBUF_BITS_PER_PAGE 4
/** The start address for an insert buffer bitmap page bitmap */
@@ -257,16 +253,6 @@ const ulint IBUF_MERGE_THRESHOLD = 4;
batch, in order to merge the entries for them in the insert buffer */
const ulint IBUF_MAX_N_PAGES_MERGED = IBUF_MERGE_AREA;
/** If the combined size of the ibuf trees exceeds ibuf.max_size by this
many pages, we start to contract it in connection to inserts there, using
non-synchronous contract */
const ulint IBUF_CONTRACT_ON_INSERT_NON_SYNC = 0;
/** If the combined size of the ibuf trees exceeds ibuf.max_size by this
many pages, we start to contract it in connection to inserts there, using
synchronous contract */
const ulint IBUF_CONTRACT_ON_INSERT_SYNC = 5;
/** If the combined size of the ibuf trees exceeds ibuf.max_size by
this many pages, we start to contract it synchronously, but do
not insert */
@@ -701,9 +687,9 @@ ibuf_bitmap_get_map_page_func(
buf_block_t* block = NULL;
dberr_t err = DB_SUCCESS;
block = buf_page_get_gen(ibuf_bitmap_page_no_calc(page_id, zip_size),
zip_size, RW_X_LATCH, NULL, BUF_GET,
file, line, mtr, &err);
block = buf_page_get_gen(
ibuf_bitmap_page_no_calc(page_id, zip_size),
zip_size, RW_X_LATCH, NULL, BUF_GET, file, line, mtr, &err);
if (err != DB_SUCCESS) {
return NULL;
@@ -2083,10 +2069,6 @@ void
ibuf_free_excess_pages(void)
/*========================*/
{
if (srv_force_recovery >= SRV_FORCE_NO_IBUF_MERGE) {
return;
}
/* Free at most a few pages at a time, so that we do not delay the
requested service too much */
@@ -2369,6 +2351,40 @@ ibuf_get_merge_pages(
return(volume);
}
/** Merge the change buffer into some pages.
@param[in] space_ids tablespace identifiers
@param[in] page_nos page numbers within the tablespaces
@param[in] n_stored number of elements in the arrays */
static void ibuf_read_merge_pages(const ulint* space_ids,
const ulint* page_nos, ulint n_stored)
{
for (ulint i = 0; i < n_stored; i++) {
const ulint space_id = space_ids[i];
fil_space_t* s = fil_space_acquire_for_io(space_id);
if (!s) {
tablespace_deleted:
/* The tablespace was not found: remove all
entries for it */
ibuf_delete_for_discarded_space(space_id);
while (i + 1 < n_stored
&& space_ids[i + 1] == space_id) {
i++;
}
continue;
}
const ulint zip_size = s->zip_size();
s->release_for_io();
mtr_t mtr;
mtr.start();
dberr_t err;
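/* The page request itself triggers the merge: the final
allow_ibuf_merge=true argument makes buf_page_get_gen() apply
any buffered changes once the page is in the buffer pool. */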
buf_page_get_gen(page_id_t(space_id, page_nos[i]),
zip_size, RW_X_LATCH, NULL, BUF_GET,
__FILE__, __LINE__, &mtr, &err, true);
mtr.commit();
if (err == DB_TABLESPACE_DELETED) {
goto tablespace_deleted;
}
}
}
/*********************************************************************//**
Contracts insert buffer trees by reading pages to the buffer pool.
@return a lower limit for the combined size in bytes of entries which
@@ -2378,10 +2394,7 @@ static
ulint
ibuf_merge_pages(
/*=============*/
ulint* n_pages, /*!< out: number of pages to which merged */
bool sync) /*!< in: true if the caller wants to wait for
the issued read with the highest tablespace
address to complete */
ulint* n_pages) /*!< out: number of pages to which merged */
{
mtr_t mtr;
btr_pcur_t pcur;
@@ -2424,15 +2437,10 @@ ibuf_merge_pages(
btr_pcur_get_rec(&pcur), &mtr,
space_ids,
page_nos, n_pages);
#if 0 /* defined UNIV_IBUF_DEBUG */
fprintf(stderr, "Ibuf contract sync %lu pages %lu volume %lu\n",
sync, *n_pages, sum_sizes);
#endif
ibuf_mtr_commit(&mtr);
btr_pcur_close(&pcur);
buf_read_ibuf_merge_pages(
sync, space_ids, page_nos, *n_pages);
ibuf_read_merge_pages(space_ids, page_nos, *n_pages);
return(sum_sizes + 1);
}
@@ -2502,8 +2510,7 @@ ibuf_merge_space(
}
#endif /* UNIV_DEBUG */
buf_read_ibuf_merge_pages(
true, spaces, pages, n_pages);
ibuf_read_merge_pages(spaces, pages, n_pages);
}
return(n_pages);
@@ -2516,11 +2523,8 @@ the issued reads to complete
@return a lower limit for the combined size in bytes of entries which
will be merged from ibuf trees to the pages read, 0 if ibuf is
empty */
static MY_ATTRIBUTE((warn_unused_result))
ulint
ibuf_merge(
ulint* n_pages,
bool sync)
MY_ATTRIBUTE((warn_unused_result))
static ulint ibuf_merge(ulint* n_pages)
{
*n_pages = 0;
@@ -2536,88 +2540,46 @@ ibuf_merge(
return(0);
#endif /* UNIV_DEBUG || UNIV_IBUF_DEBUG */
} else {
return(ibuf_merge_pages(n_pages, sync));
return ibuf_merge_pages(n_pages);
}
}
/** Contract the change buffer by reading pages to the buffer pool.
@param[in] sync whether the caller waits for
the issued reads to complete
@return a lower limit for the combined size in bytes of entries which
will be merged from ibuf trees to the pages read, 0 if ibuf is empty */
static
ulint
ibuf_contract(
bool sync)
static ulint ibuf_contract()
{
ulint n_pages;
return(ibuf_merge_pages(&n_pages, sync));
ulint n_pages;
return ibuf_merge_pages(&n_pages);
}
/** Contract the change buffer by reading pages to the buffer pool.
@param[in] full If true, do a full contraction based
on PCT_IO(100). If false, the size of contract batch is determined
based on the current size of the change buffer.
@return a lower limit for the combined size in bytes of entries which
will be merged from ibuf trees to the pages read, 0 if ibuf is
empty */
ulint
ibuf_merge_in_background(
bool full)
ulint ibuf_merge_all()
{
ulint sum_bytes = 0;
ulint sum_pages = 0;
ulint n_pag2;
ulint n_pages;
#if defined UNIV_DEBUG || defined UNIV_IBUF_DEBUG
if (srv_ibuf_disable_background_merge) {
return(0);
}
#endif /* UNIV_DEBUG || UNIV_IBUF_DEBUG */
if (full) {
/* Caller has requested a full batch */
n_pages = PCT_IO(100);
} else {
/* By default we do a batch of 5% of the io_capacity */
n_pages = PCT_IO(5);
mutex_enter(&ibuf_mutex);
/* If ibuf.size is more than half of max_size,
then we make a more aggressive contraction.
The +1 avoids division by zero. */
if (ibuf.size > ibuf.max_size / 2) {
ulint diff = ibuf.size - ibuf.max_size / 2;
n_pages += PCT_IO((diff * 100)
/ (ibuf.max_size + 1));
}
mutex_exit(&ibuf_mutex);
}
#if defined UNIV_DEBUG || defined UNIV_IBUF_DEBUG
if (ibuf_debug) {
return(0);
}
#endif /* UNIV_DEBUG || UNIV_IBUF_DEBUG */
while (sum_pages < n_pages) {
ulint n_bytes;
ulint sum_bytes = 0;
ulint n_pages = PCT_IO(100);
n_bytes = ibuf_merge(&n_pag2, false);
for (ulint sum_pages = 0; sum_pages < n_pages; ) {
ulint n_pag2;
ulint n_bytes = ibuf_merge(&n_pag2);
if (n_bytes == 0) {
return(sum_bytes);
break;
}
sum_bytes += n_bytes;
sum_pages += n_pag2;
}
return(sum_bytes);
return sum_bytes;
}
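/* Note: the only remaining caller is the slow-shutdown path
(see the srv_shutdown() hunk later in this patch), which merges
all buffered changes. */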
/*********************************************************************//**
@@ -2629,11 +2591,6 @@ ibuf_contract_after_insert(
ulint entry_size) /*!< in: size of a record which was inserted
into an ibuf tree */
{
ibool sync;
ulint sum_sizes;
ulint size;
ulint max_size;
/* Perform dirty reads of ibuf.size and ibuf.max_size, to
reduce ibuf_mutex contention. ibuf.max_size remains constant
after ibuf_init_at_db_start(), but ibuf.size should be
@@ -2641,22 +2598,16 @@ ibuf_contract_after_insert(
machine word, this should be OK; at worst we are doing some
excessive ibuf_contract() or occasionally skipping an
ibuf_contract(). */
size = ibuf.size;
max_size = ibuf.max_size;
if (size < max_size + IBUF_CONTRACT_ON_INSERT_NON_SYNC) {
if (ibuf.size < ibuf.max_size) {
return;
}
sync = (size >= max_size + IBUF_CONTRACT_ON_INSERT_SYNC);
/* Contract at least entry_size many bytes */
sum_sizes = 0;
size = 1;
ulint sum_sizes = 0;
ulint size;
do {
size = ibuf_contract(sync);
size = ibuf_contract();
sum_sizes += size;
} while (size > 0 && sum_sizes < entry_size);
}
@@ -3296,7 +3247,7 @@ ibuf_insert_low(
#ifdef UNIV_IBUF_DEBUG
fputs("Ibuf too big\n", stderr);
#endif
ibuf_contract(true);
ibuf_contract();
return(DB_STRONG_FAIL);
}
@@ -3551,8 +3502,7 @@ func_exit:
#ifdef UNIV_IBUF_DEBUG
ut_a(n_stored <= IBUF_MAX_N_PAGES_MERGED);
#endif
buf_read_ibuf_merge_pages(false, space_ids,
page_nos, n_stored);
ibuf_read_merge_pages(space_ids, page_nos, n_stored);
}
return(err);
@@ -4251,6 +4201,42 @@ func_exit:
return(TRUE);
}
/** Check whether buffered changes exist for a page.
@param[in] bpage buffer pool page
@return whether buffered changes exist */
bool ibuf_page_exists(const buf_page_t& bpage)
{
ut_ad(buf_page_get_io_fix(&bpage) == BUF_IO_READ
|| recv_recovery_is_on());
ut_ad(!fsp_is_system_temporary(bpage.id.space()));
ut_ad(buf_page_in_file(&bpage));
ut_ad(buf_page_get_state(&bpage) != BUF_BLOCK_FILE_PAGE
|| bpage.io_fix == BUF_IO_READ
|| rw_lock_own(&const_cast<buf_block_t&>
(reinterpret_cast<const buf_block_t&>
(bpage)).lock, RW_LOCK_X));
const ulint physical_size = bpage.physical_size();
if (ibuf_fixed_addr_page(bpage.id, physical_size)
|| fsp_descr_page(bpage.id, physical_size)) {
return false;
}
mtr_t mtr;
bool bitmap_bits = false;
ibuf_mtr_start(&mtr);
if (const page_t* bitmap_page = ibuf_bitmap_get_map_page(
bpage.id, bpage.zip_size(), &mtr)) {
bitmap_bits = ibuf_bitmap_page_get_bits(
bitmap_page, bpage.id, bpage.zip_size(),
IBUF_BITMAP_BUFFERED, &mtr) != 0;
}
ibuf_mtr_commit(&mtr);
return bitmap_bits;
}
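/* A minimal sketch of the intended use, assuming a block that is
read-fixed or X-latched as required by the assertions above; this
mirrors the use in mark_ibuf_exist() later in this patch. */
block->page.ibuf_exist = ibuf_page_exists(block->page);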
/** When an index page is read from disk to the buffer pool, this function
applies any buffered operations to the page and deletes the entries from the
insert buffer. If the page is not read, but created in the buffer pool, this
@@ -4286,11 +4272,9 @@ ibuf_merge_or_delete_for_page(
ulint dops[IBUF_OP_COUNT];
ut_ad(block == NULL || page_id == block->page.id);
ut_ad(block == NULL || buf_block_get_io_fix(block) == BUF_IO_READ
|| recv_recovery_is_on());
ut_ad(!block || buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
if (srv_force_recovery >= SRV_FORCE_NO_IBUF_MERGE
|| trx_sys_hdr_page(page_id)
if (trx_sys_hdr_page(page_id)
|| fsp_is_system_temporary(page_id.space())) {
return;
}
@@ -4391,16 +4375,12 @@ loop:
&pcur, &mtr);
if (block != NULL) {
ibool success;
ut_ad(rw_lock_own(&block->lock, RW_LOCK_X));
buf_block_buf_fix_inc(block, __FILE__, __LINE__);
rw_lock_x_lock(&block->lock);
mtr.set_named_space(space);
success = buf_page_get_known_nowait(
RW_X_LATCH, block,
BUF_KEEP_OLD, __FILE__, __LINE__, &mtr);
ut_a(success);
mtr.memo_push(block, MTR_MEMO_PAGE_X_FIX);
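/* Buffer-fixing the block, acquiring the X-latch and registering
the MTR_MEMO_PAGE_X_FIX entry replaces the removed
buf_page_get_known_nowait() protocol; mtr.commit() will release
both the fix and the latch. */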
/* This is a user page (secondary index leaf page),
but we pretend that it is a change buffer page in
order to obey the latching order. This should be OK,
@@ -4466,7 +4446,6 @@ loop:
ut_ad(page_validate(block->frame, dummy_index));
switch (op) {
ibool success;
case IBUF_OP_INSERT:
#ifdef UNIV_IBUF_DEBUG
volume += rec_get_converted_size(
@@ -4515,11 +4494,11 @@ loop:
ibuf_mtr_start(&mtr);
mtr.set_named_space(space);
success = buf_page_get_known_nowait(
RW_X_LATCH, block,
BUF_KEEP_OLD,
__FILE__, __LINE__, &mtr);
ut_a(success);
ut_ad(rw_lock_own(&block->lock, RW_LOCK_X));
buf_block_buf_fix_inc(block,
__FILE__, __LINE__);
rw_lock_x_lock(&block->lock);
mtr.memo_push(block, MTR_MEMO_PAGE_X_FIX);
/* This is a user page (secondary
index leaf page), but it should be OK

@@ -221,12 +221,13 @@ btr_height_get(
@param[in] index index tree
@param[in] page page number
@param[in] mode latch mode
@param[in] merge whether change buffer merge should be attempted
@param[in] file file name
@param[in] line line where called
@param[in,out] mtr mini-transaction
@return block */
inline buf_block_t* btr_block_get_func(const dict_index_t& index, ulint page,
ulint mode,
ulint mode, bool merge,
const char* file, unsigned line,
mtr_t* mtr)
{
@@ -235,7 +236,7 @@ inline buf_block_t* btr_block_get_func(const dict_index_t& index, ulint page,
if (buf_block_t* block = buf_page_get_gen(
page_id_t(index.table->space->id, page),
index.table->space->zip_size(), mode, NULL, BUF_GET,
file, line, mtr, &err)) {
file, line, mtr, &err, merge && !index.is_clust())) {
ut_ad(err == DB_SUCCESS);
if (mode != RW_NO_LATCH) {
buf_block_dbg_add_level(block, index.is_ibuf()
@@ -260,10 +261,11 @@ inline buf_block_t* btr_block_get_func(const dict_index_t& index, ulint page,
@param index index tree
@param page page number
@param mode latch mode
@param merge whether change buffer merge should be attempted
@param mtr mini-transaction handle
@return the block descriptor */
# define btr_block_get(index, page, mode, mtr) \
btr_block_get_func(index, page, mode, __FILE__, __LINE__, mtr)
# define btr_block_get(index, page, mode, merge, mtr) \
btr_block_get_func(index, page, mode, merge, __FILE__, __LINE__, mtr)
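/* Illustrative sketch only (index, next_page_no and mtr are
assumed to be in scope): a caller walking secondary index leaf
pages would pass merge=true so that buffered changes are applied
on access, while non-leaf or clustered index fetches pass
merge=false. */
buf_block_t* leaf = btr_block_get(
*index, next_page_no, RW_S_LATCH, true, mtr);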
/**************************************************************//**
Gets the index id field of a page.
@return index id */

@@ -67,16 +67,6 @@ struct fil_addr_t;
if the file page has been freed. */
#define BUF_EVICT_IF_IN_POOL 20 /*!< evict a clean block if found */
/* @} */
/** @name Modes for buf_page_get_known_nowait */
/* @{ */
#define BUF_MAKE_YOUNG 51 /*!< Move the block to the
start of the LRU list if there
is a danger that the block
would drift out of the buffer
pool*/
#define BUF_KEEP_OLD 52 /*!< Preserve the current LRU
position of the block. */
/* @} */
#define MAX_BUFFER_POOLS_BITS 6 /*!< Number of bits to representing
a buffer pool ID */
@@ -132,7 +122,6 @@ enum buf_page_state {
before putting to the free list */
};
/** This structure defines information we will fetch from each buffer pool. It
will be used to print table IO stats */
struct buf_pool_info_t{
@@ -357,7 +346,8 @@ NOTE! The following macros should be used instead of buf_page_get_gen,
to improve debugging. Only values RW_S_LATCH and RW_X_LATCH are allowed
in LA! */
#define buf_page_get(ID, SIZE, LA, MTR) \
buf_page_get_gen(ID, SIZE, LA, NULL, BUF_GET, __FILE__, __LINE__, MTR, NULL)
buf_page_get_gen(ID, SIZE, LA, NULL, BUF_GET, __FILE__, __LINE__, MTR)
/**************************************************************//**
Use these macros to bufferfix a page with no latching. Remember not to
read the contents of the page unless you know it is safe. Do not modify
@@ -366,7 +356,7 @@ error-prone programming not to set a latch, and it should be used
with care. */
#define buf_page_get_with_no_latch(ID, SIZE, MTR) \
buf_page_get_gen(ID, SIZE, RW_NO_LATCH, NULL, BUF_GET_NO_LATCH, \
__FILE__, __LINE__, MTR, NULL)
__FILE__, __LINE__, MTR)
/********************************************************************//**
This is the general function used to get optimistic access to a database
page.
@@ -380,19 +370,6 @@ buf_page_optimistic_get(
const char* file, /*!< in: file name */
unsigned line, /*!< in: line where called */
mtr_t* mtr); /*!< in: mini-transaction */
/********************************************************************//**
This is used to get access to a known database page, when no waiting can be
done.
@return TRUE if success */
ibool
buf_page_get_known_nowait(
/*======================*/
ulint rw_latch,/*!< in: RW_S_LATCH, RW_X_LATCH */
buf_block_t* block, /*!< in: the known page */
ulint mode, /*!< in: BUF_MAKE_YOUNG or BUF_KEEP_OLD */
const char* file, /*!< in: file name */
unsigned line, /*!< in: line where called */
mtr_t* mtr); /*!< in: mini-transaction */
/** Given a tablespace id and page number tries to get that page. If the
page is not in the buffer pool it is not loaded and NULL is returned.
@@ -431,16 +408,18 @@ the same set of mutexes or latches.
buf_page_t* buf_page_get_zip(const page_id_t page_id, ulint zip_size);
/** This is the general function used to get access to a database page.
@param[in] page_id page id
@param[in] zip_size ROW_FORMAT=COMPRESSED page size, or 0
@param[in] rw_latch RW_S_LATCH, RW_X_LATCH, RW_NO_LATCH
@param[in] guess guessed block or NULL
@param[in] mode BUF_GET, BUF_GET_IF_IN_POOL,
@param[in] page_id page id
@param[in] zip_size ROW_FORMAT=COMPRESSED page size, or 0
@param[in] rw_latch RW_S_LATCH, RW_X_LATCH, RW_NO_LATCH
@param[in] guess guessed block or NULL
@param[in] mode BUF_GET, BUF_GET_IF_IN_POOL,
BUF_PEEK_IF_IN_POOL, BUF_GET_NO_LATCH, or BUF_GET_IF_IN_POOL_OR_WATCH
@param[in] file file name of caller
@param[in] line line number of caller
@param[in,out] mtr mini-transaction
@param[out] err DB_SUCCESS or error code
@param[in] file file name
@param[in] line line where called
@param[in] mtr mini-transaction
@param[out] err DB_SUCCESS or error code
@param[in] allow_ibuf_merge whether to allow change buffer merge
while reading the page from the data file
@return pointer to the block or NULL */
buf_block_t*
buf_page_get_gen(
@@ -452,7 +431,8 @@ buf_page_get_gen(
const char* file,
unsigned line,
mtr_t* mtr,
dberr_t* err);
dberr_t* err = NULL,
bool allow_ibuf_merge = false);
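/* Illustrative sketch only (page_id, zip_size and mtr are assumed
to be in scope): fetching a secondary index leaf page with
allow_ibuf_merge=true, so that pending changes are merged as a
side effect of the access. */
dberr_t err;
buf_block_t* block = buf_page_get_gen(
page_id, zip_size, RW_S_LATCH, NULL, BUF_GET,
__FILE__, __LINE__, &mtr, &err, true);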
/** Initialize a page in the buffer pool. The page is usually not read
from a file even if it cannot be found in the buffer buf_pool. This is one
@@ -538,28 +518,36 @@ buf_block_get_freed_page_clock(
const buf_block_t* block) /*!< in: block */
MY_ATTRIBUTE((warn_unused_result));
/********************************************************************//**
Tells if a block is still close enough to the MRU end of the LRU list
/** Determine if a block is still close enough to the MRU end of the LRU list,
meaning that it is not in danger of getting evicted and also implying
that it has been accessed recently.
Note that this is for heuristics only and does not reserve the buffer
pool mutex.
@return TRUE if block is close to MRU end of LRU */
UNIV_INLINE
ibool
buf_page_peek_if_young(
/*===================*/
const buf_page_t* bpage); /*!< in: block */
/********************************************************************//**
Recommends a move of a block to the start of the LRU list if there is danger
of dropping from the buffer pool. NOTE: does not reserve the buffer pool
mutex.
@return TRUE if should be made younger */
UNIV_INLINE
ibool
buf_page_peek_if_too_old(
/*=====================*/
const buf_page_t* bpage); /*!< in: block to make younger */
@param[in] buf_pool buffer pool
@param[in] bpage buffer pool page
@return whether bpage is close to MRU end of LRU */
inline bool buf_page_peek_if_young(const buf_pool_t* buf_pool,
const buf_page_t* bpage);
/** Determine if a block should be moved to the start of the LRU list if
there is a danger of it being dropped from the buffer pool.
@param[in,out] buf_pool buffer pool
@param[in] bpage buffer pool page
@return true if bpage should be made younger */
inline bool buf_page_peek_if_too_old(buf_pool_t* buf_pool,
const buf_page_t* bpage);
/** Move a page to the start of the buffer pool LRU list if it is too old.
@param[in,out] buf_pool buffer pool
@param[in,out] bpage buffer pool page */
inline void buf_page_make_young_if_needed(buf_pool_t* buf_pool,
buf_page_t* bpage)
{
if (UNIV_UNLIKELY(buf_page_peek_if_too_old(buf_pool, bpage))) {
buf_page_make_young(bpage);
}
}
/********************************************************************//**
Gets the youngest modification log sequence number for a frame.
Returns zero if not file page or no modification occurred yet.
@@ -1175,7 +1163,10 @@ buf_page_init_for_read(
not match */
UNIV_INTERN
dberr_t
buf_page_io_complete(buf_page_t* bpage, bool dblwr = false, bool evict = false)
buf_page_io_complete(
buf_page_t* bpage,
bool dblwr = false,
bool evict = false)
MY_ATTRIBUTE((nonnull));
/********************************************************************//**
@@ -1619,6 +1610,9 @@ public:
protected by buf_pool->zip_mutex
or buf_block_t::mutex. */
# endif /* UNIV_DEBUG */
/** Change buffer entries for the page exist.
Protected by io_fix==BUF_IO_READ or by buf_block_t::lock. */
bool ibuf_exist;
void fix() { buf_fix_count++; }
uint32_t unfix()

@@ -141,21 +141,17 @@ buf_block_get_freed_page_clock(
return(buf_page_get_freed_page_clock(&block->page));
}
/********************************************************************//**
Tells if a block is still close enough to the MRU end of the LRU list
/** Determine if a block is still close enough to the MRU end of the LRU list,
meaning that it is not in danger of getting evicted and also implying
that it has been accessed recently.
Note that this is for heuristics only and does not reserve the buffer
pool mutex.
@return TRUE if block is close to MRU end of LRU */
UNIV_INLINE
ibool
buf_page_peek_if_young(
/*===================*/
const buf_page_t* bpage) /*!< in: block */
@param[in] buf_pool buffer pool
@param[in] bpage buffer pool page
@return whether bpage is close to MRU end of LRU */
inline bool buf_page_peek_if_young(const buf_pool_t* buf_pool,
const buf_page_t* bpage)
{
buf_pool_t* buf_pool = buf_pool_from_bpage(bpage);
/* FIXME: bpage->freed_page_clock is 31 bits */
return((buf_pool->freed_page_clock & ((1UL << 31) - 1))
< (bpage->freed_page_clock
@@ -164,18 +160,16 @@ buf_page_peek_if_young(
/ (BUF_LRU_OLD_RATIO_DIV * 4))));
}
/********************************************************************//**
Recommends a move of a block to the start of the LRU list if there is danger
of dropping from the buffer pool. NOTE: does not reserve the buffer pool
mutex.
@return TRUE if should be made younger */
UNIV_INLINE
ibool
buf_page_peek_if_too_old(
/*=====================*/
const buf_page_t* bpage) /*!< in: block to make younger */
/** Determine if a block should be moved to the start of the LRU list if
there is a danger of it being dropped from the buffer pool.
@param[in,out] buf_pool buffer pool
@param[in] bpage buffer pool page
@return true if bpage should be made younger */
inline bool buf_page_peek_if_too_old(buf_pool_t* buf_pool,
const buf_page_t* bpage)
{
buf_pool_t* buf_pool = buf_pool_from_bpage(bpage);
ut_ad(!buf_pool_mutex_own(buf_pool));
ut_ad(buf_page_in_file(bpage));
if (buf_pool->freed_page_clock == 0) {
/* If eviction has not started yet, do not update the
@@ -198,9 +192,9 @@ buf_page_peek_if_too_old(
}
buf_pool->stat.n_pages_not_made_young++;
return(FALSE);
return false;
} else {
return(!buf_page_peek_if_young(bpage));
return !buf_page_peek_if_young(buf_pool, bpage);
}
}

@@ -100,26 +100,6 @@ which could result in a deadlock if the OS does not support asynchronous io.
ulint
buf_read_ahead_linear(const page_id_t page_id, ulint zip_size, bool ibuf);
/********************************************************************//**
Issues read requests for pages which the ibuf module wants to read in, in
order to contract the insert buffer tree. Technically, this function is like
a read-ahead function. */
void
buf_read_ibuf_merge_pages(
/*======================*/
bool sync, /*!< in: true if the caller
wants this function to wait
for the highest address page
to get read in, before this
function returns */
const ulint* space_ids, /*!< in: array of space ids */
const ulint* page_nos, /*!< in: array of page numbers
to read, with the highest page
number the last in the
array */
ulint n_stored); /*!< in: number of elements
in the arrays */
/** Issues read requests for pages which recovery wants to read in.
@param[in] sync true if the caller wants this function to wait
for the highest address page to get read in, before this function returns

@@ -317,6 +317,11 @@ ibuf_insert(
ulint zip_size,
que_thr_t* thr);
/** Check whether buffered changes exist for a page.
@param[in] bpage buffer pool page
@return whether buffered changes exist */
bool ibuf_page_exists(const buf_page_t& bpage);
/** When an index page is read from disk to the buffer pool, this function
applies any buffered operations to the page and deletes the entries from the
insert buffer. If the page is not read, but created in the buffer pool, this
@@ -343,15 +348,10 @@ in DISCARD TABLESPACE, IMPORT TABLESPACE, or crash recovery.
void ibuf_delete_for_discarded_space(ulint space);
/** Contract the change buffer by reading pages to the buffer pool.
@param[in] full If true, do a full contraction based
on PCT_IO(100). If false, the size of contract batch is determined
based on the current size of the change buffer.
@return a lower limit for the combined size in bytes of entries which
will be merged from ibuf trees to the pages read, 0 if ibuf is
empty */
ulint
ibuf_merge_in_background(
bool full);
ulint ibuf_merge_all();
/** Contracts insert buffer trees by reading pages referring to space_id
to the buffer pool.

@@ -44,6 +44,11 @@ ibuf_mtr_start(
{
mtr_start(mtr);
mtr->enter_ibuf();
if (high_level_read_only || srv_read_only_mode) {
mtr_set_log_mode(mtr, MTR_LOG_NONE);
}
}
/***************************************************************//**
Commits an insert buffer mini-transaction. */
@@ -130,8 +135,7 @@ ibuf_should_try(
&& !dict_index_is_clust(index)
&& !dict_index_is_spatial(index)
&& index->table->quiesce == QUIESCE_NONE
&& (ignore_sec_unique || !dict_index_is_unique(index))
&& srv_force_recovery < SRV_FORCE_NO_IBUF_MERGE);
&& (ignore_sec_unique || !dict_index_is_unique(index)));
}
/******************************************************************//**

@@ -397,7 +397,6 @@ enum monitor_id_t {
MONITOR_MASTER_ACTIVE_LOOPS,
MONITOR_MASTER_IDLE_LOOPS,
MONITOR_SRV_BACKGROUND_DROP_TABLE_MICROSECOND,
MONITOR_SRV_IBUF_MERGE_MICROSECOND,
MONITOR_SRV_LOG_FLUSH_MICROSECOND,
MONITOR_SRV_MEM_VALIDATE_MICROSECOND,
MONITOR_SRV_PURGE_MICROSECOND,

@@ -257,7 +257,7 @@ recovery and open all tables in RO mode instead of RW mode. We don't
sync the max trx id to disk either. */
extern my_bool srv_read_only_mode;
/** Set if InnoDB operates in read-only mode or innodb-force-recovery
is greater than SRV_FORCE_NO_TRX_UNDO. */
is greater than SRV_FORCE_NO_IBUF_MERGE. */
extern my_bool high_level_read_only;
/** store to its own file each table created by an user; data
dictionary tables are in the system tablespace 0 */
@@ -534,10 +534,6 @@ extern ulint srv_main_idle_loops;
/** Log writes involving flush. */
extern ulint srv_log_writes_and_flush;
#if defined UNIV_DEBUG || defined UNIV_IBUF_DEBUG
extern my_bool srv_ibuf_disable_background_merge;
#endif /* UNIV_DEBUG || UNIV_IBUF_DEBUG */
#ifdef UNIV_DEBUG
extern my_bool innodb_evict_tables_on_commit_debug;
extern my_bool srv_sync_debug;

@@ -254,42 +254,37 @@ public:
{
ut_ad(mutex_own(&recv_sys.mutex));
ut_ad(recv_no_ibuf_operations);
for (map::iterator i= inits.begin(); i != inits.end(); i++) {
i->second.created = false;
for (auto &i : inits) {
i.second.created = false;
}
}
/** On the last recovery batch, merge buffered changes to those
pages that were initialized by buf_page_create() and still reside
in the buffer pool. Stale pages are not allowed in the buffer pool.
/** On the last recovery batch, mark whether buffered changes
exist for those pages that were initialized by buf_page_create()
and still reside in the buffer pool.
Note: When MDEV-14481 implements redo log apply in the
background, we will have to ensure that buf_page_get_gen()
will not deliver stale pages to users (pages on which the
change buffer was not merged yet). Normally, the change
buffer merge is performed on I/O completion. Maybe, add a
flag to buf_page_t and perform the change buffer merge on
the first actual access?
change buffer was not merged yet).
@param[in,out] mtr dummy mini-transaction */
void ibuf_merge(mtr_t& mtr)
void mark_ibuf_exist(mtr_t& mtr)
{
ut_ad(mutex_own(&recv_sys.mutex));
ut_ad(!recv_no_ibuf_operations);
mtr.start();
for (map::const_iterator i= inits.begin(); i != inits.end();
i++) {
if (!i->second.created) {
for (const auto& i : inits) {
if (!i.second.created) {
continue;
}
if (buf_block_t* block = buf_page_get_gen(
i->first, 0, RW_X_LATCH, NULL,
i.first, 0, RW_X_LATCH, NULL,
BUF_GET_IF_IN_POOL, __FILE__, __LINE__,
&mtr, NULL)) {
&mtr)) {
mutex_exit(&recv_sys.mutex);
ibuf_merge_or_delete_for_page(
block, i->first,
block->zip_size(), true);
block->page.ibuf_exist = ibuf_page_exists(
block->page);
mtr.commit();
mtr.start();
mutex_enter(&recv_sys.mutex);
@@ -1995,11 +1990,9 @@ void recv_recover_page(buf_page_t* bpage)
x-latch on it. This is needed for the operations to
the page to pass the debug checks. */
rw_lock_x_lock_move_ownership(&block->lock);
buf_block_dbg_add_level(block, SYNC_NO_ORDER_CHECK);
ibool success = buf_page_get_known_nowait(
RW_X_LATCH, block, BUF_KEEP_OLD,
__FILE__, __LINE__, &mtr);
ut_a(success);
buf_block_buf_fix_inc(block, __FILE__, __LINE__);
rw_lock_x_lock(&block->lock);
mtr.memo_push(block, MTR_MEMO_PAGE_X_FIX);
mutex_enter(&recv_sys.mutex);
if (recv_sys.apply_log_recs) {
@@ -2275,7 +2268,7 @@ done:
mlog_init.reset();
} else if (!recv_no_ibuf_operations) {
/* We skipped this in buf_page_create(). */
mlog_init.ibuf_merge(mtr);
mlog_init.mark_ibuf_exist(mtr);
}
recv_sys.apply_log_recs = false;

@@ -2029,7 +2029,7 @@ end_of_index:
block = page_cur_get_block(cur);
block = btr_block_get(
*clust_index, next_page_no,
RW_S_LATCH, &mtr);
RW_S_LATCH, false, &mtr);
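/* merge=false: this is the clustered index, and changes are
never buffered for clustered index pages. */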
btr_leaf_page_release(page_cur_get_block(cur),
BTR_SEARCH_LEAF, &mtr);

@@ -1188,11 +1188,6 @@ static monitor_info_t innodb_counter_info[] =
MONITOR_NONE,
MONITOR_DEFAULT_START, MONITOR_SRV_BACKGROUND_DROP_TABLE_MICROSECOND},
{"innodb_ibuf_merge_usec", "server",
"Time (in microseconds) spent to process change buffer merge",
MONITOR_NONE,
MONITOR_DEFAULT_START, MONITOR_SRV_IBUF_MERGE_MICROSECOND},
{"innodb_log_flush_usec", "server",
"Time (in microseconds) spent to flush log records",
MONITOR_NONE,

@@ -2172,13 +2172,6 @@ srv_master_do_active_tasks(void)
srv_main_thread_op_info = "checking free log space";
log_free_check();
/* Do an ibuf merge */
srv_main_thread_op_info = "doing insert buffer merge";
counter_time = microsecond_interval_timer();
ibuf_merge_in_background(false);
MONITOR_INC_TIME_IN_MICRO_SECS(
MONITOR_SRV_IBUF_MERGE_MICROSECOND, counter_time);
/* Flush logs if needed */
srv_main_thread_op_info = "flushing log";
srv_sync_log_buffer_in_background();
@@ -2265,13 +2258,6 @@ srv_master_do_idle_tasks(void)
srv_main_thread_op_info = "checking free log space";
log_free_check();
/* Do an ibuf merge */
counter_time = microsecond_interval_timer();
srv_main_thread_op_info = "doing insert buffer merge";
ibuf_merge_in_background(true);
MONITOR_INC_TIME_IN_MICRO_SECS(
MONITOR_SRV_IBUF_MERGE_MICROSECOND, counter_time);
if (srv_shutdown_state != SRV_SHUTDOWN_NONE) {
return;
}
@@ -2335,7 +2321,7 @@ srv_shutdown(bool ibuf_merge)
srv_main_thread_op_info = "checking free log space";
log_free_check();
srv_main_thread_op_info = "doing insert buffer merge";
n_bytes_merged = ibuf_merge_in_background(true);
n_bytes_merged = ibuf_merge_all();
/* Flush logs if needed */
srv_sync_log_buffer_in_background();

@@ -1311,7 +1311,7 @@ dberr_t srv_start(bool create_new_db)
}
high_level_read_only = srv_read_only_mode
|| srv_force_recovery > SRV_FORCE_NO_TRX_UNDO
|| srv_force_recovery > SRV_FORCE_NO_IBUF_MERGE
|| srv_sys_space.created_new_raw();
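/* srv_force_recovery=SRV_FORCE_NO_IBUF_MERGE (4) does not imply
read-only operation; only levels 5 and 6 do. */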
/* Reset the start state. */
@@ -2135,7 +2135,7 @@ files_checked:
/* Validate a few system page types that were left
uninitialized before MySQL or MariaDB 5.5. */
if (!high_level_read_only) {
ut_ad(srv_force_recovery < SRV_FORCE_NO_IBUF_MERGE);
ut_ad(srv_force_recovery <= SRV_FORCE_NO_IBUF_MERGE);
buf_block_t* block;
mtr.start();
/* Bitmap page types will be reset in
@@ -2190,7 +2190,7 @@ files_checked:
/* FIXME: Skip the following if srv_read_only_mode,
while avoiding "Allocated tablespace ID" warnings. */
if (srv_force_recovery < SRV_FORCE_NO_IBUF_MERGE) {
if (srv_force_recovery <= SRV_FORCE_NO_IBUF_MERGE) {
/* Open or Create SYS_TABLESPACES and SYS_DATAFILES
so that tablespace names and other metadata can be
found. */

@@ -230,7 +230,6 @@ innodb_activity_count server 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NU
innodb_master_active_loops server 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of times master thread performs its tasks when server is active
innodb_master_idle_loops server 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of times master thread performs its tasks when server is idle
innodb_background_drop_table_usec server 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Time (in microseconds) spent to process drop table list
innodb_ibuf_merge_usec server 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Time (in microseconds) spent to process change buffer merge
innodb_log_flush_usec server 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Time (in microseconds) spent to flush log records
innodb_mem_validate_usec server 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Time (in microseconds) spent to do memory validation
innodb_master_purge_usec server 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Time (in microseconds) spent by master thread to purge records