mirror of
https://github.com/MariaDB/server.git
synced 2025-07-29 05:21:33 +03:00
MDEV-33118 optimizer_adjust_secondary_key_costs variable
optimizer_adjust_secondary_key_costs is added to provide 2 small
adjustments to the 10.x optimizer cost model. This can be used in the
case where the optimizer wrongly uses a secondary key instead of a
clustered primary key.

The reason behind this change is that MariaDB 10.x does not take into
account that, for engines like InnoDB, scanning a primary key can be up
to 7x faster than scanning a secondary key and reading the row data
through the primary key.

The different values for optimizer_adjust_secondary_key_costs are:

optimizer_adjust_secondary_key_costs=0
- No changes to current model

optimizer_adjust_secondary_key_costs=1
- Ensure that the cost of secondary indexes is at least 5x the cost of
  a clustered primary key (if one exists).
  This disables part of the worst_seek optimization described below.

optimizer_adjust_secondary_key_costs=2
- Disable "worst_seek optimization" and adjust filter cost slightly
  (add cost of 1 if filter is used).

The idea behind 'worst_seek optimization' is that we limit the cost for
all non clustered ref access to the least of:
- best-rows-by-range (or all rows if no range found) / 10
- scan-time-table (roughly number of file blocks to scan table) * 3

In addition we also do not try to use rowid_filter if number of rows
estimated for 'ref' access is less than the worst_seek limitation.

The idea is that worst_seek is trying to take into account that if we
do a lot of accesses through a key, this is likely to be cached.
However it only does this for secondary keys, and not for clustered
keys or index only reads.

The effects of worst_seek are:
- In some cases 'ref' will have a much lower cost than range or using a
  clustered key.
- Some possible rowid filters for secondary keys will be ignored.

When implementing optimizer_adjust_secondary_key_costs=2, I noticed
that there are slightly different costs for how ref+filter and
range+filter are calculated. This caused a lot of range and
range+filter to change to ref+filter, which is not good as range+filter
provides the optimizer a better estimate of how many accepted rows
there will be in the result set.
Adding an extra small cost (1 seek) when using filter mitigated the
above problems in almost all cases.

This patch should not be applied to MariaDB 11.0 as worst_seeks is
removed in 11.0 and the cost calculation for clustered keys, secondary
keys, index scan and filter is more exact.

Test case changes for optimizer_adjust_secondary_key_costs=1
(Fix secondary key costs to be 5x of primary key):

- stat_tables_innodb:
  - Complex change (probably ok as number of rows are really small)
    - ref over 1 row changed to range over 10 rows with join buffer
    - ref over 5 rows changed to eq_ref
    - secondary ref over 1 row changed to ref of primary key over 4 rows
  - Change of key to use longer key with index pushdown (a little bit
    worse but not significant).
  - Change to use secondary (1 row) -> primary (4 rows)
- rowid_filter_innodb:
  - index_merge (2 rows) & ref (1) -> all (23 rows) -> primary eq_ref.

Test case changes for optimizer_adjust_secondary_key_costs=2
(remove of worst_seeks & adjust filter cost):

- stat_tables_innodb:
  - Join order change (probably ok as number of rows are really small)
    - ref (5 rows) & ref(1 row) changed to range (10 rows & join buffer)
      & eq_ref.
- selectivity_innodb:
  - ref -> ref|filter (ok)
- rowid_filter_innodb:
  - ref -> ref|filter (ok)
  - range|filter (64 rows) changed to ref|filter (128 rows).
    ok as ref|filter outputs wrong number of rows in explain.
- range, range_mrr_icp:
  - ref (500 rows) -> ALL (1000 rows) (ok)
- select_pkeycache, select, select_jcl6:
  - ref|filter (2 rows) -> ref (2 rows) (ok)
- selectivity:
  - ref -> ref_filter (ok)
- range:
  - Change of 'filtered' but no stat or plan change (ok)
- selectivity:
  - ref -> ref+filter (ok)
  - Change of filtered but no plan change (ok)
- join_nested_jcl6:
  - range -> ref|filter (ok as only 2 rows)
- subselect3, subselect3_jcl6:
  - ref_or_null (4 rows) -> ALL (10 rows) (ok)
  - Index_subquery (4 rows) -> ALL (10 rows) (ok)
- partition_mrr_myisam, partition_mrr_aria and partition_mrr_innodb:
  - Uses ALL instead of REF for a key value that is the same for > 50%
    of rows. (good)
- order_by_innodb:
  - range (200 rows) -> ref (20 rows)+filesort (ok)
- subselect_sj2_mat:
  - One test changed. One ALL removed and replaced with eq_ref.
    Likely to be better.
- join_cache:
  - Changed ref over 60% of the rows to use hash join (ok)
- opt_tvc:
  - Changed to use eq_ref instead of ref with plan change (probably ok)
- opt_trace:
  - No worst/max seeks clipping (good).
  - Almost double range_scan_time and index_scan_time (ok).
- rowid_filter:
  - ref -> ref|filtered (ok)
  - range|filter (77 rows) changed to ref|filter (151 rows).
    Probably ok as ref|filter outputs wrong number of rows in explain.

Reviewer: Sergei Petrunia <sergey@mariadb.com>
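The worst_seek limit described in the commit message can be sketched as a small stand-alone function. This is only an illustrative model: the function name `worst_seeks_limit` and its parameters are hypothetical, and the floor of 2.0 for small tables and the 2^53 "no limit" value are taken from the `make_join_statistics` hunk of this commit.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-alone model of the worst_seek limit; the names are
// illustrative and this is not the server's code.
//   found_records: best row estimate by range (or all rows if no range found)
//   scan_time:     table scan time (roughly the number of file blocks)
//   adjust_mode:   value of optimizer_adjust_secondary_key_costs (0, 1 or 2)
double worst_seeks_limit(double found_records, double scan_time,
                         unsigned adjust_mode)
{
  assert(found_records >= 0.0 && scan_time >= 0.0);
  if (adjust_mode == 2)
    return static_cast<double>(1ULL << 53);  // 2^53, as in the patch: effectively no limit
  // Least of (rows / 10) and (scan time * 3) ...
  double limit = std::min(found_records / 10.0, scan_time * 3.0);
  // ... but never below 2.0 (the "fix for small tables").
  return std::max(limit, 2.0);
}
```

With 1000 estimated rows and a scan time of 50, modes 0 and 1 give a limit of 100 seeks, while mode 2 lifts the limit entirely, so 'ref' access through a secondary key is costed for all of its estimated rows.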
@@ -712,6 +712,13 @@ The following specify which files/extra groups are read (specified before remain
                       max_connections*5 or max_connections + table_cache*2
                       (whichever is larger) number of file descriptors
                       (Automatically configured unless set explicitly)
+  --optimizer-adjust-secondary-key-costs=#
+                      0 = No changes. 1 = Update secondary key costs for ranges
+                      to be at least 5x of clustered primary key costs. 2 =
+                      Remove 'max_seek optimization' for secondary keys and
+                      slight adjustment of filter cost. This option will be
+                      deleted in MariaDB 11.0 as it is not needed with the new
+                      11.0 optimizer.
   --optimizer-max-sel-arg-weight=#
                       The maximum weight of the SEL_ARG graph. Set to 0 for no
                       limit
@@ -1684,6 +1691,7 @@ old-alter-table DEFAULT
 old-mode UTF8_IS_UTF8MB3
 old-passwords FALSE
 old-style-user-limits FALSE
+optimizer-adjust-secondary-key-costs 0
 optimizer-max-sel-arg-weight 32000
 optimizer-max-sel-args 16000
 optimizer-prune-level 1

82
mysql-test/main/secondary_key_costs.result
Normal file
@@ -0,0 +1,82 @@
+create table t1 (
+pk int primary key auto_increment,
+nm varchar(32),
+fl1 tinyint default 0,
+fl2 tinyint default 0,
+index idx1(nm, fl1),
+index idx2(fl2)
+) engine=myisam;
+create table name (
+pk int primary key auto_increment,
+nm bigint
+) engine=myisam;
+create table flag2 (
+pk int primary key auto_increment,
+fl2 tinyint
+) engine=myisam;
+insert into name(nm) select seq from seq_1_to_1000 order by rand(17);
+insert into flag2(fl2) select seq mod 2 from seq_1_to_1000 order by rand(19);
+insert into t1(nm,fl2)
+select nm, fl2 from name, flag2 where name.pk = flag2.pk;
+analyze table t1 persistent for all;
+Table Op Msg_type Msg_text
+test.t1 analyze status Engine-independent statistics collected
+test.t1 analyze status Table is already up to date
+set optimizer_trace="enabled=on";
+set optimizer_switch='rowid_filter=on';
+set statement optimizer_adjust_secondary_key_costs=0 for
+explain select * from t1 where nm like '500%' AND fl2 = 0;
+id select_type table type possible_keys key key_len ref rows Extra
+1 SIMPLE t1 range idx1,idx2 idx1 35 NULL 1 Using index condition; Using where
+set @trace=(select trace from information_schema.optimizer_trace);
+select json_detailed(json_extract(@trace, '$**.considered_access_paths'));
+json_detailed(json_extract(@trace, '$**.considered_access_paths'))
+[
+    [
+        {
+            "access_type": "ref",
+            "index": "idx2",
+            "used_range_estimates": true,
+            "rowid_filter_skipped": "worst/max seeks clipping",
+            "rows": 492,
+            "cost": 492.3171406,
+            "chosen": true
+        },
+        {
+            "access_type": "range",
+            "resulting_rows": 0.492,
+            "cost": 1.448699097,
+            "chosen": true
+        }
+    ]
+]
+
+The following trace should have a different rowid_filter_key cost
+
+set statement optimizer_adjust_secondary_key_costs=2 for
+explain select * from t1 where nm like '500%' AND fl2 = 0;
+id select_type table type possible_keys key key_len ref rows Extra
+1 SIMPLE t1 range idx1,idx2 idx1 35 NULL 1 Using index condition; Using where
+set @trace=(select trace from information_schema.optimizer_trace);
+select json_detailed(json_extract(@trace, '$**.considered_access_paths'));
+json_detailed(json_extract(@trace, '$**.considered_access_paths'))
+[
+    [
+        {
+            "access_type": "ref",
+            "index": "idx2",
+            "used_range_estimates": true,
+            "rowid_filter_key": "idx1",
+            "rows": 492,
+            "cost": 3.814364688,
+            "chosen": true
+        },
+        {
+            "access_type": "range",
+            "resulting_rows": 0.492,
+            "cost": 1.448699097,
+            "chosen": true
+        }
+    ]
+]
+drop table t1, name, flag2;

53
mysql-test/main/secondary_key_costs.test
Normal file
@@ -0,0 +1,53 @@
+--source include/have_sequence.inc
+--source include/not_embedded.inc
+
+#
+# Show the costs for rowid filter
+#
+
+create table t1 (
+pk int primary key auto_increment,
+nm varchar(32),
+fl1 tinyint default 0,
+fl2 tinyint default 0,
+index idx1(nm, fl1),
+index idx2(fl2)
+) engine=myisam;
+
+create table name (
+pk int primary key auto_increment,
+nm bigint
+) engine=myisam;
+
+create table flag2 (
+pk int primary key auto_increment,
+fl2 tinyint
+) engine=myisam;
+
+insert into name(nm) select seq from seq_1_to_1000 order by rand(17);
+insert into flag2(fl2) select seq mod 2 from seq_1_to_1000 order by rand(19);
+
+insert into t1(nm,fl2)
+select nm, fl2 from name, flag2 where name.pk = flag2.pk;
+
+analyze table t1 persistent for all;
+
+--disable_ps_protocol
+set optimizer_trace="enabled=on";
+set optimizer_switch='rowid_filter=on';
+set statement optimizer_adjust_secondary_key_costs=0 for
+explain select * from t1 where nm like '500%' AND fl2 = 0;
+set @trace=(select trace from information_schema.optimizer_trace);
+select json_detailed(json_extract(@trace, '$**.considered_access_paths'));
+
+--echo
+--echo The following trace should have a different rowid_filter_key cost
+--echo
+set statement optimizer_adjust_secondary_key_costs=2 for
+explain select * from t1 where nm like '500%' AND fl2 = 0;
+set @trace=(select trace from information_schema.optimizer_trace);
+select json_detailed(json_extract(@trace, '$**.considered_access_paths'));
+
+--enable_ps_protocol
+
+drop table t1, name, flag2;

@@ -2272,6 +2272,16 @@ NUMERIC_BLOCK_SIZE 1
 ENUM_VALUE_LIST NULL
 READ_ONLY YES
 COMMAND_LINE_ARGUMENT REQUIRED
+VARIABLE_NAME OPTIMIZER_ADJUST_SECONDARY_KEY_COSTS
+VARIABLE_SCOPE SESSION
+VARIABLE_TYPE BIGINT UNSIGNED
+VARIABLE_COMMENT 0 = No changes. 1 = Update secondary key costs for ranges to be at least 5x of clustered primary key costs. 2 = Remove 'max_seek optimization' for secondary keys and slight adjustment of filter cost. This option will be deleted in MariaDB 11.0 as it is not needed with the new 11.0 optimizer.
+NUMERIC_MIN_VALUE 0
+NUMERIC_MAX_VALUE 2
+NUMERIC_BLOCK_SIZE 1
+ENUM_VALUE_LIST NULL
+READ_ONLY NO
+COMMAND_LINE_ARGUMENT REQUIRED
 VARIABLE_NAME OPTIMIZER_MAX_SEL_ARGS
 VARIABLE_SCOPE SESSION
 VARIABLE_TYPE BIGINT UNSIGNED

@@ -2432,6 +2432,16 @@ NUMERIC_BLOCK_SIZE 1
 ENUM_VALUE_LIST NULL
 READ_ONLY YES
 COMMAND_LINE_ARGUMENT REQUIRED
+VARIABLE_NAME OPTIMIZER_ADJUST_SECONDARY_KEY_COSTS
+VARIABLE_SCOPE SESSION
+VARIABLE_TYPE BIGINT UNSIGNED
+VARIABLE_COMMENT 0 = No changes. 1 = Update secondary key costs for ranges to be at least 5x of clustered primary key costs. 2 = Remove 'max_seek optimization' for secondary keys and slight adjustment of filter cost. This option will be deleted in MariaDB 11.0 as it is not needed with the new 11.0 optimizer.
+NUMERIC_MIN_VALUE 0
+NUMERIC_MAX_VALUE 2
+NUMERIC_BLOCK_SIZE 1
+ENUM_VALUE_LIST NULL
+READ_ONLY NO
+COMMAND_LINE_ARGUMENT REQUIRED
 VARIABLE_NAME OPTIMIZER_MAX_SEL_ARGS
 VARIABLE_SCOPE SESSION
 VARIABLE_TYPE BIGINT UNSIGNED

@@ -698,7 +698,6 @@ typedef struct system_variables
   ulonglong max_statement_time;
   ulonglong optimizer_switch;
   ulonglong optimizer_trace;
-  ulong optimizer_trace_max_mem_size;
   sql_mode_t sql_mode; ///< which non-standard SQL behaviour should be enabled
   sql_mode_t old_behavior; ///< which old SQL behaviour should be enabled
   ulonglong option_bits; ///< OPTION_xxx constants, e.g. OPTION_PROFILING
@@ -761,6 +760,8 @@ typedef struct system_variables
   ulong optimizer_use_condition_selectivity;
   ulong optimizer_max_sel_arg_weight;
   ulong optimizer_max_sel_args;
+  ulong optimizer_trace_max_mem_size;
+  ulong optimizer_adjust_secondary_key_costs;
   ulong use_stat_tables;
   double sample_percentage;
   ulong histogram_size;

@@ -5895,11 +5895,15 @@ make_join_statistics(JOIN *join, List<TABLE_LIST> &tables_list,
       This is can't be to high as otherwise we are likely to use
       table scan.
     */
-    s->worst_seeks= MY_MIN((double) s->found_records / 10,
-                           (double) s->read_time*3);
-    if (s->worst_seeks < 2.0) // Fix for small tables
-      s->worst_seeks=2.0;
-
+    /* Largest integer that can be stored in double (no compiler warning) */
+    s->worst_seeks= (double) (1ULL << 53);
+    if (thd->variables.optimizer_adjust_secondary_key_costs != 2)
+    {
+      s->worst_seeks= MY_MIN((double) s->found_records / 10,
+                             (double) s->read_time*3);
+      if (s->worst_seeks < 2.0) // Fix for small tables
+        s->worst_seeks=2.0;
+    }
     /*
       Add to stat->const_keys those indexes for which all group fields or
       all select distinct fields participate in one index.
@@ -7817,8 +7821,27 @@ double cost_for_index_read(const THD *thd, const TABLE *table, uint key,
   if (table->covering_keys.is_set(key))
     cost= file->keyread_time(key, 1, records);
   else
+  {
     cost= ((file->keyread_time(key, 0, records) +
             file->read_time(key, 1, MY_MIN(records, worst_seeks))));
+    if (thd->variables.optimizer_adjust_secondary_key_costs == 1 &&
+        file->is_clustering_key(0))
+    {
+      /*
+        According to benchmarks done in 11.0 to calculate the new cost
+        model secondary key ranges are about 7x slower than primary
+        key ranges for big tables. Here we are a bit conservative and
+        only calculate with 5x. The reason for having it only 5x and
+        not for example 7x is is that choosing plans with more rows
+        that are read (ignored by the WHERE clause) causes the 10.x
+        optimizer to believe that there are more rows in the result
+        set, which can cause problems in finding the best join order.
+        Note: A clustering primary key is always key 0.
+      */
+      double clustering_key_cost= file->read_time(0, 1, records);
+      cost= MY_MAX(cost, clustering_key_cost * 5);
+    }
+  }

   DBUG_PRINT("statistics", ("cost: %.3f", cost));
   DBUG_RETURN(cost);
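The effect of the `optimizer_adjust_secondary_key_costs == 1` branch in `cost_for_index_read` can be modelled in isolation as a simple cost floor. The sketch below is hypothetical: `floor_secondary_key_cost` and its parameters are illustrative names, and the two cost inputs stand in for the values the server obtains from `keyread_time()` and `read_time()`.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical model of the mode-1 adjustment; not the server's code.
//   secondary_cost:      cost already computed for the secondary key read
//   clustering_key_cost: cost of reading the same rows via the clustered PK
//   has_clustering_key:  whether key 0 is a clustering primary key
//   adjust_mode:         value of optimizer_adjust_secondary_key_costs
double floor_secondary_key_cost(double secondary_cost,
                                double clustering_key_cost,
                                bool has_clustering_key,
                                unsigned adjust_mode)
{
  assert(secondary_cost >= 0.0 && clustering_key_cost >= 0.0);
  if (adjust_mode == 1 && has_clustering_key)
    return std::max(secondary_cost, clustering_key_cost * 5.0);  // at least 5x the PK cost
  return secondary_cost;  // modes 0 and 2 leave this cost untouched
}
```

For example, a secondary key cost of 3.0 against a clustered key cost of 2.0 is raised to 10.0 in mode 1, while a secondary key cost of 30.0 is already above the floor and stays unchanged.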
@@ -7992,6 +8015,14 @@ best_access_path(JOIN *join,
   double keyread_tmp= 0;
   ha_rows rec;
   bool best_uses_jbuf= FALSE;
+  /*
+    if optimizer_use_condition_selectivity adjust filter cost to be slightly
+    higher to ensure that ref|filter is not less than range over same
+    number of rows
+  */
+  double filter_setup_cost= (thd->variables.
+                             optimizer_adjust_secondary_key_costs == 2 ?
+                             1.0 : 0.0);
   MY_BITMAP *eq_join_set= &s->table->eq_join_set;
   KEYUSE *hj_start_key= 0;
   SplM_plan_info *spl_plan= 0;
@@ -8548,6 +8579,7 @@ best_access_path(JOIN *join,
               type == JT_EQ_REF ? 0.5 * tmp : MY_MIN(tmp, keyread_tmp);
             double access_cost_factor= MY_MIN((tmp - key_access_cost) / rows, 1.0);

+
             if (!(records < s->worst_seeks &&
                   records <= thd->variables.max_seeks_for_key))
             {
@@ -8564,7 +8596,9 @@ best_access_path(JOIN *join,
             }
             if (filter)
             {
-              tmp-= filter->get_adjusted_gain(rows) - filter->get_cmp_gain(rows);
+              tmp-= (filter->get_adjusted_gain(rows) -
+                     filter->get_cmp_gain(rows) -
+                     filter_setup_cost);
               DBUG_ASSERT(tmp >= 0);
               trace_access_idx.add("rowid_filter_key",
                                    table->key_info[filter->key_no].name);
@@ -8810,7 +8844,7 @@ best_access_path(JOIN *join,
                                       access_cost_factor);
       if (filter)
       {
-        tmp-= filter->get_adjusted_gain(rows);
+        tmp-= filter->get_adjusted_gain(rows) - filter_setup_cost;
         DBUG_ASSERT(tmp >= 0);
       }

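The two filter-related changes in `best_access_path` subtract the filter's gain from the access cost, minus a setup cost of 1.0 when the variable is set to 2. A minimal stand-alone sketch of that arithmetic follows. All function names are hypothetical, the gain values are plain numbers standing in for `get_adjusted_gain()` and `get_cmp_gain()`, and describing the second hunk as the non-ref path is an assumption based on the commit message.

```cpp
#include <cassert>

// Hypothetical model of the filter cost adjustment; not the server's code.
double filter_setup_cost(unsigned adjust_mode)
{
  // One extra "seek" is charged only when the variable is set to 2.
  return adjust_mode == 2 ? 1.0 : 0.0;
}

// Models the first changed hunk (ref access with a rowid filter):
//   tmp -= adjusted_gain - cmp_gain - filter_setup_cost
double ref_cost_with_filter(double cost, double adjusted_gain,
                            double cmp_gain, unsigned adjust_mode)
{
  double result = cost - (adjusted_gain - cmp_gain -
                          filter_setup_cost(adjust_mode));
  assert(result >= 0.0);  // mirrors the DBUG_ASSERT in the patch
  return result;
}

// Models the second changed hunk (assumed to be the non-ref path):
//   tmp -= adjusted_gain - filter_setup_cost
double scan_cost_with_filter(double cost, double adjusted_gain,
                             unsigned adjust_mode)
{
  double result = cost - (adjusted_gain - filter_setup_cost(adjust_mode));
  assert(result >= 0.0);
  return result;
}
```

With a base cost of 10, an adjusted gain of 4 and a comparison gain of 1, the ref path costs 7 in modes 0 and 1 and 8 in mode 2, which is the small penalty the commit message describes as keeping ref|filter from undercutting range over the same number of rows.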
@@ -2817,6 +2817,19 @@ static Sys_var_ulong Sys_optimizer_trace_max_mem_size(
        SESSION_VAR(optimizer_trace_max_mem_size), CMD_LINE(REQUIRED_ARG),
        VALID_RANGE(0, ULONG_MAX), DEFAULT(1024 * 1024), BLOCK_SIZE(1));

+static Sys_var_ulong Sys_optimizer_adjust_secondary_key_costs(
+       "optimizer_adjust_secondary_key_costs",
+       "0 = No changes. "
+       "1 = Update secondary key costs for ranges to be at least 5x of clustered "
+       "primary key costs. "
+       "2 = Remove 'max_seek optimization' for secondary keys and slight "
+       "adjustment of filter cost. "
+       "This option will be deleted in MariaDB 11.0 as it is not needed with the "
+       "new 11.0 optimizer.",
+       SESSION_VAR(optimizer_adjust_secondary_key_costs), CMD_LINE(REQUIRED_ARG),
+       VALID_RANGE(0, 2), DEFAULT(0), BLOCK_SIZE(1));
+
+
 static Sys_var_charptr_fscs Sys_pid_file(
        "pid_file", "Pid file used by safe_mysqld",
        READ_ONLY GLOBAL_VAR(pidfile_name_ptr), CMD_LINE(REQUIRED_ARG),