From 6f65e08277a9e65c193367e18c5e7bf68f68e3d2 Mon Sep 17 00:00:00 2001
From: Monty <monty@mariadb.org>
Date: Fri, 22 Dec 2023 13:53:57 +0200
Subject: [PATCH] MDEV-33118 optimizer_adjust_secondary_key_costs variable

optimizer-adjust_secondary_key_costs is added to provide 2 small
adjustments to the 10.x optimizer cost model. This can be used in the
case where the optimizer wrongly uses a secondary key instead of a
clustered primary key.

The reason behind this change is that MariaDB 10.x does not take into
account that for engines like InnoDB, that scanning a primary key can be
up to 7x faster than scanning a secondary key + read the row data trough
the primary key.

The different values for optimizer_adjust_secondary_key_costs are:

optimizer_adjust_secondary_key_costs=0
- No changes to current model

optimizer_adjust_secondary_key_costs=1
- Ensure that the cost of of secondary indexes has a cost of at
  least 5x times the cost of a clustered primary key (if one exists).
  This disables part of the worst_seek optimization described below.

optimizer_adjust_secondary_key_costs=2
- Disable "worst_seek optimization" and adjust filter cost slightly
  (add cost of 1 if filter is used).

The idea behind 'worst_seek optimization' is that we limit the
cost for all non clustered ref access to the least of:
- best-rows-by-range (or all rows in no range found) / 10
- scan-time-table (roughly number of file blocks to scan table) * 3

In addition we also do not try to use rowid_filter if number of rows
estimated for 'ref' access is less than the worst_seek limitation.

The idea is that worst_seek is trying to take into account that if
we do a lot of accesses through a key, this is likely to be cached.
However it only does this for secondary keys, and not for clustered
keys or index only reads.

The effect of the worst_seek are:
- In some cases 'ref' will have a much lower cost than range or using
  a clustered key.
- Some possible rowid filters for secondary keys will be ignored.

When implementing optimizer_adjust_secondary_key_costs=2, I noticed
that there is a slightly different costs for how ref+filter and
range+filter are calculated.  This caused a lot of range and
range+filter to change to ref+filter, which is not good as
range+filter provides the optimizer a better estimate of how many
accepted rows there will be in the result set.
Adding a extra small cost (1 seek) when using filter mitigated the
above problems in almost all cases.

This patch should not be applied to MariaDB 11.0 as worst_seeks is
removed in 11.0 and the cost calculation for clustered keys, secondary
keys, index scan and filter is more exact.

Test case changes for --optimizer-adjust_secondary_key_costs=1
(Fix secondary key costs to be 5x of primary key):

- stat_tables_innodb:
  - Complex change (probably ok as number of rows are really small)
    - ref over 1 row changed to range over 10 rows with join buffer
    - ref over 5 rows changed to eq_ref
    - secondary ref over 1 row changed to ref of primary key over 4 rows
    - Change of key to use longer key with index pushdown (a little
      bit worse but not significant).
  - Change to use secondary (1 row) -> primary (4 rows)
- rowid_filter_innodb:
  - index_merge (2 rows) & ref (1) -> all (23 rows) -> primary eq_ref.

Test case changes for --optimizer-adjust_secondary_key_costs=2
(remove of worst_seeks & adjust filter cost):

- stat_tables_innodb:
  - Join order change (probably ok as number of rows are really small)
  - ref (5 rows) & ref(1 row) changed to range (10 rows & join buffer)
    & eq_ref.
- selectivity_innodb:
  - ref -> ref|filter  (ok)
- rowid_filter_innodb:
  - ref -> ref|filter (ok)
  - range|filter (64 rows) changed to ref|filter (128 rows).
    ok as ref|filter outputs wrong number of rows in explain.
- range, range_mrr_icp:
  -ref (500 rows -> ALL (1000 rows) (ok)
- select_pkeycache, select, select_jcl6:
  - ref|filter (2 rows) -> ref (2 rows) (ok)
- selectivity:
  - ref -> ref_filter (ok)
- range:
  - Change of 'filtered' but no stat or plan change (ok)
- selectivity:
 - ref -> ref+filter (ok)
 - Change of filtered but no plan change (ok)
- join_nested_jcl6:
  - range -> ref|filter (ok as only 2 rows)
- subselect3, subselect3_jcl6:
  - ref_or_null (4 rows) -> ALL (10 rows) (ok)
  - Index_subquery (4 rows) -> ALL (10 rows)  (ok)
- partition_mrr_myisam, partition_mrr_aria and partition_mrr_innodb:
  - Uses ALL instead of REF for a key value that is the same for > 50%
    of rows.  (good)
order_by_innodb:
  - range (200 rows) -> ref (20 rows)+filesort (ok)
- subselect_sj2_mat:
  - One test changed. One ALL removed and replaced with eq_ref. Likely
    to be better.
- join_cache:
  - Changed ref over 60% of the rows to use hash join (ok)
- opt_tvc:
  - Changed to use eq_ref instead of ref with plan change (probably ok)
- opt_trace:
  - No worst/max seeks clipping (good).
  - Almost double range_scan_time and index_scan_time (ok).
- rowid_filter:
  - ref -> ref|filtered (ok)
  - range|filter (77 rows) changed to ref|filter (151 rows).  Proably
    ok as ref|filter outputs wrong number of rows in explain.

Reviewer: Sergei Petrunia <sergey@mariadb.com>
---
 mysql-test/main/mysqld--help.result           |  8 ++
 mysql-test/main/secondary_key_costs.result    | 82 +++++++++++++++++++
 mysql-test/main/secondary_key_costs.test      | 53 ++++++++++++
 .../sys_vars/r/sysvars_server_embedded.result | 10 +++
 .../r/sysvars_server_notembedded.result       | 10 +++
 sql/sql_class.h                               |  3 +-
 sql/sql_select.cc                             | 48 +++++++++--
 sql/sys_vars.cc                               | 13 +++
 8 files changed, 219 insertions(+), 8 deletions(-)
 create mode 100644 mysql-test/main/secondary_key_costs.result
 create mode 100644 mysql-test/main/secondary_key_costs.test

diff --git a/mysql-test/main/mysqld--help.result b/mysql-test/main/mysqld--help.result
index 219bad22528..d8a8fe449b6 100644
--- a/mysql-test/main/mysqld--help.result
+++ b/mysql-test/main/mysqld--help.result
@@ -712,6 +712,13 @@ The following specify which files/extra groups are read (specified before remain
  max_connections*5 or max_connections + table_cache*2
  (whichever is larger) number of file descriptors
  (Automatically configured unless set explicitly)
+ --optimizer-adjust-secondary-key-costs=# 
+ 0 = No changes. 1 = Update secondary key costs for ranges
+ to be at least 5x of clustered primary key costs. 2 =
+ Remove 'max_seek optimization' for secondary keys and
+ slight adjustment of filter cost. This option will be
+ deleted in MariaDB 11.0 as it is not needed with the new
+ 11.0 optimizer.
  --optimizer-max-sel-arg-weight=# 
  The maximum weight of the SEL_ARG graph. Set to 0 for no
  limit
@@ -1684,6 +1691,7 @@ old-alter-table DEFAULT
 old-mode UTF8_IS_UTF8MB3
 old-passwords FALSE
 old-style-user-limits FALSE
+optimizer-adjust-secondary-key-costs 0
 optimizer-max-sel-arg-weight 32000
 optimizer-max-sel-args 16000
 optimizer-prune-level 1
diff --git a/mysql-test/main/secondary_key_costs.result b/mysql-test/main/secondary_key_costs.result
new file mode 100644
index 00000000000..55c84705abf
--- /dev/null
+++ b/mysql-test/main/secondary_key_costs.result
@@ -0,0 +1,82 @@
+create table t1 (
+pk int primary key auto_increment,
+nm varchar(32),
+fl1 tinyint default 0,
+fl2 tinyint default 0,
+index idx1(nm, fl1),
+index idx2(fl2)
+) engine=myisam;
+create table name (
+pk int primary key auto_increment,
+nm bigint
+) engine=myisam;
+create table flag2 (
+pk int primary key auto_increment,
+fl2 tinyint
+) engine=myisam;
+insert into name(nm) select seq from seq_1_to_1000 order by rand(17);
+insert into flag2(fl2) select seq mod 2 from seq_1_to_1000 order by rand(19);
+insert into t1(nm,fl2)
+select nm, fl2 from name, flag2 where name.pk = flag2.pk;
+analyze table t1 persistent for all;
+Table	Op	Msg_type	Msg_text
+test.t1	analyze	status	Engine-independent statistics collected
+test.t1	analyze	status	Table is already up to date
+set optimizer_trace="enabled=on";
+set optimizer_switch='rowid_filter=on';
+set statement optimizer_adjust_secondary_key_costs=0 for
+explain select * from t1  where nm like '500%' AND fl2 = 0;
+id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
+1	SIMPLE	t1	range	idx1,idx2	idx1	35	NULL	1	Using index condition; Using where
+set @trace=(select trace from information_schema.optimizer_trace);
+select json_detailed(json_extract(@trace, '$**.considered_access_paths'));
+json_detailed(json_extract(@trace, '$**.considered_access_paths'))
+[
+    [
+        {
+            "access_type": "ref",
+            "index": "idx2",
+            "used_range_estimates": true,
+            "rowid_filter_skipped": "worst/max seeks clipping",
+            "rows": 492,
+            "cost": 492.3171406,
+            "chosen": true
+        },
+        {
+            "access_type": "range",
+            "resulting_rows": 0.492,
+            "cost": 1.448699097,
+            "chosen": true
+        }
+    ]
+]
+
+The following trace should have a different rowid_filter_key cost
+
+set statement optimizer_adjust_secondary_key_costs=2 for
+explain select * from t1  where nm like '500%' AND fl2 = 0;
+id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
+1	SIMPLE	t1	range	idx1,idx2	idx1	35	NULL	1	Using index condition; Using where
+set @trace=(select trace from information_schema.optimizer_trace);
+select json_detailed(json_extract(@trace, '$**.considered_access_paths'));
+json_detailed(json_extract(@trace, '$**.considered_access_paths'))
+[
+    [
+        {
+            "access_type": "ref",
+            "index": "idx2",
+            "used_range_estimates": true,
+            "rowid_filter_key": "idx1",
+            "rows": 492,
+            "cost": 3.814364688,
+            "chosen": true
+        },
+        {
+            "access_type": "range",
+            "resulting_rows": 0.492,
+            "cost": 1.448699097,
+            "chosen": true
+        }
+    ]
+]
+drop table t1, name, flag2;
diff --git a/mysql-test/main/secondary_key_costs.test b/mysql-test/main/secondary_key_costs.test
new file mode 100644
index 00000000000..d3db137653b
--- /dev/null
+++ b/mysql-test/main/secondary_key_costs.test
@@ -0,0 +1,53 @@
+--source include/have_sequence.inc
+--source include/not_embedded.inc
+
+#
+# Show the costs for rowid filter
+#
+
+create table t1 (
+  pk int primary key auto_increment,
+  nm varchar(32),
+  fl1 tinyint default 0,
+  fl2 tinyint default 0,
+  index idx1(nm, fl1),
+  index idx2(fl2)
+) engine=myisam;
+
+create table name (
+  pk int primary key auto_increment,
+  nm bigint
+) engine=myisam;
+
+create table flag2 (
+  pk int primary key auto_increment,
+  fl2 tinyint
+) engine=myisam;
+
+insert into name(nm) select seq from seq_1_to_1000 order by rand(17);
+insert into flag2(fl2) select seq mod 2 from seq_1_to_1000 order by rand(19);
+
+insert into t1(nm,fl2)
+  select nm, fl2 from name, flag2 where name.pk = flag2.pk;
+
+analyze table t1 persistent for all;
+
+--disable_ps_protocol
+set optimizer_trace="enabled=on";
+set optimizer_switch='rowid_filter=on';
+set statement optimizer_adjust_secondary_key_costs=0 for
+explain select * from t1  where nm like '500%' AND fl2 = 0;
+set @trace=(select trace from information_schema.optimizer_trace);
+select json_detailed(json_extract(@trace, '$**.considered_access_paths'));
+
+--echo
+--echo The following trace should have a different rowid_filter_key cost
+--echo
+set statement optimizer_adjust_secondary_key_costs=2 for
+explain select * from t1  where nm like '500%' AND fl2 = 0;
+set @trace=(select trace from information_schema.optimizer_trace);
+select json_detailed(json_extract(@trace, '$**.considered_access_paths'));
+
+--enable_ps_protocol
+
+drop table t1, name, flag2;
diff --git a/mysql-test/suite/sys_vars/r/sysvars_server_embedded.result b/mysql-test/suite/sys_vars/r/sysvars_server_embedded.result
index 8e50d8e7268..aa88a8f74a8 100644
--- a/mysql-test/suite/sys_vars/r/sysvars_server_embedded.result
+++ b/mysql-test/suite/sys_vars/r/sysvars_server_embedded.result
@@ -2272,6 +2272,16 @@ NUMERIC_BLOCK_SIZE	1
 ENUM_VALUE_LIST	NULL
 READ_ONLY	YES
 COMMAND_LINE_ARGUMENT	REQUIRED
+VARIABLE_NAME	OPTIMIZER_ADJUST_SECONDARY_KEY_COSTS
+VARIABLE_SCOPE	SESSION
+VARIABLE_TYPE	BIGINT UNSIGNED
+VARIABLE_COMMENT	0 = No changes. 1 = Update secondary key costs for ranges to be at least 5x of clustered primary key costs. 2 = Remove 'max_seek optimization' for secondary keys and slight adjustment of filter cost. This option will be deleted in MariaDB 11.0 as it is not needed with the new 11.0 optimizer.
+NUMERIC_MIN_VALUE	0
+NUMERIC_MAX_VALUE	2
+NUMERIC_BLOCK_SIZE	1
+ENUM_VALUE_LIST	NULL
+READ_ONLY	NO
+COMMAND_LINE_ARGUMENT	REQUIRED
 VARIABLE_NAME	OPTIMIZER_MAX_SEL_ARGS
 VARIABLE_SCOPE	SESSION
 VARIABLE_TYPE	BIGINT UNSIGNED
diff --git a/mysql-test/suite/sys_vars/r/sysvars_server_notembedded.result b/mysql-test/suite/sys_vars/r/sysvars_server_notembedded.result
index 00c66ddf072..c07b7169608 100644
--- a/mysql-test/suite/sys_vars/r/sysvars_server_notembedded.result
+++ b/mysql-test/suite/sys_vars/r/sysvars_server_notembedded.result
@@ -2432,6 +2432,16 @@ NUMERIC_BLOCK_SIZE	1
 ENUM_VALUE_LIST	NULL
 READ_ONLY	YES
 COMMAND_LINE_ARGUMENT	REQUIRED
+VARIABLE_NAME	OPTIMIZER_ADJUST_SECONDARY_KEY_COSTS
+VARIABLE_SCOPE	SESSION
+VARIABLE_TYPE	BIGINT UNSIGNED
+VARIABLE_COMMENT	0 = No changes. 1 = Update secondary key costs for ranges to be at least 5x of clustered primary key costs. 2 = Remove 'max_seek optimization' for secondary keys and slight adjustment of filter cost. This option will be deleted in MariaDB 11.0 as it is not needed with the new 11.0 optimizer.
+NUMERIC_MIN_VALUE	0
+NUMERIC_MAX_VALUE	2
+NUMERIC_BLOCK_SIZE	1
+ENUM_VALUE_LIST	NULL
+READ_ONLY	NO
+COMMAND_LINE_ARGUMENT	REQUIRED
 VARIABLE_NAME	OPTIMIZER_MAX_SEL_ARGS
 VARIABLE_SCOPE	SESSION
 VARIABLE_TYPE	BIGINT UNSIGNED
diff --git a/sql/sql_class.h b/sql/sql_class.h
index d61ee1545b6..5175d440136 100644
--- a/sql/sql_class.h
+++ b/sql/sql_class.h
@@ -698,7 +698,6 @@ typedef struct system_variables
   ulonglong max_statement_time;
   ulonglong optimizer_switch;
   ulonglong optimizer_trace;
-  ulong optimizer_trace_max_mem_size;
   sql_mode_t sql_mode; ///< which non-standard SQL behaviour should be enabled
   sql_mode_t old_behavior; ///< which old SQL behaviour should be enabled
   ulonglong option_bits; ///< OPTION_xxx constants, e.g. OPTION_PROFILING
@@ -761,6 +760,8 @@ typedef struct system_variables
   ulong optimizer_use_condition_selectivity;
   ulong optimizer_max_sel_arg_weight;
   ulong optimizer_max_sel_args;
+  ulong optimizer_trace_max_mem_size;
+  ulong optimizer_adjust_secondary_key_costs;
   ulong use_stat_tables;
   double sample_percentage;
   ulong histogram_size;
diff --git a/sql/sql_select.cc b/sql/sql_select.cc
index 62c7f52748a..b1779ebc24d 100644
--- a/sql/sql_select.cc
+++ b/sql/sql_select.cc
@@ -5895,11 +5895,15 @@ make_join_statistics(JOIN *join, List<TABLE_LIST> &tables_list,
         This is can't be to high as otherwise we are likely to use
         table scan.
       */
-      s->worst_seeks= MY_MIN((double) s->found_records / 10,
-        (double) s->read_time*3);
-      if (s->worst_seeks < 2.0)			// Fix for small tables
-        s->worst_seeks=2.0;
-
+      /* Largest integer that can be stored in double (no compiler warning) */
+      s->worst_seeks= (double) (1ULL << 53);
+      if (thd->variables.optimizer_adjust_secondary_key_costs != 2)
+      {
+        s->worst_seeks= MY_MIN((double) s->found_records / 10,
+                               (double) s->read_time*3);
+        if (s->worst_seeks < 2.0)			// Fix for small tables
+          s->worst_seeks=2.0;
+      }
       /*
         Add to stat->const_keys those indexes for which all group fields or
         all select distinct fields participate in one index.
@@ -7817,8 +7821,27 @@ double cost_for_index_read(const THD *thd, const TABLE *table, uint key,
     if (table->covering_keys.is_set(key))
     cost= file->keyread_time(key, 1, records);
   else
+  {
     cost= ((file->keyread_time(key, 0, records) +
             file->read_time(key, 1, MY_MIN(records, worst_seeks))));
+    if (thd->variables.optimizer_adjust_secondary_key_costs == 1 &&
+        file->is_clustering_key(0))
+    {
+      /*
+        According to benchmarks done in 11.0 to calculate the new cost
+        model secondary key ranges are about 7x slower than primary
+        key ranges for big tables. Here we are a bit conservative and
+        only calculate with 5x. The reason for having it only 5x and
+        not for example 7x is is that choosing plans with more rows
+        that are read (ignored by the WHERE clause) causes the 10.x
+        optimizer to believe that there are more rows in the result
+        set, which can cause problems in finding the best join order.
+        Note: A clustering primary key is always key 0.
+      */
+      double clustering_key_cost= file->read_time(0, 1, records);
+      cost= MY_MAX(cost, clustering_key_cost * 5);
+    }
+  }
 
   DBUG_PRINT("statistics", ("cost: %.3f", cost));
   DBUG_RETURN(cost);
@@ -7992,6 +8015,14 @@ best_access_path(JOIN      *join,
   double keyread_tmp= 0;
   ha_rows rec;
   bool best_uses_jbuf= FALSE;
+  /*
+    if optimizer_use_condition_selectivity adjust filter cost to be slightly
+    higher to ensure that ref|filter is not less than range over same
+    number of rows
+  */
+  double filter_setup_cost= (thd->variables.
+                             optimizer_adjust_secondary_key_costs == 2 ?
+                             1.0 : 0.0);
   MY_BITMAP *eq_join_set= &s->table->eq_join_set;
   KEYUSE *hj_start_key= 0;
   SplM_plan_info *spl_plan= 0;
@@ -8548,6 +8579,7 @@ best_access_path(JOIN      *join,
 	       type == JT_EQ_REF ? 0.5 * tmp : MY_MIN(tmp, keyread_tmp);
         double access_cost_factor= MY_MIN((tmp - key_access_cost) / rows, 1.0);
 
+
         if (!(records < s->worst_seeks &&
               records <= thd->variables.max_seeks_for_key))
         {
@@ -8564,7 +8596,9 @@ best_access_path(JOIN      *join,
         }
         if (filter)
         {
-          tmp-= filter->get_adjusted_gain(rows) - filter->get_cmp_gain(rows);
+          tmp-= (filter->get_adjusted_gain(rows) -
+                 filter->get_cmp_gain(rows) -
+                 filter_setup_cost);
           DBUG_ASSERT(tmp >= 0);
           trace_access_idx.add("rowid_filter_key",
                                table->key_info[filter->key_no].name);
@@ -8810,7 +8844,7 @@ best_access_path(JOIN      *join,
                                                             access_cost_factor);
         if (filter)
         {
-          tmp-= filter->get_adjusted_gain(rows);
+          tmp-= filter->get_adjusted_gain(rows) - filter_setup_cost;
           DBUG_ASSERT(tmp >= 0);
         }
 
diff --git a/sql/sys_vars.cc b/sql/sys_vars.cc
index dea04bf3bc5..61911d877e5 100644
--- a/sql/sys_vars.cc
+++ b/sql/sys_vars.cc
@@ -2817,6 +2817,19 @@ static Sys_var_ulong Sys_optimizer_trace_max_mem_size(
     SESSION_VAR(optimizer_trace_max_mem_size), CMD_LINE(REQUIRED_ARG),
     VALID_RANGE(0, ULONG_MAX), DEFAULT(1024 * 1024), BLOCK_SIZE(1));
 
+static Sys_var_ulong Sys_optimizer_adjust_secondary_key_costs(
+    "optimizer_adjust_secondary_key_costs",
+    "0 = No changes. "
+    "1 = Update secondary key costs for ranges to be at least 5x of clustered "
+    "primary key costs. "
+    "2 = Remove 'max_seek optimization' for secondary keys and slight "
+    "adjustment of filter cost. "
+    "This option will be deleted in MariaDB 11.0 as it is not needed with the "
+    "new 11.0 optimizer.",
+    SESSION_VAR(optimizer_adjust_secondary_key_costs), CMD_LINE(REQUIRED_ARG),
+    VALID_RANGE(0, 2), DEFAULT(0), BLOCK_SIZE(1));
+
+
 static Sys_var_charptr_fscs Sys_pid_file(
        "pid_file", "Pid file used by safe_mysqld",
        READ_ONLY GLOBAL_VAR(pidfile_name_ptr), CMD_LINE(REQUIRED_ARG),