mirror of https://github.com/MariaDB/server.git synced 2025-07-30 16:24:05 +03:00

Fixed the problem of mdev-5947.

Back-ported the patch from the mysql 5.6 code line with
the following comment:

  Fix for Bug#11757108 CHANGE IN EXECUTION PLAN FOR COUNT_DISTINCT_GROUP_ON_KEY
                       CAUSES PEFORMANCE REGRESSION

  The cause for the performance regression is that the access strategy for the
  GROUP BY query is changed from using "index scan" in mysql-5.1 to using "loose
  index scan" in mysql-5.5. The index used for group by is unique and thus each
  "loose scan" group will only contain one record. Since loose scan needs to
  re-position on each "loose scan" group this query will do a re-position for
  each index entry. Compared to just reading the next index entry as a normal
  index scan does, the use of loose scan for this query becomes more expensive.

  The cause for selecting to use loose scan for this query is that in the current
  code when the size of the "loose scan" group is one, the formula for
  calculating the cost estimates becomes almost identical to the cost of using
  normal index scan. Differences in use of integer versus floating point arithmetic
  can cause one or the other access strategy to be selected.

  The main issue with the formula for estimating the cost of using loose scan is
  that it does not take into account that it is more costly to do a re-position
  for each "loose scan" group compared to just reading the next index entry.
  Both index scan and loose scan estimate the cpu cost as:

    "number of entries needed to read/scan" * ROW_EVALUATE_COST
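Written out, the shared formula differs only in the count that is plugged in. This is a sketch of that reading; $N_{\text{rows}}$ and $N_{\text{groups}}$ are labels introduced here for the index entries scanned and the number of "loose scan" groups, and do not appear in the patch:

```latex
C_{\text{cpu}}^{\text{index scan}} = N_{\text{rows}} \cdot \text{ROW\_EVALUATE\_COST},
\qquad
C_{\text{cpu}}^{\text{loose scan}} = N_{\text{groups}} \cdot \text{ROW\_EVALUATE\_COST}
```

For a unique index every group holds exactly one entry, so $N_{\text{groups}} = N_{\text{rows}}$ and the two cpu estimates coincide.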

  The results from testing with the query in this bug indicate that the real
  cost of doing a re-position is four to eight times higher than just reading the
  next index entry. Thus, the cpu cost estimate for loose scan should be increased.
  To account for the extra work to re-position in the index we increase the
  cost for loose index scan to include the cost of navigating the index.
  This is modelled as a function of the height of the b-tree:

    navigation cost= ceil(log(records in table)/log(index entries per block))
                   * ROWID_COMPARE_COST;

  This will avoid loose index scan being used for indexes where the "loose scan"
  group contains very few index entries.
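The effect of the added navigation term can be sketched in a few lines of C++. This is an illustration only, not the patched function: the constant values and the helper names `navigation_cost`, `index_scan_cpu_cost`, and `loose_scan_cpu_cost` are assumptions made for the example, and I/O cost is left out entirely.

```cpp
#include <cassert>
#include <cmath>

// Illustrative values only; assumed for this sketch, not read from the patch.
const double ROW_EVALUATE_COST = 0.20;
const double ROWID_COMPARE_COST = 0.10;

// Cost of one re-position, modelled on the estimated height of the b-tree.
double navigation_cost(double records_in_table, double keys_per_block) {
  assert(records_in_table >= 1.0 && keys_per_block > 1.0);
  const double height =
      std::ceil(std::log(records_in_table) / std::log(keys_per_block));
  return height * ROWID_COMPARE_COST;
}

// Plain index scan: every entry read is evaluated once, no re-positioning.
double index_scan_cpu_cost(double rows_scanned) {
  return rows_scanned * ROW_EVALUATE_COST;
}

// Loose scan: each group pays one re-position plus one evaluation.
double loose_scan_cpu_cost(double num_groups, double records_in_table,
                           double keys_per_block) {
  return num_groups *
         (navigation_cost(records_in_table, keys_per_block) +
          ROW_EVALUATE_COST);
}
```

Under these assumed constants, a table of 1,000,000 records with 200 keys per block and a unique index (1,000,000 groups) gives a cpu estimate of about 200,000 for index scan and about 500,000 for loose scan, so index scan wins. With groups of 100 entries (10,000 groups) the loose scan estimate drops to about 5,000 and it wins comfortably.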
Igor Babaev
2014-04-22 14:39:57 -07:00
parent bd44c086b3
commit 3e0f63c18f
8 changed files with 49 additions and 26 deletions

@@ -13424,7 +13424,7 @@ SEL_ARG * get_index_range_tree(uint index, SEL_TREE* range_tree, PARAM *param,
 DESCRIPTION
 This method computes the access cost of a TRP_GROUP_MIN_MAX instance and
-the number of rows returned. It updates this->read_cost and this->records.
+the number of rows returned.
 NOTES
 The cost computation distinguishes several cases:
@@ -13480,7 +13480,6 @@ void cost_group_min_max(TABLE* table, KEY *index_info, uint used_key_parts,
 double p_overlap; /* Probability that a sub-group overlaps two blocks. */
 double quick_prefix_selectivity;
 double io_cost;
-double cpu_cost= 0; /* TODO: CPU cost of index_read calls? */
 DBUG_ENTER("cost_group_min_max");
 table_records= table->stat_records();
@@ -13528,11 +13527,25 @@ void cost_group_min_max(TABLE* table, KEY *index_info, uint used_key_parts,
 (double) num_blocks;
 /*
-TODO: If there is no WHERE clause and no other expressions, there should be
-no CPU cost. We leave it here to make this cost comparable to that of index
-scan as computed in SQL_SELECT::test_quick_select().
+CPU cost must be comparable to that of an index scan as computed
+in SQL_SELECT::test_quick_select(). When the groups are small,
+e.g. for a unique index, using index scan will be cheaper since it
+reads the next record without having to re-position to it on every
+group. To make the CPU cost reflect this, we estimate the CPU cost
+as the sum of:
+1. Cost for evaluating the condition (similarly as for index scan).
+2. Cost for navigating the index structure (assuming a b-tree).
+Note: We only add the cost for one comparision per block. For a
+b-tree the number of comparisons will be larger.
+TODO: This cost should be provided by the storage engine.
 */
-cpu_cost= (double) num_groups / TIME_FOR_COMPARE;
+const double tree_traversal_cost=
+ceil(log(static_cast<double>(table_records))/
+log(static_cast<double>(keys_per_block))) *
+1/double(2*TIME_FOR_COMPARE);
+const double cpu_cost= num_groups *
+(tree_traversal_cost + 1/double(TIME_FOR_COMPARE));
 *read_cost= io_cost + cpu_cost;
 *records= num_groups;