The first patch for the bug was erroneous: it did not take into account
the fact that the modified function get_key_scans_params() was called in
different contexts. As a result the patch caused a regression bug MDEV-22191.
The patch for this bug introduced an extra parameter. Actually we can
do without this parameter and use the fourth parameter for the same
puropose - to differentiate between the calls of the function for range
access and for index merge access.
Also removed the call of get_key_scans_params() in the code of the function
merge_same_index_scans() as not needed.
In main.index_merge_myisam we remove the test that was added in
commit a2d24def8c because
it duplicates the test case that was added in
commit 5af12e4635.
When index_merge_sort_union is turned off only ror scans were considered for range
scans, which is wrong.
To fix the problem ensure both ror scans and non ror scans are considered for range
access
SQL_SELECT::check_quick() returns error status only
test_quick_select() returns -1. Fix error handling when lower frames
throw error, but it is ignored by test_quick_select(). Fix return
status for out-of-memory errors which are obviously must be processed
as error in upper frames.
- multi_range_read_info_const now uses the new records_in_range interface
- Added handler::avg_io_cost()
- Don't calculate avg_io_cost() in get_sweep_read_cost if avg_io_cost is
not 1.0. In this case we trust the avg_io_cost() from the handler.
- Changed test_quick_select to use TIME_FOR_COMPARE instead of
TIME_FOR_COMPARE_IDX to align this with the rest of the code.
- Fixed bug when using test_if_cheaper_ordering where we didn't use
keyread if index was changed
- Fixed a bug where we didn't use index only read when using order-by-index
- Added keyread_time() to HEAP.
The default keyread_time() was optimized for blocks and not suitable for
HEAP. The effect was the HEAP prefered table scans over ranges for btree
indexes.
- Fixed get_sweep_read_cost() for HEAP tables
- Ensure that range and ref have same cost for simple ranges
Added a small cost (MULTI_RANGE_READ_SETUP_COST) to ranges to ensure
we favior ref for range for simple queries.
- Fixed that matching_candidates_in_table() uses same number of records
as the rest of the optimizer
- Added avg_io_cost() to JT_EQ_REF cost. This helps calculate the cost for
HEAP and temporary tables better. A few tests changed because of this.
- heap::read_time() and heap::keyread_time() adjusted to not add +1.
This was to ensure that handler::keyread_time() doesn't give
higher cost for heap tables than for normal tables. One effect of
this is that heap and derived tables stored in heap will prefer
key access as this is now regarded as cheap.
- Changed cost for index read in sql_select.cc to match
multi_range_read_info_const(). All index cost calculation is now
done trough one function.
- 'ref' will now use quick_cost for keys if it exists. This is done
so that for '=' ranges, 'ref' is prefered over 'range'.
- scan_time() now takes avg_io_costs() into account
- get_delayed_table_estimates() uses block_size and avg_io_cost()
- Removed default argument to test_if_order_by_key(); simplifies code
Prototype change:
- virtual ha_rows records_in_range(uint inx, key_range *min_key,
- key_range *max_key)
+ virtual ha_rows records_in_range(uint inx, const key_range *min_key,
+ const key_range *max_key,
+ page_range *res)
The handler can ignore the page_range parameter. In the case the handler
updates the parameter, the optimizer can deduce the following:
- If previous range's last key is on the same block as next range's first
key
- If the current key range is in one block
- We can also assume that the first and last block read are cached!
This can be used for a better calculation of IO seeks when we
estimate the cost of a range index scan.
The parameter is fully implemented for MyISAM, Aria and InnoDB.
A separate patch will update handler::multi_range_read_info_const() to
take the benefits of this change and also remove the double
records_in_range() calls that are not anymore needed.
This was done to both simplify the code and also to be easier to handle
storage engines that are clustered on some other index than the primary
key.
As pk_is_clustering_key() and is_clustering_key now are using only
index_flags, these where removed from all storage engines.
'index_merge_sort_union=off'
When index_merge_sort_union is set to 'off' and index_merge_union is set
to 'on' then any evaluated index merge scan must consist only of ROR scans.
The cheapest out of such index merges must be chosen. This index merge
might not be the cheapest index merge.
e.g.
- dont -> don't
- occurence -> occurrence
- succesfully -> successfully
- easyly -> easily
Also remove trailing space in selected files.
These changes span:
- server core
- Connect and Innobase storage engine code
- OQgraph, Sphinx and TokuDB storage engines
Related to MDEV-21769.
This bug could manifest itself in a very rare cases when the optimizer
chose an execution plan by which a joined table was accessed by a table
scan and the optimizer was checking whether ranges checked for each record
could improve this plan. In such cases the optimizer evaluates range
conditions over a table that depend on other tables. For such conditions
the constructed SEL_ARG trees are marked as MAYBE_KEY. If a SEL_ARG object
constructed for a sargable condition marked as RANGE_KEY had the same
first key part as a MAYBE_KEY SEL_ARG object and the key_and() function
was called for this pair of SEL_ARG objects then an invalid SEL_ARG
object could be constructed that ultimately could lead to a crash before
the execution phase.
'index_merge_sort_union=off'
When index_merge_sort_union is set to 'off' and index_merge_union is set
to 'on' then any evaluated index merge scan must consist only of ROR scans.
The cheapest out of such index merges must be chosen. This index merge
might not be the cheapest index merge.
The default keyread_time() was optimized for blocks and not suitable for
HEAP. The effect was the HEAP prefered table scans over ranges for btree
indexes.
Fixed also get_sweep_read_cost() for HEAP tables.
- Added unlikely() to optimize for not having optimizer trace enabled
- Made THD::trace_started() inline
- Added 'if (trace_enabled())' around some potentially expensive code
(not many found)
- Added ASSERT's to ensure we don't call expensive optimizer trace calls
if optimizer trace is not enabled
- Added length to Json_writer functions to speed up buffer writes
when optimizer trace is enabled.
- Changed LEX_CSTRING argument handling to not send full struct to writer
function on_add_str() functions now trusts length arguments
This patch fixes the following defects/bugs.
1. If BKA[H] algorithm was used to join a table for which the optimizer
had decided to employ a rowid filter the filter actually was not built.
2. The patch for the bug MDEV-21356 that added the code canceling pushing
rowid filter into an engine for the table joined with employment of
BKA[H] and MRR was not quite correct for Innodb engine because this
cancellation was done after InnoDB code had already bound the the pushed
filter to internal InnoDB structures.
ANding of the range built from inferred NOT NULL conditions and the range
built from other conditions used in WHERE/ON clauses may produce an
IMPOSSIBLE range. The code of MDEV-15777 did not take into account this
possibility.
[Variant 2 of the fix: collect the attached conditions]
Problem:
make_join_select() has a section of code which starts with
"We plan to scan all rows. Check again if we should use an index."
the code in that section will [unnecessarily] re-run the range
optimizer using this condition:
condition_attached_to_current_table AND current_table's_ON_expr
Note that the original invocation of range optimizer in
make_join_statistics was done using the whole select's WHERE condition.
Taking the whole select's WHERE condition and using multiple-equalities
allowed the range optimizer to infer more range restrictions.
The fix:
- Do range optimization using a condition that is an AND of this table's
condition and all of the previous tables' conditions.
- Also, fix the range optimizer to prefer SEL_ARGs with type=KEY_RANGE
over SEL_ARGS with type=MAYBE_KEY, regardless of the key part.
Computing
key_and(
SEL_ARG(type=MAYBE_KEY key_part=1),
SEL_ARG(type=KEY_RANGE, key_part=2)
)
will now produce the SEL_ARG with type=KEY_RANGE.
In the function test_if_cheaper_ordering we make a decision if using an index is better than
using filesort for ordering. If we chose to do range access then in test_quick_select we
should make sure that cost for table scan is set to DBL_MAX so that it is not picked.
selectivity values fails
After having set the assertion that checks validity of selectivity values
returned by the function table_cond_selectivity() a test case from
order_by.tesst failed. The failure occurred because range optimizer could
return as an estimate of the cardinality of the ranges built for an index
a number exceeding the total number of records in the table.
The second bug is more subtle. It may happen when there are several
indexes with same prefix defined on the first joined table t accessed by
a constant ref access. In this case the range optimizer estimates the
number of accessed records of t for each usable index and these
estimates can be different. Only the first of these estimates is taken
into account when the selectivity of the ref access is calculated.
However the optimizer later can choose a different index that provides
a different estimate. The function table_condition_selectivity() could use
this estimate to discount the selectivity of the ref access. This could
lead to an selectivity value returned by this function that was greater
that 1.
This patch introduces the optimization that allows range optimizer to
consider index range scans that are built employing NOT NULL predicates
inferred from WHERE conditions and ON expressions.
The patch adds a new optimizer switch not_null_range_scan.