The issue here is the wrong estimate of the cardinality of a partial join,
the cardinality is too high because the function table_cond_selectivity()
returns an absurd number 100 while selectivity cannot be greater than 1.
When accessing table t by outer reference t1.a via index we do not perform any
range analysis for t. Yet we see TABLE::quick_key_parts[key] and
TABLE->quick_rows[key] contain a non-zero value though these should have been
remained untouched and equal to 0.
Thus real cause of the problem is that TABLE::init does not clean the arrays
TABLE::quick_key_parts[] and TABLE::>quick_rows[].
It should have done it because the TABLE structure created for any
instance of a table can be reused for many queries.
In the function prev_record_reads where one finds the different row combinations for a
subset of partial join, it did not take into account the selectivity of tables
involved in the subset of partial join.
selectivity values fails
After having set the assertion that checks validity of selectivity values
returned by the function table_cond_selectivity() a test case from
order_by.tesst failed. The failure occurred because range optimizer could
return as an estimate of the cardinality of the ranges built for an index
a number exceeding the total number of records in the table.
The second bug is more subtle. It may happen when there are several
indexes with same prefix defined on the first joined table t accessed by
a constant ref access. In this case the range optimizer estimates the
number of accessed records of t for each usable index and these
estimates can be different. Only the first of these estimates is taken
into account when the selectivity of the ref access is calculated.
However the optimizer later can choose a different index that provides
a different estimate. The function table_condition_selectivity() could use
this estimate to discount the selectivity of the ref access. This could
lead to an selectivity value returned by this function that was greater
that 1.
When discounting selectivity of ref access, don't discount the
selectivity we've already discounted for range access.
The 10.1 version of the fix. Will need to adjust condition filtering
test results in 10.4
Currently for selectivity calculation we perform range analysis for a column even when we don't have any statistics(EITS).
This makes less sense but is used to catch contradiction for WHERE condition.
So the solution is to not perform range analysis for selectivity calculation for columns that do not have statistics.
materialization scan over materialization lookup
For non-mergeable semi-joins we don't store the estimates of the IN subquery in table->file->stats.records.
In the function TABLE_LIST::fetch_number_of_rows, we store the number of rows in the tables
(estimates in case of derived table/views).
Currently we don't store the estimates for non-mergeable semi-joins, which leads to a problem of selecting
materialization scan over materialization lookup.
Fixed this by storing these estimated appropriately
The function Item_func_isnull::update_used_tables() must
handle the case when the predicate is over not nullable
column in a special way.
This is actually a bug of MariaDB 5.3/5.5, but it's probably
hard to demonstrate that it can cause problems there.
In the function create_key_parts_for_pseudo_indexes()
the key part structures of pseudo-indexes created for
BLOB fields were set incorrectly.
Also the key parts for long fields must be 'truncated'
up to the maximum length acceptable for key parts.
1. When min/max value is provided the null flag for it must be set to 0
in the bitmap Culumn_statistics::column_stat_nulls.
2. When the calculation of the selectivity of the range condition
over a column requires min and max values for the column then we
have to check that these values are provided.
- Fix Histogram::point_selectivity() to work in the case where the
passed value_pos=0 (or 1) and the first (or the last) bucket in the
histogram has zero value-range (i.e one value).
[Attempt #2]
- Use a new selectivity calculation formula in Histogram::point_selectivity.
The formula is different from the old one because it was developed from scratch.
it doesn't have any possible division-by-zero problems.
After constant table row substitution the where condition may be converted
to always true. The function calculate_cond_selectivity_for_table() should
take into account this possibility.
The function calculate_cond_selectivity_for_table() must consider
the case when the key range tree returned by the call of get_mm_tree()
is of the type SEL_TREE::ALWAYS.
When calculating the selectivity of a range in the function
get_column_range_cardinality a check whether NULL values are
included into into the range must be done.
Don't try to a histogram if it is not read into the cache for statistical data.
It may happen so if optimizer_use_condition_selectivity is set to 3. This
setting orders the optimizer not use histograms to calculate selectivity.
When performing the range analysis for a conjunction the function
calculate_cond_selectivity_for_table should take in to account that
the analysis of some conjuncts may return SEL_ARG::IMPOSSIBLE.