This bug may affect the queries that uses a grouping derived table with
grouping list containing references to columns from different tables if
the optimizer decides to employ the split optimization for the derived
table. In some very specific cases it may affect queries with a grouping
derived table that refers only one base table.
This bug was caused by an improper fix for the bug MDEV-25128. The fix
tried to get rid of the equality conditions pushed into the where clause
of the grouping derived table T to which the split optimization had been
applied. The fix erroneously assumed that only those pushed equalities
that were used for ref access of the tables referenced by T were needed.
In fact the function remove_const() that figures out what columns from the
group list can be removed if the split optimization is applied can uses
other pushed equalities as well.
This patch actually provides a proper fix for MDEV-25128. Rather than
trying to remove invalid pushed equalities referencing the fields of SJM
tables with a look-up access the patch attempts not to push such equalities.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
This bug could affect queries with a grouping derived table containing
equalities in the where clause of its specification if the optimizer
chose to apply split optimization to access the derived table. In such
cases wrong results could be returned from the queries.
When the optimizer considers a possibility of using split optimization
to a derived table it injects equalities joining the derived table with
other tables into the where condition of the derived table. After the join
order for the execution using split optimization has been chosen as the
cheapest the injected equalities that are not used to access the derived
table are removed from the where condition of the derived table.
For this removal the optimizer looks through the conjuncts of the where
condition of the derived table, fetches the equalities and checks whether
they belong to the list of injected equalities.
As the injection of the list was performed just by the insertion of it
into the list of top level AND condition of the where condition some extra
conjuncts from the where condition could be automatically attached to the
end of the list of injected equalities. If such attached conjunct happened
to be an equality predicate it was removed from the where condition of the
derived table and thus lost for checking at the execution phase.
The bug has been fixed by injecting of a shallow copy of the list of the
pushed equalities rather than the list itself leaving the latter intact.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
If a join query uses a derived table (view / CTE) with GROUP BY clause then
the execution plan for such join may employ split optimization. When this
optimization is employed the derived table is not materialized. Rather only
some partitions of the derived table are subject to grouping. Split
optimization can be applied only if:
- there are some indexes over the tables used in the join specifying the
derived table whose prefixes partially cover the field items used in the
GROUP BY list (such indexes are called splitting indexes)
- the WHERE condition of the join query contains conjunctive equalities
between columns of the derived table that comprise major parts of
splitting indexes and columns of the other join tables.
When the optimizer evaluates extending of a partial join by the rows of the
derived table it always considers a possibility of using split optimization.
Different splitting indexes can be used depending on the extended partial
join. At some rare conditions, for example, when there is a non-splitting
covering index for a table joined in the join specifying the derived table
usage of a splitting index to produce rows needed for grouping may be still
less beneficial than usage of such covering index without any splitting
technique. The function JOIN_TAB::choose_best_splitting() must take this
into account.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
If a join query uses a derived table (view / CTE) with GROUP BY clause then
the execution plan for such join may employ split optimization. When this
optimization is employed the derived table is not materialized. Rather only
some partitions of the derived table are subject to grouping. Split
optimization can be applied only if:
- there are some indexes over the tables used in the join specifying the
derived table whose prefixes partially cover the field items used in the
GROUP BY list (such indexes are called splitting indexes)
- the WHERE condition of the join query contains conjunctive equalities
between columns of the derived table that comprise major parts of
splitting indexes and columns of the other join tables.
When the optimizer evaluates extending of a partial join by the rows of the
derived table it always considers a possibility of using split optimization.
Different splitting indexes can be used depending on the extended partial
join. At some rare conditions, for example, when there is a non-splitting
covering index for a table joined in the join specifying the derived table
usage of a splitting index to produce rows needed for grouping may be still
less beneficial than usage of such covering index without any splitting
technique. The function JOIN_TAB::choose_best_splitting() must take this
into account.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
splittable derived
If one of joined tables of the processed query is a materialized derived
table (or view or CTE) with GROUP BY clause then under some conditions it
can be subject to split optimization. With this optimization new equalities
are injected into the WHERE condition of the SELECT that specifies this
derived table. The injected equalities are generated for all join orders
with which the split optimization can employed. After the best join order
has been chosen only certain of this equalities are really needed. The
others can be safely removed. If it's not done and some of injected
equalities involve expressions over semi-joins with look-up access then
the query may return a wrong result set.
This patch effectively removes equalities injected for split optimization
that are needed only at the optimization stage and not needed for execution.
Approved by serg@mariadb.com
sprintf() format of double changed from '%lg' to '%-.11lg'
The change was to make it easier to read optimizer trace output
with tables that has millions of records.
- multi_range_read_info_const now uses the new records_in_range interface
- Added handler::avg_io_cost()
- Don't calculate avg_io_cost() in get_sweep_read_cost if avg_io_cost is
not 1.0. In this case we trust the avg_io_cost() from the handler.
- Changed test_quick_select to use TIME_FOR_COMPARE instead of
TIME_FOR_COMPARE_IDX to align this with the rest of the code.
- Fixed bug when using test_if_cheaper_ordering where we didn't use
keyread if index was changed
- Fixed a bug where we didn't use index only read when using order-by-index
- Added keyread_time() to HEAP.
The default keyread_time() was optimized for blocks and not suitable for
HEAP. The effect was the HEAP prefered table scans over ranges for btree
indexes.
- Fixed get_sweep_read_cost() for HEAP tables
- Ensure that range and ref have same cost for simple ranges
Added a small cost (MULTI_RANGE_READ_SETUP_COST) to ranges to ensure
we favior ref for range for simple queries.
- Fixed that matching_candidates_in_table() uses same number of records
as the rest of the optimizer
- Added avg_io_cost() to JT_EQ_REF cost. This helps calculate the cost for
HEAP and temporary tables better. A few tests changed because of this.
- heap::read_time() and heap::keyread_time() adjusted to not add +1.
This was to ensure that handler::keyread_time() doesn't give
higher cost for heap tables than for normal tables. One effect of
this is that heap and derived tables stored in heap will prefer
key access as this is now regarded as cheap.
- Changed cost for index read in sql_select.cc to match
multi_range_read_info_const(). All index cost calculation is now
done trough one function.
- 'ref' will now use quick_cost for keys if it exists. This is done
so that for '=' ranges, 'ref' is prefered over 'range'.
- scan_time() now takes avg_io_costs() into account
- get_delayed_table_estimates() uses block_size and avg_io_cost()
- Removed default argument to test_if_order_by_key(); simplifies code
This bug could cause a crash for any query that used a derived table/view/CTE
whose specification was a SELECT with a GROUP BY clause and a FROM list
containing 32 or more table references.
The problem appeared only in the cases when the splitting optimization
could be applied to such derived table/view/CTE.
Do not materialize a semi-join nest if it contains a materialized derived
table /view that potentially can be subject to the split optimization.
Splitting of materialization of such nest would help, but currently there
is no code to support this technique.
with condition_pushdown_from_having
This bug could manifest itself for queries with GROUP BY and HAVING clauses
when the HAVING clause was a conjunctive condition that depended
exclusively on grouping fields and at least one conjunct contained an
equality of the form fld=sq where fld is a grouping field and sq is a
constant subquery.
In this case the optimizer tries to perform a pushdown of the HAVING
condition into WHERE. To construct the pushable condition the optimizer
first transforms all multiple equalities in HAVING into simple equalities.
This has to be done for a proper processing of the pushed conditions
in WHERE. The multiple equalities at all AND/OR levels must be converted
to simple equalities because any multiple equality may refer to a multiple
equality at the upper level.
Before this patch the conversion was performed like this:
multiple_equality(x,f1,...,fn) => x=f1 and ... and x=fn.
When an equality item for x=fi was constructed both the items for x and fi
were cloned. If x happened to be a constant subquery that could not be
cloned the conversion failed. If the conversions of multiple equalities
previously performed had succeeded then the whole condition became in an
inconsistent state that could cause different failures.
The solution provided by the patch is:
1. to use a different conversion rule if x is a constant
multiple_equality(x,f1,...,fn) => f1=x and f2=f1 and ... and fn=f1
2. not to clone x if it's a constant.
Such conversions cannot fail and besides the result of the conversion
preserves the equivalence of f1,...,fn that can be used for other
optimizations.
This patch also made sure that expensive predicates are not pushed from
HAVING to WHERE.
This bug is caused by pushdown from HAVING into WHERE.
It appears because condition that is pushed wasn't fixed.
It is also discovered that condition pushdown from HAVING into
WHERE is done wrong. There is no need to build clones for some
conditions that can be pushed. They can be simply moved from HAVING
into WHERE without cloning.
build_pushable_cond_for_having_pushdown(),
remove_pushed_top_conjuncts_for_having() methods are changed.
It is found that there is no transformation made for fields of
pushed condition.
field_transformer_for_having_pushdown transformer is added.
New tests are added. Some comments are changed.
The bug manifested itself when executing a query with materialized
view/derived/CTE whose specification was a SELECT query contained
another materialized derived and impossible WHERE/HAVING condition
was detected for this SELECT.
As soon as such condition is detected the join structures of all
derived tables used in the SELECT are destroyed. So optimization
of the queries specifying these derived tables is impossible. Besides
it's not needed.
In 10.3 optimization of a materialized derived table is performed before
detection of impossible WHERE/HAVING condition in the embedding SELECT.