derived table / view by equality
Now rows of a materialized derived table are always put into a
temporary table before join operation. If BNLH is used to join this
table with the result of a partial join then both operands of the
join are actually put into main memory. In most cases this is not
efficient.
We could avoid this by sending the rows of the derived table directly
to the join operation. However this kind of data flow is not supported
yet.
Fixed by not allowing usage of hash join algorithm to join a materialized
derived table if it's joined by an equality predicate of the form
f=e where f is a field of the derived table.
derived table / view by equality
Now rows of a materialized derived table are always put into a
temporary table before join operation. If BNLH is used to join this
table with the result of a partial join then both operands of the
join are actually put into main memory. In most cases this is not
efficient.
We could avoid this by sending the rows of the derived table directly
to the join operation. However this kind of data flow is not supported
yet.
Fixed by not allowing usage of hash join algorithm to join a materialized
derived table if it's joined by an equality predicate of the form
f=e where f is a field of the derived table.
The bug was this scenario:
1. Join optimizer picks a range plan on index IDX1
(This index doesn't match the ORDER BY clause, so sorting will be needed)
2. Index Condition Pushdown pushes a part of WHERE down. The pushed
condition is removed from SQL_SELECT::cond
3. test_if_skip_sort_order() figures that it's better to use IDX2
(as it will match ORDER BY ... LIMIT and so will execute faster)
3.1 It sees that there was a possible range access on IDX2. It tries to
construct it by calling SQL_SELECT::test_quick_select(), but alas,
SQL_SELECT::cond doesn't have all parts of WHERE anymore.
So it uses full index scan which is slow.
(The execution works fine because there's code further in test_if_skip_sort_order()
which "Unpushes" the index condition and restores the original WHERE clause.
It was just the test_quick_select call that suffered).
After the commit b76b69cd5f
loose index scan for queries with DISTINCT stopped working.
That is why that commit has to be reverted.
Additionally this patch fixes the problem of MDEV-10880.
with join_cache_level>2
During muliple equality propagation for a query in which we have an IN subquery, the items in the select list of the
subquery may not be part of the multiple equality because there might be another occurence of the same field in the
where clause of the subquery.
So we keyuse_is_valid_for_access_in_chosen_plan function which expects the items in the select list of the subquery to
be same to the ones in the multiple equality (through these multiple equalities we create keyuse array).
The solution would be that we expect the same field not the same Item because when we have SEMI JOIN MATERIALIZATION SCAN,
we use copy back technique to copies back the materialised table fields to the original fields of the base tables.
This patch fixes another problem introduced by the patch for mdev-4817.
The latter changed Item_cond::fix_fields() in such a way that it could
call the virtual method is_expensive(). With the first its call
the method saves the result in Item::is_expensive_cache. For all next
calls the method returns the result from this cache. So if the item
once was determined as expensive the method always returns true.
For subqueries it's not good, because non-optimized subqueries always
is considered as expensive.
It means that the cache should be invalidated after the call of
optimize_constant_subqueries().
This patch fixes another problem introduced by the patch for mdev-4817.
The latter changed Item_cond::fix_fields() in such a way that it could
call the virtual method is_expensive(). With the first its call
the method saves the result in Item::is_expensive_cache. For all next
calls the method returns the result from this cache. So if the item
once was determined as expensive the method always returns true.
For subqueries it's not good, because non-optimized subqueries always
is considered as expensive.
It means that the cache should be invalidated after the call of
optimize_constant_subqueries().
This patch fixes another problem introduced by the patch for mdev-4817.
The latter changed Item_cond::fix_fields() in such a way that it could
call the virtual method is_expensive(). With the first its call
the method saves the result in Item::is_expensive_cache. For all next
calls the method returns the result from this cache. So if the item
once was determined as expensive the method always returns true.
For subqueries it's not good, because non-optimized subqueries always
is considered as expensive.
It means that the cache should be invalidated after the call of
optimize_constant_subqueries().
Due to a legacy bug in the code of make_join_statistics() detecting
so-called constant tables could miss some of them in rare queries
that used RIGHT JOIN. As a result these queries had execution plans
different from the execution plans of the equivalent queries with
LEFT JOIN.
Besides starting from 10.2 this could trigger an assertion failure.
When the definition of the index used for hash join was created
in create_hj_key_for_table() it could cause memory overwrite
due to a bug that led to an underestimation of the number of
the index component.
In this case we are accessing incorrect memory when we have mergeable semi-joins.
In the case when we have mergeable semi joins parent select will have a table count
of all the tables in that select plus all the tables involved in the IN-subquery.
But this table count does not include the "sjm table" (only includes the inner and outer tables)
denotes as <subquery#> in explain.
For non-semi-join subquery optimization we do a cost based decision between
Materialisation and IN -> EXIST transformation. The issue in this case is that for IN->EXIST transformation
we run JOIN::reoptimize with the IN->EXISt conditions and we come up with a new query plan. But when we compare
the cost with Materialization, we make the decision to chose Materialization so we need to restore the query plan
for Materilization.
The saving and restoring for keyuse array and join_tab keyuse is only done when we have atleast
one element in the keyuse_array , we are now changing to do it even for 0 elements to main the generality.
upon SELECT .. LIMIT 0
The code must differentiate between a SELECT with contradictory
WHERE/HAVING and one with LIMIT 0.
Also for the latter printed 'Zero limit' instead of 'Impossible where'
in the EXPLAIN output.
multiple times with different arguments.
If the ON expression of an outer join is an OR formula with one
of the disjunct being a constant formula then the expression
cannot be null-rejected if the constant formula is true. Otherwise
it can be null-rejected and if so the outer join can be converted
into inner join. This optimization was added in the patch for
mdev-4817. Yet the code had a defect: if the query was used in
a stored procedure with parameters and the constant item contained
some of them then the value of this constant item depended on the
values of the parameters. With some parameters it may be true,
for others not. The validity of conversion to inner join is checked
only once and it happens only for the first call of procedure.
So if the parameters in the first call allowed the conversion it
was done and next calls used the transformed query though there
could be calls whose parameters made the conversion invalid.
Fixed by cheking whether the constant disjunct in the ON expression
originally contained an SP parameter. If so the expression is not
considered as null-rejected. For this check a new item's attribute
was intruduced: Item::with_param. It is calculated for each item
by fix fields() functions.
Also moved the call of optimize_constant_subqueries() in
JOIN::optimize after the call of simplify_joins(). The reason
for this is that after the optimization introduced by the patch
for mdev-4817 simplify_joins() can use the results of execution
of non-expensive constant subqueries and this is not valid.
Counter for select numbering made stored with the statement (before was global)
So now it does have always accurate value which does not depend on
interruption of statement prepare by errors like lack of table in
a view definition.
MDEV-14957: JOIN::prepare gets unusable "conds" as argument
Do not touch merged derived (it is irreversible)
Fix first argument of in_optimizer for calls possible before fix_fields()
The assertion failure was caused by an incorrectly set read_set for
functions in the ORDER BY clause in part of a union, when we are using
a mergeable view and the order by clause can be skipped (removed).
An order by clause can be skipped if it's part of one part of the UNION as
the result set is not meaningful when multiple SELECT queries are UNIONed. The
server is aware of this optimization and tries to remove the order by
clause before JOIN::prepare. The problem is that we need to throw an
error when the ORDER BY clause contains invalid columns. To do this, we
attempt resolving the ORDER BY expressions, then subsequently drop them
if resolution succeeded. However, ORDER BY resolution had the side
effect of adding the expressions to the all_fields list, which is used
to construct temporary tables to store the result. We may be ignoring
the ORDER BY statement, but the tmp table still tried to compute the
values for the expressions, even if the columns are never used.
The assertion only shows itself if the order by clause contains members
which were not previously in the select list, and are part of a
function.
There is an additional question as to why this only manifests when using
VIEWS and not when using a regular table. The difference lies with the
"reset" of the read_set for the temporary table during
SELECT_LEX::update_used_tables() in JOIN::optimize(). The changes
introduced in fdf789a7ea cleared the
read_set when a mergeable view is encountered in the TABLE_LIST
defintion.
Upon initial order_list resolution, the table's read_set is updated
correctly. JOIN::optimize() will only reset the read_set if it
encounters a VIEW. Since we no longer have ORDER BY clause in
JOIN::optimize() we never get to correctly update the read_set again.
Other relevant commit by Timour, which first introduced the order
resolution when we "can_skip_sort_order":
883af99e7d
Solution:
Don't add the resolved ORDER BY elements to all_fields. We only resolve
them to check if an error should be returned for the query. Ignore them
completely otherwise.
TRASH was mapped to TRASH_FREE and was supposed to be used for memory
that should not be accessed anymore, while TRASH_ALLOC() is to be
used for uninitialized but to-be-used memory.
But sometimes TRASH() was used in the latter sense.
Remove TRASH() macro, always use explicit TRASH_ALLOC() or TRASH_FREE().
In this case we were using the optimization derived_with_keys but we could not create a key
because the length of the key was greater than the max allowed(MI_MAX_KEY_LENGTH).
To do the join we needed to create a hash join key instead, but in the explain output it
showed that we were still referring to derived keys which were created but not used.
In the function JOIN::shrink_join_buffers the iteration over joined
tables was organized in a wrong way. This could cause a crash if
the optimizer chose to materialize a semi-join that used join caches
for which the sizes must be adjusted.
* get_rec_bits() was always reading two bytes, even if the
bit field contained only of one byte
* In various places the code used field->pack_length() bytes
starting from field->ptr, while it should be field->pack_length_in_rec()
* Field_bit::key_cmp and Field_bit::cmp_max passed field_length as
an argument to memcmp(), but field_length is the number of bits!
optimizer_switch
For DATE and DATETIME columns defined as NOT NULL,
"date_notnull IS NULL" has to be modified to:
"date_notnull IS NULL OR date_notnull == 0"
if date_notnull is from an inner table of outer join);
"date_notnull == 0" - otherwise.
This must hold for such columns of mergeable views and derived
tables as well. So far the code did the above re-writing only
for columns of base tables and temporary tables.
with joins, SQ, ORDER BY, semijoin=on
A bug in get_sort_by_table() could mislead the function
setup_semijoin_dups_elimination(). As a result the optimizer
could produce invalid execution plans for queries with ORDER BY
and subquery predicates that could be converted to semi-joins.