Also fixes:
MDEV-17741 Assertion `thd->Item_change_list::is_empty()' failed in mysql_parse after unsuccessful PS
The problem was introduced by:
commit f033fbd9f2
Changed the test case for MDEV-15571
It was later fixed, but in 10.3 only:
commit ce2cf855bf
MDEV-16043 Assertion thd->Item_change_list::is_empty() failed in mysql_parse
upon SELECT from a view reading from a versioned table
This patch is a backport of ce2cf855bf to 10.2
While calculating distinct with the function remove_dup_with_compare, we don't have rnd_end calls
when we have completed the scan over the temporary table.
Added ha_rnd_end calls when we are done with the scan of the table.
When the with clause of a query contains a recursive CTE that is not used
then processing of EXPLAIN for this query does not require optimization
of the unit specifying this CTE. In this case if 'derived' is the
TABLE_LIST object created for this CTE then derived->derived_result is NULL
and any assignment to derived->derived_result->table causes a crash.
After fixing this problem in the code of st_select_lex_unit::prepare()
EXPLAIN for such a query worked without crashes. Yet an execution
plan for the recursive CTE appeared there. The cause of this problem was
an incorrect condition used in JOIN::save_explain_data_intern() that
determined whether CTE was to be optimized or not. A similar condition was
used in select_describe() and this patch has corrected it as well.
During the optimize state of a query, we come know that the result set
would atmost contain one row, then for such a query we don't need
to compute GROUP BY, ORDER BY and DISTINCT.
Users expect window functions to produce a certain ordering of rows in
the final result set. Although the standard does not require this, we
already have the filesort result done for when we computed the window
function. If there is no ORDER BY attached to the query, just keep it
till the SELECT is completely evaluated and use that to print the
result.
Update test cases as many did not take care to guarantee a stable
result.
The ONLY_FULL_GROUP_BY mode states that for SELECT ... GROUP BY queries,
disallow SELECTing columns which are not referred to in the GROUP BY clause,
unless they are passed to an aggregate function like COUNT() or MAX().
This holds only for the GROUP BY clause of the query.
The code also checks this for the partition clause of the window function which is
incorrect.
When we have a query which has implicit_grouping then we are sure that we would end up with only one
row so there is no point to do DISTINCT computation
derived table / view by equality
Now rows of a materialized derived table are always put into a
temporary table before join operation. If BNLH is used to join this
table with the result of a partial join then both operands of the
join are actually put into main memory. In most cases this is not
efficient.
We could avoid this by sending the rows of the derived table directly
to the join operation. However this kind of data flow is not supported
yet.
Fixed by not allowing usage of hash join algorithm to join a materialized
derived table if it's joined by an equality predicate of the form
f=e where f is a field of the derived table.
derived table / view by equality
Now rows of a materialized derived table are always put into a
temporary table before join operation. If BNLH is used to join this
table with the result of a partial join then both operands of the
join are actually put into main memory. In most cases this is not
efficient.
We could avoid this by sending the rows of the derived table directly
to the join operation. However this kind of data flow is not supported
yet.
Fixed by not allowing usage of hash join algorithm to join a materialized
derived table if it's joined by an equality predicate of the form
f=e where f is a field of the derived table.
derived table / view by equality
Now rows of a materialized derived table are always put into a
temporary table before join operation. If BNLH is used to join this
table with the result of a partial join then both operands of the
join are actually put into main memory. In most cases this is not
efficient.
We could avoid this by sending the rows of the derived table directly
to the join operation. However this kind of data flow is not supported
yet.
Fixed by not allowing usage of hash join algorithm to join a materialized
derived table if it's joined by an equality predicate of the form
f=e where f is a field of the derived table.
The bug was this scenario:
1. Join optimizer picks a range plan on index IDX1
(This index doesn't match the ORDER BY clause, so sorting will be needed)
2. Index Condition Pushdown pushes a part of WHERE down. The pushed
condition is removed from SQL_SELECT::cond
3. test_if_skip_sort_order() figures that it's better to use IDX2
(as it will match ORDER BY ... LIMIT and so will execute faster)
3.1 It sees that there was a possible range access on IDX2. It tries to
construct it by calling SQL_SELECT::test_quick_select(), but alas,
SQL_SELECT::cond doesn't have all parts of WHERE anymore.
So it uses full index scan which is slow.
(The execution works fine because there's code further in test_if_skip_sort_order()
which "Unpushes" the index condition and restores the original WHERE clause.
It was just the test_quick_select call that suffered).
using INSERT INTO
This patch allows condition pushdown into a materialized derived / view when
this table is used in INSERT SELECT, multi-table UPDATE and multi-table DELETE.
After the commit b76b69cd5f
loose index scan for queries with DISTINCT stopped working.
That is why that commit has to be reverted.
Additionally this patch fixes the problem of MDEV-10880.
Assertion `used_tables_cache == 0' failed
This bug manifested itself when executing queries
over materialized derived tables /vies and with
conjunctive always true predicates containing
inexpensive single-row subqueries.
This bug disappeared after the patch mdev-15035
had been applied.
This patch fixes another problem introduced by the patch for mdev-4817.
The latter changed Item_cond::fix_fields() in such a way that it could
call the virtual method is_expensive(). With the first its call
the method saves the result in Item::is_expensive_cache. For all next
calls the method returns the result from this cache. So if the item
once was determined as expensive the method always returns true.
For subqueries it's not good, because non-optimized subqueries always
is considered as expensive.
It means that the cache should be invalidated after the call of
optimize_constant_subqueries().
with join_cache_level>2
During muliple equality propagation for a query in which we have an IN subquery, the items in the select list of the
subquery may not be part of the multiple equality because there might be another occurence of the same field in the
where clause of the subquery.
So we keyuse_is_valid_for_access_in_chosen_plan function which expects the items in the select list of the subquery to
be same to the ones in the multiple equality (through these multiple equalities we create keyuse array).
The solution would be that we expect the same field not the same Item because when we have SEMI JOIN MATERIALIZATION SCAN,
we use copy back technique to copies back the materialised table fields to the original fields of the base tables.
This patch fixes another problem introduced by the patch for mdev-4817.
The latter changed Item_cond::fix_fields() in such a way that it could
call the virtual method is_expensive(). With the first its call
the method saves the result in Item::is_expensive_cache. For all next
calls the method returns the result from this cache. So if the item
once was determined as expensive the method always returns true.
For subqueries it's not good, because non-optimized subqueries always
is considered as expensive.
It means that the cache should be invalidated after the call of
optimize_constant_subqueries().
This patch fixes another problem introduced by the patch for mdev-4817.
The latter changed Item_cond::fix_fields() in such a way that it could
call the virtual method is_expensive(). With the first its call
the method saves the result in Item::is_expensive_cache. For all next
calls the method returns the result from this cache. So if the item
once was determined as expensive the method always returns true.
For subqueries it's not good, because non-optimized subqueries always
is considered as expensive.
It means that the cache should be invalidated after the call of
optimize_constant_subqueries().
This patch fixes another problem introduced by the patch for mdev-4817.
The latter changed Item_cond::fix_fields() in such a way that it could
call the virtual method is_expensive(). With the first its call
the method saves the result in Item::is_expensive_cache. For all next
calls the method returns the result from this cache. So if the item
once was determined as expensive the method always returns true.
For subqueries it's not good, because non-optimized subqueries always
is considered as expensive.
It means that the cache should be invalidated after the call of
optimize_constant_subqueries().
Due to a legacy bug in the code of make_join_statistics() detecting
so-called constant tables could miss some of them in rare queries
that used RIGHT JOIN. As a result these queries had execution plans
different from the execution plans of the equivalent queries with
LEFT JOIN.
Besides starting from 10.2 this could trigger an assertion failure.
When the definition of the index used for hash join was created
in create_hj_key_for_table() it could cause memory overwrite
due to a bug that led to an underestimation of the number of
the index component.
In this case we are accessing incorrect memory when we have mergeable semi-joins.
In the case when we have mergeable semi joins parent select will have a table count
of all the tables in that select plus all the tables involved in the IN-subquery.
But this table count does not include the "sjm table" (only includes the inner and outer tables)
denotes as <subquery#> in explain.
materialized derived table/view that uses aliases is done
The problem appears when a column alias inside the materialized derived
table/view t1 definition coincides with the column name used in the
GROUP BY clause of t1. If the condition that can be pushed into t1
uses that ambiguous column name this column is determined as a column that
is used in the GROUP BY clause instead of the alias used in the projection
list of t1. That causes wrong result.
To prevent it resolve_ref_in_select_and_group() was changed.
For non-semi-join subquery optimization we do a cost based decision between
Materialisation and IN -> EXIST transformation. The issue in this case is that for IN->EXIST transformation
we run JOIN::reoptimize with the IN->EXISt conditions and we come up with a new query plan. But when we compare
the cost with Materialization, we make the decision to chose Materialization so we need to restore the query plan
for Materilization.
The saving and restoring for keyuse array and join_tab keyuse is only done when we have atleast
one element in the keyuse_array , we are now changing to do it even for 0 elements to main the generality.
upon SELECT .. LIMIT 0
The code must differentiate between a SELECT with contradictory
WHERE/HAVING and one with LIMIT 0.
Also for the latter printed 'Zero limit' instead of 'Impossible where'
in the EXPLAIN output.