When working on MWL#247 I forgot to adjust the function create_hj_key_for_table()
that created a key definition for hash join keys. The modified function must
set the values of the fields ext_key_parts, ext_key_flags, ext_key_part_map
added to the key definition structure in MWL#247.
- Disable use of join cache when we're using FirstMatch strategy, and the join
order is such that subquery's inner tables are interleaved with outer. Join
buffering code is incapable of handling such join orders.
- The testcase requires use of @@debug_optimizer_prefer_join_prefix to hit the bug,
but I'm pushing it anyway (including the mention of the variable in .test file),
so that it can be found and enabled when/if we get something comparable in the
main tree.
The problem was that LooseScan execution code assumed that tab->key holds
the index used for looseScan. This is only true when range or full index
scan are used. In case of ref access, the index is in tab->ref.key (and
tab->index==0 which explains how LooseScan passed tests with ref access: they
used one index)
Fixed by setting/using loosescan_key, which always the correct index#.
- equality substitution code was geared towards processing WHERE/ON clauses.
that is, it assumed that it was doing substitions on the code that
= wasn't attached to any particular join_tab yet
= was going to be fed to make_join_select() which would take the condition
apart and attach various parts of it to tables inside/outside semi-joins.
- However, somebody added equality substition for ref access. That is, if
we have a ref access on TBL.key=expr, they would do equality substition in
'expr'. This possibility wasn't accounted for.
- Fixed equality substition code by adding a mode that does equality
substition under assumption that the processed expression will be
attached to a certain particular table TBL.
If the expression for a derived table of a query contained a LIMIT
clause the estimate of the number of rows in this derived table
returned by the EXPLAIN command could be badly off since the
optimizer ignored the limit number from the LIMIT clause when
getting the estimate.
The call of the method SELECT_LEX_UNIT->set_limit added in the code
of mysql_derived_optimize() will be needed also in maria-5.5 where
parameters in the LIMIT clause are supported.
The patch for MWL #247 forgot to initialize the TABLE::ext_key_parts and
TABLE::ext_key_flags of the temporary tables by a query. This could cause
crashes for queries the execution of which needed creation of temporary
tables.
Problem: When building the condition for JOIN::outer_ref_cond the optimizer forgot to take into account
that this condition could depend on constant tables as well.
- Create/use do_copy_nullable_row_to_notnull() function for ref access, which is used
when copying from not-NULL field in table that can be NULL-complemented to not-NULL field.
Slightly corrected the implementation of the function ha_innobase::read_time().
Changed the implementation of handler::keyread_time to make the cost of single key
index only look-ups dependent on the key entry length.
Corrected the index of the last possible components of an extended key in the
function best_access_path().
fixed several defects in the greedy optimization:
1) The greedy optimizer calculated the 'compare-cost' (CPU-cost)
for iterating over the partial plan result at each level in
the query plan as 'record_count / (double) TIME_FOR_COMPARE'
This cost was only used locally for 'best' calculation at each
level, and *not* accumulated into the total cost for the query plan.
This fix added the 'CPU-cost' of processing 'current_record_count'
records at each level to 'current_read_time' *before* it is used as
'accumulated cost' argument to recursive
best_extension_by_limited_search() calls. This ensured that the
cost of a huge join-fanout early in the QEP was correctly
reflected in the cost of the final QEP.
To get identical cost for a 'best' optimized query and a
straight_join with the same join order, the same change was also
applied to optimize_straight_join() and get_partial_join_cost()
2) Furthermore to get equal cost for 'best' optimized query and a
straight_join the new code substrcated the same '0.001' in
optimize_straight_join() as it had been already done in
best_extension_by_limited_search()
3) When best_extension_by_limited_search() aggregated the 'best' plan a
plan was 'best' by the check :
'if ((search_depth == 1) || (current_read_time < join->best_read))'
The term '(search_depth == 1' incorrectly caused a new best plan to be
collected whenever the specified 'search_depth' was reached - even if
this partial query plan was more expensive than what we had already
found.
If the sorted table belongs to a dependent subquery then the function
create_sort_index() should not clear TABLE:: select and TABLE::select
for this table after the sort of the table has been performed, because
these members are needed for the second execution of the subquery.
The patch differs from the original MySQL patch as follows:
- All test case differences have been reviewed one by one, and
care has been taken to restore the original plan so that each
test case executes the code path it was designed for.
- A bug was found and fixed in MariaDB 5.3 in
Item_allany_subselect::cleanup().
- ORDER BY is not removed because we are unsure of all effects,
and it would prevent enabling ORDER BY ... LIMIT subqueries.
- ref_pointer_array.m_size is not adjusted because we don't do
array bounds checking, and because it looks risky.
Original comment by Jorgen Loland:
-------------------------------------------------------------
WL#5953 - Optimize away useless subquery clauses
For IN/ALL/ANY/SOME/EXISTS subqueries, the following clauses are
meaningless:
* ORDER BY (since we don't support LIMIT in these subqueries)
* DISTINCT
* GROUP BY if there is no HAVING clause and no aggregate
functions
This WL detects and optimizes away these useless parts of the
query during JOIN::prepare()
- Correctly handle plan refinement stage for LooseScan plans: run create_ref_for_key() if LooseScan
plan includes a ref access, and if we don't have any fixed key components, switch to a full index scan.
- Let JTBM optimization code handle the case where the subquery is degenerate and doesn't have a
join query plan. Regular materialization would fall back to IN->EXISTS for such cases. Semi-Join
materialization does not have such option, instead we introduce and use "constant JTBM join tabs".
- Do a "more thorough" cleanup of SJ-Materialization join tab in JOIN_TAB::cleanup. The bug
was due to the fact that JOIN_TAB::cleanup() may be called multiple times for the same tab
if the join has grouping.
If the duplicate elimination strategy is used for a semi-join and potentially
one of the block-based join algorithms can be employed to join the inner
tables of the semi-join then sorting of the head (first non-constant) table
for a query with ORDER BY / GROUP BY cannot be used.
The execution plan cannot use sorting on the first table from the
sequence of the joined tables if it plans to employ the block-based
hash join algorithm.
- Part 1 of the fix: for semi-join merged subqueries, calling child_join->optimize() until we're done with all
PS-lifetime optimizations in the parent.
- Make create_tmp_table() set KEY_PART_INFO attributes for the keys it creates.
This wasn't needed before but is needed now, when temp. tables that are
results of SJ-Materialization are being used for joins.
This particular bug depended on HA_VAR_LENGTH_PART being set,
but also added code to set HA_BLOB_PART and HA_NULL_PART when appropriate.
KEYUSE elements for a possible hash join key are not sorted by field
numbers of the second table T of the hash join operation. Besides
some of these KEYUSE elements cannot be used to build any key as their
key expressions depend on the tables that are planned to be accessed
after the table T.
The code before the patch did not take this into account and, as a result,
execition of a query the employing block-based hash join algorithm could
cause a crash or return a wrong result set.
in EXPLAIN as select_type==MATERIALIZED.
Before, we had select_type==SUBQUERY and it was difficult to tell materialized
subqueries from uncorrelated scalar-context subqueries.
If has been decided that the first match strategy is to be used to join table T
from a semi-join nest while no buffer can be employed to join this table
then no join buffer can be used to join any table in the join sequence between
the first one belonging to the semi-join nest and table T.
The tables from the same semi-join or outer join nest cannot use
join buffers if in the join sequence of the query execution plan
they are separated by a table that is planned to be joined without
usage of a join buffer.
Analysis:
lp:894397 was a consequence of a prior incorrect fix of lp:833777
which didn't take into account that even when all tables are
constant there may be correlated conditions, and the where clause
is not equivalent to the constant conditions.
Solution:
When there are constant tables only, evaluate only the conditions
that reference outer fields, because the constant conditions are
already checked, and the where clause doesn't have other conditions
than constant ones, and outer referencing ones. The fix for
lp:894397 also fixes lp:833777.