Analysis:
The reason for the wrong result is the interaction between constant
optimization (in this case 1-row table) and subquery optimization.
- First the outer query is optimized, and 'make_join_statistics' finds that
table t2 has one row, reads that row, and marks the whole table as constant.
This also means that all fields of t2 are constant.
- Next, we optimize the subquery in the end of the outer 'make_join_statistics'.
The field 'f2' is considered constant, with value '3'. The subquery predicate
is rewritten as the constant TRUE.
- The outer query execution detects early that the whole query result is empty
and calls 'return_zero_rows'. Since the query is with implicit grouping, we
have to produce one row with special values for the aggregates (depending on
each aggregate function), and NULL values for all non-aggregate fields. This
function calls 'no_rows_in_result' to set each aggregate function to the
default value when it aggregates over an empty result, and then calls
'send_data', which in turn evaluates each Item in the SELECT list.
- When evaluation reaches the subquery predicate, it executes the subquery
with field 'f2' having a constant value '3', and the subquery produces the
incorrect result '7'.
Solution:
Implement Item::no_rows_in_result for all subquery predicates. In order to
make this work, it is also needed to make all val_* methods of all subquery
predicates respect the Item_subselect::forced_const flag. Otherwise subqueries
are executed anyways, and override the default value set by no_rows_in_result
with whatever result is produced from the subquery evaluation.
- Remove all references of MAX_TABLES from JOIN struct and make these dynamic
- Updated Join_plan_state to allocate just as many elements as it's needed
sql/opt_subselect.cc:
Optimized version of Join_plan_state
sql/sql_select.cc:
Set join->positions and join->best_positions dynamicly
Don't call update_virtual_fields() if table->vfield is not set.
sql/sql_select.h:
Remove all references of MAX_TABLES from JOIN struct and Join_plan_state and make these dynamic
- Disable use of join cache when we're using FirstMatch strategy, and the join
order is such that subquery's inner tables are interleaved with outer. Join
buffering code is incapable of handling such join orders.
- The testcase requires use of @@debug_optimizer_prefer_join_prefix to hit the bug,
but I'm pushing it anyway (including the mention of the variable in .test file),
so that it can be found and enabled when/if we get something comparable in the
main tree.
The problem was that LooseScan execution code assumed that tab->key holds
the index used for looseScan. This is only true when range or full index
scan are used. In case of ref access, the index is in tab->ref.key (and
tab->index==0 which explains how LooseScan passed tests with ref access: they
used one index)
Fixed by setting/using loosescan_key, which always the correct index#.
The function subselect_uniquesubquery_engine::copy_ref_key has to take into
account that when EXPLAIN is processed the array of store_key object created
for any TABLE_REF may contain elements for constant items. These items should
be ignored by thefunction.
Problem: When building the condition for JOIN::outer_ref_cond the optimizer forgot to take into account
that this condition could depend on constant tables as well.
- Correctly handle plan refinement stage for LooseScan plans: run create_ref_for_key() if LooseScan
plan includes a ref access, and if we don't have any fixed key components, switch to a full index scan.
If the duplicate elimination strategy is used for a semi-join and potentially
one of the block-based join algorithms can be employed to join the inner
tables of the semi-join then sorting of the head (first non-constant) table
for a query with ORDER BY / GROUP BY cannot be used.
The execution plan cannot use sorting on the first table from the
sequence of the joined tables if it plans to employ the block-based
hash join algorithm.
KEYUSE elements for a possible hash join key are not sorted by field
numbers of the second table T of the hash join operation. Besides
some of these KEYUSE elements cannot be used to build any key as their
key expressions depend on the tables that are planned to be accessed
after the table T.
The code before the patch did not take this into account and, as a result,
execition of a query the employing block-based hash join algorithm could
cause a crash or return a wrong result set.
- Make EXPLAIN display "Start temporary" at the start of the fanout (it used to display
at the first table whose rowid gets into temp. table which is not that useful for
the user)
- Updated test results (all checked)
- Break down POSITION/advance_sj_state() into four classes
representing potential semi-join strategies.
- Treat all strategies uniformly (before, DuplicateWeedout
was special as it was the catch-all strategy. Now, we're
still relying on it to be the catch-all, but are able to
function,e.g. with firstmatch=on,duplicate_weedout=off.
- Update test results (checked)
If the optimizer switch 'semijoin_with_cache' is set to 'off' then
join cache cannot be used to join inner tables of a semijoin.
Also fixed a bug in the function check_join_cache_usage() that led
to wrong output of the EXPLAIN commands for some test cases.
sql/sql_insert.cc:
CREATE ... IF NOT EXISTS may do nothing, but
it is still not a failure. don't forget to my_ok it.
******
CREATE ... IF NOT EXISTS may do nothing, but
it is still not a failure. don't forget to my_ok it.
sql/sql_table.cc:
small cleanup
******
small cleanup
- The problem was that JOIN::save/restore_query_plan() did not save/restore parts of
the query plan that are located inside SJ_MATERIALIZATION_INFO structures. This could
cause parts of one plan to be used with another, which led get_best_combination() to
constructing non-sensical join plans (and crash).
Fixed by saving/restoring SJM parts of the query plans.
- check_and_do_in_subquery_rewrites() will not set SUBS_MATERIALIZATION flag when it
records that the subquery predicate is to be converted into semi-join.
If convert_join_subqueries_to_semijoins() later decides not to convert to semi-join,
let it set SUBS_MATERIALIZATION flag, if appropriate.
- Let join buffering code correctly take into account rowids needed
by DuplicateElimination when it is calculating minimum record sizes.
- In JOIN_CACHE::write_record_data, added asserts that prevent us from
writing beyond the end of the buffer.
Analysis:
In the test query semi-join merges the inner-most subquery
into the outer subquery, and the optimization of the merged
subquery finds some new index access methods. Later the
IN-EXISTS transformation is applied to the unmerged subquery.
Since the optimizer is instructed to not consider
materialization, it reoptimizes the plan in-place to take into
account the new IN-EXISTS conditions. Just before reoptimization
JOIN::choose_subquery_plan resets the query plan, which also
resets the access methods found during the semi-join merge.
Then reoptimization discovers there are no new access methods,
but it leaves the query plan in its reset state. Later semi-join
crashes because it assumes these access methods are present.
Solution:
When reoptimizing in-place, reset the query plan only after new
access methods were discovered. If no new access methods were
discovered, leave the current plan as it was.
First code
- "Asynchronous procedure call" system
- new THD::check_killed() that serves APC request is called from within most important loops
- EXPLAIN code is now able to generate EXPLAIN output on-the-fly [incomplete]
Parts that are still missing:
- put THD::check_killed() call into every loop where we could spend significant amount of time
- Make sure EXPLAIN code works for group-by queries that replace JOIN::join_tab with make_simple_join()
and other such cases.
- User interface: what error code to use, where to get timeout settings from, etc.
The problem was that optimizer removes some outer references (it they are
constant for example) and the list of outer items built during prepare phase is
not actual during execution phase when we need it as the cache parameters.
First solution was use pointer on pointer on outer reference Item and
initialize temporary table on demand. This solved most problem except case
when optimiser also reduce Item which contains outer references ('OR' in
this bug test suite).
The solution is to build the list of outer reference items on execution
phase (after optimization) on demand (just before temporary table creation)
by walking Item tree and finding outer references among Item_ident
(Item_field/Item_ref) and Item_sum items.
Removed depends_on list (because it is not neede any mnore for the cache, in the place where it was used it replaced with upper_refs).
Added processor (collect_outer_ref_processor) and get_cache_parameters() methods to collect outer references (or other expression parameters in future).
mysql-test/r/subselect_cache.result:
A new test added.
mysql-test/r/subselect_scache.result:
Changes in creating the cache and its paremeters order or adding arguments of aggregate function (which is a parameter also, but this has no influence on the result).
mysql-test/t/subselect_cache.test:
Added a new test.
sql/item.cc:
depends_on removed.
Added processor (collect_outer_ref_processor) and get_cache_parameters() methods to collect outer references.
Item_cache_wrapper collect parameters befor initialization of its cache.
sql/item.h:
depends_on removed.
Added processor (collect_outer_ref_processor) and get_cache_parameters() methods to collect outer references.
sql/item_cmpfunc.cc:
depends_on removed.
Added processor (collect_outer_ref_processor) to collect outer references.
sql/item_cmpfunc.h:
Added processor (collect_outer_ref_processor) to collect outer references.
sql/item_subselect.cc:
depends_on removed.
Added processor get_cache_parameters() method to collect outer references.
sql/item_subselect.h:
depends_on removed.
Added processor get_cache_parameters() method to collect outer references.
sql/item_sum.cc:
Added processor (collect_outer_ref_processor) method to collect outer references.
sql/item_sum.h:
Added processor (collect_outer_ref_processor) and get_cache_parameters() methods to collect outer references.
sql/opt_range.cc:
depends_on removed.
sql/sql_base.cc:
depends_on removed.
sql/sql_class.h:
New iterator added.
sql/sql_expression_cache.cc:
Build of list of items resolved in outer query done just before creating expression cache on the first execution of the subquery which removes influence of optimizer removing items (all optimization already done).
sql/sql_expression_cache.h:
Build of list of items resolved in outer query done just before creating expression cache on the first execution of the subquery which removes influence of optimizer removing items (all optimization already done).
sql/sql_lex.cc:
depends_on removed.
sql/sql_lex.h:
depends_on removed.
sql/sql_list.h:
Added add_unique method to add only unique elements to the list.
sql/sql_select.cc:
Support of new Item list added.
sql/sql_select.h:
Support of new Item list added.
Also:
1. simplified the code of the function mysql_derived_merge_for_insert.
2. moved merge of views/dt for multi-update/delete to the prepare stage.
3. the list of the references to the candidates for semi-join now is
allocated in the statement memory.
The bitmap of used tables must be evaluated for the select list of every
materialized derived table / view and saved in a dedicated field.
This is also applied to materialized subqueries.