This bug is a consequence of the fix in the function add_ref_to_table_cond
for LP bug 826935 that turned out to be not quite correct: it tried to AND
the same generated condition with two different other conditions.
This patch creates a copy of the generated condition if the condition needs
to be ANDed with two different items.
Problematic query:
insert ignore into `t1_federated` (`c1`) select `c1` from `t1_local` a
where not exists (select 1 from `t1_federated` b where a.c1 = b.c1);
When this query is killed in another connection it could lead to crash.
The problem is follwing:
An attempt to obtain table statistics for subselect table in killed query
fails with an error. So JOIN::optimize() for subquery is failed but
it does not prevent further subquery evaluation.
At the first subquery execution JOIN::optimize() is called
(see subselect_single_select_engine::exec()) and fails with
an error. 'executed' flag is set to TRUE and it prevents
further subquery evaluation. At the second call
JOIN::optimize() does not happen as 'JOIN::optimized' is TRUE
and in case of uncacheable subquery the 'executed' flag is set
to FALSE before subquery evaluation. So we loose 'optimize stage'
error indication (see subselect_single_select_engine::exec()).
In other words 'executed' flag is used for two purposes, for
error indication at JOIN::optimize() stage and for an
indication of subquery execution. And it seems it's wrong
as the flag could be reset.
mysql-test/r/error_simulation.result:
test case
mysql-test/t/error_simulation.test:
test case
sql/item_subselect.cc:
added new flag subselect_single_select_engine::optimize_error
which is used for error detection which could happen at optimize
stage.
sql/item_subselect.h:
added new flag subselect_single_select_engine::optimize_error
sql/sql_select.cc:
test case
Problematic query:
insert ignore into `t1_federated` (`c1`) select `c1` from `t1_local` a
where not exists (select 1 from `t1_federated` b where a.c1 = b.c1);
When this query is killed in another connection it could lead to crash.
The problem is follwing:
An attempt to obtain table statistics for subselect table in killed query
fails with an error. So JOIN::optimize() for subquery is failed but
it does not prevent further subquery evaluation.
At the first subquery execution JOIN::optimize() is called
(see subselect_single_select_engine::exec()) and fails with
an error. 'executed' flag is set to TRUE and it prevents
further subquery evaluation. At the second call
JOIN::optimize() does not happen as 'JOIN::optimized' is TRUE
and in case of uncacheable subquery the 'executed' flag is set
to FALSE before subquery evaluation. So we loose 'optimize stage'
error indication (see subselect_single_select_engine::exec()).
In other words 'executed' flag is used for two purposes, for
error indication at JOIN::optimize() stage and for an
indication of subquery execution. And it seems it's wrong
as the flag could be reset.
in the function best_access_path revealed another bug: currently
table scans on NULL keys used for NOT IN subqueries cannot work
together with employment of join caches for inner tables of these
subqueries. Otherwise the result can be wrong as it could be seen
with the result of the test case constructed for bug #37894
in the file subselect3_jcl6.result.
of the 5.3 code line after a merge with 5.2 on 2010-10-28
in order not to allow the cost to access a joined table to be equal
to 0 ever.
Expanded data sets for many test cases to get the same execution plans
as before.
- The problem was that JOIN::save/restore_query_plan() did not save/restore parts of
the query plan that are located inside SJ_MATERIALIZATION_INFO structures. This could
cause parts of one plan to be used with another, which led get_best_combination() to
constructing non-sensical join plans (and crash).
Fixed by saving/restoring SJM parts of the query plans.
- check_and_do_in_subquery_rewrites() will not set SUBS_MATERIALIZATION flag when it
records that the subquery predicate is to be converted into semi-join.
If convert_join_subqueries_to_semijoins() later decides not to convert to semi-join,
let it set SUBS_MATERIALIZATION flag, if appropriate.
- are_tables_local() failed to recognize the fact that OUTER_REF_TABLE_BIT is ok
for SJ-Materialization. This caused zero-length ref access to be constructed, which
led to an assert.
- Implement new approach to testing (the DBUG_EXECUTE_IF variant)
- add an 'evalp' mysqltest command that is like 'eval' except that
it prints the original query.
- Fix select_describe() not to change join_tab[i]->type
- More tests
- Let join buffering code correctly take into account rowids needed
by DuplicateElimination when it is calculating minimum record sizes.
- In JOIN_CACHE::write_record_data, added asserts that prevent us from
writing beyond the end of the buffer.
For any query JOIN::optimize() should call the method
SELECT::save_leaf_tables after the last transformation
that utilizes the statement memory rather than the
execution memory.
sql/item_subselect.cc:
Added check of error condtions (safety)
sql/sql_join_cache.cc:
Added DBUG to some functions.
Added error checking for calls to check_match(); This fixed the bug.
sql/sql_select.cc:
Moved variable assignment to be close to where it's used (cleanup)
Analysis:
In the test query semi-join merges the inner-most subquery
into the outer subquery, and the optimization of the merged
subquery finds some new index access methods. Later the
IN-EXISTS transformation is applied to the unmerged subquery.
Since the optimizer is instructed to not consider
materialization, it reoptimizes the plan in-place to take into
account the new IN-EXISTS conditions. Just before reoptimization
JOIN::choose_subquery_plan resets the query plan, which also
resets the access methods found during the semi-join merge.
Then reoptimization discovers there are no new access methods,
but it leaves the query plan in its reset state. Later semi-join
crashes because it assumes these access methods are present.
Solution:
When reoptimizing in-place, reset the query plan only after new
access methods were discovered. If no new access methods were
discovered, leave the current plan as it was.
- Fix st_select_lex::set_explain_type() to allow producing exactly the
same EXPLAINs as it did before. SHOW EXPLAIN output may produce
select_type=SIMPLE instead or select_type=PRIMARY or vice versa (which
is ok because values of select_type weren't self-consistent in this
regard to begin with)
Analysis:
Constant table optimization of the outer query finds that
the right side of the equality is a constant that can
be used for an eq_ref access to fetch one row from t1,
and substitute t1 with a constant. Thus constant optimization
triggers evaluation of the subquery during the optimize
phase of the outer query.
The innermost subquery requires a plan with a temporary
table because with InnoDB tables the exact count of rows
is not known, and the empty tables cannot be optimzied
way. JOIN::exec for the innermost subquery substitutes
the subquery tables with a temporary table.
When EXPLAIN gets to print the tables in the innermost
subquery, EXPLAIN needs to print the name of each table
through the corresponding TABLE_LIST object. However,
the temporary table created during execution doesn't
have a corresponding TABLE_LIST, so we get a null
pointer exception.
Solution:
The solution is to forbid using expensive constant
expressions for eq_ref access for contant table
optimization. Notice that eq_ref with a subquery
providing the value is still possible during regular
execution.
First code
- "Asynchronous procedure call" system
- new THD::check_killed() that serves APC request is called from within most important loops
- EXPLAIN code is now able to generate EXPLAIN output on-the-fly [incomplete]
Parts that are still missing:
- put THD::check_killed() call into every loop where we could spend significant amount of time
- Make sure EXPLAIN code works for group-by queries that replace JOIN::join_tab with make_simple_join()
and other such cases.
- User interface: what error code to use, where to get timeout settings from, etc.
When the WHERE/HAVING condition of a subquery has been transformed
by the optimizer the pointer stored the 'where'/'having' field
of the SELECT_LEX structure used for the subquery must be updated
accordingly. Otherwise the pointer may refer to an invalid item.
This can lead to the reported assertion failure for some queries
with correlated subqueries
- add_ref_to_table_cond() should not just overwrite pre_idx_push_select_cond
with the contents tab->select_cond.
pre_idx_push_select_cond exists precisely for the reason that it may contain
a condition that is a strict superset of what is in tab->select_cond.
The fix is to inject generated equality into pre_idx_push_select_cond.
- Make simplify_joins() set maybe_null=FALSE for tables that were on the
inner sides of inner joins and then were moved to the inner sides of semi-joins.
This bug is a special case of lp:813447.
Analysis:
Constant optimization finds that the condition t2.a = 1
can be used to access the primary key of table 't2'. As
a result both outer table t1,t2 are considered as constant
when we reach the execution phase. At the same time, during
constant optimization, the IN predicate is not evaluated
because it is expensive.
When execution of the outer query reaches do_select(),
control flow enter the branch:
if (join->table_count == join->const_tables)
{ ... }
This branch checks only the WHERE and HAVING clauses,
but doesn't check the ON clauses of the query. Since the
IN predicate was not evaluated during optimization, it is
not evaluated at all, thus execution doesn't detect that
the ON clause is FALSE.
Solution:
Similar to the patch for bug lp:813447, exclude system
tables from constant substitution based on unique key
lookups if there is an expensive ON condition on the
inner table.
- create_ref_for_key() has the code that walks KEYUSE array and tries to use
maximum number of keyparts for ref (and eq_ref and ref_or_null) access.
When one constructs ref access for table that is inside a SJ-Materialization
nest, it is not possible to use tables that are ouside the nest (because
materialization is performed before they have any "current value").
The bug was caused by this function not taking this into account.
There is an optimization of DISTINCT in JOIN::optimize()
which depends on THD::used_tables value. Each SELECT statement
inside SP resets used_tables value(see mysql_select()) and it
leads to wrong result. The fix is to replace THD::used_tables
with LEX::used_tables.
mysql-test/r/sp.result:
test case
mysql-test/t/sp.test:
test case
sql/sql_base.cc:
THD::used_tables is replaced with LEX::used_tables
sql/sql_class.cc:
THD::used_tables is replaced with LEX::used_tables
sql/sql_class.h:
THD::used_tables is replaced with LEX::used_tables
sql/sql_insert.cc:
THD::used_tables is replaced with LEX::used_tables
sql/sql_lex.cc:
THD::used_tables is replaced with LEX::used_tables
sql/sql_lex.h:
THD::used_tables is replaced with LEX::used_tables
sql/sql_prepare.cc:
THD::used_tables is replaced with LEX::used_tables
sql/sql_select.cc:
THD::used_tables is replaced with LEX::used_tables
There is an optimization of DISTINCT in JOIN::optimize()
which depends on THD::used_tables value. Each SELECT statement
inside SP resets used_tables value(see mysql_select()) and it
leads to wrong result. The fix is to replace THD::used_tables
with LEX::used_tables.
This problem could be observed for queries with nested outer joins
for which the not_exist optimization were applicable.
The problem was caused by the code of the patch for bug #49322
that erroneously forced the return to the previous nested loop
level when the join algorithm successfully builds a partial record
for an embedded outer to which the not_exist optimization could be
applied.
Actually the immediate return to the previous nested loops level
is correct only if this partial record is rejected by a predicate
pushed down to one of the inner tables of this outer join. Otherwise
attempts to find extensions of this record must be made.
An aggregating query over an empty set of a join of two tables
with a rejecting HAVING clause erroneously could return a row.
It could happen in the cases when the optimizer made a conclusion
that the aggregating set was empty.
Wrong results were produced because the server missed initial
setting for aggregation functions in the mentioned cases.
reset_nj_counters() used to rely on the fact that join nests have
table->table==NULL. This ceased to be true wit new derived table
optimizations. Use test for table->nested_join!=NULL instead.
The problem was that optimizer removes some outer references (it they are
constant for example) and the list of outer items built during prepare phase is
not actual during execution phase when we need it as the cache parameters.
First solution was use pointer on pointer on outer reference Item and
initialize temporary table on demand. This solved most problem except case
when optimiser also reduce Item which contains outer references ('OR' in
this bug test suite).
The solution is to build the list of outer reference items on execution
phase (after optimization) on demand (just before temporary table creation)
by walking Item tree and finding outer references among Item_ident
(Item_field/Item_ref) and Item_sum items.
Removed depends_on list (because it is not neede any mnore for the cache, in the place where it was used it replaced with upper_refs).
Added processor (collect_outer_ref_processor) and get_cache_parameters() methods to collect outer references (or other expression parameters in future).
mysql-test/r/subselect_cache.result:
A new test added.
mysql-test/r/subselect_scache.result:
Changes in creating the cache and its paremeters order or adding arguments of aggregate function (which is a parameter also, but this has no influence on the result).
mysql-test/t/subselect_cache.test:
Added a new test.
sql/item.cc:
depends_on removed.
Added processor (collect_outer_ref_processor) and get_cache_parameters() methods to collect outer references.
Item_cache_wrapper collect parameters befor initialization of its cache.
sql/item.h:
depends_on removed.
Added processor (collect_outer_ref_processor) and get_cache_parameters() methods to collect outer references.
sql/item_cmpfunc.cc:
depends_on removed.
Added processor (collect_outer_ref_processor) to collect outer references.
sql/item_cmpfunc.h:
Added processor (collect_outer_ref_processor) to collect outer references.
sql/item_subselect.cc:
depends_on removed.
Added processor get_cache_parameters() method to collect outer references.
sql/item_subselect.h:
depends_on removed.
Added processor get_cache_parameters() method to collect outer references.
sql/item_sum.cc:
Added processor (collect_outer_ref_processor) method to collect outer references.
sql/item_sum.h:
Added processor (collect_outer_ref_processor) and get_cache_parameters() methods to collect outer references.
sql/opt_range.cc:
depends_on removed.
sql/sql_base.cc:
depends_on removed.
sql/sql_class.h:
New iterator added.
sql/sql_expression_cache.cc:
Build of list of items resolved in outer query done just before creating expression cache on the first execution of the subquery which removes influence of optimizer removing items (all optimization already done).
sql/sql_expression_cache.h:
Build of list of items resolved in outer query done just before creating expression cache on the first execution of the subquery which removes influence of optimizer removing items (all optimization already done).
sql/sql_lex.cc:
depends_on removed.
sql/sql_lex.h:
depends_on removed.
sql/sql_list.h:
Added add_unique method to add only unique elements to the list.
sql/sql_select.cc:
Support of new Item list added.
sql/sql_select.h:
Support of new Item list added.
This bug could lead to wrong result sets for a query over a
materialized derived table or view accessed by a multi-component
key.
It happened because the function get_next_field_for_derived_key
was supposed to update its argument, and it did not do it.
Also:
1. simplified the code of the function mysql_derived_merge_for_insert.
2. moved merge of views/dt for multi-update/delete to the prepare stage.
3. the list of the references to the candidates for semi-join now is
allocated in the statement memory.