In this case we are setting the field Item_func_eq::in_eqaulity_no for the semi-join equalities.
This helps us to remove these equalites as the inner tables are not available during parent select execution
while the outer tables are not available during materialization phase.
We only have it set for the equalites for the fields involved with the IN subquery
and reset it for the equalities which do not belong to the IN subquery.
For example in case of nested IN subqueries:
SELECT t1.a FROM t1 WHERE t1.a IN
(SELECT t2.a FROM t2 where t2.b IN
(select t3.b from t3 where t3.c=27 ))
there are two equalites involving the fields of the IN subquery
1) t2.b = t3.b : the field Item_func_eq::in_eqaulity_no is set when we merge the grandchild select into the child select
2) t1.a = t2.a : the field Item_func_eq::in_eqaulity_no is set when we merge the child select into the parent select
But when we perform case 2) we should ensure that we reset the equalities in the child's WHERE clause.
with join_cache_level>2
During muliple equality propagation for a query in which we have an IN subquery, the items in the select list of the
subquery may not be part of the multiple equality because there might be another occurence of the same field in the
where clause of the subquery.
So we keyuse_is_valid_for_access_in_chosen_plan function which expects the items in the select list of the subquery to
be same to the ones in the multiple equality (through these multiple equalities we create keyuse array).
The solution would be that we expect the same field not the same Item because when we have SEMI JOIN MATERIALIZATION SCAN,
we use copy back technique to copies back the materialised table fields to the original fields of the base tables.
The problem described in the bug report happened because the code
did not test check_cols(1) after fix_fields() in a few places.
Additionally, fix_fields() could be called multiple times for SP variables,
because they are all fixed at a early stage in append_for_log().
Solution:
1. Adding a few helper methods
- fix_fields_if_needed()
- fix_fields_if_needed_for_scalar()
- fix_fields_if_needed_for_bool()
- fix_fields_if_needed_for_order_by()
and using it in many cases instead of fix_fields() where
the "fixed" status is not definitely known to be "false".
2. Adding DBUG_ASSERT(!fixed) into Item_splocal*::fix_fields()
to catch double execution.
3. Adding tests.
As a good side effect, the patch removes a lot of duplicate code (~60 lines):
if (!item->fixed &&
item->fix_fields(..) &&
item->check_cols(1))
return true;
The logic and the implementation scheme are similar with the
MDEV-9197 Pushdown conditions into non-mergeable views/derived tables
How the push down is made on the example:
select * from t1
where a>3 and b>10 and
(a,b) in (select x,max(y) from t2 group by x);
-->
select * from t1
where a>3 and b>10 and
(a,b) in (select x,max(y)
from t2
where x>3
group by x
having max(y)>10);
The implementation scheme:
1. Search for the condition cond that depends only on the fields
from the left part of the IN subquery (left_part)
2. Find fields F_group in the select of the right part of the
IN subquery (right_part) that are used in the GROUP BY
3. Extract from the cond condition cond_where that depends only on the
fields from the left_part that stay at the same places in the left_part
(have the same indexes) as the F_group fields in the projection of the
right_part
4. Transform cond_where so it can be pushed into the WHERE clause of the
right_part and delete cond_where from the cond
5. Transform cond so it can be pushed into the HAVING clause of the right_part
The optimization is made in the
Item_in_subselect::pushdown_cond_for_in_subquery() and is controlled by the
variable condition_pushdown_for_subquery.
New test file in_subq_cond_pushdown.test is created.
There are also some changes made for setup_jtbm_semi_joins().
Now it is decomposed into the 2 procedures: setup_degenerate_jtbm_semi_joins()
that is called before optimize_cond() for cond and setup_jtbm_semi_joins()
that is called after optimize_cond().
New setup_jtbm_semi_joins() is made in the way so that the result of its work is
the same as if it was called before optimize_cond().
The code that is common for pushdown into materialized derived and into materialized
IN subqueries is factored out into pushdown_cond_for_derived(),
Item_in_subselect::pushdown_cond_for_in_subquery() and
st_select_lex::pushdown_cond_into_where_clause().
work in the IN subqueries
The pushdown into the materialized derived table/view wasn't done because
optimize() for the derived was called before any conditions that can
be pushed down were extracted. So optimize() in
convert_join_subqueries_to_semijoins() method is called too early and is
unnecessary. The second optimize() call in mysql_handle_single_derived()
is enough.
The problem was that SJ (semi-join) used secondary list (array) of subquery select list. The items there was prepared once then cleaned up (but not really freed from memory because it was made in statement memory).
Original list was not prepared after first execution because select was removed by conversion to SJ.
The solution is to use original list but prepare it first.
Any expensive WHERE condition for a table-less query with
implicit aggregation was lost. As a result the used aggregate
functions were calculated over a non-empty set of rows even
in the case when the condition was false.
Conversion of a subquery to a semi-join is blocked when we have an
IN subquery predicate in the on_expr of an outer join. Currently this
scenario is handled but the cases when an IN subquery predicate is wrapped
inside a Item_in_optimizer item then this blocking is not done.
This will make it easier to how memory allocation is done when debugging
with either DBUG or gdb.
Will especially help when debugging stored procedures
Main change is a name argument as second argument to init_alloc_root()
init_sql_alloc()
Other things:
- Added DBUG_ENTER/EXIT to some Virtual_tmp_table functions
This was done in, among other things:
- thd->db and thd->db_length
- TABLE_LIST tablename, db, alias and schema_name
- Audit plugin database name
- lex->db
- All db and table names in Alter_table_ctx
- st_select_lex db
Other things:
- Changed a lot of functions to take const LEX_CSTRING* as argument
for db, table_name and alias. See init_one_table() as an example.
- Changed some function arguments from LEX_CSTRING to const LEX_CSTRING
- Changed some lists from LEX_STRING to LEX_CSTRING
- threads_mysql.result changed because process list_db wasn't always
correctly updated
- New append_identifier() function that takes LEX_CSTRING* as arguments
- Added new element tmp_buff to Alter_table_ctx to separate temp name
handling from temporary space
- Ensure we store the length after my_casedn_str() of table/db names
- Removed not used version of rename_table_in_stat_tables()
- Changed Natural_join_column::table_name and db_name() to never return
NULL (used for print)
- thd->get_db() now returns db as a printable string (thd->db.str or "")