This patch implements correct NULL semantics for materialized subquery execution.
The implementation has the following properties and main limitations:
- It passes all query result tests, but fails a number of EXPLAIN tests because of
changed plans.
- The EXPLAIN output for partial matching is not decided yet.
- It works only when all necessary indexes fit into main memory. Notice that these
are not the general B-tree/Hash indexes, but instead much more compact ones,
therefore this limitation may not be a problem in many practical cases.
- It doesn't contain specialized tests.
- In several places the implementation uses methods that are modified copies of
other similar methods. These cases need to be refactored to avoid code duplication.
- Add a test if the predicate is top-level just before deciding on partial matching.
If it is top-level, use a more efficient exec method (index lookup).
- Add sorting of indexes according to their selectivity. The code is almost there.
- Needs more comments, and to sync existing ones with the implementation.
sql/item_cmpfunc.h:
Expose the Arg_comparator of a comparison predicate. This makes it possible to
directly get the comparison result {-1,0,1}, which is not possible through the
val_XXX() methods which "fold" such results into a boolean.
sql/item_subselect.cc:
The core of the implementation of MWL#68.
sql/item_subselect.h:
The core of the implementation of MWL#68.
sql/opt_subselect.cc:
Removed the limitation for materialized subquery execution that it is applicable only
for top-level predicates.
sql/sql_class.cc:
New class select_materialize_with_stats that collects data statistics about
the data being inserted into the target table.
sql/sql_class.h:
New class select_materialize_with_stats that collects data statistics about
the data being inserted into the target table.
sql/sql_select.cc:
- more complete initialization of the TABLE object of a temp table.
- call setup_subquery_materialization at one more exit point.
The problem is that during temporary table creation uneven bits
are not taken into account for hidden fields. It leads to incorrect
calculation&allocation of null bytes size for table record. And
if grouped value is null we set wrong bit for this value(see end_update()).
Fixed by adding separate calculation of uneven bit for hidden fields.
mysql-test/r/type_bit.result:
test case
mysql-test/t/type_bit.test:
test case
sql/sql_select.cc:
added separate calculation of uneven bit for hidden fields
The problem is that during temporary table creation uneven bits
are not taken into account for hidden fields. It leads to incorrect
calculation&allocation of null bytes size for table record. And
if grouped value is null we set wrong bit for this value(see end_update()).
Fixed by adding separate calculation of uneven bit for hidden fields.
function with distinct.
Loose index scan is used to find MIN/MAX values using appropriate index and
thus allow to avoid grouping. For each found row it updates non-aggregated
fields with values from row with found MIN/MAX value.
Without loose index scan non-aggregated fields are copied by end_send_group
function. With loose index scan there is no need in end_send_group and
end_send is used instead. Non-aggregated fields still need to be copied and
this was wrongly implemented in QUICK_GROUP_MIN_MAX_SELECT::get_next.
WL#3220 added a case when loose index scan can be used with end_send_group to
optimize calculation of aggregate functions with distinct. In this case
the row found by QUICK_GROUP_MIN_MAX_SELECT::get_next might belong to a next
group and copying it will produce wrong result.
Update of non-aggregated fields is moved to the end_send function from
QUICK_GROUP_MIN_MAX_SELECT::get_next.
mysql-test/r/group_min_max.result:
Added a test case for the bug#50539.
mysql-test/t/group_min_max.test:
Added a test case for the bug#50539.
sql/opt_range.cc:
Bug#50539: Wrong result when loose index scan is used for an aggregate
function with distinct.
Update of non-aggregated fields is moved to the end_send function from
QUICK_GROUP_MIN_MAX_SELECT::get_next.
sql/sql_select.cc:
Bug#50539: Wrong result when loose index scan is used for an aggregate
function with distinct.
Update of non-aggregated fields is moved to the end_send function from
QUICK_GROUP_MIN_MAX_SELECT::get_next.
function with distinct.
Loose index scan is used to find MIN/MAX values using appropriate index and
thus allow to avoid grouping. For each found row it updates non-aggregated
fields with values from row with found MIN/MAX value.
Without loose index scan non-aggregated fields are copied by end_send_group
function. With loose index scan there is no need in end_send_group and
end_send is used instead. Non-aggregated fields still need to be copied and
this was wrongly implemented in QUICK_GROUP_MIN_MAX_SELECT::get_next.
WL#3220 added a case when loose index scan can be used with end_send_group to
optimize calculation of aggregate functions with distinct. In this case
the row found by QUICK_GROUP_MIN_MAX_SELECT::get_next might belong to a next
group and copying it will produce wrong result.
Update of non-aggregated fields is moved to the end_send function from
QUICK_GROUP_MIN_MAX_SELECT::get_next.
Conflicts:
Text conflict in .bzr-mysql/default.conf
Text conflict in mysql-test/suite/rpl/r/rpl_slow_query_log.result
Text conflict in mysql-test/suite/rpl/t/rpl_slow_query_log.test
Conflict adding files to server-tools. Created directory.
Conflict because server-tools is not versioned, but has versioned children. Versioned directory.
Conflict adding files to server-tools/instance-manager. Created directory.
Conflict because server-tools/instance-manager is not versioned, but has versioned children. Versioned directory.
Contents conflict in server-tools/instance-manager/options.cc
Text conflict in sql/mysqld.cc
Conflicts:
Text conflict in .bzr-mysql/default.conf
Text conflict in mysql-test/suite/rpl/r/rpl_slow_query_log.result
Text conflict in mysql-test/suite/rpl/t/rpl_slow_query_log.test
Conflict adding files to server-tools. Created directory.
Conflict because server-tools is not versioned, but has versioned children. Versioned directory.
Conflict adding files to server-tools/instance-manager. Created directory.
Conflict because server-tools/instance-manager is not versioned, but has versioned children. Versioned directory.
Contents conflict in server-tools/instance-manager/options.cc
Text conflict in sql/mysqld.cc
Grouping by a subquery in a query with a distinct aggregate
function lead to a wrong result (wrong and unordered
grouping values).
There are two related problems:
1) The query like this:
SELECT (SELECT t1.a) aa, COUNT(DISTINCT b) c
FROM t1 GROUP BY aa
returned wrong result, because the outer reference "t1.a"
in the subquery was substituted with the Item_ref item.
The Item_ref item obtains data from the result_field object
that refreshes once after the end of each group. This data
is not applicable to filesort since filesort() doesn't care
about groups (and doesn't update result_field objects with
copy_fields() and so on). Also that data is not applicable
to group separation algorithm: end_send_group() checks every
record with test_if_group_changed() that evaluates Item_ref
items, but it refreshes those Item_ref-s only after the end
of group, that is a vicious circle and the grouped column
values in the output are shifted.
Fix: if
a) we grouping by a subquery and
b) that subquery has outer references to FROM list
of the grouping query,
then we substitute these outer references with
Item_direct_ref like references under aggregate
functions: Item_direct_ref obtains data directly
from the current record.
2) The query with a non-trivial grouping expression like:
SELECT (SELECT t1.a) aa, COUNT(DISTINCT b) c
FROM t1 GROUP BY aa+0
also returned wrong result, since JOIN::exec() substitutes
references to top-level aliases in SELECT list with Item_copy
caching items. Item_copy items have same refreshing policy
as Item_ref items, so the whole groping expression with
Item_copy inside returns wrong result in filesort() and
end_send_group().
Fix: include aliased items into GROUP BY item tree instead
of Item_ref references to them.
mysql-test/r/group_by.result:
Test case for bug #45640
mysql-test/t/group_by.test:
Test case for bug #45640
sql/item.cc:
Bug #45640: optimizer bug produces wrong results
Item_field::fix_fields() has been modified to resolve
aliases in GROUP BY item trees into aliased items instead
of Item_ref items.
sql/item.h:
Bug #45640: optimizer bug produces wrong results
- Item::find_item_processor() has been introduced.
- Item_ref::walk() has been modified to apply processors
to itself too (not only to referenced item).
sql/mysql_priv.h:
Bug #45640: optimizer bug produces wrong results
fix_inner_refs() has been modified to accept group_list
parameter.
sql/sql_lex.cc:
Bug #45640: optimizer bug produces wrong results
Initialization of st_select_lex::group_fix_field has
been added.
sql/sql_lex.h:
Bug #45640: optimizer bug produces wrong results
The st_select_lex::group_fix_field field has been introduced
to control alias resolution in Itef_fied::fix_fields.
sql/sql_select.cc:
Bug #45640: optimizer bug produces wrong results
- The fix_inner_refs function has been modified to treat
subquery outer references like outer fields under aggregate
functions, if they are included in GROUP BY item tree.
- The find_order_in_list function has been modified to
fix Item_field alias fields included in the GROUP BY item
trees in a special manner.
Grouping by a subquery in a query with a distinct aggregate
function lead to a wrong result (wrong and unordered
grouping values).
There are two related problems:
1) The query like this:
SELECT (SELECT t1.a) aa, COUNT(DISTINCT b) c
FROM t1 GROUP BY aa
returned wrong result, because the outer reference "t1.a"
in the subquery was substituted with the Item_ref item.
The Item_ref item obtains data from the result_field object
that refreshes once after the end of each group. This data
is not applicable to filesort since filesort() doesn't care
about groups (and doesn't update result_field objects with
copy_fields() and so on). Also that data is not applicable
to group separation algorithm: end_send_group() checks every
record with test_if_group_changed() that evaluates Item_ref
items, but it refreshes those Item_ref-s only after the end
of group, that is a vicious circle and the grouped column
values in the output are shifted.
Fix: if
a) we grouping by a subquery and
b) that subquery has outer references to FROM list
of the grouping query,
then we substitute these outer references with
Item_direct_ref like references under aggregate
functions: Item_direct_ref obtains data directly
from the current record.
2) The query with a non-trivial grouping expression like:
SELECT (SELECT t1.a) aa, COUNT(DISTINCT b) c
FROM t1 GROUP BY aa+0
also returned wrong result, since JOIN::exec() substitutes
references to top-level aliases in SELECT list with Item_copy
caching items. Item_copy items have same refreshing policy
as Item_ref items, so the whole groping expression with
Item_copy inside returns wrong result in filesort() and
end_send_group().
Fix: include aliased items into GROUP BY item tree instead
of Item_ref references to them.
mysql-test/t/disabled.def:
Restore disabled ssl tests: SSL certificates were updated.
Disable sp_sync.test, the test case can't work in next-4284.
mysql-test/t/partition_innodb.test:
Disable parsing of the test case for Bug#47343,
the test can not work in next-4284.
mysql-test/t/ps_ddl.test:
Update results (CREATE TABLE IF NOT EXISTS takes
into account existence of the temporary table).
fulltext search and row op.
The search for fulltext indexes is searching for some special
predicate layouts. While doing so it's not checking for the number
of columns of the expressions it tries to calculate.
And since row expressions can't return a single scalar value there
was a crash.
Fixed by checking if the expressions are scalar (in addition to
being constant) before calling Item::val_xxx() methods.
fulltext search and row op.
The search for fulltext indexes is searching for some special
predicate layouts. While doing so it's not checking for the number
of columns of the expressions it tries to calculate.
And since row expressions can't return a single scalar value there
was a crash.
Fixed by checking if the expressions are scalar (in addition to
being constant) before calling Item::val_xxx() methods.
Fixed 2 problems :
1. test_if_order_by_key() was continuing on the primary key
as if it has a primary key suffix (as the secondary keys do).
This leads to crashes in ORDER BY <pk>,<pk>.
Fixed by not treating the primary key as the secondary one
and not depending on it being clustered with a primary key.
2. The cost calculation was trying to read the records
per key when operating on ORDER BYs that order on all of the
secondary key + some of the primary key.
This leads to crashes because of out-of-bounds array access.
Fixed by assuming we'll find 1 record per key in such cases.