Root cause for this bug is that the optimizer try to detect&
optimize the special case:
'<field> BETWEEN c1 AND c1' and handle this as the condition '<field> = c1'
This was implemented inside add_key_field(.. *field, *value[]...)
which assumed field to refer key Field, and value[] to refer a [low...high]
constant pair. value[0] and value[1] was then compared for equality.
In a 'normal' BETWEEN condition of the form '<field> BETWEEN val1 and val2' the
BETWEEN operation is represented with an argementlist containing the
values [<field>, val1, val2] - add_key_field() is then called with
parameters field=<field>, *value=val1.
However, if the BETWEEN predicate specified:
1) '<const1> BETWEEN<const2> AND<field>
the 'field' and 'value' arguments to add_key_field() had to be swapped.
This was implemented by trying to cheat add_key_field() to handle it like:
2) '<const1> GE<const2> AND<const1> LE<field>'
As we didn't really replace the BETWEEN operation with 'ge' and 'le',
add_key_field() still handled it as a 'BETWEEN' and compared the (swapped)
arguments<const1> and<const2> for equality. If they was equal, the
condition 1) was incorrectly 'optimized' to:
3) '<field> EQ <const1>'
This fix moves this optimization of '<field> BETWEEN c1 AND c1' into
add_key_fields() which then calls add_key_equal_fields() to collect
key equality / comparison for the key fields in the BETWEEN condition.
Root cause for this bug is that the optimizer try to detect&
optimize the special case:
'<field> BETWEEN c1 AND c1' and handle this as the condition '<field> = c1'
This was implemented inside add_key_field(.. *field, *value[]...)
which assumed field to refer key Field, and value[] to refer a [low...high]
constant pair. value[0] and value[1] was then compared for equality.
In a 'normal' BETWEEN condition of the form '<field> BETWEEN val1 and val2' the
BETWEEN operation is represented with an argementlist containing the
values [<field>, val1, val2] - add_key_field() is then called with
parameters field=<field>, *value=val1.
However, if the BETWEEN predicate specified:
1) '<const1> BETWEEN<const2> AND<field>
the 'field' and 'value' arguments to add_key_field() had to be swapped.
This was implemented by trying to cheat add_key_field() to handle it like:
2) '<const1> GE<const2> AND<const1> LE<field>'
As we didn't really replace the BETWEEN operation with 'ge' and 'le',
add_key_field() still handled it as a 'BETWEEN' and compared the (swapped)
arguments<const1> and<const2> for equality. If they was equal, the
condition 1) was incorrectly 'optimized' to:
3) '<field> EQ <const1>'
This fix moves this optimization of '<field> BETWEEN c1 AND c1' into
add_key_fields() which then calls add_key_equal_fields() to collect
key equality / comparison for the key fields in the BETWEEN condition.
Queries involving predicates of the form "const NOT BETWEEN
not_indexed_column AND indexed_column" could return wrong data
due to incorrect handling by the range optimizer.
For "c NOT BETWEEN f1 AND f2" predicates, get_mm_tree()
produces a disjunction of the SEL_ARG trees for "f1 > c" and
"f2 < c". If one of the trees is empty (i.e. one of the
arguments is not sargable) the resulting tree should be empty
as well, since the whole expression in this case is not
sargable.
The above logic is implemented in get_mm_tree() as follows. The
initial state of the resulting tree is NULL (aka empty). We
then iterate through arguments and compute the corresponding
SEL_ARG tree (either "f1 > c" or "f2 < c"). If the resulting
tree is NULL, it is simply replaced by the generated
tree. Otherwise it is replaced by a disjunction of itself and
the generated tree. The obvious flaw in this implementation is
that if the first argument is not sargable and thus produces a
NULL tree, the resulting tree will simply be replaced by the
tree for the second argument. As a result, "c NOT BETWEEN f1
AND f2" will end up as just "f2 < c".
Fixed by adding a check so that when the first argument
produces an empty tree for the NOT BETWEEN case, the loop is
aborted with an empty tree as a result. The whole idea of using
a loop for 2 arguments does not make much sense, but it was
probably used to avoid code duplication for several BETWEEN
variants.
Queries involving predicates of the form "const NOT BETWEEN
not_indexed_column AND indexed_column" could return wrong data
due to incorrect handling by the range optimizer.
For "c NOT BETWEEN f1 AND f2" predicates, get_mm_tree()
produces a disjunction of the SEL_ARG trees for "f1 > c" and
"f2 < c". If one of the trees is empty (i.e. one of the
arguments is not sargable) the resulting tree should be empty
as well, since the whole expression in this case is not
sargable.
The above logic is implemented in get_mm_tree() as follows. The
initial state of the resulting tree is NULL (aka empty). We
then iterate through arguments and compute the corresponding
SEL_ARG tree (either "f1 > c" or "f2 < c"). If the resulting
tree is NULL, it is simply replaced by the generated
tree. Otherwise it is replaced by a disjunction of itself and
the generated tree. The obvious flaw in this implementation is
that if the first argument is not sargable and thus produces a
NULL tree, the resulting tree will simply be replaced by the
tree for the second argument. As a result, "c NOT BETWEEN f1
AND f2" will end up as just "f2 < c".
Fixed by adding a check so that when the first argument
produces an empty tree for the NOT BETWEEN case, the loop is
aborted with an empty tree as a result. The whole idea of using
a loop for 2 arguments does not make much sense, but it was
probably used to avoid code duplication for several BETWEEN
variants.
feature
The test for bug no 50939 was put in range.test which isn't such a good idea
since it requires partitioning. Fixed by moving the test case to
partitioning_range.test.
feature
The test for bug no 50939 was put in range.test which isn't such a good idea
since it requires partitioning. Fixed by moving the test case to
partitioning_range.test.
remember range endpoints
The Loose Index Scan optimization keeps track of a sequence
of intervals. For the current interval it maintains the
current interval's endpoints. But the maximum endpoint was
not stored in the SQL layer; rather, it relied on the
storage engine to retain this value in-between reads. By
coincidence this holds for MyISAM and InnoDB. Not for the
partitioning engine, however.
Fixed by making the key values iterator
(QUICK_RANGE_SELECT) keep track of the current maximum endpoint.
This is also more efficient as we save a call through the
handler API in case of open-ended intervals.
The code to calculate endpoints was extracted into
separate methods in QUICK_RANGE_SELECT, and it was possible to
get rid of some code duplication as part of fix.
remember range endpoints
The Loose Index Scan optimization keeps track of a sequence
of intervals. For the current interval it maintains the
current interval's endpoints. But the maximum endpoint was
not stored in the SQL layer; rather, it relied on the
storage engine to retain this value in-between reads. By
coincidence this holds for MyISAM and InnoDB. Not for the
partitioning engine, however.
Fixed by making the key values iterator
(QUICK_RANGE_SELECT) keep track of the current maximum endpoint.
This is also more efficient as we save a call through the
handler API in case of open-ended intervals.
The code to calculate endpoints was extracted into
separate methods in QUICK_RANGE_SELECT, and it was possible to
get rid of some code duplication as part of fix.
WL#2474 "Multi Range Read: Change the default MRR implementation to implement new MRR interface"
WL#2475 "Batched range read functions for MyISAM/InnoDb"
"Index condition pushdown for MyISAM/InnoDB"
Igor's fix from sp1r-igor@olga.mysql.com-20080330055902-07614:
There could be observed the following problems:
1. EXPLAIN did not mention pushdown conditions from on expressions in the
'extra' column. As a result if a query had no where conditions pushed
down to a table, but had on conditions pushed to this table the 'extra'
column in the EXPLAIN for the table missed 'using where'.
2. Conditions for ref access were not eliminated from on expressions
though such conditions were eliminated from the where condition.
for each record'
There was an error in an internal structure in the range
optimizer (SEL_ARG). Bad design causes parts of a data
structure not to be initialized when it is in a certain
state. All client code must check that this state is not
present before trying to access the structure's data. Fixed
by
- Checking the state before trying to access data (in
several places, most of which not covered by test case.)
- Copying the keypart id when cloning SEL_ARGs
mysql-test/r/range.result:
Bug#48459: Test result.
mysql-test/t/range.test:
Bug#48459: Test case.
sql/opt_range.cc:
Bug#48459: Fix + doxygenated count_key_part_usage comment.
for each record'
There was an error in an internal structure in the range
optimizer (SEL_ARG). Bad design causes parts of a data
structure not to be initialized when it is in a certain
state. All client code must check that this state is not
present before trying to access the structure's data. Fixed
by
- Checking the state before trying to access data (in
several places, most of which not covered by test case.)
- Copying the keypart id when cloning SEL_ARGs
When merging ranges during calculation of the result of OR
to two range sets the current range may be obsoleted by the
resulting merged range.
The first overlapping range can be obsoleted as well.
Fixed by moving the pointer to the first overlapping range to the
pointer of the resulting union range.
Added few comments at key places in key_or().
When merging ranges during calculation of the result of OR
to two range sets the current range may be obsoleted by the
resulting merged range.
The first overlapping range can be obsoleted as well.
Fixed by moving the pointer to the first overlapping range to the
pointer of the resulting union range.
Added few comments at key places in key_or().
When a query was using a DATE or DATETIME value formatted
using any other separator characters beside hyphen '-', a
query with a greater-or-equal '>=' condition matching only
the greatest value in an indexed column, the result was
empty if index range scan was employed.
The range optimizer got a new feature between 5.1.38 and
5.1.39 that changes a greater-or-equal condition to a
greater-than if the value matching that in the query was not
present in the table. But the value comparison function
compared the dates as strings instead of dates.
The bug was fixed by splitting the function
get_date_from_str in two: One part that parses and does
error checking. This function is now visible outside the
module. The old get_date_from_str now calls the new
function.
mysql-test/r/range.result:
Bug#47925: Test result
mysql-test/t/range.test:
Bug#47925: Test case
sql/item.cc:
Bug#47925: Fix + some edit on the comments
sql/item.h:
Bug#47925: Changed function signature
sql/item_cmpfunc.cc:
Bug#47925: Split function in two
sql/item_cmpfunc.h:
Bug#47925: Declaration of new function
sql/opt_range.cc:
Bug#47925: Added THD to function call
sql/time.cc:
Bug#47925: Added microsecond comparison
When a query was using a DATE or DATETIME value formatted
using any other separator characters beside hyphen '-', a
query with a greater-or-equal '>=' condition matching only
the greatest value in an indexed column, the result was
empty if index range scan was employed.
The range optimizer got a new feature between 5.1.38 and
5.1.39 that changes a greater-or-equal condition to a
greater-than if the value matching that in the query was not
present in the table. But the value comparison function
compared the dates as strings instead of dates.
The bug was fixed by splitting the function
get_date_from_str in two: One part that parses and does
error checking. This function is now visible outside the
module. The old get_date_from_str now calls the new
function.
The problem was in incorrect handling of predicates involving
NULL as a constant value by the range optimizer.
For example, when creating a SEL_ARG node from a condition of
the form "field < const" (which would normally result in the
"NULL < field < const" SEL_ARG), the special case when "const"
is NULL was not taken into account, so "NULL < field < NULL"
was produced for the "field < NULL" condition.
As a result, SEL_ARG structures of this form could not be
further optimized which in turn could lead to incorrectly
constructed SEL_ARG trees. In particular, code assuming SEL_ARG
structures to always form a sequence of ordered disjoint
intervals could enter an infinite loop under some
circumstances.
Fixed by changing get_mm_leaf() so that for any sargable
predicate except "<=>" involving NULL as a constant, "empty"
SEL_ARG is returned, since such a predicate is always false.
mysql-test/r/partition_pruning.result:
Fixed a broken test case.
mysql-test/r/range.result:
Added a test case for bug #47123.
mysql-test/r/subselect.result:
Fixed a broken test cases.
mysql-test/t/range.test:
Added a test case for bug #47123.
sql/opt_range.cc:
Fixed get_mm_leaf() so that for any sargable
predicate except "<=>" involving NULL as a constant, "empty"
SEL_ARG is returned, since such a predicate is always false.
The problem was in incorrect handling of predicates involving
NULL as a constant value by the range optimizer.
For example, when creating a SEL_ARG node from a condition of
the form "field < const" (which would normally result in the
"NULL < field < const" SEL_ARG), the special case when "const"
is NULL was not taken into account, so "NULL < field < NULL"
was produced for the "field < NULL" condition.
As a result, SEL_ARG structures of this form could not be
further optimized which in turn could lead to incorrectly
constructed SEL_ARG trees. In particular, code assuming SEL_ARG
structures to always form a sequence of ordered disjoint
intervals could enter an infinite loop under some
circumstances.
Fixed by changing get_mm_leaf() so that for any sargable
predicate except "<=>" involving NULL as a constant, "empty"
SEL_ARG is returned, since such a predicate is always false.
The problem was in incorrect handling of predicates involving
NULL as a constant value by the range optimizer.
For example, when creating a SEL_ARG node from a condition of
the form "field < const" (which would normally result in the
"NULL < field < const" SEL_ARG), the special case when "const"
is NULL was not taken into account, so "NULL < field < NULL"
was produced for the "field < NULL" condition.
As a result, SEL_ARG structures of this form could not be
further optimized which in turn could lead to incorrectly
constructed SEL_ARG trees. In particular, code assuming SEL_ARG
structures to always form a sequence of ordered disjoint
intervals could enter an infinite loop under some
circumstances.
Fixed by changing get_mm_leaf() so that for any sargable
predicate except "<=>" involving NULL as a constant, "empty"
SEL_ARG is returned, since such a predicate is always false.
mysql-test/r/range.result:
Added a test case for bug #47123.
mysql-test/t/range.test:
Added a test case for bug #47123.
sql/opt_range.cc:
Fixed get_mm_leaf() so that for any sargable
predicate except "<=>" involving NULL as a constant, "empty"
SEL_ARG is returned, since such a predicate is always false.
The problem was in incorrect handling of predicates involving
NULL as a constant value by the range optimizer.
For example, when creating a SEL_ARG node from a condition of
the form "field < const" (which would normally result in the
"NULL < field < const" SEL_ARG), the special case when "const"
is NULL was not taken into account, so "NULL < field < NULL"
was produced for the "field < NULL" condition.
As a result, SEL_ARG structures of this form could not be
further optimized which in turn could lead to incorrectly
constructed SEL_ARG trees. In particular, code assuming SEL_ARG
structures to always form a sequence of ordered disjoint
intervals could enter an infinite loop under some
circumstances.
Fixed by changing get_mm_leaf() so that for any sargable
predicate except "<=>" involving NULL as a constant, "empty"
SEL_ARG is returned, since such a predicate is always false.
covering index
When two range predicates were combined under an OR
predicate, the algorithm tried to merge overlapping ranges
into one. But the case when a range overlapped several other
ranges was not handled. This lead to
1) ranges overlapping, which gave repeated results and
2) a range that overlapped several other ranges was cut off.
Fixed by
1) Making sure that a range got an upper bound equal to the
next range with a greater minimum.
2) Removing a continue statement
mysql-test/r/group_min_max.result:
Bug#42846: Changed query plans
mysql-test/r/range.result:
Bug#42846: Test result.
mysql-test/t/range.test:
Bug#42846: Test case.
sql/opt_range.cc:
Bug#42846: The fix.
Part1: Previously, both endpoints from key2 were copied,
which is not safe. Since ranges are processed in ascending
order of minimum endpoints, it is safe to copy the minimum
endpoint from key2 but not the maximum. The maximum may only
be copied if there is no other range or the other range's
minimum is greater than key2's maximum.
covering index
When two range predicates were combined under an OR
predicate, the algorithm tried to merge overlapping ranges
into one. But the case when a range overlapped several other
ranges was not handled. This lead to
1) ranges overlapping, which gave repeated results and
2) a range that overlapped several other ranges was cut off.
Fixed by
1) Making sure that a range got an upper bound equal to the
next range with a greater minimum.
2) Removing a continue statement
WHERE f1 < n ignored row if f1 was indexed integer column and
f1 = TYPE_MAX ^ n = TYPE_MAX+1. The latter value when treated
as TYPE overflowed (obviously). This was not handled, it is now.
mysql-test/r/range.result:
show that on an index int column, we no longer disregard
a field val of TYPE_MAX in SELECT ... WHERE ... < TYPE_MAX+1
mysql-test/t/range.test:
show that on an index int column, we no longer disregard
a field val of TYPE_MAX in SELECT ... WHERE ... < TYPE_MAX+1
sql/opt_range.cc:
Handle overflowing of int-types in range-optimizer.
Unfortunately requires re-indentation of entire block.
Overflow (err == 1) was handled, but only if
field->cmp_type() != value->result_type(), which it
just wasn't in our case.
WHERE f1 < n ignored row if f1 was indexed integer column and
f1 = TYPE_MAX ^ n = TYPE_MAX+1. The latter value when treated
as TYPE overflowed (obviously). This was not handled, it is now.