Bug#17867117 - ERROR RESULT WHEN "COUNT + DISTINCT + CASE WHEN" NEED MERGE_WALK
Problem:
COUNT DISTINCT gives incorrect result when it uses a Unique
Tree and its last inserted record has null value.
Here is how COUNT DISTINCT is processed, given that this query is not
using loose index scan.
When a row is produced as a result of joining tables (there is only
one table here), we store the SELECTed value in a Unique tree. This
allows elimination of any duplicates, and thus implements DISTINCT.
When we have processed all rows like this, we walk the Unique tree,
counting its elements, in Aggregator_distinct::endup() (tree->walk());
for each element we call Item_sum_count::add(). Such function wants to
ignore any NULL value, for that it checks item_sum -> args[0] ->
null_value. It is a mistake: when walking the Unique tree, the value
to be aggregated is not item_sum ->args[0] but rather table ->
field[0].
Solution:
instead of item_sum -> args[0] -> null_value, use arg_is_null(), which
knows where to look (like in fix for bug 57932).
As a consequence of this solution, we have to make arg_is_null() a
little more general:
1) Because it was so far only used for AVG() (which always has a
single argument), this function was looking at a single argument; now
that it has to work with COUNT(DISTINCT expression1,expression2), it
must look at all arguments.
2) Because we start using arg_is_null () for COUNT(DISTINCT), i.e. in
Item_sum_count::add (), it implies that we are also using it for
COUNT(no DISTINCT) (same add ()). For COUNT(no DISTINCT), the
nullness to check is that of item_sum -> args[0]. But the null_value
of such item is reliable only if val_*() has been called on it. So far
arg_is_null() was always used after a call to arg_val*(), so could
rely on null_value; but for COUNT, there is no call to arg_val*(), so
arg_is_null() has to call is_null() instead.
Testcase for 16539979 by Neeraj. Testcase for 17867117 contributed by
Xiaobin Lin from Taobao.
CONTAINING NULL
Problem:-
In MySQL, We can obtain the number of distinct expression
combinations that do not contain NULL by giving a list of
expressions in COUNT(DISTINCT).
However rows with NULL values are
incorrectly included in the count when loose index scan is
used.
Analysis:-
In case of loose index scan, we check whether the field is null or
not and increase the count in Item_sum_count::add().
But there we are checking for the first field in COUNT(DISTINCT),
not for every field. This is causing an incorrect result.
Solution:-
Check all field in Item_sum_count::add(), whether there values
are null or not. Then only increment the count.
******
Bug#17222452 - SELECT COUNT(DISTINCT A,B) INCORRECTLY COUNTS ROWS
CONTAINING NULL
Problem:-
In MySQL, We can obtain the number of distinct expression
combinations that do not contain NULL by giving a list of
expressions in COUNT(DISTINCT).
However rows with NULL values are
incorrectly included in the count when loose index scan is
used.
Analysis:-
In case of loose index scan, we check whether the field is null or
not and increase the count in Item_sum_count::add().
But there we are checking for the first field in COUNT(DISTINCT),
not for every field. This is causing an incorrect result.
Solution:-
Check all field in Item_sum_count::add(), whether there values
are null or not. Then only increment the count.
!TABLES->NEXT_NAME_RESOLUTION_TABLE) || !TAB
Problem:
The context info of select query gets corrupted when a query
with group_concat having order by is present in an order by
clause of the select query. As a result, server crashes with
an assert.
Analysis:
While parsing order by for group_concat, it is presumed that
it is always present before the actual order by for the
select query.
As a result, parser uses select->order_list to populate the
order by items of group_concat and creates a select->gorder_list
to which select->order_list is copied onto. Once this is done,
it empties the select->order_list.
In the case presented in the bugpage, as order by is already
parsed when group_concat's order by is encountered, parser
presumes that it is the second order by in the select query
and creates fake_lex_unit which results in the change of
context info.
Solution:
Make group_concat's order by parsing independent of the select
Problem:
A select query inside a group_concat function having an
outer reference results in a crash.
Analysis:
In function Item_group_concat::add, we do not check if
return value of get_tmp_table_field can be NULL for
a non-const item. This can happen for a query with a
outer reference.
While resolving the outer reference in the query present
inside group_concat function, we set the "const_item_cache"
to false. As a result in the call to const_item() from
Item_func_group_concat::add, it returns false and goes on
to check if this can be NULL resulting in the crash.
get_tmp_table_field does not return NULL for Items of type
Item_field, Item_result_field and Item_ref.
For all other items, it returns NULL.
Solution:
Check for the return value of get_tmp_table_field before we
access field contents.
Item_func_group_concat::copy_or_same() creates a copy of original object.
It also creates a copy of ORDER structure because ORDER struct elements may
be modified in find_order_in_list() called from Item_func_group_concat::setup().
As ORDER copy is created using memcpy, ORDER::next elements point to original
ORDER structs. Thus find_order_in_list() called from EXECUTE stmt modifies
ordinal ORDER item pointers so they point to runtime items, these items are
freed after execution, so original ORDER structure becomes invalid.
The fix is to properly update ORDER::next fields so that they point to
new ORDER elements.
Analysis:
---------
When the server is out of memory, an error is raised
to indicate the same. Handling the error requires
more memory to be allocated which fails, hence the
error handling loops in a recursion and causes the
server to crash.
Fix:
---
a) Prevents pushing the 'out of memory' error condition
to the diagnostic area as it requires memory allocation.
GET DIAGNOSTICS, SHOW WARNINGS and SHOW ERRORS statements
will not show information about this error. However the
'out of memory' error is returned to the client.
b) It sets the ME_FATALERROR flag when 'out of memory' errors
are reported (for places where the flag is not already set).
This flag prevents activation of SP error handlers which also
require memory allocation and therefore are likely to fail.
The problem is a shift operation that is not 64-bit safe.
The consequence is that used tables information for a join with 32 tables
or more will be incorrect.
Fixed by adding a type cast in Item_sum::update_used_tables().
Also used the opportunity to fix some other potential bugs by adding an
explicit type-cast to an integer in a left-shift operation.
Some of them were quite harmless, but was fixed in order to get the same
signed-ness as the other operand of the operation it was used in.
sql/item_cmpfunc.cc
Adjusted signed-ness for some integers in left-shift.
sql/item_subselect.cc
Added type-cast to nesting_map (which is a 32/64 bit type, so
potential bug for deeply nested queries).
sql/item_sum.cc
Added type-cast to nesting_map (32/64-bit type) and table_map
(64-bit type).
sql/opt_range.cc
Added type-cast to ulonglong (which is a 64-bit type).
sql/sql_base.cc
Added type-cast to nesting_map (which is a 32/64-bit type).
sql/sql_select.cc
Added type-cast to nesting_map (32/64-bit type) and key_part_map
(64-bit type).
sql/strfunc.cc
Changed type-cast from longlong to ulonglong, to preserve signed-ness.
When temporary tables is used for result sorting
result field for gconcat function is created using
group_concat_max_len size. It leads to result truncation
when character_set_results is multi-byte character set due
to insufficient tmp table field size.
The fix is to increase temporary table field size for
gconcat. Method make_string_field() is overloaded
for Item_func_group_concat class and uses
max_characters * collation.collation->mbmaxlen size for
result field. max_characters is maximum number of characters
what can fit into max_length size.
When we create temporary result table for UNION
incorrect max_length for YEAR field is used and
it leads to incorrect field value and incorrect
result string length as YEAR field value calculation
depends on field length.
The fix is to use underlying item max_length for
Item_sum_hybrid::max_length intialization.
- Removed files specific to compiling on OS/2
- Removed files specific to SCO Unix packaging
- Removed "libmysqld/copyright", text is included in documentation
- Removed LaTeX headers for NDB Doxygen documentation
- Removed obsolete NDB files
- Removed "mkisofs" binaries
- Removed the "cvs2cl.pl" script
- Changed a few GPL texts to use "program" instead of "library"
Item_sum_max/Item_sum_min incorrectly set null_value flag and
attempt to get result in parent functions leads to crash.
This happens due to double evaluation of the function argumet.
First evaluation happens in the comparator and second one
happens in Item_cache::cache_value().
The fix is to introduce new Item_cache object which
holds result of the argument and use this cached value
as an argument of the comparator.
Original revid: alexey.kopytov@sun.com-20100723115254-jjwmhq97b9wl932l
> Bug #54476: crash when group_concat and 'with rollup' in
> prepared statements
>
> Using GROUP_CONCAT() together with the WITH ROLLUP modifier
> could crash the server.
>
> The reason was a combination of several facts:
>
> 1. The Item_func_group_concat class stores pointers to ORDER
> objects representing the columns in the ORDER BY clause of
> GROUP_CONCAT().
>
> 2. find_order_in_list() called from
> Item_func_group_concat::setup() modifies the ORDER objects so
> that their 'item' member points to the arguments list
> allocated in the Item_func_group_concat constructor.
>
> 3. In some cases (e.g. in JOIN::rollup_make_fields) a copy of
> the original Item_func_group_concat object could be created by
> using the Item_func_group_concat::Item_func_group_concat(THD
> *thd, Item_func_group_concat *item) copy constructor. The
> latter essentially creates a shallow copy of the source
> object. Memory for the arguments array is allocated on
> thd->mem_root, but the pointers for arguments and ORDER are
> copied verbatim.
>
> What happens in the test case is that when executing the query
> for the first time, after a copy of the original
> Item_func_group_concat object has been created by
> JOIN::rollup_make_fields(), find_order_in_list() is called for
> this new object. It then resolves ORDER BY by modifying the
> ORDER objects so that they point to elements of the arguments
> array which is local to the cloned object. When thd->mem_root
> is freed upon completing the execution, pointers in the ORDER
> objects become invalid. Those ORDER objects, however, are also
> shared with the original Item_func_group_concat object which is
> preserved between executions of a prepared statement. So the
> first call to find_order_in_list() for the original object on
> the second execution tries to dereference an invalid pointer.
>
> The solution is to create copies of the ORDER objects when
> copying Item_func_group_concat to not leave any stale pointers
> in other instances with different lifecycles.
Explain fails at fix_fields stage and some items are left unfixed,
particulary Item_group_concat. Item_group_concat::orig_args field
is uninitialized in this case and Item_group_concat::print call
leads to crash.
The fix:
move the initialization of Item_group_concat::orig_args
into constructor.
when semijoin=on
When setting the aggregate function as having no rows to report
the function no_rows_in_result() was calling Item_sum::reset().
However this function in addition to cleaning up the aggregate
value by calling aggregator_clear() was also adding the current
value to the aggregate value by calling aggregator_add().
Fixed by making no_rows_in_result() to call aggregator_clear()
directly.
Renamed Item_sum::reset to Item_sum::reset_and_add() to
and added a comment to avoid misinterpretation of what the
function does.
The problem is caused by bug49487 fix and became visible
after after bug56679 fix.
Items are cleaned up and set to unfixed state after filling derived table.
So we can not rely on item::fixed state in Item_func_group_concat::print
and we can not use 'args' array as items there may be cleaned up.
The fix is always to use orig_args array of items as it
always should contain the correct data.
== MYSQL_TYPE_LONGLONG
A MIN/MAX() function with a subquery as its argument could lead
to a debug assertion on debug builds or wrong data on release
ones.
The problem was a combination of the following factors:
- Item_sum_hybrid::fix_fields() might use the argument
(args[0]) to calculate 'hybrid_field_type' which was later used
to decide how the data should be sent to the client.
- Item_sum::make_field() might use the argument again to
calculate the field's type when sending result set metadata to
the client.
- The argument could be changed in between these two calls via
Item::set_arg() leading to inconsistent metadata being
reported.
Here is what was happening for the bug's test case:
1. Item_sum_hybrid::fix_fields() calculates hybrid_field_type
as MYSQL_TYPE_LONGLONG based on args[0] which is an
Item::SUBSELECT_ITEM at that time.
2. A temporary table is created to execute the
query. create_tmp_field_from_item() creates a Field_long object
according to the subselect's max_length.
3. The subselect item in Item_sum_hybrid is replaced by the
Item_field object referencing the newly created Field_long.
4. Item_sum::make_field() rightfully returns the
MYSQL_TYPE_LONG type when calculating the result set metadata.
5. When sending the actual data, Item::send() relies on the
virtual field_type() function which in our case returns
previously calculated hybrid_field_type == MYSQL_TYPE_LONGLONG.
It looks like the only solution is to never refer to the
argument's metadata after the result metadata has been
calculated in fix_fields(), since the argument itself may be
different by then. In this sense, Item_sum::make_field() should
never be used, because it may rely on the argument's metadata
and is only called after fix_fields(). The "default"
implementation in Item::make_field() should be used instead as
it relies only on field_type(), but not on the argument's type.
Fixed by removing Item_sum::make_field() so that the superclass
implementation Item::make_field() is always used.
The server was not checking for errors generated during
the execution of Item::val_xxx() methods when copying
data to the group, order, or distinct temp table's row.
Fixed by extending the copy_funcs() to return an error
code and by checking for that error code on the places
copy_funcs() is called.
Test case added.
prepared statements
Using GROUP_CONCAT() together with the WITH ROLLUP modifier
could crash the server.
The reason was a combination of several facts:
1. The Item_func_group_concat class stores pointers to ORDER
objects representing the columns in the ORDER BY clause of
GROUP_CONCAT().
2. find_order_in_list() called from
Item_func_group_concat::setup() modifies the ORDER objects so
that their 'item' member points to the arguments list
allocated in the Item_func_group_concat constructor.
3. In some cases (e.g. in JOIN::rollup_make_fields) a copy of
the original Item_func_group_concat object could be created by
using the Item_func_group_concat::Item_func_group_concat(THD
*thd, Item_func_group_concat *item) copy constructor. The
latter essentially creates a shallow copy of the source
object. Memory for the arguments array is allocated on
thd->mem_root, but the pointers for arguments and ORDER are
copied verbatim.
What happens in the test case is that when executing the query
for the first time, after a copy of the original
Item_func_group_concat object has been created by
JOIN::rollup_make_fields(), find_order_in_list() is called for
this new object. It then resolves ORDER BY by modifying the
ORDER objects so that they point to elements of the arguments
array which is local to the cloned object. When thd->mem_root
is freed upon completing the execution, pointers in the ORDER
objects become invalid. Those ORDER objects, however, are also
shared with the original Item_func_group_concat object which is
preserved between executions of a prepared statement. So the
first call to find_order_in_list() for the original object on
the second execution tries to dereference an invalid pointer.
The solution is to create copies of the ORDER objects when
copying Item_func_group_concat to not leave any stale pointers
in other instances with different lifecycles.
This bug is a design flaw of the fix for the bug#33546. It assumed that an
item can be used only in one comparison context, but actually it isn't the
case. Item_cache_datetime is used to store result for MIX/MAX aggregate
functions. Because Arg_comparator always compares datetime values as INTs when
possible the Item_cache_datetime most time caches only INT value. But
since all datetime values has STRING result type MIN/MAX functions are asked
for a STRING value when the result is being sent to a client. The
Item_cache_datetime was designed to avoid conversions and get INT/STRING
values from an underlying item, but at the moment the values is asked
underlying item doesn't hold it anymore thus wrong result is returned.
Beside that MIN/MAX aggregate functions was wrongly initializing cached result
and this led to a wrong result.
The Item::has_compatible_context helper function is added. It checks whether
this and given items has the same comparison context or can be compared as
DATETIME values by Arg_comparator. The equality propagation optimization is
adjusted to take into account that items which being compared as DATETIME
can have different comparison contexts.
The Item_cache_datetime now converts cached INT value to a correct STRING
DATETIME value by means of number_to_datetime & my_TIME_to_str functions.
The Arg_comparator::set_cmp_context_for_datetime helper function is added.
It sets comparison context of items being compared as DATETIMEs to INT if
items will be compared as longlong.
The Item_sum_hybrid::setup function now correctly initializes its result
value.
In order to avoid unnecessary conversions Item_sum_hybrid now states that it
can provide correct longlong value if the item being aggregated can do it
too.
strict aliasing violations.
One somewhat major source of strict-aliasing violations and
related warnings is the SQL_LIST structure. For example,
consider its member function `link_in_list` which takes
a pointer to pointer of type T (any type) as a pointer to
pointer to unsigned char. Dereferencing this pointer, which
is done to reset the next field, violates strict-aliasing
rules and might cause problems for surrounding code that
uses the next field of the object being added to the list.
The solution is to use templates to parametrize the SQL_LIST
structure in order to deference the pointers with compatible
types. As a side bonus, it becomes possible to remove quite
a few casts related to acessing data members of SQL_LIST.
Fix various mismatches between function's language linkage. Any
particular function that is declared in C++ but should be callable
from C must have C linkage. Note that function types with different
linkages are also distinct. Thus, if a function type is declared in
C code, it will have C linkage (same if declared in a extern "C"
block).