When we create a temporary result table for a UNION, an incorrect
max_length for the YEAR field is used. This leads to an incorrect field
value and an incorrect result string length, since the YEAR field value
calculation depends on the field length.
The fix is to use the underlying item's max_length to initialize
Item_sum_hybrid::max_length.
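A sketch of the affected query shape, with a hypothetical table and
values (not the bug's original test case):
CREATE TABLE t1 (y YEAR);
INSERT INTO t1 VALUES (2010);
SELECT MAX(y) FROM t1 UNION SELECT MAX(y) FROM t1;
Before the fix, the YEAR value in the UNION result could be rendered
incorrectly because of the wrong field length.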
In the string context the MIN() and MAX() functions don't take into
account the unsignedness of an UNSIGNED BIGINT argument column.
For example:
CREATE TABLE t1 (a BIGINT UNSIGNED);
INSERT INTO t1 VALUES (18446668621106209655);
SELECT CONCAT(MAX(a)) FROM t1;
returns -75452603341961 instead of 18446668621106209655.
Item_sum_max/Item_sum_min incorrectly set the null_value flag, and an
attempt to get the result in parent functions leads to a crash.
This happens due to double evaluation of the function argument: the
first evaluation happens in the comparator and the second one happens
in Item_cache::cache_value().
The fix is to introduce a new Item_cache object which holds the result
of the argument and to use this cached value as the argument of the
comparator.
A MIN/MAX() function with a subquery as its argument could lead
to a debug assertion on debug builds or wrong data on release
ones.
The problem was a combination of the following factors:
- Item_sum_hybrid::fix_fields() might use the argument
(args[0]) to calculate 'hybrid_field_type' which was later used
to decide how the data should be sent to the client.
- Item_sum::make_field() might use the argument again to
calculate the field's type when sending result set metadata to
the client.
- The argument could be changed in between these two calls via
Item::set_arg() leading to inconsistent metadata being
reported.
Here is what was happening for the bug's test case:
1. Item_sum_hybrid::fix_fields() calculates hybrid_field_type
as MYSQL_TYPE_LONGLONG based on args[0] which is an
Item::SUBSELECT_ITEM at that time.
2. A temporary table is created to execute the
query. create_tmp_field_from_item() creates a Field_long object
according to the subselect's max_length.
3. The subselect item in Item_sum_hybrid is replaced by the
Item_field object referencing the newly created Field_long.
4. Item_sum::make_field() rightfully returns the
MYSQL_TYPE_LONG type when calculating the result set metadata.
5. When sending the actual data, Item::send() relies on the
virtual field_type() function which in our case returns
previously calculated hybrid_field_type == MYSQL_TYPE_LONGLONG.
It looks like the only solution is to never refer to the
argument's metadata after the result metadata has been
calculated in fix_fields(), since the argument itself may be
different by then. In this sense, Item_sum::make_field() should
never be used, because it may rely on the argument's metadata
and is only called after fix_fields(). The "default"
implementation in Item::make_field() should be used instead as
it relies only on field_type(), but not on the argument's type.
Fixed by removing Item_sum::make_field() so that the superclass
implementation Item::make_field() is always used.
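A sketch of the kind of query involved, with a hypothetical table; the
bug's actual test case may differ. The MAX() argument is a scalar
subquery that is later materialized into a temporary table field:
CREATE TABLE t1 (a INT);
INSERT INTO t1 VALUES (1), (2);
SELECT MAX((SELECT a FROM t1 ORDER BY a DESC LIMIT 1)) FROM t1;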
The Item_cache_datetime::val_str() function wasn't taking into account
that a time value can be negative. This led to a failed assertion.
Now Item_cache_datetime::val_str() correctly converts negative time
values from the integer to the string representation.
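A sketch of the scenario with hypothetical values; the bug's actual
trigger may differ. A negative TIME value is cached by MIN() and then
converted to a string:
CREATE TABLE t1 (t TIME);
INSERT INTO t1 VALUES ('-10:20:30');
SELECT CONCAT(MIN(t)) FROM t1;
The string form must keep the sign, i.e. '-10:20:30'.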
The MySQL manual describes values of the YEAR(2) field type as follows:
values 00 - 69 mean the years 2000 - 2069 and values 70 - 99 mean the
years 1970 - 1999. MIN/MAX and the comparison functions were comparing
them as plain int values, thus producing wrong results (see the example
after this entry).
Now the Arg_comparator class is extended with a compare_year() function
which performs correct comparison of the YEAR type.
The Item_sum_hybrid class now uses Item_cache and Arg_comparator objects
to correctly calculate its value.
To allow Arg_comparator to use the func_name() function for Item_func
and Item_sum objects, the func_name() declaration is moved to the
Item_result_field class.
A helper function is_owner_equal_func() is added to the Arg_comparator
class. It checks whether the Arg_comparator object's owner is the <=>
function or not.
A helper function setup() is added to the Item_sum_hybrid class. It sets
up the cache item and the comparator.
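An illustration of the YEAR(2) comparison, with a hypothetical table:
CREATE TABLE t1 (y YEAR(2));
INSERT INTO t1 VALUES (99), (00);
SELECT MIN(y), MAX(y) FROM t1;
Here 99 means 1999 and 00 means 2000, so MIN(y) must be 99 and MAX(y)
must be 00; a plain int comparison gets this backwards.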
Simple SELECT with implicit grouping used to return many rows if
the query was ordered by the aggregated column in the SELECT
list. This was incorrect because queries with implicit grouping
should only return a single record.
The problem was that when JOIN::exec() decided whether execution needed
to handle grouping, it assumed that sum_func_count==0 meant that there
were no aggregate functions in the query. This assumption was not
correct in JOIN::exec() because the aggregate functions might have been
optimized away during JOIN::optimize().
The reason why queries without ordering behaved correctly was
that sum_func_count is only recalculated if the optimizer chooses
to use temporary tables (which it does in the ordered case).
Hence, non-ordered queries were correctly treated as grouped.
The fix for this bug was to remove the assumption that
sum_func_count==0 means that there is no need for grouping. This
was done by introducing variable "bool implicit_grouping" in the
JOIN object.
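For illustration, with a hypothetical table: a query with implicit
grouping must return exactly one row even when ordered by an aggregate
from the SELECT list:
CREATE TABLE t1 (a INT);
INSERT INTO t1 VALUES (1), (2), (3);
SELECT COUNT(*), MIN(a) FROM t1 ORDER BY MIN(a);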
The check for non-aggregated columns in queries with an aggregate
function but without GROUP BY was treating all parts of the query as if
they were in the SELECT list.
Fixed by ignoring the non-aggregated fields in the WHERE clause.
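For example, with a hypothetical table: the following query must be
accepted, since b is non-aggregated but appears only in the WHERE
clause:
SET sql_mode = 'ONLY_FULL_GROUP_BY';
CREATE TABLE t1 (a INT, b INT);
SELECT MAX(a) FROM t1 WHERE b = 10;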
The optimizer pulls up aggregate functions which should be aggregated in
an outer select. At some point it may substitute such a function for a field
in the temporary table. The setup_copy_fields function doesn't take this
into account and may overrun the copy_field buffer.
Fixed by filtering out the fields referenced through the specialized
reference for aggregates (Item_aggregate_ref).
Added an assertion to make sure bugs that cause a similar discrepancy
don't go undetected.
Casting AVG() to DECIMAL led to incorrect results when the arguments
had a non-DECIMAL type, because in this case
Item_sum_avg::val_decimal() performed the division by the row count
twice.
Fixed by changing Item_sum_avg::val_decimal() to not rely on
Item_sum_sum::val_decimal(), i.e. calculate the sum and divide using
DECIMAL arithmetic for DECIMAL arguments, and use val_real() with a
subsequent conversion to DECIMAL otherwise.
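For illustration, with hypothetical values: the cast below must yield
2.5000, while dividing by the row count twice would produce 0.6250:
CREATE TABLE t1 (a DOUBLE);
INSERT INTO t1 VALUES (1), (2), (3), (4);
SELECT CAST(AVG(a) AS DECIMAL(10,4)) FROM t1;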
When resolving references we need to take into consideration
the view "fields" and allow qualified access to them.
Fixed by extending the reference resolution to process view
fields correctly.
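A sketch of the qualified access in question, with a hypothetical view:
CREATE TABLE t1 (a INT);
CREATE VIEW v1 AS SELECT a AS x FROM t1;
SELECT MAX(v1.x) FROM v1;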
The HAVING clause is subject to the same rules as the SELECT list
about using aggregated and non-aggregated columns.
But this was not enforced when processing the implicit grouping that
results from using aggregate functions.
Fixed by performing the same checks for HAVING as for SELECT.
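For example, with a hypothetical table: the following must now be
rejected, since b is non-aggregated and the query has implicit grouping:
SET sql_mode = 'ONLY_FULL_GROUP_BY';
CREATE TABLE t1 (a INT, b INT);
SELECT MAX(a) FROM t1 HAVING b > 0;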
The optimizer pre-calculates the MIN/MAX values for queries like
SELECT MIN(kp_k) WHERE kp_1 = const AND ... AND kp_k-1 = const
when there is a key over kp_1...kp_k.
In doing so it was not checking nullability correctly, and there was a
superfluous assert().
Fixed by making sure that the field can be NULL before checking it and
by taking out the wrong assert().
Introduced a correct check for nullability.
MIN(field) can return NULL when all the row values in the group are
NULL or when there are no rows.
Fixed the assertion to reflect the case when there are no rows.
There was an assertion to detect a bug in the ROLLUP implementation.
However, the assertion does not hold when used in a subquery context
with non-cacheable statements.
Fixed by turning the assertion into an accepted case (just like it's
done for the other aggregate functions).
For queries of the form
SELECT MIN(key_part_k) FROM t1
WHERE key_part_1 = const AND ... AND key_part_k-1 = const,
the opt_sum_query optimization tries to
use an index to substitute MIN/MAX functions with their values according
to the following rules:
1) insert the minimum non-null value for which the WHERE clause still
matches, or
3) return a row of NULLs.
However, the correct semantics requires a middle case 2) such that a
NULL value is substituted if there are only NULL values for key_part_k
(see the example after this entry).
The patch modifies opt_sum_query() to handle this missing case.
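For example, with a hypothetical table: all b values under the matched
prefix are NULL, so MIN(b) must be substituted with NULL:
CREATE TABLE t1 (a INT, b INT, KEY k1 (a, b));
INSERT INTO t1 VALUES (1, NULL), (1, NULL);
SELECT MIN(b) FROM t1 WHERE a = 1;
This must return a single row containing NULL.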
When only one row was present, the subtraction of nearly the same number
resulted in catastrophic cancellation, introducing an error in the
VARIANCE calculation near 1e-15. When that was sqrt()ed to get STDDEV,
the error escalated to near 1e-8.
The simple fix of testing for a row count of 1 and forcing that to yield
0.0 is insufficient, as two rows of the same value should also have a
variance of 0.0, yet the error would be about the same.
So, this patch changes the formula that computes the VARIANCE to one
that is not subject to catastrophic cancellation.
In addition, it now uses only (faster-than-DECIMAL) floating-point
numbers for the calculation, and renders the result to other types on
demand.
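One well-known formula with this property is the one-pass recurrence
attributed to Welford; a sketch of the idea (the patch's exact
formulation may differ):
M_1 = x_1,  M_k = M_{k-1} + (x_k - M_{k-1}) / k
S_1 = 0,    S_k = S_{k-1} + (x_k - M_{k-1}) * (x_k - M_k)
After n rows, VARIANCE = S_n / n (or S_n / (n - 1) for the sample
variance). Each term added to S_k is non-negative, so a single row or
repeated identical values yield exactly 0.0.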
We use val_int() calls (followed by a null_value check) to determine
nullness in some Item_sum_count and Item_sum_count_distinct methods;
as a side effect we get extra warnings raised in val_int().
Fix: use is_null() instead.