The problem was that we restored SQL_CACHE, SQL_NO_CACHE flags in SELECT
statement from internal structures based on value set later at runtime, not
the original value set by the user.
The solution is to remember that original value.
'SELECT DISTINCT a,b FROM t1' should not use temp table if there is unique
index (or primary key) on a.
There are a number of other similar cases that can be calculated without the
use of a temp table : multi-part unique indexes, primary keys or using GROUP BY
instead of DISTINCT.
When a GROUP BY/DISTINCT clause contains all key parts of a unique
index, then it is guaranteed that the fields of the clause will be
unique, therefore we can optimize away GROUP BY/DISTINCT altogether.
This optimization has two effects:
* there is no need to create a temporary table to compute the
GROUP/DISTINCT operation (or the temporary table will be smaller if only GROUP
is removed and DISTINCT stays or if DISTINCT is removed and GROUP BY stays)
* this causes the statement in effect to become updatable in Connector/Java
because the result set columns will be direct reference to the primary key of
the table (instead to the temporary table that it currently references).
Implemented a check that will optimize away GROUP BY/DISTINCT for queries like
the above.
Currently it will work only for single non-constant table in the FROM clause.
Bug#17294 - INSERT DELAYED puting an \n before data
Bug#16611 - INSERT DELAYED corrupts data
Bug#13707 - Server crash with INSERT DELAYED on MyISAM table
Combined as Bug#16218.
INSERT DELAYED crashed in 5.0 on a table with a varchar that
could be NULL and was created pre-5.0 (Bugs 16218 and 13707).
INSERT DELAYED corrupted data in 5.0 on a table with varchar
fields that was created pre-5.0 (Bugs 17294 and 16611).
In case of INSERT DELAYED the open table is copied from the
delayed insert thread to be able to create a record for the
queue. When copying the fields, a method was used that did
convert old varchar to new varchar fields and did not set up
some pointers into the record buffer of the table.
The field conversion was guilty for the misinterpretation of
the record contents by the delayed insert thread. The wrong
pointer setup was guilty for the crashes.
For Bug 13707 (Server crash with INSERT DELAYED on MyISAM table)
I fixed the above mentioned method to set up one of the pointers.
For Bug 16218 I set up the other pointers too.
But when looking at the corruptions I got aware that converting
the field type was totally wrong for INSERT DELAYED. The copied
table is used to create a record that is to be sent to the
delayed insert thread. Of course it can interpret the record
correctly only if all field types are the same in both table
objects.
So I revoked the fix for Bug 13707 and changed the new_field()
method so that it can suppress conversions.
No test case as this is a migration problem. One needs to
create a table with 4.x and use it with 5.x. I added two
test scripts to the bug report.
tables
Currently in INSERT ... SELECT ... LIMIT ... the compiler uses a
temporary table to store the results of SELECT ... LIMIT .. and then
uses that table as a source for INSERT. The problem is that in some cases
it actually skips the LIMIT clause in doing that and materializes the
whole SELECT result set regardless of the LIMIT.
This fix is limiting the process of filling up the temp table with only
that much rows that will be actually used by propagating the LIMIT value.
The bug report revealed two problems related to min/max optimization:
1. If the length of a constant key used in a SARGable condition for
for the MIN/MAX fields is greater than the length of the field an
unwanted warning on key truncation is issued;
2. If MIN/MAX optimization is applied to a partial index, like INDEX(b(4))
than can lead to returning a wrong result set.
The check for view security was lacking several points :
1. Check with the right set of permissions : for each table ref that
participates in a view there were the right credentials to use in it's
security_ctx member, but these weren't used for checking the credentials.
This makes hard enforcing the SQL SECURITY DEFINER|INVOKER property
consistently.
2. Because of the above the security checking for views was just ruled out
in explicit ways in several places.
3. The security was checked only for the columns of the tables that are
brought into the query from a view. So if there is no column reference
outside of the view definition it was not detecting the lack of access to
the tables in the view in SQL SECURITY INVOKER mode.
The fix below tries to fix the above 3 points.
When a CREATE TABLE command created a table from a materialized
view id does not inherit default values from the underlying table.
Moreover the temporary table used for the view materialization
does not inherit those default values.
In the case when the underlying table contained ENUM fields it caused
misleading error messages. In other cases the created table contained
wrong default values.
The code was modified to ensure inheritance of default values for
materialized views.
Re-work best_access_path() and find_best() to reuse E(#rows(range access)) as
E(#rows(ref[_or_null](const) access) only when it is appropriate.
[This is the final cumulative patch]
When converting DISTINCT to GROUP BY where the columns are from the covering
index and they are quoted twice in the SELECT list the optimizer is creating
improper processing sequence. This is because of the fact that the columns
of the covering index are not recognized as such and treated as non-index
columns.
Generally speaking duplicate columns can safely be removed from the GROUP
BY/DISTINCT list because this will not add or remove new rows in the
resulting set. Duplicates can be removed even if they are not consecutive
(as is the case for ORDER BY, where the duplicate columns can be removed
only if they are consecutive).
So we can safely transform "SELECT DISTINCT a,a FROM ... ORDER BY a" to
"SELECT a,a FROM ... GROUP BY a ORDER BY a" instead of
"SELECT a,a FROM .. GROUP BY a,a ORDER BY a". We can even transform
"SELECT DISTINCT a,b,a FROM ... ORDER BY a,b" to
"SELECT a,b,a FROM ... GROUP BY a,b ORDER BY a,b".
The fix to this bug consists of checking for duplicate columns in the SELECT
list when constructing the GROUP BY list in transforming DISTINCT to GROUP
BY and skipping the ones that are already in.
A query with a group by and having clauses could return a wrong
result set if the having condition contained a constant conjunct
evaluated to FALSE.
It happened because the pushdown condition for table with
grouping columns lost its constant conjuncts.
Pushdown conditions are always built by the function make_cond_for_table
that ignores constant conjuncts. This is apparently not correct when
constant false conjuncts are present.
The bug was as follows: When merge_key_fields() encounters "t.key=X OR t.key=Y" it will
try to join them into ref_or_null access via "t.key=X OR NULL". In order to make this
inference it checks if Y<=>NULL, ignoring the fact that value of Y may be not yet known.
The fix is that the check if Y<=>NULL is made only if value of Y is known (i.e. it is a
constant).
TODO: When merging to 5.0, replace used_tables() with const_item() everywhere in merge_key_fields().
This performance degradation was due to the fact that some
cost evaluation code added into 4.1 in the function find_best was
not merged into the code of the function best_access_path added
together with other code for greedy optimizer.
Added a parameter to the function print_plan. The parameter contains
accumulated cost for a given partial join.
The patch does not include a special test case since this performance
degradation is hard to reproduse with a simple example.
TODO: make the function find_best use the function best_access_path
in order to remove duplication of code which might result in incomplete
merges in the future.
The bug caused wrong result sets for union constructs of the form
(SELECT ... ORDER BY order_list1 [LIMIT n]) ORDER BY order_list2.
For such queries order lists were concatenated and limit clause was
completely neglected.
The SQL standard doesn't allow to use in HAVING clause fields that are not
present in GROUP BY clause and not under any aggregate function in the HAVING
clause. However, mysql allows using such fields. This extension assume that
the non-grouping fields will have the same group-wise values. Otherwise, the
result will be unpredictable. This extension allowed in strict
MODE_ONLY_FULL_GROUP_BY sql mode results in misunderstanding of HAVING
capabilities.
The new error message ER_NON_GROUPING_FIELD_USED message is added. It says
"non-grouping field '%-.64s' is used in %-.64s clause". This message is
supposed to be used for reporting errors when some field is not found in the
GROUP BY clause but have to be present there. Use cases for this message are
this bug and when a field is present in a SELECT item list not under any
aggregate function and there is GROUP BY clause present which doesn't mention
that field. It renders the ER_WRONG_FIELD_WITH_GROUP error message obsolete as
being more descriptive.
The resolve_ref_in_select_and_group() function now reports the
ER_NON_GROUPING_FIELD_FOUND error if the strict mode is set and the field for
HAVING clause is found in the SELECT item list only.
used
In a simple queries a result of the GROUP_CONCAT() function was always of
varchar type.
But if length of GROUP_CONCAT() result is greater than 512 chars and temporary
table is used during select then the result is converted to blob, due to
policy to not to store fields longer than 512 chars in tmp table as varchar
fields.
In order to provide consistent behaviour, result of GROUP_CONCAT() now
will always be converted to blob if it is longer than 512 chars.
Item_func_group_concat::field_type() is modified accordingly.
Multiple equalities were not adjusted after reading constant tables.
It resulted in neglecting good index based methods that could be
used to access of other tables.
out of a nested join to the on conditions for the nest.
The bug happened due to:
1. The function simplify_joins could change on expressions for nested joins.
Yet modified on expressions were not saved in prep_on_expr.
2. On expressions were not restored for nested joins in
reinit_stmt_before_use.
The GROUP_CONCAT uses its own temporary table. When ROLLUP is present
it creates the second copy of Item_func_group_concat. This copy receives the
same list of arguments that original group_concat does. When the copy is
set up the result_fields of functions from the argument list are reset to the
temporary table of this copy.
As a result of this action data from functions flow directly to the ROLLUP copy
and the original group_concat functions shows wrong result.
Since queries with COUNT(DISTINCT ...) use temporary tables to store
the results the COUNT function they are also affected by this bug.
The idea of the fix is to copy content of the result_field for the function
under GROUP_CONCAT/COUNT from the first temporary table to the second one,
rather than setting result_field to point to the second temporary table.
To achieve this goal force_copy_fields flag is added to Item_func_group_concat
and Item_sum_count_distinct classes. This flag is initialized to 0 and set to 1
into the make_unique() member function of both classes.
To the TMP_TABLE_PARAM structure is modified to include the similar flag as
well.
The create_tmp_table() function passes that flag to create_tmp_field().
When the flag is set the create_tmp_field() function will set result_field
as a source field and will not reset that result field to newly created
field for Item_func_result_field and its descendants. Due to this there
will be created copy func to copy data from old result_field to newly
created field.
When there is conjunction of conds, the substitute_for_best_equal_field()
will call the eliminate_item_equal() function in loop to build final
expression. But if eliminate_item_equal() finds that some cond will always
evaluate to 0, then that cond will be substituted by Item_int with value ==
0. In this case on the next iteration eliminate_item_equal() will get that
Item_int and treat it as Item_cond. This is leads to memory corruption and
server crash on cleanup phase.
To the eliminate_item_equal() function was added DBUG_ASSERT for checking
that all items treaten as Item_cond are really Item_cond.
The substitute_for_best_equal_field() now checks that if
eliminate_item_equal() returns Item_int and it's value is 0 then this
value is returned as the result of whole conjunction.
fix_fields() was not called for "order by" variables if the type was a
"constant integer", and thus interpreted as a column index.
However, a local variable is an expression and should not be interpreted
as a column index. Instead it behaves just like when using a user variable
for instance (i.e. it will not affect the ordering).