- make make_cond_after_sjm() correctly handle OR clauses where one branch refers to the semi-join table
while the other branch refers to the non-semijoin table.
The patch backports two patches from MySQL 5.6:
- BUG#12640437: USING SQL_BUFFER_RESULT RESULTS IN A DIFFERENT QUERY OUTPUT
- Bug#12578908: SELECT SQL_BUFFER_RESULT OUTPUTS TOO MANY ROWS WHEN GROUP IS OPTIMIZED AWAY
Original comment:
-----------------
3714 Jorgen Loland 2012-03-01
BUG#12640437 - USING SQL_BUFFER_RESULT RESULTS IN A DIFFERENT
QUERY OUTPUT
For all but simple grouped queries, temporary tables are used to
resolve grouping. In these cases, the list of grouping fields is
stored in the temporary table and grouping is resolved
there (e.g. by adding a unique constraint on the involved
fields). Because of this, grouping is already done when the rows
are read from the temporary table.
In the case where a group clause may be optimized away, grouping
does not have to be resolved using a temporary table. However, if
a temporary table is explicitly requested (e.g. because the
SQL_BUFFER_RESULT hint is used, or the statement is
INSERT...SELECT), a temporary table is used anyway. In this case,
the temporary table is created with an empty group list (because
the group clause was optimized away) and it will therefore not
create groups. Since the temporary table does not take care of
grouping, JOIN::group shall not be set to false in
make_simple_join(). This was fixed in bug 12578908.
However, there is an exception where make_simple_join() should
set JOIN::group to false even if the query uses a temporary table
that was explicitly requested but is not strictly needed. That
exception is if the loose index scan access method (explain
says "Using index for group-by") is used to read into the
temporary table. With loose index scan, grouping is resolved
by the access method. This is exactly what happens in this bug.
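
The rule described above can be summarized in a small sketch. This is
only an illustration of the decision, not the server code: the struct
PlanSketch, its flags and the function may_clear_group_flag() are
hypothetical names invented for this example.

```cpp
// Hypothetical summary of the plan properties discussed above.
struct PlanSketch {
  bool group_optimized_away;   // the GROUP BY clause was optimized away
  bool uses_loose_index_scan;  // EXPLAIN shows "Using index for group-by"
};

// Returns true if grouping is already resolved before rows are read back
// from the temporary table, i.e. the case where make_simple_join() may
// set JOIN::group to false.
bool may_clear_group_flag(const PlanSketch &p) {
  if (!p.group_optimized_away)
    return true;  // the temporary table carries the group list and groups
  // Group list is empty: the temporary table does no grouping, so only a
  // loose index scan feeding it can have resolved the groups.
  return p.uses_loose_index_scan;
}
```

The third case (group optimized away, loose index scan used) is the
exception this patch adds on top of the fix for bug 12578908.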
The problem was in the code (update_const_equal_items()) that marked
index parts as constant regardless of where the equality was used.
In the test suite it marked the t2_1.c part as constant even though
it was connected by OR with another expression.
The solution is to mark as constant only top-level equalities connected with AND.
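
A minimal sketch of the idea, using a toy condition tree rather than the
server's Item classes. The type Cond and the helpers eq(), all_of(),
any_of() and collect_const_fields() are hypothetical and exist only for
this illustration.

```cpp
#include <string>
#include <utility>
#include <vector>

// Toy condition tree (illustrative only).
struct Cond {
  enum Kind { AND, OR, EQ_CONST } kind;
  std::string field;       // used only by EQ_CONST: "field = constant"
  std::vector<Cond> args;  // used only by AND / OR
};

Cond eq(const std::string &f) { return Cond{Cond::EQ_CONST, f, {}}; }
Cond all_of(std::vector<Cond> a) { return Cond{Cond::AND, "", std::move(a)}; }
Cond any_of(std::vector<Cond> a) { return Cond{Cond::OR, "", std::move(a)}; }

// Collect fields that may be treated as constant: only equalities reachable
// from the root through AND nodes. Anything below an OR is skipped, because
// the equality does not have to hold for every qualifying row.
void collect_const_fields(const Cond &c, std::vector<std::string> &out) {
  switch (c.kind) {
  case Cond::EQ_CONST:
    out.push_back(c.field);
    break;
  case Cond::AND:
    for (const Cond &a : c.args)
      collect_const_fields(a, out);
    break;
  case Cond::OR:
    break;  // do not descend into OR branches
  }
}
```

For a condition such as "t1.a = 1 AND (t2_1.c = 2 OR t2_1.d = 3)", only
t1.a is collected; t2_1.c is left alone because it sits under an OR.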
The previous patch for the bug (that erroneously identified the bug as
bug 972973 in its comment) was incorrect.
It turned out that the code that triggered the abort complaint reported for
the bug was not needed at all.
When the function free_tmp_table deletes the handler object for
a temporary table the field TABLE::file for this table should be
set to NULL. Otherwise an assertion failure may occur.
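
The pattern can be sketched as follows. The types HandlerSketch and
TableSketch and the function free_tmp_table_sketch() are stand-ins
invented for this example; they are not the real handler or TABLE
definitions.

```cpp
// Stand-in for the storage engine handler object.
struct HandlerSketch {
  ~HandlerSketch() {}
};

// Stand-in for TABLE, reduced to the one member discussed above.
struct TableSketch {
  HandlerSketch *file = nullptr;
};

// Delete the handler and clear the pointer, so that later code can test
// table.file instead of touching a dangling pointer.
void free_tmp_table_sketch(TableSketch &table) {
  delete table.file;     // deleting a null pointer is a no-op
  table.file = nullptr;  // the step this fix adds
}
```

With the pointer reset, a second cleanup pass over the same table is
harmless.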
The main problem was a bug in CSV where it provided wrong statistics (it claimed the table was empty when it wasn't).
I also fixed wrong freeing of blobs in the CSV handler. (Any call to handler::read_first_row() on a CSV table with blobs would fail.)
mysql-test/r/csv.result:
Added new test case
mysql-test/r/partition_innodb.result:
Updated test results after fixing bug with impossible partitions and const tables
mysql-test/t/csv.test:
Added new test case
sql/sql_select.cc:
Cleaned up code for handling of partitions.
Also fixed a bug where we didn't treat a table with impossible partitions as a const table.
storage/csv/ha_tina.cc:
Allocate blobroot once.
- When doing join optimization, pre-sort the tables so that they mimic the execution
order we've had with 'semijoin=off'.
- That way, we will not get regressions when there are two query plans (the old and the
new) that have identical costs but different execution times (because of factors that
the optimizer was not able to take into account).
- This is a regression introduced by the fix for BUG#951937
- The problem was that there were scenarios where check_simple_equality() would create an
Item_equal object but would not call item_equal->set_context_field() on it.
- The fix was to add the missing calls.
- Fix equality propagation to work with SJM nests and OR clauses (full description of the problem and
solution in the comment in the patch)
(The second commit with post-review fixes)
- Remove all references of MAX_TABLES from JOIN struct and make these dynamic
- Updated Join_plan_state to allocate just as many elements as needed
sql/opt_subselect.cc:
Optimized version of Join_plan_state
sql/sql_select.cc:
Set join->positions and join->best_positions dynamically
Don't call update_virtual_fields() if table->vfield is not set.
sql/sql_select.h:
Remove all references of MAX_TABLES from JOIN struct and Join_plan_state and make these dynamic
If the first component of a ref key happened to be a constant that appeared
after constant row substitution, then no store_key element should be
created for such a component. Yet create_ref_for_key() could erroneously
create such an element, which caused construction of invalid ref keys and
wrong results for some joins.
Do not call, directly or indirectly, SQL_SELECT::test_quick_select()
for derived materialized tables / views when optimizing joins referring
to these tables / views to get cost estimates of materialization.
The current code does not create B-tree indexes for materialized
derived tables / views. So for now it is not possible to get any estimates
for range conditions over the results of the materialization.
The function mysql_derived_create() must take into account the fact
that array of the KEY structures specifying the keys over a derived
table / view may be moved after the optimization phase if the
derived table / view is materialized.
Added 'from_end' as extra parameter to Field::unpack() to detect wrong from data.
Change ha_archive::unpack_row() to detect wrong field lengths.
Replication code changed to detect wrong field information in events.
mysql-test/r/archive.result:
Added test case for lp:917689
sql/field.cc:
Added 'from_end' as extra parameter to Field::unpack() to detect wrong from data.
Removed unused 'unpack_key' functions.
sql/field.h:
Added 'from_end' as extra parameter to Field::unpack() to detect wrong from data.
Removed unused 'unpack_key' functions.
Removed some unneeded unpack() functions.
sql/filesort.cc:
Added buffer end parameter to unpack_addon_fields()
sql/log_event.h:
Added end of buffer argument to unpack_row()
sql/log_event_old.cc:
Added end of buffer argument to unpack_row()
sql/log_event_old.h:
Added end of buffer argument to unpack_row()
sql/records.cc:
Added buffer end parameter to unpack_addon_fields()
sql/rpl_record.cc:
Added end of buffer argument to unpack_row()
Added detection of wrong field information in events
sql/rpl_record.h:
Added end of buffer argument to unpack_row()
sql/rpl_record_old.cc:
Added end of buffer argument to unpack_row()
Added detection of wrong field information in events
sql/rpl_record_old.h:
Added end of buffer argument to unpack_row()
sql/table.h:
Added buffer end parameter to unpack()
storage/archive/ha_archive.cc:
Change ha_archive::unpack_row() to detect wrong field lengths.
This fixes lp:917689
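
To illustrate what an end-of-buffer parameter buys, here is a minimal
sketch of a bounds-checked unpack for a length-prefixed value. The
function unpack_varchar_sketch(), its one-byte length prefix and its
return convention (nullptr on malformed input) are assumptions made for
this example; they are not the actual Field::unpack() signature.

```cpp
#include <cstddef>
#include <string>

// Unpack one value stored as <1-byte length><bytes>. Returns a pointer
// just past the consumed bytes, or nullptr if the data would run past
// from_end (malformed or truncated input).
const unsigned char *unpack_varchar_sketch(std::string &to,
                                           const unsigned char *from,
                                           const unsigned char *from_end) {
  if (from >= from_end)
    return nullptr;  // no room even for the length byte
  std::size_t len = *from++;
  if (static_cast<std::size_t>(from_end - from) < len)
    return nullptr;  // the declared length exceeds the remaining buffer
  to.assign(reinterpret_cast<const char *>(from), len);
  return from + len;
}
```

Without from_end, a corrupted length byte would make the reader copy
bytes from beyond the record buffer.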
The function create_hj_key_for_table() that builds the descriptor of
the hash join key to access a table of a materialized subquery must
ignore any equi-join predicate depending on the tables not belonging
to the subquery.
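
One way to picture the filter, assuming each predicate carries a bitmap
of the tables it references. The types and the function
preds_usable_for_hash_key() below are hypothetical and only model the
rule; they are not taken from create_hj_key_for_table().

```cpp
#include <cstdint>
#include <vector>

using table_map = std::uint64_t;  // one bit per table (stand-in type)

struct EquiJoinPred {
  int id;                 // hypothetical label for the predicate
  table_map used_tables;  // tables referenced by the predicate
};

// Keep only predicates whose referenced tables all belong to the subquery.
std::vector<int> preds_usable_for_hash_key(const std::vector<EquiJoinPred> &preds,
                                           table_map subquery_tables) {
  std::vector<int> usable;
  for (const EquiJoinPred &p : preds) {
    // Any bit outside subquery_tables means the predicate depends on a
    // table that is not part of the subquery, so it is ignored.
    if ((p.used_tables & ~subquery_tables) == 0)
      usable.push_back(p.id);
  }
  return usable;
}
```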
This bug in the function JOIN::drop_unused_derived_keys() could
leave the internal structures for a materialized derived table
in an inconsistent state. This led to a not quite correct EXPLAIN
output when no additional key had been created to access the table.
It could also lead to more serious consequences: the test case
added with this fix caused a crash in mariadb-5.5.20.
This bug appeared after the patch for bug 939009. In the function
merge_key_fields, that patch forgot to set a proper value for
the val field in the result of merging the key field created
for a regular key access with the key field created to look
for a NULL key.
Adjusted the results of the test case for bug 939009 that
actually were incorrect.
The result of materialization of the right part of an IN subquery predicate
is placed into a temporary table. Each row of the materialized table is
distinct. A unique key over all fields of the temporary table is defined and
created. This allows key look-ups into the table.
The table created for a materialized subquery can be accessed by key as
any other table. The function best_access_path searches for the best access
to join a table to a given partial join. With some WHERE conditions this
function considers the possibility of a ref_or_null access. If such an access
employs the unique key on the temporary table, then when estimating
the cost of this access the function tries to use the array rec_per_key. Yet
such an array is not built for this unique key. This causes a crash of the server.
Rows returned by the subquery that contain nulls don't have to be placed
into the temporary table, as they cannot match any row produced by the
left part of the subquery predicate. So all fields of the temporary table
can be defined as non-nullable. In this case any ref_or_null access
to the temporary table does not make any sense and it does not make sense
to estimate such an access.
The fix makes sure that the temporary table for a materialized IN subquery
is defined with columns that are all non-nullable. The fix also ensures that
any row with nulls returned by the subquery is not placed into the
temporary table.
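
The effect of the fix can be modelled with a small sketch in which the
temporary table is a set of distinct, NULL-free rows. The aliases Row and
KeyRow and the function materialize_sketch() are invented for this
example; std::nullopt stands in for SQL NULL.

```cpp
#include <optional>
#include <set>
#include <vector>

using Row = std::vector<std::optional<int>>;  // std::nullopt models SQL NULL
using KeyRow = std::vector<int>;              // all columns non-nullable

// Build the "temporary table": distinct rows only, and rows containing
// a NULL are never inserted.
std::set<KeyRow> materialize_sketch(const std::vector<Row> &subquery_rows) {
  std::set<KeyRow> tmp_table;
  for (const Row &row : subquery_rows) {
    KeyRow key;
    bool has_null = false;
    for (const std::optional<int> &value : row) {
      if (!value) {
        has_null = true;
        break;
      }
      key.push_back(*value);
    }
    if (!has_null)
      tmp_table.insert(key);  // the set keeps rows unique
  }
  return tmp_table;
}
```

Because no stored column can be NULL, a ref_or_null look-up into such a
table has nothing to find and need not be considered.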
- After the exec_const_cond->val_int() call, check for error and return.
(if we don't do it, we will eventually hit an error when trying to set status OK in
the diagnostics area, which already has an error status).
- In return_zero_rows(), don't call mark_as_null_row() for semi-join
materialized tables, because 1) they may have been already freed, and
2) there is no real need to call mark_as_null_row() for them.
This bug is the result of an incomplete/inconsistent change introduced into
5.3 code when the cond_equal parameter was added to the function optimize_cond.
The change was made during a merge from 5.2 in October 2010.
The bug could affect only queries with HAVING.
An outer join query with a semi-join subquery could return a wrong result
if the optimizer chose to materialize the subquery.
It happened because when substituting for the best field into a ref item
used to build access keys not all COND_EQUAL objects that could be employed
at substitution were checked.
Also refined some code in the function check_join_cache_usage to make it
safer.
If the flag 'optimize_join_buffer_size' is set to 'off' and the value
of the system variable 'join_buffer_size' is greater than the value of
the system variable 'join_buffer_space_limit', then no join cache can
be employed to join tables of the executed query.
A bug in the function JOIN_CACHE::alloc_buffer allowed a join
buffer to be used even in this case, while another bug in the function
revise_cache_usage could cause a crash of the server in this case if the
chosen execution plan for the query contained outer join or semi-join
operation.
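
The rule stated in the first paragraph can be written down as a small
predicate. The function join_cache_allowed_sketch() is a hypothetical
helper for illustration; the parameters simply mirror the flag and the
two system variables named above.

```cpp
#include <cstddef>

// Returns false exactly in the case described above: the flag is off and
// a single join buffer would already exceed the total space limit.
bool join_cache_allowed_sketch(bool optimize_join_buffer_size,
                               std::size_t join_buffer_size,
                               std::size_t join_buffer_space_limit) {
  if (!optimize_join_buffer_size &&
      join_buffer_size > join_buffer_space_limit)
    return false;
  return true;
}
```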
IS EXECUTED TWICE FROM P
This bug is a duplicate of bug 12567331, which was pushed to the
optimizer backporting tree on 2011-06-11. This is just a back-port of
the fix. Both test cases are included as they differ somewhat.
- Disable use of join cache when we're using FirstMatch strategy, and the join
order is such that subquery's inner tables are interleaved with outer. Join
buffering code is incapable of handling such join orders.
- The testcase requires use of @@debug_optimizer_prefer_join_prefix to hit the bug,
but I'm pushing it anyway (including the mention of the variable in .test file),
so that it can be found and enabled when/if we get something comparable in the
main tree.
The problem was that LooseScan execution code assumed that tab->key holds
the index used for LooseScan. This is only true when range or full index
scan is used. In case of ref access, the index is in tab->ref.key (and
tab->index==0, which explains how LooseScan passed tests with ref access: they
used one index).
Fixed by setting/using loosescan_key, which always holds the correct index#.
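
A minimal sketch of how such a key could be chosen. The struct
JoinTabSketch and the function pick_loosescan_key() are hypothetical
simplifications; only the distinction between ref access and
range/index scan comes from the description above.

```cpp
enum AccessTypeSketch { ACCESS_REF, ACCESS_RANGE, ACCESS_INDEX_SCAN };

// Reduced stand-in for the join tab: just the fields relevant here.
struct JoinTabSketch {
  AccessTypeSketch type;
  int index;    // index number used by range / full index scan
  int ref_key;  // index number used by ref access (tab->ref.key)
};

// Pick the index number LooseScan should use, whatever the access method.
int pick_loosescan_key(const JoinTabSketch &tab) {
  return tab.type == ACCESS_REF ? tab.ref_key : tab.index;
}
```

With ref access on index 3, the sketch returns 3 even though the
scan-related index field is still 0.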
- The equality substitution code was geared towards processing WHERE/ON clauses.
That is, it assumed that it was doing substitutions on code that
= wasn't attached to any particular join_tab yet
= was going to be fed to make_join_select(), which would take the condition
apart and attach various parts of it to tables inside/outside semi-joins.
- However, somebody added equality substitution for ref access. That is, if
we have a ref access on TBL.key=expr, they would do equality substitution in
'expr'. This possibility wasn't accounted for.
- Fixed the equality substitution code by adding a mode that does equality
substitution under the assumption that the processed expression will be
attached to a certain particular table TBL.
If the expression for a derived table of a query contained a LIMIT
clause the estimate of the number of rows in this derived table
returned by the EXPLAIN command could be badly off since the
optimizer ignored the limit number from the LIMIT clause when
getting the estimate.
The call of the method SELECT_LEX_UNIT->set_limit added in the code
of mysql_derived_optimize() will be needed also in maria-5.5 where
parameters in the LIMIT clause are supported.
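
The intended effect on the estimate can be sketched as a simple cap. The
function estimate_derived_rows_sketch() and the NO_LIMIT sentinel are
assumptions for this illustration and do not reflect how the server
represents a missing LIMIT.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Sentinel meaning "no LIMIT clause" (assumption for this sketch).
const std::uint64_t NO_LIMIT = std::numeric_limits<std::uint64_t>::max();

// A derived table cannot produce more rows than its LIMIT allows, so the
// optimizer's estimate is capped by the limit.
std::uint64_t estimate_derived_rows_sketch(std::uint64_t unlimited_estimate,
                                           std::uint64_t limit) {
  return std::min(unlimited_estimate, limit);
}
```

For a derived table estimated at 1000 rows with LIMIT 10, the capped
estimate is 10 rather than 1000.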