This patch allows to use semi-join optimization at the top level of
single-table update and delete statements.
The problem of supporting such optimization became easy to resolve after
processing a single-table update/delete statement started using JOIN
structure. This allowed to use JOIN::prepare() not only for multi-table
updates/deletes but for single-table ones as well. This was done in the
patch for mdev-28883:
Re-design the upper level of handling UPDATE and DELETE statements.
Note that JOIN::prepare() detects all subqueries that can be considered
as candidates for semi-join optimization. The code added by this patch
looks for such candidates at the top level and if such candidates are found
in the processed single-table update/delete the statement is handled in
the same way as a multi-table update/delete.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
This bug caused a crash of the server at the second execution of a stored
function that used DELETE or UPDATE statement if the first execution
of this function reported an error encountered after the prepare phase.
This happened because in such cases the executed DELETE/UPDATE statement
remained marked as prepared. As a result the second execution of SF missed
the prepare phase for the statement altogether and the statement could not
be executed properly.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
This patch introduces a new way of handling UPDATE and DELETE commands at
the top level after the parsing phase. This new way of processing update
and delete statements can be seen in the implementation of the prepare()
and execute() methods from the new Sql_cmd_dml class. This class derived
from the Sql_cmd class can be considered as an interface class for processing
such commands as SELECT, INSERT, UPDATE, DELETE and other comands
manipulating data in tables.
With this patch processing of update and delete statements after parsing
proceeds by the following schema:
- precheck of the access rights is performed for the used tables
- the used tables are opened
- context analysis phase is performed for the statement
- the used tables are locked
- the statement is optimized and executed
- clean-up is performed for the statement
The implementation of the method Sql_cmd_dml::execute() adheres this schema.
The virtual functions of the class Sql_cmd_dml used for precheck of the
access rights, context analysis, optimization and execution allow to adjust
this schema for processing data manipulation statements of any types.
This schema of processing data manipulation statements is taken from the
current MySQL code. Moreover the definition the class Sql_cmd_dml introduced
in this patch is almost a full replica of such class in the existing MySQL.
However the implementation of the derived classes for update and delete
statements is quite different. This implementation employs the JOIN class
for all kinds of update and delete statements. It allows to perform main
bulk of context analysis actions by the function JOIN::prepare(). This
guarantees that characteristics and properties of the statement tree
discovered for optimization phase when doing context analysis are the same
for single-table and multi-table updates and deletes.
With this patch the following functions are gone:
mysql_prepare_update(), mysql_multi_update_prepare(),
mysql_update(), mysql_multi_update(),
mysql_prepare_delete(), mysql_multi_delete_prepare(), mysql_delete().
The code within these functions have been used as much as possible though.
The functions mysql_test_update() and mysql_test_delete() are also not
needed anymore. The method Sql_cmd_dml::prepare() serves processing
- update/delete statement
- PREPARE stmt FROM "<update/delete statement>"
- EXECUTE stmt when stmt is prepared from update/delete statement.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
Removed an old '* 2' from the HASH join cost. This was made obsolete by
a later patch that added cost for copying the data out from the join buffer
to table->record.
I also added some 'echo' to some test cases to make it easier to debug
test case changes.
Test case changes:
- subselect3_jcl6 and subselect_sj2_jcl6 result changes as materialized
tables changed to hash join + first_match
Firstmatch_picker::check_qep() has an optimization that allows firstmatch
to be used together with join buffer under some conditions. In this
case the cost was assumed to be same as what best_access_path()
had calculated.
However if HASH+join_buffer was used, then
fix_semijoin_strategies_for_picked_join_order() would remove the
join_buffer (which would cause a full join to be used) and the cost
assumption by Firstmatch_picker::check_qep() would be wrong.
Later check_join_cache_usage() sees that it's a full scan and decides
it can use join buffering, (But not the hash join).
Fixed by also allowing HASH joins with firstmatch.
This removes the need to change disable and re-enable join buffer.
Test case changes:
- HASH join used with firstmatch (Using join buffer (flat, BNLH join))
- Filtered could change with firstmatch as the conversion with and without
join_buffered lost the filtering information.
- The not "re-enabling join buffer" is shown in main.optimizer_trace
Original code by Sergei, optimized by Monty.
Author: Sergei Petrunia <sergey@mariadb.com>, monty@mariadb.org
This stabilizes main.order_by_optimizer_innodb, where the result varies
depending on the rec_per_key status from the engine.
The logic to prefer range over a const ref:
- If range of has only one part and it uses more key parts than ref, then
use the range.
Example:
WHERE key_part1=1 and key_part2 > #
Here we will prefer a range over (key_part1,key_part2) instead a ref
over key_part1.
This error was discovered while working on
MDEV-30540 Wrong result with IN list length reaching
IN_PREDICATE_CONVERSION_THRESHOLD
If there is read error from handler::ha_rnd_next() during a recursive
query, st_select_lex_unit::exec_recursive() will crash as it will try to
get the error code from a structure that was deleted by the callee.
The code was using the construct:
sl->join->exec();
saved_error=sl->join->error;
This does not work as sl->join was freed by the exec() and sl->join would
be set to 0.
Fixed by having JOIN::exec() return the error code.
The included test case simulates the error in ha_rnd_next(), which causes
a crash without the patch.
scovered whle working on
MDEV-30540 Wrong result with IN list length reaching
IN_PREDICATE_CONVERSION_THRESHOLD
If there is read error from handler::ha_rnd_next() during a recursive
query, st_select_lex_unit::exec_recursive() will crash as it will try to
get the error code from a structure that was deleted by the callee.
The code was using the construct:
sl->join->exec();
saved_error=sl->join->error;
This does not work as sl->join was freed by the exec() and sl->join was
set to 0.
Fixed by having JOIN::exec() return the error code.
The included test case simulates the error in ha_rnd_next(), which causes
a crash without the patch.
The problem was the mysql_derived_prepare() did not correctly set
'distinct' when creating a temporary derivated table.
Fixed by separating checking for distinct for queries with and without
UNION.
Other things:
- Fixed bug in generate_derived_keys_for_table() where we set the wrong
bit for join_tab->keys
- Cleaned up JOIN::drop_unused_derived_keys()
- Changed TABLE::use_index() to keep unique keys and update
share->key_parts
Author: Sergei Petrunia <sergey@mariadb.com>, monty@mariadb.org
Subselect_single_value_engine cannot handle table value constructor used as
subquery. That's why any table value constructor TVC used as subquery is
converted into a select over derived table whose specification is TVC.
Currently the names of the columns of the derived table DT are taken from
the first element of TVC and if the k-th component of the element happens
to be a subquery the text representation of this subquery serves as the
name of the k-th column of the derived table. References of all columns of
the derived table DT compose the select list of the result of the conversion.
If a definition of a view contained a table value constructor used as a
subquery and the view was registered after this conversion had been
applied we could register an invalid view definition if the first element
of TVC contained a subquery as its component: the name of this component
was taken from the original subquery, while the name of the corresponding
column of the derived table was taken from the text representation of the
subquery produced by the function SELECT_LEX::print() and these names were
usually differ from each other.
To avoid registration of such invalid views the function SELECT_LEX::print()
now prints the original TVC instead of the select in which this TVC has
been wrapped. Now the specification of registered view looks like as if no
conversions from TVC to selects were done.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
This patch also fixes some bugs detected by valgrind after this
patch:
- Not enough copy_func elements was allocated by Create_tmp_table() which
causes an memory overwrite in Create_tmp_table::add_fields()
I added an ASSERT() to be able to detect this also without valgrind.
The bug was that TMP_TABLE_PARAM::copy_fields was not correctly set
when calling create_tmp_table().
- Aria::empty_bits is not allocated if there is no varchar/char/blob
fields in the table. Fixed code to take this into account.
This cannot cause any issues as this is just a memory access
into other Aria memory and the content of the memory would not be used.
- Aria::last_key_buff was not allocated big enough. This may have caused
issues with rtrees and ma_extra(HA_EXTRA_REMEMBER_POS) as they
would use the same memory area.
- Aria and MyISAM didn't take extended key parts into account, which
caused problems when copying rec_per_key from engine to sql level.
- Mark asan builds with 'asan' in version strihng to detect these in
not_valgrind_build.inc.
This is needed to not have main.sp-no-valgrind fail with asan.
This was discovered as part of adding a protected memory area between
each area allocated by multi_alloc().
The patch that adds the protection will be pushed in 10.5.
This patch adds fixes that are unique for 10.10
The old code in prev_record_reads() did give wrong estimates when a
join_buffer was used or if the table was depending on more than one
other tables. When join_cache is used, it will cause a re-order of row
combinations, which causes more calls to the engine for tables that
are depending on tables before the join_cached one.
The new prev_records_read() code provides more exact estimates and
should never give a 'too low estimate', assuming that the data to the
function is correct
The definition of prev_record_read() is also updated.
The new definition is:
"Estimate the number of engine ha_index_read_calls for EQ_REF tables
when taking into account the one-row-cache in join_read_always_key()"
The cost of using prev_record_reads() value is changed. The value is
now used similar as before to calculate the cost of the storage engine
calls. However the cost of the WHERE cost is changed to take into
account the total number of row combinations as the WHERE has to be
checked even if the one-row-cache is used. This makes the cost
slightly higher than before (for the same prev_record_reads() value).
Other things:
- Cached return value of prev_record_read() in best_access_path() to
avoid some function calls.
- Fixed bug where position[].use_join_buffer was set in
best_acess_path() when join buffer was not used. This confused the
semi join optimizer to try to reoptimize plans that did not need to be
reoptimized.
The effect of the bug fix is that we avoid doing some re-optimziations
with semi-joins when join_buffer is not used. In these cases the value
shown for the 'Filtering' column in EXPLAIN EXTENDED may change.
- Added 'prev_record.cc' that was used to verify the logic in
prev_record_reads().
Changes in test suite:
- EQ_REF tables are moved up to be earlier. This is because either the
higher WHERE cost when EQ_REF is used with more row combination or
change of cost when using join_cache.
- Filtered has changed (to the better) for some cases using semi-joins
subselect_sj.test subselect_sj_jcl6.test
Amended patch from Monty:
The issue was that Loose_scan_opt::save_to_position() did not take
into account records_out from best_access_path()
Make sure that POSITION object filled by Loose_scan_opt::save_to_position()
has records_out not higher than any other possible access method.
SELECT DISTINCT did not work with expressions with sum functions.
Distinct was only done on the values stored in the intermediate temporary
tables, which only stored the value of each sum function.
In other words:
SELECT DISTINCT sum(a),sum(b),avg(c) ... worked.
SELECT DISTINCT sum(a),sum(b) > 2,sum(c)+sum(d) would not work.
The later query would do ONLY apply distinct on the sum(a) part.
Reviewer: Sergei Petrunia <sergey@mariadb.com>
This was fixed by extending remove_dup_with_hash_index() and
remove_dup_with_compare() to take into account the columns in the result
list that where not stored in the temporary table.
Note that in many cases the above dup removal functions are not used as
the optimizer may be able to either remove duplicates early or it will
discover that duplicate remove is not needed. The later happens for
example if the group by fields is part of the result.
Other things:
- Backported from 11.0 the change of Sort_param.tmp_buffer from char* to
String.
- Changed Type_handler::make_sort_key() to take String as a parameter
instead of Sort_param. This was done to allow make_sort_key() functions
to be reused by distinct elimination functions.
This makes Type_handler_string_result::make_sort_key() similar to code
in 11.0
- Simplied error handling in remove_dup_with_compare() to remove code
duplication.
WITH TIES would not take effect if SELECT DISTINCT was used in a
context where an INDEX is used to resolve the ORDER BY clause.
WITH TIES relies on the `JOIN::order` to contain the non-constant
fields to test the equality of ORDER BY fiels required for WITH TIES.
The cause of the problem was a premature removal of the `JOIN::order`
member during a DISTINCT optimization. This lead to WITH TIES code assuming
ORDER BY only contained "constant" elements.
Disable this optimization when WITH TIES is in effect.
(side-note: the order by removal does not impact any current tests, thus
it will be removed in a future version)
Reviewed by: monty@mariadb.org
There was a bug in JOIN::make_notnull_conds_for_range_scans() when
clearing TABLE->tmp_set, which was used to mark fields that could not be
null.
This function was only used if 'not_null_range_scan=on' is set.
The effect was that tmp_set contained a 'random value' and this caused
the optimizer to think that some fields could not be null.
FLUSH TABLES clears tmp_set and because of this things worked temporarily.
Fixed by clearing tmp_set properly.
JOIN_CACHE::alloc_buffer() used wrong logic when calculating the size
of all join buffers. Then, it computed the ratio by which
JOIN::shrink_join_buffers() should shrink the buffers.
shrink_join_buffers() ended up in a situation where buffers would not
fit into the total quota after shrinking, which resulted in negative
buffer sizes. Due to use of unsigned integers it would cause very large
buffers to be used instead.
Make JOIN_CACHE::alloc_buffer() use the same logic as
JOIN::shrink_join_buffers() when it calculates the total size of
all join buffers so far.
Also, add a safety check in JOIN::shrink_join_buffers()
This patch doesn't include a testcase, because the original test dataset
is too big and fragile. We have dbt3_s001.inc but I wasn't able to demonstrate
the issue with it.
Extended keys works by first checking if the engine supports extended
keys.
If yes, it extends secondary key with primary key components and mark the
secondary keys as HA_EXT_NOSAME (unique).
If we later notice that there where no primary key, the extended key
information for secondary keys in share->key_info is reset. However the
key_info->flag HA_EXT_NOSAME was not reset!
This causes some strange things to happen:
- Tables that have no primary key or secondary index that contained the
primary key would be wrongly optimized as the secondary key could be
thought to be unique when it was not and not unique when it was.
- The problem was not shown in EXPLAIN because of a bug in
create_ref_for_key() that caused EQ_REF to be displayed by EXPLAIN as REF
when extended keys where used and the secondary key contained the primary
key.
This is fixed with:
- Removed wrong test in make_join_select() which did not detect that key
where unique when a secondary key contains the primary.
- Moved initialization of extended keys from create_key_infos() to
init_from_binary_frm_image() after we know if there is a usable primary
key or not. One disadvantage with this approach is that
key_info->key_parts may have not used slots (for keys we thought could
be extended but could not). Fixed by adding a check for unused key_parts
to copy_keys_from_share().
Other things:
- Simplified copying of first key part in create_key_infos().
- Added a lot of code comments in code that I had to check as part of
finding the issue.
- Fixed some indentation.
- Replaced a couple of looks using references to pointers in C
context where the reference does not give any benefit.
- Updated Aria and Maria to not assume the all key_info->rec_per_key
are in one memory block (this could happen when using dervived
tables with many keys).
- Fixed a bug where key_info->rec_per_key where not allocated
- Optimized TABLE::add_tmp_key() to only call alloc() once.
(No logic changes)
Test case changes:
- innodb_mysql.test changed index as an index the optimizer thought
was unique, was not. (Table had no primary key)
TODO:
- Move code that checks for partial or too long keys to the primary loop
earlier that initally decides if we should add extended key fields.
This is needed to ensure that HA_EXT_NOSAME is not set for partial or
too long keys. It will also shorten the current code notable.
Some tables where not eliminated when they could have been.
This was caused because HA_KEYREAD_ONLY is not set anymore for InnoDB
clustered index and the elimination code was depending on
field->part_of_key_not_clustered which was not set if HA_KEYREAD_ONLY
is not present.
Fixed by moving out field->part_of_key and
field->part_of_key_not_clustered from under HA_KEYREAD_ONLY (which
they should never have been part of).
Other things:
- Fixed a bug in make_join_select() that caused range to be used when
there where elminiated or constant tables present (Caused wrong
change of plans in join_outer_innodb.test). This also affected
show_explain.test and subselct_sj_mat.test where wrong 'range's where
replaced with index scans.
Reviewer: Sergei Petrunia <sergey@mariadb.com>
matching_candidates_in_table() computes the number of rows one
gets from the current table after applying the WHERE clause on
just this table
The function had a "found_counstraint heuristic" which reduced the
number of rows after WHERE check by 25% if there were comparisons
between key parts in table T and previous tables, like WHERE
T.keyXpartY= func(prev_table.cols)
Note that such comparisons can only be checked when the row of
table T is joined with rows of the previous tables. It is wrong
to apply the selectivity before the join operation.
Fixed by moving the 'found_constraint' code to a separate function
and only reducing the #rows in 'records_out'.
Renamed matching_candidates_in_table() to apply_selectivity_for_table() as
the function now either applies selectivity on the rows (depending
on the value of thd->variables.optimizer_use_condition_selectivity)
or uses the selectivity from the available range conditions.
Detailed description:
- Added more function comments and fixed types in some old comments
- Removed an outdated comment
- Cleaned up some functions in records.cc
- Replaced "while" with "if"
- Reused error code
- Made functions similar
- Added caching of pfs_batch_update()
- Simplified some rowid_filter code
- Only call build_range_rowid_filter() if rowid filter will be used
- Replaced tab->is_rowid_filter_built with need_to_build_rowid_filter.
We only have to test need_to_build_rowid_filter to know if we have
to build the filter. Old code needed two tests
- Added function 'clear_range_rowid_filter' to disable rowid filter.
Made things simpler as we can now clear all rowid filter variables
in one place.
- Removed some 'if' in sub_select()
The problem was that make_join_select() called test_quick_select() outside
of best_access_path(). This could use indexes that where not taken into
account before and this caused changes to selectivity and 'records_out'.
Fixed by updating records_out if test_quick_select() was called.
Added comments that not used keys of derivied tables will be deleted.
Added some comments about checking if pos_in_table_list is 0.
Other things:
- Added a marker (DBTYPE_IN_PREDICATE) in TABLE_LIST->derived_type
to indicate that the table was generated from IN (list). This is
useful for debugging and can later be used by explain if needed.
- Removed a not needed test of table->pos_in_table_list as it should
always be valid at this point in time.
The main difference in code path between EQ_REF and REF is that for
REF we have to do an extra read_next on the index to check that there
is no more matching rows.
Before this patch we added a preference of EQ_REF by ensuring that REF
would always estimate to find at least 2 rows.
This patch adds the cost of the extra key read_next to REF access and
removes the code that limited REF to at least 2 rows. For some queries
this can have a big effect as the total estimated rows will be halved
for each REF table with 1 rows.
multi_range cost calculations are also changed to take into account
the difference between EQ_REF and REF.
The effect of the patch to the test suite:
- About 80 test case changed
- Almost all changes where for EXPLAIN where estimated rows for REF
where changed from 2 to 1.
- A few test cases using explain extended had a change of 'filtered'.
This is because of the estimated rows are now closer to the
calculated selectivity.
- A very few test had a change of table order.
This is because the change of estimated rows from 2 to 1 or the small
cost change for REF
(main.subselect_sj_jcl6, main.group_by, main.dervied_cond_pushdown,
main.distinct, main.join_nested, main.order_by, main.join_cache)
- No key statistics and the estimated rows are now smaller which cased
estimated filtering to be lower.
(main.subselect_sj_mat)
- The number of total rows are halved.
(main.derived_cond_pushdown)
- Plans with 1 row changed to use RANGE instead of REF.
(main.group_min_max)
- ALL changed to REF
(main.key_diff)
- Key changed from ref + index_only to PRIMARY key for InnoDB, as
OPTIMIZER_ROW_LOOKUP_COST + OPTIMIZER_ROW_NEXT_FIND_COST is smaller than
OPTIMIZER_KEY_LOOKUP_COST + OPTIMIZER_KEY_NEXT_FIND_COST.
(main.join_outer_innodb)
- Cost changes printouts
(main.opt_trace*)
- Result order change
(innodb_gis.rtree)
The old code counted selectivity double in case of queries like:
WHERE key_part1=1 and key_part2 < 100
if the optimizer would decide to use a REF access on key_part1.
The new code in best_access_path() that changes REF access to RANGE
if the RANGE key is longer makes this issue less likely to happen.
I was not able to create a test case for 11.0, however if one ports this
patch to a MariaDB version without the change of REF to RANGE, the
selectivity will be counted double.
The reason for this is that we call file->index_flags(index, 0, 1)
multiple times in best_access_patch()when optimizing a table.
For example, in InnoDB, the calls is not trivial (4 if's and 2 assignments)
Now the function is inlined and is just a memory reference.
Other things:
- handler::is_clustering_key() and pk_is_clustering_key() are now inline.
- Added TABLE::can_use_rowid_filter() to simplify some code.
- Test if we should use a rowid_filter only if can_use_rowid_filter() is
true.
- Added TABLE::is_clustering_key() to avoid a memory reference.
- Simplify some code using the fact that HA_KEYREAD_ONLY is true implies
that HA_CLUSTERED_INDEX is false.
- Added DBUG_ASSERT to TABLE::best_range_rowid_filter() to ensure we
do not call it with a clustering key.
- Reorginized elements in struct st_key to get better memory alignment.
- Updated ha_innobase::index_flags() to not have
HA_DO_RANGE_FILTER_PUSHDOWN for clustered index