The code in choose_best_splitting() assumed that the join prefix is
in join->positions[].
This is not necessarily the case. This function might be called when
the join prefix is in join->best_positions[], too.
Follow the approach from best_access_path(), which calls this function:
pass the current join prefix as an argument,
"const POSITION *join_positions" and use that.
EXPLAIN EXTENDED should always print the field item used in the left part
of an equality expression from the SET clause of an update statement as a
reference to table column.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
The problem was that when JOIN_TAB::remove_duplicates() noticed there
can only be one possible row in the output, it adjusted limits but
didn't take into account any possible offset.
Fixed by not adjusting limit offset when setting one-row-limit.
When a query does implicit grouping and join operation produces an empty
result set, a NULL-complemented row combination is generated.
However, constant table fields still show non-NULL values.
What happens in the is that end_send_group() is called with a
const row but without any rows matching the WHERE clause.
This last part is shown by 'join->first_record' not being set.
This causes item->no_rows_in_result() to be called for all items to reset
all sum functions to their initial state. However fields are not set
to NULL.
The used fix is to produce NULL-complemented records for constant tables
as well. Also, reset the constant table's records back in case we're
in a subquery which may get re-executed.
An alternative fix would have item->no_rows_in_result() also work
with Item_field objects.
There is some other issues with the code:
- join->no_rows_in_result_called is used but never set.
- Tables that are used with group functions are not properly marked as
maybe_null, which is required if the table rows should be regarded as
null-complemented (not existing).
- The code that tries to detect if mixed_implicit_grouping should be set
didn't take into account all usage of fields and sum functions.
- Item_func::restore_to_before_no_rows_in_result() called the wrong
function.
- join->clear() does not use a table_map argument to clear_tables(),
which caused it to ignore constant tables.
- unclear_tables() does not correctly restore status to what is
was before clear_tables().
Main bug fix was to always use a table_map argument to clear_tables() and
always use join->clear() and clear_tables() together with unclear_tables().
Other fixes:
- Fixed Item_func::restore_to_before_no_rows_in_result()
- Set 'join->no_rows_in_result_called' when no_rows_in_result_set()
is called.
- Removed not used argument from setup_end_select_func().
- More code comments
- Ensure that end_send_group() modifies the same fields as are in the
result set.
- Changed return_zero_rows() to use pointers instead of references,
similar to the rest of the code.
Reviewer: Sergei Petrunia <sergey@mariadb.com>
The problem, introduced in patch for MDEV-26301:
When check_join_cache_usage() decides not to use join buffer, it must
adjust the access method accordingly. For BNL-H joins this means switching
from pseudo-"ref access"(with index=MAX_KEY) to some other access method.
Failing to do this will cause assertions down the line when code that is
not aware of BNL-H will try to initialize index use for ref access with
index=MAX_KEY.
The fix is to follow the regular code path to disable the join buffer for
the join_tab ("goto no_join_cache") instead of just returning from
check_join_cache_usage().
This patch optimizes the number of refills for the lateral derived table
to which a materialized derived table subject to split optimization is
is converted. This optimized number of refills is now considered as the
expected number of refills of the materialized derived table when searching
for the best possible splitting of the table.
When processing a query over a mergeable view at some conditions checked
at prepare stage it may be decided to materialize the view rather than
to merge it. Before this patch in such case the field 'derived' of the
TABLE_LIST structure created for the view remained set to 0. As a result
the guard condition preventing range analysis for materialized views did
not work properly. This led to a call of some handler method for the
temporary table created to contain the view's records that was supposed
to be used only for opened tables. However temporary tables created for
materialization of derived tables or views are not opened yet when range
analysis is performed.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
Introduce @@optimizer_switch flag: hash_join_cardinality
When it is on, use EITS statistics to produce tighter bounds for
hash join output cardinality.
Amended by Monty.
Reviewed by: Monty <monty@mariadb.org>
Fix-up for commit 476b24d084
Author: Monty
Date: Thu Feb 16 14:19:33 2023 +0200
MDEV-20057 Distinct SUM on CROSS JOIN and grouped returns wrong result
which misses initializing of sorder->suffix_length.
In this commit the initialization is implemented by passing
MY_ZEROFILL flag to the allocation of SORT_FIELD elements
Now the same rule applied to vews and derived tables. So we should
allow merge of views (and derived) in queries with rownum, because
it do not change results, only makes query plans better.
This bug caused server crash when processing a multi-update statement that
used views if optimizer tracing was enabled.
The bug was introduced in the patch for MDEV-30539 that could incorrectly
detect the most top level selects of queries if views were used in them.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
After MDEV-30830 has added block-nl-join.r_unpack_time_ms, it became
apparent that there is some unaccounted-for time in BNL join operation,
namely the time that is spent after unpacking the join buffer record.
Fix this by adding a Gap_time_tracker to track the time that is spent
after unpacking the join buffer record and before any next time tracking.
The collected time is printed in block-nl-join.r_other_time_ms.
Reviewed by: Monty <monty@mariadb.org>
In block-nl-join, add:
- r_loops - this shows how many incoming record combinations this
query plan node had.
- r_effective_rows - this shows the average number of matching rows
that this table had for each incoming record combination. This is
comparable with r_rows in non-blocked access methods.
For BNL-joins, it is always equal to
$.table.r_rows * $.table.r_filtered
For BNL-H joins the value cannot be computed from other values
Reviewed by: Monty <monty@mariadb.org>
EXPLAIN EXTENDED for an UPDATE/DELETE/INSERT/REPLACE statement did not
produce the warning containing the text representation of the query
obtained after the optimization phase. Such warning was produced for
SELECT statements, but not for DML statements.
The patch fixes this defect of EXPLAIN EXTENDED for DML statements.
Subselect_single_value_engine cannot handle table value constructor used as
subquery. That's why any table value constructor TVC used as subquery is
converted into a select over derived table whose specification is TVC.
Currently the names of the columns of the derived table DT are taken from
the first element of TVC and if the k-th component of the element happens
to be a subquery the text representation of this subquery serves as the
name of the k-th column of the derived table. References of all columns of
the derived table DT compose the select list of the result of the conversion.
If a definition of a view contained a table value constructor used as a
subquery and the view was registered after this conversion had been
applied we could register an invalid view definition if the first element
of TVC contained a subquery as its component: the name of this component
was taken from the original subquery, while the name of the corresponding
column of the derived table was taken from the text representation of the
subquery produced by the function SELECT_LEX::print() and these names were
usually differ from each other.
To avoid registration of such invalid views the function SELECT_LEX::print()
now prints the original TVC instead of the select in which this TVC has
been wrapped. Now the specification of registered view looks like as if no
conversions from TVC to selects were done.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
This patch also fixes some bugs detected by valgrind after this
patch:
- Not enough copy_func elements was allocated by Create_tmp_table() which
causes an memory overwrite in Create_tmp_table::add_fields()
I added an ASSERT() to be able to detect this also without valgrind.
The bug was that TMP_TABLE_PARAM::copy_fields was not correctly set
when calling create_tmp_table().
- Aria::empty_bits is not allocated if there is no varchar/char/blob
fields in the table. Fixed code to take this into account.
This cannot cause any issues as this is just a memory access
into other Aria memory and the content of the memory would not be used.
- Aria::last_key_buff was not allocated big enough. This may have caused
issues with rtrees and ma_extra(HA_EXTRA_REMEMBER_POS) as they
would use the same memory area.
- Aria and MyISAM didn't take extended key parts into account, which
caused problems when copying rec_per_key from engine to sql level.
- Mark asan builds with 'asan' in version strihng to detect these in
not_valgrind_build.inc.
This is needed to not have main.sp-no-valgrind fail with asan.
SELECT DISTINCT did not work with expressions with sum functions.
Distinct was only done on the values stored in the intermediate temporary
tables, which only stored the value of each sum function.
In other words:
SELECT DISTINCT sum(a),sum(b),avg(c) ... worked.
SELECT DISTINCT sum(a),sum(b) > 2,sum(c)+sum(d) would not work.
The later query would do ONLY apply distinct on the sum(a) part.
Reviewer: Sergei Petrunia <sergey@mariadb.com>
This was fixed by extending remove_dup_with_hash_index() and
remove_dup_with_compare() to take into account the columns in the result
list that where not stored in the temporary table.
Note that in many cases the above dup removal functions are not used as
the optimizer may be able to either remove duplicates early or it will
discover that duplicate remove is not needed. The later happens for
example if the group by fields is part of the result.
Other things:
- Backported from 11.0 the change of Sort_param.tmp_buffer from char* to
String.
- Changed Type_handler::make_sort_key() to take String as a parameter
instead of Sort_param. This was done to allow make_sort_key() functions
to be reused by distinct elimination functions.
This makes Type_handler_string_result::make_sort_key() similar to code
in 11.0
- Simplied error handling in remove_dup_with_compare() to remove code
duplication.
WITH TIES would not take effect if SELECT DISTINCT was used in a
context where an INDEX is used to resolve the ORDER BY clause.
WITH TIES relies on the `JOIN::order` to contain the non-constant
fields to test the equality of ORDER BY fiels required for WITH TIES.
The cause of the problem was a premature removal of the `JOIN::order`
member during a DISTINCT optimization. This lead to WITH TIES code assuming
ORDER BY only contained "constant" elements.
Disable this optimization when WITH TIES is in effect.
(side-note: the order by removal does not impact any current tests, thus
it will be removed in a future version)
Reviewed by: monty@mariadb.org
There was a bug in JOIN::make_notnull_conds_for_range_scans() when
clearing TABLE->tmp_set, which was used to mark fields that could not be
null.
This function was only used if 'not_null_range_scan=on' is set.
The effect was that tmp_set contained a 'random value' and this caused
the optimizer to think that some fields could not be null.
FLUSH TABLES clears tmp_set and because of this things worked temporarily.
Fixed by clearing tmp_set properly.
Enable use of Rowid Filter optimization with eq_ref access.
Use the following assumptions:
- Assume index-only access cost is 50% of non-index-only access cost.
- Take into account that "Eq_ref access cache" reduces the number of
lookups eq_ref access will make.
= This means the number of Rowid Filter checks is reduced also
= Eq_ref access cost is computed using that assumption (see
prev_record_reads() call), so we should use it in all cost '
computations.