(Variant 2: only allow rewrite for ref(const))
make_join_select() has a "ref_to_range" rewrite: it would rewrite
any ref access to a range access on the same index if the latter uses
more keyparts.
It seems, he initial intent of this was to fix poor query plan choice
in cases like
t.keypart1=const AND t.keypart2 < 'foo'
Due to deficiency in cost model, ref access could be picked while range
would enumerate fewer rows and be cheaper.
However, the condition also forces a rewrite in cases like:
t.keypart1=prev_table.col AND t.keypart1<='foo' AND t.keypart2<'bar'
Here, it can be that
* keypart1=prev_table.col is highly selective
* (keypart1, keypart2) <= ('foo', 'bar') is not at all selective.
Still, the rewrite would be made and poor query plan chosen.
Fixed this by only doing the rewrite if ref access was ref(const)
so we can be certain that quick select also used these restrictions
and will scan a subset of rows that ref access would scan.
Remove an assert added by fix for MDEV-34417. BNL-H join can be used with
prefix keys. This happens when there are real prefix indexes on the
equi-join columns (although it probably doesn't make a lot of sense).
Anyway, remove the assert. The code receives properly truncated key values
for hashing/comparison so it can handle them just fine.
JOIN_CACHE has a light-weight initialization mode that's targeted at
EXPLAINs. In that mode, JOIN_CACHE objects are not able to execute.
Light-weight mode was used whenever the statement was an EXPLAIN. However
the EXPLAIN can execute subqueries, provided they enumerate less than
@@expensive_subquery_limit rows.
Make sure we use light-weight initialization mode only when the select is
more expensive @@expensive_subquery_limit.
Also add an assert into JOIN_CACHE::put_record() which prevents its use
if it was initialized for EXPLAIN only.
This patch fixes a performance regression introduced in the patch for the
bug MDEV-21104. The performance regression could affect queries for which
join buffer was used for an outer join such that its on expression from
which a conjunctive condition depended only on outer tables can be
extracted. If the number of records in the join buffer for which this
condition was false greatly exceeded the number of other records the
slowdown could be significant.
If there is a conjunctive condition extracted from the ON expression
depending only on outer tables this condition is evaluated when interesting
fields of each survived record of outer tables are put into the join buffer.
Each such set of fields for any join operation is supplied with a match
flag field used to generate null complemented rows. If the result of the
evaluation of the condition is false the flag is set to MATCH_IMPOSSIBLE.
When looking in the join buffer for records matching a record of the
right operand of the outer join operation the records with such flags
are not needed to be unpacked into record buffers for evaluation of on
expressions.
The patch for MDEV-21104 fixing some problem of wrong results when
'not exists' optimization by mistake broke the code that allowed to
ignore records with the match flag set to MATCH_IMPOSSIBLE when looking
for matching records. As a result such records were unpacked for each
record of the right operand of the outer join operation. This caused
significant execution penalty in some cases.
One of the test cases added in the patch can be used only for demonstration
of the restored performance for the reported query. The second test case is
needed to demonstrate the validity of the fix.
This is also related to
MDEV-31348 Assertion `last_key_entry >= end_pos' failed in virtual bool
JOIN_CACHE_HASHED::put_record()
Valgrind exposed a problem with the join_cache for hash joins:
=25636== Conditional jump or move depends on uninitialised value(s)
==25636== at 0xA8FF4E: JOIN_CACHE_HASHED::init_hash_table()
(sql_join_cache.cc:2901)
The reason for this was that avg_record_length contained a random value
if one had used SET optimizer_switch='optimize_join_buffer_size=off'.
This causes either 'random size' memory to be allocated (up to
join_buffer_size) which can increase memory usage or, if avg_record_length
is less than the row size, memory overwrites in thd->mem_root, which is
bad.
Fixed by setting avg_record_length in JOIN_CACHE_HASHED::init()
before it's used.
There is no test case for MDEV-31893 as valgrind of join_cache_notasan
checks that.
I added a test case for MDEV-31348.
This is also related to
MDEV-31348 Assertion `last_key_entry >= end_pos' failed in virtual bool
JOIN_CACHE_HASHED::put_record()
Valgrind exposed a problem with the join_cache for hash joins:
=25636== Conditional jump or move depends on uninitialised value(s)
==25636== at 0xA8FF4E: JOIN_CACHE_HASHED::init_hash_table()
(sql_join_cache.cc:2901)
The reason for this was that avg_record_length contained a random value
if one had used SET optimizer_switch='optimize_join_buffer_size=off'.
This causes either 'random size' memory to be allocated (up to
join_buffer_size) which can increase memory usage or, if avg_record_length
is less than the row size, memory overwrites in thd->mem_root, which is
bad.
Fixed by setting avg_record_length in JOIN_CACHE_HASHED::init()
before it's used.
There is no test case for MDEV-31893 as valgrind of join_cache_notasan
checks that.
I added a test case for MDEV-31348.
The problem was that JOIN_CACHE::alloc_buffer() did not check if the
given join_buffer_value is less than the query require.
Added a check for this and disabled join cache if it cannot be used.
- main.selectivity failed because one test produced different result with
embedded (missing feature). Fixed by moving the failing part to
selectivity_notembedded.
- Disabled maria.encrypt-no-key for embedded as embedded does not support
encryption
- Moved test from join_cache to join_cache_notasan that tried to alloc()
a buffer bigger than available memory.
The old code did set max_records to either number_of_rows
(partial_join_cardinality) or memory size (join_buffer_space_limit)
which did not make sense.
Fixed by setting max_records to number of rows that fits into
join_buffer_size.
Other things:
- Initialize buffer cache values in JOIN_CACHE constructors (safety)
Reviewer: Sergei Petrunia <sergey@mariadb.com>
The problem was that join_buffer_size conflicted with
join_buffer_space_limit, which caused the query to be run without join
buffer. However this caused wrong results as the optimizer assumed
that hash+join buffer would ensure that the equi-join condition
would be satisfied, and didn't check it itself.
Fixed by not using join_buffer_space_limit when
optimize_join_buffer_size=off. This matches the documentation at
https://mariadb.com/kb/en/block-based-join-algorithms
Other things:
- Removed not used variable JOIN_TAB::join_buffer_size_limit
- Give an error if we cannot allocate a join buffer. This can
only happen if the join_buffer variables are wrongly configured or
we are running out of memory.
In the future, instead of returning an error, we could properly
convert the query plan that uses BNL-H join into one that doesn't
use join buffering:
make sure the equi-join condition is checked where appropriate.
Reviewer: Sergei Petrunia <sergey@mariadb.com>
Introduces @@optimizer_switch flag: hash_join_cardinality
When this option is on, use EITS statistics to produce tighter bounds
for hash join output cardinality.
This patch is an extension / replacement to a similar patch in 10.6
New features:
- optimizer_switch hash_join_cardinality is on by default
- records_out is set to fanout when HASH is used
- Fixed bug in is_eits_usable: The function did not work with views
The old code in prev_record_reads() did give wrong estimates when a
join_buffer was used or if the table was depending on more than one
other tables. When join_cache is used, it will cause a re-order of row
combinations, which causes more calls to the engine for tables that
are depending on tables before the join_cached one.
The new prev_records_read() code provides more exact estimates and
should never give a 'too low estimate', assuming that the data to the
function is correct
The definition of prev_record_read() is also updated.
The new definition is:
"Estimate the number of engine ha_index_read_calls for EQ_REF tables
when taking into account the one-row-cache in join_read_always_key()"
The cost of using prev_record_reads() value is changed. The value is
now used similar as before to calculate the cost of the storage engine
calls. However the cost of the WHERE cost is changed to take into
account the total number of row combinations as the WHERE has to be
checked even if the one-row-cache is used. This makes the cost
slightly higher than before (for the same prev_record_reads() value).
Other things:
- Cached return value of prev_record_read() in best_access_path() to
avoid some function calls.
- Fixed bug where position[].use_join_buffer was set in
best_acess_path() when join buffer was not used. This confused the
semi join optimizer to try to reoptimize plans that did not need to be
reoptimized.
The effect of the bug fix is that we avoid doing some re-optimziations
with semi-joins when join_buffer is not used. In these cases the value
shown for the 'Filtering' column in EXPLAIN EXTENDED may change.
- Added 'prev_record.cc' that was used to verify the logic in
prev_record_reads().
Changes in test suite:
- EQ_REF tables are moved up to be earlier. This is because either the
higher WHERE cost when EQ_REF is used with more row combination or
change of cost when using join_cache.
- Filtered has changed (to the better) for some cases using semi-joins
subselect_sj.test subselect_sj_jcl6.test
The main difference in code path between EQ_REF and REF is that for
REF we have to do an extra read_next on the index to check that there
is no more matching rows.
Before this patch we added a preference of EQ_REF by ensuring that REF
would always estimate to find at least 2 rows.
This patch adds the cost of the extra key read_next to REF access and
removes the code that limited REF to at least 2 rows. For some queries
this can have a big effect as the total estimated rows will be halved
for each REF table with 1 rows.
multi_range cost calculations are also changed to take into account
the difference between EQ_REF and REF.
The effect of the patch to the test suite:
- About 80 test case changed
- Almost all changes where for EXPLAIN where estimated rows for REF
where changed from 2 to 1.
- A few test cases using explain extended had a change of 'filtered'.
This is because of the estimated rows are now closer to the
calculated selectivity.
- A very few test had a change of table order.
This is because the change of estimated rows from 2 to 1 or the small
cost change for REF
(main.subselect_sj_jcl6, main.group_by, main.dervied_cond_pushdown,
main.distinct, main.join_nested, main.order_by, main.join_cache)
- No key statistics and the estimated rows are now smaller which cased
estimated filtering to be lower.
(main.subselect_sj_mat)
- The number of total rows are halved.
(main.derived_cond_pushdown)
- Plans with 1 row changed to use RANGE instead of REF.
(main.group_min_max)
- ALL changed to REF
(main.key_diff)
- Key changed from ref + index_only to PRIMARY key for InnoDB, as
OPTIMIZER_ROW_LOOKUP_COST + OPTIMIZER_ROW_NEXT_FIND_COST is smaller than
OPTIMIZER_KEY_LOOKUP_COST + OPTIMIZER_KEY_NEXT_FIND_COST.
(main.join_outer_innodb)
- Cost changes printouts
(main.opt_trace*)
- Result order change
(innodb_gis.rtree)