The optimizer implicitly assumed that if `a` in `a=b` is not NULL,
then it is safe to convert `a` to the type of `b` and search for the
result in index(b).
This is not always the case: converting a non-NULL value to a
different type might produce NULL, and searching for NULL in the
index might find a NULL there. NULL would then compare equal to NULL,
making `a=b` behave as if it were `a<=>b`.
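A minimal sketch of the failure mode, using hypothetical tables where
a VARCHAR value is compared against an indexed DATE column:

  CREATE TABLE t1 (a VARCHAR(32));
  CREATE TABLE t2 (b DATE, KEY(b));
  INSERT INTO t1 VALUES ('not-a-date');
  INSERT INTO t2 VALUES (NULL);
  -- Converting 'not-a-date' to DATE yields NULL; looking that NULL up
  -- in index(b) must not match the NULL row:
  SELECT * FROM t1 JOIN t2 ON t1.a = t2.b;    -- must return no rows
  SELECT * FROM t1 JOIN t2 ON t1.a <=> t2.b;  -- NULL-safe: matches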
The fake_select_lex->join was prepared at the unit execution stage, so
the validation of fake_select_lex before the unit pushdown was
incomplete. That caused statements with an incorrect ORDER BY clause
to be pushed down.
This commit moves the preparation of fake_select_lex->join to the
unit's prepare() method, before the pushdown handler is initialized,
so that incorrect clauses error out before being pushed down.
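A sketch of the kind of statement that must now fail before pushdown
(hypothetical tables residing on a foreign engine):

  SELECT a FROM t1
  UNION
  SELECT a FROM t2
  ORDER BY no_such_column;  -- must raise 'Unknown column' during
                            -- prepare(), not be pushed to the engine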
The reason for the crash was that the 'best splitting' optimization
predicted fewer rows to be found than opt_range did.
The code in apply_selectivity_for_table(), when using
use_cond_selectivity=1, was not prepared for this case, which caused
an assert in debug builds. Production builds are not affected.
The fix is to choose the smaller of the two row counts. This yields
the minimum cost when using use_cond_selectivity=1 and should not
cause any problems in production.
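For reference, 'best splitting' applies to grouped derived tables of
roughly this shape (tables here are illustrative):

  SELECT t1.a, dt.mb
  FROM t1
  JOIN (SELECT a, MAX(b) AS mb FROM t2 GROUP BY a) dt
    ON dt.a = t1.a;
  -- dt can be materialized one group at a time per value of t1.a;
  -- the predicted row count is now capped by the opt_range estimate.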
This test case exposed 2 different bugs:
- When replacing a range with an index scan on a covering key
in test_if_skip_sort_order() we didn't disable filtering.
Filtering does not make much sense in this case.
- Fixed by disabling filtering in this case.
- Range_rowid_filter::fill() did not take into account that keyread
  could already be active, which caused an assert when it tried to
  activate another keyread.
- Fixed by remembering the old keyread state at the start and
  restoring it at the end.
Other things:
- ha_start_keyread() allowed multiple calls. This is wrong, especially
  as we do not check whether the index changed!
  I added an assert() to ensure that we don't call it when there is
  already an active keyread.
- ha_end_keyread() always called ha_extra(), even if keyread was not
active. Added a check to avoid the extra call.
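A hypothetical query shape exercising both fixes, where a range on one
key can build a rowid filter while another, covering key is picked to
skip the sort:

  CREATE TABLE t (pk INT PRIMARY KEY, a INT, b INT, KEY(a), KEY(b));
  -- The range on KEY(b) may be used as a rowid filter while KEY(a) is
  -- scanned as a covering index for the ORDER BY; keyread can then
  -- already be active when Range_rowid_filter::fill() runs.
  SELECT a FROM t WHERE a > 0 AND b BETWEEN 1 AND 10 ORDER BY a;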
This patch also fixes
MDEV-31391 Assertion `((best.records_out) == 0.0 ... failed
Cost changes caused by this change:
- Range queries with join buffer now have a notably smaller cost.
- Range scans are a bit more expensive as the MULTI_RANGE_COST is now
  properly applied to them in all cases (this extra cost is equal to a
  key lookup).
- Table scan cost is slightly smaller as we now assume data is cached
  in the engine after the first scan pass. (We did this before for
  range scans and other access methods.)
- Partitioned tables had wrong values for max_row_blocks and
  max_index_blocks. Correcting this causes range access on
  partitioned tables to have a slightly higher cost because of the
  increased estimated IO.
- Using first match + join buffer caused 'filtered' to be calculated
  incorrectly. (Only affected EXPLAIN, not query costs.)
- Added cost_without_join_buffer to optimizer_trace (see the example
  after this list).
- check_quick_select() adjusted the number of rows according to persistent
statistics, but did not adjust cost. Now fixed.
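As an illustration, the new trace entry can be inspected like this
(query and extraction path are examples only):

  SET optimizer_trace= 'enabled=on';
  SELECT * FROM t1, t2 WHERE t1.a = t2.a;
  SELECT JSON_EXTRACT(trace, '$**.cost_without_join_buffer')
  FROM information_schema.OPTIMIZER_TRACE;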
The big changes in the patch are:
- In best_access_path(), we now store the cost in
  'ALL_READ_COST cost' and only convert it to a double at the end.
  This allows us to calculate the effect of the join_cache more
  exactly.
- In JOIN_TAB::estimate_scan_time(), the cost is likewise stored in
  an ALL_READ_COST object.
One effect of this change is that when joining very small tables:

  t1 some_access_method
  t2 range
  t3 ALL Use join buffer

is switched to:

  t1 some_access_method
  t3 ALL
  t2 range use join buffer

Both plans have the same cost, but as the table scan in this case has
a lower cost than the range scan, it is considered first and thus
takes precedence.
Test case changes:
- optimizer_trace - Addition of cost_without_join_buffer
- subselect_mat_cost_bugs - Small tables and scan versus range
- range & range_mrr_icp - Range + join_cache is faster than ref
- optimizer_trace - cost_without_join_buffer, smaller scan cost,
range setup cost.
- mrr - range + join_buffer is used as it has a smaller cost
Allow queries of multiple SELECTs combined together with
UNIONs/EXCEPTs/INTERSECTs to be pushed down to foreign engines.
If the foreign engine provides an interface method "create_unit"
and the UNIT is a top-level unit of the SQL query then the server
tries to push the whole SELECT_LEX_UNIT down to the engine for execution.
The engine should perform necessary checks and if they succeed,
execute the query. If the engine is unable to execute the whole unit,
then another attempt is made to push down SELECTs composing the unit
separately using the "create_select" interface method. In this case
the results of the separate SELECTs are combined on the server side,
thus composing the final result.
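An illustrative query shape (table names are placeholders for tables
of a foreign engine):

  SELECT a FROM t1
  UNION
  SELECT a FROM t2
  EXCEPT
  SELECT a FROM t3;
  -- First offered as a whole unit via create_unit(); on refusal, each
  -- SELECT may still be pushed down via create_select().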
The code in choose_best_splitting() assumed that the join prefix is
in join->positions[].
This is not necessarily the case: this function might also be called
when the join prefix is in join->best_positions[].
Follow the approach from best_access_path(), which calls this
function: pass the current join prefix as an argument,
"const POSITION *join_positions", and use that.
This is the 11.0 part of the fix: in 11.0, get_costs_for_tables()
calls best_access_path() for all possible tables and, for each call,
saves a POSITION object with the access method together with a
"loose_scan_pos" POSITION object.
The latter is saved even if there is no possible LooseScan plan.
Saving is done by copying POSITION objects, which may generate a
spurious UBSan error.
EXPLAIN EXTENDED should always print the field item used in the left part
of an equality expression from the SET clause of an update statement as a
reference to a table column.
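For example (hypothetical table):

  EXPLAIN EXTENDED UPDATE t1 SET a = a + 1 WHERE a > 0;
  SHOW WARNINGS;  -- the rewritten statement must print the left-hand
                  -- side as a column reference, e.g. `test`.`t1`.`a`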
Approved by Oleksandr Byelkin <sanja@mariadb.com>
The problem was a wrong assert. I changed it to match the code in
best_access_path().
The given test case was a bit tricky for the optimizer, which first
decided on using an index scan (because of FORCE INDEX), but then
test_if_skip_sort_order() decided to use a range scan anyway to
handle DISTINCT.
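A sketch of the query shape involved (hypothetical table):

  CREATE TABLE t1 (pk INT PRIMARY KEY, a INT, KEY(a));
  SELECT DISTINCT a FROM t1 FORCE INDEX (a) WHERE a BETWEEN 1 AND 10;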
This was caused by two minor issues:
- get_quick_record_count() returned the number of rows for the range
  with the least cost, when it should have returned the minimum
  number of rows for any range.
- When changing REF to RANGE, we also changed records_out, which
  should not be done (the number of records in the result will not
  change).
The above change can cause a small change in row estimates where the
optimizer chooses a clustered key with more rows over a range on a
secondary key (an unlikely case).
The problem was that when JOIN_TAB::remove_duplicates() noticed that
there could be only one possible row in the output, it adjusted the
limit but didn't take any possible offset into account.
Fixed by not adjusting the limit's offset when setting the one-row
limit.
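An illustrative case (hypothetical table t1 with at least one row):

  -- Duplicate removal leaves at most one row, but the offset must
  -- still be applied:
  SELECT DISTINCT 1 FROM t1 LIMIT 1 OFFSET 1;  -- must return no rows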
When a query does implicit grouping and the join operation produces
an empty result set, a NULL-complemented row combination is
generated. However, constant table fields still show non-NULL values.
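An illustrative case (hypothetical tables; tc becomes a constant
table):

  CREATE TABLE t1 (a INT);             -- left empty
  CREATE TABLE tc (b INT);
  INSERT INTO tc VALUES (1);
  SELECT MAX(t1.a), tc.b FROM t1, tc;  -- expected: NULL, NULL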
What happens is that end_send_group() is called with a const row but
without any rows matching the WHERE clause. (This last part is shown
by 'join->first_record' not being set.)
This causes item->no_rows_in_result() to be called for all items, to
reset all sum functions to their initial state. However, fields are
not set to NULL.
The fix used here is to produce NULL-complemented records for
constant tables as well. Also, the constant table's records are reset
afterwards in case we are in a subquery which may get re-executed.
An alternative fix would have been to make item->no_rows_in_result()
also work with Item_field objects.
There are some other issues with the code:
- join->no_rows_in_result_called is used but never set.
- Tables that are used with group functions are not properly marked
  as maybe_null, which is required if the table rows should be
  regarded as NULL-complemented (not existing).
- The code that tries to detect whether mixed_implicit_grouping
  should be set didn't take into account all usages of fields and sum
  functions.
- Item_func::restore_to_before_no_rows_in_result() called the wrong
  function.
- join->clear() did not use a table_map argument to clear_tables(),
  which caused it to ignore constant tables.
- unclear_tables() did not correctly restore the status to what it
  was before clear_tables().
The main bug fix was to always use a table_map argument to clear_tables() and
always use join->clear() and clear_tables() together with unclear_tables().
Other fixes:
- Fixed Item_func::restore_to_before_no_rows_in_result()
- Set 'join->no_rows_in_result_called' when no_rows_in_result_set()
is called.
- Removed an unused argument from setup_end_select_func().
- More code comments
- Ensure that end_send_group() modifies the same fields as are in the
result set.
- Changed return_zero_rows() to use pointers instead of references,
similar to the rest of the code.
Reviewer: Sergei Petrunia <sergey@mariadb.com>
The problem was trying to access JOIN_TAB::select, which is set to
NULL when filesort is used. The correct way is to access either
JOIN_TAB::select or JOIN_TAB::filesort->select, depending on whether
filesort is used.
This commit introduces the member function JOIN_TAB::get_sql_select(),
which encapsulates that check so that the code duplication is
eliminated.
The new condition (s->table->quick_keys.is_set(best_key->key))
was added to best_access_path() to eliminate a Valgrind error.
The cause of that error was using TRASH_ALLOC(quick_key_parts)
instead of bzero(quick_key_parts); hence, accessing
s->table->quick_key_parts[best_key->key] without prior checking of
quick_keys.is_set() might have caused reading "dirty" memory.
The code in create_internal_tmp_table() didn't take into account
that temporary (derived) tables may now have multiple indexes:
- one index due to duplicate removal
  = in this case created by the conversion of a big IN (...) into a
    subquery
  = this index might be converted into a "unique constraint" if the
    key length is too large.
- one index added by the derived_with_keys optimization.
Make create_internal_tmp_table() handle multiple indexes (a statement
shape triggering this is sketched below).
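A sketch of a statement that can end up with both indexes on the
internal tmp table (the threshold setting forces the big-IN
conversion; table and values are illustrative):

  SET SESSION in_predicate_conversion_threshold= 2;
  SELECT * FROM t1 WHERE t1.a IN (1, 2, 3, 4);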
Before this patch, the use of a unique constraint was indicated in
TABLE_SHARE::uniques. This was OK as the unique constraint was the
only index in the table. Now that this is no longer the case,
TABLE_SHARE::uniques is removed and replaced with an in-memory-only
flag, HA_UNIQUE_HASH.
This patch is based on Monty's patch.
Co-Author: Monty <monty@mariadb.org>