The NULL-aware index statistics fix is now controlled by the
FIX_INDEX_STATS_FOR_ALL_NULLS flag and is disabled by default
to preserve execution plan stability in stable versions.
To enable:
SET @@new_mode = 'FIX_INDEX_STATS_FOR_ALL_NULLS';
Or via command line:
--new-mode=FIX_INDEX_STATS_FOR_ALL_NULLS
Or in configuration file:
[mysqld]
new_mode=FIX_INDEX_STATS_FOR_ALL_NULLS
The `all_nulls_key_parts` bitmap is now calculated in set_statistics_for_table().
Variant 2: "Don't call optimize_stage2 too early"
The affected query had a Split-Materialized derived table inside
another derived table:
select * -- select#1
from (
select * -- select#2
from t1,
(select * from t2 ... group by t2.group_col) DT -- select#3
where t1.col=DT.group_col
) TBL;
The optimization went as follows:
JOIN::optimize() calls select_lex->handle_derived(DT_OPTIMIZE) which
calls JOIN::optimize() for all (direct and indirect) children SELECTs.
select#1->handle_derived() calls JOIN::optimize() for select#2 and #3;
select#2->handle_derived() calls JOIN::optimize() for select#3 a second
time. That call would invoke JOIN::optimize_stage2(), assuming the query
plan choice had been finalized for select#3.
But after that, JOIN::optimize() for select#2 would continue and consider
Split-Materialized for select#3. This would attempt to pick another join
order and cause a crash.
The fix is to have JOIN::optimize() never call optimize_stage2().
Finalizing the query plan choice is now done by a separate call:
select_lex->handle_derived(thd->lex, DT_OPTIMIZE_STAGE2)
which invokes JOIN::optimize_stage2() and saves the query plan.
When all values in an indexed column are NULL, EITS statistics show
avg_frequency == 0. This commit adds logic to distinguish between
"no statistics available" and "all values are NULL" scenarios.
For NULL-rejecting conditions (e.g., t1.col = t2.col), when statistics
confirm all indexed values are NULL, the optimizer can now return a
very low cardinality estimate (1.0) instead of unknown (0.0), since
NULL = NULL never matches.
For non-NULL-rejecting conditions (e.g., t1.col <=> t2.col),
normal cardinality estimation continues to apply since matches are possible.
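As an illustration, a minimal sketch of the two scenarios (table
names and data are made up, not taken from the original tests):

create table t1 (a int, key(a));
create table t2 (a int);
insert into t1 values (null),(null),(null);
insert into t2 values (null),(1);
analyze table t1 persistent for all;  -- collect EITS statistics
-- NULL-rejecting condition: all indexed values are NULL, so the
-- low estimate (1.0) applies since NULL = NULL never matches
select * from t1, t2 where t1.a = t2.a;
-- non-NULL-rejecting condition: matches are possible, so normal
-- cardinality estimation applies
select * from t1, t2 where t1.a <=> t2.a;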
Changes:
- Added KEY::rec_per_key_null_aware() to check nulls_ratio from column
statistics when avg_frequency is 0
- Modified best_access_path() in sql_select.cc to use the new
rec_per_key_null_aware() method for ref access cost estimation
- The optimization works with single-column and composite indexes,
checking each key part's NULL-rejecting status via notnull_part bitmap
A wrong result is produced when the Split-Materialized optimization
is used for grouping with ORDER BY and LIMIT.
The fix is to not let the Split-Materialized optimization happen
when the subquery has an ORDER BY with LIMIT, by returning FALSE
early in check_for_splittable_materialized() in opt_split.cc.
However, with just the above change, there is a side effect of
NOT "using index for group by" in the scenario when
all the following conditions are met:
1. The query has a derived table with GROUP BY and ORDER BY LIMIT
2. It is joined in a way that would allow Split-Materialized
   if the ORDER BY LIMIT weren't present
3. An index suitable for "using index for group-by" exists
4. There is no WHERE clause, so that "using index for group by"
   is applicable, but the index is not included in "possible_keys"
The reason is that join_tab's "keys" field wasn't being set in
make_join_select() in sql_select.cc, so this change is made as
well as part of this PR.
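A sketch of the affected query shape (identifiers are illustrative,
not from the original test case):

create table t1 (a int);
create table t2 (grp_col int, val int, key(grp_col, val));
select *
from t1,
     (select grp_col, max(val) as mx
      from t2
      group by grp_col
      order by mx desc
      limit 3) dt
where t1.a = dt.grp_col;

With the fix, Split-Materialized is not considered for dt because of
the ORDER BY ... LIMIT, and the index (grp_col, val) remains usable
for "using index for group-by".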
The "FETCH FIRST n ROWS WITH TIES" was not enforced when the SELECT used
"using index for group-by".
This was caused by an optimization which removed the ORDER BY clause
when the GROUP BY clause prescribed a compatible ordering.
Other GROUP BY strategies used workarounds to still handle WITH TIES,
see comment to using_with_ties_and_group_min_max() in this patch for
details. QUICK_GROUP_MIN_MAX_SELECT didn't have a workaround.
Fix this by disabling removal of ORDER BY when
QUICK_GROUP_MIN_MAX_SELECT is used.
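A sketch of the affected query shape (assumed, not the original
test case):

create table t1 (a int, b int, key(a,b));
insert into t1 values (1,1),(1,2),(2,1);
-- ORDER BY a is compatible with the GROUP BY ordering and used to
-- be removed; with "using index for group-by"
-- (QUICK_GROUP_MIN_MAX_SELECT) this broke WITH TIES
select a, b from t1
group by a, b
order by a
fetch first 1 rows with ties;
-- expected: both (1,1) and (1,2), since they tie on a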
If the call to mysql_handle_single_derived(..., DT_OPTIMIZE) in
JOIN::optimize_inner() returns 1, then JOIN::optimize_inner() will
also return 1 but will not set JOIN::error.
Sql_cmd_dml::execute_inner() would note this error but return
0 (JOIN::error) to its caller, Sql_cmd_update|delete::execute_inner().
The caller will try to print the query plan and hit an assertion
due to the query plan not being present.
Queries affected by this bug involve EXPLAIN FORMAT=JSON with a
DML statement containing a derived table and some manner of
execution abort, such as a timeout or kill.
Fixed by setting JOIN::error.
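A sketch of the affected statement shape (assumed; a timeout stands
in for the kill):

create table t1 (a int);
create table t2 (a int);
set statement max_statement_time=0.001 for
explain format=json
delete from t1
where a in (select dt.a from (select a from t2 group by a) dt);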
Approved by: Sergei Petrunia (sergey@mariadb.com)
A nested select query crashes when optimizer_join_limit_pref_ratio=10
and optimizer_search_depth=1, due to an assertion failure in
JOIN::dbug_verify_sj_inner_tables().
In choose_plan() in sql_select.cc, there are two back-to-back calls
to greedy_search(). The first one is invoked to build a join plan
that can short-cut ORDER BY ... LIMIT, while the second invocation
does not consider the short-cut.
greedy_search() should start with join->cur_sj_inner_tables set
to 0. However, the first greedy_search() call left the value of
join->cur_sj_inner_tables at 6. This caused the assert in
dbug_verify_sj_inner_tables(), which expects a value of 0, to fail
as soon as the second greedy_search() started.
A similar problem is seen with cur_embedding_map in the case of
nested joins, and with the nested_join counters.
Introduced init_join_plan_search_state(), which is called at the
start of greedy_search() and optimize_straight_join() and does all
the needed initialization by:
1. setting cur_sj_inner_tables and cur_embedding_map to 0
2. invoking reset_nj_counters().
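A sketch of the settings and query shape involved (illustrative,
not the original repro):

create table t1 (a int);
create table t2 (a int);
create table t3 (b int);
set optimizer_join_limit_pref_ratio=10, optimizer_search_depth=1;
-- the semi-join sets cur_sj_inner_tables during the first
-- greedy_search() call; the second call must start from 0
select t1.a from t1, t2
where t2.a = t1.a and t1.a in (select b from t3)
order by t1.a limit 5;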
- Removed duplicate words, like "the the" and "to to"
- Removed duplicate lines (one double sort line found in mysql.cc)
- Fixed some typos found while searching for duplicate words.
Command used to find duplicate words:
egrep -rI "\s([a-zA-Z]+)\s+\1\s" | grep -v param
Thanks to Artjoms Rimdjonoks for the command and pointing out the
spelling errors.
Parser changes made by Alexander Barkov <bar@mariadb.com>.
Part of the tests made by Iqbal Hassan <iqbal@hasprime.com>.
Initial marking with the ORA_JOIN flag was also done by
Iqbal Hassan <iqbal@hasprime.com>.
The main idea is that the parser marks fields followed by (+)
with a flag (ORA_JOIN).
During prepare, the flag is carried over to the newly created
items if needed.
Later, after the WHERE condition has been prepared (fix_fields()),
the relations between the tables are analyzed and the tables are
reordered so as to make a chain of JOIN/LEFT JOIN operators
equivalent to the query with the Oracle outer join operator (+).
Then the (+) flags are removed.
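For reference, the targeted query shape and its rewrite (a minimal
sketch):

create table t1 (a int);
create table t2 (a int, b int);
-- Oracle-style outer join operator:
select t1.a, t2.b from t1, t2 where t1.a = t2.a(+);
-- is treated as equivalent to:
select t1.a, t2.b from t1 left join t2 on t1.a = t2.a;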
main/statistics_json.result is updated for f8ba5ced55 (MDEV-36099)
The test uses 'delete from t1' in many places and then populates
the table again. The natural order of rows in a MyISAM table is well
defined and the test was implicitly relying on that.
Before f8ba5ced55, DELETE was deleting rows one by one using
ha_myisam::delete_row(), because the connection was stuck in RBR
mode. This caused rows to be shown in reverse insertion order
(because of the delete link list).
MDEV-36099 fixes this bug and the server now correctly uses
ha_myisam::delete_all_rows(). This makes rows be shown in
insertion order, as expected.
temporary table
A compressed field cannot be part of a key by its nature: there is
no plain data to order, only the compressed data.
For the optimizer temporary table we create an uncompressed
substitute.
In all other cases (MDEV-16808) we don't use a key: add_keyuse()
is skipped due to the !field->compression_method() condition.
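A sketch of the optimizer temporary table case (an assumed example):

create table t1 (a varchar(100) compressed);
insert into t1 values ('x'),('y'),('x');
-- the derived table is materialized into an optimizer temporary
-- table; the compressed column gets an uncompressed substitute so
-- that the temporary table key (used here for distinct) can be built
select * from (select distinct a from t1) dt;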
Also expand vcol field index coverings to include indexes covering
all the fields in the expression. The reasoning goes as follows:
let f(c1, c2, ..., cn) be a function applied to columns c1, c2,
..., cn; if f(...) is covered by an index, so should be a vcol vc
whose expression is f(...).
For example, suppose t.vf = t.c1 + t.c2, and t has three indexes:
(vf), (c1, c2), (c1).
Before this change, vf's index covering is a singleton {(vf)}. Let's call
that the "conventional" index covering.
After this change vf's index covering is now {(vf), (c1, c2)}, since
(c1, c2) covers both c1 and c2. Let's call (c1, c2) in this case the
"extra" covering.
With the coverings updated, when an index in the "extra" covering is
chosen for keyread, the vcol also needs to be calculated. In this case
we mark vcol in the table read_set, and ensure it is computed.
With these changes, we see various improvements, including going
from full table scan + filesort to full index scan + filesort when
ORDER BY uses an indexed vcol (here vc = c + 1 is a vcol and both
c and vc are indexed):
explain select c + 1 from t order by vc;
id select_type table type possible_keys key key_len ref rows Extra
-1 SIMPLE t ALL NULL NULL NULL NULL 10000 Using filesort
+1 SIMPLE t index NULL c 5 NULL 10000 Using index; Using filesort
The substitutions are followed by updates to all_fields, which
includes a copy of the ORDER BY/GROUP BY item pointers, as well as
corresponding updates to ref_pointer_array so that all_fields and
ref_pointer_array remain in sync.
Another, related change is the recomputation of table index
covering on substitutions. It not only reflects the correct table
index covering after the substitutions, but also improves
executions where the vcol index can be chosen, such as in this
example (here vc = c + 1 and vc is the only index in the table),
going from full table scan + filesort to full index scan:
select vc from t order by c + 1;
We do it in SELECT as well as in single table DELETE/UPDATE.
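For reference, a table shape matching the first example above (a
sketch; names assumed):

create table t (
  c int,
  vc int as (c + 1) virtual,
  key(c), key(vc)
);
-- full index scan on c instead of table scan + filesort:
explain select c + 1 from t order by vc;
-- the second example assumes a variant of t where (vc) is the
-- only index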
For all degenerate select queries having subqueries in them, the
rows_examined field in the slow query log is always set to 0.
The problem is that, although subqueries increment the
rows_examined field of the thd object correctly, the degenerate
outer select query resets rows_examined to zero after it has
finished execution, by invoking thd->set_examined_row_count(0).
The solution is to remove the thd->set_examined_row_count(0) call
for degenerate select queries.
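A sketch of a degenerate select with a subquery (illustrative):

create table t1 (a int);
insert into t1 values (1),(2),(3);
-- the outer select is degenerate (no tables), but the subquery
-- examines rows of t1; rows_examined must not be reset to 0
select (select count(*) from t1);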
The recursive nature of add_table_function_dependencies()
resolution meant that the detection of a stack overrun
would continue to recursively call itself.
It's quite possible that a user's SQL could get multiple
ER_STACK_OVERRUN_NEED_MORE errors.
Additionally, the result of the stack overrun check
was incorrectly assigned to a table_map result.
It is only because of the "if error" check after
add_table_function_dependencies() is called, which detected
the stack overrun error, that a potentially corrupted
table_map was prevented from being processed.
Corrected add_table_function_dependencies() to stop and
return on detection of a stack overrun error.
The add_extra_deps() call likewise returns true on a stack overrun.
A wrong assertion was added by f1f9284181 (MDEV-34046) because a
PS parameter is applicable not only to DELETE HISTORY.
Keeping the value of select_lex->where for DELETE HISTORY was
reworked via prep_where, which is read by
reinit_stmt_before_use(). For SELECT, prep_where is set in
JOIN::optimize_inner(), which is not called for DELETE.
Implements and tests the optimizer hints DERIVED_CONDITION_PUSHDOWN
and NO_DERIVED_CONDITION_PUSHDOWN, table-level hints to enable and
disable, respectively, the condition pushdown for derived tables
which is typically controlled by the condition_pushdown_for_derived
optimizer switch.
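Example usage (a sketch, assuming the usual table-level hint
syntax):

create table t1 (a int);
select /*+ NO_DERIVED_CONDITION_PUSHDOWN(dt) */ dt.a
from (select a, count(*) as cnt from t1 group by a) dt
where dt.a > 10;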
Implements and tests the optimizer hints MERGE and NO_MERGE, table-level
hints to enable and disable, respectively, the derived_merge optimization
which is typically controlled by the derived_merge optimizer switch.
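Example usage (a sketch):

create table t1 (a int);
select /*+ NO_MERGE(dt) */ dt.a from (select a from t1) dt;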
Sometimes hints need to be fixed before TABLE instances are available, but
after TABLE_LIST instances have been created (as in the cases of MERGE and
NO_MERGE). This commit introduces a new function called
fix_hints_for_derived_table to allow early hint fixing for derived tables,
using only a TABLE_LIST instance (so long as such hints are not index-level).
replication problems
DELETE HISTORY did not process parameterized PS properly, as the
history expression was checked at prepare stage, when the
parameters were not yet substituted. In that case check_units()
succeeded as there is no invalid type: Item_param has
type_handler_null, which is inherited from the string type and is
a valid type for a history expression. The warning was thrown when
the expression was evaluated for comparison at delete execution
(when the parameter was already substituted).
The fix postpones check_units() until the first PS execution. We
have to postpone WHERE condition processing until the first
execution and update select_lex.where on every execution, as it is
reset to its post-prepare state.
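A sketch of the affected statement shape:

create table t1 (a int) with system versioning;
prepare stmt from 'delete history from t1 before system_time ?';
set @ts = now() - interval 1 day;
execute stmt using @ts;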
When processing queries like
INSERT INTO t1 (..) SELECT .. FROM t1, t2 ...,
there is a single query block (i.e., a single SELECT_LEX) for both INSERT and
SELECT parts. During hints resolution, when hints are attached to particular
TABLE_LIST's, the search is performed by table name across the whole
query block.
So, if a table mentioned in an optimizer hint is present in the
INSERT part, the hint is attached to that table. This is obviously
wrong, as optimizer hints are supposed to affect only the SELECT
part of an INSERT..SELECT statement.
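A sketch of the problematic case (illustrative names; NO_BNL stands
in for any table-level hint):

create table t1 (a int);
create table t2 (a int);
-- t1 is present only in the INSERT part; the hint must not be
-- attached to it
insert into t1 (a)
select /*+ NO_BNL(t1) */ t2.a from t2;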
This commit disables attaching hints to tables in the INSERT part
and fixes some other bugs related to INSERT..SELECT statement
processing.
This commit:
- fixes a couple of bugs in check_join_cache_usage();
- separates a part of opt_hints.test to a new file opt_hints_join_cache.test;
- adds a batch of test cases run against different join_cache_level settings.
This commit implements optimizer hints that affect the order of
joining tables (see the sketch after the list):
- JOIN_FIXED_ORDER similar to existing STRAIGHT_JOIN hint;
- JOIN_ORDER to apply the specified table order;
- JOIN_PREFIX to hint what tables should be first in the join;
- JOIN_SUFFIX to hint what tables should be last in the join.
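A sketch of the syntax (illustrative):

create table t1 (a int);
create table t2 (a int);
select /*+ JOIN_ORDER(t2, t1) */ *
from t1, t2
where t1.a = t2.a;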
- remove get_args_printer() from hints printing
- add append_hint_arguments(THD *thd, opt_hints_enum hint, String *str)
- add more comments
- rename st_opt_hint_info::hint_name to hint_type
- add optimizer trace support for hints
- add dbug_print_hints()
- make print_warn() not be a template
- introduce the Printable_parser_rule interface, make grammar rules
  that emit warnings implement it (print_warn() invokes its function)
- remove Parser::Hint::append_args() as it is not used anywhere
  (it used to be necessary to call print_warn(... (Parser::Hint*)NULL))
The MAX_EXECUTION_TIME hint places a limit N (a timeout value in
milliseconds) on how long a statement is permitted to execute
before the server terminates it.
Syntax:
SELECT /*+ MAX_EXECUTION_TIME(milliseconds) */ ...
Only top-level SELECT statements support the hint.