Followup: remove this line from get_column_range_cardinality()
set_if_bigger(res, col_stats->get_avg_frequency());
and make sure it is only used with the binary histograms.
For JSON histograms, it makes the estimates unnecessarily imprecise.
Federated and Federatex cannot be used with ROR scans
Federated::position() and Federatex::position() is storing in 'ref' a
pointer into a local result set buffer. This means that one cannot
compare 'ref' from different handler instances to see if they point to the
same physical record.
This bug caused federated.federatedx to return wrong results when the
optimizer tried to use index_merge to resolve some queries.
Fixed by introducing table flag HA_NON_COMPARABLE_ROWID and using this
with the above handlers.
Todo:
- Fix multi_delete(), multi_update and read_records() to use primary key
instead of 'ref' if case HA_NON_COMPARABLE_ROWID is set. The current
code only works if we have only one range (like table scan) for the
tables that will be updated in the second pass.
- Enable DBUG_ASSERT() in ha_federated::cmp_ref() and
ha_federatedx::cmp_ref().
If when extracting a range condition for an index from the WHERE condition
Range Optimizer sees that the range condition covers the whole index then
such condition should be discarded because it cannot be used in any range
scan. In some cases Range Optimizer really does it, but there remained some
conditions for which it was not done. As a result the optimizer could
produce index merge plans with the full index scan for one of the indexes
participating in the index merge.
This could be observed in one of the test cases from index_merge1.inc
where a plan with index_merge_sort_union was produced and in the test case
reported for this bug where a plan with index_merge_sort_intersect was
produced. In both cases one of two index scans participating in index merge
ran over the whole index.
The patch slightly changes the original above mentioned test case from
index_merge1.inc to be able to produce an intended plan employing
index_merge_sort_union. The original query was left to show that index
merge is not used for it anymore.
It should be noted that for the plan with index_merge_sort_intersect could
be chosen for execution only due to a defect in the InnoDB code that
returns wrong estimates for the cardinality of big ranges.
This bug led to serious problems in 10.4+ where the optimization using
Rowid filters is employed (see mdev-26446).
Approved by Sergey Petrunia <sergey@mariadb.com>
https://jira.mariadb.org/browse/MDEV-26221
my_sys DYNAMIC_ARRAY and DYNAMIC_STRING inconsistancy
The DYNAMIC_STRING uses size_t for sizes, but DYNAMIC_ARRAY used uint.
This patch adjusts DYNAMIC_ARRAY to use size_t like DYNAMIC_STRING.
As the MY_DIR member number_of_files is copied from a DYNAMIC_ARRAY,
this is changed to be size_t.
As MY_TMPDIR members 'cur' and 'max' are copied from a DYNAMIC_ARRAY,
these are also changed to be size_t.
The lists of plugins and stored procedures use DYNAMIC_ARRAY,
but their APIs assume a size of 'uint'; these are unchanged.
in about a hundred of users of MY_BITMAP, only two were using its
built-in mutex, and only one of those two was actually needing it.
Remove the mutex from MY_BITMAP, remove all associated conditions
and checks in bitmap functions. Use an external LOCK_temp_pool
mutex and temp_pool_set_next/temp_pool_clear_bit acccessors.
Remove bitmap_init/bitmap_free, always use my_* versions.
Changes:
- To detect automatic strlen() I removed the methods in String that
uses 'const char *' without a length:
- String::append(const char*)
- Binary_string(const char *str)
- String(const char *str, CHARSET_INFO *cs)
- append_for_single_quote(const char *)
All usage of append(const char*) is changed to either use
String::append(char), String::append(const char*, size_t length) or
String::append(LEX_CSTRING)
- Added STRING_WITH_LEN() around constant string arguments to
String::append()
- Added overflow argument to escape_string_for_mysql() and
escape_quotes_for_mysql() instead of returning (size_t) -1 on overflow.
This was needed as most usage of the above functions never tested the
result for -1 and would have given wrong results or crashes in case
of overflows.
- Added Item_func_or_sum::func_name_cstring(), which returns LEX_CSTRING.
Changed all Item_func::func_name()'s to func_name_cstring()'s.
The old Item_func_or_sum::func_name() is now an inline function that
returns func_name_cstring().str.
- Changed Item::mode_name() and Item::func_name_ext() to return
LEX_CSTRING.
- Changed for some functions the name argument from const char * to
to const LEX_CSTRING &:
- Item::Item_func_fix_attributes()
- Item::check_type_...()
- Type_std_attributes::agg_item_collations()
- Type_std_attributes::agg_item_set_converter()
- Type_std_attributes::agg_arg_charsets...()
- Type_handler_hybrid_field_type::aggregate_for_result()
- Type_handler_geometry::check_type_geom_or_binary()
- Type_handler::Item_func_or_sum_illegal_param()
- Predicant_to_list_comparator::add_value_skip_null()
- Predicant_to_list_comparator::add_value()
- cmp_item_row::prepare_comparators()
- cmp_item_row::aggregate_row_elements_for_comparison()
- Cursor_ref::print_func()
- Removes String_space() as it was only used in one cases and that
could be simplified to not use String_space(), thanks to the fixed
my_vsnprintf().
- Added some const LEX_CSTRING's for common strings:
- NULL_clex_str, DATA_clex_str, INDEX_clex_str.
- Changed primary_key_name to a LEX_CSTRING
- Renamed String::set_quick() to String::set_buffer_if_not_allocated() to
clarify what the function really does.
- Rename of protocol function:
bool store(const char *from, CHARSET_INFO *cs) to
bool store_string_or_null(const char *from, CHARSET_INFO *cs).
This was done to both clarify the difference between this 'store' function
and also to make it easier to find unoptimal usage of store() calls.
- Added Protocol::store(const LEX_CSTRING*, CHARSET_INFO*)
- Changed some 'const char*' arrays to instead be of type LEX_CSTRING.
- class Item_func_units now used LEX_CSTRING for name.
Other things:
- Fixed a bug in mysql.cc:construct_prompt() where a wrong escape character
in the prompt would cause some part of the prompt to be duplicated.
- Fixed a lot of instances where the length of the argument to
append is known or easily obtain but was not used.
- Removed some not needed 'virtual' definition for functions that was
inherited from the parent. I added override to these.
- Fixed Ordered_key::print() to preallocate needed buffer. Old code could
case memory overruns.
- Simplified some loops when adding char * to a String with delimiters.
Handle "col<>const" in the same way that MDEV-21958 did for
"col NOT IN(const-list)": do not use the condition for range/index_merge
accesses if there is a unique UNIQUE KEY(col).
The testcase is in main/range.test. The rest of test updates are
due to widespread use of 'pk<>1' in the testsuite. Changed the test
to use different but equivalent forms of the conditions.
Note they key_or() may call tree_delete(), which will cause the weight
asserts to be checked. In order to avoid them from firing, update key1
tree's weight after we've changed key1->some_local_child->next_key_part.
Having done that, do we still need this at the function end:
/* Re-compute the result tree's weight. */
key1->update_weight_locally();
?
The problem was in and_all_keys(), the code of MDEV-9759 which calculates
the new tree weight:
First, it didn't take into account the case when
(next->next_key_part=tmp) == NULL
and dereferenced a NULL pointer when getting tmp->weight.
Second, "if (param->alloced_sel_args > SEL_ARG::MAX_SEL_ARGS) break"
could leave the loop with incorrect value of weight.
Fixed by introducing SEL_ARG::update_weight_locally() and calling it
at the end of the function. This allows to avoid caring about all the
above cases.
Also update the SEL_ARG graph weight in:
- sel_add()
- SEL_ARG::clone()
Make key_{and,or}_with_limit() to also verify weight for the arguments
(There is no single point to verify SEL_ARG graphs constructed from
conditions that are not AND-OR formulas, so we hope that those are
connected with AND/OR and do it here).
(Variant #5, full patch, for 10.5)
Do not produce SEL_ARG graphs that would yield huge numbers of ranges.
Introduce a concept of SEL_ARG graph's "weight". If we are about to
produce a graph whose "weight" exceeds the limit, remove the parts
of SEL_ARG graph that represent the biggest key parts. Do so until
the graph's is within the limit.
Includes
- debug code to verify SEL_ARG graph weight
- A user-visible @@optimizer_max_sel_arg_weight to control the optimization
- Logging the optimization into the optimizer trace.
Apply the patch based on the patch by Varun Gupta:
PARAM::is_ror_scan might be used unitialized when check_quick_select()
is invoked for a "degenerate" SEL_ARG tree (e.g. one having type
SEL_ARG::IMPOSSIBLE).
Make check_quick_select() always initialize PARAM::is_ror_scan.
The assertion failed in handler::ha_reset upon SELECT under
READ UNCOMMITTED from table with index on virtual column.
This was the debug-only failure, though the problem is mush wider:
* MY_BITMAP is a structure containing my_bitmap_map, the latter is a raw
bitmap.
* read_set, write_set and vcol_set of TABLE are the pointers to MY_BITMAP
* The rest of MY_BITMAPs are stored in TABLE and TABLE_SHARE
* The pointers to the stored MY_BITMAPs, like orig_read_set etc, and
sometimes all_set and tmp_set, are assigned to the pointers.
* Sometimes tmp_use_all_columns is used to substitute the raw bitmap
directly with all_set.bitmap
* Sometimes even bitmaps are directly modified, like in
TABLE::update_virtual_field(): bitmap_clear_all(&tmp_set) is called.
The last three bullets in the list, when used together (which is mostly
always) make the program flow cumbersome and impossible to follow,
notwithstanding the errors they cause, like this MDEV-17556, where tmp_set
pointer was assigned to read_set, write_set and vcol_set, then its bitmap
was substituted with all_set.bitmap by dbug_tmp_use_all_columns() call,
and then bitmap_clear_all(&tmp_set) was applied to all this.
To untangle this knot, the rule should be applied:
* Never substitute bitmaps! This patch is about this.
orig_*, all_set bitmaps are never substituted already.
This patch changes the following function prototypes:
* tmp_use_all_columns, dbug_tmp_use_all_columns
to accept MY_BITMAP** and to return MY_BITMAP * instead of my_bitmap_map*
* tmp_restore_column_map, dbug_tmp_restore_column_maps to accept
MY_BITMAP* instead of my_bitmap_map*
These functions now will substitute read_set/write_set/vcol_set directly,
and won't touch underlying bitmaps.
The assertion failed in handler::ha_reset upon SELECT under
READ UNCOMMITTED from table with index on virtual column.
This was the debug-only failure, though the problem is mush wider:
* MY_BITMAP is a structure containing my_bitmap_map, the latter is a raw
bitmap.
* read_set, write_set and vcol_set of TABLE are the pointers to MY_BITMAP
* The rest of MY_BITMAPs are stored in TABLE and TABLE_SHARE
* The pointers to the stored MY_BITMAPs, like orig_read_set etc, and
sometimes all_set and tmp_set, are assigned to the pointers.
* Sometimes tmp_use_all_columns is used to substitute the raw bitmap
directly with all_set.bitmap
* Sometimes even bitmaps are directly modified, like in
TABLE::update_virtual_field(): bitmap_clear_all(&tmp_set) is called.
The last three bullets in the list, when used together (which is mostly
always) make the program flow cumbersome and impossible to follow,
notwithstanding the errors they cause, like this MDEV-17556, where tmp_set
pointer was assigned to read_set, write_set and vcol_set, then its bitmap
was substituted with all_set.bitmap by dbug_tmp_use_all_columns() call,
and then bitmap_clear_all(&tmp_set) was applied to all this.
To untangle this knot, the rule should be applied:
* Never substitute bitmaps! This patch is about this.
orig_*, all_set bitmaps are never substituted already.
This patch changes the following function prototypes:
* tmp_use_all_columns, dbug_tmp_use_all_columns
to accept MY_BITMAP** and to return MY_BITMAP * instead of my_bitmap_map*
* tmp_restore_column_map, dbug_tmp_restore_column_maps to accept
MY_BITMAP* instead of my_bitmap_map*
These functions now will substitute read_set/write_set/vcol_set directly,
and won't touch underlying bitmaps.
Basic variant of the fix: do not consider conditions in form
unique_key NOT IN (c1,c2...)
to be sargable. If there are only a few constants, the condition
is not selective. If there are a lot constants, the overhead of
processing such a huge range list is not worth it.
(Backport to 10.2)
Basic variant of the fix: do not consider conditions in form
unique_key NOT IN (c1,c2...)
to be sargable. If there are only a few constants, the condition
is not selective. If there are a lot constants, the overhead of
processing such a huge range list is not worth it.
This bug could manifest itself for a query with WHERE condition containing
top level OR formula such that each conjunct contained a single-range
condition supported by the same index. One of these range conditions must
be fully covered by another range condition that is used later in the OR
formula. Additionally at least one of these condition should be ANDed with
a sargable range condition supported by a different index.
There were several attempts to fix related problems for OR conditions after
the backport of range optimizer code from MySQL (commit
0e19f3e36f). Unfortunately the first of these
fixes contained typo remained unnoticed until recently. This typo bug led
to rejection of valid range accesses. This patch fixed this typo bug.
The fix revealed another two bugs: one in a constructor for SEL_ARG,
the other in the function tree_or(). Both are fixed in this patch.
Part#1: Revert the patch that caused it:
commit 291be49474
Author: Igor Babaev <igor@askmonty.org>
Date: Thu Sep 24 22:02:00 2020 -0700
MDEV-23811: With large number of indexes optimizer chooses an inefficient plan