This commit updates default memory allocations size used with MEM_ROOT
objects to minimize the number of calls to malloc().
Changes:
- Updated MEM_ROOT block sizes in sql_const.h
- Updated MALLOC_OVERHEAD to also take into account the extra memory
allocated by my_malloc()
- Updated init_alloc_root() to only take MALLOC_OVERHEAD into account as
buffer size, not MALLOC_OVERHEAD + sizeof(USED_MEM).
- Reset mem_root->first_block_usage if and only if first block was used.
- Increase MEM_ROOT buffers sized used by my_load_defaults, plugin_init,
Create_tmp_table, allocate_table_share, TABLE and TABLE_SHARE.
This decreases number of malloc calls during queries.
- Use a small buffer for THD->main_mem_root in THD::THD. This avoids
multiple malloc() call for new connections.
I tried the above changes on a complex select query with 12 tables.
The following shows the number of extra allocations that where used
to increase the size of the MEM_ROOT buffers.
Original code:
- Connection to MariaDB: 9 allocations
- First query run: 146 allocations
- Second query run: 24 allocations
Max memory allocated for thd when using with heap table: 61,262,408
Max memory allocated for thd when using Aria tmp table: 419,464
After changes:
Connection to MariaDB: 0 allocations
- First run: 25 allocations
- Second run: 7 allocations
Max memory allocated for thd when using with heap table: 61,347,424
Max memory allocated for thd when using Aria table: 529,168
The new code uses slightly more memory, but avoids memory fragmentation
and is slightly faster thanks to much fewer calls to malloc().
Reviewed-by: Sergei Golubchik <serg@mariadb.org>
Heap tables are allocated blocks to store rows according to
my_default_record_cache (mapped to the server global variable
read_buffer_size).
This causes performance issues when the record length is big
(> 1000 bytes) and the my_default_record_cache is small.
Changed to instead split the default heap allocation to 1/16 of the
allowed space and not use my_default_record_cache anymore when creating
the heap. The allocation is also aligned to be just under a power of 2.
For some test that I have been running, which was using record length=633,
the speed of the query doubled thanks to this change.
Other things:
- Fixed calculation of max_records passed to hp_create() to take
into account padding between records.
- Updated calculation of memory needed by heap tables. Before we
did not take into account internal structures needed to access rows.
- Changed block sized for memory_table from 1 to 16384 to get less
fragmentation. This also avoids a problem where we need 1K
to manage index and row storage which was not counted for before.
- Moved heap memory usage to a separate test for 32 bit.
- Allocate all data blocks in heap in powers of 2. Change reported
memory usage for heap to reflect this.
Reviewed-by: Sergei Golubchik <serg@mariadb.org>
MDEV-32329 (patch) pushdown from having into where: Server crashes at sub_select
When generating an Item_equal with a Item_ref that refers to a field
outside of a subselect, remove_item_direct_ref() causes the dependency
(depended_from) on the outer select to be lost, which causes trouble
for code downstream that can no longer determine the scope of the Item.
Not calling remove_item_direct_ref() retains the Item's dependency.
Test cases from MDEV-32395 and MDEV-32329 are included.
Some fixes from other developers:
Monty:
- Fixed wrong code in Item_equal::create_pushable_equalities()
that could cause wrong item to be used if there was no matching items.
Daniel Black:
- Added test cases from MDEV-32329
Igor Babaev:
- Provided fix for removing call to remove_item_direct_ref() in
eliminate_item_equal()
MDEV-32395: update_depend_map_for_order: SEGV at /mariadb-11.3.0/sql/sql_select.cc:16583
Include test cases from MDEV-32329.
Partial commit of the greater MDEV-34348 scope.
MDEV-34348: MariaDB is violating clang-16 -Wcast-function-type-strict
The functions queue_compare, qsort2_cmp, and qsort_cmp2
all had similar interfaces, and were used interchangable
and unsafely cast to one another.
This patch consolidates the functions all into the
qsort_cmp2 interface.
Reviewed By:
============
Marko Mäkelä <marko.makela@mariadb.com>
The code in best_access_path() uses PREV_BITS(uint, N) to
compute a bitmap of all keyparts: {keypart0, ... keypart{N-1}).
The problem is that PREV_BITS($type, N) macro code can't handle the case
when N=<number of bits in $type).
Also, why use PREV_BITS(uint, ...) for key part map computations when
we could have used PREV_BITS(key_part_map) ?
Fixed both:
- Change PREV_BITS(type, N) to handle any N in [0; n_bits(type)].
- Change PREV_BITS() to use key_part_map when computing key_part_map bitmaps.
Assertion failure has happened due to this scenario:
A query was ran with optimizer_join_limit_pref_ratio=1.
The query had "ORDER BY t1.col LIMIT N".
The optimizer set join->limit_shortcut_applicable=1.
Then, table t1 was marked as constant.
The code in choose_query_plan() still set join->limit_optimization_mode=1
which caused the optimizer to only consider t1 as the first non-const table.
But t1 was already put into the join prefix as the constant table.
The optimizer couldn't produce any join order at all and crashed.
Fixed by not searching for shortcut plan if ORDER BY table is a constant.
We will not try to do sorting anyway in this case (and LIMIT short-cutting
will be done for any join order).
(Variant 2: only allow rewrite for ref(const))
make_join_select() has a "ref_to_range" rewrite: it would rewrite
any ref access to a range access on the same index if the latter uses
more keyparts.
It seems, he initial intent of this was to fix poor query plan choice
in cases like
t.keypart1=const AND t.keypart2 < 'foo'
Due to deficiency in cost model, ref access could be picked while range
would enumerate fewer rows and be cheaper.
However, the condition also forces a rewrite in cases like:
t.keypart1=prev_table.col AND t.keypart1<='foo' AND t.keypart2<'bar'
Here, it can be that
* keypart1=prev_table.col is highly selective
* (keypart1, keypart2) <= ('foo', 'bar') is not at all selective.
Still, the rewrite would be made and poor query plan chosen.
Fixed this by only doing the rewrite if ref access was ref(const)
so we can be certain that quick select also used these restrictions
and will scan a subset of rows that ref access would scan.
Pre-11.0 variant:
1. In recompute_join_cost_with_limit(), add an assertion that that
partial_join_cost >= 0.0.
2. best_extension_by_limited_search() subtracts COST_EPS from
join->best_read. But it is not subtracted from
join->positions[0].read_time, add it back.
2. We could get very small negative partial_join_cost due to rounding
errors. For fraction=1.0, we were computing essentially this (denote
as EXPR-1):
$row_read_cost + $where_cost - ($row_read_cost + $where_cost)
which should compute to 0.
But the computation was done in the following order (left-to-right):
EXPR-2:
($row_read_cost + $where_cost) - $row_read_cost - $where_cost
this produced a value of -1.1102230246251565e-16 due to a rounding
error. Change the computation use EXPR-1 instead of EXPR-2.
optimize_straight_join and best_extension_by_limited_search()
use 0.001 to make choice between plans with identical cost deterministic.
Use COST_EPS instead of 0.001, like it's done in newer versions.
Stop skipping const items when selecting but skip them when storing
their results to spider row to avoid storing in mismatching temporary
table fields.
Skip auxiliary fields in SELECTing, and do not store
the (non-existing) results to the corresponding temporary table
accordingly.
When there are BOTH auxiliary fields AND const items in the auxiliary
field items, do not use the spider GBH. This is a rare occasion if it
happens at all and not worth the added complexity to cover it.
Use the original item (item_ptr) in constructing GROUP BY and ORDER
BY, which also means using item->name instead of field->field_name as
aliases in constructing SELECT items. This fixes spurious regressions
caused by the above changes in some tests using ORDER BY, such as
mdev_24517.test. As a by-product, this also fixes MDEV-29546.
Therefore we update mdev_29008.test to include the MDEV-29546 case.
Field_blob::store() has special code for GROUP_CONCAT temporary table
(to store blob values in Blob_mem_storage - this prevents them
from being freed/overwritten when a next row is read).
Field_geom and Field_blob_compressed inherit from Field_blob but they
have their own ::store() method without this special Blob_mem_storage
support.
Considering that non-grouping CONCAT() of such fields converts
them to plain BLOB, let's do the same for GROUP_CONCAT. To do it,
Item_func_group_concat::setup will signal that it's creating
a temporary table for GROUP_CONCAT, and Field_blog::make_new_field()
override will create base Field_blob when under group concat.
Search conditions were evaluated using val_int(), which was wrong.
Fixing the code to use val_bool() instead.
Details:
- Adding a new item_base_t::IS_COND flag which marks Items used
as <search condition> in WHERE, HAVING, JOIN ON, CASE WHEN clauses.
The flag is at the parse time.
These expressions must be evaluated using val_bool() rather than val_int().
Note, the optimizer creates more Items which are used as search conditions.
Most of these items are not marked with IS_COND yet. This is OK for now,
but eventually these Items can also be fixed to have the flag.
- Adding a method Item::is_cond() which tests if the Item has the IS_COND flag.
- Implementing Item_cache_bool. It evaluates the cached expression using
val_bool() rather than val_int().
Overriding Type_handler_bool::Item_get_cache() to create Item_cache_bool.
- Implementing Item::save_bool_in_field(). It uses val_bool() rather than
val_int() to evaluate the expression.
- Implementing Type_handler_bool::Item_save_in_field()
using Item::save_bool_in_field().
- Fixing all Item_bool_func descendants to implement a virtual val_bool()
rather than a virtual val_int().
- To find places where val_int() should be fixed to val_bool(), a few
DBUG_ASSERT(!is_cond()) where added into val_int() implementations
of selected (most frequent) classes:
Item_field
Item_str_func
Item_datefunc
Item_timefunc
Item_datetimefunc
Item_cache_bool
Item_bool_func
Item_func_hybrid_field_type
Item_basic_constant descendants
- Fixing all places where DBUG_ASSERT() happened during an "mtr" run
to use val_bool() instead of val_int().
A call to
dbug_print_join_prefix(join_positions, idx, s)
returns a const char* ponter to string with current join prefix,
including the table being added to it.
(Variant 4, with @@optimizer_adjust_secondary_key_costs, reuse in two
places, and conditions are replaced with equivalent simpler forms in two more)
In best_access_path(), ReuseRangeEstimateForRef-3, the check
for whether
"all used key_part_i used key_part_i=const"
was incorrect: it may produced a "NO" answer for cases when we
had:
key_part1= const // some key parts are usable
key_part2= value_not_in_join_prefix //present but unusable
key_part3= non_const_value // unusable due to gap in key parts.
This caused the optimizer to fail to apply ReuseRangeEstimateForRef
heuristics. The consequence is poor query plan choice when the index
in question has very skewed data distribution.
The fix is enabled if its @@optimizer_adjust_secondary_key_costs flag
is set.
The memory leak happened on second execution of a prepared statement
that runs UPDATE statement with correlated subquery in right hand side of
the SET clause. In this case, invocation of the method
table->stat_records()
could return the zero value that results in going into the 'if' branch
that handles impossible where condition. The issue is that this condition
branch missed saving of leaf tables that has to be performed as first
condition optimization activity. Later the PS statement memory root
is marked as read only on finishing first time execution of the prepared
statement. Next time the same statement is executed it hits the assertion
on attempt to allocate a memory on the PS memory root marked as read only.
This memory allocation takes place by the sequence of the following
invocations:
Prepared_statement::execute
mysql_execute_command
Sql_cmd_dml::execute
Sql_cmd_update::execute_inner
Sql_cmd_update::update_single_table
st_select_lex::save_leaf_tables
List<TABLE_LIST>::push_back
To fix the issue, add the flag SELECT_LEX::leaf_tables_saved to control
whether the method SELECT_LEX::save_leaf_tables() has to be called or
it has been already invoked and no more invocation required.
Similar issue could take place on running the DELETE statement with
the LIMIT clause in PS/SP mode. The reason of memory leak is the same as for
UPDATE case and be fixed in the same way.
(Variant 2b: call greedy_search() twice, correct handling for limited
search_depth)
Modify the join optimizer to specifically try to produce join orders that
can short-cut their execution for ORDER BY..LIMIT clause.
The optimization is controlled by @@optimizer_join_limit_pref_ratio.
Default value 0 means don't construct short-cutting join orders.
Other value means construct short-cutting join order, and prefer it only
if it promises speedup of more than #value times.
In Optimizer Trace, look for these names:
* join_limit_shortcut_is_applicable
* join_limit_shortcut_plan_search
* join_limit_shortcut_choice
Discovered this while working on MDEV-34720: test_if_cheaper_ordering()
uses rec_per_key, while the original estimate for the access method
is produced in best_access_path() by using actual_rec_per_key().
Make test_if_cheaper_ordering() also use actual_rec_per_key().
Also make several getter function "const" to make this compile.
Also adjusted the testcase to handle this (the change backported from
11.0)
Before this patch the crash occured when a single row dataset is used and
Item::remove_eq_conds() is called for HAVING. This function is not supposed
to be called after the elimination of multiple equalities.
To fix this problem instead of Item::remove_eq_conds() Item::val_int() is
used. In this case the optimizer tries to evaluate the condition for the
single row dataset and discovers impossible HAVING immediately. So, the
execution phase is skipped.
Approved by Igor Babaev <igor@maridb.com>
New runtime diagnostic introduced with MDEV-34490 has detected
that `Item_int_with_ref` incorrectly returns an instance of its ancestor
class `Item_int`. This commit fixes that.
In addition, this commit reverts a part of the diagnostic related
to `clone_item()` checks. As it turned out, `clone_item()` is not required
to return an object of the same class as the cloned one. For example,
look at `Item_param::clone_item()`: it can return objects of `Item_null`,
`Item_int`, `Item_string`, etc, depending on the object state.
So the runtime type diagnostic is not applicable to `clone_item()` and
is disabled with this commit.
As the similar diagnostic failures are expected to appear again
in the future, this commit introduces a new test file in the main suite:
item_types.test, and new test cases may be added to this file
Reviewer: Oleksandr Byelkin <sanja@mariadb.com>
The `Item` class methods `get_copy()`, `build_clone()`, and `clone_item()`
face an issue where they may be defined in a descendant class
(e.g., `Item_func`) but not in a further descendant (e.g., `Item_func_child`).
This can lead to scenarios where `build_clone()`, when operating on an
instance of `Item_func_child` with a pointer to the base class (`Item`),
returns an instance of `Item_func` instead of `Item_func_child`.
Since this limitation cannot be resolved at compile time, this commit
introduces runtime type checks for the copy/clone operations.
A debug assertion will now trigger in case of a type mismatch.
`get_copy()`, `build_clone()`, and `clone_item()` are no more virtual,
but virtual `do_get_copy()`, `do_build_clone()`, and `do_clone_item()`
are added to the protected section of the class `Item`.
Additionally, const qualifiers have been added to certain methods
to enhance code reliability.
Reviewer: Oleksandr Byelkin <sanja@mariadb.com>
This commits adds the "materialization" block to the output of
EXPLAIN/ANALYZE FORMAT=JSON when materialized subqueries are involved
into processing. In the case of ANALYZE additional runtime information
is displayed, such as:
- chosen strategy of materialization
- number of partial match/index lookup loops
- sizes of partial match buffers
Improve performance of queries like
SELECT * FROM t1 WHERE field = NAME_CONST('a', 4);
by, in this example, replacing the WHERE clause with field = 4
in the case of ref access.
The rewrite is done during fix_fields and we disambiguate this
case from other cases of NAME_CONST by inspecting where we are
in parsing. We rely on THD::where to accomplish this. To
improve performance there, we change the type of THD::where to
be an enumeration, so we can avoid string comparisons during
Item_name_const::fix_fields. Consequently, this patch also
changes all usages of THD::where to conform likewise.
The optimizer deals with Rowid Filters this way:
1. First, range optimizer is invoked. It saves information
about all potential range accesses.
2. A query plan is chosen. Suppose, it uses a Rowid Filter on
index $IDX.
3. JOIN::make_range_rowid_filters() calls the range optimizer
again to create a quick select on index $IDX which will be used
to populate the rowid filter.
The problem: KILL command catches the query in step #3. Quick
Select is not created which causes a crash.
Fixed by checking if query was killed. Note: the problem also
affects 10.6, even if error handling for
SQL_SELECT::test_quick_select is different there.
(Variant for 10.6: return error code from SQL_SELECT::test_quick_select)
The optimizer deals with Rowid Filters this way:
1. First, range optimizer is invoked. It saves information
about all potential range accesses.
2. A query plan is chosen. Suppose, it uses a Rowid Filter on
index $IDX.
3. JOIN::make_range_rowid_filters() calls the range optimizer
again to create a quick select on index $IDX which will be used
to populate the rowid filter.
The problem: KILL command catches the query in step #3. Quick
Select is not created which causes a crash.
Fixed by checking if query was killed.
Rowid Filter cannot be used with reverse-ordered scans, for the
same reason as IndexConditionPushdown cannot be.
test_if_skip_sort_order() already has logic to disable ICP when
setting up a reverse-ordered scan. Added logic to also disable
Rowid Filter in this case, factored out the code into
prepare_for_reverse_ordered_access(), and added a comment describing
the cause of this limitation.
A memory leak happens on the second execution of a query that run in PS mode
and uses the function ROWNUM().
A memory leak took place on allocation of an instance of the class Item_int
for storing a limit value that is performed at the function set_limit_for_unit
indirectly called from JOIN::optimize_inner. Typical trace to the place where
the memory leak occurred is below:
JOIN::optimize_inner
optimize_rownum
process_direct_rownum_comparison
set_limit_for_unit
new (thd->mem_root) Item_int(thd, lim, MAX_BIGINT_WIDTH);
To fix this memory leak, calling of the function optimize_rownum()
has to be performed only once on first execution and never called
after that. To control it, the new data member
first_rownum_optimization
added into the structure st_select_lex.