temporary table
A compressed field cannot be part of a key by its nature: there is no
plain data to order, only the compressed data.
For the optimizer's temporary table we create an uncompressed substitute.
In all other cases (MDEV-16808) we don't use a key: add_keyuse() is
skipped by the !field->compression_method() condition.
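A minimal sketch of the kind of query affected, assuming a hypothetical table with a COMPRESSED column whose DISTINCT is resolved through an internal temporary table (names are illustrative):

CREATE TABLE t1 (a INT, b TEXT COMPRESSED);
INSERT INTO t1 VALUES (1,'x'),(2,'x'),(3,'y');
-- DISTINCT on the compressed column goes through an internal temporary table;
-- an uncompressed substitute field is created there, since the compressed
-- field itself cannot be part of a key
SELECT DISTINCT b FROM t1;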
For all degenerate select queries that contain sub-queries,
the rows_examined field in the slow query log is always set to 0.
The problem is that, although the sub-queries increment the rows_examined
field of the thd object correctly, the degenerate outer select query resets
rows_examined to zero after it has finished execution by
invoking thd->set_examined_row_count(0).
The solution is to remove the thd->set_examined_row_count(0) call for
degenerate select queries.
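A minimal sketch of such a degenerate select, assuming a hypothetical table t1; the outer SELECT reads no tables itself, only the sub-query examines rows:

CREATE TABLE t1 (a INT);
INSERT INTO t1 VALUES (1),(2),(3);
-- the outer select is degenerate (no FROM clause); rows_examined in the
-- slow query log should reflect the rows read by the sub-query
SELECT (SELECT MAX(a) FROM t1) AS max_a;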
The recursive nature of add_table_function_dependencies
resolution meant that, once a stack overrun was detected, the
function would still continue to call itself recursively.
It is quite possible that a user's SQL could produce multiple
ER_STACK_OVERRUN_NEED_MORE errors.
Additionally, the result of the stack overrun check
was incorrectly assigned to a table_map result.
It is only because of the "if error" check after
add_table_function_dependencies is called, which detected
the stack overrun error, that a potentially corrupted
table_map was prevented from being processed.
Corrected add_table_function_dependencies to stop and
return as soon as a stack overrun error is detected.
The add_extra_deps call was also made to return true on a stack overrun.
A wrong assertion was added by f1f9284181 (MDEV-34046) because a PS
parameter is applicable not only to DELETE HISTORY.
Preserving the value of select_lex->where for DELETE HISTORY was reworked
via prep_where, which is read by reinit_stmt_before_use(). For SELECT,
prep_where is set in JOIN::optimize_inner(), and that is not called for
DELETE.
replication problems
DELETE HISTORY did not process parameterized PS properly, as the
history expression was checked at the prepare stage, when the parameters
were not yet substituted. In that case check_units() succeeded as there
is no invalid type: Item_param has type_handler_null, which is
inherited from the string type, and this is a valid type for a history
expression. The warning was thrown when the expression was evaluated
for comparison at delete execution (when the parameter was already
substituted).
The fix postpones check_units() until the first PS execution. We have
to postpone the WHERE condition processing until the first execution and
update select_lex.where on every execution, as it is reset to its
post-prepare state.
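A minimal sketch of the affected pattern, assuming a hypothetical system-versioned table t1; the placeholder is only substituted at EXECUTE time, after the prepare-stage check:

CREATE TABLE t1 (a INT) WITH SYSTEM VERSIONING;
PREPARE stmt FROM 'DELETE HISTORY FROM t1 BEFORE SYSTEM_TIME ?';
SET @ts = '2020-01-01 00:00:00';
-- the history expression can only be fully checked here, on first execution
EXECUTE stmt USING @ts;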
In non-EXPLAIN queries with subqueries, the trace was flooded
with empty "join_execution":{} nodes. Now, they are gone.
The "Range checked for each record" optimization still prints
content into trace on join execution. Now, we wrap it into
"range-checked-for-each-record" to delimit the invocations.
This new object has fields "select_id" which corresponds to
the outer query block, and the "loop" which corresponds to
the inner query block iteration number. Additionally,
the field "row_estimation" which itself is an object has
"table", and "range_analysis" fields that were moved
from the old "join_execution"'s steps array.
This bug is exposed by MDEV-30073, causing bogus warning messages to
be pushed by find_order_in_list(), but is otherwise benign.
An existing test case in show_explain.test (MDEV-238) can be used together
with an assert to find a query which exposes the issue.
  if (resolution == RESOLVED_BEHIND_ALIAS &&
      order_item->fix_fields_if_needed_for_order_by(thd, order->item))
    return TRUE;
  /* Lookup the current GROUP field in the FROM clause. */
  order_item_type= order_item->type();
+ DBUG_ASSERT(order_item_type == (*order->item)->type());
This assert will fail for the following query:
CREATE TABLE t2 ( a INT );
INSERT INTO t2 VALUES (1),(2),(1),(4),(2);
EXPLAIN SELECT alias.a FROM t2, ( SELECT * FROM t2 ) AS alias
GROUP BY alias.a;
This assert makes little sense after the patch.
Avoid ASAN failure by collecting statistics from Result objects
before cleaning them up. In related single-table cases, statistics
are maintained directly by the single-table update and delete
functions.
The problem is that a copy function was used in the field list but the
copy was never performed in this execution path.
So the copy should be performed before returning the result.
Protection against uninitialized copy usage was added.
When subquery with LEFT JOIN is converted into semi-join, it is possible
to construct cases where the LEFT JOIN's ON expression refers to a table
in the current select but not in the current join nest. For example:
t1 SEMI JOIN (
t2
LEFT JOIN (t3 LEFT JOIN t4 ON t4.col=t1.col) ON expr
)
here, the clause "ON t4.col=t1.col" has this property. Let's denote it as
ON-EXPR-HAS-REF-OUTSIDE-NEST.
The optimizer handles LEFT JOINs like so:
- Outer join runtime requires that "inner tables follow outer" in
any join order.
- Join optimizer enforces this by constructing join orders that follow
table dependencies as they are specified in TABLE_LIST::dep_tables.
- The dep_tables are set in simplify_joins() according to the contents
of ON expressions and LEFT JOIN structure.
However, the logic in simplify_joins() failed to account for possible
ON-EXPR-HAS-REF-OUTSIDE-NEST. It assumed that references outside of the
current join nest could only be OUTER_REF_TABLE_BIT or RAND_TABLE_BIT.
The fix was to add the missing logic.
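A hedged sketch of a query that could produce the structure above once the IN sub-query is converted to a semi-join (table and column names are illustrative):

SELECT * FROM t1
WHERE t1.a IN (SELECT t2.a
               FROM t2
               LEFT JOIN (t3 LEFT JOIN t4 ON t4.col = t1.col)
               ON t2.b = t3.b);
-- after semi-join conversion, "ON t4.col = t1.col" refers to t1, which is
-- in the same select but outside the current join nest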
In the `check_join_cache_usage()` function there is a branching issue
where an accidental fall-through to BKA/BKAH buffers may occur, even
when the join_cache_level setting does not permit their use.
This patch corrects the condition to ensure that BKA/BKAH join caching
is only enabled when explicitly allowed by join_cache_level.
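For illustration, a sketch of the setting involved (the level-to-algorithm mapping follows the documented join_cache_level values, where BKA/BKAH correspond to levels 5 and above):

SET SESSION join_cache_level = 4;  -- BNL/BNLH only; BKA/BKAH must not be used
SET SESSION join_cache_level = 6;  -- levels 5 and above permit BKA buffers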
Reviewer: Sergei Petrunia <sergey@mariadb.com>
(Variant 3) (commit in 11.4)
When a derived table has a GROUP BY clause:
SELECT ...
FROM (SELECT ... GROUP BY col1, col2) AS tbl
the optimizer would use the inner join's output cardinality as an estimate
of the derived table size, ignoring the fact that the GROUP BY operation
would produce far fewer groups.
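A sketch of the effect (table and column names are illustrative): the derived table below can contain at most one row per distinct (customer_id, order_date) pair, which may be far fewer rows than the inner join/scan produces:

SELECT *
FROM (SELECT customer_id, order_date, SUM(amount) AS total
      FROM orders
      GROUP BY customer_id, order_date) AS tbl
WHERE tbl.total > 100;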
Add code to produce tighter bounds:
- The GROUP BY list is split into per-table lists. If the GROUP BY list has
expressions that refer to multiple tables, we fall back to the join output
cardinality.
- For each table, the first cardinality estimate is join_tab->read_records.
- Then, we try to get a tighter bound by using index statistics.
- If indexes do not cover all GROUP BY columns, we try to use per-column
EITS statistics.
Backport of commit 74f70c3944 to 10.11.
The new logic is disabled by default; to enable it, use
optimizer_adjust_secondary_key_costs=fix_derived_table_read_cost.
== Original commit comment ==
Fixed costs in JOIN_TAB::estimate_scan_time() and HEAP
Estimate_scan_time() calculates the cost of scanning a derived table.
The old code did not take into account that the temporary HEAP table
may be converted to Aria.
Things fixed:
- Added a check whether the temporary table's data will fit in the heap.
If not, then calculate the cost based on the designated internal
temporary table engine (Aria).
- Removed MY_MAX(records, 1000) and instead trust the optimizer's
estimate of records. This reduces the cost of temporary tables a bit
for small tables, which caused a few changes in mtr results.
- Fixed cost calculation for HEAP.
- HEAP costs->row_next_find_cost was not set. This does not affect the old
cost calculation, as this cost slot was not used anywhere.
Now HEAP costs->row_next_find_cost is set, which allowed me to remove
some duplicated computation in ha_heap::scan_time().
MDEV-35958 Cost estimates for materialized derived tables are poor
(Backport 11.8->11.4, the same patch)
Estimate_scan_time() calculates the cost of scanning a derived table.
The old code did not take into account that the temporary HEAP table
may be converted to Aria.
Things fixed:
- Added a check whether the temporary table's data will fit in the heap.
If not, then calculate the cost based on the designated internal
temporary table engine (Aria).
- Removed MY_MAX(records, 1000) and instead trust the optimizer's
estimate of records. This reduces the cost of temporary tables a bit
for small tables, which caused a few changes in mtr results.
- Fixed cost calculation for HEAP.
- HEAP costs->row_next_find_cost was not set. This does not affect the old
cost calculation, as this cost slot was not used anywhere.
Now HEAP costs->row_next_find_cost is set, which allowed me to remove
some duplicated computation in ha_heap::scan_time().
Reviewed by: Sergei Petrunia <sergey@mariadb.com>
MDEV-35958 Cost estimates for materialized derived tables are poor
Estimate_scan_time() calculates the cost of scanning a derived table.
The old code did not take into account that the temporary HEAP table
may be converted to Aria.
Things fixed:
- Added a check whether the temporary table's data will fit in the heap.
If not, then calculate the cost based on the designated internal
temporary table engine (Aria).
- Removed MY_MAX(records, 1000) and instead trust the optimizer's
estimate of records. This reduces the cost of temporary tables a bit
for small tables, which caused a few changes in mtr results.
- Fixed cost calculation for HEAP.
- HEAP costs->row_next_find_cost was not set. This does not affect the old
cost calculation, as this cost slot was not used anywhere.
Now HEAP costs->row_next_find_cost is set, which allowed me to remove
some duplicated computation in ha_heap::scan_time().
Reviewed by: Sergei Petrunia <sergey@mariadb.com>
(Variant 2)
Multi-table UPDATE ... ORDER BY ... LIMIT could update the wrong rows when
ORDER BY was resolved by Using temporary + Using filesort.
== Background: ref_pointer_array ==
join->order[->next*]->item point into join->ref_pointer_array, which
has pointers to the used Item objects.
This indirection is employed so that we can switch the ORDER BY expressions
from using the original Items to using the values of their "image" fields
in the temporary table.
The variant of ref_pointer_array that has pointers to temp table fields
is created when JOIN::make_aggr_tables_info() calls
change_refs_to_tmp_fields().
== The problem ==
The created array didn't match the original ref_pointer_array element by
element. When the arrays were switched, ORDER BY elements started
to point to the wrong temporary table fields, causing wrong sorting.
== The cause ==
The cause is JOIN::add_fields_for_current_rowid(). This function is
called for UPDATE statements so that the rowids of rows in the original
tables are saved in the temporary tables.
It adds extra columns to the select list in its table_fields argument.
However, select lists are organized in a way that extra elements must
be added *to the front* of the list, and then change_refs_to_tmp_fields()
will add the extra fields *to the end* of ref_pointer_array.
But add_fields_for_current_rowid() added the new fields to the back of the
table_fields list. This caused change_refs_to_tmp_fields() to produce a
ref_pointer_array slice with the extra elements at the front, causing any
references through ref_pointer_array to resolve to the wrong values.
== The fix ==
Make JOIN::add_fields_for_current_rowid() add fields to the front of
the select list.
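A hedged sketch of a statement that may reach this path, assuming the IN sub-query makes the UPDATE be processed as a multi-table update with Using temporary + Using filesort (names are illustrative):

UPDATE t1
SET t1.counter = t1.counter + 1
WHERE t1.id IN (SELECT t2.ref_id FROM t2)
ORDER BY t1.created_at
LIMIT 10;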
Alternative, more general fix, Variant 2.
The problem was as follows: Suppose we are running a PS/SP statement and
we get an error while doing an optimization that is done once per statement
lifetime. This may leave the statement's data structures in an undefined state,
where it is not safe to execute the statement again.
The fix: introduce LEX::needs_reprepare and set it in such cases.
Make the PS and SP runtime check it and re-prepare the statement before
executing it again.
We do not use Reprepare_observer, because it turns out it is tightly tied
to watching versions of statement's objects. For example, it must not be
used when running the statement for the first time, exactly when the
once-per-statement-lifetime optimizations are done.
normalize_cond() translated `WHERE col` into `WHERE col<>0`.
But the operator "not equal to 0" does not necessarily exist
for all data types.
For example, the query:
SELECT * FROM t1 WHERE inet6col;
was translated to:
SELECT * FROM t1 WHERE inet6col<>0;
which further failed with this error:
ERROR : Illegal parameter data types inet6 and bigint for operation '<>'
This patch changes the translation from `col<>0` to `col IS TRUE`.
So now
SELECT * FROM t1 WHERE inet6col;
gets translated to:
SELECT * FROM t1 WHERE inet6col IS TRUE;
Details:
1. Implementing methods:
- Field_longstr::val_bool()
- Field_string::val_bool()
- Item::val_int_from_val_str()
If the input contains bad data,
these methods raise a better error message:
Truncated incorrect BOOLEAN value
Before the change, the error was:
Truncated incorrect DOUBLE value
2. Fixing normalize_cond() to generate Item_func_istrue/Item_func_isfalse
instances instead of Item_func_ne/Item_func_eq
3. Making Item_func_truth sargable, so it uses the range optimizer.
Implementing the following methods:
- get_mm_tree(), get_mm_leaf(), add_key_fields() in Item_func_truth.
- get_func_mm_tree(), for all Item_func_truth descendants.
4. Implementing the method negated_item() for all Item_func_truth
descendants, so the negated item has a chance to be sargable:
For example,
WHERE NOT col IS NOT FALSE -- this notation is not sargable
is now translated to:
WHERE col IS FALSE -- this notation is sargable
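A hedged sketch of the effect on the range optimizer (table and index names are illustrative; whether the index is actually chosen depends on statistics):

CREATE TABLE t1 (col INT, KEY(col));
-- both forms below are now candidates for range access over KEY(col):
EXPLAIN SELECT * FROM t1 WHERE col IS TRUE;
EXPLAIN SELECT * FROM t1 WHERE NOT col IS NOT FALSE;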
(Review input addressed)
After this patch, the optimizer can handle virtual column expressions
in WHERE/ON clauses. If the table has an indexed virtual column:
ALTER TABLE t1
ADD COLUMN vcol INT AS (col1+1),
ADD INDEX idx1(vcol);
and the query uses the exact virtual column expression:
SELECT * FROM t1 WHERE col1+1 <= 100
then the optimizer will be able to use index idx1 for it.
This is achieved by walking the WHERE/ON clauses and replacing instances
of the virtual column expression (like "col1+1" above) with the virtual
column's Item_field (like "vcol"). The latter can be processed by the optimizer.
Replacement is considered (and done) only in items that are potentially
usable by the range optimizer.
The problem was caused by this scenario: the query had both SELECT DISTINCT
and ORDER BY. DISTINCT was converted into GROUP BY. Then, a vector index was
used to resolve the GROUP BY.
When join_read_first() initialized the vector index scan, it used the ORDER BY
clause instead of the GROUP BY, which caused a crash.
Fixed by making test_if_skip_sort_order() remember which ordering the scan
produces in JOIN_TAB::full_index_scan_order, and having join_read_first() use that.
This commit updates default memory allocations size used with MEM_ROOT
objects to minimize the number of calls to malloc().
Changes:
- Updated MEM_ROOT block sizes in sql_const.h
- Updated MALLOC_OVERHEAD to also take into account the extra memory
allocated by my_malloc()
- Updated init_alloc_root() to only take MALLOC_OVERHEAD into account as
buffer size, not MALLOC_OVERHEAD + sizeof(USED_MEM).
- Reset mem_root->first_block_usage if and only if the first block was used.
- Increase the MEM_ROOT buffer sizes used by my_load_defaults, plugin_init,
Create_tmp_table, allocate_table_share, TABLE and TABLE_SHARE.
This decreases the number of malloc calls during queries.
- Use a small buffer for THD->main_mem_root in THD::THD. This avoids
multiple malloc() calls for new connections.
I tried the above changes on a complex select query with 12 tables.
The following shows the number of extra allocations that were used
to increase the size of the MEM_ROOT buffers.
Original code:
- Connection to MariaDB: 9 allocations
- First query run: 146 allocations
- Second query run: 24 allocations
Max memory allocated for thd when using a heap tmp table: 61,262,408
Max memory allocated for thd when using an Aria tmp table: 419,464
After changes:
- Connection to MariaDB: 0 allocations
- First query run: 25 allocations
- Second query run: 7 allocations
Max memory allocated for thd when using a heap tmp table: 61,347,424
Max memory allocated for thd when using an Aria tmp table: 529,168
The new code uses slightly more memory, but avoids memory fragmentation
and is slightly faster thanks to far fewer calls to malloc().
Reviewed-by: Sergei Golubchik <serg@mariadb.org>
Heap tables are allocated blocks to store rows according to
my_default_record_cache (mapped to the server global variable
read_buffer_size).
This causes performance issues when the record length is big
(> 1000 bytes) and the my_default_record_cache is small.
Changed to instead split the default heap allocation to 1/16 of the
allowed space and not use my_default_record_cache anymore when creating
the heap. The allocation is also aligned to be just under a power of 2.
For one test that I was running, which used a record length of 633,
the speed of the query doubled thanks to this change.
Other things:
- Fixed calculation of max_records passed to hp_create() to take
into account padding between records.
- Updated the calculation of memory needed by heap tables. Before, we
did not take into account the internal structures needed to access rows.
- Changed the block size for memory_table from 1 to 16384 to get less
fragmentation. This also avoids a problem where we needed 1K
to manage index and row storage, which was not accounted for before.
- Moved heap memory usage to a separate test for 32 bit.
- Allocate all data blocks in heap in powers of 2. Change reported
memory usage for heap to reflect this.
Reviewed-by: Sergei Golubchik <serg@mariadb.org>
Make Item_func_eq of the following forms sargable by updating the relevant range
analysis methods:
1. substr(col, 1, n) = str
2. str = substr(col, 1, n)
3. left(col, n) = str
4. str = left(col, n)
where col is an indexed column and str is a constant and inexpensive item
of length n.
We do this by factoring out Item_func_like::get_mm_leaf() and applying it
to a string obtained by escaping str and then appending the wildcard
"%" to it.
The addition of the two Functype enums, LEFT_FUNC and SUBSTR_FUNC,
requires changes in the spider group by handler to continue handling
LEFT and SUBSTR correctly.
Co-authored-by: Yuchen Pei <ycp@mariadb.com>
Co-authored-by: Sergei Petrunia <sergey@mariadb.com>