1
0
mirror of https://github.com/MariaDB/server.git synced 2025-08-05 13:16:09 +03:00
Commit Graph

8566 Commits

Author SHA1 Message Date
Marko Mäkelä
f5bd250f5b Merge 10.11 into 11.4 2025-03-28 13:55:21 +02:00
Sergei Petrunia
3b4de4c281 MDEV-32084: Assertion in best_extension_by_limited_search() ...
When subquery with LEFT JOIN is converted into semi-join, it is possible
to construct cases where the LEFT JOIN's ON expression refers to a table
in the current select but not in the current join nest. For example:

  t1 SEMI JOIN (
    t2
    LEFT JOIN (t3 LEFT JOIN t4 ON t4.col=t1.col) ON expr
  )

here, ON t4.col=t1.col" has this property. Let's denote it as
ON-EXPR-HAS-REF-OUTSIDE-NEST.

The optimizer handles LEFT JOINs like so:
- Outer join runtime requires that "inner tables follow outer" in
  any join order.
- Join optimizer enforces this by constructing join orders that follow
  table dependencies as they are specified in TABLE_LIST::dep_tables.
- The dep_tables are set in simplify_joins() according to the contents
  of ON expressions and LEFT JOIN structure.

However, the logic in simplify_joins() failed to account for possible
ON-EXPR-HAS-REF-OUTSIDE-NEST. It assumed that references outside of the
current join nest could only be OUTER_REF_TABLE_BIT or RAND_TABLE_BIT.

The fix was to add the missing logic.
2025-03-26 15:52:54 +02:00
Sergei Petrunia
47d11328c9 MDEV-36381: Comment "Procedure of keys generation ..." is in the wrong place
Move it from the middle of table.cc to sql_select.cc:generate_derived_keys()
2025-03-25 19:05:32 +02:00
Marko Mäkelä
49a6baec56 Merge 10.11 into 11.4 2025-03-03 11:07:56 +02:00
Oleg Smirnov
733852d4c3 BKA join cache buffer is employed despite join_cache_level=3 (flat BNLH)
In the `check_join_cache_usage()` function there is a branching issue
where an accidental fall-through to BKA/BKAH buffers may occur, even
when the join_cache_level setting does not permit their use.

This patch corrects the condition to ensure that BKA/BKAH join caching
is only enabled when explicitly allowed by join_cache_level

Reviewer: Sergei Petrunia <sergey@mariadb.com>
2025-02-27 16:47:25 +07:00
Sergei Petrunia
ef966af801 MDEV-30877: Output cardinality for derived table ignores GROUP BY
(Variant 3) (commit in 11.4)
When a derived table has a GROUP BY clause:

  SELECT ...
    FROM  (SELECT ... GROUP BY col1, col2) AS tbl

The optimizer would use inner join's output cardinality as an estimate
of derived table size, ignoring the fact that GROUP BY operation would
produce much fewer groups.

Add code to produce tighter bounds:
- The GROUP BY list is split into per-table lists. If GROUP BY list has
  expressions that refer to multiple tables, we fall back to join output
  cardinality.
- For each table, the first cardinality estimate is join_tab->read_records.
- Then, we try to get a tighter bound by using index statistics.
- If indexes do not cover all GROUP BY columns, we try to use per-column
  EITS statistics.
2025-02-10 22:06:49 +02:00
Sergei Petrunia
43c5d1303f MDEV-35958 Cost estimates for materialized derived tables are poor
Backport of commit 74f70c3944 to 10.11.
The new logic is disabled by default, to enable, use
optimizer_adjust_secondary_key_costs=fix_derived_table_read_cost.

== Original commit comment ==
Fixed costs in JOIN_TAB::estimate_scan_time() and HEAP

Estimate_scan_time() calculates the cost of scanning a derivied table.
The old code did not take into account that the temporary table heap table
may be converted to Aria.

  Things fixed:
  - Added checking if the temporary tables data will fit in the heap.
    If not, then calculate the cost based on the designated internal
    temporary table engine (Aria).
  - Removed MY_MAX(records, 1000) and instead trust the optimizer's
    estimate of records. This reduces the cost of temporary tables a bit
    for small tables, which caused a few changes in mtr results.
  - Fixed cost calculation for HEAP.
  - HEAP costs->row_next_find_cost was not set. This does not affect old
    costs calculation as this cost slot was not used anywhere.
    Now HEAP cost->row_next_find_cost is set, which allowed me to remove
    some duplicated computation in ha_heap::scan_time()
2025-02-10 21:14:01 +02:00
Monty
cd03bf5c53 Fixed costs in JOIN_TAB::estimate_scan_time() and HEAP
MDEV-35958 Cost estimates for materialized derived tables are poor

(Backport 11.8->11.4, the same patch)

Estimate_scan_time() calculates the cost of scanning a derivied table.
The old code did not take into account that the temporary table heap table
may be converted to Aria.

Things fixed:
- Added checking if the temporary tables data will fit in the heap.
  If not, then calculate the cost based on the designated internal
  temporary table engine (Aria).
- Removed MY_MAX(records, 1000) and instead trust the optimizer's
  estimate of records. This reduces the cost of temporary tables a bit
  for small tables, which caused a few changes in mtr results.
- Fixed cost calculation for HEAP.
  - HEAP costs->row_next_find_cost was not set. This does not affect old
    costs calculation as this cost slot was not used anywhere.
    Now HEAP cost->row_next_find_cost is set, which allowed me to remove
    some duplicated computation in ha_heap::scan_time()

Reviewed by: Sergei Petrunia <sergey@mariadb.com>
2025-02-10 15:59:28 +02:00
Sergei Petrunia
5f68fd52a9 MDEV-35955 Wrong result for UPDATE ... ORDER BY LIMIT which uses tmp.table
(Variant 2)
Multi-table UPDATE ... ORDER BY ... LIMIT could update the wrong rows when
ORDER BY was resolved by Using temporary + Using filesort.

== Background: ref_pointer_array ==
join->order[->next*]->item point into join->ref_pointer_array, which
has pointers to the used Item objects.

This indirection is employed so that we can switch the ORDER BY expressions
from using the original Items to using the values of their "image" fields
in the temporary table.
The variant of ref_pointer_array that has pointers to temp table fields
is created when JOIN::make_aggr_tables_info() calls
change_refs_to_tmp_fields().

== The problem ==
The created array didn't match element-by-element the original
ref_pointer_array. When arrays were switched, ORDER BY elements started
to point to the wrong temp.table fields, causing the wrong sorting.

== The cause ==
The cause is JOIN::add_fields_for_current_rowid(). This function is
called for UPDATE statements to make the rowids of rows in the original
tables to be saved in the temporary tables.

It adds extra columns to the select list in table_fields argument.
However, select lists are organized in a way that extra elements must
be added *to the front* of the list, and then change_refs_to_tmp_fields()
will add extra fields *to the end* of ref_pointer_array.

So, add_fields_for_current_rowid() adds new fields to the back of
table_fields list. This caused change_refs_to_tmp_fields() to produce
ref_pointer_array slice with extra elements in the front, causing any
references through ref_pointer_array to come to the wrong values.

== The fix ==
Make JOIN::add_fields_for_current_rowid() add fields to the front of
the select list.
2025-01-31 11:45:16 +02:00
Marko Mäkelä
17f01186f5 Merge 10.11 into 11.4 2025-01-09 07:58:08 +02:00
Marko Mäkelä
420d9eb27f Merge 10.6 into 10.11 2025-01-08 12:51:26 +02:00
Monty
996e7fd7d5 Avoid printing "rowid_filter_skipped" in optimizer trace if no rowid filter
There is no point in saying something is skipped when it does not exists.
2025-01-05 16:40:11 +02:00
Monty
e600f9aebb MDEV-35750 Change MEM_ROOT allocation sizes to reduse calls to malloc() and avoid memory fragmentation
This commit updates default memory allocations size used with MEM_ROOT
objects to minimize the number of calls to malloc().

Changes:
- Updated MEM_ROOT block sizes in sql_const.h
- Updated MALLOC_OVERHEAD to also take into account the extra memory
  allocated by my_malloc()
- Updated init_alloc_root() to only take MALLOC_OVERHEAD into account as
  buffer size, not MALLOC_OVERHEAD + sizeof(USED_MEM).
- Reset mem_root->first_block_usage if and only if first block was used.
- Increase MEM_ROOT buffers sized used by my_load_defaults, plugin_init,
  Create_tmp_table, allocate_table_share, TABLE and TABLE_SHARE.
  This decreases number of malloc calls during queries.
- Use a small buffer for THD->main_mem_root in THD::THD. This avoids
  multiple malloc() call for new connections.

I tried the above changes on a complex select query with 12 tables.
The following shows the number of extra allocations that where used
to increase the size of the MEM_ROOT buffers.

Original code:
- Connection to MariaDB:   9 allocations
- First query run:       146 allocations
- Second query run:       24 allocations

Max memory allocated for thd when using with heap table:  61,262,408
Max memory allocated for thd when using Aria tmp table:      419,464

After changes:
Connection to MariaDB:     0 allocations
- First run:              25 allocations
- Second run:              7 allocations

Max memory allocated for thd when using with heap table:  61,347,424
Max memory allocated for thd when using Aria table:          529,168

The new code uses slightly more memory, but avoids memory fragmentation
and is slightly faster thanks to much fewer calls to malloc().

Reviewed-by: Sergei Golubchik <serg@mariadb.org>
2025-01-05 16:40:11 +02:00
Monty
52c29f3bdc MDEV-35469 Heap tables are calling mallocs to often
Heap tables are allocated blocks to store rows according to
my_default_record_cache (mapped to the server global variable
 read_buffer_size).
This causes performance issues when the record length is big
(> 1000 bytes) and the my_default_record_cache is small.

Changed to instead split the default heap allocation to 1/16 of the
allowed space and not use my_default_record_cache anymore when creating
the heap. The allocation is also aligned to be just under a power of 2.

For some test that I have been running, which was using record length=633,
the speed of the query doubled thanks to this change.

Other things:
- Fixed calculation of max_records passed to hp_create() to take
  into account padding between records.
- Updated calculation of memory needed by heap tables. Before we
  did not take into account internal structures needed to access rows.
- Changed block sized for memory_table from 1 to 16384 to get less
  fragmentation. This also avoids a problem where we need 1K
  to manage index and row storage which was not counted for before.
- Moved heap memory usage to a separate test for 32 bit.
- Allocate all data blocks in heap in powers of 2. Change reported
  memory usage for heap to reflect this.

Reviewed-by: Sergei Golubchik <serg@mariadb.org>
2025-01-05 16:40:11 +02:00
Marko Mäkelä
3f914afd3a Merge 10.6 into 10.11 2025-01-02 12:39:56 +02:00
Yuchen Pei
671f80c738 Merge branch '10.5' into 10.6 2024-12-17 11:06:09 +11:00
Oleg Smirnov
e640373389 Revert "MDEV-26427 MariaDB Server SEGV on INSERT .. SELECT"
This reverts commit 49e14000ee
as it introduces regression MDEV-29935 and has to be reconsidered
in general
2024-12-14 13:08:17 +07:00
Kristian Nielsen
0f47db8525 Merge 10.11 -> 11.4
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-12-05 11:01:42 +01:00
Kristian Nielsen
e7c6cdd842 Merge 10.6 -> 10.11
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-12-05 10:11:58 +01:00
Kristian Nielsen
0166c89e02 Merge 10.5 -> 10.6
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-12-05 09:20:36 +01:00
Jason Cu
2bf9f0d422 MDEV-32395: update_depend_map_for_order: SEGV at /mariadb-11.3.0/sql/sql_select.cc:16583
MDEV-32329 (patch) pushdown from having into where: Server crashes at sub_select

When generating an Item_equal with a Item_ref that refers to a field
outside of a subselect, remove_item_direct_ref() causes the dependency
(depended_from) on the outer select to be lost, which causes trouble
for code downstream that can no longer determine the scope of the Item.
Not calling remove_item_direct_ref() retains the Item's dependency.

Test cases from MDEV-32395 and MDEV-32329 are included.

Some fixes from other developers:

Monty:
- Fixed wrong code in Item_equal::create_pushable_equalities()
  that could cause wrong item to be used if there was no matching items.
Daniel Black:
- Added test cases from MDEV-32329
Igor Babaev:
- Provided fix for removing call to remove_item_direct_ref() in
  eliminate_item_equal()

MDEV-32395: update_depend_map_for_order: SEGV at /mariadb-11.3.0/sql/sql_select.cc:16583

Include test cases from MDEV-32329.
2024-12-04 13:22:45 +02:00
Marko Mäkelä
2719cc4925 Merge 10.11 into 11.4 2024-12-02 11:35:34 +02:00
Marko Mäkelä
3d23adb766 Merge 10.6 into 10.11 2024-11-29 13:43:17 +02:00
Marko Mäkelä
7d4077cc11 Merge 10.5 into 10.6 2024-11-29 12:37:46 +02:00
Brandon Nesterenko
dbfee9fc2b MDEV-34348: Consolidate cmp function declarations
Partial commit of the greater MDEV-34348 scope.
MDEV-34348: MariaDB is violating clang-16 -Wcast-function-type-strict

The functions queue_compare, qsort2_cmp, and qsort_cmp2
all had similar interfaces, and were used interchangable
and unsafely cast to one another.

This patch consolidates the functions all into the
qsort_cmp2 interface.

Reviewed By:
============
Marko Mäkelä <marko.makela@mariadb.com>
2024-11-23 08:14:22 -07:00
Monty
93fb364cd9 Removed not used ha_drop_table()
This was done after changing call in sql_select.cc from
ha_drop_table() to drop_table(), like in 11.5
2024-11-20 09:59:43 +02:00
Oleksandr Byelkin
c770bce898 Merge branch '11.2' into 11.4 2024-10-30 15:11:17 +01:00
Oleksandr Byelkin
69d033d165 Merge branch '10.11' into 11.2 2024-10-29 16:42:46 +01:00
Oleksandr Byelkin
3d0fb15028 Merge branch '10.6' into 10.11 2024-10-29 15:24:38 +01:00
Oleksandr Byelkin
f00711bba2 Merge branch '10.5' into 10.6 2024-10-29 14:20:03 +01:00
Sergei Petrunia
284593413f MDEV-35253: xa_prepare_unlock_unmodified fails: shift exponent 32 is too large
The code in best_access_path() uses PREV_BITS(uint, N) to
compute a bitmap of all keyparts: {keypart0, ... keypart{N-1}).

The problem is that PREV_BITS($type, N) macro code can't handle the case
when N=<number of bits in $type).
Also, why use PREV_BITS(uint, ...) for key part map computations when
we could have used PREV_BITS(key_part_map) ?

Fixed both:
- Change PREV_BITS(type, N) to handle any N in [0; n_bits(type)].
- Change PREV_BITS() to use key_part_map when computing key_part_map bitmaps.
2024-10-25 18:02:14 +03:00
Oleg Smirnov
6bd1cb0ea0 MDEV-34880 Incorrect result for query with derived table having TEXT field
When a derived table which has distinct values and BLOB fields is
materialized, an index is created over all columns to ensure only
unique values are placed to the result.
This index is created in a special mode HA_UNIQUE_HASH to support BLOBs.
Later the optimizer may incorrectly choose this index to retrieve values
from the derived table, although such type of index cannot be used
for data retrieval.

This commit excludes HA_UNIQUE_HASH indexes from adding to
`JOIN::keyuse` array thus preventing their subsequent usage for
data retrieval
2024-10-23 17:55:00 +07:00
Sergei Petrunia
a68e74b5a4 MDEV-35164: optimizer_join_limit_pref_ratio: assertion when the ORDER BY table becomes constant
Assertion failure has happened due to this scenario:

A query was ran with optimizer_join_limit_pref_ratio=1.
The query had "ORDER BY t1.col LIMIT N".
The optimizer set join->limit_shortcut_applicable=1.
Then, table t1 was marked as constant.
The code in choose_query_plan() still set join->limit_optimization_mode=1
which caused the optimizer to only consider t1 as the first non-const table.
But t1 was already put into the join prefix as the constant table.
The optimizer couldn't produce any join order at all and crashed.

Fixed by not searching for shortcut plan if ORDER BY table is a constant.
We will not try to do sorting anyway in this case (and LIMIT short-cutting
will be done for any join order).
2024-10-18 15:42:05 +03:00
Sergei Petrunia
0540eac05c MDEV-35180: ref_to_range rewrite causes poor query plan
(Variant 2: only allow rewrite for ref(const))

make_join_select() has a "ref_to_range" rewrite: it would rewrite
any ref access to a range access on the same index if the latter uses
more keyparts.
It seems, he initial intent of this was to fix poor query plan choice
in cases like

  t.keypart1=const AND t.keypart2 < 'foo'

Due to deficiency in cost model, ref access could be picked while range
would enumerate fewer rows and be cheaper.
However, the condition also forces a rewrite in cases like:

  t.keypart1=prev_table.col AND t.keypart1<='foo' AND t.keypart2<'bar'

Here, it can be that
* keypart1=prev_table.col is highly selective
* (keypart1, keypart2) <= ('foo', 'bar') is not at all selective.

Still, the rewrite would be made and poor query plan chosen.
Fixed this by only doing the rewrite if ref access was ref(const)
so we can be certain that quick select also used these restrictions
and will scan a subset of rows that ref access would scan.
2024-10-18 13:37:04 +03:00
Sergei Petrunia
9849e3f948 MDEV-35072: Assertion with optimizer_join_limit_pref_ratio and 1-table select
Pre-11.0 variant:

1. In recompute_join_cost_with_limit(), add an assertion that that
partial_join_cost >= 0.0.

2. best_extension_by_limited_search() subtracts COST_EPS from
   join->best_read. But it is not subtracted from
   join->positions[0].read_time, add it back.

2. We could get very small negative partial_join_cost due to rounding
  errors. For fraction=1.0, we were computing essentially this (denote
  as EXPR-1):

  $row_read_cost + $where_cost - ($row_read_cost + $where_cost)

which should compute to 0.
But the computation was done in the following order (left-to-right):
EXPR-2:

  ($row_read_cost + $where_cost) - $row_read_cost - $where_cost

this produced a value of -1.1102230246251565e-16 due to a rounding
error. Change the computation use EXPR-1 instead of EXPR-2.
2024-10-15 15:56:41 +03:00
Sergei Petrunia
5619f29384 Replace 0.001 with symbolic name COST_EPS
optimize_straight_join and best_extension_by_limited_search()
use 0.001 to make choice between plans with identical cost deterministic.

Use COST_EPS instead of 0.001, like it's done in newer versions.
2024-10-15 15:56:41 +03:00
Sergei Petrunia
66b8d32b75 MDEV-35072: Assertion with optimizer_join_limit_pref_ratio and 1-table select
Variant for 11.2+:

In recompute_join_cost_with_limit(), do not subtract the cost of checking
the WHERE:
   pos->records_read* WHERE_COST_THD(join->thd)

It is already included in pos->read_time.

Also added comments about difference between this fix and the pre-11.2
variant.
2024-10-15 15:01:29 +03:00
Yuchen Pei
cd5577ba4a Merge branch '10.5' into 10.6 2024-10-15 16:00:44 +11:00
Yuchen Pei
77ed235d50 MDEV-26345 Spider GBH should execute original queries on the data node
Stop skipping const items when selecting but skip them when storing
their results to spider row to avoid storing in mismatching temporary
table fields.

Skip auxiliary fields in SELECTing, and do not store
the (non-existing) results to the corresponding temporary table
accordingly.

When there are BOTH auxiliary fields AND const items in the auxiliary
field items, do not use the spider GBH. This is a rare occasion if it
happens at all and not worth the added complexity to cover it.

Use the original item (item_ptr) in constructing GROUP BY and ORDER
BY, which also means using item->name instead of field->field_name as
aliases in constructing SELECT items. This fixes spurious regressions
caused by the above changes in some tests using ORDER BY, such as
mdev_24517.test. As a by-product, this also fixes MDEV-29546.
Therefore we update mdev_29008.test to include the MDEV-29546 case.
2024-10-15 15:36:12 +11:00
Oleksandr Byelkin
a79c9b3812 MDEV-35135 Assertion `!is_cond()' failed in Item_bool_func::val_int / do_select
Change val_int with val_bool when it is a condition.
2024-10-14 09:36:17 +02:00
Oleksandr Byelkin
1d0e94c55f Merge branch '10.5' into 10.6 2024-10-09 08:38:48 +02:00
Sergei Golubchik
3ea71a2c8e MDEV-16699 heap-use-after-free in group_concat with compressed or GIS columns
Field_blob::store() has special code for GROUP_CONCAT temporary table
(to store blob values in Blob_mem_storage - this prevents them
from being freed/overwritten when a next row is read).

Field_geom and Field_blob_compressed inherit from Field_blob but they
have their own ::store() method without this special Blob_mem_storage
support.

Considering that non-grouping CONCAT() of such fields converts
them to plain BLOB, let's do the same for GROUP_CONCAT. To do it,
Item_func_group_concat::setup will signal that it's creating
a temporary table for GROUP_CONCAT, and Field_blog::make_new_field()
override will create base Field_blob when under group concat.
2024-10-08 15:31:02 +02:00
Alexander Barkov
a931da82fa MDEV-34123 CONCAT Function Returns Unexpected Empty Set in Query
Search conditions were evaluated using val_int(), which was wrong.
Fixing the code to use val_bool() instead.

Details:
- Adding a new item_base_t::IS_COND flag which marks Items used
  as <search condition> in WHERE, HAVING, JOIN ON, CASE WHEN clauses.
  The flag is at the parse time.
  These expressions must be evaluated using val_bool() rather than val_int().

  Note, the optimizer creates more Items which are used as search conditions.
  Most of these items are not marked with IS_COND yet. This is OK for now,
  but eventually these Items can also be fixed to have the flag.

- Adding a method Item::is_cond() which tests if the Item has the IS_COND flag.

- Implementing Item_cache_bool. It evaluates the cached expression using
  val_bool() rather than val_int().
  Overriding Type_handler_bool::Item_get_cache() to create Item_cache_bool.

- Implementing Item::save_bool_in_field(). It uses val_bool() rather than
  val_int() to evaluate the expression.

- Implementing Type_handler_bool::Item_save_in_field()
  using Item::save_bool_in_field().

- Fixing all Item_bool_func descendants to implement a virtual val_bool()
  rather than a virtual val_int().

- To find places where val_int() should be fixed to val_bool(), a few
  DBUG_ASSERT(!is_cond()) where added into val_int() implementations
  of selected (most frequent) classes:

  Item_field
  Item_str_func
  Item_datefunc
  Item_timefunc
  Item_datetimefunc
  Item_cache_bool
  Item_bool_func
  Item_func_hybrid_field_type
  Item_basic_constant descendants

- Fixing all places where DBUG_ASSERT() happened during an "mtr" run
  to use val_bool() instead of val_int().
2024-10-08 11:58:46 +02:00
Marko Mäkelä
b53b81e937 Merge 11.2 into 11.4 2024-10-03 14:32:14 +03:00
Marko Mäkelä
12a91b57e2 Merge 10.11 into 11.2 2024-10-03 13:24:43 +03:00
Marko Mäkelä
63913ce5af Merge 10.6 into 10.11 2024-10-03 10:55:08 +03:00
Marko Mäkelä
7e0afb1c73 Merge 10.5 into 10.6 2024-10-03 09:31:39 +03:00
Sergei Golubchik
b1bbdbab9e cleanup: remove redundant if()
likely, a result of auto-merge of two fixes in different versions
2024-10-01 18:29:11 +02:00
Oleksandr Byelkin
8d810e9426 MDEV-29537 Creation of view with UNION and SELECT ... FOR UPDATE in definition is failed with error
lock_type is writen in the last SELECT of the unit even if it parsed last,
so it should be printed last from the last select of the unit.
2024-10-01 11:07:45 +02:00
Sergei Petrunia
d67c88946f Debugging: add dbug_print_join_prefix() to use in best_access_path
A call to

  dbug_print_join_prefix(join_positions, idx, s)

returns a const char* ponter to string with current join prefix,
including the table being added to it.
2024-09-25 16:53:59 +03:00