1
0
mirror of https://github.com/MariaDB/server.git synced 2025-11-09 11:41:36 +03:00
Commit Graph

4851 Commits

Author SHA1 Message Date
Sergei Golubchik
0da820cb12 mhnsw: use plugin index options and transaction_participant API 2024-11-05 14:00:50 -08:00
Sergei Golubchik
5c2b7c6e7f mhnsw: configurable parameters
1. introduce alpha. the value of 1.1 is optimal, so hard-code it.

2. hard-code ef_construction=10, best by test

3. rename hnsw_max_connection_per_layer to mhnsw_max_edges_per_node
   (max_connection is rather ambiguous in MariaDB) and add a help text

4. rename hnsw_ef_search to mhnsw_min_limit and add a help text
2024-11-05 14:00:49 -08:00
Vicențiu Ciorbaru
88839e71a3 Initial HNSW implementation
This commit includes the work done in collaboration with Hugo Wen from
Amazon:

    MDEV-33408 Alter HNSW graph storage and fix memory leak

    This commit changes the way HNSW graph information is stored in the
    second table. Instead of storing connections as separate records, it now
    stores neighbors for each node, leading to significant performance
    improvements and storage savings.

    Comparing with the previous approach, the insert speed is 5 times faster,
    search speed improves by 23%, and storage usage is reduced by 73%, based
    on ann-benchmark tests with random-xs-20-euclidean and
    random-s-100-euclidean datasets.

    Additionally, in previous code, vector objects were not released after
    use, resulting in excessive memory consumption (over 20GB for building
    the index with 90,000 records), preventing tests with large datasets.
    Now ensure that vectors are released appropriately during the insert and
    search functions. Note there are still some vectors that need to be
    cleaned up after search query completion. Needs to be addressed in a
    future commit.

    All new code of the whole pull request, including one or several files
    that are either new files or modified ones, are contributed under the
    BSD-new license. I am contributing on behalf of my employer Amazon Web
    Services, Inc.

As well as the commit:

    Introduce session variables to manage HNSW index parameters

    Three variables:

    hnsw_max_connection_per_layer
    hnsw_ef_constructor
    hnsw_ef_search

    ann-benchmark tool is also updated to support these variables in commit
    https://github.com/HugoWenTD/ann-benchmarks/commit/e09784e for branch
    https://github.com/HugoWenTD/ann-benchmarks/tree/mariadb-configurable

    All new code of the whole pull request, including one or several files
    that are either new files or modified ones, are contributed under the
    BSD-new license. I am contributing on behalf of my employer Amazon Web
    Services, Inc.

Co-authored-by: Hugo Wen <wenhug@amazon.com>
2024-11-05 14:00:48 -08:00
Sergei Golubchik
d6add9a03d initial support for vector indexes
MDEV-33407 Parser support for vector indexes

The syntax is

  create table t1 (... vector index (v) ...);

limitation:
* v is a binary string and NOT NULL
* only one vector index per table
* temporary tables are not supported

MDEV-33404 Engine-independent indexes: subtable method

added support for so-called "high level indexes", they are not visible
to the storage engine, implemented on the sql level. For every such
an index in a table, say, t1, the server implicitly creates a second
table named, like, t1#i#05 (where "05" is the index number in t1).
This table has a fixed structure, no frm, not accessible directly,
doesn't go into the table cache, needs no MDLs.

MDEV-33406 basic optimizer support for k-NN searches

for a query like SELECT ... ORDER BY func() optimizer will use
item_func->part_of_sortkey() to decide what keys can be used
to resolve ORDER BY.
2024-11-05 14:00:48 -08:00
Sergei Golubchik
44c6328cbb cleanup: thd->alloc<>() and thd->calloc<>()
create templates

  thd->alloc<X>(n) to use instead of (X*)thd->alloc(sizeof(X)*n)

and the same for thd->calloc(). By the default the type is char,
so old usage of thd->alloc(size) works too.
2024-11-05 14:00:48 -08:00
Sergei Golubchik
aa09cb3b11 open frm for DROP TABLE
needed to get partitioning and information about
secondary objects
2024-11-05 14:00:48 -08:00
Sergei Golubchik
32e6f8ff2e cleanup: remove unconditional #ifdef's 2024-11-05 14:00:47 -08:00
Oleg Smirnov
a914087fab MDEV-35307 Unexpected error WARN_SORTING_ON_TRUNCATED_LENGTH or assertion failure in diagnostics area #2
When strict mode is enabled, all warnings during `INSERT` are
converted to errors regardless of their actual severity.
`WARN_SORTING_ON_TRUNCATED_LENGTH` is not considered severe enough
to be elevated to the ERROR level, and this commit fixes that
2024-11-05 14:52:20 +07:00
Monty
40810baffe MDEV-33144 Implement the Percona variable slow_query_log_always_write_time
This task is inspired by the Percona implementation of
slow_query_log_always_write_time.

This task implements the variable log_slow_always_query_time (name
matching other MariaDB variables using the slow query log). The
default value for the variable is 31536000, which makes MariaDB
compatible with older installations.

For queries with execution time longer than log_slow_always_query_time
the variables log_slow_rate_limit and log_slow_min_examined_row_limit
will be ignored and the query will be written to the slow query log
if there is no other limitations (like log_slow_filter etc).

Other things:
- long_query_time internal variable renamed to log_slow_query_time.
- More descriptive information for "log_slow_query_time".
2024-11-01 08:58:37 +01:00
Oleg Smirnov
fd87e01f38 MDEV-27277 Add a warning when max_sort_length is reached
During a query execution some sorting and grouping operations
on strings may be involved. System variable max_sort_length defines
the maximum number of bytes to use when comparing strings during
sorting/grouping. Thus, the comparable parts of strings may be less
than their actual size, so the results of the query may be not
sorted/grouped properly.
To indicate that some comparisons were done on a truncated lengths,
a new warning has been introduced with this commit.
2024-10-22 22:39:36 +07:00
Christian Gonzalez
fd0cc2b1fd Make SESSION_USER() comparable with CURRENT_USER()
Update `SESSION_USER()` behaviour to be comparable with `CURRENT_USER()`.
`SESSION_USER()` will return the user and host columns from `mysql.user`
used to authenticate the user when the session was created.

Historically `SESSION_USER()` was an alias of `USER()` function. The
main difference with `USER()` behaviour after this changes is that
`SESSION_USER()` now returns the host column from `mysql.user` instead of
the client host or ip.

NOTE: `SESSION_USER_IS_USER` old mode is added to make the change
backward compatible.

All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer
Amazon Web Services, Inc.
2024-10-04 13:22:40 +02:00
Marko Mäkelä
f493e46494 Merge 11.6 into 11.7 2024-10-03 18:15:13 +03:00
Marko Mäkelä
43465352b9 Merge 11.4 into 11.6 2024-10-03 16:09:56 +03:00
Marko Mäkelä
b53b81e937 Merge 11.2 into 11.4 2024-10-03 14:32:14 +03:00
Marko Mäkelä
12a91b57e2 Merge 10.11 into 11.2 2024-10-03 13:24:43 +03:00
Marko Mäkelä
63913ce5af Merge 10.6 into 10.11 2024-10-03 10:55:08 +03:00
Marko Mäkelä
7e0afb1c73 Merge 10.5 into 10.6 2024-10-03 09:31:39 +03:00
Yuchen Pei
ba7088d462 Merge '11.4' into 11.6 2024-10-03 15:59:20 +10:00
Sergei Petrunia
1cda4726ca MDEV-34993, part2: backport optimizer_adjust_secondary_key_costs
...and make the fix for MDEV-34993 switchable. It is enabled by default
and controlled with @optimizer_adjust_secondary_key_costs=fix_card_multiplier
2024-10-02 10:52:09 +03:00
Oleksandr Byelkin
20f57a8529 MDEV-33373 part 1: Unexpected ER_FILE_NOT_FOUND upon reading from logging table after crash recovery
We have found that my_errno can be "passed" to the next commad in some cases.

It is practically impossible to check/fix all cases of my_errno in the server,
plugins and engines so we will reset it as we reset other errors.

The test case will be fixed by CSV engine fix so will be added with it
(see part2).
2024-09-30 12:53:07 +02:00
Yuchen Pei
b7bca3ff71 MDEV-34518 Initialise THD::max_tmp_space_used 2024-09-30 16:33:15 +10:00
Sergei Petrunia
2c3b298337 Merge 11.2 into 11.4 2024-09-09 14:40:02 +03:00
Sergei Petrunia
abd98336d2 Merge 10.11 -> 11.2 2024-09-09 13:50:38 +03:00
Marko Mäkelä
2da4839bb6 Merge 10.6 into 10.11 2024-09-06 14:45:22 +03:00
Kristian Nielsen
db5d1cde45 MDEV-34857: Implement --slave-abort-blocking-timeout
If a slave replicating an event has waited for more than
@@slave_abort_blocking_timeout for a conflicting metadata lock held by a
non-replication thread, the blocking query is killed to allow replication to
proceed and not be blocked indefinitely by a user query.

Reviewed-by: Monty <monty@mariadb.org>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-09-04 11:44:14 +02:00
Sergei Petrunia
c8d040938a MDEV-34720: Poor plan choice for large JOIN with ORDER BY and small LIMIT
(Variant 2b: call greedy_search() twice, correct handling for limited
 search_depth)

Modify the join optimizer to specifically try to produce join orders that
can short-cut their execution for ORDER BY..LIMIT clause.

The optimization is controlled by @@optimizer_join_limit_pref_ratio.
Default value 0 means don't construct short-cutting join orders.
Other value means construct short-cutting join order, and prefer it only
if it promises speedup of more than #value times.

In Optimizer Trace, look for these names:
* join_limit_shortcut_is_applicable
* join_limit_shortcut_plan_search
* join_limit_shortcut_choice
2024-09-02 16:37:18 +03:00
Oleksandr Byelkin
d6444022ca Merge branch 'bb-11.5-release' into bb-11.6-release 2024-08-06 17:28:38 +02:00
Oleksandr Byelkin
ea75a0b600 Merge branch '11.4' into 11.5 2024-08-05 17:50:18 +02:00
Oleksandr Byelkin
1640c9b06e Merge branch '11.2' into 11.4 2024-08-04 17:27:48 +02:00
Oleksandr Byelkin
dced6cbdb6 Merge branch '11.1' into 11.2 2024-08-03 09:50:16 +02:00
Oleksandr Byelkin
80abd847da Merge branch '10.11' into 11.1 2024-08-03 09:32:42 +02:00
Oleksandr Byelkin
0e8fb977b0 Merge branch '10.6' into 10.11 2024-08-03 09:15:40 +02:00
Monty
4bf7c966b3 MDEV-34664: Add an option to fix InnoDB's doubling of secondary index cardinalities
(With trivial fixes by sergey@mariadb.com)
Added option fix_innodb_cardinality to optimizer_adjust_secondary_key_costs

Using fix_innodb_cardinality disables the 'divide by 2' of rec_per_key_int
in InnoDB that in effect doubles the Cardinality for secondary keys.
This has the biggest effect for indexes where a few rows has the same key
value. Using this may also cause table scans for very small tables (which
in some cases may be better than an index scan).

The user visible effect is that 'SHOW INDEX FROM table_name' will for
InnoDB show the true Cardinality (and not 2x the real value). It will
also allow the optimizer to chose a better index in some cases as the
division by 2 could have a bad effect for tables with 2-5 identical values
per key.

A few notes about using fix_innodb_cardinality:
- It has direct affect for SHOW INDEX FROM table_name. SHOW INDEX
  will also update the statistics in table share.
- The effect of fix_innodb_cardinality for query plans or EXPLAIN
  is only visible after first open of the table. This is why one must
  do a flush tables or use SHOW INDEX for the option to take effect.
- Using fix_innodb_cardinality can thus affect all user in their query
  plans if they are using the same tables.

Because of this, it is strongly recommended that one uses
optimizer_adjust_secondary_key_costs=fix_innodb_cardinality mainly
in configuration files to not cause issues for other users.
2024-07-29 16:40:53 +03:00
Oleksandr Byelkin
0fe39d368a Merge branch '10.6' into 10.11 2024-07-22 15:14:50 +02:00
Monty
ecc7961140 MDEV-34571 Add page accessed and pages read from disk to table_stats
Trivial batch, using the handler statistics already collected for
the slow query log.

The reason for the changes in test cases was mainly to change to use
select TABLE_SCHEMA ... from information_schema.table_statistics instead
of 'show table_statistics' to avoid future changes to test results
if we add more columns to table_statistics.
2024-07-12 11:28:18 +03:00
Dave Gosselin
02e38e2ece MDEV-33971 NAME_CONST in WHERE clause replaced by inner item
Improve performance of queries like
  SELECT * FROM t1 WHERE field = NAME_CONST('a', 4);
by, in this example, replacing the WHERE clause with field = 4
in the case of ref access.

The rewrite is done during fix_fields and we disambiguate this
case from other cases of NAME_CONST by inspecting where we are
in parsing.  We rely on THD::where to accomplish this.  To
improve performance there, we change the type of THD::where to
be an enumeration, so we can avoid string comparisons during
Item_name_const::fix_fields.  Consequently, this patch also
changes all usages of THD::where to conform likewise.
2024-07-10 17:23:43 -04:00
Alexander Barkov
4e805aed85 Merge remote-tracking branch 'origin/11.4' into 11.5 2024-07-10 12:17:09 +04:00
Alexander Barkov
5fb07d942b Merge remote-tracking branch 'origin/11.2' into 11.4 2024-07-09 21:45:37 +04:00
Alexander Barkov
8aad19ddfc Merge remote-tracking branch 'origin/11.1' into 11.2 2024-07-09 14:04:11 +04:00
Oleksandr Byelkin
2447dda2c0 Merge branch '10.11' into 11.1 2024-07-08 22:40:16 +02:00
Marko Mäkelä
27a3366663 Merge 10.6 into 10.11 2024-06-27 10:26:09 +03:00
Yuchen Pei
d7042ec4da Merge branch '10.5' into 10.6 2024-06-26 09:16:54 +08:00
Yuchen Pei
aebd2397cc MDEV-34404 Use safe_str in spider udfs to avoid passing NULL str 2024-06-25 13:45:04 +08:00
Marko Mäkelä
0076eb3d4e Merge 10.5 into 10.6 2024-06-24 13:09:47 +03:00
Dave Gosselin
db0c28eff8 MDEV-33746 Supply missing override markings
Find and fix missing virtual override markings.  Updates cmake
maintainer flags to include -Wsuggest-override and
-Winconsistent-missing-override.
2024-06-20 11:32:13 -04:00
Marko Mäkelä
a21e49cbcc Merge 11.1 into 11.2 2024-06-17 12:02:03 +03:00
Yuchen Pei
2d3e2c58b6 Merge branch '10.11' into 11.1 2024-05-31 10:54:31 +10:00
Marko Mäkelä
22ba7e4ff8 Merge 10.6 into 10.11 2024-05-30 16:04:00 +03:00
Marko Mäkelä
5ba542e9ee Merge 10.5 into 10.6 2024-05-30 14:27:07 +03:00
Monty
94033fcf83 MDEV-33151 Add more columns to TABLE_STATISTICS and USER STATS
Columns added to TABLE_STATISTICS
- ROWS_INSERTED, ROWS_DELETED, ROWS_UPDATED, KEY_READ_HITS and
  KEY_READ_MISSES.

Columns added to CLIENT_STATISTICS and USER_STATISTICS:
- KEY_READ_HITS and KEY_READ_MISSES.

User visible changes (except new columns):
- CLIENT_STATISTICS and USER_STATISTICS has columns KEY_READ_HITS and
  KEY_READ_MISSES added after column ROWS_UPDATED before SELECT_COMMANDS.

Other changes:
- Do not collect table statistics for system tables like index_stats
  table_stats, performance_schema, information_schema etc as the user
  has no control of these and the generate noice in the statistics.
- All row variables that are part of user_stats are moved to
  'struct rows_stats' to make it easy to clear all of them at once.
- ha_read_key_misses added to STATUS_VAR

Notes:
- userstat.result has a change of numbers of rows for handler_read_key.
  This is because use-stat-tables is now disabled for the test.
2024-05-27 12:39:04 +02:00