support SQL semantics for SELECT ... WHERE ... ORDER BY ... LIMIT
* switch from returning k nearest neighbors to returning
as many as needed, in k-neighbor chunks, with increasing distance
* make search_layer() skips nodes that are closer than a threshold
* read_next keeps a search context - list of k found nodes,
threshold, ctx, etc.
* when the list of found nodes is exhausted, it repeats the search
starting from last found nodes and a threshold
* search context kepts ctx->refcount incremented, so ctx won't go away
* but commit_lock is unlocked between calls, so InnoDB can modify the table
* use ctx version to detect that, switch to MHNSW_Trx when it happens
bugfix:
* use the correct lock in ha_external_lock() for the graph table
* InnoDB didn't reset locks on ha_external_lock(F_UNLCK) and previous
LOCK_X leaked into the next statement
Fixes for ALTER TABLE ... ADD/DROP COLUMN, ALGORITHM=COPY.
Let quick_rm_table() remove high-level indexes along with original table.
Avoid locking uninitialized LOCK_share for INTERNAL_TMP_TABLEs.
Don't enable bulk insert when altering a table containing vector index.
InnoDB can't handle situation when bulk insert is enabled for one table
but disabled for another. We can't do bulk insert on vector index as it
does table updates currently.
* fix the truncate-by-handler variant, used by InnoDB
* test that insert works after truncate, meaning graph table was emptied
* test that the vector index size is zero after truncate in MyISAM
When the source row is deleted, mark the corresponding node in HNSW
index by setting `tref` to null. An index is added for the `tref` in
secondary table for faster searching of the to-be-marked nodes.
The nodes marked as deleted will still be used for search, but will not
be included in the final query results.
As skipping deleted nodes and not adding deleted nodes for new-inserted
nodes' neighbor list could impact the performance, we now only skip
these nodes in search results.
- for some reason the bitmap is not set for hlindex during the delete so
I had to temporarily comment out one line
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.
* preserve the graph in memory between statements
* keep it in a TABLE_SHARE, available for concurrent searches
* nodes are generally read-only, walking the graph doesn't change them
* distance to target is cached, calculated only once
* SIMD-optimized bloom filter detects visited nodes
* nodes are stored in an array, not List, to better utilize bloom filter
* auto-adjusting heuristic to estimate the number of visited nodes
(to configure the bloom filter)
* many threads can concurrently walk the graph. MEM_ROOT and Hash_set
are protected with a mutex, but walking doesn't need them
* up to 8 threads can concurrently load nodes into the cache,
nodes are partitioned into 8 mutexes (8 is chosen arbitrarily, might
need tuning)
* concurrent editing is not supported though
* this is fine for MyISAM, TL_WRITE protects the TABLE_SHARE and the
graph (note that TL_WRITE_CONCURRENT_INSERT is not allowed, because an
INSERT into the main table means multiple UPDATEs in the graph)
* InnoDB uses secondary transaction-level caches linked in a list in
in thd->ha_data via a fake handlerton
* on rollback the secondary cache is discarded, on commit nodes
from the secondary cache are invalidated in the shared cache
while it is exclusively locked
* on savepoint rollback both caches are flushed. this can be improved
in the future with a row visibility callback
* graph size is controlled by @@mhnsw_cache_size, the cache is flushed
when it reaches the threshold
* mhnsw:
* use primary key, innodb loves and (and the index cannot have dupes anyway)
* MyISAM is ok with that, performance-wise
* must be ha_rnd_init(0) because we aren't going to scan
* MyISAM resets the position on ha_rnd_init(0) so query it before
* oh, and use the correct handler, just in case
* HA_ERR_RECORD_IS_THE_SAME is no error
* innodb:
* return ref_length on create
* don't assume table->pos_in_table_list is set
* ok, assume away, but only for system versioned tables
* set alter_info on create (InnoDB needs to check for FKs)
* pair external_lock/external_unlock correctly
This commit includes the work done in collaboration with Hugo Wen from
Amazon:
MDEV-33408 Alter HNSW graph storage and fix memory leak
This commit changes the way HNSW graph information is stored in the
second table. Instead of storing connections as separate records, it now
stores neighbors for each node, leading to significant performance
improvements and storage savings.
Comparing with the previous approach, the insert speed is 5 times faster,
search speed improves by 23%, and storage usage is reduced by 73%, based
on ann-benchmark tests with random-xs-20-euclidean and
random-s-100-euclidean datasets.
Additionally, in previous code, vector objects were not released after
use, resulting in excessive memory consumption (over 20GB for building
the index with 90,000 records), preventing tests with large datasets.
Now ensure that vectors are released appropriately during the insert and
search functions. Note there are still some vectors that need to be
cleaned up after search query completion. Needs to be addressed in a
future commit.
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.
As well as the commit:
Introduce session variables to manage HNSW index parameters
Three variables:
hnsw_max_connection_per_layer
hnsw_ef_constructor
hnsw_ef_search
ann-benchmark tool is also updated to support these variables in commit
https://github.com/HugoWenTD/ann-benchmarks/commit/e09784e for branch
https://github.com/HugoWenTD/ann-benchmarks/tree/mariadb-configurable
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.
Co-authored-by: Hugo Wen <wenhug@amazon.com>
MDEV-33407 Parser support for vector indexes
The syntax is
create table t1 (... vector index (v) ...);
limitation:
* v is a binary string and NOT NULL
* only one vector index per table
* temporary tables are not supported
MDEV-33404 Engine-independent indexes: subtable method
added support for so-called "high level indexes", they are not visible
to the storage engine, implemented on the sql level. For every such
an index in a table, say, t1, the server implicitly creates a second
table named, like, t1#i#05 (where "05" is the index number in t1).
This table has a fixed structure, no frm, not accessible directly,
doesn't go into the table cache, needs no MDLs.
MDEV-33406 basic optimizer support for k-NN searches
for a query like SELECT ... ORDER BY func() optimizer will use
item_func->part_of_sortkey() to decide what keys can be used
to resolve ORDER BY.
let the caller tell init_tmp_table_share() whether the table
should be thread_specific or not.
In particular, internal tmp tables created in the slave thread
are perfectly thread specific
create templates
thd->alloc<X>(n) to use instead of (X*)thd->alloc(sizeof(X)*n)
and the same for thd->calloc(). By the default the type is char,
so old usage of thd->alloc(size) works too.
Single-table UPDATE/DELETE didn't provide outer_lookup_keys value for
subqueries. This didn't allow to make a meaningful choice between
IN->EXISTS and Materialization strategies for subqueries.
Fix this:
* Make UPDATE/DELETE save Sql_cmd_dml::scanned_rows,
* Then, subquery's JOIN::choose_subquery_plan() can fetch it from
there for outer_lookup_keys
Details:
UPDATE/DELETE now calls select_lex->optimize_unflattened_subqueries()
twice, like SELECT does (first call optimize_constant_subquries() in
JOIN::optimize_inner(), then call optimize_unflattened_subqueries() in
JOIN::optimize_stage2()):
1. Call with const_only=true before any optimizations. This allows
range optimizer and others to use the values of cheap const
subqueries.
2. Call it with const_only=false after range optimizer, partition
pruning, etc. outer_lookup_keys value is provided, so it's possible to
pick a good subquery strategy.
Note: PROTECT_STATEMENT_MEMROOT requires that first SP execution
performs subquery optimization for all subqueries, even for degenerate
query plans like "Impossible WHERE". Due to that, we ensure that the
call to optimize_unflattened_subqueries (with const_only=false) even
for degenerate query plans still happens, as was the case before this
change.
Extend derived table syntax to support column name assignment.
(subquery expression) [as|=] ident [comma separated column name list].
Prior to this patch, the optional comma separated column name list is
not supported.
Processing within the unit of the subquery expression will use
original column names, outside the unit will use the new names.
For example, in the query
select a1, a2 from
(select c1, c2, c3 from t1 where c2 > 0) as dt (a1, a2, a3)
where a2 > 10;
we see the second column of the derived table dt being used both within,
(where c2 > 0), and outside, (where a2 > 10), the specification.
Both conditions apply to t1.c2.
When multiple unit preparations are required, such as when being used within
a prepared statement or procedure, original column names are needed for
correct resolution. Original names are reset within mysql_derived_reinit().
Item_holder items, used for result tables in both TVC and union preparations
are renamed before use within st_select_lex_unit::prepare().
During wildcard expansion, if column names are present, items names are
set directly after creation.
Reviewed by Igor Babaev (igor@mariadb.com)
If a slave replicating an event has waited for more than
@@slave_abort_blocking_timeout for a conflicting metadata lock held by a
non-replication thread, the blocking query is killed to allow replication to
proceed and not be blocked indefinitely by a user query.
Reviewed-by: Monty <monty@mariadb.org>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Running an UPDATE statement in PS mode and having positional
parameter(s) bound with an array of actual values (that is
prepared to be run in bulk mode) results in incorrect behaviour
in presence of on update trigger that also executes an UPDATE
statement. The same is true for handling a DELETE statement in
presence of on delete trigger. Typically, the visible effect of
such incorrect behaviour is expressed in a wrong number of
updated/deleted rows of a target table. Additionally, in case UPDATE
statement, a number of modified rows and a state message returned
by a statement contains wrong information about a number of modified rows.
The reason for incorrect number of updated/deleted rows is that
a data structure used for binding positional argument with its
actual values is stored in THD (this is thd->bulk_param) and reused
on processing every INSERT/UPDATE/DELETE statement. It leads to
consuming actual values bound with top-level UPDATE/DELETE statement
by other DML statements used by triggers' body.
To fix the issue, reset the thd->bulk_param temporary to the value
nullptr before invoking triggers and restore its value on finishing
its execution.
The second part of the problem relating with wrong value of affected
rows reported by Connector/C API is caused by the fact that diagnostics
area is reused by an original DML statement and a statement invoked
by a trigger. This fact should be take into account on finalizing a
state of diagnostics area on completion running of a statement.
Important remark: in case the macros DBUG_OFF is on, call of the method
Diagnostics_area::reset_diagnostics_area()
results in reset of the data members
m_affected_rows, m_statement_warn_count.
Values of these data members of the class Diagnostics_area are used on
sending OK and EOF messages. In case DML statement is executed in PS bulk
mode such resetting results in sending wrong result values to a client
for affected rows in case the DML statement fires a triggers. So, reset
these data members only in case the current statement being processed
is not run in bulk mode.
Improve performance of queries like
SELECT * FROM t1 WHERE field = NAME_CONST('a', 4);
by, in this example, replacing the WHERE clause with field = 4
in the case of ref access.
The rewrite is done during fix_fields and we disambiguate this
case from other cases of NAME_CONST by inspecting where we are
in parsing. We rely on THD::where to accomplish this. To
improve performance there, we change the type of THD::where to
be an enumeration, so we can avoid string comparisons during
Item_name_const::fix_fields. Consequently, this patch also
changes all usages of THD::where to conform likewise.
The special logic used by the memory storage engine
to keep slaves in sync with the master on a restart can
break replication. In particular, after a restart, the
master writes DELETE statements in the binlog for
each MEMORY-based table so the slave can empty its
data. If the DELETE is not executable, e.g. due to
invalid triggers, the slave will error and fail, whereas
the master will never see the problem.
Instead of DELETE statements, use TRUNCATE to
keep slaves in-sync with the master, thereby bypassing
triggers.
Reviewed By:
===========
Kristian Nielsen <knielsen@knielsen-hq.org>
Andrei Elkin <andrei.elkin@mariadb.com>
One of possible use cases that reproduces the memory leakage listed below:
set timestamp= unix_timestamp('2000-01-01 00:00:00');
create or replace table t1 (x int) with system versioning
partition by system_time interval 1 hour auto
partitions 3;
create table t2 (x int);
create trigger tr after insert on t2 for each row update t1 set x= 11;
create or replace procedure sp2() insert into t2 values (5);
set timestamp= unix_timestamp('2000-01-01 04:00:00');
call sp2;
set timestamp= unix_timestamp('2000-01-01 13:00:00');
call sp2; # <<=== Memory leak happens there. In case MariaDB server is built
with the option -DWITH_PROTECT_STATEMENT_MEMROOT,
the second execution would hit assert failure.
The reason of leaking a memory is that once a new partition be created
the table should be closed and re-opened. It results in calling the function
extend_table_list() that indirectly invokes the function sp_add_used_routine()
to add routines implicitly used by the statement that makes a new memory
allocation.
To fix it, don't remove routines and tables the statement implicitly depends
on when a table being closed for subsequent re-opening.
In cmake -DWITH_UBSAN=ON builds with clang but not with GCC,
-fsanitize=undefined will flag several runtime errors on
function pointer mismatch related to the lock-free hash table LF_HASH.
Let us use matching function signatures and remove function pointer
casts in order to avoid potential bugs due to undefined behaviour.
These errors could be caught at compilation time by
-Wcast-function-type-strict, which is available starting with clang-16,
but not available in any version of GCC as of now. The old GCC flag
-Wcast-function-type is enabled as part of -Wextra, but it specifically
does not catch these errors.
Reviewed by: Vladislav Vaintroub