Support for optimizer hints NO_SPLIT_MATERIALIZED and
SPLIT_MATERIALIZED. These hints allow fine-grained control
of the "lateral derived" optimization within a query.
Introduces new overload of hint_table_state function which
tells both a hint's value as well as whether it is present.
This is useful to disambiguate cases that the other version
of hint_table_state cannot, such as when a hint is forcing
a behavior in the optimizer that it would not normally do
and the corresponding optimizer switch is enabled.
This commit introduces:
- the infrastructure for optimizer hints;
- hints for join buffering: BNL(), NO_BNL(), BKA(), NO_BKA();
- NO_ICP() hint for disabling index condition pushdown;
- MRR(), MO_MRR() hint for multi-range reads control;
- NO_RANGE_OPTIMIZATION() for disabling range optimization;
- QB_NAME() for assigning names for query blocks.
MDEV-36563 Assertion `!mysql_bin_log.is_open()' failed in
THD::mark_tmp_table_as_free_for_reuse
The purpose of this commit is to ensure that creation and changes of
temporary tables are properly and predicable logged to the binary
log. It also fixes some bugs where ROW logging was used in MIXED mode,
when STATEMENT would be a better (and expected) choice.
In this comment STATEMENT stands for logging to binary log in
STATEMENT format, MIXED stands for MIXED binlog format and ROW for ROW
binlog format.
New rules for logging of temporary tables
- CREATE of temporary tables are now by default binlogged only if
STATEMENT binlog format is used. If it is binlogged, 1 is stored in
TABLE_SHARE->table_creation_was_logged. The user can change this
behavior by setting create_temporary_table_binlog_formats to
MIXED,STATEMENT in which case the create is logged in statement
format also in MIXED mode (as before).
- Changes to temporary tables are only binlogged if and only if
the CREATE was logged. The logging happens under STATEMENT or MIXED.
If binlog_format=ROW, temporary table changes are not binlogged. A
temporary table that are changed under ROW are marked as 'not up to
date in binlog' and no future row changes are logged. Any usage of
this temporary table will force row logging of other tables in any
future statements using the temporary table to be row logged.
- DROP TEMPORARY is binlogged only of the CREATE was binlogged.
Changes done:
- Row logging is forced for any statement using temporary tables that
are not up to date in the binary log.
(Before the row logging was forced if the user has a temporary table)
- If there is any changes to the temporary table that is not binlogged,
the table is marked as not up to date.
- TABLE_SHARE->table_creation_was_logged has a new definition for
temporary tables:
0 Table creating was not logged to binary log
1 Table creating was logged to binary log and table is up to date.
2 Table creating was logged to binary log but some changes where
not logged to binary log.
Table is not up to date in binary log is defined as value 0 or 2.
- If a multi-table-update or multi-table-delete fails then
all updated temporary tables are marked as not up to date.
- Enforce row logging if the query is using temporary tables
that are not up to date.
Before row logging was enforced if the user had any
temporary tables.
- When dropping temporary tables use IF EXISTS. This ensures
that slave will not stop if it had crashed and lost the
temporary tables.
- Remove comment and version from DROP /*!4000 TEMPORARY.. generated when
a connection closes that has open temporary tables. Added 'generated by
server' at the end of the DROP.
Bugs fixed:
- When using temporary tables with commands that forced row based,
like INSERT INTO temporary_table VALUES (UUID()), this was never
logged which causes the temporary table to be inconsistent on
master and slave.
- Used binlog format is now clearly defined. It is now only depending
on the current binlog_format and the tables used.
Before it was depending on the user had ANY temporary tables and
the state of 'current_stmt_binlog_format' set by previous queries.
This also caused temporary tables to be logged to binary log in
some cases.
- CREATE TABLE t1 LIKE not_logged_temporary_table caused replication
to stop.
- Rename of not binlogged temporary tables where binlogged to binary log
which caused replication to stop.
Changes in behavior:
- By default create_temporary_table_binlog_formats=STATEMENT, which
means that CREATE TEMPORARY is not logged to binary log under MIXED
binary logging. This can be changed by setting
create_temporary_table_binlog_formats to MIXED,STATEMENT.
- Using temporary tables that was not logged to the binary log will
cause any query using them for updating other tables to be logged in
ROW format. Before all queries was logged in ROW format if the user had
any temporary tables, even if they were not used by the query.
- Generated DROP TEMPORARY TABLE is now always using IF EXISTS and
has a "generated by server" comment in the binary log.
The consequences of the above is that manipulations of a lot of rows
through temporary tables will by default be be slower in mixed mode.
For example:
BEGIN;
CREATE TEMPORARY TABLE tmp AS SELECT a, b, c FROM
large_table1 JOIN large_table2 ON ...;
INSERT INTO other_table SELECT b, c FROM tmp WHERE a <100;
DROP TEMPORARY TABLE tmp;
COMMIT;
By default this will create a huge entry in the binary log, compared
to just a few hundred bytes in statement mode. However the change in
this commit will make usage of temporary tables more reliable and
predicable and is thus worth it. Using statement mode or
create_temporary_table_binlog_formats can be used to avoid this issue.
Get rid of need of matherialization for usual INSERT (cache results in
Item_cache* if needed)
- subqueries in VALUE do not see new records in the table we are
inserting to
- subqueries in RETIRNING prohibited to use the table we are inserting to
If one of the selected field is a MIN or MAX and it has been optimized
into a constant, it is not added to the temp table used by a group by
handler (GBH). The GBH therefore cannot store results to this missing
field.
On the other hand, when SELECTing from a view or a derived table,
TMP_TABLE_ALL_COLUMNS is set. If the query has no group by or order
by, an Item_temptable_field is created for this MIN/MAX field and
added to the JOIN. Since the GBH could not store results to the
corresponding field in the temp table, the value of this
Item_temptable_field remains NULL. And the NULL value is passed to the
record, then the temp row, and finally output as the (wrong) result.
To fix this, we opt to not creating a spider GBH when a view or
derived table is involved.
This fixes spider/bugfix.mdev_26345 for --view-protocol
Also fixed a comment:
TABLE_LIST::belong_to_derived is NULL if the table belongs to a
derived table that has non-MERGE type.
Propagate discard/import tablespace request to hlindexes.
Let FLUSH TABLES ... FOR EXPORT open/lock hlindexes, so that InnoDB
prepares hlindexes for export.
Moved reset_hlindexes() to external_lock(F_UNLCK), so that hlindexes
are available for export until UNLOCK TABLES.
Closes#3631
* rpl.rpl_system_versioning_partitions updated for MDEV-32188
* innodb.row_size_error_log_warnings_3 changed error for MDEV-33658
(checks are done in a different order)
DELAYED with virtual columns
Segfault was cause by two different copies of same Field instance in
prepared delayed insert. One was made by
Delayed_insert::get_local_table() (see make_new_field()). That copy
went through parse_vcol_defs() and received new vcol_info->expr.
Another one was made by copy_keys_from_share() by this code:
/*
We are using only a prefix of the column as a key:
Create a new field for the key part that matches the index
*/
field= key_part->field=field->make_new_field(root, outparam, 0);
field->field_length= key_part->length;
So, key_part and table got different objects of same field and the
crash was because key_part->field->vcol_info->expr is NULL.
The fix does update_keypart_vcol_info() to update vcol_info->expr in
key_part->field.
Cleanup: memdup_vcol() is static inline instead of macro + check OOM.
ALTER TABLE needs to open hlindex tables early enough, right after they
were created, so that cleanup after an error would see and delete them.
But they need to be external_lock-ed only in copy_data_between_tables,
after mysql_trans_prepare_alter_copy_data().
Let's move locking out of hlindex_open() into hlindex_lock()
support SQL semantics for SELECT ... WHERE ... ORDER BY ... LIMIT
* switch from returning k nearest neighbors to returning
as many as needed, in k-neighbor chunks, with increasing distance
* make search_layer() skips nodes that are closer than a threshold
* read_next keeps a search context - list of k found nodes,
threshold, ctx, etc.
* when the list of found nodes is exhausted, it repeats the search
starting from last found nodes and a threshold
* search context kepts ctx->refcount incremented, so ctx won't go away
* but commit_lock is unlocked between calls, so InnoDB can modify the table
* use ctx version to detect that, switch to MHNSW_Trx when it happens
bugfix:
* use the correct lock in ha_external_lock() for the graph table
* InnoDB didn't reset locks on ha_external_lock(F_UNLCK) and previous
LOCK_X leaked into the next statement
* fix the truncate-by-handler variant, used by InnoDB
* test that insert works after truncate, meaning graph table was emptied
* test that the vector index size is zero after truncate in MyISAM
When the source row is deleted, mark the corresponding node in HNSW
index by setting `tref` to null. An index is added for the `tref` in
secondary table for faster searching of the to-be-marked nodes.
The nodes marked as deleted will still be used for search, but will not
be included in the final query results.
As skipping deleted nodes and not adding deleted nodes for new-inserted
nodes' neighbor list could impact the performance, we now only skip
these nodes in search results.
- for some reason the bitmap is not set for hlindex during the delete so
I had to temporarily comment out one line
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.
* preserve the graph in memory between statements
* keep it in a TABLE_SHARE, available for concurrent searches
* nodes are generally read-only, walking the graph doesn't change them
* distance to target is cached, calculated only once
* SIMD-optimized bloom filter detects visited nodes
* nodes are stored in an array, not List, to better utilize bloom filter
* auto-adjusting heuristic to estimate the number of visited nodes
(to configure the bloom filter)
* many threads can concurrently walk the graph. MEM_ROOT and Hash_set
are protected with a mutex, but walking doesn't need them
* up to 8 threads can concurrently load nodes into the cache,
nodes are partitioned into 8 mutexes (8 is chosen arbitrarily, might
need tuning)
* concurrent editing is not supported though
* this is fine for MyISAM, TL_WRITE protects the TABLE_SHARE and the
graph (note that TL_WRITE_CONCURRENT_INSERT is not allowed, because an
INSERT into the main table means multiple UPDATEs in the graph)
* InnoDB uses secondary transaction-level caches linked in a list in
in thd->ha_data via a fake handlerton
* on rollback the secondary cache is discarded, on commit nodes
from the secondary cache are invalidated in the shared cache
while it is exclusively locked
* on savepoint rollback both caches are flushed. this can be improved
in the future with a row visibility callback
* graph size is controlled by @@mhnsw_cache_size, the cache is flushed
when it reaches the threshold
MDEV-33407 Parser support for vector indexes
The syntax is
create table t1 (... vector index (v) ...);
limitation:
* v is a binary string and NOT NULL
* only one vector index per table
* temporary tables are not supported
MDEV-33404 Engine-independent indexes: subtable method
added support for so-called "high level indexes", they are not visible
to the storage engine, implemented on the sql level. For every such
an index in a table, say, t1, the server implicitly creates a second
table named, like, t1#i#05 (where "05" is the index number in t1).
This table has a fixed structure, no frm, not accessible directly,
doesn't go into the table cache, needs no MDLs.
MDEV-33406 basic optimizer support for k-NN searches
for a query like SELECT ... ORDER BY func() optimizer will use
item_func->part_of_sortkey() to decide what keys can be used
to resolve ORDER BY.
let the caller tell init_tmp_table_share() whether the table
should be thread_specific or not.
In particular, internal tmp tables created in the slave thread
are perfectly thread specific
to explicit row_start/row_end columns
In case of adding both system fields of same type (length, unsigned
flag) as old implicit system fields do the rename of implicit system
fields to the ones specified in ALTER, remove SYSTEM_INVISIBLE flag in
that case. Correct PERIOD clause must be specified in ALTER as well.
MDEV-34904 Inplace alter for implicit to explicit versioning is broken
Whether ALTER goes inplace and how it goes inplace depends on
handler_flags which goes from alter_info->flags by this logic:
ha_alter_info->handler_flags|= (alter_info->flags & ~flags_to_remove);
ALTER_VERS_EXPLICIT was not in flags_to_remove and its value (1ULL <<
35) clashed with ALTER_ADD_NON_UNIQUE_NON_PRIM_INDEX.
ALTER_VERS_EXPLICIT must not affect inplace, it is SQL-only so we
remove it from handler_flags.
Fix for MDEV-31466 - add optional derived table column names.
Column names within a SELECT_LEX structure can be left in a non-reparsable
state (as printed out from *::print) after JOIN::prepare. This caused
an incorrect view definition to be written into the .FRM file.
Fixed by resetting item list names in SELECT_LEX structures representing
derived tables before writing out the view definition.
Reviewed by Igor Babaev (igor@mariadb.com)
Extend derived table syntax to support column name assignment.
(subquery expression) [as|=] ident [comma separated column name list].
Prior to this patch, the optional comma separated column name list is
not supported.
Processing within the unit of the subquery expression will use
original column names, outside the unit will use the new names.
For example, in the query
select a1, a2 from
(select c1, c2, c3 from t1 where c2 > 0) as dt (a1, a2, a3)
where a2 > 10;
we see the second column of the derived table dt being used both within,
(where c2 > 0), and outside, (where a2 > 10), the specification.
Both conditions apply to t1.c2.
When multiple unit preparations are required, such as when being used within
a prepared statement or procedure, original column names are needed for
correct resolution. Original names are reset within mysql_derived_reinit().
Item_holder items, used for result tables in both TVC and union preparations
are renamed before use within st_select_lex_unit::prepare().
During wildcard expansion, if column names are present, items names are
set directly after creation.
Reviewed by Igor Babaev (igor@mariadb.com)
Field_blob::store() has special code for GROUP_CONCAT temporary table
(to store blob values in Blob_mem_storage - this prevents them
from being freed/overwritten when a next row is read).
Field_geom and Field_blob_compressed inherit from Field_blob but they
have their own ::store() method without this special Blob_mem_storage
support.
Considering that non-grouping CONCAT() of such fields converts
them to plain BLOB, let's do the same for GROUP_CONCAT. To do it,
Item_func_group_concat::setup will signal that it's creating
a temporary table for GROUP_CONCAT, and Field_blog::make_new_field()
override will create base Field_blob when under group concat.