1
0
mirror of https://github.com/MariaDB/server.git synced 2025-09-13 13:47:59 +03:00
Commit Graph

866 Commits

Author SHA1 Message Date
Oleg Smirnov
877e4a386c MDEV-33281 Implement optimizer hints
This commit introduces:
    - the infrastructure for optimizer hints;
    - hints for join buffering: BNL(), NO_BNL(), BKA(), NO_BKA();
    - NO_ICP() hint for disabling index condition pushdown;
    - MRR(), MO_MRR() hint for multi-range reads control;
    - NO_RANGE_OPTIMIZATION() for disabling range optimization;
    - QB_NAME() for assigning names for query blocks.
2025-05-05 12:02:47 +07:00
Alexander Barkov
6340c23933 MDEV-33281 Implement optimizer hints
Implementing a recursive descent parser for optimizer hints.
2025-05-05 12:02:43 +07:00
Alexander Barkov
f11504af51 MDEV-20034 Add support for the pre-defined weak SYS_REFCURSOR
This patch adds support for SYS_REFCURSOR (a weakly typed cursor)
for both sql_mode=ORACLE and sql_mode=DEFAULT.

Works as a regular stored routine variable, parameter and return value:

- can be passed as an IN parameter to stored functions and procedures
- can be passed as an INOUT and OUT parameter to stored procedures
- can be returned from a stored function

Note, strongly typed REF CURSOR will be added separately.

Note, to maintain dependencies easier, some parts of sql_class.h
and item.h were moved to new header files:

- select_results.h:
  class select_result_sink
  class select_result
  class select_result_interceptor

- sp_cursor.h:
  class sp_cursor_statistics
  class sp_cursor

- sp_rcontext_handler.h
  class Sp_rcontext_handler and its descendants

The implementation consists of the following parts:
- A new class sp_cursor_array deriving from Dynamic_array

- A new class Statement_rcontext which contains data shared
  between sub-statements of a compound statement.
  It has a member m_statement_cursors of the sp_cursor_array data type,
  as well as open cursor counter. THD inherits from Statement_rcontext.

- A new data type handler Type_handler_sys_refcursor in plugins/type_cursor/
  It is designed to store uint16 references -
  positions of the cursor in THD::m_statement_cursors.

- Type_handler_sys_refcursor suppresses some derived numeric features.
  When a SYS_REFCURSOR variable is used as an integer an error is raised.

- A new abstract class sp_instr_fetch_cursor. It's needed to share
  the common code between "OPEN cur" (for static cursors) and
  "OPER cur FOR stmt" (for SYS_REFCURSORs).

- New sp_instr classes:
  * sp_instr_copen_by_ref      - OPEN sys_ref_curor FOR stmt;
  * sp_instr_cfetch_by_ref     - FETCH sys_ref_cursor INTO targets;
  * sp_instr_cclose_by_ref     - CLOSE sys_ref_cursor;
  * sp_instr_destruct_variable - to destruct SYS_REFCURSOR variables when
                                 the execution goes out of the BEGIN..END block
                                 where SYS_REFCURSOR variables are declared.
- New methods in LEX:
  * sp_open_cursor_for_stmt   - handles "OPEN sys_ref_cursor FOR stmt".
  * sp_add_instr_fetch_cursor - "FETCH cur INTO targets" for both
                                static cursors and SYS_REFCURSORs.
  * sp_close - handles "CLOSE cur" both for static cursors and SYS_REFCURSORs.

- Changes in cursor functions to handle both static cursors and SYS_REFCURSORs:
  * Item_func_cursor_isopen
  * Item_func_cursor_found
  * Item_func_cursor_notfound
  * Item_func_cursor_rowcount

- A new system variable @@max_open_cursors - to limit the number
  of cursors (static and SYS_REFCURSORs) opened at the same time.
  Its allowed range is [0-65536], with 50 by default.

- A new virtual method Type_handler::can_return_bool() telling
  if calling item->val_bool() is allowed for Items of this data type,
  or if otherwise the "Illegal parameter for operation" error should be raised
  at fix_fields() time.

- New methods in Sp_rcontext_handler:
  * get_cursor()
  * get_cursor_by_ref()

- A new class Sp_rcontext_handler_statement to handle top level statement
  wide cursors which are shared by all substatements.

- A new virtual method expr_event_handler() in classes Item and Field.
  It's needed to close (and make available for a new OPEN)
  unused THD::m_statement_cursors elements which do not have any references
  any more. It can happen in various moments in time, e.g.
  * after evaluation parameters of an SQL routine
  * after assigning a cursor expression into a SYS_REFCURSOR variable
  * when leaving a BEGIN..END block with SYS_REFCURSOR variables
  * after setting OUT/INOUT routine actual parameters from formal
    parameters.
2025-04-19 10:59:58 +04:00
Marko Mäkelä
bb9f010432 Merge 11.4 into 11.8 2025-03-05 20:39:47 +02:00
Sergei Petrunia
ef966af801 MDEV-30877: Output cardinality for derived table ignores GROUP BY
(Variant 3) (commit in 11.4)
When a derived table has a GROUP BY clause:

  SELECT ...
    FROM  (SELECT ... GROUP BY col1, col2) AS tbl

The optimizer would use inner join's output cardinality as an estimate
of derived table size, ignoring the fact that GROUP BY operation would
produce much fewer groups.

Add code to produce tighter bounds:
- The GROUP BY list is split into per-table lists. If GROUP BY list has
  expressions that refer to multiple tables, we fall back to join output
  cardinality.
- For each table, the first cardinality estimate is join_tab->read_records.
- Then, we try to get a tighter bound by using index statistics.
- If indexes do not cover all GROUP BY columns, we try to use per-column
  EITS statistics.
2025-02-10 22:06:49 +02:00
Sergei Petrunia
1c2a83179d MDEV-35616: Add basic optimizer support for virtual column
(Review input addressed)

After this patch, the optimizer can handle virtual column expressions
in WHERE/ON clauses. If the table has an indexed virtual column:

  ALTER TABLE t1
    ADD COLUMN vcol INT AS (col1+1),
    ADD INDEX idx1(vcol);

and the query uses the exact virtual column expression:

  SELECT * FROM t1 WHERE col1+1 <= 100

then the optimizer will be able use index idx1 for it.

This is achieved by walking the WHERE/ON clauses and replacing instances
of virtual column expression (like "col1+1" above) with virtual column's
Item_field (like "vcol"). The latter can be processed by the optimizer.

Replacement is considered (and done) only in items that are potentially
usable to the range optimizer.
2025-01-25 10:50:52 +02:00
Yuchen Pei
e021770667 MDEV-34911 Sargable substr(col, 1, n) = str
Make Item_func_eq of the following forms sargable by updating the relevant range
analysis methods:

1. substr(col, 1, n) = str
2. str = substr(col, 1, n)
3. left(col, n) = str
4. str = left(col, n)

where col is a indexed column and str is a const and inexpensive item
of length n.

We do this by factoring out Item_func_like::get_mm_leaf() and apply it
to a string obtained from escaping str and then appending a wildcard
"%" to it.

The addition of the two Functype enums, LEFT_FUNC and SUBSTR_FUNC,
requires changes in the spider group by handler to continue handling
LEFT and SUBSTR correctly.

Co-authored-by: Yuchen Pei <ycp@mariadb.com>
Co-authored-by: Sergei Petrunia <sergey@mariadb.com>
2024-12-20 13:25:28 +11:00
Sergei Golubchik
78119d1ae5 MDEV-33410 VECTOR data type 2024-11-05 14:00:51 -08:00
Sergei Golubchik
d6add9a03d initial support for vector indexes
MDEV-33407 Parser support for vector indexes

The syntax is

  create table t1 (... vector index (v) ...);

limitation:
* v is a binary string and NOT NULL
* only one vector index per table
* temporary tables are not supported

MDEV-33404 Engine-independent indexes: subtable method

added support for so-called "high level indexes", they are not visible
to the storage engine, implemented on the sql level. For every such
an index in a table, say, t1, the server implicitly creates a second
table named, like, t1#i#05 (where "05" is the index number in t1).
This table has a fixed structure, no frm, not accessible directly,
doesn't go into the table cache, needs no MDLs.

MDEV-33406 basic optimizer support for k-NN searches

for a query like SELECT ... ORDER BY func() optimizer will use
item_func->part_of_sortkey() to decide what keys can be used
to resolve ORDER BY.
2024-11-05 14:00:48 -08:00
Sergei Golubchik
9ccf02a9a7 MDEV-32885 VEC_DISTANCE() function 2024-11-05 14:00:48 -08:00
Libing Song
72cc58bb71 MDEV-32014 Rename binlog cache temporary file to binlog file
for large transaction

Description
===========
When a transaction commits, it copies the binlog events from
binlog cache to binlog file. Very large transactions
(eg. gigabytes) can stall other transactions for a long time
because the data is copied while holding LOCK_log, which blocks
other commits from binlogging.

The solution in this patch is to rename the binlog cache file to
a binlog file instead of copy, if the commiting transaction has
large binlog cache. Rename is a very fast operation, it doesn't
block other transactions a long time.

Design
======
* binlog_large_commit_threshold
  type: ulonglong
  scope: global
  dynamic: yes
  default: 128MB

  Only the binlog cache temporary files large than 128MB are
  renamed to binlog file.

* #binlog_cache_files directory
  To support rename, all binlog cache temporary files are managed
  as normal files now. `#binlog_cache_files` directory is in the same
  directory with binlog files. It is created at server startup if it doesn't
  exist. Otherwise, all files in the directory is deleted at startup.

  The temporary files are named with ML_ prefix and the memorary address
  of the binlog_cache_data object which guarantees it is unique.

* Reserve space
  To supprot rename feature, It must reserve enough space at the
  begin of the binlog cache file. The space is required for
  Format description, Gtid list, checkpoint and Gtid events when
  renaming it to a binlog file.

  Since binlog_cache_data's cache_log is directly accessed by binlog log,
  online alter and wsrep. It is not easy to update all the code. Thus
  binlog cache will not reserve space if it is not session binlog cache or
  wsrep session is enabled.

  - m_file_reserved_bytes
    Stores the bytes reserved at the begin of the cache file.
    It is initialized in write_prepare() and cleared by reset().

    The reserved file header is hide to callers. Thus there is no
    change for callers. E.g.
    - get_byte_position() still get the length of binlog data
      written to the cache, but not the file length.
    - truncate(0) will truncate the file to m_file_reserved_bytes but not 0.

  - write_prepare()
    write_prepare() is called everytime when anything is being written
    into the cache. It will call init_file_reserved_bytes() to  create
    the cache file (if it doesn't exist) and reserve suitable space if
    the data written exceeds buffer's size.

* Binlog_commit_by_rotate
  It is used to encapsulate the code for remaing a binlog cache
  tempoary file to binlog file.
  - should_commit_by_rotate()
    it is called by write_transaction_to_binlog_events() to check if
    a binlog cache should be rename to a binlog file.
  - commit()
    That is the entry to rename a binlog cache and commit the
    transaction. Both rename and commit are protected by LOCK_log,
    Thus not other transactions can write anything into the renamed
    binlog before it.

    Rename happens in a rotation. After the new binlog file is generated,
    replace_binlog_file() is called to:
    - copy data from the new binlog file to its binlog cache file.
    - write gtid event.
    - rename the binlog cache file to binlog file.

    After that the rotation will continue to succeed. Then the transaction
    is committed in a seperated group itself. Its cache file will be
    detached and cache log will be reset before calling
    trx_group_commit_with_engines(). Thus only Xid event be written.
2024-10-17 07:53:59 -06:00
Oleksandr Byelkin
99b370e023 Merge branch '11.2' into 11.4 2024-05-21 19:38:51 +02:00
Sergei Golubchik
bf5da43e50 Merge branch '11.1' into 11.2 2024-05-13 10:00:26 +02:00
Sergei Golubchik
f0a5412037 Merge branch '11.0' into 11.1 2024-05-13 09:52:30 +02:00
Sergei Golubchik
f9807aadef Merge branch '10.11' into 11.0 2024-05-12 12:18:28 +02:00
Sergei Golubchik
a6b2f820e0 Merge branch '10.6' into 10.11 2024-05-10 20:02:18 +02:00
Sergei Golubchik
7b53672c63 Merge branch '10.5' into 10.6 2024-05-08 20:06:00 +02:00
Vladislav Vaintroub
029e2a5fd9 MDEV-33876 CMake, zlib - use names compatible with official FindZLIB.cmake
- ZLIB_LIBRARIES, not ZLIB_LIBRARY
- ZLIB_INCLUDE_DIRS, not ZLIB_INCLUDE_DIR

For building libmariadb, ZLIB_LIBRARY/ZLIB_INCLUDE_DIR are still defined
This workaround will be removed later.
2024-05-03 21:48:47 +02:00
Sergei Golubchik
49f2e9f700 update bison version dependency
need at least 2.4 because we use
per-type %destructor that was added after 2.3 in 2.3a
2024-03-27 16:14:55 +01:00
Oleksandr Byelkin
fa69b085b1 Merge branch '11.3' into 11.4 2024-02-15 13:53:21 +01:00
Oleksandr Byelkin
d21cb43db1 Merge branch '11.2' into 11.3 2024-02-04 16:42:31 +01:00
Sergei Golubchik
79580f4f96 Merge branch '11.1' into 11.2 2024-02-02 17:43:57 +01:00
Sergei Golubchik
b6680e0101 Merge branch '11.0' into 11.1 2024-02-02 11:30:47 +01:00
Sergei Golubchik
87e13722a9 Merge branch '10.6' into 10.11 2024-02-01 18:36:14 +01:00
Oleksandr Byelkin
fe490f85bb Merge branch '10.11' into 11.0 2024-01-30 08:54:10 +01:00
Oleksandr Byelkin
14d930db5d Merge branch '10.6' into 10.11 2024-01-30 08:17:58 +01:00
Kristian Nielsen
d039346a7a MDEV-4991: GTID binlog indexing
Improve the performance of slave connect using B+-Tree indexes on each binlog
file. The index allows fast lookup of a GTID position to the corresponding
offset in the binlog file, as well as lookup of a position to find the
corresponding GTID position.

This eliminates a costly sequential scan of the starting binlog file
to find the GTID starting position when a slave connects. This is
especially costly if the binlog file is not cached in memory (IO
cost), or if it is encrypted or a lot of slaves connect simultaneously
(CPU cost).

The size of the index files is generally less than 1% of the binlog data, so
not expected to be an issue.

Most of the work writing the index is done as a background task, in
the binlog background thread. This minimises the performance impact on
transaction commit. A simple global mutex is used to protect index
reads and (background) index writes; this is fine as slave connect is
a relatively infrequent operation.

Here are the user-visible options and status variables. The feature is on by
default and is expected to need no tuning or configuration for most users.

binlog_gtid_index
  On by default. Can be used to disable the indexes for testing purposes.

binlog_gtid_index_page_size (default 4096)
  Page size to use for the binlog GTID index. This is the size of the nodes
  in the B+-tree used internally in the index. A very small page-size (64 is
  the minimum) will be less efficient, but can be used to stress the
  BTree-code during testing.

binlog_gtid_index_span_min (default 65536)
  Control sparseness of the binlog GTID index. If set to N, at most one
  index record will be added for every N bytes of binlog file written.
  This can be used to reduce the number of records in the index, at
  the cost only of having to scan a few more events in the binlog file
  before finding the target position

Two status variables are available to monitor the use of the GTID indexes:

  Binlog_gtid_index_hit
  Binlog_gtid_index_miss

The "hit" status increments for each successful lookup in a GTID index.
The "miss" increments when a lookup is not possible. This indicates that the
index file is missing (eg. binlog written by old server version
without GTID index support), or corrupt.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-01-27 12:09:54 +01:00
Monty
1ca813bf76 Added socketpair.c as a replacement for 'pipe()' call for Windows.
This was needed to get semisync to work on Windows.
2024-01-23 13:03:11 +02:00
Marko Mäkelä
35cc4b6c05 Merge 10.11 into 11.0 2024-01-22 10:10:50 +02:00
Marko Mäkelä
b3ca7fa089 Merge 10.6 into 10.11 2024-01-22 08:49:04 +02:00
Sergei Golubchik
3a33ae8601 MDEV-33091 pcre2 headers aren't found on Solaris
use pkg-config to find pcre2, if possible

rename PCRE_INCLUDES to use PKG_CHECK_MODULES naming, PCRE_INCLUDE_DIRS
2024-01-19 20:23:45 +01:00
Sergei Golubchik
c154aafe1a Merge remote-tracking branch '11.3' into 11.4 2023-12-21 15:40:55 +01:00
Vladislav Vaintroub
3424ed7d42 MDEV-32189 Use icu for timezones on windows
Use ICU to work with timezones, to retrieve current timezone name,
abbreviation, and offset from GMT. However in case TZ environment variable
is used to set timezone, and ICU does not have corresponding one,
C runtime functions will be used.

Moved some of timezone handling to mysys.
Added unit tests.
2023-11-21 21:35:02 +01:00
Oleksandr Byelkin
34272bd6a5 Merge branch '11.2' into 11.3 2023-11-14 18:33:03 +01:00
Oleksandr Byelkin
0427c4739e Merge tag '11.1' into 11.2
MariaDB 11.1.3 release
2023-11-14 18:28:37 +01:00
Oleksandr Byelkin
0f5613a25f Merge branch '11.0' into 11.1 2023-11-08 18:03:08 +01:00
Oleksandr Byelkin
48af85db21 Merge branch '10.11' into 11.0 2023-11-08 17:09:44 +01:00
Nikita Malyavin
cb52174693 online alter: extract the source to a separate file
Move all the functions dedicated to online alter to a newly created
online_alter.cc.

With that, make many functions static and simplify the static functions
naming.

Also, rename binlog_log_row_online_alter -> online_alter_log_row.
2023-11-02 22:58:03 +04:00
Nikita Malyavin
830bdfccbd MDEV-32126 Assertion fails upon online ALTER and binary log enabled
Assertion `!writer.checksum_len || writer.remains == 0' fails upon
concurrent online ALTER and transactions with failing statements and binary
log enabled.
Also another assertion, `pos != (~(my_off_t) 0)', fails in my_seek, upon
reinit_io_cache, on a simplified test. This means that IO_CACHE wasn't
properly initialized, or had an error before.

The overall problem is a deep interference with the effect of an installed
binlog_hton: the assumption about that thd->binlog_get_cache_mngr() is,
sufficiently, NULL, when we shouldn't run the binlog part of
binlog_commit/binlog_rollback, is wrong: as turns out, sometimes the binlog
handlerton can be not installed in current thd, but binlog_commit can be
called on behalf of binlog, as in the bug reported.

One separate condition found is XA recovery of the orphaned transaction,
when binlog_commit is also called, but it has nothing to do with
online alter.

Solution:
Extract online alter operations into a separate handlerton.
2023-11-02 22:58:03 +04:00
Marko Mäkelä
7b842f1536 Merge 11.2 into 11.3 2023-10-27 10:48:29 +03:00
Yuchen Pei
d0f8dfbcf0 Merge branch '11.1' into 11.2 2023-10-27 18:11:56 +11:00
Marko Mäkelä
5a8fca5a4f Merge 10.6 into 10.10 2023-10-23 18:43:36 +03:00
Sergei Petrunia
4941ac9192 MDEV-32113: utf8mb3_key_col=utf8mb4_value cannot be used for ref
(Variant#3: Allow cross-charset comparisons, use a special
CHARSET_INFO to create lookup keys. Review input addressed.)

Equalities that compare utf8mb{3,4}_general_ci strings, like:

  WHERE ... utf8mb3_key_col=utf8mb4_value    (MB3-4-CMP)

can now be used to construct ref[const] access and also participate
in multiple-equalities.
This means that utf8mb3_key_col can be used for key-lookups when
compared with an utf8mb4 constant, field or expression using '=' or
'<=>' comparison operators.

This is controlled by optimizer_switch='cset_narrowing=on', which is
OFF by default.

IMPLEMENTATION
Item value comparison in (MB3-4-CMP) is done using utf8mb4_general_ci.
This is valid as any utf8mb3 value is also an utf8mb4 value.

When making index lookup value for utf8mb3_key_col, we do "Charset
Narrowing": characters that are in the Basic Multilingual Plane (=BMP) are
copied as-is, as they can be represented in utf8mb3. Characters that are
outside the BMP cannot be represented in utf8mb3 and are replaced
with U+FFFD, the "Replacement Character".

In utf8mb4_general_ci, the Replacement Character compares as equal to any
character that's not in BMP. Because of this, the constructed lookup value
will find all index records that would be considered equal by the original
condition (MB3-4-CMP).

Approved-by: Monty <monty@mariadb.org>
2023-10-19 17:24:30 +03:00
Marko Mäkelä
9b2a65e41a Merge 11.0 into 11.1 2023-10-19 08:26:16 +03:00
Marko Mäkelä
be24e75229 Merge 10.11 into 11.0 2023-10-19 08:12:16 +03:00
Marko Mäkelä
d5e15424d8 Merge 10.6 into 10.10
The MDEV-29693 conflict resolution is from Monty, as well as is
a bug fix where ANALYZE TABLE wrongly built histograms for
single-column PRIMARY KEY.
Also includes a fix for safe_malloc error reporting.

Other things:
- Copied main.log_slow from 10.4 to avoid mtr issue

Disabled test:
- spider/bugfix.mdev_27239 because we started to get
  +Error	1429 Unable to connect to foreign data source: localhost
  -Error	1158 Got an error reading communication packets
- main.delayed
  - Bug#54332 Deadlock with two connections doing LOCK TABLE+INSERT DELAYED
    This part is disabled for now as it fails randomly with different
    warnings/errors (no corruption).
2023-10-14 13:36:11 +03:00
Sergei Petrunia
e987b9350c MDEV-31496: Make optimizer handle UCASE(varchar_col)=...
(Review input addressed)
(Added handling of UPDATE/DELETE and partitioning w/o index)

If the properties of the used collation allow, do the following
equivalent rewrites:

1. UPPER(key_col)=expr  ->  key_col=expr
   expr=UPPER(key_col)  ->  expr=key_col
   (also rewrite both sides of the equality at the same time)

2. UPPER(key_col) IN (constant-list)  -> key_col IN (constant-list)

- Mark utf8mb{3,4}_general_ci as collations that allow this.
- Add optimizer_switch='sargable_casefold=ON' to control this.
  (ON by default in this patch)
- Cover the rewrite in Optimizer Trace, rewrite name is
  "sargable_casefold_removal".
2023-09-12 17:14:43 +03:00
Marko Mäkelä
b0a43818b4 Merge 10.5 into 10.6 2023-09-04 10:15:02 +03:00
Marko Mäkelä
59952b2625 Merge 10.4 into 10.5 2023-09-04 09:40:26 +03:00
Dmitry Shulga
1fde785315 MDEV-31890: Compilation failing on MacOS (unknown warning option -Wno-unused-but-set-variable)
For clang compiler the compiler's flag -Wno-unused-but-set-variable
was set based on compiler version. This approach could result in
false positive detection for presence of compiler option since
only first three groups of digits in compiler version taken into account
and it could lead to inaccuracy in determining of supported compiler's
features.

Correct way to detect options supported by a compiler is to use
the macros  MY_CHECK_CXX_COMPILER_FLAG and to check the result of
variable with prefix have_CXX__
So, to check whether compiler does support the option
 -Wno-unused-but-set-variable
the macros
 MY_CHECK_CXX_COMPILER_FLAG(-Wno-unused-but-set-variable)
should be called and the result variable
 have_CXX__Wno_unused_but_set_variable
be tested for assigned value.
2023-08-28 16:47:00 +07:00