1
0
mirror of https://github.com/MariaDB/server.git synced 2025-09-11 05:52:26 +03:00
Commit Graph

367 Commits

Author SHA1 Message Date
Yuchen Pei
8cdee25952 MDEV-36132 Substitute vcol expressions with indexed vcol fields in ORDER BY and GROUP BY
Also expand vcol field index coverings to include indexes covering all
the fields in the expression. The reasoning goes as follows: let f(c1,
c2, ..., cn) be a function on applied to columns c1, c2, ..., cn, if
f(...) is covered by an index, so should vc whose expression is
f(...).

For example, if t.vf = t.c1 + t.c2, and t has three indexes (vf), (c1,
c2), (c1).

Before this change, vf's index covering is a singleton {(vf)}. Let's call
that the "conventional" index covering.

After this change vf's index covering is now {(vf), (c1, c2)}, since
(c1, c2) covers both c1 and c2. Let's call (c1, c2) in this case the
"extra" covering.

With the coverings updated, when an index in the "extra" covering is
chosen for keyread, the vcol also needs to be calculated. In this case
we mark vcol in the table read_set, and ensure it is computed.

With these changes, we see various improvements, including from using
full table scan + filesort to full index scan + filesort when ORDER BY
an indexed vcol (here vc = c + 1 is a vcol and both c and vc are
indexes):

 explain select c + 1 from t order by vc;
 id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
-1	SIMPLE	t	ALL	NULL	NULL	NULL	NULL	10000	Using filesort
+1	SIMPLE	t	index	NULL	c	5	NULL	10000	Using index; Using filesort

The substitutions are followed updates to all_fields which include a
copy of the ORDER BY/GROUP BY item pointers, as well as corresponding
updates to ref_pointer_array so that the all_fields and
ref_pointer_array remain in sync.

Another, related change is the recomputation of table index covering
on substitutions. It not only reflects the correct table index
covering after the substitutions, but also improve executions where
the vcol index can be chosen, such as this example (here vc = c + 1
and vc is the only index in the table), from full table scan +
filesort to full index scan:

select vc from t order by c + 1;

We do it in SELECT as well as in single table DELETE/UPDATE.
2025-07-22 10:44:12 +10:00
Sergei Golubchik
bead24b7f3 mariadb-test: wait on disconnect
Remove one of the major sources of race condiitons in mariadb-test.
Normally, mariadb_close() sends COM_QUIT to the server and immediately
disconnects. In mariadb-test it means the test can switch to another
connection and sends queries to the server before the server even
started parsing the COM_QUIT packet and these queries can see the
connection as fully active, as it didn't reach dispatch_command yet.

This is a major source of instability in tests and many - but not all,
still less than a half - tests employ workarounds. The correct one
is a pair count_sessions.inc/wait_until_count_sessions.inc.
Also very popular was wait_until_disconnected.inc, which was completely
useless, because it verifies that the connection is closed, and after
disconnect it always is, it didn't verify whether the server processed
COM_QUIT. Sadly the placebo was as widely used as the real thing.

Let's fix this by making mariadb-test `disconnect` command _to wait_ for
the server to confirm. This makes almost all workarounds redundant.

In some cases count_sessions.inc/wait_until_count_sessions.inc is still
needed, though, as only `disconnect` command is changed:

 * after external tools, like `exec $MYSQL`
 * after failed `connect` command
 * replication, after `STOP SLAVE`
 * Federated/CONNECT/SPIDER/etc after `DROP TABLE`

and also in some XA tests, because an XA transaction is dissociated from
the THD very late, after the server has closed the client connection.

Collateral cleanups: fix comments, remove some redundant statements:
 * DROP IF EXISTS if nothing is known to exist
 * DROP table/view before DROP DATABASE
 * REVOKE privileges before DROP USER
 etc
2025-07-16 09:14:33 +07:00
Marko Mäkelä
cffbb17480 MDEV-28933: Per-table unique FOREIGN KEY constraint names
Before MySQL 4.0.18, user-specified constraint names were ignored.
Starting with MySQL 4.0.18, the specified constraint name was
prepended with the schema name and '/'.  Now we are transforming
into a format where the constraint name is prepended with the
dict_table_t::name and the impossible UTF-8 sequence 0xff.
Generated constraint names will be ASCII decimal numbers.

On upgrade, old FOREIGN KEY constraint names will be displayed
without any schema name prefix. They will be updated to the new
format on DDL operations.

dict_foreign_t::sql_id(): Return the SQL constraint name
without any schemaname/tablename\377 or schemaname/ prefix.

row_rename_table_for_mysql(), dict_table_rename_in_cache():
Simplify the logic: Just rename constraints to the new format.

dict_table_get_foreign_id(): Replaces dict_table_get_highest_foreign_id().

innobase_get_foreign_key_info(): Let my_error() refer to erroneous
anonymous constraints as "(null)".

row_delete_constraint(): Try to drop all 3 constraint name variants.

Reviewed by: Thirunarayanan Balathandayuthapani
Tested by: Matthias Leich
2025-07-08 12:30:27 +03:00
Oleksandr Byelkin
f1102da37a Merge branch '11.8' into 12.0 2025-05-22 09:22:55 +02:00
Oleg Smirnov
4bb2669d18 MDEV-33281 Optimizer hints Cleanup: fix formatting, rename objects 2025-05-05 12:02:47 +07:00
Sergei Golubchik
237e24497b Merge remote-tracking branch 'github/bb-11.4-release' into bb-11.8-serg 2025-04-27 19:40:00 +02:00
Oleksandr Byelkin
a8d4642375 Merge branch '10.11' into 11.4 2025-04-26 10:53:02 +02:00
Oleksandr Byelkin
20b818f45e Merge branch '10.6' into 10.11 2025-04-21 11:23:11 +02:00
Oleksandr Byelkin
a135551569 Merge branch '10.5' into 10.6 2025-04-21 10:43:17 +02:00
Marko Mäkelä
1c9f64e54f MDEV-36613 Incorrect undo logging for indexes on virtual columns
Starting with mysql/mysql-server@02f8eaa998
and commit 2e814d4702 the index ID of
indexes on virtual columns was being encoded insufficiently in
InnoDB undo log records.  Only the least significant 32 bits were
being written.  This could lead to some corruption of the affected
indexes on ROLLBACK, as well as to missed chances to remove some
history from such indexes when purging the history of committed
transactions that included DELETE or an UPDATE in the indexes.

dict_hdr_create(): In debug instrumented builds, initialize the
DICT_HDR_INDEX_ID close to the 32-bit barrier, instead of initializing
it to DICT_HDR_FIRST_ID (10).  This will allow the changed code to
be exercised while running ./mtr --suite=gcol,vcol.

trx_undo_log_v_idx(): Encode large index->id in a similar way as
mysql/mysql-server@e00328b4d0
but using a different implementation.

trx_undo_read_v_idx_low(): Decode large index->id in a similar way
as mach_u64_read_much_compressed().

Reviewed by: Debarun Banerjee
2025-04-16 15:55:45 +03:00
Marko Mäkelä
bb9f010432 Merge 11.4 into 11.8 2025-03-05 20:39:47 +02:00
Marko Mäkelä
49a6baec56 Merge 10.11 into 11.4 2025-03-03 11:07:56 +02:00
Marko Mäkelä
6e6a1b316c MDEV-35000: dict_table_close() breaks STATS_AUTO_RECALC
stats_deinit(): Replaces dict_stats_deinit().
Deinitialize the statistics for persistent tables,
so that they will be reloaded or recalculated
on a subsequent ha_innobase::open().

ha_innobase::rename_table(): Invoke stats_deinit() so that the
subsequent ha_innobase::open() will reload the InnoDB persistent
statistics. That is, it will remain possible to have the InnoDB
persistent statistics reloaded by executing the following:
RENAME TABLE t TO tmp, tmp TO t;

dict_table_close(table): Replaced with table->release().
There will no longer be any logic that would attempt to ensure
that the InnoDB persistent statistics will be reloaded after
FLUSH TABLES has been executed. This also fixes the problem that
dict_table_t::stat_modified_counter would be frequently reset to 0,
whenever ha_innobase::open() is invoked after the table reference
count had dropped to 0.

dict_table_close(table, thd, mdl): Remove the parameter "dict_locked".
Do not try to invalidate the statistics.

ha_innobase::statistics_init(): Replaces dict_stats_init(table).

Reviewed by: Thirunarayanan Balathandayuthapani
2025-02-28 09:00:16 +02:00
Sergei Golubchik
9ee09a33bb Merge branch '11.7' into 11.8 2025-02-11 20:29:43 +01:00
Sergei Golubchik
ba01c2aaf0 Merge branch '11.4' into 11.7
* rpl.rpl_system_versioning_partitions updated for MDEV-32188
* innodb.row_size_error_log_warnings_3 changed error for MDEV-33658
  (checks are done in a different order)
2025-02-06 16:46:36 +01:00
Sergei Petrunia
1c2a83179d MDEV-35616: Add basic optimizer support for virtual column
(Review input addressed)

After this patch, the optimizer can handle virtual column expressions
in WHERE/ON clauses. If the table has an indexed virtual column:

  ALTER TABLE t1
    ADD COLUMN vcol INT AS (col1+1),
    ADD INDEX idx1(vcol);

and the query uses the exact virtual column expression:

  SELECT * FROM t1 WHERE col1+1 <= 100

then the optimizer will be able use index idx1 for it.

This is achieved by walking the WHERE/ON clauses and replacing instances
of virtual column expression (like "col1+1" above) with virtual column's
Item_field (like "vcol"). The latter can be processed by the optimizer.

Replacement is considered (and done) only in items that are potentially
usable to the range optimizer.
2025-01-25 10:50:52 +02:00
Sergei Golubchik
f1a7693bc0 Merge branch '10.11' into 11.4 2025-01-14 23:45:41 +01:00
Sergei Golubchik
221aa5e08f Merge branch '10.6' into 10.11 2025-01-10 13:14:42 +01:00
Sergei Golubchik
a0e5dd5433 mysqltest: fix --sorted_results
only sort actual results not warnings or metadata
also work for vertical results
warnings are sorted separately
2025-01-09 10:00:36 +01:00
Sergei Golubchik
062f8eb37d cleanup: key algorithm vs key flags
the information about index algorithm was stored in two
places inconsistently split between both.

BTREE index could have key->algorithm == HA_KEY_ALG_BTREE, if the user
explicitly specified USING BTREE or HA_KEY_ALG_UNDEF, if not.

RTREE index had key->algorithm == HA_KEY_ALG_RTREE
and always had key->flags & HA_SPATIAL

FULLTEXT index had  key->algorithm == HA_KEY_ALG_FULLTEXT
and always had key->flags & HA_FULLTEXT

HASH index had key->algorithm == HA_KEY_ALG_HASH or HA_KEY_ALG_UNDEF

long unique index always had key->algorithm == HA_KEY_ALG_LONG_HASH

In this commit:

All indexes except BTREE and HASH always have key->algorithm
set, HA_SPATIAL and HA_FULLTEXT flags are not used anymore (except
for storage to keep frms backward compatible).

As a side effect ALTER TABLE now detects FULLTEXT index renames correctly
2024-11-05 14:00:47 -08:00
Alexander Barkov
36eba98817 MDEV-19123 Change default charset from latin1 to utf8mb4
Changing the default server character set from latin1 to utf8mb4.
2024-07-11 10:21:07 +04:00
Alexander Barkov
903b5d6a83 MDEV-25829 Change default Unicode collation to uca1400_ai_ci
Step#3 The main patch
2024-05-24 15:50:05 +04:00
Oleksandr Byelkin
cd28b2479c Merge branch '11.1' into 11.2 2024-04-09 12:12:33 +02:00
Marko Mäkelä
fec2fd6add Merge 10.11 into 11.0 2024-03-28 10:51:36 +02:00
Marko Mäkelä
788953463d Merge 10.6 into 10.11
Some fixes related to commit f838b2d799 and
Rows_log_event::do_apply_event() and Update_rows_log_event::do_exec_row()
for system-versioned tables were provided by Nikita Malyavin.
This was required by test versioning.rpl,trx_id,row.
2024-03-28 09:16:57 +02:00
Marko Mäkelä
b8a6719889 MDEV-26642/MDEV-26643/MDEV-32898 Implement innodb_snapshot_isolation
https://jepsen.io/analyses/mysql-8.0.34 highlights that the
transaction isolation levels in the InnoDB storage engine do not
correspond to any widely accepted definitions, such as
"Generalized Isolation Level Definitions"
https://pmg.csail.mit.edu/papers/icde00.pdf
(PL-1 = READ UNCOMMITTED, PL-2 = READ COMMITTED, PL-2.99 = REPEATABLE READ,
PL-3 = SERIALIZABLE).
Only READ UNCOMMITTED in InnoDB seems to match the above definition.

The issue is that InnoDB does not detect write/write conflicts
(Section 4.4.3, Definition 6) in the above.

It appears that as soon as we implement write/write conflict detection
(SET SESSION innodb_snapshot_isolation=ON), the default isolation level
(SET TRANSACTION ISOLATION LEVEL REPEATABLE READ) will become
Snapshot Isolation (similar to Postgres), as defined in Section 4.2 of
"A Critique of ANSI SQL Isolation Levels", MSR-TR-95-51, June 1995
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-95-51.pdf

Locking reads inside InnoDB used to read the latest committed version,
ignoring what should actually be visible to the transaction.
The added test innodb.lock_isolation illustrates this. The statement
	UPDATE t SET a=3 WHERE b=2;
is executed in a transaction that was started before a read view or
a snapshot of the current transaction was created, and committed before
the current transaction attempts to execute
	UPDATE t SET b=3;
If SET innodb_snapshot_isolation=ON is in effect when the second
transaction was started, the second transaction will be aborted with
the error ER_CHECKREAD. By default (innodb_snapshot_isolation=OFF),
the second transaction would execute inconsistently, displaying an
incorrect SELECT COUNT(*) FROM t in its read view.

If innodb_snapshot_isolation=ON, if an attempt to acquire a lock on a
record that does not exist in the current read view is made, an error
DB_RECORD_CHANGED (HA_ERR_RECORD_CHANGED, ER_CHECKREAD) will
be raised. This error will be treated in the same way as a deadlock:
the transaction will be rolled back.

lock_clust_rec_read_check_and_lock(): If the current transaction has
a read view where the record is not visible and
innodb_snapshot_isolation=ON, fail before trying to acquire the lock.

row_sel_build_committed_vers_for_mysql(): If innodb_snapshot_isolation=ON,
disable the "semi-consistent read" logic that had been implemented by
myself on the directions of Heikki Tuuri in order to address
https://bugs.mysql.com/bug.php?id=3300 that was motivated by a customer
wanting UPDATE to skip locked rows that do not match the WHERE condition.
It looks like my changes were included in the MySQL 5.1.5
commit ad126d90e019f223470e73e1b2b528f9007c4532; at that time, employees
of Innobase Oy (a recent acquisition of Oracle) had lost write access to
the repository.

The only reason why we set innodb_snapshot_isolation=OFF by default is
backward compatibility with applications, such as the one that motivated
the implementation of "semi-consistent read" back in 2005. In a later
major release, we can default to innodb_snapshot_isolation=ON.

Thanks to Peter Alvaro, Kyle Kingsbury and Alexey Gotsman for their work
on https://github.com/jepsen-io/ and to Kyle and Alexey for explanations
and some testing of this fix.

Thanks to Vladislav Lesin for the initial test for MDEV-26643,
as well as reviewing these changes.
2024-03-20 09:48:03 +02:00
Sergei Golubchik
fef31a26f3 Merge branch '11.1' into 11.2 2023-12-20 23:43:05 +01:00
Sergei Golubchik
8c8bce05d2 Merge branch '10.11' into 11.0 2023-12-19 15:53:18 +01:00
Sergei Golubchik
fd0b47f9d6 Merge branch '10.6' into 10.11 2023-12-18 11:19:04 +01:00
Sergei Golubchik
e95bba9c58 Merge branch '10.5' into 10.6 2023-12-17 11:20:43 +01:00
Sergei Golubchik
98a39b0c91 Merge branch '10.4' into 10.5 2023-12-02 01:02:50 +01:00
Marko Mäkelä
0d29f3759c Merge 11.1 into 11.2 2023-11-28 11:19:06 +02:00
Monty
06f7ed4dcd MDEV-28566 Assertion `!expr->is_fixed()' failed in bool virtual_column_info::fix_session_expr(THD*)
The problem was that table->vcol_cleanup_expr() was not called in case
of error in open_table().
2023-11-27 19:08:14 +02:00
Marko Mäkelä
5b6134b040 Merge 10.11 into 11.0 2023-11-24 11:20:56 +02:00
Marko Mäkelä
f87c7d1772 Merge 10.6 into 10.11 2023-11-21 12:47:51 +02:00
Marko Mäkelä
4c16ec3e77 MDEV-32050 fixup: Stabilize tests
In any test that uses wait_all_purged.inc, ensure that InnoDB tables
will be created without persistent statistics.

This is a follow-up to commit cd04673a17
after a similar failure was observed in the innodb_zip.blob test.
2023-11-21 12:42:00 +02:00
Oleksandr Byelkin
0427c4739e Merge tag '11.1' into 11.2
MariaDB 11.1.3 release
2023-11-14 18:28:37 +01:00
Oleksandr Byelkin
48af85db21 Merge branch '10.11' into 11.0 2023-11-08 17:09:44 +01:00
Oleksandr Byelkin
04d9a46c41 Merge branch '10.6' into 10.10 2023-11-08 16:23:30 +01:00
Marko Mäkelä
228b7e4db5 MDEV-13626 Merge InnoDB test cases from MySQL 5.7
This imports and adapts a number of MySQL 5.7 test cases that are
applicable to MariaDB.

Some tests for old bug fixes are not that relevant because the code
has been refactored since then (especially starting with
MariaDB Server 10.6), and the tests would not reproduce the
original bug if the fix was reverted.

In the test innodb_fts.opt, there are many duplicate MATCH ranks, which
would make the results nondeterministic. The test was stabilized by
changing some LIMIT clauses or by adding sorted_result in those cases
where the purpose of a test was to show that no sorting took place
in the server.

In the test innodb_fts.phrase, MySQL 5.7 would generate FTS_DOC_ID that
are 1 larger than in MariaDB. In innodb_fts.index_table the difference is 2.
This is because in MariaDB, fts_get_next_doc_id() post-increments
cache->next_doc_id, while MySQL 5.7 pre-increments it.

Reviewed by: Thirunarayanan Balathandayuthapani
2023-11-08 12:17:14 +02:00
Yuchen Pei
d0f8dfbcf0 Merge branch '11.1' into 11.2 2023-10-27 18:11:56 +11:00
Marko Mäkelä
14685b10df MDEV-32050: Deprecate&ignore innodb_purge_rseg_truncate_frequency
The motivation of introducing the parameter
innodb_purge_rseg_truncate_frequency in
mysql/mysql-server@28bbd66ea5 and
mysql/mysql-server@8fc2120fed
seems to have been to avoid stalls due to freeing undo log pages
or truncating undo log tablespaces. In MariaDB Server,
innodb_undo_log_truncate=ON should be a much lighter operation
than in MySQL, because it will not involve any log checkpoint.

Another source of performance stalls should be
trx_purge_truncate_rseg_history(), which is shrinking the history list
by freeing the undo log pages whose undo records have been purged.
To alleviate that, we will introduce a purge_truncation_task that will
offload this from the purge_coordinator_task. In that way, the next
innodb_purge_batch_size pages may be parsed and purged while the pages
from the previous batch are being freed and the history list being shrunk.

The processing of innodb_undo_log_truncate=ON will still remain the
responsibility of the purge_coordinator_task.

purge_coordinator_state::count: Remove. We will ignore
innodb_purge_rseg_truncate_frequency, and act as if it had been
set to 1 (the maximum shrinking frequency).

purge_coordinator_state::do_purge(): Invoke an asynchronous task
purge_truncation_callback() to free the undo log pages.

purge_sys_t::iterator::free_history(): Free those undo log pages
that have been processed. This used to be a part of
trx_purge_truncate_history().

purge_sys_t::clone_end_view(): Take a new value of purge_sys.head
as a parameter, so that it will be updated while holding exclusive
purge_sys.latch. This is needed for race-free access to the field
in purge_truncation_callback().

Reviewed by: Vladislav Lesin
2023-10-25 09:11:58 +03:00
Marko Mäkelä
be24e75229 Merge 10.11 into 11.0 2023-10-19 08:12:16 +03:00
Marko Mäkelä
d5e15424d8 Merge 10.6 into 10.10
The MDEV-29693 conflict resolution is from Monty, as well as is
a bug fix where ANALYZE TABLE wrongly built histograms for
single-column PRIMARY KEY.
Also includes a fix for safe_malloc error reporting.

Other things:
- Copied main.log_slow from 10.4 to avoid mtr issue

Disabled test:
- spider/bugfix.mdev_27239 because we started to get
  +Error	1429 Unable to connect to foreign data source: localhost
  -Error	1158 Got an error reading communication packets
- main.delayed
  - Bug#54332 Deadlock with two connections doing LOCK TABLE+INSERT DELAYED
    This part is disabled for now as it fails randomly with different
    warnings/errors (no corruption).
2023-10-14 13:36:11 +03:00
Marko Mäkelä
8096139b3a Merge 10.5 into 10.6 2023-09-19 10:47:26 +03:00
Marko Mäkelä
6c05edfdcd Merge 10.4 into 10.5 2023-09-19 10:20:09 +03:00
Marko Mäkelä
76b688f100 MDEV-30024 InnoDB: tried to purge non-delete-marked of a virtual column prefix
row_vers_vc_matches_cluster(): Invoke dtype_get_at_most_n_mbchars()
to extract the correct number of bytes corresponding to the number
of characters in a virtual column prefix index, just like we do in
row_sel_sec_rec_is_for_clust_rec().

The test case would occasionally reproduce the failure when this
fix is not present.
2023-09-19 09:31:34 +03:00
Sergei Golubchik
18ddde4826 Merge branch '11.1' into 11.2 2023-08-18 00:59:16 +02:00
Sergei Golubchik
a8a22b7af2 support 'alter online table t1 page_checksum=0' 2023-08-15 10:16:11 +02:00
Nikita Malyavin
ab4bfad206 MDEV-16329 [5/5] ALTER ONLINE TABLE
* Log rows in online_alter_binlog.
* Table online data is replicated within dedicated binlog file
* Cached data is written on commit.
* Versioning is fully supported.
* Works both wit and without binlog enabled.

* For now savepoints setup is forbidden while ONLINE ALTER goes on.
  Extra support is required. We can simply log the SAVEPOINT query events
  and replicate them together with row events. But it's not implemented
  for now.

* Cache flipping:

  We want to care for the possible bottleneck in the online alter binlog
  reading/writing in advance.

  IO_CACHE does not provide anything better that sequential access,
  besides, only a single write is mutex-protected, which is not suitable,
  since we should write a transaction atomically.

  To solve this, a special layer on top Event_log is implemented.
  There are two IO_CACHE files underneath: one for reading, and one for
  writing.

  Once the read cache is empty, an exclusive lock is acquired (we can wait
  for a currently active transaction finish writing), and flip() is emitted,
  i.e. the write cache is reopened for read, and the read cache is emptied,
  and reopened for writing.

  This reminds a buffer flip that happens in accelerated graphics
  (DirectX/OpenGL/etc).

  Cache_flip_event_log is considered non-blocking for a single reader and a
  single writer in this sense, with the only lock held by reader during flip.

  An alternative approach by implementing a fair concurrent circular buffer
  is described in MDEV-24676.

* Cache managers:
  We have two cache sinks: statement and transactional.
  It is important that the changes are first cached per-statement and
  per-transaction.
  If a statement fails, then only statement data is rolled back. The
  transaction moves along, however.

  Turns out, there's no guarantee that TABLE well persist in
  thd->open_tables to the transaction commit moment.
  If an error occurs, tables from statement are purged.
  Therefore, we can't store te caches in TABLE. Ideally, it should be
  handlerton, but we cut the corner and store it in THD in a list.
2023-08-15 10:16:11 +02:00