1
0
mirror of https://github.com/MariaDB/server.git synced 2025-11-09 11:41:36 +03:00
Commit Graph

5278 Commits

Author SHA1 Message Date
Sergei Golubchik
18ddde4826 Merge branch '11.1' into 11.2 2023-08-18 00:59:16 +02:00
Nikita Malyavin
8aa1a9e6a7 MDEV-31812 Add switch to old_mode to disable non-locking ALTER
Add LOCK_ALTER_TABE_COPY bit to old_mode. Disables online copy by default,
but still allows to force it with explicit lock=none
2023-08-15 14:00:28 +02:00
Nikita Malyavin
a1af525588 MDEV-31804 Assertion `thd->m_transaction_psi == __null' fails
... upon replicating online ALTER

When an online event is applied and slave_exec_mode is idempotent,
Write_rows_log_event::do_before_row_operations had reset
thd->lex->sql_command to SQLCOM_REPLACE.

This led to that a statement was detected as a row-type during binlogging,
and was logged as not standalone.

So the corresponding Gtid_log_event, when applied on replica, did not exit
early and created a new PSI transaction. Hence the difference with
non-online ALTER.
2023-08-15 14:00:28 +02:00
Nikita Malyavin
30c965f866 MDEV-31777 ER_GET_ERRNO upon online alter on CONNECT table
Forbid Online for CONNECT.
2023-08-15 14:00:28 +02:00
Nikita Malyavin
44ca37ef17 MDEV-31631 Adding auto-increment to table with history online misbehaves
Adding an auto_increment column online leads to an undefined behavior.
Basically any DEFAULTs that depend on a row order in the table, or on
the non-deterministic (in scope of the ALTER TABLE statement) function
is UB.

For example, NOW() is considered generally non-deterministic
(Item_func_now_utc is marked with VCOL_NON_DETERMINISTIC), but it's fixed
in scope of a single statement.

Same for any other function that depends only on the session/status vars
apart from its arguments.

Only two UB cases are known:
* adding new AUTO_INCREMENT column. Modifying the existing column may be
fine under certain circumstances, see MDEV-31058.
* adding new column with DEFAULT(nextval(...)). Modifying the existing
column is possible, since its value will be always present in the online
event, except for the NULL -> NOT NULL modification
2023-08-15 14:00:28 +02:00
Nikita Malyavin
e026a366bf MDEV-31776 Online ALTER reports the number of affected rows incorrectly
Add a new virtual function that will increase the inserted rows count
for the insert log event and decrease it for the delete event.

Reuses Rows_log_event::m_row_count on the replication side, which was only
set on the logging side.
2023-08-15 14:00:28 +02:00
Nikita Malyavin
9c8554259a MDEV-31775 Server crash upon online alter on sequence
Forbid online for sequences. It doesn't work now and sequence tables are
always only one row, so not worth the extra complexity.
2023-08-15 14:00:28 +02:00
Nikita Malyavin
2952274ac4 make a proper cleanup if online_alter_binlog is failed to create 2023-08-15 14:00:28 +02:00
Andrei
bac8f189ed MDEV-31755 Replica's DML event deadlocks wit online alter table
The deadlock was caused by too strong MDL acquired by the start ALTER.

Replica's ALTER TABLE replication consists of two phases:
1. Start ALTER (SA) -- the event is emittd in the very beginning,
allowing replication start ALTER in parallel
2. Commit ALTER (CA) -- ensures that master finishes successfully

CA is normally received by wait_for_master call.
If parallel DML was run, the following sequence will take place:

|- SA
|- DML
|- CA

If CA is handled after MDL upgrade, it'll will deadlock with DML.

While MDL is shared by the start ALTER wait for its 2nd part
to allow concurrent DMLs to grab the lock.

The fix uses wait_for_master reentrancy -- no need to avoid a second call
in the end of mysql_alter_table.

Since SA and CA are marked with FL_DDL, the DML issued in-between cannot be
rescheduled before or after them. However, SA "commits" (by he call of
write_bin_log_start_alter and, subsequently,
thd->wakeup_subsequent_commits) before the copy stage begins, unlocking
the DMLs to run on this table. That is, these DMLs will be executed
concurrently with the copy stage, making Online alter effective on replicas
as well

Co-authored-by: Nikita Malyavin (nikitamalyavin@gmail.com)
2023-08-15 14:00:28 +02:00
Nikita Malyavin
a539fac8bd MDEV-31646 untie from max_allowed_packet and opt_binlog_rows_event_max_size 2023-08-15 14:00:28 +02:00
Nikita Malyavin
d5e59c983f MDEV-31646 Online alter applies binlog cache limit to cache writes
1. Make online disk writes unlimited, same as filesort does.
2. Make proper error handling -- in 32-bit build IO_CACHE capacity limit is
4GB, so it is quite possible to overfill there.
3. Event_log::write_cache complicated with event reparsing, and as it was
proven by QA, contains some mistakes. Rewrite introbuce a simpler and much
faster version, not featuring reparsing and therefore copying a whole
buffer at once. This also disables checksums and crypto.
4. Handle read_log_event errors correctly: error returned is -1 (eof
signal for alter table), and my_error is not called. Call my_error and
always return 1. There's no test for this, since it shouldn't happen,
see the next bullet.
5. An event could be written partially in case of error, if it's bigger
than the IO_CACHE buffer. Restore the position where it was before the
error was emitted.

As a result, online alter is untied of several binlog variables, which was
a second aim of this patch.
2023-08-15 13:59:07 +02:00
Nikita Malyavin
2cecb5a638 MDEV-31601 Some ALTER TABLEs fail ... with a wrong error message
Report correct algorithm in the error message.
2023-08-15 10:16:14 +02:00
Nikita Malyavin
43cb98b420 fix main.mysql57_virtual, main.alter_table, innodb.alter_algorithm
The correct (best) algorithm is now chosen for ALGORITHM=DEFAULT
and alter_algorithm=DEFAULT

See also MDEV-30906
2023-08-15 10:16:13 +02:00
Nikita Malyavin
c382de72ea MDEV-30984 Online ALTER table is denied with non-informative error messages
Group all the checks in online_alter_check_supported().

There is now two groups of checks:
1. A technical availability of online, that is checked before open_tables,
and affects table_list->lock_type. It's supposed to be safe to make it
TL_READ even if COPY algorithm will fall back to not-online, since MDL is
SHARED_UPGRADEABLE anyway.
2. An 'online' availability for a COPY algorithm. It can be done as late as
just before the copy_data_between_tables call. The lock_type influence is
disclosed above, so the only other place it affects is
Alter_info::supports_lock, where `online` flag is only used to decide
whether to report the error at the inplace preparation stage. We'd want to
make that at the last resort, which is COPY preparation, if no algorithm is
chosen by the user. So it's event better now.

Some changes are required to the autoinc support detection, as the check
now happens after mysql_prepare_alter_table:
* alter_info->drop_list is empty
* instead, dropped columns are in tmp_set
* alter_info->create_list now has every field that's in the new table.
* the column definition's change.str will be nonnull whether the column
  remains in the new table (vs whether it was changed, as before).
  But it also has `field` field set.
* IF EXISTS doesn't have to be dealt anymore

This infers that the changes are now checked in more detail: a field's
definition shouldn't be changed, vs a field shouldn't be mentioned in
the CHANGE list, as it was before. This is reflected by the line 193 test.
2023-08-15 10:16:13 +02:00
Nikita Malyavin
500379cf49 unpack_row: unpack a correct number of fields 2023-08-15 10:16:13 +02:00
Nikita Malyavin
da277396bd MDEV-31058 ER_KEY_NOT_FOUND upon concurrent CHANGE column autoinc and DML
When column is changed to autoinc, ALTER TABLE may update zero/NULL values,
if NO_AUTO_VALUE_ON_ZERO mode is not enabled.

Forbid this for LOCK=NONE for the unreliable cases.
The cases are described in online_alter_check_autoinc.
2023-08-15 10:16:13 +02:00
Nikita Malyavin
e1f5c58ac7 MDEV-30891 Assertion `!table->versioned(VERS_TRX_ID)' failed
Assertion `!table->versioned(VERS_TRX_ID)' failed in
Write_rows_log_event::binlog_row_logging_function during ONLINE ALTER.

trxid-versioned tables can't be replicated.
ONLINE ALTER will also be forbidden for these tables.
2023-08-15 10:16:13 +02:00
Nikita Malyavin
b08b78c5b9 MDEV-29068 Cascade foreign key updates do not apply in online alter 2023-08-15 10:16:13 +02:00
Sergei Golubchik
6ecc90ae36 set table->pos_in_table_list in online alter 2023-08-15 10:16:13 +02:00
Sergei Golubchik
8a8f71b8b8 cleanup: ifdefs 2023-08-15 10:16:12 +02:00
Nikita Malyavin
6e0f456090 MDEV-29013 ER_KEY_NOT_FOUND/lock timeout upon online alter with long unique
1. ER_KEY_NOT_FOUND
general replcation problem, already fixed earlier.
test added.

2. ER_LOCK_WAIT_TIMEOUT
This is a long unique specific problem.

Sometimes, lookup_handler is created for to->file. To properly free it,
ha_reset should be called. It is usually done by calling
close_thread_table, but ALTER TABLE makes it differently. Hence, a single
ha_reset call is added to mysql_alter_table.

Also, event_mem_root is removed. Normally, no per-event data should be
allocated on thd->mem_root, that would mean a leak. And otherwise,
lookup_handler is lazily allocated, but its lifetime matches statement,
not event.
2023-08-15 10:16:12 +02:00
Sergei Golubchik
472c3d082f don't do ALTER IGNORE TABLE online
because online means we'll apply events from the binlog, and
ignore means that bad rows will be skipped. So a bad Write_row_log_event
will be skipped and a following Update_row_log_event will fail to
apply.
2023-08-15 10:16:12 +02:00
Sergei Golubchik
ea46fdcea4 cleanup, remove dead code 2023-08-15 10:16:12 +02:00
Sergei Golubchik
845c939601 MDEV-28943 Online alter fails under LOCK TABLE with ER_ALTER_OPERATION_NOT_SUPPORTED_REASON
if ALTER TABLE ... LOCK=xxx is executed under LOCK TABLES,
ignore the LOCK clause, because ALTER should not downgrade
already taken EXCLUSIVE table lock to SHARED or NONE.

This commit preserves the existing behavior (LOCK was de facto ignored),
but makes it explicit.
2023-08-15 10:16:12 +02:00
Nikita Malyavin
2ed03a41e6 MDEV-28930 ALTER TABLE Deadlocks with parallel TL_WRITE
ALTER ONLINE TABLE acquires table with TL_READ. Myisam normally acquires
TL_WRITE for DML, which makes it hang until table is freed.

We deadlock once ALTER upgrades its MDL lock.

Solution:
Unlock table earlier. We don't need to hold TL_READ once we finished
copying. Relay log replication requires no data locks on `from` table.
2023-08-15 10:16:12 +02:00
Sergei Golubchik
8fbdc76038 MDEV-28967 Assertion marked_for_write_or_computed()' failed in Field_new_decimal::store_value / online_alter_read_from_binlog
in the catch-up phase of the online alter we apply row events,
they're unpacked into `from->record[0]` and then converted
to `to->record[0]`.

This needs all fields of `from` to be in the `write_set`.

Although practically `Field::unpack()` does not assert the `write_set`,
and `Field::reset()` - used when a field value is not present in the
after-image - also doesn't assert the `write_set` for many types,
`Field_new_decimal::reset()` does.
2023-08-15 10:16:12 +02:00
Sergei Golubchik
93049e3de6 MDEV-28959 Online alter ignores strict table mode
* don't disable warnings when catching up
* do propagate warnings up like copy_data_between_tables() does
2023-08-15 10:16:12 +02:00
Sergei Golubchik
13f1e970a1 MDEV-28944 XA assertions failing in binlog_rollback and binlog_commit 2023-08-15 10:16:12 +02:00
Nikita Malyavin
8f6f219a68 control Cache_flip_event_log lifetime with reference count
If online alter fails, TABLE_SHARE can be freed while concurrent
transactions still have row events in their online_alter_cache_data.
On commit they try'll to flush them, writing to TABLE_SHARE's
Cache_flip_event_log, which is already freed.
This causes a crash in main.alter_table_online_debug test
2023-08-15 10:16:12 +02:00
Sergei Golubchik
62a1e282d0 MDEV-28771 Assertion `table->in_use&&tdc->flushed' failed after ALTER
don't simply set tdc->flushed, use flush_unused(1) that removes opened
but unused TABLE instances (that would otherwise prevent TABLE_SHARE from
being closed by keeping the ref_count>0).
2023-08-15 10:16:12 +02:00
Sergei Golubchik
b2be2e39a6 don't crash if ALTER TABLE fails and a long transaction blocks MDL upgrade 2023-08-15 10:16:11 +02:00
Sergei Golubchik
3099a756ab don't do DROP SYSTEM VERSIONING online
because ALTER TABLE ... DROP SYSTEM VERSIONING
is not just a change in the table structure, it also deletes
all historical rows
2023-08-15 10:16:11 +02:00
Sergei Golubchik
df0771c6a2 no ALTER TABLE should return ER_NO_DEFAULT_FOR_FIELD 2023-08-15 10:16:11 +02:00
Sergei Golubchik
a5776aa341 online alter always uses ALGORITHM=COPY, LOCK=NONE
so any other value of ALGORITHM or LOCK disables online alter
2023-08-15 10:16:11 +02:00
Sergei Golubchik
d767ed5c89 remove handler::open_read_view()
ht->start_consistent_snapshot() is also not a way,
because some engines (e.g. rocksdb) only do it readonly.
instead, downgrade the lock after reading the first row
(which implicitly opens a read view).
2023-08-15 10:16:11 +02:00
Sergei Golubchik
0b67af5a81 cleanup
no functional changes here
2023-08-15 10:16:11 +02:00
Sergei Golubchik
a8a22b7af2 support 'alter online table t1 page_checksum=0' 2023-08-15 10:16:11 +02:00
Nikita Malyavin
ab4bfad206 MDEV-16329 [5/5] ALTER ONLINE TABLE
* Log rows in online_alter_binlog.
* Table online data is replicated within dedicated binlog file
* Cached data is written on commit.
* Versioning is fully supported.
* Works both wit and without binlog enabled.

* For now savepoints setup is forbidden while ONLINE ALTER goes on.
  Extra support is required. We can simply log the SAVEPOINT query events
  and replicate them together with row events. But it's not implemented
  for now.

* Cache flipping:

  We want to care for the possible bottleneck in the online alter binlog
  reading/writing in advance.

  IO_CACHE does not provide anything better that sequential access,
  besides, only a single write is mutex-protected, which is not suitable,
  since we should write a transaction atomically.

  To solve this, a special layer on top Event_log is implemented.
  There are two IO_CACHE files underneath: one for reading, and one for
  writing.

  Once the read cache is empty, an exclusive lock is acquired (we can wait
  for a currently active transaction finish writing), and flip() is emitted,
  i.e. the write cache is reopened for read, and the read cache is emptied,
  and reopened for writing.

  This reminds a buffer flip that happens in accelerated graphics
  (DirectX/OpenGL/etc).

  Cache_flip_event_log is considered non-blocking for a single reader and a
  single writer in this sense, with the only lock held by reader during flip.

  An alternative approach by implementing a fair concurrent circular buffer
  is described in MDEV-24676.

* Cache managers:
  We have two cache sinks: statement and transactional.
  It is important that the changes are first cached per-statement and
  per-transaction.
  If a statement fails, then only statement data is rolled back. The
  transaction moves along, however.

  Turns out, there's no guarantee that TABLE well persist in
  thd->open_tables to the transaction commit moment.
  If an error occurs, tables from statement are purged.
  Therefore, we can't store te caches in TABLE. Ideally, it should be
  handlerton, but we cut the corner and store it in THD in a list.
2023-08-15 10:16:11 +02:00
Nikita Malyavin
d2d0995cf2 MDEV-16329 [4/5] Refactor MYSQL_BIN_LOG: extract Event_log ancestor
Event_log is supposed to be a basic logging class that can write events in
a single file.

MYSQL_BIN_LOG in comparison will have:
* rotation support
* index files
* purging
* gtid and transactional information handling.
* is dedicated for a general-purpose binlog
2023-08-15 10:16:11 +02:00
Sergei Golubchik
275684d8fe cleanup: remove vcol_info->stored_in_db
it was redundant, duplicating vcol_type == VCOL_GENERATED_STORED.

Note that VCOL_DEFAULT is not "stored", "stored vcol" means that after
rnd_next or index_read/etc the field value is already in the record[0]
and does not need to be calculated separately
2023-08-15 10:16:11 +02:00
Sergei Golubchik
4e87081b56 cleanup: remove useless check
create_table_impl() doesn't need to search for a temporary table
when the table_name is like #sql-xxx
2023-08-11 19:36:22 +02:00
Oleksandr Byelkin
51f9d62005 Merge branch '10.11' into 11.0 2023-08-09 07:53:48 +02:00
Oleksandr Byelkin
036df5f970 Merge branch '10.10' into 10.11 2023-08-08 14:57:31 +02:00
Oleksandr Byelkin
ced243a099 Merge branch '10.9' into 10.10 2023-08-05 20:34:09 +02:00
Oleksandr Byelkin
34a8e78581 Merge branch '10.6' into 10.9 2023-08-04 08:01:06 +02:00
Oleksandr Byelkin
5ea5291d97 Merge branch '10.5' into 10.6 2023-08-04 07:52:54 +02:00
Sergei Golubchik
61acb43689 MDEV-31822 ALTER TABLE ENGINE=x started failing instead of producing warning on unsupported TRANSACTIONAL=1
make TRANSACTIONAL table option behave similar to other engine-defined
table options. If the engine doesn't suport it:
* if specified expicitly in CREATE or ALTER - it's ER_UNKNOWN_OPTION
* an error or a warning depending on sql_mode IGNORE_BAD_TABLE_OPTIONS
* in ALTER TABLE from the engine that suppors it to the engine that
  doesn't - silently preserved (no warning)
* it is commented out in SHOW CREATE unless IGNORE_BAD_TABLE_OPTIONS
2023-08-02 14:45:31 +02:00
Sergei Golubchik
da09ae05a9 MDEV-18114 Foreign Key Constraint actions don't affect Virtual Column
* invoke check_expression() for all vcol_info's in
  mysql_prepare_create_table() to check for FK CASCADE
* also check for SET NULL and SET DEFAULT
* to check against existing FKs when a vcol is added in ALTER TABLE,
  old FKs must be added to the new_key_list just like other indexes are
* check columns recursively, if vcol1 references vcol2,
  flags of vcol2 must be taken into account
* remove check_table_name_processor(), put that logic under
  check_vcol_func_processor() to avoid walking the tree twice
2023-08-02 14:45:31 +02:00
Sergei Golubchik
ab1191c039 cleanup: key->key_create_info.check_for_duplicate_indexes -> key->old
mark old keys in the ALTER TABLE with the `old` flag, not with
the `key_create_info.check_for_duplicate_indexes`.

This allows to mark old foreign keys too.
2023-08-01 22:43:16 +02:00
Sergei Golubchik
b8233b38da cleanup: put db/table_name into Alter_info
also, prefer Lex_table_name and Lex_ident over LEX_CSTRING
2023-08-01 22:43:16 +02:00