Group all the checks in online_alter_check_supported().
There is now two groups of checks:
1. A technical availability of online, that is checked before open_tables,
and affects table_list->lock_type. It's supposed to be safe to make it
TL_READ even if COPY algorithm will fall back to not-online, since MDL is
SHARED_UPGRADEABLE anyway.
2. An 'online' availability for a COPY algorithm. It can be done as late as
just before the copy_data_between_tables call. The lock_type influence is
disclosed above, so the only other place it affects is
Alter_info::supports_lock, where `online` flag is only used to decide
whether to report the error at the inplace preparation stage. We'd want to
make that at the last resort, which is COPY preparation, if no algorithm is
chosen by the user. So it's event better now.
Some changes are required to the autoinc support detection, as the check
now happens after mysql_prepare_alter_table:
* alter_info->drop_list is empty
* instead, dropped columns are in tmp_set
* alter_info->create_list now has every field that's in the new table.
* the column definition's change.str will be nonnull whether the column
remains in the new table (vs whether it was changed, as before).
But it also has `field` field set.
* IF EXISTS doesn't have to be dealt anymore
This infers that the changes are now checked in more detail: a field's
definition shouldn't be changed, vs a field shouldn't be mentioned in
the CHANGE list, as it was before. This is reflected by the line 193 test.
When column is changed to autoinc, ALTER TABLE may update zero/NULL values,
if NO_AUTO_VALUE_ON_ZERO mode is not enabled.
Forbid this for LOCK=NONE for the unreliable cases.
The cases are described in online_alter_check_autoinc.
Assertion `!table->versioned(VERS_TRX_ID)' failed in
Write_rows_log_event::binlog_row_logging_function during ONLINE ALTER.
trxid-versioned tables can't be replicated.
ONLINE ALTER will also be forbidden for these tables.
1. ER_KEY_NOT_FOUND
general replcation problem, already fixed earlier.
test added.
2. ER_LOCK_WAIT_TIMEOUT
This is a long unique specific problem.
Sometimes, lookup_handler is created for to->file. To properly free it,
ha_reset should be called. It is usually done by calling
close_thread_table, but ALTER TABLE makes it differently. Hence, a single
ha_reset call is added to mysql_alter_table.
Also, event_mem_root is removed. Normally, no per-event data should be
allocated on thd->mem_root, that would mean a leak. And otherwise,
lookup_handler is lazily allocated, but its lifetime matches statement,
not event.
because online means we'll apply events from the binlog, and
ignore means that bad rows will be skipped. So a bad Write_row_log_event
will be skipped and a following Update_row_log_event will fail to
apply.
if ALTER TABLE ... LOCK=xxx is executed under LOCK TABLES,
ignore the LOCK clause, because ALTER should not downgrade
already taken EXCLUSIVE table lock to SHARED or NONE.
This commit preserves the existing behavior (LOCK was de facto ignored),
but makes it explicit.
ALTER ONLINE TABLE acquires table with TL_READ. Myisam normally acquires
TL_WRITE for DML, which makes it hang until table is freed.
We deadlock once ALTER upgrades its MDL lock.
Solution:
Unlock table earlier. We don't need to hold TL_READ once we finished
copying. Relay log replication requires no data locks on `from` table.
in the catch-up phase of the online alter we apply row events,
they're unpacked into `from->record[0]` and then converted
to `to->record[0]`.
This needs all fields of `from` to be in the `write_set`.
Although practically `Field::unpack()` does not assert the `write_set`,
and `Field::reset()` - used when a field value is not present in the
after-image - also doesn't assert the `write_set` for many types,
`Field_new_decimal::reset()` does.
If online alter fails, TABLE_SHARE can be freed while concurrent
transactions still have row events in their online_alter_cache_data.
On commit they try'll to flush them, writing to TABLE_SHARE's
Cache_flip_event_log, which is already freed.
This causes a crash in main.alter_table_online_debug test
don't simply set tdc->flushed, use flush_unused(1) that removes opened
but unused TABLE instances (that would otherwise prevent TABLE_SHARE from
being closed by keeping the ref_count>0).
ht->start_consistent_snapshot() is also not a way,
because some engines (e.g. rocksdb) only do it readonly.
instead, downgrade the lock after reading the first row
(which implicitly opens a read view).
* Log rows in online_alter_binlog.
* Table online data is replicated within dedicated binlog file
* Cached data is written on commit.
* Versioning is fully supported.
* Works both wit and without binlog enabled.
* For now savepoints setup is forbidden while ONLINE ALTER goes on.
Extra support is required. We can simply log the SAVEPOINT query events
and replicate them together with row events. But it's not implemented
for now.
* Cache flipping:
We want to care for the possible bottleneck in the online alter binlog
reading/writing in advance.
IO_CACHE does not provide anything better that sequential access,
besides, only a single write is mutex-protected, which is not suitable,
since we should write a transaction atomically.
To solve this, a special layer on top Event_log is implemented.
There are two IO_CACHE files underneath: one for reading, and one for
writing.
Once the read cache is empty, an exclusive lock is acquired (we can wait
for a currently active transaction finish writing), and flip() is emitted,
i.e. the write cache is reopened for read, and the read cache is emptied,
and reopened for writing.
This reminds a buffer flip that happens in accelerated graphics
(DirectX/OpenGL/etc).
Cache_flip_event_log is considered non-blocking for a single reader and a
single writer in this sense, with the only lock held by reader during flip.
An alternative approach by implementing a fair concurrent circular buffer
is described in MDEV-24676.
* Cache managers:
We have two cache sinks: statement and transactional.
It is important that the changes are first cached per-statement and
per-transaction.
If a statement fails, then only statement data is rolled back. The
transaction moves along, however.
Turns out, there's no guarantee that TABLE well persist in
thd->open_tables to the transaction commit moment.
If an error occurs, tables from statement are purged.
Therefore, we can't store te caches in TABLE. Ideally, it should be
handlerton, but we cut the corner and store it in THD in a list.
Event_log is supposed to be a basic logging class that can write events in
a single file.
MYSQL_BIN_LOG in comparison will have:
* rotation support
* index files
* purging
* gtid and transactional information handling.
* is dedicated for a general-purpose binlog
it was redundant, duplicating vcol_type == VCOL_GENERATED_STORED.
Note that VCOL_DEFAULT is not "stored", "stored vcol" means that after
rnd_next or index_read/etc the field value is already in the record[0]
and does not need to be calculated separately
make TRANSACTIONAL table option behave similar to other engine-defined
table options. If the engine doesn't suport it:
* if specified expicitly in CREATE or ALTER - it's ER_UNKNOWN_OPTION
* an error or a warning depending on sql_mode IGNORE_BAD_TABLE_OPTIONS
* in ALTER TABLE from the engine that suppors it to the engine that
doesn't - silently preserved (no warning)
* it is commented out in SHOW CREATE unless IGNORE_BAD_TABLE_OPTIONS
* invoke check_expression() for all vcol_info's in
mysql_prepare_create_table() to check for FK CASCADE
* also check for SET NULL and SET DEFAULT
* to check against existing FKs when a vcol is added in ALTER TABLE,
old FKs must be added to the new_key_list just like other indexes are
* check columns recursively, if vcol1 references vcol2,
flags of vcol2 must be taken into account
* remove check_table_name_processor(), put that logic under
check_vcol_func_processor() to avoid walking the tree twice
mark old keys in the ALTER TABLE with the `old` flag, not with
the `key_create_info.check_for_duplicate_indexes`.
This allows to mark old foreign keys too.
differently react to SQL_MODE => unusable SHOW CREATE
Use abort_on_warning dependent on strict mode over create new table
like it is done for copy data and inplace alter.
The DBUG_ASSER in HA_CREATE_INFO::resolve_to_charset_collation_context()
didn't take into account that the second execution is possible not only
during a prepared EXECUTE, but also during a CALL.
This patch adds a way to override default collations
(or "character set collations") for desired character sets.
The SQL standard says:
> Each collation known in an SQL-environment is applicable to one
> or more character sets, and for each character set, one or more
> collations are applicable to it, one of which is associated with
> it as its character set collation.
In MariaDB, character set collations has been hard-coded so far,
e.g. utf8mb4_general_ci has been a hard-coded character set collation
for utf8mb4.
This patch allows to override (globally per server, or per session)
character set collations, so for example, uca1400_ai_ci can be set as a
character set collation for Unicode character sets
(instead of compiled xxx_general_ci).
The array of overridden character set collations is stored in a new
(session and global) system variable @@character_set_collations and
can be set as a comma separated list of charset=collation pairs, e.g.:
SET @@character_set_collations='utf8mb3=uca1400_ai_ci,utf8mb4=uca1400_ai_ci';
The variable is empty by default, which mean use the hard-coded
character set collations (e.g. utf8mb4_general_ci for utf8mb4).
The variable can also be set globally by passing to the server startup command
line, and/or in my.cnf.