Caused by:
5d37cac7 MDEV-33348 ALTER TABLE lock waiting stages are indistinguishable.
In that commit, progress reporting was moved to
mysql_alter_table from copy_data_between_tables.
The temporary table case wasn't taken into the consideration,
where the execution of mysql_alter_table ends earlier than usual, by the
'end_temporary' label. There, thd_progress_end has been missing.
Fix:
Add missing thd_progress_end() call in mysql_alter_table.
Several points of synchronization during ALTER TABLE COPY looked identical
in the progress report query. Besides, if its the late lock upgrade stage,
the data would be:
STAGE 0
MAX_STAGE 0
PROGRESS 0.000
which looks irrelevant.
This patch moves thd_progress_deinit call after the last lock upgrade.
Also, for online alter, if there is nothing to replicate, the
progress and max_progress values would be 0, which discard the result data
on the side of sql_show, see processlist_callback in sql_show.cc.
So now the minimal max_progress will be 1. To avoid 0% progress in the
report, minimax progress value is also set to 1, so we will see 100% if
there's nothing to replicate.
Adding an auto_increment column online leads to an undefined behavior.
Basically any DEFAULTs that depend on a row order in the table, or on
the non-deterministic (in scope of the ALTER TABLE statement) function
is UB.
For example, NOW() is considered generally non-deterministic
(Item_func_now_utc is marked with VCOL_NON_DETERMINISTIC), but it's fixed
in scope of a single statement.
Same for any other function that depends only on the session/status vars
apart from its arguments.
Only two UB cases are known:
* adding new AUTO_INCREMENT column. Modifying the existing column may be
fine under certain circumstances, see MDEV-31058.
* adding new column with DEFAULT(nextval(...)). Modifying the existing
column is possible, since its value will be always present in the online
event, except for the NULL -> NOT NULL modification
Group all the checks in online_alter_check_supported().
There is now two groups of checks:
1. A technical availability of online, that is checked before open_tables,
and affects table_list->lock_type. It's supposed to be safe to make it
TL_READ even if COPY algorithm will fall back to not-online, since MDL is
SHARED_UPGRADEABLE anyway.
2. An 'online' availability for a COPY algorithm. It can be done as late as
just before the copy_data_between_tables call. The lock_type influence is
disclosed above, so the only other place it affects is
Alter_info::supports_lock, where `online` flag is only used to decide
whether to report the error at the inplace preparation stage. We'd want to
make that at the last resort, which is COPY preparation, if no algorithm is
chosen by the user. So it's event better now.
Some changes are required to the autoinc support detection, as the check
now happens after mysql_prepare_alter_table:
* alter_info->drop_list is empty
* instead, dropped columns are in tmp_set
* alter_info->create_list now has every field that's in the new table.
* the column definition's change.str will be nonnull whether the column
remains in the new table (vs whether it was changed, as before).
But it also has `field` field set.
* IF EXISTS doesn't have to be dealt anymore
This infers that the changes are now checked in more detail: a field's
definition shouldn't be changed, vs a field shouldn't be mentioned in
the CHANGE list, as it was before. This is reflected by the line 193 test.
When column is changed to autoinc, ALTER TABLE may update zero/NULL values,
if NO_AUTO_VALUE_ON_ZERO mode is not enabled.
Forbid this for LOCK=NONE for the unreliable cases.
The cases are described in online_alter_check_autoinc.
Assertion `!table->versioned(VERS_TRX_ID)' failed in
Write_rows_log_event::binlog_row_logging_function during ONLINE ALTER.
trxid-versioned tables can't be replicated.
ONLINE ALTER will also be forbidden for these tables.
because online means we'll apply events from the binlog, and
ignore means that bad rows will be skipped. So a bad Write_row_log_event
will be skipped and a following Update_row_log_event will fail to
apply.
don't simply set tdc->flushed, use flush_unused(1) that removes opened
but unused TABLE instances (that would otherwise prevent TABLE_SHARE from
being closed by keeping the ref_count>0).
* Log rows in online_alter_binlog.
* Table online data is replicated within dedicated binlog file
* Cached data is written on commit.
* Versioning is fully supported.
* Works both wit and without binlog enabled.
* For now savepoints setup is forbidden while ONLINE ALTER goes on.
Extra support is required. We can simply log the SAVEPOINT query events
and replicate them together with row events. But it's not implemented
for now.
* Cache flipping:
We want to care for the possible bottleneck in the online alter binlog
reading/writing in advance.
IO_CACHE does not provide anything better that sequential access,
besides, only a single write is mutex-protected, which is not suitable,
since we should write a transaction atomically.
To solve this, a special layer on top Event_log is implemented.
There are two IO_CACHE files underneath: one for reading, and one for
writing.
Once the read cache is empty, an exclusive lock is acquired (we can wait
for a currently active transaction finish writing), and flip() is emitted,
i.e. the write cache is reopened for read, and the read cache is emptied,
and reopened for writing.
This reminds a buffer flip that happens in accelerated graphics
(DirectX/OpenGL/etc).
Cache_flip_event_log is considered non-blocking for a single reader and a
single writer in this sense, with the only lock held by reader during flip.
An alternative approach by implementing a fair concurrent circular buffer
is described in MDEV-24676.
* Cache managers:
We have two cache sinks: statement and transactional.
It is important that the changes are first cached per-statement and
per-transaction.
If a statement fails, then only statement data is rolled back. The
transaction moves along, however.
Turns out, there's no guarantee that TABLE well persist in
thd->open_tables to the transaction commit moment.
If an error occurs, tables from statement are purged.
Therefore, we can't store te caches in TABLE. Ideally, it should be
handlerton, but we cut the corner and store it in THD in a list.
Reason for the change was that ha_notify_table_changed() was done
after table open when .frm had been replaced, which caused failure
in engines that checks on open if .frm matches the engines table
definition.
Other changes:
- Remove not needed open/close call at end of inline alter table.
Some test that depended on the table beeing in the table cache after
ALTER TABLE had to be updated.
Introduced new alter algorithm type called NOCOPY & INSTANT for
inplace alter operation.
NOCOPY - Algorithm refuses any alter operation that would
rebuild the clustered index. It is a subset of INPLACE algorithm.
INSTANT - Algorithm allow any alter operation that would
modify only meta data. It is a subset of NOCOPY algorithm.
Introduce new variable called alter_algorithm. The values are
DEFAULT(0), COPY(1), INPLACE(2), NOCOPY(3), INSTANT(4)
Message to deprecate old_alter_table variable and make it alias
for alter_algorithm variable.
alter_algorithm variable for slave is always set to default.