calculate the auto-inc value even if the long-unique duplicate check fails -
this is what the engine does for normal unique keys.
The auto-inc value is needed if the statement is a REPLACE.
XA support for online alter was completely missing.
Tying it to binlog_hton made this hard to notice: simply having binlog_commit
called from xa_commit gave the impression that it would automagically work
for online alter, which turned out to be wrong: all the binlog does is write
"XA END" into the trx cache and flush it to the real binlog.
Online alter, in comparison, can't do the same, since online replication
happens in a single transaction.
Solution: add dedicated XA support (sketched below):
* Extend struct xid_t with a pointer to Online_alter_cache_list.
* On prepare: move the online alter cache from THD::ha_data into the
passed XID.
* On XA commit/rollback: use the online alter cache stored in that XID.
This makes us pass xid_cache_element->xid to xa_commit/xa_rollback
instead of lex->xid.
* Use manual memory management for the online alter cache list, instead of
mem_root allocation, since there is no mem_root connected to the XA
transaction.
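A minimal, self-contained sketch of the scheme; the structs are simplified
stand-ins for the real server types, which are far richer:

  struct Online_alter_cache_list { /* per-table row caches */ };

  struct xid_t {
    // ... gtrid/bqual identification data ...
    Online_alter_cache_list *online_alter_cache= nullptr;  // new member
  };

  struct THD {
    Online_alter_cache_list *online_alter_cache;  // really kept in ha_data
    xid_t *xid;
  };

  // XA PREPARE: detach the cache from the connection and park it in the
  // XID, so that another connection can later XA COMMIT/ROLLBACK it.
  void online_alter_prepare(THD *thd) {
    thd->xid->online_alter_cache= thd->online_alter_cache;
    thd->online_alter_cache= nullptr;
  }

  // XA COMMIT/ROLLBACK: take the cache from the XID, replay or discard
  // it, then free it manually (no XA-scoped mem_root owns it).
  void online_alter_xa_end(xid_t *xid, bool commit) {
    Online_alter_cache_list *cl= xid->online_alter_cache;
    if (cl) { /* commit ? replay rows : discard */ delete cl; }
    xid->online_alter_cache= nullptr;
    (void) commit;
  }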
Move all the functions dedicated to online alter into a newly created
online_alter.cc.
With that, make many functions static and simplify their naming.
Also, rename binlog_log_row_online_alter -> online_alter_log_row.
The MDEV-29693 conflict resolution is from Monty, as is a bug fix where
ANALYZE TABLE wrongly built histograms for a single-column PRIMARY KEY.
Also includes a fix for safe_malloc error reporting.
Other things:
- Copied main.log_slow from 10.4 to avoid an mtr issue
Disabled tests:
- spider/bugfix.mdev_27239 because we started to get
+Error 1429 Unable to connect to foreign data source: localhost
-Error 1158 Got an error reading communication packets
- main.delayed
- Bug#54332 Deadlock with two connections doing LOCK TABLE+INSERT DELAYED
This part is disabled for now as it fails randomly with different
warnings/errors (no corruption).
- This failure happens due to commit bf3b787e02 (MDEV-31835). InnoDB fails
to apply the buffered insert operation for transaction_registry during
commit. To avoid this, ha_commit_trans() should call extra() with
HA_EXTRA_RESET_STATE to apply the bulk buffered insert operation.
Problem:
========
During commit, the server calls prepare_commit_versioned() to
determine whether the transaction modified system-versioned data.
Due to the binlog_do_db option, the binlog can be disabled for the
statement, but prepare_commit_versioned() was being called only when
the binlog was enabled for the statement.
Fix:
====
prepare_commit_versioned() should happen irrespective of the binlog
state: if the server has done any read-write operation, it should call
prepare_commit_versioned() (see the sketch below).
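A minimal sketch of the intended change (variable names are illustrative,
not the exact server code):

  // before: the versioned pre-commit hook ran only on the binlog path
  //   if (rw_trans && mysql_bin_log.is_open())
  //     commit_time= prepare_commit_versioned(thd, &trx_start_id);

  // after: run it for any read-write transaction, whatever the binlog state
  if (rw_trans)
    commit_time= prepare_commit_versioned(thd, &trx_start_id);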
- Moving get_canonical_filename() from a public function to a method in handler.
- Adding a helper method is_canonical_filename() to handler.
- Adding helper methods left(), substr(), starts_with() to Lex_cstring
(usage sketched below).
- Adding helper methods is_sane(), buffer_overlaps(),
max_data_size() to CharBuffer.
- Adding append_casedn() to CharBuffer. It implements the main functionality
replacing the my_casedn_str() call that is being removed.
- Adding a class Table_path_buffer,
a descendant of CharBuffer with size FN_REFLEN.
- Changing get_canonical_filename() to take a pointer to a Table_path_buffer
instead of just a pointer to char.
- Changing the data type of the "path" parameter and the return type of
get_canonical_filename() from char* to Lex_cstring.
- Replacing the old style in-place check_db_name() in make_table_name_list()
with new style non-modifying code.
- Adding a "const" qualifier to the "db" parameter of ha_discover_table_names()
and its dependency functions.
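An illustrative combination of the new pieces; the helper signatures are
assumed from the description above, not copied from the actual headers:

  // split "./db/t1" using the new non-modifying Lex_cstring helpers
  Lex_cstring path(STRING_WITH_LEN("./db/t1"));
  Lex_cstring prefix(STRING_WITH_LEN("./"));
  if (path.starts_with(prefix))
  {
    Lex_cstring rest= path.substr(prefix.length);   // "db/t1"
    Lex_cstring db= rest.left(2);                   // "db"
  }

  // get_canonical_filename() is now a handler method taking and returning
  // Lex_cstring, writing into a caller-supplied Table_path_buffer:
  Table_path_buffer buf;
  Lex_cstring canon= file->get_canonical_filename(path, &buf);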
The row events were applied "twice": once for the ha_partition, and once
more for the underlying storage engine.
There's no such problem in binlog/rpl, because ha_partition::row_logging
is normally set to false.
The fix makes the events replicate only when the handler is a root handler.
We will try to *guess* this by comparing it to table->file; the same
approach is used in the MDEV-21540 fix, 231feabd. The assumption is made
that the row methods are only called for table->file (and never for a
cloned handler), hence assertions are added in ha_innobase and
ha_myisam to make sure this holds at least for those engines.
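A sketch of the guard (surrounding code simplified; the logging call's
exact signature differs):

  // in the row-event logging path of a handler's write/update/delete:
  if (table->file == this)               // only the root handler logs
    online_alter_log_row(table, before_rec, after_rec);

  // and, per this fix, in the ha_innobase/ha_myisam row methods:
  // DBUG_ASSERT(table->file == this);   // never called on a cloned handler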
Also closes MDEV-31040; however, the test is not included, since we have no
convenient way to construct a deterministic version.
instead use only one (trx) IO_CACHE and truncate it if the
statement is rolled back (sketched below).
Don't use binlog_cache_mngr to accumulate the data;
use binlog_cache_data instead.
(binlog_cache_data owns one IO_CACHE; binlog_cache_mngr owns
two binlog_cache_data's, trx and stmt.)
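A compact model of the statement rollback; the binlog_cache_data member
names here are assumptions:

  // remember the transactional cache position at statement start ...
  my_off_t before_stmt_pos= my_b_tell(&cache_data->cache_log);
  // ... and on statement rollback truncate back to it, discarding only
  // this statement's rows while the transaction's earlier rows survive
  cache_data->truncate(before_stmt_pos);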
* Log rows in online_alter_binlog.
* Table online data is replicated within a dedicated binlog file.
* Cached data is written on commit.
* Versioning is fully supported.
* Works both with and without the binlog enabled.
* For now, setting up savepoints is forbidden while an ONLINE ALTER goes on.
Extra support is required: we could simply log the SAVEPOINT query events
and replicate them together with the row events, but it's not implemented
for now.
* Cache flipping:
We want to take care of a possible bottleneck in the online alter binlog
reading/writing in advance.
IO_CACHE does not provide anything better than sequential access;
besides, only a single write is mutex-protected, which is not suitable,
since we should write a transaction atomically.
To solve this, a special layer on top of Event_log is implemented
(modeled in the sketch below).
There are two IO_CACHE files underneath: one for reading, and one for
writing.
Once the read cache is empty, an exclusive lock is acquired (we can wait
for a currently active transaction to finish writing), and flip() is
invoked: the write cache is reopened for reading, and the read cache is
emptied and reopened for writing.
This resembles the buffer flip that happens in accelerated graphics
(DirectX/OpenGL/etc).
Cache_flip_event_log is considered non-blocking for a single reader and a
single writer in this sense, with the only lock held by the reader during
the flip.
An alternative approach, implementing a fair concurrent circular buffer,
is described in MDEV-24676.
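A self-contained model of the flip; the real Cache_flip_event_log works on
IO_CACHEs and server mutexes, the names here are illustrative:

  #include <cstddef>
  #include <mutex>
  #include <utility>
  #include <vector>

  struct Flip_log
  {
    std::vector<char> bufs[2];
    std::vector<char> *write_buf= &bufs[0];
    std::vector<char> *read_buf=  &bufs[1];
    std::mutex lock;                        // writers take it per transaction

    void write_trx(const char *data, std::size_t len)
    {
      std::lock_guard<std::mutex> g(lock);  // a transaction is one atomic write
      write_buf->insert(write_buf->end(), data, data + len);
    }

    void flip()                             // reader calls this once drained
    {
      std::lock_guard<std::mutex> g(lock);  // waits for an active writer
      read_buf->clear();                    // the read side is emptied ...
      std::swap(read_buf, write_buf);       // ... and the roles are swapped
    }
  };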
* Cache managers:
We have two cache sinks: statement and transactional.
It is important that the changes are cached first per-statement and then
per-transaction.
If a statement fails, then only the statement data is rolled back. The
transaction moves along, however.
It turns out there's no guarantee that a TABLE will persist in
thd->open_tables until the transaction commit moment:
if an error occurs, the statement's tables are purged.
Therefore, we can't store the caches in TABLE. Ideally it should be the
handlerton, but we cut the corner and store it in THD, in a list
(modeled below).
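A tiny model of this two-level behaviour, with std::string standing in for
the per-statement and per-transaction caches:

  #include <string>

  struct Online_alter_cache
  {
    std::string stmt, trx;              // stand-ins for the two caches

    void stmt_commit()   { trx += stmt; stmt.clear(); }  // fold into trx
    void stmt_rollback() { stmt.clear(); }  // statement data only;
                                            // the transaction moves along
    void trx_rollback()  { stmt.clear(); trx.clear(); }
  };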
Event_log is supposed to be a basic logging class that can write events to
a single file.
MYSQL_BIN_LOG, in comparison, will have:
* rotation support
* index files
* purging
* gtid and transactional information handling
and is dedicated to the general-purpose binlog.
* Eliminate most usages of THD::use_trans_table. Only 3 are left, and they
are at quite high levels, and really essential.
* Eliminate the is_transactional argument when possible. Lots of places are
left though, because of some WSREP error handling in
MYSQL_BIN_LOG::set_write_error.
* Remove junk binlog functions from THD.
* binlog_prepare_pending_rows_event is moved to log.cc inside MYSQL_BIN_LOG
and is no longer a template. Instead, it accepts an event factory with a
type code and a callback to a constructing function in it (sketched below).
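A sketch of the factory's shape; the member layout and the event
constructor signature are assumptions:

  // forward declarations stand in for the real server types
  class THD; struct TABLE; class Rows_log_event;
  typedef unsigned long ulong;

  struct Rows_event_factory
  {
    int type_code;                                    // e.g. WRITE_ROWS_EVENT
    Rows_log_event *(*create)(THD*, TABLE*, ulong, bool);

    template<class EventT> static Rows_event_factory get()
    {
      return { EventT::TYPE_CODE,
               [](THD *thd, TABLE *t, ulong id, bool trans) -> Rows_log_event*
               { return new EventT(thd, t, id, trans); } };
    }
  };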
mark old keys in the ALTER TABLE with the `old` flag, not with
`key_create_info.check_for_duplicate_indexes`.
This allows marking old foreign keys too.
We introduce a simple plugin dependency mechanism. A plugin init function
may return HA_ERR_RETRY_INIT. If this happens during server startup, when
the server is trying to initialise all plugins, the failed plugins
will be retried until no more plugins succeed in initialisation or
want to be retried (modeled below).
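A self-contained model of the retry loop; the server's real loop walks its
plugin registry, and only the HA_ERR_RETRY_INIT code comes from this change:

  #include <vector>

  enum init_result { INIT_OK, INIT_FAIL, HA_ERR_RETRY_INIT };
  struct Plugin { init_result (*init)(); bool ready= false, dropped= false; };

  void init_all(std::vector<Plugin> &plugins)
  {
    for (bool progress= true; progress; )
    {
      progress= false;              // stop when a full pass changes nothing
      for (auto &p : plugins)
      {
        if (p.ready || p.dropped) continue;
        init_result rc= p.init();
        if (rc == INIT_OK) { p.ready= true; progress= true; }
        else if (rc == INIT_FAIL) p.dropped= true; // can't be retried safely
        // rc == HA_ERR_RETRY_INIT: leave it for the next pass
      }
    }
  }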
This will fix spider init bugs, which are caused in part by its
dependency on Aria for initialisation.
The reason we need a new return code, instead of treating every
failure as a request for retry, is that it may be impossible to clean
up after a failed plugin initialisation. Take InnoDB for example: it
has a global variable `buf_page_cleaner_is_active`, which may not
satisfy an assertion during a second initialisation attempt, probably
because InnoDB does not expect initialisation to be called twice.
The problem was an incorrect unmark_start_commit() in
signal_error_to_sql_driver_thread(). If an event group gets an error, this
unmark could run after the following GCO started, and the subsequent
re-marking could access a de-allocated GCO.
The offending unmark_start_commit() looks obviously incorrect, and the fix
is to just remove it. It was introduced in the MDEV-8302 patch, the commit
message of which suggests it was added there solely to satisfy an assertion
in ha_rollback_trans(). So update this assertion instead to not trigger for
event groups that experienced an error (rgi->worker_error). When an error
occurs in an event group, all following event groups are skipped anyway, so
the unmark should never be needed in this case.
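Illustrative only (the real assertion in ha_rollback_trans() has more
terms); the relaxed check is along these lines:

  // a marked start-commit may now be rolled back if the event group
  // already failed; the unmark in the error path is gone
  DBUG_ASSERT(!rgi->did_mark_start_commit || rgi->worker_error);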
Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>