to explicit row_start/row_end columns
If both added system fields have the same type (length, unsigned
flag) as the old implicit system fields, rename the implicit system
fields to the ones specified in ALTER and remove the SYSTEM_INVISIBLE
flag. A correct PERIOD clause must be specified in ALTER as well.
MDEV-34904 Inplace alter for implicit to explicit versioning is broken
Whether ALTER goes inplace and how it goes inplace depends on
handler_flags, which is derived from alter_info->flags by this logic:
ha_alter_info->handler_flags|= (alter_info->flags & ~flags_to_remove);
ALTER_VERS_EXPLICIT was not in flags_to_remove and its value (1ULL <<
35) clashed with ALTER_ADD_NON_UNIQUE_NON_PRIM_INDEX.
ALTER_VERS_EXPLICIT must not affect inplace, it is SQL-only so we
remove it from handler_flags.
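The masking above can be sketched in isolation. This is a minimal
illustration, not the server code: to_handler_flags() and
TOY_ALTER_RENAME are hypothetical names, and the only values taken
from the text are the shared bit (1ULL << 35) and the masking
expression itself.

```cpp
#include <cassert>
#include <cstdint>

// Per the commit message, both flags ended up on bit 35.
constexpr uint64_t ALTER_VERS_EXPLICIT = 1ULL << 35;
constexpr uint64_t ALTER_ADD_NON_UNIQUE_NON_PRIM_INDEX = 1ULL << 35;
// Hypothetical unrelated flag, used only to show that other bits survive.
constexpr uint64_t TOY_ALTER_RENAME = 1ULL << 3;

// Mirrors: handler_flags |= (alter_info->flags & ~flags_to_remove);
uint64_t to_handler_flags(uint64_t alter_flags, uint64_t flags_to_remove)
{
  uint64_t handler_flags = 0;
  handler_flags |= (alter_flags & ~flags_to_remove);
  return handler_flags;
}
```

With flags_to_remove == 0 the SQL-only bit leaks through and is read
by the engine as "add non-unique index". Passing ALTER_VERS_EXPLICIT
in flags_to_remove strips it while leaving the other bits untouched.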
The tests fail on assertion
ut_ad(!wsrep_is_wsrep_xid(&trx->xid));
in `innobase_recover_rollback_by_xid()`.
The fix is to avoid async rollback for prepared transactions
when wsrep is ON or wsrep recovery is in progress. The rationale
is that the rollback of prepared transactions must complete
before the node starts applying write sets after SST, or in
case of wsrep recovery, the recovery must complete before the
process exits.
Change the assertion into a stronger one
ut_ad(!(WSREP_ON || wsrep_recovery));
to catch if the async rollback codepath is taken when wsrep is
enabled.
This is an extension of MDEV-30423 "Deadlock on Replica during BACKUP
STAGE BLOCK_COMMIT on XA transactions"
The original commit in MDEV-30423 was not complete as some usage in XA of
MDL_BACKUP_COMMIT locks did not set thd->backup_commit_lock.
This is required to be set when using parallel replication.
Fixed by ensuring that all usage of the BACKUP_COMMIT lock in XA is
uniform and always sets thd->backup_commit_lock. I also changed all
locks to be MDL_EXPLICIT to keep that part uniform as well.
A regression test is added.
When a derived table which has distinct values and BLOB fields is
materialized, an index is created over all columns to ensure only
unique values are placed in the result.
This index is created in a special mode HA_UNIQUE_HASH to support BLOBs.
Later the optimizer may incorrectly choose this index to retrieve values
from the derived table, although such type of index cannot be used
for data retrieval.
This commit excludes HA_UNIQUE_HASH indexes from being added to the
`JOIN::keyuse` array, thus preventing their subsequent usage for
data retrieval.
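The exclusion can be pictured as a filter over candidate keys. This
is a toy sketch only: ToyKey, TOY_HA_UNIQUE_HASH and
usable_for_retrieval() are hypothetical stand-ins, and the flag value
is invented for illustration; the real check sits in the code that
fills `JOIN::keyuse`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical flag value; the server defines the real HA_UNIQUE_HASH.
constexpr uint32_t TOY_HA_UNIQUE_HASH = 1u << 0;

struct ToyKey
{
  int id;          // position of the key in the table
  uint32_t flags;  // key flags
};

// Returns ids of the keys that may be offered to the optimizer for lookups.
std::vector<int> usable_for_retrieval(const std::vector<ToyKey> &keys)
{
  std::vector<int> out;
  for (const ToyKey &k : keys)
  {
    if (k.flags & TOY_HA_UNIQUE_HASH)
      continue;  // only enforces distinctness, cannot serve lookups
    out.push_back(k.id);
  }
  return out;
}
```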
work consistently on replication
Row-based replication does not execute CREATE .. SELECT but instead
CREATE TABLE. CREATE .. SELECT creates implicit system fields in an
unusual place: in-between declared fields and select fields. That was
done because select_field_pos logic requires that select fields go
last in create_list.
So, CREATE .. SELECT on master and CREATE TABLE on slave create system
fields at different positions and replication gets a field mismatch.
To fix this we've changed CREATE .. SELECT to create implicit system
fields in the usual place at the end and updated select_field_pos to
handle this case.
Search conditions were evaluated using val_int(), which was wrong.
Fixing the code to use val_bool() instead.
Details:
- Adding a new item_base_t::IS_COND flag which marks Items used
as <search condition> in WHERE, HAVING, JOIN ON, CASE WHEN clauses.
The flag is set at parse time.
These expressions must be evaluated using val_bool() rather than val_int().
Note, the optimizer creates more Items which are used as search conditions.
Most of these items are not marked with IS_COND yet. This is OK for now,
but eventually these Items can also be fixed to have the flag.
- Adding a method Item::is_cond() which tests if the Item has the IS_COND flag.
- Implementing Item_cache_bool. It evaluates the cached expression using
val_bool() rather than val_int().
Overriding Type_handler_bool::Item_get_cache() to create Item_cache_bool.
- Implementing Item::save_bool_in_field(). It uses val_bool() rather than
val_int() to evaluate the expression.
- Implementing Type_handler_bool::Item_save_in_field()
using Item::save_bool_in_field().
- Fixing all Item_bool_func descendants to implement a virtual val_bool()
rather than a virtual val_int().
- To find places where val_int() should be fixed to val_bool(), a few
DBUG_ASSERT(!is_cond()) were added into val_int() implementations
of selected (most frequent) classes:
Item_field
Item_str_func
Item_datefunc
Item_timefunc
Item_datetimefunc
Item_cache_bool
Item_bool_func
Item_func_hybrid_field_type
Item_basic_constant descendants
- Fixing all places where DBUG_ASSERT() happened during an "mtr" run
to use val_bool() instead of val_int().
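Why the two evaluation paths can disagree is easiest to see on a
fractional value. The following is a toy model, not the Item class
hierarchy: ToyItem is hypothetical, and it assumes that the integer
conversion rounds to the nearest integer.

```cpp
#include <cassert>
#include <cmath>

// Toy stand-in for an expression node holding a numeric result.
struct ToyItem
{
  double value;

  // Integer view of the value: rounds to nearest (assumption of this toy).
  long long val_int() const { return std::llround(value); }

  // Truth view of the value: anything non-zero is true.
  bool val_bool() const { return value != 0.0; }
};
```

For a value of 0.4, val_int() yields 0 (treated as false), while
val_bool() yields true, so a search condition evaluated through
val_int() would wrongly reject the row in this model.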
crash recovery
Summary
=======
When doing server recovery, the active transactions will be rolled
back by InnoDB background rollback thread automatically. The
prepared transactions will be committed or rolled back accordingly
by binlog recovery. Binlog recovery is done in main thread before
the server can provide service to users. If there is a big
transaction to rollback, the server will not be available for a long
time.
This patch provides a way to rollback the prepared transactions
asynchronously. Thus the rollback will not block server startup.
Design
======
- Handler::recover_rollback_by_xid()
This patch provides a new handler interface to rollback transactions
in recover phase. InnoDB just sets the transaction's state to active.
Then the transaction will be rolled back by the background rollback
thread.
- Handler::signal_tc_log_recover_done()
This function is called after the tc log (typically the binlog) has
been opened. When this function is called, all transactions that are
to be rolled back have been reverted to ACTIVE state. Thus it starts
the rollback thread to rollback the transactions.
- Background rollback thread
With this patch, background rollback thread is deferred to run until binlog
recovery is finished. It is started by innobase_tc_log_recovery_done().
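The ordering in the design above can be sketched with a small state
machine. This is a single-threaded toy under stated assumptions:
ToyEngine and its members are hypothetical, and the background thread
is modeled as an explicit run_background_rollback() step so that the
example stays deterministic.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

enum class TrxState { PREPARED, ACTIVE, ROLLED_BACK };

struct ToyTrx { std::string xid; TrxState state; };

class ToyEngine
{
  std::vector<ToyTrx> trxs_;
  bool rollback_started_ = false;  // set once tc log recovery is done
public:
  void add_prepared(const std::string &xid)
  { trxs_.push_back({xid, TrxState::PREPARED}); }

  // Cheap step done during recovery: only flips PREPARED to ACTIVE.
  bool recover_rollback_by_xid(const std::string &xid)
  {
    for (ToyTrx &t : trxs_)
      if (t.xid == xid && t.state == TrxState::PREPARED)
      { t.state = TrxState::ACTIVE; return true; }
    return false;
  }

  // Called after the tc log is opened: allows the rollback work to run.
  void signal_tc_log_recover_done() { rollback_started_ = true; }

  // One pass of the "background thread"; returns the number rolled back.
  int run_background_rollback()
  {
    if (!rollback_started_)
      return 0;  // deferred until recovery has finished
    int n = 0;
    for (ToyTrx &t : trxs_)
      if (t.state == TrxState::ACTIVE)
      { t.state = TrxState::ROLLED_BACK; ++n; }
    return n;
  }

  TrxState state_of(const std::string &xid) const
  {
    for (const ToyTrx &t : trxs_)
      if (t.xid == xid)
        return t.state;
    throw std::out_of_range("unknown xid");
  }
};
```

Startup only pays for the state flip; the expensive undo work happens
after signal_tc_log_recover_done(), off the startup path.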
ENGINE_SUBSTITUTION only applies to CREATE TABLE and ALTER TABLE, and
Storage_engine_name::resolve_storage_engine_with_error() could be
called when executing any sql command.
Trivial patch, using the handler statistics already collected for
the slow query log.
The reason for the changes in test cases was mainly to change to use
select TABLE_SCHEMA ... from information_schema.table_statistics instead
of 'show table_statistics' to avoid future changes to test results
if we add more columns to table_statistics.
There are two problems.
First, replication fails when XA transactions are used where the
slave has replicate_do_db set and the client has touched a different
database when running DML such as inserts. This is because XA
commands are not treated as keywords, and are thereby not exempt
from the replication filter. The effect of this is that during an XA
transaction, if its logged “use db” from the master is filtered out
by the replication filter, then XA END will be ignored, yet its
corresponding XA PREPARE will be executed in an invalid state,
thereby breaking replication.
Second, if the slave replicates an XA transaction which results in
an empty transaction, the XA START through XA PREPARE first phase of
the transaction won’t be binlogged, yet the XA COMMIT will be
binlogged. This will break replication in chain configurations.
The first problem is fixed by treating XA commands in
Query_log_event as keywords, thus allowing them to bypass the
replication filter. Note that Query_log_event::is_trans_keyword() is
changed to accept a new parameter to define its mode, to either
check for XA commands or regular transaction commands, but not both.
In addition, mysqlbinlog is adapted to use this mode so its
--database filter does not remove XA commands from its output.
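The mode parameter can be illustrated as follows. This is a sketch
with hypothetical names (KeywordMode, toy_is_trans_keyword(),
toy_passes_filter()) and an assumed, simplified keyword list; it is
not the signature or the matching logic of
Query_log_event::is_trans_keyword().

```cpp
#include <cassert>
#include <string>

// Hypothetical mode selector: check one family of commands, never both.
enum class KeywordMode { REGULAR_TRANS, XA_TRANS };

static bool starts_with(const std::string &s, const std::string &prefix)
{
  return s.compare(0, prefix.size(), prefix) == 0;
}

bool toy_is_trans_keyword(const std::string &query, KeywordMode mode)
{
  if (mode == KeywordMode::XA_TRANS)
    return starts_with(query, "XA ");
  // Assumed list of regular transaction commands, for illustration only.
  return query == "BEGIN" || query == "COMMIT" || query == "ROLLBACK";
}

// A filtered-out database no longer drops transaction control statements.
bool toy_passes_filter(const std::string &query, bool db_allowed)
{
  return db_allowed ||
         toy_is_trans_keyword(query, KeywordMode::REGULAR_TRANS) ||
         toy_is_trans_keyword(query, KeywordMode::XA_TRANS);
}
```

A caller that wants both families makes two calls, one per mode, as
toy_passes_filter() does here.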
The second problem is fixed by overwriting the XA state in the XID
cache to be XA_ROLLBACK_ONLY, so at commit time, the server knows to
rollback the transaction and skip its binlogging. If the xid cache
is cleared before an XA transaction receives its completion command
(e.g. on server shutdown), then before reporting ER_XAER_NOTA when
the completion command is executed, the filter is first checked if
the database is ignored, and if so, the error is ignored.
Reviewed By:
============
Kristian Nielsen <knielsen@knielsen-hq.org>
Andrei Elkin <andrei.elkin@mariadb.com>
InnoDB transactions may be reused after being committed:
- when taken from the transaction pool
- during a DDL operation execution
In this case wsrep flag on trx object is cleared, which may cause wrong
execution logic afterwards (wsrep-related hooks are not run).
Make trx->wsrep flag initialize from THD object only once on InnoDB transaction
start and don't change it throughout the transaction's lifetime.
The flag is reset at commit time as before.
Unconditionally set wsrep=OFF for THD objects that represent InnoDB background
threads.
Make Wsrep_schema::store_view() operate in its own transaction.
Fix streaming replication transactions' fragments rollback to not switch
THD->wsrep value during transaction's execution
(use THD->wsrep_ignore_table as a workaround).
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Affects:
MDEV-34150 Assertion failure in Diagnostics_area::set_error_status upon binary
logging hitting tmp space limit
MDEV-9101 Limit size of created disk temporary files and tables
This bug was caused by moving flushing of the in-memory-row-events from
close_thread_tables() to binlog_commit() in MDEV-34150.
This was needed to be able to handle the case where binlog writes could
fail.
Galera has two cases where the change caused problems:
- Flushing of row events in commit_one_phase_2() was not done in the
case where the standard binary log was not enabled but Galera was
using the binary log internally.
- Galera disabled the call to binlog_commit_flush_stmt_cache() for not
ending transactions.
Fixed by adding code that flushes the in-memory-row-events to the binary
log (write, but not sync) in the two above cases if Galera is enabled.
Columns added to TABLE_STATISTICS
- ROWS_INSERTED, ROWS_DELETED, ROWS_UPDATED, KEY_READ_HITS and
KEY_READ_MISSES.
Columns added to CLIENT_STATISTICS and USER_STATISTICS:
- KEY_READ_HITS and KEY_READ_MISSES.
User visible changes (except new columns):
- CLIENT_STATISTICS and USER_STATISTICS have columns KEY_READ_HITS and
KEY_READ_MISSES added after column ROWS_UPDATED before SELECT_COMMANDS.
Other changes:
- Do not collect table statistics for system tables like index_stats,
table_stats, performance_schema, information_schema etc as the user
has no control of these and they generate noise in the statistics.
- All row variables that are part of user_stats are moved to
'struct rows_stats' to make it easy to clear all of them at once.
- ha_read_key_misses added to STATUS_VAR
Notes:
- userstat.result has a change of numbers of rows for handler_read_key.
This is because use-stat-tables is now disabled for the test.
Two new variables added:
- max_tmp_space_usage : Limits the temporary space allowance per user
- max_total_tmp_space_usage: Limits the temporary space allowance for
all users.
New status variables: tmp_space_used & max_tmp_space_used
New field in information_schema.process_list: TMP_SPACE_USED
The temporary space is counted for:
- All SQL level temporary files. This includes files for filesort,
transaction temporary space, analyze, binlog_stmt_cache etc.
It does not include engine internal temporary files used for repair,
alter table, index pre sorting etc.
- All internal on disk temporary tables created as part of resolving a
SELECT, multi-source update etc.
Special cases:
- When doing a commit, the last flush of the binlog_stmt_cache
will not cause an error even if the temporary space limit is exceeded.
This is to avoid giving errors on commit. This means that a user
can temporarily go over the limit with up to binlog_stmt_cache_size.
Noteworthy issue:
- One has to be careful when using small values for max_tmp_space_limit
together with binary logging and with non transactional tables.
If the binary log entry for the query is bigger than
binlog_stmt_cache_size and one hits the limit of max_tmp_space_limit
when flushing the entry to disk, the query will abort and the
binary log will not contain the last changes to the table.
This will also stop the slave!
This is also true for all Aria tables as Aria cannot do rollback
(except in case of crashes)!
One way to avoid it is to use @@binlog_format=statement for
queries that update a lot of rows.
Implementation:
- All writes to temporary files or internal temporary tables, that
increases the file size, are routed through temp_file_size_cb_func()
which updates and checks the temp space usage.
- Most of the temporary file monitoring is done inside IO_CACHE.
Temporary file monitoring is done inside the Aria engine.
- MY_TRACK and MY_TRACK_WITH_LIMIT are new flags for init_io_cache().
MY_TRACK means that we track the file usage. MY_TRACK_WITH_LIMIT means
that we track the file usage and we give an error if the limit is
breached. MY_TRACK is used to not give an error on commit when
binlog_stmt_cache is flushed.
- global_tmp_space_used contains the total tmp space used so far.
This is needed to quickly check against max_total_tmp_space_usage.
- Temporary space errors are using EE_LOCAL_TMP_SPACE_FULL and
handler errors are using HA_ERR_LOCAL_TMP_SPACE_FULL.
This is needed until we move general errors to their own error space
so that they cannot conflict with system error numbers.
- Return value of my_chsize() and mysql_file_chsize() has changed
so that -1 is returned in the case my_chsize() could not decrease
the file size (very unlikely and will not happen on modern systems).
All calls to _chsize() are updated to check for > 0 as the error
condition.
- At the destruction of THD we check that THD::tmp_file_space == 0
- At server end we check that global_tmp_space_used == 0
- As a precaution against errors in the tmp_space_used code, one can set
max_tmp_space_usage and max_total_tmp_space_usage to 0 to disable
the tmp space quota errors.
- truncate_io_cache() function added.
- Aria tables using static or dynamic row length are registered in 8K
increments to avoid some calls to update_tmp_file_size().
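The accounting described in this list can be sketched as below. It is
a toy under stated assumptions: ToyTmpSpace, ToyThd, ToyTrack and
toy_temp_file_size_cb() are hypothetical and do not reproduce the
signature of temp_file_size_cb_func(); the two ToyTrack values stand
in for MY_TRACK and MY_TRACK_WITH_LIMIT, and the caller is assumed to
report sizes consistently.

```cpp
#include <cassert>
#include <cstdint>

struct ToyTmpSpace
{
  uint64_t max_tmp_space_usage;        // per-user limit, 0 disables
  uint64_t max_total_tmp_space_usage;  // limit for all users, 0 disables
  uint64_t global_tmp_space_used;      // running total
};

struct ToyThd { uint64_t tmp_space_used; };

enum class ToyTrack { TRACK_ONLY, TRACK_WITH_LIMIT };

// Returns 0 on success, 1 if a limit would be breached (nothing recorded).
int toy_temp_file_size_cb(ToyTmpSpace &g, ToyThd &thd,
                          uint64_t old_size, uint64_t new_size, ToyTrack mode)
{
  if (new_size > old_size)
  {
    uint64_t grow = new_size - old_size;
    if (mode == ToyTrack::TRACK_WITH_LIMIT)
    {
      if (g.max_tmp_space_usage &&
          thd.tmp_space_used + grow > g.max_tmp_space_usage)
        return 1;
      if (g.max_total_tmp_space_usage &&
          g.global_tmp_space_used + grow > g.max_total_tmp_space_usage)
        return 1;
    }
    thd.tmp_space_used += grow;
    g.global_tmp_space_used += grow;
  }
  else
  {
    uint64_t shrink = old_size - new_size;
    thd.tmp_space_used -= shrink;
    g.global_tmp_space_used -= shrink;
  }
  return 0;
}
```

Track-only mode still counts the bytes, which is how the final flush
at commit can exceed the limit without raising an error, and both
counters return to 0 once every file is truncated or removed.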
Other things:
- Ensure that all handler errors are registered. Before, some engine
errors could be printed as "Unknown error".
- Fixed bug in filesort() that caused an assert if there was an error
when writing to the temporary file.
- Fixed that compute_window_func() now takes into account write errors.
- In case of parallel replication, rpl_group_info::cleanup_context()
could call trans_rollback() with thd->error set, which would cause
an assert. Fixed by resetting the error before calling trans_rollback().
- Fixed bug in subselect3.inc which caused the following test to use
heap tables with a low value for max_heap_table_size.
- Fixed bug in sql_expression_cache where it did not overflow
heap table to Aria table.
- Added Max_tmp_disk_space_used to slow query log.
- Fixed some bugs in log_slow_innodb.test