Under unknown circumstances, the SQL layer may wrongly disregard an
invocation of thd_mark_transaction_to_rollback() when an InnoDB
transaction had been aborted (rolled back) due to one of the following errors:
* HA_ERR_LOCK_DEADLOCK
* HA_ERR_RECORD_CHANGED (if innodb_snapshot_isolation=ON)
* HA_ERR_LOCK_WAIT_TIMEOUT (if innodb_rollback_on_timeout=ON)
Such an error used to cause a crash of InnoDB during transaction commit.
These changes aim to catch and report the error earlier, so that not only
this crash can be avoided but also the original root cause be found and
fixed more easily later.
The idea of this fix is from Michael 'Monty' Widenius.
HA_ERR_ROLLBACK: A new error code that will be translated into
ER_ROLLBACK_ONLY, signalling that the current transaction
has been aborted and the only allowed action is ROLLBACK.
trx_t::state: Add TRX_STATE_ABORTED that is like
TRX_STATE_NOT_STARTED, but noting that the transaction had been
rolled back and aborted.
trx_t::is_started(): Replaces trx_is_started().
ha_innobase: Check the transaction state in various places.
Simplify the logic around SAVEPOINT.
ha_innobase::is_valid_trx(): Replaces ha_innobase::is_read_only().
The InnoDB logic around transaction savepoints, commit, and rollback
was unnecessarily complex and might have contributed to this
inconsistency. So, we are simplifying that logic as well.
trx_savept_t: Replace with const undo_no_t*. When we rollback to
a savepoint, all we need to know is the number of undo log records
that must survive.
trx_named_savept_t, DB_NO_SAVEPOINT: Remove. We can store undo_no_t
directly in the space allocated at innobase_hton->savepoint_offset.
fts_trx_create(): Do not copy previous savepoints.
fts_savepoint_rollback(): If a savepoint was not found, roll back
everything after the default savepoint of fts_trx_create().
The test innodb_fts.savepoint is extended to cover this code.
Reviewed by: Vladislav Lesin
Tested by: Matthias Leich
Where a read-only server permits writes through replication, it
should not permit user connections to commit/rollback XA
transactions prepared via replication. The bug reported in
MDEV-30978 shows that this can happen. This is because there is no
read only check in the XA transaction logic, the most relevant one
occurs in ha_commit_trans() for normal statements/transactions.
This patch extends the XA transaction logic to check the read only
status of the server before performing an XA COMMIT or ROLLBACK.
Reviewed By:
Andrei Elkin <andrei.elkin@mariadb.com>
The DeadlockChecker expects to be able to freeze the waits-for graph.
Hence, it is best executed somewhere where we are not holding any
additional mutexes.
lock_wait(): Defer the deadlock check to this function, instead
of executing it in lock_rec_enqueue_waiting(), lock_table_enqueue_waiting().
DeadlockChecker::trx_rollback(): Merge with the only caller,
check_and_resolve().
LockMutexGuard: RAII accessor for lock_sys.mutex.
lock_sys.deadlocks: Replaces lock_deadlock_found.
trx_t: Clean up some comments.
Lifted long standing limitation to the XA of rolling it back at the
transaction's
connection close even if the XA is prepared.
Prepared XA-transaction is made to sustain connection close or server
restart.
The patch consists of
- binary logging extension to write prepared XA part of
transaction signified with
its XID in a new XA_prepare_log_event. The concusion part -
with Commit or Rollback decision - is logged separately as
Query_log_event.
That is in the binlog the XA consists of two separate group of
events.
That makes the whole XA possibly interweaving in binlog with
other XA:s or regular transaction but with no harm to
replication and data consistency.
Gtid_log_event receives two more flags to identify which of the
two XA phases of the transaction it represents. With either flag
set also XID info is added to the event.
When binlog is ON on the server XID::formatID is
constrained to 4 bytes.
- engines are made aware of the server policy to keep up user
prepared XA:s so they (Innodb, rocksdb) don't roll them back
anymore at their disconnect methods.
- slave applier is refined to cope with two phase logged XA:s
including parallel modes of execution.
This patch does not address crash-safe logging of the new events which
is being addressed by MDEV-21469.
CORNER CASES: read-only, pure myisam, binlog-*, @@skip_log_bin, etc
Are addressed along the following policies.
1. The read-only at reconnect marks XID to fail for future
completion with ER_XA_RBROLLBACK.
2. binlog-* filtered XA when it changes engine data is regarded as
loggable even when nothing got cached for binlog. An empty
XA-prepare group is recorded. Consequent Commit-or-Rollback
succeeds in the Engine(s) as well as recorded into binlog.
3. The same applies to the non-transactional engine XA.
4. @@skip_log_bin=OFF does not record anything at XA-prepare
(obviously), but the completion event is recorded into binlog to
admit inconsistency with slave.
The following actions are taken by the patch.
At XA-prepare:
when empty binlog cache - don't do anything to binlog if RO,
otherwise write empty XA_prepare (assert(binlog-filter case)).
At Disconnect:
when Prepared && RO (=> no binlogging was done)
set Xid_cache_element::error := ER_XA_RBROLLBACK
*keep* XID in the cache, and rollback the transaction.
At XA-"complete":
Discover the error, if any don't binlog the "complete",
return the error to the user.
Kudos
-----
Alexey Botchkov took to drive this work initially.
Sergei Golubchik, Sergei Petrunja, Marko Mäkelä provided a number of
good recommendations.
Sergei Voitovich made a magnificent review and improvements to the code.
They all deserve a bunch of thanks for making this work done!
MDEV-21854 xa commit `xid` one phase for already prepared transaction must always error out
Added state and one-phase option checks to XA "external" commit/rollback
branches. While the XA standard does not prohibit it,
Commit and Rollback of an XA external to the current ongoing transaction
is not allowed; after all the current transaction may rollback
to not being able to revert that decision.
XA specification doesn't permit empty gtrid. It is now enforced by this
patch. This solution was agreed in favour of fixing InnoDB, which doesn't
expect empty XID since early 10.5.
Also fixed wrong assertion (and added a test cases) that didn't permit
64 bytes gtrid + 64 bytes bqual.
In InnoDB, an INSERT will not create an explicit lock object. Instead,
the inserted record is initially implicitly locked by the transaction
that wrote its trx_t::id to the hidden system column DB_TRX_ID.
(Other transactions would check if DB_TRX_ID is referring to a
transaction that has not been committed.)
If a record was inserted in the current transaction, it would be
implicitly locked by that transaction. Only if some other transaction
is requesting access to the record, the implicit lock should be
converted to an explicit one, so that the waits-for graph can be
constructed for detecting deadlocks and lock wait timeouts.
Before this fix, InnoDB would convert implicit locks to
explicit ones, even if no conflict exists.
lock_rec_convert_impl_to_expl(): Return whether caller_trx
already holds an explicit lock that covers the record.
row_vers_impl_x_locked_low(): Avoid a lookup if the record matches
caller_trx->id.
lock_trx_has_expl_x_lock(): Renamed from lock_trx_has_rec_x_lock().
row_upd_clust_step(): In a debug assertion, check for implicit lock
before invoking lock_trx_has_expl_x_lock().
rw_trx_hash_t::find(): Make do_ref_count a mandatory parameter.
Assert that trx_id is not 0 (the caller should check it).
trx_sys_t::is_registered(): Only invoke find() if id != 0.
trx_sys_t::find(): Add the optional parameter do_ref_count.
lock_rec_queue_validate(): Avoid lookup for trx_id == 0.