This patch adds the possibility to have client commands that do not
return results from DBMS. While processing such commands we must be
able to preserve errors until the next interaction with client.
Specifically if the transaction is bf aborted while processing such
a non-returning command, then we have to keep the deadlock error until
the client issues a command that may return the error.
To handle such cases, client_state::before_command() now takes
parameter keep_command_error. The DBMS is supposed set
keep_command_error true to instruct wsrep-lib to preserve errors (if
any) until the next command which sets keep_command_error false.
Dealing with a case where current client command does not return result.
Work in progress.
Fix typo and add assertions in keep_command_error()
Make keep_command_error a parameter to before_commit()
Fix comment about keep_command_error
Handle keep_command_error with s_must_abort in wsrep_before_command()
Fix unit test
This patch implments replaying for prepared XA transactions.
Replay may happen in the following cases:
1) The transaction is BF aborted in prepared state and is idle. In
that case, the transaction is handed over to rollbacker for replay.
2) The transaction is BF aborted while executing the
commit (i.e. before or after successful certification). In
which case the transaction replays itself from fragment storage.
3) The transaction is BF aborted while certifying its commit
fragment. This case is handled like replay for streaming transactions,
where the provider is directly involved and re-delivers the last
fragment.
Add support for detaching XA transactions. This is useful for handling
the case where the DBMS client has a transaction in prepared state and
disconnects. Before disconnect, the DBMS calls the newly introduced
client_state::xa_detach(), to cleanup the local transaction and
convert it to a high priority transaction. The DBMS may later attempt
to terminate the transaction through client_state::commit_by_xid() or
client_state::rollback_by_xid().
Also in this patch:
- Fix client_state::close() so that it does not rollback transactions
in prepared state
- Changed class wsrep::xid representation to hold enough information
so that DBMS can convert to its native representation
- Fix potential infinite loop in
server_state::find_streaming_applier(wsrep:xid&)
- Append SR keys on prepare fragment and make it pa_unsafe
- Handle one phase commit (simply fall back to two phase)
- Do not rollback prepared streaming clients in
server_state::close_orphaned_transactions()
An assertion
`server_state_.rollback_mode() == wsrep::server_state::rm_async`
fired in `client_state::before_command()` if a BF abort happened
between calls to wait_rollback_complete_and_acquire_ownership()
and before_command().
This commit adds a test to reproduce the assertion and verify
the correct behavior, as well as removes the incorrect assertion
to fix the issue.
when losing error voting:
- if NBO has failed locally (DBMS side), don't override original DBMS
error so it gets reported to the client
- otherwise, report "query interrupted" instead of "error during commit"
- Retry enter_toi() in poll_enter_toi() also for error_connection_failed
which means that the connectivity to the cluster has been lost,
a.k.a non-prim.
Certification keys are needed for NBO end to resolve dependencies
for the write sets which follow NBO end. Without keys the following
write sets do not detect dependency to NBO event and may start applying
too early.
- Set both current error and current error status if provider enter_toi()
or leave_toi() fails.
- Leave NBO mode if TOI cannot be entered in begin_nbo_phase_two().
If timeout option is give, enter_toi_local() and begin_nbo_phase_one()
retry provider::enter_toi() as long as return status indicates
certification failure, given timeout expires or the client is interrupted.
- High priority interface method to apply NBO begin, separate from
apply_toi() in order to avoid implementation to force interpreting
ws_meta flags.
- Method to put client_state into NBO mode when applying NBO begin.
The client_state will process in m_local mode.
- Unit tests for applying NBO
- Implemented calls to enter and leave NBO phase one and two
- Extended client_state mode checking to include m_nbo
- Changed client_state state and mode change sanity checks to
print a warning and assert() instead of throwing exceptions
to be more graceful in release builds.
Sanity checks to detect concurrency bugs were assuming a threading
model where each client state would always be processed within
single thread of execution. This however may be too strong assumption
if the application uses some kind of thread pooling.
This patch relaxes those assumptions by removing current_thread_id_
from client_state and relaxing assertions against owning_thread_id_.
This patch also adds a new method
wait_rollback_complete_and_acquire_ownership() into
client_state. This method is idempotent and can be used to gain
control to client_state before before_command() is called.
The method will wait until possible background rollback process is
over and marks the state to s_exec to protect the state against
new background rollbacks.
Other fixes/improvements:
- High priority globals state is restored after discarding streaming.
- Allowed server_state transition donor -> synced.
- Client state method store_globals() was renamed to acquire_ownership()
to better describe the intent. Method store_globals() was left for
backwards compatibility and marked deprecated.
- populate and pass real error description buffer to provider in case
of applying error
- return 0 from server_state::on_apply() if error voting confirmed
consistency
- remove fragments and rollback after fragment applying failure
- always release streaming applier on commit or rollback
If the application uses caching for client sessions, the
client_state object may be reused. This will cause the opened
client session to have unexpected value for sync_wait_gtid and
last_written_gtid.
In order to work around the problem, reset sync_wait_gtid and
last_written_gtid in client_state::open().
If the size of a SR fragment exceeds the maximum size that the
replication provider allows us to replicate, then we are expected to
set the client error code to e_error_during_commit.
However, client_state::after_statement() unconditionally overrides it
to error e_deadlock_error.
Fixes client_state::after_statement() so that it overrided the error
only if noerror has been set yet.
Storing information that background rollbacker in ongoing in client state has_rollback_
This can be used for detecting if there is ongoing background rollback,
and client should keep waiting in before_command() entry to avoid conflicts
in accessing client state during background rollbacking.
transaction::bf_abort() is modified to set has_rollback_ flag when
backgroung rollbacking has been assigned for the client
sync_rollback_complete() method has been modified to reset the backround
rollbacker flag
* Check error code from fragment release
* Always call streaming rollback from must abort if the
transaction is in executing phase. This is needed to ensure
that rollback fragment replication happens before rollback starts
* Initiate streaming rollback from certify fragment if BF abort
happens after fragment certification.
* Check fragment removal error code in prepare phase. It is possible
that the transaction gets BF aborted during fragment removal.
* Mark fragment certified in certify_fragment() even if the provider
returns cert failed error. With current wsrep-API error codes
it may not be possible to distinquish certification failure
and BF abort during fragment replication. This may also be a
provider bug. As a result rollback fragment may sometimes be
replicated when it would not be necessary.
* Count separately fragments certified and fragments stored in
streaming context. Storing the fragment may ultimately fail
due to BF abort even if the fragment was succesfully certified.
Therefore we need to have separate counter for certified fragments
to determine if the transaction is streaming and seqnos of fragments
which have been succesfully stored.
* Provider release is called only after succesful fragment certification
and fragment store.
* Fixed handling of write sets with rollback flag set in apply_write_set()
* Handle BF rollback also in after_statement() call.
* Added missing after_apply() call when handling rollback fragment.
* Fixed state changes when rollback is starated during preparing state.
The intended purpose for local mode was local storage access without
entering replication hooks. However, the same can be achieved with
high priority mode. Removed replicating mode and use local instead to
denote locally processing clients.
quite useful as there might not be enough information for it
after the statement has been processed. Better to handle retrying
on DBMS side. Also removed after_statement_result enumeration and
return plain int from after_statement().