Moved the check for transaction state before certification step
into separate method abort_or_interrupted() which will check the state
and adjust state and client_state error status accordingly.
Moved the check for abort_or_interrupted() to happen before
the state is changed to certifying and write set data is appended.
This makes the check atomic and reduces the probability of race
conditions. After this check we rely on provider side transaction
state management and error reporting until the certification step
is over.
Change to public API: Pass client_state mutex wrappend in unique_lock
object to client_service::interrupted() call. This way the DBMS side
has a control to the lock object in case it needs to unlock it
temporarily. The underlying mutex will always be locked when the lock
object is passed via interrupted() call.
Other: Allow server_state change from donor to connected. This may
happen if the joiner crashes during SST and the provider reports
it before the DBMS side SST mechanism detects the error.
The streaming_rollback() was not called from transaction::bf_abort()
if a streaming transaction was BF aborted in executing state. This
was because the condition to enter streaming_rollback() checked
if the transaction state is executing. However, the transaction
state had already been changed to must_abort before.
As a fix, save the state to local variable when entering
bf_abort() and check against saved state in condition to enter
streaming_rollback().
Added unit test for the correct behavior in the case when streaming
transaction is BF aborted in executing state.
Silenced warning about failing to replicate rollback fragment.
This is pretty normal condition and may happen during shutdown/
configuration changes when SR transactions are rolled back.
Fragment removal for SR transaction is done in transaction context
to make it atomic. However, this means that if the BF abort arrives
during fragment removal, the transaction may be rolled back inside
client_service::remove_fragments(), and client_state::after_prepare()
is left in aborted state. This case was not handled in the
post condition checks of after_prepare().
Fixed assertion in after_prepare() and implemented unit test.
Introduced server_service recover_streaming_appliers() interface
call which will be called in total order whenever streaming appliers
must be recovered. The call comes with two overloads, one which
can be called from client context (e.g. after SST has been received)
and the other from high priority context (e.g. view event handling).
The client context overload should be eventually be deprecated once
there is a mechanism to make provider signal that it has joined to
the cluster and will start applying events.
* Implemented encryption callback and enc_set_key
* Added pure virtual functions for encryption functionality
* Set enc key if provider was not loaded on time
In general the position where the storage recovers after a SST
cannot be known untile the recovery process is over. This in turn
means that the position cannot be known when the server_state
sst_received() method is called. Worked around the problem by
introducing get_position() method into server service which
can be used to get the position from stable storage after SST
has completed and the state has been recovered.
Certification for commit fragments was changed to happen in
before_prepare in order to make GTID available for storage
engines in prepare phase.
Removed unused file src/wsrep-lib_test.cpp
* Adds method wsrep::transaction::streaming_step() so that there is a
single place where streaming context unit counter is udpated.
The method also checks that some data has been generated before
attempting fragment replication.
* Emit a warning if there is an attempt to replicate a fragment and
there is no data to replicate.
Init first join crashed in
server: s1 unallowed state transition: joined -> joined
This was due to missing state check for state in on_primary_view()
before changing to joined state. Added appropriate check.
Implemented unit tests for simple IST scenarios.
Replaced all references to provider_ in server_state methods to
provider() call which is virtual and can be overridden by test classes.
Provider pointer may not be initialized during unit tests yet.
Transition joiner - disconnecting may happen when the joiner failed
to receive SST succesfully. Because the system is at undefined state
at this point, skip most of the processing in sst_received()
and return control to caller after notifying the provider about
failure.
Transition from server_state connected state to disconnecting must
be allowed to deal with errors during server startup.
Added SST first test cases for server_state transitions:
* Successful join via SST
* Error in connect state
* Error in joiner state
Use iterators for scanning members vector in order to avoid
issues with integer signedness and range checks. The vector is
usually rather small and not in hot codepath, so performance
is here not an issue.
Added unit test for member_index() method.
Storing information that background rollbacker in ongoing in client state has_rollback_
This can be used for detecting if there is ongoing background rollback,
and client should keep waiting in before_command() entry to avoid conflicts
in accessing client state during background rollbacking.
transaction::bf_abort() is modified to set has_rollback_ flag when
backgroung rollbacking has been assigned for the client
sync_rollback_complete() method has been modified to reset the backround
rollbacker flag
When member joins the group and needs to receive an SST it won't
receive the corresponding menbership view event because the SST
happens after the event and will already include the effects of
all events ordered before it. The view then must be recovered from
the received state.
Minor renames and cleanups.
References codership/wsrep-lib#18
it on disconnect.
- Don't rely on own index from the view because the view may come from
another member (IST/SST), instead always determine own index from own ID.
Refs codership/wsrep-lib#13
Unit tests which cause streaming rollback leaked memory because
the streaming applier handle which was created for rollback
fragment handling was not released. Roll back a streaming transaction
and release applier handle appropriately in corresponding tests.
with superproject definitions.
Avoid using client_service::do_2pc() in before_commit() to
determine if 2pc is actually happening, will use transaction
states to deduce that. client_service::do_2pc() should be deprecated.
Fixed a compiler warning in db_high_priority_service.cpp.
* Release server lock temporarily when BF aborting local SR
transaction during view event processing
* Check transaction state for BF aborts in before_prepare() after
the lock has acquired after fragment removal
* Send rollback fragment only from streaming_rollback()
* Check fragment removal error code in prepare phase. It is possible
that the transaction gets BF aborted during fragment removal.
* Mark fragment certified in certify_fragment() even if the provider
returns cert failed error. With current wsrep-API error codes
it may not be possible to distinquish certification failure
and BF abort during fragment replication. This may also be a
provider bug. As a result rollback fragment may sometimes be
replicated when it would not be necessary.
SR tranasctions are BF aborted or rolled back on primary view
changes according to the following rules:
* Ongoing local SR transactions are BF aborted if the processing
server is not found from the current view.
* All remote SR transactions whose origin server is not included in the
current view are rolled back.
* Enable codepath to BF abort high priority SR applier
* Pass ws_handle, ws_meta to high priority service rollback
call to allow total ordering of rollback process
* Added server_id into transaction in order to be able to stop
streaming applier during high priority BF abort
* Added missing commit fragment applying
* Don't clear fragments for replaying SR transaction