Transition joiner -> disconnecting may happen when the joiner failed
to receive SST successfully. Because the system is in an undefined state
at this point, skip most of the processing in sst_received()
and return control to the caller after notifying the provider about
the failure.
Transition from server_state connected state to disconnecting must
be allowed to deal with errors during server startup.
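
The two transitions described above can be pictured as entries in a
transition table. The following is only a sketch: the reduced state
set, the table contents and the helper transition_allowed() are
assumptions made for illustration, not the library's actual table.

```cpp
#include <cassert>

// Hypothetical, reduced set of server states (illustration only).
enum state
{
    s_disconnected,
    s_connected,
    s_joiner,
    s_joined,
    s_disconnecting,
    n_states
};

// allowed[from][to] is true when the transition is permitted.
static const bool allowed[n_states][n_states] = {
    /* to:               disc   conn   joiner joined disconnecting */
    /* disconnected  */ { false, true,  false, false, false },
    /* connected     */ { false, false, true,  false, true  },
    /* joiner        */ { false, false, false, true,  true  },
    /* joined        */ { false, false, false, false, true  },
    /* disconnecting */ { true,  false, false, false, false },
};

// Returns true if the sketch table permits moving from 'from' to 'to'.
inline bool transition_allowed(state from, state to)
{
    assert(from < n_states && to < n_states);
    return allowed[from][to];
}
```

With such a table, both joiner -> disconnecting and
connected -> disconnecting are accepted, so errors during SST or
startup can lead to an orderly disconnect.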
Added SST first test cases for server_state transitions:
* Successful join via SST
* Error in connect state
* Error in joiner state
Use iterators for scanning the members vector in order to avoid
issues with integer signedness and range checks. The vector is
usually rather small and not in a hot code path, so performance
is not an issue here.
Added unit test for member_index() method.
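
A minimal sketch of an iterator-based lookup follows. The member
struct, the use of std::string as the ID type and the free function
signature are simplifications assumed for this example; only the
method name member_index() comes from the text above.

```cpp
#include <string>
#include <vector>

// Simplified stand-in for a view member (assumption for this sketch).
struct member
{
    std::string id;
};

// Returns the index of the member with the given ID, or -1 if the
// ID is not found. Iterators avoid mixing signed and unsigned
// integers in the loop condition.
inline int member_index(const std::vector<member>& members,
                        const std::string& member_id)
{
    for (std::vector<member>::const_iterator i(members.begin());
         i != members.end(); ++i)
    {
        if (i->id == member_id)
        {
            return static_cast<int>(i - members.begin());
        }
    }
    return -1;
}
```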
Provider desync may return an error if the provider cannot communicate
with the rest of the cluster. However, this is acceptable, for example,
if the node has dropped from the primary view. Instead of returning an
error immediately after a failed desync(), attempt to pause the provider
regardless of the error. If the pause operation fails, an error is returned.
In order to avoid a resync in resume_and_resync() when desync
failed in desync_and_pause(), a new member variable desynced_on_pause_
was introduced to decide whether to resync or not in resume_and_resync().
This variable is protected by the pause()/resume() calls since they do
not allow concurrent pause/resume operations.
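
The control flow described above might look roughly like the sketch
below. fake_provider and pausing_server are hypothetical stand-ins
invented for this example, and the integer return codes (0 for
success) are an assumption; only desync_and_pause(),
resume_and_resync() and desynced_on_pause_ are names from the text.

```cpp
// Hypothetical provider stub whose failures can be configured.
struct fake_provider
{
    bool desync_fails;
    bool pause_fails;
    int resync_calls;

    int desync() { return desync_fails ? 1 : 0; }
    int pause() { return pause_fails ? 1 : 0; }
    int resume() { return 0; }
    int resync() { ++resync_calls; return 0; }
};

class pausing_server
{
public:
    explicit pausing_server(fake_provider& provider)
        : provider_(provider)
        , desynced_on_pause_(false)
    { }

    int desync_and_pause()
    {
        // A failed desync is tolerated; just remember the outcome.
        desynced_on_pause_ = (provider_.desync() == 0);
        // Only a failed pause is reported to the caller. Cleanup
        // after a failed pause is not covered here.
        return provider_.pause();
    }

    int resume_and_resync()
    {
        int ret = provider_.resume();
        // Resync only if the earlier desync actually succeeded.
        if (ret == 0 && desynced_on_pause_)
        {
            ret = provider_.resync();
        }
        desynced_on_pause_ = false;
        return ret;
    }

private:
    fake_provider& provider_;
    bool desynced_on_pause_;
};
```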
If the size of a SR fragment exceeds the maximum size that the
replication provider allows us to replicate, then we are expected to
set the client error code to e_error_during_commit.
However, client_state::after_statement() unconditionally overrides it
to error e_deadlock_error.
Fixes client_state::after_statement() so that it overrides the error
only if no error has been set yet.
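
The intended behavior can be sketched as follows. The client struct
and the helper override_error_if_unset() are hypothetical, and the
enumeration is a reduced stand-in assumed for this example; it is
not the actual after_statement() code.

```cpp
// Reduced stand-in for the client error codes (assumption).
enum client_error
{
    e_success,
    e_error_during_commit,
    e_deadlock_error
};

struct client
{
    client_error current_error;

    client() : current_error(e_success) { }

    // Record the error only if no error has been set before, so
    // that a more specific earlier error is preserved.
    void override_error_if_unset(client_error error)
    {
        if (current_error == e_success)
        {
            current_error = error;
        }
    }
};
```

With this guard, an e_error_during_commit set for an oversized SR
fragment survives a later attempt to set e_deadlock_error.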
Method wsrep::server_state::convert_streaming_client_to_applier() may
insert an entry in streaming_appliers_ map which contains undefined
server_id. This happens if the method is called while in non-primary
state, and server_state::id_ is undefined.
The fix is to use the server_id which is recorded in the client's
transaction object.
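
A sketch of the idea, under heavy simplification: the types below,
the use of an empty string for an undefined ID, the int applier
handle and has_applier() are all assumptions made for illustration.
The point is only that the map key is built from the ID stored in
the transaction, not from the server's own id_.

```cpp
#include <map>
#include <string>
#include <utility>

typedef std::string server_id;            // empty means undefined
typedef unsigned long long transaction_id;

// Simplified transaction carrying the ID of its originating server.
struct transaction
{
    server_id origin;
    transaction_id id;
};

class streaming_server
{
public:
    // id_ stays undefined here, as it would in non-primary state.
    streaming_server() : id_(), streaming_appliers_() { }

    void convert_streaming_client_to_applier(const transaction& t,
                                             int applier)
    {
        // Key on the transaction's recorded server ID, never on id_.
        streaming_appliers_.insert(
            std::make_pair(std::make_pair(t.origin, t.id), applier));
    }

    bool has_applier(const server_id& sid, transaction_id tid) const
    {
        return streaming_appliers_.count(std::make_pair(sid, tid)) > 0;
    }

private:
    server_id id_;
    std::map<std::pair<server_id, transaction_id>, int>
        streaming_appliers_;
};
```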
Store the information that a background rollback is ongoing in the
client state flag has_rollback_.
This can be used for detecting whether there is an ongoing background
rollback, in which case the client should keep waiting on before_command()
entry to avoid conflicts in accessing client state during the background
rollback.
transaction::bf_abort() is modified to set the has_rollback_ flag when
a background rollback has been assigned for the client.
The sync_rollback_complete() method has been modified to reset the
background rollbacker flag.
- fixed node ID assertion in on_connect() method,
fixed "sanity checks" to allow reconnection to primary component
- fixed code duplication in on_view() method
Added a call to log_view() to do the internal initializations that
need to be done on receiving a new view. Note however that it is not
a view *event*. Here we only need to configure the application to
comply with a new state that it has received, so that it can go on
to apply replication events and catch up with the cluster.
This patch changes wsrep::transaction::after_rollback() and
wsrep::transaction::certify_fragment() so that no client state locking
is performed while in storage service scope.
The reason for this change is to not confuse the application as to
which client context locks/unlocks a mutex. More specifically, this
caused MariaDB's safe_mutex to report "Wrong usage of mutex" warnings
as the underlying THD context was switched while using storage service.
When a member joins the group and needs to receive an SST it won't
receive the corresponding membership view event because the SST
happens after the event and will already include the effects of
all events ordered before it. The view then must be recovered from
the received state.
Minor renames and cleanups.
References codership/wsrep-lib#18
Dbsim has an internal map of server objects for SST simulation.
The map was keyed by server_id, which is no longer available
when the server object is constructed. Changed dbsim to
use the server name for the internal mapping instead.
it on disconnect.
- Don't rely on own index from the view because the view may come from
another member (IST/SST), instead always determine own index from own ID.
Refs codership/wsrep-lib#13
There are some corner cases where keys with two parts are needed
for a transaction. Relaxed the assertion and sanity check so that
at least two key parts are needed for each key which is assigned
to a transaction.
Travis:
* Don't install all g++ versions on all build targets.
* Install addons per target to have finer control on what is needed
* Use WSREP_STRICT_BUILD_FLAGS on other targets than GCC 4.8
cmake:
* Fixed CMakeLists.txt to check only boost libraries which are
actually needed
In order to make the build succeed on a wider range of platforms,
removed -Weffc++ from default build options. Added a new cmake
option WSREP_LIB_STRICT_BUILD_FLAGS to enable it.
Unit tests which cause streaming rollback leaked memory because
the streaming applier handle which was created for rollback
fragment handling was not released. Roll back a streaming transaction
and release applier handle appropriately in corresponding tests.
with superproject definitions.
Avoid using client_service::do_2pc() in before_commit() to
determine if 2pc is actually happening; use transaction
states to deduce that instead. client_service::do_2pc() should be
deprecated.
Fixed a compiler warning in db_high_priority_service.cpp.
Moved SR fragment removal for total order BFd SR transactions
into after_rollback() call to avoid deadlocking while trying
to access storage before rolling back the transaction.
* Check error code from fragment release
* Always call streaming rollback from must abort if the
transaction is in executing phase. This is needed to ensure
that rollback fragment replication happens before rollback starts
* Initiate streaming rollback from certify fragment if BF abort
happens after fragment certification.