1
0
mirror of https://github.com/codership/wsrep-lib.git synced 2025-07-30 07:23:07 +03:00
Commit Graph

95 Commits

Author SHA1 Message Date
20128556d6 Restore original thread local storage after releasing streaming applier.
In convert_streaming_client_to_applier() the new streaming applier
is created and deleted if the server has been disconnected. However,
releasing streaming applier may modify thread local storage. Call
store_globals() to restore thread local storage before returning.
2019-08-29 14:56:21 +03:00
55427188c1 avoid holding server_state lock when creating streaming applier
This fix avoids creating a lock cycle in MariaDB, between
LOCK_global_system_variables, LOCK_wsrep_cluster_config and
LOCK_wsrep_server_state.
2019-08-07 16:39:03 -03:00
0c54cbd3f8 codership/wsrep-lib#106 Relaxed assumptions about threading model
Sanity checks to detect concurrency bugs were assuming a threading
model where each client state would always be processed within
single thread of execution. This however may be too strong assumption
if the application uses some kind of thread pooling.

This patch relaxes those assumptions by removing current_thread_id_
from client_state and relaxing assertions against owning_thread_id_.

This patch also adds a new method
wait_rollback_complete_and_acquire_ownership() into
client_state. This method is idempotent and can be used to gain
control to client_state before before_command() is called.
The method will wait until possible background rollback process is
over and marks the state to s_exec to protect the state against
new background rollbacks.

Other fixes/improvements:
- High priority globals state is restored after discarding streaming.
- Allowed server_state transition donor -> synced.
- Client state method store_globals() was renamed to acquire_ownership()
  to better describe the intent. Method store_globals() was left for
  backwards compatibility and marked deprecated.
2019-08-05 15:12:44 +03:00
0f676bd893 codership/wsrep-lib#104 Error voting support
- populate and pass real error description buffer to provider in case
   of applying error
 - return 0 from server_state::on_apply() if error voting confirmed
   consistency
 - remove fragments and rollback after fragment applying failure
 - always release streaming applier on commit or rollback
2019-07-15 03:48:55 +03:00
ae746fb289 fixing reviewer comments
- style fixes
- small improvement to avoid unnecessary search on close_orphaned_sr
2019-03-05 10:53:21 +01:00
5ef5becea6 removing previous_primary_view from public iface and style fixes 2019-03-05 10:34:30 +01:00
71f3fb2d01 close SR transacions on equal consecutive views
Fixes a bug where the fact that an SR master leaves the primary view
gets missed. When two consecutive primary views have the same
membership we now assume that every SR needs to be rolled back, as the
system may have been through a state of only non-primary components.
2019-03-05 09:41:48 +01:00
badf53a28d Return error code from high_priority_service::adopt_transaction()
Adopt transaction may need to start a new transaction on DBMS side,
allow returning an error if the transaction start fails.
2019-02-25 12:37:58 +02:00
49deb7da98 Refactored checks for transaction state before certification
Moved the check for transaction state before certification step
into separate method abort_or_interrupted() which will check the state
and adjust state and client_state error status accordingly.

Moved the check for abort_or_interrupted() to happen before
the state is changed to certifying and write set data is appended.
This makes the check atomic and reduces the probability of race
conditions. After this check we rely on provider side transaction
state management and error reporting until the certification step
is over.

Change to public API: Pass client_state mutex wrappend in unique_lock
object to client_service::interrupted() call. This way the DBMS side
has a control to the lock object in case it needs to unlock it
temporarily. The underlying mutex will always be locked when the lock
object is passed via interrupted() call.

Other: Allow server_state change from donor to connected. This may
happen if the joiner crashes during SST and the provider reports
it before the DBMS side SST mechanism detects the error.
2019-02-19 22:26:45 +02:00
be98517cb3 Debug log level implementation
Debug log will now filter output based on debug level that is enabled.
2019-02-13 13:05:45 +02:00
510c7f767f Deal with backwards compatibility in sst_received()
Earlier versions of cluster software may not support storing
the view info into stable storage. In order to work around this
during rolling upgrade, skip sanity checks for recovered view
if the view state_id ID is undefined.
2019-02-13 08:54:10 +02:00
4eb6074e67 codership/wsrep-lib#71 to make output cleaner with additional space 2019-02-08 14:04:04 +04:00
ad5d8ea066 Fixed typo for issue #68 2019-02-07 12:29:39 +02:00
e7d72ae7f6 codership/mariadb-wsrep#27 Galera cache encryption
* Created interface class for encryption support
* Implemented function for setting enc key to provider, callback function for encryption/decryption
2019-02-01 16:57:34 +01:00
632f8c3b14 Fixed race condition in checking init_initialized on prim view
Flag init_initialized_ must be checked before changing the
state to s_initializing in on_primary_view() in order to avoid
race between main thread and applier thread. Otherwise it is
possible that main thread gains control after setting state
to initializing and changes the flag init_initialized_ to true
before the check is done in on_primary_view().
2019-01-25 12:18:46 +02:00
6e2c70c226 codership/wsrep-lib#54 Fixed race in server disconnect
Convert streaming client to applier only if the server is not
in disconnected state. In disconnected state the appliers map
is supposed to be empty and will be reconstructed from fragment
storage when the server is connected back to cluster.
2019-01-21 17:00:08 +02:00
a6b38d2428 codership/wsrep-lib#54 Service call to recover streaming appliers
Introduced server_service recover_streaming_appliers() interface
call which will be called in total order whenever streaming appliers
must be recovered. The call comes with two overloads, one which
can be called from client context (e.g. after SST has been received)
and the other from high priority context (e.g. view event handling).

The client context overload should be eventually be deprecated once
there is a mechanism to make provider signal that it has joined to
the cluster and will start applying events.
2019-01-21 17:00:08 +02:00
144d8c13c1 Relaxed server_state state transition sanity checks
In release build log a warning but continue with state transition
anyway. In debug build log warning and crash in assert.
2019-01-21 14:13:25 +02:00
76875c3be1 Allowed transition s_joined to s_donor 2019-01-21 14:13:25 +02:00
47263df442 Revert "codership/mariadb-wsrep#27 Galera cache encryption"
This reverts commit 7e9419e811.
2019-01-21 14:12:28 +02:00
476bcdb41e Revert "codership/mariadb-wsrep#27 Galera cache encryption fixup"
This reverts commit 043e8bc2ea.
2019-01-21 14:12:10 +02:00
043e8bc2ea codership/mariadb-wsrep#27 Galera cache encryption fixup
Fixup to enable/disable encryption on provider loading
2019-01-20 15:20:52 +02:00
7e9419e811 codership/mariadb-wsrep#27 Galera cache encryption
* Implemented encryption callback and enc_set_key
* Added pure virtual functions for encryption functionality
* Set enc key if provider was not loaded on time
2019-01-19 23:58:20 +01:00
89b3561ad8 Read recovered position from sst_received() after initialization
In general the position where the storage recovers after a SST
cannot be known untile the recovery process is over. This in turn
means that the position cannot be known when the server_state
sst_received() method is called. Worked around the problem by
introducing get_position() method into server service which
can be used to get the position from stable storage after SST
has completed and the state has been recovered.
2019-01-15 12:35:06 +02:00
1e9325197a Turn "Could not find applier context" into debug message
Remove warning "Could not find applier context" when no applier
context is found while applying rollback fragment.
2019-01-15 10:57:39 +01:00
653d2526eb codership/wsrep-lib#34 Changed flow of control in sst_received()
Instead of handling error case at the beginning, execute the middle
of method body in case of success, leaving only single call to
provider().sst_received() at the end.
2018-12-21 15:11:41 +02:00
4f88e9aea6 codership/wsrep-lib#34 Fixed non-debug compilation error 2018-12-20 19:35:31 +02:00
e9bb552096 codership/wsrep-lib#34 Provided a method to interrupt state waiters
Intruduced server_state::interrupt_state_waiters() to interrupt
all waiters inside server_state::wait_until_state(). This mechanism
is needed when an error is encountered during state change processing
and waiting threads may need to be interrupted to check and handle
the error condition.

Made server_state::wait_until_state() to throw exception if the
wait was interrupted and the new server state is either disconnecting
or disconnected, which usually indicates error condition.
2018-12-20 19:35:31 +02:00
ac5a4cde0d codership/wsrep-lib#34 Fixed init first IST processing
Init first join crashed in

  server: s1 unallowed state transition: joined -> joined

This was due to missing state check for state in on_primary_view()
before changing to joined state. Added appropriate check.

Implemented unit tests for simple IST scenarios.
2018-12-20 19:35:31 +02:00
728eaa80b5 Added test cases for donor transitions.
Replaced all references to provider_ in server_state methods to
provider() call which is virtual and can be overridden by test classes.
Provider pointer may not be initialized during unit tests yet.
2018-12-20 19:35:31 +02:00
7cd0656990 Allow server_state joiner - disconnecting transition.
Transition joiner - disconnecting may happen when the joiner failed
to receive SST succesfully. Because the system is at undefined state
at this point, skip most of the processing in sst_received()
and return control to caller after notifying the provider about
failure.
2018-12-20 19:35:31 +02:00
e81c66cd59 Fixed assertion on server_state connected - disconnecting transition
Transition from server_state connected state to disconnecting must
be allowed to deal with errors during server startup.

Added SST first test cases for server_state transitions:
* Successful join via SST
* Error in connect state
* Error in joiner state
2018-12-20 19:35:31 +02:00
1776537765 codership/wsrep-lib#34 Handle sync-disconnected-sync for init first
Handle init first case where state is cycled from synced to
disconnected and back synced without SST.
2018-12-20 19:35:31 +02:00
ae0109f9b3 codership/wsrep-lib#34 Refactored view handling
Extracted on_primary_view(), on_non_primary_view() out of on_view().
2018-12-20 19:35:31 +02:00
76424ad515 codership/wsrep-lib#34 Unit test for sync-disconnect-sync
Added unit test for sync-disconnect-sync transition without SST.
2018-12-20 19:35:31 +02:00
256cd6ae60 codership/wsrep-lib#32 Allow transient desync errors in desync_and_pause()
Provider desync may return an error if the provider cannot communicate
with rest of the cluster. However, this is acceptable for example
if the node has dropped from primary view. Instead of returning
error immediately after failed desync(), attempt to pause the provider
regardless of the error. If pause operation fails, error is returned.
In order to avoid resync in resume_and_resync() in the case desync
failed in desync_and_pause(), new member variable desynced_on_pause_
was introduced to decide whether to resync or not in resume_and_resync().
This variable is protected by pause()/resume() calls since they do
not allow concurrent pause/resume operations.
2018-12-13 13:04:41 +02:00
21781f6644 Improved SST diagnostic logging 2018-12-06 23:26:19 +02:00
0f77323d0e Refs codership/wsrep-lib#18 Don't recover view from state if SST failed.
It is pointless and most likely will result in an unnecessary error
message logged.
2018-12-06 23:26:19 +02:00
8f490d431a Undefined server id in convert_streaming_client_to_applier()
Method wsrep::server_state::convert_streaming_client_to_applier() may
insert an entry in streaming_appliers_ map which contains undefined
server_id. This happens if the method is called while in non-primary
state, and server_state::id_ is undefined.
The fix is to use the server_id which is recorded in client's
tansaction object.
2018-12-03 22:36:24 +01:00
31f09ca4aa Refs codership/wsrep-lib#18 Fix compile warning
Fixes unused parameter `view` in server_state::go_final() when
compiled without assert()s.
2018-11-24 13:42:53 +01:00
3950ea3027 Refs codership/wsrep-lib#18 Small fixups
- fixed node ID assertion in on_connect() method,
   fixed "sanity checks" to allow reconnection to primary component
 - fixed code duplication in on_view() method
2018-11-23 23:27:09 +02:00
d95ec7ed99 Refs codership/wsrep-lib#18: Fixup to proper view from SST processing.
Added a call to log_view() to do the internal initializations that
need to be done on receiveing a new view. Note however that it is not
a view *event*. Here we only need to configure the application to
comply with a new state that it has received, so that it can go on
to apply replication events and catch up with the cluster.
2018-11-23 13:11:20 +02:00
fb14883547 Recover current view from state after SST.
When member joins the group and needs to receive an SST it won't
receive the corresponding menbership view event because the SST
happens after the event and will already include the effects of
all events ordered before it. The view then must be recovered from
the received state.

Minor renames and cleanups.

References codership/wsrep-lib#18
2018-11-12 12:47:42 +02:00
ea9971d54b - Initialize member cluster ID only on connection to cluster and forget
it on disconnect.
 - Don't rely on own index from the view because the view may come from
   another member (IST/SST), instead always determine own index from own ID.

Refs codership/wsrep-lib#13
2018-11-09 00:42:05 +02:00
7c6ee3f61f In order to avoid potential deadlocks, release client_state lock when
calling server state methods which may acquire server_state mutex.

Fixed compilation errors in release mode.
2018-10-15 16:35:19 +03:00
c0c977f9ab Added GPLv2 licence and copyright headers. 2018-10-15 15:14:22 +03:00
31f244c3b3 Fixed compilation on Ubuntu 18.04 / GCC 7.3.0 2018-10-02 21:41:14 +03:00
5bf8ad1294 Close SR transactions when disconnecting from the group.
Moved SR fragment removal for total order BFd SR transactions
into after_rollback() call to avoid deadlocking while trying
to access storage before rolling back the transaction.
2018-07-19 15:13:27 +03:00
ca5c24655f Fixes to SR transaction processing
* Release server lock temporarily when BF aborting local SR
  transaction during view event processing
* Check transaction state for BF aborts in before_prepare() after
  the lock has acquired after fragment removal
* Send rollback fragment only from streaming_rollback()
2018-07-17 15:23:53 +03:00
b02200b1ef Fixes to streaming rollback
* Check fragment removal error code in prepare phase. It is possible
  that the transaction gets BF aborted during fragment removal.
* Mark fragment certified in certify_fragment() even if the provider
  returns cert failed error. With current wsrep-API error codes
  it may not be possible to distinquish certification failure
  and BF abort during fragment replication. This may also be a
  provider bug. As a result rollback fragment may sometimes be
  replicated when it would not be necessary.
2018-07-17 14:34:24 +03:00