wsrep-lib

mirror of https://github.com/codership/wsrep-lib.git synced 2025-07-30 07:23:07 +03:00

Author	SHA1	Message	Date
Teemu Ollakka	20128556d6	Restore original thread local storage after releasing streaming applier. In convert_streaming_client_to_applier() the new streaming applier is created and deleted if the server has been disconnected. However, releasing streaming applier may modify thread local storage. Call store_globals() to restore thread local storage before returning.	2019-08-29 14:56:21 +03:00
Leandro Pacheco	55427188c1	avoid holding server_state lock when creating streaming applier This fix avoids creating a lock cycle in MariaDB, between LOCK_global_system_variables, LOCK_wsrep_cluster_config and LOCK_wsrep_server_state.	2019-08-07 16:39:03 -03:00
Teemu Ollakka	0c54cbd3f8	codership/wsrep-lib#106 Relaxed assumptions about threading model Sanity checks to detect concurrency bugs were assuming a threading model where each client state would always be processed within single thread of execution. This however may be too strong assumption if the application uses some kind of thread pooling. This patch relaxes those assumptions by removing current_thread_id_ from client_state and relaxing assertions against owning_thread_id_. This patch also adds a new method wait_rollback_complete_and_acquire_ownership() into client_state. This method is idempotent and can be used to gain control to client_state before before_command() is called. The method will wait until possible background rollback process is over and marks the state to s_exec to protect the state against new background rollbacks. Other fixes/improvements: - High priority globals state is restored after discarding streaming. - Allowed server_state transition donor -> synced. - Client state method store_globals() was renamed to acquire_ownership() to better describe the intent. Method store_globals() was left for backwards compatibility and marked deprecated.	2019-08-05 15:12:44 +03:00
Alexey Yurchenko	0f676bd893	codership/wsrep-lib#104 Error voting support - populate and pass real error description buffer to provider in case of applying error - return 0 from server_state::on_apply() if error voting confirmed consistency - remove fragments and rollback after fragment applying failure - always release streaming applier on commit or rollback	2019-07-15 03:48:55 +03:00
Leandro Pacheco	ae746fb289	fixing reviewer comments - style fixes - small improvement to avoid unnecessary search on close_orphaned_sr	2019-03-05 10:53:21 +01:00
Leandro Pacheco	5ef5becea6	removing previous_primary_view from public iface and style fixes	2019-03-05 10:34:30 +01:00
Leandro Pacheco	71f3fb2d01	close SR transacions on equal consecutive views Fixes a bug where the fact that an SR master leaves the primary view gets missed. When two consecutive primary views have the same membership we now assume that every SR needs to be rolled back, as the system may have been through a state of only non-primary components.	2019-03-05 09:41:48 +01:00
Teemu Ollakka	badf53a28d	Return error code from high_priority_service::adopt_transaction() Adopt transaction may need to start a new transaction on DBMS side, allow returning an error if the transaction start fails.	2019-02-25 12:37:58 +02:00
Teemu Ollakka	49deb7da98	Refactored checks for transaction state before certification Moved the check for transaction state before certification step into separate method abort_or_interrupted() which will check the state and adjust state and client_state error status accordingly. Moved the check for abort_or_interrupted() to happen before the state is changed to certifying and write set data is appended. This makes the check atomic and reduces the probability of race conditions. After this check we rely on provider side transaction state management and error reporting until the certification step is over. Change to public API: Pass client_state mutex wrappend in unique_lock object to client_service::interrupted() call. This way the DBMS side has a control to the lock object in case it needs to unlock it temporarily. The underlying mutex will always be locked when the lock object is passed via interrupted() call. Other: Allow server_state change from donor to connected. This may happen if the joiner crashes during SST and the provider reports it before the DBMS side SST mechanism detects the error.	2019-02-19 22:26:45 +02:00
mkaruza	be98517cb3	Debug log level implementation Debug log will now filter output based on debug level that is enabled.	2019-02-13 13:05:45 +02:00
Teemu Ollakka	510c7f767f	Deal with backwards compatibility in sst_received() Earlier versions of cluster software may not support storing the view info into stable storage. In order to work around this during rolling upgrade, skip sanity checks for recovered view if the view state_id ID is undefined.	2019-02-13 08:54:10 +02:00
Shahriyar Rzayev	4eb6074e67	codership/wsrep-lib#71 to make output cleaner with additional space	2019-02-08 14:04:04 +04:00
Shahriyar Rzayev	ad5d8ea066	Fixed typo for issue #68	2019-02-07 12:29:39 +02:00
mkaruza	e7d72ae7f6	codership/mariadb-wsrep#27 Galera cache encryption * Created interface class for encryption support * Implemented function for setting enc key to provider, callback function for encryption/decryption	2019-02-01 16:57:34 +01:00
Teemu Ollakka	632f8c3b14	Fixed race condition in checking init_initialized on prim view Flag init_initialized_ must be checked before changing the state to s_initializing in on_primary_view() in order to avoid race between main thread and applier thread. Otherwise it is possible that main thread gains control after setting state to initializing and changes the flag init_initialized_ to true before the check is done in on_primary_view().	2019-01-25 12:18:46 +02:00
Teemu Ollakka	6e2c70c226	codership/wsrep-lib#54 Fixed race in server disconnect Convert streaming client to applier only if the server is not in disconnected state. In disconnected state the appliers map is supposed to be empty and will be reconstructed from fragment storage when the server is connected back to cluster.	2019-01-21 17:00:08 +02:00
Teemu Ollakka	a6b38d2428	codership/wsrep-lib#54 Service call to recover streaming appliers Introduced server_service recover_streaming_appliers() interface call which will be called in total order whenever streaming appliers must be recovered. The call comes with two overloads, one which can be called from client context (e.g. after SST has been received) and the other from high priority context (e.g. view event handling). The client context overload should be eventually be deprecated once there is a mechanism to make provider signal that it has joined to the cluster and will start applying events.	2019-01-21 17:00:08 +02:00
Teemu Ollakka	144d8c13c1	Relaxed server_state state transition sanity checks In release build log a warning but continue with state transition anyway. In debug build log warning and crash in assert.	2019-01-21 14:13:25 +02:00
mkaruza	76875c3be1	Allowed transition s_joined to s_donor	2019-01-21 14:13:25 +02:00
Teemu Ollakka	47263df442	Revert "codership/mariadb-wsrep#27 Galera cache encryption" This reverts commit `7e9419e811`.	2019-01-21 14:12:28 +02:00
Teemu Ollakka	476bcdb41e	Revert "codership/mariadb-wsrep#27 Galera cache encryption fixup" This reverts commit `043e8bc2ea`.	2019-01-21 14:12:10 +02:00
Alexey Yurchenko	043e8bc2ea	codership/mariadb-wsrep#27 Galera cache encryption fixup Fixup to enable/disable encryption on provider loading	2019-01-20 15:20:52 +02:00
mkaruza	7e9419e811	codership/mariadb-wsrep#27 Galera cache encryption * Implemented encryption callback and enc_set_key * Added pure virtual functions for encryption functionality * Set enc key if provider was not loaded on time	2019-01-19 23:58:20 +01:00
Teemu Ollakka	89b3561ad8	Read recovered position from sst_received() after initialization In general the position where the storage recovers after a SST cannot be known untile the recovery process is over. This in turn means that the position cannot be known when the server_state sst_received() method is called. Worked around the problem by introducing get_position() method into server service which can be used to get the position from stable storage after SST has completed and the state has been recovered.	2019-01-15 12:35:06 +02:00
Daniele Sciascia	1e9325197a	Turn "Could not find applier context" into debug message Remove warning "Could not find applier context" when no applier context is found while applying rollback fragment.	2019-01-15 10:57:39 +01:00
Teemu Ollakka	653d2526eb	codership/wsrep-lib#34 Changed flow of control in sst_received() Instead of handling error case at the beginning, execute the middle of method body in case of success, leaving only single call to provider().sst_received() at the end.	2018-12-21 15:11:41 +02:00
Teemu Ollakka	4f88e9aea6	codership/wsrep-lib#34 Fixed non-debug compilation error	2018-12-20 19:35:31 +02:00
Teemu Ollakka	e9bb552096	codership/wsrep-lib#34 Provided a method to interrupt state waiters Intruduced server_state::interrupt_state_waiters() to interrupt all waiters inside server_state::wait_until_state(). This mechanism is needed when an error is encountered during state change processing and waiting threads may need to be interrupted to check and handle the error condition. Made server_state::wait_until_state() to throw exception if the wait was interrupted and the new server state is either disconnecting or disconnected, which usually indicates error condition.	2018-12-20 19:35:31 +02:00
Teemu Ollakka	ac5a4cde0d	codership/wsrep-lib#34 Fixed init first IST processing Init first join crashed in server: s1 unallowed state transition: joined -> joined This was due to missing state check for state in on_primary_view() before changing to joined state. Added appropriate check. Implemented unit tests for simple IST scenarios.	2018-12-20 19:35:31 +02:00
Teemu Ollakka	728eaa80b5	Added test cases for donor transitions. Replaced all references to provider_ in server_state methods to provider() call which is virtual and can be overridden by test classes. Provider pointer may not be initialized during unit tests yet.	2018-12-20 19:35:31 +02:00
Teemu Ollakka	7cd0656990	Allow server_state joiner - disconnecting transition. Transition joiner - disconnecting may happen when the joiner failed to receive SST succesfully. Because the system is at undefined state at this point, skip most of the processing in sst_received() and return control to caller after notifying the provider about failure.	2018-12-20 19:35:31 +02:00
Teemu Ollakka	e81c66cd59	Fixed assertion on server_state connected - disconnecting transition Transition from server_state connected state to disconnecting must be allowed to deal with errors during server startup. Added SST first test cases for server_state transitions: * Successful join via SST * Error in connect state * Error in joiner state	2018-12-20 19:35:31 +02:00
Teemu Ollakka	1776537765	codership/wsrep-lib#34 Handle sync-disconnected-sync for init first Handle init first case where state is cycled from synced to disconnected and back synced without SST.	2018-12-20 19:35:31 +02:00
Teemu Ollakka	ae0109f9b3	codership/wsrep-lib#34 Refactored view handling Extracted on_primary_view(), on_non_primary_view() out of on_view().	2018-12-20 19:35:31 +02:00
Teemu Ollakka	76424ad515	codership/wsrep-lib#34 Unit test for sync-disconnect-sync Added unit test for sync-disconnect-sync transition without SST.	2018-12-20 19:35:31 +02:00
Teemu Ollakka	256cd6ae60	codership/wsrep-lib#32 Allow transient desync errors in desync_and_pause() Provider desync may return an error if the provider cannot communicate with rest of the cluster. However, this is acceptable for example if the node has dropped from primary view. Instead of returning error immediately after failed desync(), attempt to pause the provider regardless of the error. If pause operation fails, error is returned. In order to avoid resync in resume_and_resync() in the case desync failed in desync_and_pause(), new member variable desynced_on_pause_ was introduced to decide whether to resync or not in resume_and_resync(). This variable is protected by pause()/resume() calls since they do not allow concurrent pause/resume operations.	2018-12-13 13:04:41 +02:00
Alexey Yurchenko	21781f6644	Improved SST diagnostic logging	2018-12-06 23:26:19 +02:00
Alexey Yurchenko	0f77323d0e	Refs codership/wsrep-lib#18 Don't recover view from state if SST failed. It is pointless and most likely will result in an unnecessary error message logged.	2018-12-06 23:26:19 +02:00
Daniele Sciascia	8f490d431a	Undefined server id in convert_streaming_client_to_applier() Method wsrep::server_state::convert_streaming_client_to_applier() may insert an entry in streaming_appliers_ map which contains undefined server_id. This happens if the method is called while in non-primary state, and server_state::id_ is undefined. The fix is to use the server_id which is recorded in client's tansaction object.	2018-12-03 22:36:24 +01:00
Daniele Sciascia	31f09ca4aa	Refs codership/wsrep-lib#18 Fix compile warning Fixes unused parameter `view` in server_state::go_final() when compiled without assert()s.	2018-11-24 13:42:53 +01:00
Alexey Yurchenko	3950ea3027	Refs codership/wsrep-lib#18 Small fixups - fixed node ID assertion in on_connect() method, fixed "sanity checks" to allow reconnection to primary component - fixed code duplication in on_view() method	2018-11-23 23:27:09 +02:00
Alexey Yurchenko	d95ec7ed99	Refs codership/wsrep-lib#18 : Fixup to proper view from SST processing. Added a call to log_view() to do the internal initializations that need to be done on receiveing a new view. Note however that it is not a view event. Here we only need to configure the application to comply with a new state that it has received, so that it can go on to apply replication events and catch up with the cluster.	2018-11-23 13:11:20 +02:00
Alexey Yurchenko	fb14883547	Recover current view from state after SST. When member joins the group and needs to receive an SST it won't receive the corresponding menbership view event because the SST happens after the event and will already include the effects of all events ordered before it. The view then must be recovered from the received state. Minor renames and cleanups. References codership/wsrep-lib#18	2018-11-12 12:47:42 +02:00
Alexey Yurchenko	ea9971d54b	- Initialize member cluster ID only on connection to cluster and forget it on disconnect. - Don't rely on own index from the view because the view may come from another member (IST/SST), instead always determine own index from own ID. Refs codership/wsrep-lib#13	2018-11-09 00:42:05 +02:00
Teemu Ollakka	7c6ee3f61f	In order to avoid potential deadlocks, release client_state lock when calling server state methods which may acquire server_state mutex. Fixed compilation errors in release mode.	2018-10-15 16:35:19 +03:00
Teemu Ollakka	c0c977f9ab	Added GPLv2 licence and copyright headers.	2018-10-15 15:14:22 +03:00
Alexey Yurchenko	31f244c3b3	Fixed compilation on Ubuntu 18.04 / GCC 7.3.0	2018-10-02 21:41:14 +03:00
Teemu Ollakka	5bf8ad1294	Close SR transactions when disconnecting from the group. Moved SR fragment removal for total order BFd SR transactions into after_rollback() call to avoid deadlocking while trying to access storage before rolling back the transaction.	2018-07-19 15:13:27 +03:00
Teemu Ollakka	ca5c24655f	Fixes to SR transaction processing * Release server lock temporarily when BF aborting local SR transaction during view event processing * Check transaction state for BF aborts in before_prepare() after the lock has acquired after fragment removal * Send rollback fragment only from streaming_rollback()	2018-07-17 15:23:53 +03:00
Teemu Ollakka	b02200b1ef	Fixes to streaming rollback * Check fragment removal error code in prepare phase. It is possible that the transaction gets BF aborted during fragment removal. * Mark fragment certified in certify_fragment() even if the provider returns cert failed error. With current wsrep-API error codes it may not be possible to distinquish certification failure and BF abort during fragment replication. This may also be a provider bug. As a result rollback fragment may sometimes be replicated when it would not be necessary.	2018-07-17 14:34:24 +03:00

1 2

95 Commits