The method is needed by the GRP provider to recover SR transactions
after becoming connected to the cluster. The SST code path does
not always get executed, and the view change handler is too late
in the codepath, as the GRP may start applying events without
delivering the primary view first.
commit 3b419aa6e2cced92c07c571116aad3d55cd1e7e4
Author: Teemu Ollakka <teemu.ollakka@galeracluster.com>
Date: Sun Feb 19 10:29:34 2023 +0200
Skip fetching config options if provider not loaded via wsrep-API
commit 044220cc067cbf74a956fa1a4476f6d873a78210
Author: Teemu Ollakka <teemu.ollakka@galeracluster.com>
Date: Wed Jul 13 10:31:03 2022 +0300
Operation context pointer for client state
commit eeb05a92384933dd026d5ca3c6f854510bb76eed
Author: Teemu Ollakka <teemu.ollakka@galeracluster.com>
Date: Mon Jul 4 09:03:23 2022 +0300
Add unit test log in gitignore
commit 92a04070fc0dc4f55dc0415442ed9003c29a9cf6
Author: Teemu Ollakka <teemu.ollakka@galeracluster.com>
Date: Sun May 8 12:45:36 2022 +0300
Added convenience method prev() to seqno
commit f83ca1917e50bfe29c3912ccd3a9989ead68378a
Author: Teemu Ollakka <teemu.ollakka@galeracluster.com>
Date: Sun May 1 16:37:24 2022 +0300
Pass victim context for provider on BF abort
This change is needed for custom provider implementations to
have a way to access the victim in the application context.
Helper interface operation_context to pass caller context for
service/provider callbacks in a more type-safe way.
commit 244eabe8cfcc86ca49981b0d8d081b740a967bda
Author: Teemu Ollakka <teemu.ollakka@galeracluster.com>
Date: Wed May 25 07:39:43 2022 +0300
Handle disconnecting state in on_sync()
When disconnecting from the group, the sync event from the
provider must not change the state back to synced.
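The rule above can be illustrated with a minimal sketch. The `state` enum and the free function `on_sync()` below are simplified, hypothetical stand-ins and not the actual wsrep::server_state code; the only behavior taken from the commit message is that a sync event received while disconnecting leaves the state unchanged.

```cpp
// Hypothetical, simplified set of server states (illustration only).
enum class state
{
    disconnected,
    connected,
    joiner,
    joined,
    donor,
    synced,
    disconnecting
};

// Return the state the server should be in after a sync event
// arrives from the provider while the server is in state `current`.
inline state on_sync(state current)
{
    if (current == state::disconnecting)
    {
        // The server is leaving the group: the late sync event
        // must not move the state back to synced.
        return current;
    }
    return state::synced;
}
```

In this sketch every other state simply moves to synced, which is an assumption made to keep the example short.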
commit ba8e23df0dfec71427aac0d35036f26d908b4995
Author: Teemu Ollakka <teemu.ollakka@galeracluster.com>
Date: Tue Mar 22 17:43:52 2022 +0200
Add provider position field to ws_meta and view
Provider position is needed in coordinated recovery
between application and provider. Pass the position
info from provider to application to allow making
it durable.
commit 53e60f64c953b252a47bc91e87b4131465b5f15f
Author: Teemu Ollakka <teemu.ollakka@galeracluster.com>
Date: Sat Mar 19 14:45:57 2022 +0200
Reset TOI meta after releasing total order in provider
This is to keep the TOI meta available in case the provider
implementation needs it.
commit bccb9997f29e94a2d5160d388454ace5f89efb6c
Author: Teemu Ollakka <teemu.ollakka@galeracluster.com>
Date: Mon Jan 3 11:19:58 2022 +0200
Fixed id ostream operator to print human readable ids
commit 6d0b37daafdb87975931b0bd95bb8ea821a4071e
Author: Teemu Ollakka <teemu.ollakka@galeracluster.com>
Date: Wed Dec 15 16:37:45 2021 +0200
Silence unused variable warning
commit 4b8616f3d13137d992b8b994baa48f2981238352
Author: Denis Protivensky <denis.protivensky@galeracluster.com>
Date: Wed Dec 15 16:43:31 2021 +0300
Fix provider loading in test for release builds
commit 6df17812d945a07a6f199e6fe83e287afbc28ed9
Author: Denis Protivensky <denis.protivensky@galeracluster.com>
Date: Tue Dec 14 20:28:56 2021 +0300
Introduce set_provider_factory() method for server_state
This allows injecting an application allocated provider into
server_state.
After this, the virtual provider getter is unnecessary. Made the getter
a normal method and fixed unit tests accordingly.
The condition to skip changing to `s_joined` for all codepaths
which return from donor state. Extracted the logic into a separate
method.
Commented start_sst_action in mock_server_service.
Use pointers to pass state objects to service constructors
to work around GCC 12 warning
error: member ‘wsrep::mock_storage_service::client_state_’
is used uninitialized
Removed calls to assert() from public headers to have
full control when assertions are enabled in wsrep-lib
code regardless of parent project build configuration.
Moved methods containing assertions and non-trivial
code from headers into compilation units.
Changed server_state public methods sst_received() and wait_until_state()
to report errors as return value instead of throwing exceptions.
This was done to gradually get rid of public methods which report
errors via exceptions.
This change was part of MDEV-30419.
This patch introduces a queue to store ids of transactions that failed
to send a rollback fragment in streaming_rollback(). This is to avoid
potentially missed rollback fragments when a cluster splits and then
later reforms. Rollback fragments would be missing if a node rolled
back a transaction locally (either BFed or voluntary rollback) while
non-primary, and the attempt to send rollback fragment failed in
transaction::streaming_rollback().
Transactions that fail to send a rollback fragment can proceed to
roll back locally. However, we must ensure that rollback fragments for
those transactions are eventually delivered by the cluster. This must
be done before a potentially conflicting writeset causes BF-BF
conflicts in the rest of the cluster.
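A queue of this kind could look roughly like the sketch below. The class name `rollback_fragment_queue`, the `transaction_id` alias and the `send` callback are hypothetical names chosen for illustration; the patch itself may organize this differently. The sketch only shows the idea of remembering failed ids and retrying them in order later.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <mutex>

// Hypothetical transaction id type (illustration only).
using transaction_id = std::uint64_t;

// Remembers ids of transactions whose rollback fragment could not be
// sent, so that sending can be retried later.
class rollback_fragment_queue
{
public:
    // Record a transaction whose rollback fragment failed to send.
    void push(transaction_id id)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        ids_.push_back(id);
    }

    // Retry sending in FIFO order. `send` returns true on success.
    // Stops at the first failure so that remaining ids stay queued.
    // Returns the number of fragments sent.
    std::size_t flush(const std::function<bool(transaction_id)>& send)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        std::size_t sent = 0;
        while (!ids_.empty() && send(ids_.front()))
        {
            ids_.pop_front();
            ++sent;
        }
        return sent;
    }

    std::size_t size() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return ids_.size();
    }

private:
    mutable std::mutex mutex_;
    std::deque<transaction_id> ids_;
};
```

A caller would invoke `flush()` once the node can replicate again, before applying writesets that might conflict with the rolled back transactions.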
* Add method `restore_prepared_transaction` to `client_state` class
which restores a transaction state from storage given its xid.
* Add method `commit_or_rollback_by_xid` to terminate prepared XA
transactions by xid.
* Make sure that transactions in prepared state are not rolled back
when their master fails/partitions away.
Added a wsrep::thread_service interface to allow the application to
inject instrumented thread, mutex and condition variable implementations
for the provider.
The interface is defined in include/wsrep/thread_service.hpp.
Sample implementation is provided in dbsim/db_threads.[h|c]pp.
This patch will also clean up some remaining dependencies on
wsrep-API compilation units so that the dependency on wsrep-API
is header only. This will allow extending the provider support to
later wsrep-API versions.
Fixes a bug where the fact that an SR master leaves the primary view
gets missed. When two consecutive primary views have the same
membership we now assume that every SR needs to be rolled back, as the
system may have been through a state of only non-primary components.
Introduced server_service recover_streaming_appliers() interface
call which will be called in total order whenever streaming appliers
must be recovered. The call comes with two overloads, one which
can be called from client context (e.g. after SST has been received)
and the other from high priority context (e.g. view event handling).
The client context overload should eventually be deprecated once
there is a mechanism for the provider to signal that it has joined
the cluster and will start applying events.
* Implemented encryption callback and enc_set_key
* Added pure virtual functions for encryption functionality
* Set enc key if provider was not loaded on time
In general the position where the storage recovers after an SST
cannot be known until the recovery process is over. This in turn
means that the position cannot be known when the server_state
sst_received() method is called. Worked around the problem by
introducing get_position() method into server service which
can be used to get the position from stable storage after SST
has completed and the state has been recovered.
Introduced server_state::interrupt_state_waiters() to interrupt
all waiters inside server_state::wait_until_state(). This mechanism
is needed when an error is encountered during state change processing
and waiting threads may need to be interrupted to check and handle
the error condition.
Made server_state::wait_until_state() throw an exception if the
wait was interrupted and the new server state is either disconnecting
or disconnected, which usually indicates an error condition.
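The interrupt mechanism described above can be sketched with a mutex, a condition variable and an interrupt counter. The class `state_monitor` and the enum `server_status` are hypothetical simplifications, not the wsrep::server_state implementation; only the behavior (interrupt wakes all waiters, a waiter throws if the state is disconnecting or disconnected) follows the commit message.

```cpp
#include <condition_variable>
#include <mutex>
#include <stdexcept>

// Hypothetical, reduced set of server states (illustration only).
enum class server_status
{
    connected,
    joiner,
    synced,
    disconnecting,
    disconnected
};

class state_monitor
{
public:
    void set_state(server_status s)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        state_ = s;
        cond_.notify_all();
    }

    server_status state() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return state_;
    }

    // Wake up all threads blocked in wait_until_state().
    void interrupt_state_waiters()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        ++interrupt_count_;
        cond_.notify_all();
    }

    // Block until the state equals `target`. Throws if the wait was
    // interrupted and the server is disconnecting or disconnected.
    void wait_until_state(server_status target)
    {
        std::unique_lock<std::mutex> lock(mutex_);
        while (state_ != target)
        {
            const unsigned long seen = interrupt_count_;
            cond_.wait(lock, [&] {
                return state_ == target || interrupt_count_ != seen;
            });
            const bool shutting_down =
                state_ == server_status::disconnecting ||
                state_ == server_status::disconnected;
            if (state_ != target && shutting_down)
            {
                throw std::runtime_error(
                    "wait interrupted: server is disconnecting");
            }
            // Interrupted for another reason: keep waiting.
        }
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable cond_;
    server_status state_ = server_status::connected;
    unsigned long interrupt_count_ = 0;
};
```

The counter lets each waiter detect an interrupt that happened after it started waiting, without a flag that would have to be reset by someone.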
Transition from server_state connected state to disconnecting must
be allowed to deal with errors during server startup.
Added SST first test cases for server_state transitions:
* Successful join via SST
* Error in connect state
* Error in joiner state
Provider desync may return an error if the provider cannot communicate
with the rest of the cluster. However, this is acceptable, for example,
if the node has dropped from the primary view. Instead of returning
an error immediately after a failed desync(), attempt to pause the provider
regardless of the error. If the pause operation fails, an error is returned.
In order to avoid resync in resume_and_resync() in the case desync
failed in desync_and_pause(), new member variable desynced_on_pause_
was introduced to decide whether to resync or not in resume_and_resync().
This variable is protected by pause()/resume() calls since they do
not allow concurrent pause/resume operations.
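The interplay between desync_and_pause(), resume_and_resync() and desynced_on_pause_ can be sketched as follows. `fake_provider` and `pause_controller` are hypothetical test doubles, return codes are plain ints (0 for success), and undoing a successful desync when pause fails is an assumption of this sketch rather than something stated in the commit message.

```cpp
// Hypothetical provider stand-in with switchable failures.
struct fake_provider
{
    bool desync_fails = false;
    bool pause_fails = false;
    bool paused = false;
    int resync_calls = 0;

    int desync() { return desync_fails ? 1 : 0; }
    int resync() { ++resync_calls; return 0; }
    int pause()
    {
        if (pause_fails) return 1;
        paused = true;
        return 0;
    }
    int resume() { paused = false; return 0; }
};

class pause_controller
{
public:
    explicit pause_controller(fake_provider& provider)
        : provider_(provider)
        , desynced_on_pause_(false)
    { }

    // Try to desync, then pause regardless of the desync result.
    // Only a failed pause is reported as an error.
    int desync_and_pause()
    {
        desynced_on_pause_ = (provider_.desync() == 0);
        if (provider_.pause() != 0)
        {
            if (desynced_on_pause_)
            {
                // Assumption of this sketch: undo the desync.
                provider_.resync();
                desynced_on_pause_ = false;
            }
            return 1;
        }
        return 0;
    }

    // Resume, and resync only if the earlier desync succeeded.
    int resume_and_resync()
    {
        int ret = provider_.resume();
        if (ret == 0 && desynced_on_pause_)
        {
            ret = provider_.resync();
            desynced_on_pause_ = false;
        }
        return ret;
    }

private:
    fake_provider& provider_;
    bool desynced_on_pause_;
};
```

The sketch has no locking of its own; as the text notes, the real flag relies on pause()/resume() not allowing concurrent operations.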
- fixed node ID assertion in on_connect() method,
fixed "sanity checks" to allow reconnection to primary component
- fixed code duplication in on_view() method
When a member joins the group and needs to receive an SST, it won't
receive the corresponding membership view event because the SST
happens after the event and will already include the effects of
all events ordered before it. The view then must be recovered from
the received state.
Minor renames and cleanups.
References codership/wsrep-lib#18
it on disconnect.
- Don't rely on own index from the view because the view may come from
another member (IST/SST), instead always determine own index from own ID.
Refs codership/wsrep-lib#13
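Determining the own index from the own ID, as described in the fix above, can be sketched as a simple lookup. The function name `find_own_index` and the use of strings for member ids are assumptions made for illustration; wsrep-lib uses its own id and view types.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Return the position of `own_id` in the member list of a view,
// or -1 if this server is not a member of the view.
// The index stored in the view itself is deliberately ignored,
// since the view may have been produced by another member.
inline std::ptrdiff_t find_own_index(const std::vector<std::string>& members,
                                     const std::string& own_id)
{
    for (std::size_t i = 0; i < members.size(); ++i)
    {
        if (members[i] == own_id)
        {
            return static_cast<std::ptrdiff_t>(i);
        }
    }
    return -1;
}
```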
Moved SR fragment removal for total order BFd SR transactions
into after_rollback() call to avoid deadlocking while trying
to access storage before rolling back the transaction.
SR transactions are BF aborted or rolled back on primary view
changes according to the following rules:
* Ongoing local SR transactions are BF aborted if the processing
server is not found in the current view.
* All remote SR transactions whose origin server is not included in the
current view are rolled back.
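Both rules reduce to the same membership test: a streaming transaction is terminated when the server it belongs to is missing from the new primary view. The sketch below uses hypothetical types (`server_id`, `sr_transaction`) and a hypothetical helper `select_for_rollback()`. For a local transaction the caller is assumed to pass the processing server's own id as `origin`.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified types (illustration only).
using server_id = std::string;

struct sr_transaction
{
    server_id origin;      // server the transaction is streaming from
    unsigned long long id; // transaction id
};

// Return the streaming transactions which must be BF aborted (local)
// or rolled back (remote) because their origin server is not a
// member of the new primary view.
inline std::vector<sr_transaction>
select_for_rollback(const std::set<server_id>& view_members,
                    const std::vector<sr_transaction>& streaming)
{
    std::vector<sr_transaction> ret;
    for (const sr_transaction& t : streaming)
    {
        if (view_members.count(t.origin) == 0)
        {
            ret.push_back(t);
        }
    }
    return ret;
}
```

Whether the selected transaction is BF aborted or rolled back by an applier depends on whether it is local or remote, which this sketch leaves to the caller.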
The interface method can be used to notify the DBMS implementation
about state changes in well defined order. The call will be done
under server_state mutex protection.
* Added bootstrap service call to do DBMS side bootstrap operations
during the cluster bootstrap.
* Added last_committed_gtid() to provider interface
* Implemented wait_for_gtid() provider call
* Pass initial position to the server state