mirror of
https://github.com/postgres/postgres.git
synced 2025-09-02 04:21:28 +03:00
Fix race condition leading to hanging logical slot creation.
The snapshot assembly during the creation of logical slots relied on waiting for transactions in xl_running_xacts to end, by checking for their commit/abort records. Unfortunately, despite locking, it is possible to see an xl_running_xacts record listing transactions as running that have already WAL-logged a commit/abort record, as the locking just prevents the ProcArray from being adjusted, and the commit record has to be logged first.

That led to either delayed or hanging snapshot creation, because snapbuild.c would wait "forever" to see commit/abort records for some transactions. The hang resolved only if an xl_running_xacts record without any running transactions happened to be logged, which is far from certain on a busy server.

It's impractical to prevent that via more heavyweight locking; the likelihood of deadlocks and significantly increased contention would be too big.

Instead change the initial snapshot creation to be solely based on tracking the oldest running transaction via xl_running_xacts->oldestRunningXid, which actually ends up significantly simplifying the code. That has two disadvantages:
1) Because we cannot fully "trust" the contents of xl_running_xacts, we cannot use it to build the initial snapshot. Instead we have to wait twice for all running transactions to finish.
2) Previously a slot, unless the race occurred, could be created as soon as all transactions perceived as running had finished, based on commit/abort records; now we have to wait for the next xl_running_xacts record. To address that, trigger logging of a new xl_running_xacts record from within snapbuild.c exactly when necessary.

Unfortunately snapbuild.c's SnapBuild is stored on disk, one of the stupider ideas of a certain Mr Freund, so we can't change it in a minor release. As this is going to be backpatched, we have to hack around a bit to keep on-disk compatibility. A later commit will rejigger that on master.

Author: Andres Freund, based on a quite different patch from Petr Jelinek
Analyzed-By: Petr Jelinek
Reviewed-By: Petr Jelinek
Discussion: https://postgr.es/m/f37e975c-908f-858e-707f-058d3b1eb214@2ndquadrant.com
Backpatch: 9.4-, where logical decoding has been introduced
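
The "wait twice" scheme described in the message can be illustrated with a small state machine. What follows is only a rough sketch under stated assumptions, not the code in snapbuild.c: every `Sketch*`/`sketch_*` name and the `wait_until` field are hypothetical, the record is reduced to `oldestRunningXid` plus a `nextXid` field (assumed here to be the next xid to be assigned when the record was logged), and plain integer comparison is used, which ignores xid wraparound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the real definitions. */
typedef uint32_t SketchXid;

typedef enum
{
    SKETCH_START = -1,
    SKETCH_BUILDING_SNAPSHOT = 0,
    SKETCH_FULL_SNAPSHOT = 1,
    SKETCH_CONSISTENT = 2
} SketchState;

typedef struct
{
    SketchState state;
    SketchXid   wait_until;     /* oldestRunningXid must reach this to advance */
} SketchBuilder;

/* Reduced running-xacts record: only the fields the sketch relies on. */
typedef struct
{
    SketchXid   oldestRunningXid;
    SketchXid   nextXid;        /* assumed: next xid to be assigned */
} SketchRunningXacts;

/*
 * Feed one running-xacts record to the builder.  Returns true once the
 * builder is consistent.  Each wait ends when every transaction that could
 * have been running at the start of that wait is older than
 * oldestRunningXid.  NOTE: ">=" ignores xid wraparound (sketch only).
 */
static bool
sketch_process_running_xacts(SketchBuilder *b, const SketchRunningXacts *r)
{
    switch (b->state)
    {
        case SKETCH_START:
            /* First record seen: start the first wait. */
            b->state = SKETCH_BUILDING_SNAPSHOT;
            b->wait_until = r->nextXid;
            return false;

        case SKETCH_BUILDING_SNAPSHOT:
            /* First wait over: start the second wait. */
            if (r->oldestRunningXid >= b->wait_until)
            {
                b->state = SKETCH_FULL_SNAPSHOT;
                b->wait_until = r->nextXid;
            }
            return false;

        case SKETCH_FULL_SNAPSHOT:
            /* Second wait over: the snapshot is consistent. */
            if (r->oldestRunningXid >= b->wait_until)
            {
                b->state = SKETCH_CONSISTENT;
                return true;
            }
            return false;

        case SKETCH_CONSISTENT:
            return true;
    }
    return false;
}
```

Because progress in this sketch depends only on `oldestRunningXid`, a commit/abort record that was logged before the running-xacts record cannot leave the builder waiting for a transaction that has already ended, which is the hang the message describes.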
@@ -20,24 +20,30 @@ typedef enum
 	/*
 	 * Initial state, we can't do much yet.
 	 */
-	SNAPBUILD_START,
+	SNAPBUILD_START = -1,
+
+	/*
+	 * Collecting committed transactions, to build the initial catalog
+	 * snapshot.
+	 */
+	SNAPBUILD_BUILDING_SNAPSHOT = 0,
 
 	/*
 	 * We have collected enough information to decode tuples in transactions
 	 * that started after this.
 	 *
 	 * Once we reached this we start to collect changes. We cannot apply them
-	 * yet because the might be based on transactions that were still running
-	 * when we reached them yet.
+	 * yet, because they might be based on transactions that were still running
+	 * when FULL_SNAPSHOT was reached.
 	 */
-	SNAPBUILD_FULL_SNAPSHOT,
+	SNAPBUILD_FULL_SNAPSHOT = 1,
 
 	/*
-	 * Found a point after hitting built_full_snapshot where all transactions
-	 * that were running at that point finished. Till we reach that we hold
-	 * off calling any commit callbacks.
+	 * Found a point after SNAPBUILD_FULL_SNAPSHOT where all transactions that
+	 * were running at that point finished. Till we reach that we hold off
+	 * calling any commit callbacks.
 	 */
-	SNAPBUILD_CONSISTENT
+	SNAPBUILD_CONSISTENT = 2
 } SnapBuildState;
 
 /* forward declare so we don't have to expose the struct to the public */
@@ -73,9 +79,6 @@ extern bool SnapBuildXactNeedsSkip(SnapBuild *snapstate, XLogRecPtr ptr);
 extern void SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn,
 				   TransactionId xid, int nsubxacts,
 				   TransactionId *subxacts);
-extern void SnapBuildAbortTxn(SnapBuild *builder, XLogRecPtr lsn,
-				  TransactionId xid, int nsubxacts,
-				  TransactionId *subxacts);
 extern bool SnapBuildProcessChange(SnapBuild *builder, TransactionId xid,
 					   XLogRecPtr lsn);
 extern void SnapBuildProcessNewCid(SnapBuild *builder, TransactionId xid,
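
The explicit values given to SnapBuildState in the first hunk can be compared with the numbering the old enum received implicitly (0, 1, 2 in declaration order, per C's rules). The sketch below is a minimal illustration of that comparison. The `OLD_*` enum and the `sketch_*` helper are hypothetical names introduced here. That this numbering is what keeps stored state values meaningful is an assumption drawn from the commit message's remark about on-disk compatibility, not something the diff itself states.

```c
#include <assert.h>

/* Old numbering: implicit values 0, 1, 2 (hypothetical OLD_ names). */
typedef enum
{
    OLD_SNAPBUILD_START,
    OLD_SNAPBUILD_FULL_SNAPSHOT,
    OLD_SNAPBUILD_CONSISTENT
} OldSnapBuildState;

/* New numbering, copied from the hunk above. */
typedef enum
{
    SNAPBUILD_START = -1,
    SNAPBUILD_BUILDING_SNAPSHOT = 0,
    SNAPBUILD_FULL_SNAPSHOT = 1,
    SNAPBUILD_CONSISTENT = 2
} SnapBuildState;

/*
 * Returns 1 if the two states that exist in both enums keep the same
 * integer value, so an integer written under the old numbering would be
 * read back as the same state under the new one.
 */
static int
sketch_numbering_preserved(void)
{
    return (int) SNAPBUILD_FULL_SNAPSHOT == (int) OLD_SNAPBUILD_FULL_SNAPSHOT &&
           (int) SNAPBUILD_CONSISTENT == (int) OLD_SNAPBUILD_CONSISTENT;
}
```

Under this numbering the new SNAPBUILD_BUILDING_SNAPSHOT takes the value 0 that SNAPBUILD_START used to have, and SNAPBUILD_START moves to -1, a value the old enum never used.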