mirror of https://github.com/postgres/postgres.git synced 2025-05-29 16:21:20 +03:00

Fix rare assertion failure in standby, if primary is restarted

During hot standby, ExpireAllKnownAssignedTransactionIds() and
ExpireOldKnownAssignedTransactionIds() mark old transactions as no
longer running, but they failed to update xactCompletionCount and
latestCompletedXid. AFAICS that would not lead to incorrect query
results, because those functions effectively turn in-progress
transactions into aborted transactions, and an MVCC snapshot considers
both as "not visible". But it could surprise GetSnapshotDataReuse()
and trigger its "Assert(TransactionIdPrecedesOrEquals(TransactionXmin,
RecentXmin))" assertion if the apparent xmin in a backend moved
backwards. We saw this happen when GetCatalogSnapshot() reused an
older catalog snapshot after GetTransactionSnapshot() had already
advanced TransactionXmin.
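
The sequence above can be replayed with a small standalone model. This is
only a sketch, not the procarray.c code: the struct layouts, the helper
names (take_snapshot, expire_old, run_scenario) and the
sets_transaction_xmin flag are all invented for illustration, XIDs are
compared as plain integers with no wraparound handling, and the invariant
check is returned as a bool where the real code would assert.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;

/* Toy shared state; field names echo the commit, the layout is invented. */
typedef struct
{
	uint64_t	xactCompletionCount;	/* bumped when the running set changes */
	Xid			oldestRunning;	/* xmin a freshly built snapshot would get */
} Shared;

typedef struct
{
	Xid			xmin;
	uint64_t	snapXactCompletionCount;	/* 0 means "never built" */
} Snap;

typedef struct
{
	Xid			TransactionXmin;
	Xid			RecentXmin;
} Backend;

/*
 * Take a snapshot, reusing the old contents if the completion counter has
 * not moved.  Returns whether TransactionXmin <= RecentXmin still holds.
 */
static bool
take_snapshot(const Shared *sh, Backend *be, Snap *snap,
			  bool sets_transaction_xmin)
{
	bool		reuse = snap->snapXactCompletionCount != 0 &&
		snap->snapXactCompletionCount == sh->xactCompletionCount;

	if (!reuse)
	{
		snap->xmin = sh->oldestRunning;
		snap->snapXactCompletionCount = sh->xactCompletionCount;
	}
	if (sets_transaction_xmin)
		be->TransactionXmin = snap->xmin;
	be->RecentXmin = snap->xmin;
	return be->TransactionXmin <= be->RecentXmin;
}

/* Expire XIDs older than 'xid'; 'bump' selects fixed vs. pre-fix behaviour. */
static void
expire_old(Shared *sh, Xid xid, bool bump)
{
	sh->oldestRunning = xid;
	if (bump)
		sh->xactCompletionCount++;
}

/* Replay the sequence; returns true if the invariant held throughout. */
bool
run_scenario(bool bump)
{
	Shared		sh = {.xactCompletionCount = 1, .oldestRunning = 100};
	Backend		be = {0, 0};
	Snap		catalog = {0, 0};
	Snap		xact = {0, 0};
	bool		ok = true;

	ok &= take_snapshot(&sh, &be, &catalog, true);	/* first catalog snapshot */
	expire_old(&sh, 200, bump);						/* standby expires old XIDs */
	ok &= take_snapshot(&sh, &be, &xact, true);		/* first transaction snapshot */
	ok &= take_snapshot(&sh, &be, &catalog, false);	/* catalog snapshot again */
	return ok;
}
```

In this model run_scenario(false) returns false: the counter never moved,
so the stale catalog snapshot is reused and RecentXmin drops back below
TransactionXmin. run_scenario(true) returns true, because the bumped
counter forces the catalog snapshot to be rebuilt.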

The bug goes back all the way to commit 623a9ba79b in v14 that
introduced the snapshot reuse mechanism, but it started to happen more
frequently with commit 952365cded6 which removed a
GetTransactionSnapshot() call from backend startup. That made it more
likely for ExpireOldKnownAssignedTransactionIds() to be called between
GetCatalogSnapshot() and the first GetTransactionSnapshot() in a
backend.

Andres Freund first spotted this assertion failure on buildfarm member
'skink'. Reproduction and analysis by Tomas Vondra.

Backpatch-through: 14
Discussion: https://www.postgresql.org/message-id/oey246mcw43cy4qw2hqjmurbd62lfdpcuxyqiu7botx3typpax%40h7o7mfg5zmdj
Heikki Linnakangas 2025-03-23 20:41:16 +02:00
parent 1353b1161a
commit 302ce5bd93


@@ -4496,9 +4496,23 @@ ExpireTreeKnownAssignedTransactionIds(TransactionId xid, int nsubxids,
 void
 ExpireAllKnownAssignedTransactionIds(void)
 {
+	FullTransactionId latestXid;
+
 	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
 	KnownAssignedXidsRemovePreceding(InvalidTransactionId);

+	/* Reset latestCompletedXid to nextXid - 1 */
+	Assert(FullTransactionIdIsValid(TransamVariables->nextXid));
+	latestXid = TransamVariables->nextXid;
+	FullTransactionIdRetreat(&latestXid);
+	TransamVariables->latestCompletedXid = latestXid;
+
+	/*
+	 * Any transactions that were in-progress were effectively aborted, so
+	 * advance xactCompletionCount.
+	 */
+	TransamVariables->xactCompletionCount++;
+
 	/*
 	 * Reset lastOverflowedXid. Currently, lastOverflowedXid has no use after
 	 * the call of this function. But do this for unification with what
@@ -4516,8 +4530,18 @@ ExpireAllKnownAssignedTransactionIds(void)
 void
 ExpireOldKnownAssignedTransactionIds(TransactionId xid)
 {
+	TransactionId latestXid;
+
 	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);

+	/* As in ProcArrayEndTransaction, advance latestCompletedXid */
+	latestXid = xid;
+	TransactionIdRetreat(latestXid);
+	MaintainLatestCompletedXidRecovery(latestXid);
+
+	/* ... and xactCompletionCount */
+	TransamVariables->xactCompletionCount++;
+
 	/*
 	 * Reset lastOverflowedXid if we know all transactions that have been
 	 * possibly running are being gone. Not doing so could cause an incorrect