1
0
mirror of https://github.com/postgres/postgres.git synced 2025-11-10 17:42:29 +03:00

Replace the former method of determining snapshot xmax --- to wit, calling

ReadNewTransactionId from GetSnapshotData --- with a "latestCompletedXid"
variable that is updated during transaction commit or abort.  Since
latestCompletedXid is written only in places that had to lock ProcArrayLock
exclusively anyway, and is read only in places that had to lock ProcArrayLock
shared anyway, it adds no new locking requirements to the system despite being
cluster-wide.  Moreover, removing ReadNewTransactionId from snapshot
acquisition eliminates the need to take both XidGenLock and ProcArrayLock at
the same time.  Since XidGenLock is sometimes held across I/O this can be a
significant win.  Some preliminary benchmarking suggested that this patch has
no effect on average throughput but can significantly improve the worst-case
transaction times seen in pgbench.  Concept by Florian Pflug, implementation
by Tom Lane.
This commit is contained in:
Tom Lane
2007-09-08 20:31:15 +00:00
parent 0a51e7073c
commit 6bd4f401b0
14 changed files with 331 additions and 213 deletions

View File

@@ -1,4 +1,4 @@
$PostgreSQL: pgsql/src/backend/access/transam/README,v 1.8 2007/09/07 20:59:26 tgl Exp $
$PostgreSQL: pgsql/src/backend/access/transam/README,v 1.9 2007/09/08 20:31:14 tgl Exp $
The Transaction System
----------------------
@@ -238,8 +238,10 @@ reason why this would be bad is that C would see (in the row inserted by A)
earlier changes by B, and it would be inconsistent for C not to see any
of B's changes elsewhere in the database.
Formally, the correctness requirement is "if A sees B as committed,
and B sees C as committed, then A must see C as committed".
Formally, the correctness requirement is "if a snapshot A considers
transaction X as committed, and any of transaction X's snapshots considered
transaction Y as committed, then snapshot A must consider transaction Y as
committed".
What we actually enforce is strict serialization of commits and rollbacks
with snapshot-taking: we do not allow any transaction to exit the set of
@@ -248,42 +250,45 @@ stronger than necessary for consistency, but is relatively simple to
enforce, and it assists with some other issues as explained below.) The
implementation of this is that GetSnapshotData takes the ProcArrayLock in
shared mode (so that multiple backends can take snapshots in parallel),
but xact.c must take the ProcArrayLock in exclusive mode while clearing
MyProc->xid at transaction end (either commit or abort).
but ProcArrayEndTransaction must take the ProcArrayLock in exclusive mode
while clearing MyProc->xid at transaction end (either commit or abort).
GetSnapshotData must in fact acquire ProcArrayLock before it calls
ReadNewTransactionId. Otherwise it would be possible for a transaction A
postdating the xmax to commit, and then an existing transaction B that saw
A as committed to commit, before GetSnapshotData is able to acquire
ProcArrayLock and finish taking its snapshot. This would violate the
consistency requirement, because A would be still running and B not
according to this snapshot.
ProcArrayEndTransaction also holds the lock while advancing the shared
latestCompletedXid variable. This allows GetSnapshotData to use
latestCompletedXid + 1 as xmax for its snapshot: there can be no
transaction >= this xid value that the snapshot needs to consider as
completed.
In short, then, the rule is that no transaction may exit the set of
currently-running transactions between the time we fetch xmax and the time
we finish building our snapshot. However, this restriction only applies
to transactions that have an XID --- read-only transactions can end without
acquiring ProcArrayLock, since they don't affect anyone else's snapshot.
currently-running transactions between the time we fetch latestCompletedXid
and the time we finish building our snapshot. However, this restriction
only applies to transactions that have an XID --- read-only transactions
can end without acquiring ProcArrayLock, since they don't affect anyone
else's snapshot nor latestCompletedXid.
Transaction start, per se, doesn't have any interlocking with these
considerations, since we no longer assign an XID immediately at transaction
start. But when we do decide to allocate an XID, we must require
GetNewTransactionId to store the new XID into the shared ProcArray before
releasing XidGenLock. This ensures that when GetSnapshotData calls
ReadNewTransactionId (which also takes XidGenLock), all active XIDs before
the returned value of nextXid are already present in the ProcArray and
can't be missed by GetSnapshotData. Unfortunately, we can't have
GetNewTransactionId take ProcArrayLock to do this, else it could deadlock
against GetSnapshotData. Therefore, we simply let GetNewTransactionId
store into MyProc->xid without any lock. We are thereby relying on
fetch/store of an XID to be atomic, else other backends might see a
partially-set XID. (NOTE: for multiprocessors that need explicit memory
access fence instructions, this means that acquiring/releasing XidGenLock
is just as necessary as acquiring/releasing ProcArrayLock for
GetSnapshotData to ensure it sees up-to-date xid fields.) This also means
that readers of the ProcArray xid fields must be careful to fetch a value
only once, rather than assume they can read it multiple times and get the
same answer each time.
start. But when we do decide to allocate an XID, GetNewTransactionId must
store the new XID into the shared ProcArray before releasing XidGenLock.
This ensures that all top-level XIDs <= latestCompletedXid are either
present in the ProcArray, or not running anymore. (This guarantee doesn't
apply to subtransaction XIDs, because of the possibility that there's not
room for them in the subxid array; instead we guarantee that they are
present or the overflow flag is set.) If a backend released XidGenLock
before storing its XID into MyProc, then it would be possible for another
backend to allocate and commit a later XID, causing latestCompletedXid to
pass the first backend's XID, before that value became visible in the
ProcArray. That would break GetOldestXmin, as discussed below.
We allow GetNewTransactionId to store the XID into MyProc->xid (or the
subxid array) without taking ProcArrayLock. This was once necessary to
avoid deadlock; while that is no longer the case, it's still beneficial for
performance. We are thereby relying on fetch/store of an XID to be atomic,
else other backends might see a partially-set XID. This also means that
readers of the ProcArray xid fields must be careful to fetch a value only
once, rather than assume they can read it multiple times and get the same
answer each time. (Use volatile-qualified pointers when doing this, to
ensure that the C compiler does exactly what you tell it to.)
Another important activity that uses the shared ProcArray is GetOldestXmin,
which must determine a lower bound for the oldest xmin of any active MVCC
@@ -303,12 +308,10 @@ currently-active XIDs: no xact, in particular not the oldest, can exit
while we hold shared ProcArrayLock. So GetOldestXmin's view of the minimum
active XID will be the same as that of any concurrent GetSnapshotData, and
so it can't produce an overestimate. If there is no active transaction at
all, GetOldestXmin returns the result of ReadNewTransactionId. Note that
two concurrent executions of GetOldestXmin might not see the same result
from ReadNewTransactionId --- but if there is a difference, the intervening
execution(s) of GetNewTransactionId must have stored their XIDs into the
ProcArray, so the later execution of GetOldestXmin will see them and
compute the same global xmin anyway.
all, GetOldestXmin returns latestCompletedXid + 1, which is a lower bound
for the xmin that might be computed by concurrent or later GetSnapshotData
calls. (We know that no XID less than this could be about to appear in
the ProcArray, because of the XidGenLock interlock discussed above.)
GetSnapshotData also performs an oldest-xmin calculation (which had better
match GetOldestXmin's) and stores that into RecentGlobalXmin, which is used

View File

@@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/access/transam/transam.c,v 1.70 2007/08/01 22:45:07 tgl Exp $
* $PostgreSQL: pgsql/src/backend/access/transam/transam.c,v 1.71 2007/09/08 20:31:14 tgl Exp $
*
* NOTES
* This file contains the high level access-method interface to the
@@ -432,6 +432,33 @@ TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
return (diff >= 0);
}
/*
* TransactionIdLatest --- get latest XID among a main xact and its children
*/
TransactionId
TransactionIdLatest(TransactionId mainxid,
int nxids, const TransactionId *xids)
{
TransactionId result;
/*
* In practice it is highly likely that the xids[] array is sorted, and
* so we could save some cycles by just taking the last child XID, but
* this probably isn't so performance-critical that it's worth depending
* on that assumption. But just to show we're not totally stupid, scan
* the array back-to-front to avoid useless assignments.
*/
result = mainxid;
while (--nxids >= 0)
{
if (TransactionIdPrecedes(result, xids[nxids]))
result = xids[nxids];
}
return result;
}
/*
* TransactionIdGetCommitLSN
*

View File

@@ -7,7 +7,7 @@
* Portions Copyright (c) 1994, Regents of the University of California
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/access/transam/twophase.c,v 1.34 2007/09/05 20:53:17 tgl Exp $
* $PostgreSQL: pgsql/src/backend/access/transam/twophase.c,v 1.35 2007/09/08 20:31:14 tgl Exp $
*
* NOTES
* Each global transaction is associated with a global transaction
@@ -1127,6 +1127,7 @@ FinishPreparedTransaction(const char *gid, bool isCommit)
char *buf;
char *bufptr;
TwoPhaseFileHeader *hdr;
TransactionId latestXid;
TransactionId *children;
RelFileNode *commitrels;
RelFileNode *abortrels;
@@ -1162,6 +1163,9 @@ FinishPreparedTransaction(const char *gid, bool isCommit)
abortrels = (RelFileNode *) bufptr;
bufptr += MAXALIGN(hdr->nabortrels * sizeof(RelFileNode));
/* compute latestXid among all children */
latestXid = TransactionIdLatest(xid, hdr->nsubxacts, children);
/*
* The order of operations here is critical: make the XLOG entry for
* commit or abort, then mark the transaction committed or aborted in
@@ -1179,7 +1183,7 @@ FinishPreparedTransaction(const char *gid, bool isCommit)
hdr->nsubxacts, children,
hdr->nabortrels, abortrels);
ProcArrayRemove(&gxact->proc);
ProcArrayRemove(&gxact->proc, latestXid);
/*
* In case we fail while running the callbacks, mark the gxact invalid so

View File

@@ -6,7 +6,7 @@
* Copyright (c) 2000-2007, PostgreSQL Global Development Group
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/access/transam/varsup.c,v 1.78 2007/02/15 23:23:22 alvherre Exp $
* $PostgreSQL: pgsql/src/backend/access/transam/varsup.c,v 1.79 2007/09/08 20:31:14 tgl Exp $
*
*-------------------------------------------------------------------------
*/
@@ -31,7 +31,9 @@ VariableCache ShmemVariableCache = NULL;
/*
* Allocate the next XID for my new transaction.
* Allocate the next XID for my new transaction or subtransaction.
*
* The new XID is also stored into MyProc before returning.
*/
TransactionId
GetNewTransactionId(bool isSubXact)
@@ -43,7 +45,11 @@ GetNewTransactionId(bool isSubXact)
* transaction id.
*/
if (IsBootstrapProcessingMode())
{
Assert(!isSubXact);
MyProc->xid = BootstrapTransactionId;
return BootstrapTransactionId;
}
LWLockAcquire(XidGenLock, LW_EXCLUSIVE);
@@ -112,19 +118,19 @@ GetNewTransactionId(bool isSubXact)
TransactionIdAdvance(ShmemVariableCache->nextXid);
/*
* We must store the new XID into the shared PGPROC array before releasing
* XidGenLock. This ensures that when GetSnapshotData calls
* ReadNewTransactionId, all active XIDs before the returned value of
* nextXid are already present in PGPROC. Else we have a race condition.
* We must store the new XID into the shared ProcArray before releasing
* XidGenLock. This ensures that every active XID older than
* latestCompletedXid is present in the ProcArray, which is essential
* for correct OldestXmin tracking; see src/backend/access/transam/README.
*
* XXX by storing xid into MyProc without acquiring ProcArrayLock, we are
* relying on fetch/store of an xid to be atomic, else other backends
* might see a partially-set xid here. But holding both locks at once
* would be a nasty concurrency hit (and in fact could cause a deadlock
* against GetSnapshotData). So for now, assume atomicity. Note that
* readers of PGPROC xid field should be careful to fetch the value only
* once, rather than assume they can read it multiple times and get the
* same answer each time.
* would be a nasty concurrency hit. So for now, assume atomicity.
*
* Note that readers of PGPROC xid fields should be careful to fetch the
* value only once, rather than assume they can read a value multiple
* times and get the same answer each time.
*
* The same comments apply to the subxact xid count and overflow fields.
*
@@ -138,11 +144,10 @@ GetNewTransactionId(bool isSubXact)
* race-condition window, in that the new XID will not appear as running
* until its parent link has been placed into pg_subtrans. However, that
* will happen before anyone could possibly have a reason to inquire about
* the status of the XID, so it seems OK. (Snapshots taken during this
* the status of the XID, so it seems OK. (Snapshots taken during this
* window *will* include the parent XID, so they will deliver the correct
* answer later on when someone does have a reason to inquire.)
*/
if (MyProc != NULL)
{
/*
* Use volatile pointer to prevent code rearrangement; other backends

View File

@@ -10,7 +10,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/access/transam/xact.c,v 1.249 2007/09/07 20:59:26 tgl Exp $
* $PostgreSQL: pgsql/src/backend/access/transam/xact.c,v 1.250 2007/09/08 20:31:14 tgl Exp $
*
*-------------------------------------------------------------------------
*/
@@ -233,7 +233,7 @@ static void CallSubXactCallbacks(SubXactEvent event,
SubTransactionId parentSubid);
static void CleanupTransaction(void);
static void CommitTransaction(void);
static void RecordTransactionAbort(bool isSubXact);
static TransactionId RecordTransactionAbort(bool isSubXact);
static void StartTransaction(void);
static void RecordSubTransactionCommit(void);
@@ -748,13 +748,17 @@ AtSubStart_ResourceOwner(void)
/*
* RecordTransactionCommit
*
* Returns latest XID among xact and its children, or InvalidTransactionId
* if the xact has no XID. (We compute that here just because it's easier.)
*
* This is exported only to support an ugly hack in VACUUM FULL.
*/
void
TransactionId
RecordTransactionCommit(void)
{
TransactionId xid = GetTopTransactionIdIfAny();
bool markXidCommitted = TransactionIdIsValid(xid);
TransactionId latestXid = InvalidTransactionId;
int nrels;
RelFileNode *rels;
bool haveNonTemp;
@@ -930,6 +934,9 @@ RecordTransactionCommit(void)
END_CRIT_SECTION();
}
/* Compute latestXid while we have the child XIDs handy */
latestXid = TransactionIdLatest(xid, nchildren, children);
/* Reset XactLastRecEnd until the next transaction writes something */
XactLastRecEnd.xrecoff = 0;
@@ -939,6 +946,8 @@ cleanup:
pfree(rels);
if (children)
pfree(children);
return latestXid;
}
@@ -1084,11 +1093,15 @@ RecordSubTransactionCommit(void)
/*
* RecordTransactionAbort
*
* Returns latest XID among xact and its children, or InvalidTransactionId
* if the xact has no XID. (We compute that here just because it's easier.)
*/
static void
static TransactionId
RecordTransactionAbort(bool isSubXact)
{
TransactionId xid = GetCurrentTransactionIdIfAny();
TransactionId latestXid;
int nrels;
RelFileNode *rels;
int nchildren;
@@ -1108,7 +1121,7 @@ RecordTransactionAbort(bool isSubXact)
/* Reset XactLastRecEnd until the next transaction writes something */
if (!isSubXact)
XactLastRecEnd.xrecoff = 0;
return;
return InvalidTransactionId;
}
/*
@@ -1186,6 +1199,9 @@ RecordTransactionAbort(bool isSubXact)
END_CRIT_SECTION();
/* Compute latestXid while we have the child XIDs handy */
latestXid = TransactionIdLatest(xid, nchildren, children);
/*
* If we're aborting a subtransaction, we can immediately remove failed
* XIDs from PGPROC's cache of running child XIDs. We do that here for
@@ -1193,7 +1209,7 @@ RecordTransactionAbort(bool isSubXact)
* main xacts, the equivalent happens just after this function returns.
*/
if (isSubXact)
XidCacheRemoveRunningXids(xid, nchildren, children);
XidCacheRemoveRunningXids(xid, nchildren, children, latestXid);
/* Reset XactLastRecEnd until the next transaction writes something */
if (!isSubXact)
@@ -1204,6 +1220,8 @@ RecordTransactionAbort(bool isSubXact)
pfree(rels);
if (children)
pfree(children);
return latestXid;
}
/*
@@ -1481,6 +1499,7 @@ static void
CommitTransaction(void)
{
TransactionState s = CurrentTransactionState;
TransactionId latestXid;
ShowTransactionState("CommitTransaction");
@@ -1552,7 +1571,7 @@ CommitTransaction(void)
/*
* Here is where we really truly commit.
*/
RecordTransactionCommit();
latestXid = RecordTransactionCommit();
PG_TRACE1(transaction__commit, MyProc->lxid);
@@ -1560,47 +1579,8 @@ CommitTransaction(void)
* Let others know about no transaction in progress by me. Note that
* this must be done _before_ releasing locks we hold and _after_
* RecordTransactionCommit.
*
* Note: MyProc may be null during bootstrap.
*/
if (MyProc != NULL)
{
if (TransactionIdIsValid(MyProc->xid))
{
/*
* We must lock ProcArrayLock while clearing MyProc->xid, so
* that we do not exit the set of "running" transactions while
* someone else is taking a snapshot. See discussion in
* src/backend/access/transam/README.
*/
LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
MyProc->xid = InvalidTransactionId;
MyProc->lxid = InvalidLocalTransactionId;
MyProc->xmin = InvalidTransactionId;
MyProc->inVacuum = false; /* must be cleared with xid/xmin */
/* Clear the subtransaction-XID cache too while holding the lock */
MyProc->subxids.nxids = 0;
MyProc->subxids.overflowed = false;
LWLockRelease(ProcArrayLock);
}
else
{
/*
* If we have no XID, we don't need to lock, since we won't
* affect anyone else's calculation of a snapshot. We might
* change their estimate of global xmin, but that's OK.
*/
MyProc->lxid = InvalidLocalTransactionId;
MyProc->xmin = InvalidTransactionId;
MyProc->inVacuum = false; /* must be cleared with xid/xmin */
Assert(MyProc->subxids.nxids == 0);
Assert(MyProc->subxids.overflowed == false);
}
}
ProcArrayEndTransaction(MyProc, latestXid);
/*
* This is all post-commit cleanup. Note that if an error is raised here,
@@ -1824,20 +1804,8 @@ PrepareTransaction(void)
* Let others know about no transaction in progress by me. This has to be
* done *after* the prepared transaction has been marked valid, else
* someone may think it is unlocked and recyclable.
*
* We can skip locking ProcArrayLock here, because this action does not
* actually change anyone's view of the set of running XIDs: our entry
* is duplicate with the gxact that has already been inserted into the
* ProcArray.
*/
MyProc->xid = InvalidTransactionId;
MyProc->lxid = InvalidLocalTransactionId;
MyProc->xmin = InvalidTransactionId;
MyProc->inVacuum = false; /* must be cleared with xid/xmin */
/* Clear the subtransaction-XID cache too */
MyProc->subxids.nxids = 0;
MyProc->subxids.overflowed = false;
ProcArrayClearTransaction(MyProc);
/*
* This is all post-transaction cleanup. Note that if an error is raised
@@ -1921,6 +1889,7 @@ static void
AbortTransaction(void)
{
TransactionState s = CurrentTransactionState;
TransactionId latestXid;
/* Prevent cancel/die interrupt while cleaning up */
HOLD_INTERRUPTS();
@@ -1987,7 +1956,7 @@ AbortTransaction(void)
* Advertise the fact that we aborted in pg_clog (assuming that we got as
* far as assigning an XID to advertise).
*/
RecordTransactionAbort(false);
latestXid = RecordTransactionAbort(false);
PG_TRACE1(transaction__abort, MyProc->lxid);
@@ -1995,49 +1964,8 @@ AbortTransaction(void)
* Let others know about no transaction in progress by me. Note that this
* must be done _before_ releasing locks we hold and _after_
* RecordTransactionAbort.
*
* Note: MyProc may be null during bootstrap.
*/
if (MyProc != NULL)
{
if (TransactionIdIsValid(MyProc->xid))
{
/*
* We must lock ProcArrayLock while clearing MyProc->xid, so
* that we do not exit the set of "running" transactions while
* someone else is taking a snapshot. See discussion in
* src/backend/access/transam/README.
*/
LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
MyProc->xid = InvalidTransactionId;
MyProc->lxid = InvalidLocalTransactionId;
MyProc->xmin = InvalidTransactionId;
MyProc->inVacuum = false; /* must be cleared with xid/xmin */
MyProc->inCommit = false; /* be sure this gets cleared */
/* Clear the subtransaction-XID cache too while holding the lock */
MyProc->subxids.nxids = 0;
MyProc->subxids.overflowed = false;
LWLockRelease(ProcArrayLock);
}
else
{
/*
* If we have no XID, we don't need to lock, since we won't
* affect anyone else's calculation of a snapshot. We might
* change their estimate of global xmin, but that's OK.
*/
MyProc->lxid = InvalidLocalTransactionId;
MyProc->xmin = InvalidTransactionId;
MyProc->inVacuum = false; /* must be cleared with xid/xmin */
MyProc->inCommit = false; /* be sure this gets cleared */
Assert(MyProc->subxids.nxids == 0);
Assert(MyProc->subxids.overflowed == false);
}
}
ProcArrayEndTransaction(MyProc, latestXid);
/*
* Post-abort cleanup. See notes in CommitTransaction() concerning
@@ -3863,7 +3791,7 @@ AbortSubTransaction(void)
s->parent->subTransactionId);
/* Advertise the fact that we aborted in pg_clog. */
RecordTransactionAbort(true);
(void) RecordTransactionAbort(true);
/* Post-abort cleanup */
if (TransactionIdIsValid(s->transactionId))

View File

@@ -7,7 +7,7 @@
* Portions Copyright (c) 1996-2007, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
* $PostgreSQL: pgsql/src/backend/access/transam/xlog.c,v 1.280 2007/09/05 18:10:47 tgl Exp $
* $PostgreSQL: pgsql/src/backend/access/transam/xlog.c,v 1.281 2007/09/08 20:31:14 tgl Exp $
*
*-------------------------------------------------------------------------
*/
@@ -5196,6 +5196,10 @@ StartupXLOG(void)
XLogCtl->ckptXidEpoch = ControlFile->checkPointCopy.nextXidEpoch;
XLogCtl->ckptXid = ControlFile->checkPointCopy.nextXid;
/* also initialize latestCompletedXid, to nextXid - 1 */
ShmemVariableCache->latestCompletedXid = ShmemVariableCache->nextXid;
TransactionIdRetreat(ShmemVariableCache->latestCompletedXid);
/* Start up the commit log and related stuff, too */
StartupCLOG();
StartupSUBTRANS(oldestActiveXID);