mirror of https://github.com/postgres/postgres.git
synced 2025-11-15 03:41:20 +03:00
pgindent run for 9.4
This includes removing tabs after periods in C comments, which was applied to back branches, so this change should not affect backpatching.
@@ -11,15 +11,15 @@
* log can be broken into relatively small, independent segments.
*
* XLOG interactions: this module generates an XLOG record whenever a new
* CLOG page is initialized to zeroes. Other writes of CLOG come from
* recording of transaction commit or abort in xact.c, which generates its
* own XLOG records for these events and will re-perform the status update
* on redo; so we need make no additional XLOG entry here. For synchronous
* transaction commits, the XLOG is guaranteed flushed through the XLOG commit
* record before we are called to log a commit, so the WAL rule "write xlog
* before data" is satisfied automatically. However, for async commits we
* must track the latest LSN affecting each CLOG page, so that we can flush
* XLOG that far and satisfy the WAL rule. We don't have to worry about this
* for aborts (whether sync or async), since the post-crash assumption would
* be that such transactions failed anyway.
*
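The async-commit rule described in this hunk can be sketched in a few lines of standalone C. Everything here (page_lsn, flush_wal_to, the slot numbering) is an illustrative stand-in rather than the actual clog.c/slru.c code; the point is only that a page's latest commit LSN must be flushed before the page itself is written.

#include <stdint.h>

typedef uint64_t SimLSN;            /* stand-in for XLogRecPtr */
#define NUM_CLOG_BUFFERS 32         /* assumed buffer count for the sketch */

static SimLSN page_lsn[NUM_CLOG_BUFFERS];   /* latest async-commit LSN touching each cached page */
static SimLSN flushed_up_to;                /* how far WAL is known to be flushed */

static void flush_wal_to(SimLSN lsn)        /* pretend WAL flush */
{
    if (lsn > flushed_up_to)
        flushed_up_to = lsn;
}

/* async commit: remember the commit record's LSN for the page it updated */
static void note_async_commit(int slot, SimLSN commit_lsn)
{
    if (commit_lsn > page_lsn[slot])
        page_lsn[slot] = commit_lsn;
}

/* before the CLOG page goes to disk, satisfy "write xlog before data" */
static void write_clog_page(int slot)
{
    if (page_lsn[slot] > flushed_up_to)
        flush_wal_to(page_lsn[slot]);
    /* ...now the page itself may be written... */
}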
@@ -105,7 +105,7 @@ static void set_status_by_pages(int nsubxids, TransactionId *subxids,
* in the tree of xid. In various cases nsubxids may be zero.
*
* lsn must be the WAL location of the commit record when recording an async
* commit. For a synchronous commit it can be InvalidXLogRecPtr, since the
* caller guarantees the commit record is already flushed in that case. It
* should be InvalidXLogRecPtr for abort cases, too.
*

@@ -417,7 +417,7 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
* Testing during the PostgreSQL 9.2 development cycle revealed that on a
* large multi-processor system, it was possible to have more CLOG page
* requests in flight at one time than the number of CLOG buffers which existed
* at that time, which was hardcoded to 8. Further testing revealed that
* performance dropped off with more than 32 CLOG buffers, possibly because
* the linear buffer search algorithm doesn't scale well.
*
@@ -5,7 +5,7 @@
*
* The pg_multixact manager is a pg_clog-like manager that stores an array of
* MultiXactMember for each MultiXactId. It is a fundamental part of the
* shared-row-lock implementation. Each MultiXactMember is comprised of a
* TransactionId and a set of flag bits. The name is a bit historical:
* originally, a MultiXactId consisted of more than one TransactionId (except
* in rare corner cases), hence "multi". Nowadays, however, it's perfectly

@@ -18,7 +18,7 @@
*
* We use two SLRU areas, one for storing the offsets at which the data
* starts for each MultiXactId in the other one. This trick allows us to
* store variable length arrays of TransactionIds. (We could alternatively
* use one area containing counts and TransactionIds, with valid MultiXactId
* values pointing at slots containing counts; but that way seems less robust
* since it would get completely confused if someone inquired about a bogus

@@ -38,7 +38,7 @@
*
* Like clog.c, and unlike subtrans.c, we have to preserve state across
* crashes and ensure that MXID and offset numbering increases monotonically
* across a crash. We do this in the same way as it's done for transaction
* IDs: the WAL record is guaranteed to contain evidence of every MXID we
* could need to worry about, and we just make sure that at the end of
* replay, the next-MXID and next-offset counters are at least as large as

@@ -50,7 +50,7 @@
* The minimum value in each database is stored in pg_database, and the
* global minimum is part of pg_control. Any vacuum that is able to
* advance its database's minimum value also computes a new global minimum,
* and uses this value to truncate older segments. When new multixactid
* values are to be created, care is taken that the counter does not
* fall within the wraparound horizon considering the global minimum value.
*

@@ -85,13 +85,13 @@

/*
* Defines for MultiXactOffset page sizes. A page is the same BLCKSZ as is
* used everywhere else in Postgres.
*
* Note: because MultiXactOffsets are 32 bits and wrap around at 0xFFFFFFFF,
* MultiXact page numbering also wraps around at
* 0xFFFFFFFF/MULTIXACT_OFFSETS_PER_PAGE, and segment numbering at
* 0xFFFFFFFF/MULTIXACT_OFFSETS_PER_PAGE/SLRU_SEGMENTS_PER_PAGE. We need
* take no explicit notice of that fact in this module, except when comparing
* segment and page numbers in TruncateMultiXact (see
* MultiXactOffsetPagePrecedes).
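A quick standalone illustration of the page-numbering arithmetic above; the page size and entry width are assumptions for the sketch, not the real MULTIXACT_OFFSETS_PER_PAGE definition, and wraparound simply falls out of the unsigned division.

#include <stdio.h>
#include <stdint.h>

/* Illustrative values only: with 8192-byte pages and 4-byte offsets,
 * each offsets page would hold 2048 entries. */
#define BLCKSZ_SIM 8192
#define OFFSETS_PER_PAGE_SIM (BLCKSZ_SIM / sizeof(uint32_t))

int main(void)
{
    uint32_t multi = 4294967290u;   /* a MultiXactId close to the 0xFFFFFFFF wrap point */
    uint32_t pageno = (uint32_t) (multi / OFFSETS_PER_PAGE_SIM);  /* page numbering wraps with the ID space */
    uint32_t entryno = (uint32_t) (multi % OFFSETS_PER_PAGE_SIM); /* slot within that page */

    printf("multi %u -> page %u, entry %u\n", multi, pageno, entryno);
    return 0;
}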
@@ -110,7 +110,7 @@
* additional flag bits for each TransactionId. To do this without getting
* into alignment issues, we store four bytes of flags, and then the
* corresponding 4 Xids. Each such 5-word (20-byte) set we call a "group", and
* are stored as a whole in pages. Thus, with 8kB BLCKSZ, we keep 409 groups
* per page. This wastes 12 bytes per page, but that's OK -- simplicity (and
* performance) trumps space efficiency here.
*
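The 409-groups-per-page figure can be re-derived with a few lines of arithmetic; the inputs below are taken from the comment itself (8kB pages, 4 flag bytes plus four 4-byte Xids per group), everything else is just a worked check.

#include <stdio.h>

int main(void)
{
    int blcksz = 8192;               /* assumed page size */
    int xids_per_group = 4;
    int flag_bytes = 4;              /* one flag byte per member in this sketch */
    int group_size = flag_bytes + xids_per_group * 4;    /* 20 bytes per group */
    int groups_per_page = blcksz / group_size;           /* 409 */
    int wasted = blcksz - groups_per_page * group_size;  /* 12 bytes left over */

    printf("group=%dB, groups/page=%d, wasted=%dB\n", group_size, groups_per_page, wasted);
    return 0;
}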
@@ -161,7 +161,7 @@ static SlruCtlData MultiXactMemberCtlData;
#define MultiXactMemberCtl (&MultiXactMemberCtlData)

/*
* MultiXact state shared across all backends. All this state is protected
* by MultiXactGenLock. (We also use MultiXactOffsetControlLock and
* MultiXactMemberControlLock to guard accesses to the two sets of SLRU
* buffers. For concurrency's sake, we avoid holding more than one of these

@@ -179,7 +179,7 @@ typedef struct MultiXactStateData
MultiXactId lastTruncationPoint;

/*
* oldest multixact that is still on disk. Anything older than this
* should not be consulted.
*/
MultiXactId oldestMultiXactId;

@@ -269,8 +269,8 @@ typedef struct mXactCacheEnt
} mXactCacheEnt;

#define MAX_CACHE_ENTRIES 256
static dlist_head MXactCache = DLIST_STATIC_INIT(MXactCache);
static int MXactCacheMembers = 0;
static MemoryContext MXactContext = NULL;

#ifdef MULTIXACT_DEBUG

@@ -528,7 +528,7 @@ MultiXactIdIsRunning(MultiXactId multi)

/*
* This could be made faster by having another entry point in procarray.c,
* walking the PGPROC array only once for all the members. But in most
* cases nmembers should be small enough that it doesn't much matter.
*/
for (i = 0; i < nmembers; i++)

@@ -579,9 +579,9 @@ MultiXactIdSetOldestMember(void)
* back. Which would be wrong.
*
* Note that a shared lock is sufficient, because it's enough to stop
* someone from advancing nextMXact; and nobody else could be trying
* to write to our OldestMember entry, only reading (and we assume
* storing it is atomic.)
*/
LWLockAcquire(MultiXactGenLock, LW_SHARED);

@@ -615,7 +615,7 @@ MultiXactIdSetOldestMember(void)
* The value to set is the oldest of nextMXact and all the valid per-backend
* OldestMemberMXactId[] entries. Because of the locking we do, we can be
* certain that no subsequent call to MultiXactIdSetOldestMember can set
* an OldestMemberMXactId[] entry older than what we compute here. Therefore
* there is no live transaction, now or later, that can be a member of any
* MultiXactId older than the OldestVisibleMXactId we compute here.
*/

@@ -751,7 +751,7 @@ MultiXactIdCreateFromMembers(int nmembers, MultiXactMember *members)
* heap_lock_tuple() to have put it there, and heap_lock_tuple() generates
* an XLOG record that must follow ours. The normal LSN interlock between
* the data page and that XLOG record will ensure that our XLOG record
* reaches disk first. If the SLRU members/offsets data reaches disk
* sooner than the XLOG record, we do not care because we'll overwrite it
* with zeroes unless the XLOG record is there too; see notes at top of
* this file.

@@ -882,7 +882,7 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
* GetNewMultiXactId
* Get the next MultiXactId.
*
* Also, reserve the needed amount of space in the "members" area. The
* starting offset of the reserved space is returned in *offset.
*
* This may generate XLOG records for expansion of the offsets and/or members

@@ -916,7 +916,7 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)

/*----------
* Check to see if it's safe to assign another MultiXactId. This protects
* against catastrophic data loss due to multixact wraparound. The basic
* rules are:
*
* If we're past multiVacLimit, start trying to force autovacuum cycles.

@@ -930,7 +930,7 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
{
/*
* For safety's sake, we release MultiXactGenLock while sending
* signals, warnings, etc. This is not so much because we care about
* preserving concurrency in this situation, as to avoid any
* possibility of deadlock while doing get_database_name(). First,
* copy all the shared values we'll need in this path.

@@ -981,8 +981,8 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
(errmsg_plural("database \"%s\" must be vacuumed before %u more MultiXactId is used",
"database \"%s\" must be vacuumed before %u more MultiXactIds are used",
multiWrapLimit - result,
oldest_datname,
multiWrapLimit - result),
errhint("Execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions.")));
else

@@ -990,8 +990,8 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
(errmsg_plural("database with OID %u must be vacuumed before %u more MultiXactId is used",
"database with OID %u must be vacuumed before %u more MultiXactIds are used",
multiWrapLimit - result,
oldest_datoid,
multiWrapLimit - result),
errhint("Execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions.")));
}

@@ -1036,7 +1036,7 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
* until after file extension has succeeded!
*
* We don't care about MultiXactId wraparound here; it will be handled by
* the next iteration. But note that nextMXact may be InvalidMultiXactId
* or the first value on a segment-beginning page after this routine
* exits, so anyone else looking at the variable must be prepared to deal
* with either case. Similarly, nextOffset may be zero, but we won't use

@@ -1114,16 +1114,16 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
* need to allow an empty set to be returned regardless, if the caller is
* willing to accept it; the caller is expected to check that it's an
* allowed condition (such as ensuring that the infomask bits set on the
* tuple are consistent with the pg_upgrade scenario). If the caller is
* expecting this to be called only on recently created multis, then we
* raise an error.
*
* Conversely, an ID >= nextMXact shouldn't ever be seen here; if it is
* seen, it implies undetected ID wraparound has occurred. This raises a
* hard error.
*
* Shared lock is enough here since we aren't modifying any global state.
* Acquire it just long enough to grab the current counter values. We may
* need both nextMXact and nextOffset; see below.
*/
LWLockAcquire(MultiXactGenLock, LW_SHARED);

@@ -1151,12 +1151,12 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,

/*
* Find out the offset at which we need to start reading MultiXactMembers
* and the number of members in the multixact. We determine the latter as
* the difference between this multixact's starting offset and the next
* one's. However, there are some corner cases to worry about:
*
* 1. This multixact may be the latest one created, in which case there is
* no next one to look at. In this case the nextOffset value we just
* saved is the correct endpoint.
*
* 2. The next multixact may still be in process of being filled in: that

@@ -1167,11 +1167,11 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
* (because we are careful to pre-zero offset pages). Because
* GetNewMultiXactId will never return zero as the starting offset for a
* multixact, when we read zero as the next multixact's offset, we know we
* have this case. We sleep for a bit and try again.
*
* 3. Because GetNewMultiXactId increments offset zero to offset one to
* handle case #2, there is an ambiguity near the point of offset
* wraparound. If we see next multixact's offset is one, is that our
* multixact's actual endpoint, or did it end at zero with a subsequent
* increment? We handle this using the knowledge that if the zero'th
* member slot wasn't filled, it'll contain zero, and zero isn't a valid
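A minimal sketch of how the member count falls out of the offsets described above, assuming a toy offsets array instead of the real SLRU reads; case 1 (we are the latest multixact) and case 2 (the next offset still reads as zero, so the caller must retry) from the comment are the two branches, and the wraparound ambiguity of case 3 is deliberately ignored here.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical offsets lookup; entry [2] is a multixact not filled in yet. */
static uint32_t offsets[] = { 0, 17, 0, 25 };

/* Returns the member count, or -1 when the next offset still reads as zero
 * (the caller would sleep for a bit and retry, as the comment describes). */
static int members_in(uint32_t multi, uint32_t next_mxact, uint32_t next_offset)
{
    uint32_t start = offsets[multi];

    if (multi + 1 == next_mxact)
        return (int) (next_offset - start);   /* case 1: no next multixact to look at */

    if (offsets[multi + 1] == 0)
        return -1;                            /* case 2: next one still being filled in */

    return (int) (offsets[multi + 1] - start);
}

int main(void)
{
    printf("%d\n", members_in(1, 4, 30));     /* next offset is zero here -> -1, retry */
    return 0;
}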
@@ -1297,8 +1297,8 @@ retry:

/*
* MultiXactHasRunningRemoteMembers
* Does the given multixact have still-live members from
* transactions other than our own?
*/
bool
MultiXactHasRunningRemoteMembers(MultiXactId multi)

@@ -1694,7 +1694,7 @@ multixact_twophase_postabort(TransactionId xid, uint16 info,

/*
* Initialization of shared memory for MultiXact. We use two SLRU areas,
* thus double memory. Also, reserve space for the shared MultiXactState
* struct and the per-backend MultiXactId arrays (two of those, too).
*/
Size

@@ -1754,7 +1754,7 @@ MultiXactShmemInit(void)

/*
* This func must be called ONCE on system install. It creates the initial
* MultiXact segments. (The MultiXacts directories are assumed to have been
* created by initdb, and MultiXactShmemInit must have been called already.)
*/
void

@@ -1849,7 +1849,7 @@ MaybeExtendOffsetSlru(void)

if (!SimpleLruDoesPhysicalPageExist(MultiXactOffsetCtl, pageno))
{
int slotno;

/*
* Fortunately for us, SimpleLruWritePage is already prepared to deal

@@ -1925,7 +1925,7 @@ TrimMultiXact(void)
MultiXactOffsetCtl->shared->latest_page_number = pageno;

/*
* Zero out the remainder of the current offsets page. See notes in
* StartupCLOG() for motivation.
*/
entryno = MultiXactIdToOffsetEntry(multi);

@@ -1955,7 +1955,7 @@ TrimMultiXact(void)
MultiXactMemberCtl->shared->latest_page_number = pageno;

/*
* Zero out the remainder of the current members page. See notes in
* TrimCLOG() for motivation.
*/
flagsoff = MXOffsetToFlagsOffset(offset);

@@ -2097,7 +2097,7 @@ SetMultiXactIdLimit(MultiXactId oldest_datminmxid, Oid oldest_datoid)

/*
* We'll start complaining loudly when we get within 10M multis of the
* stop point. This is kind of arbitrary, but if you let your gas gauge
* get down to 1% of full, would you be looking for the next gas station?
* We need to be fairly liberal about this number because there are lots
* of scenarios where most transactions are done by automatic clients that
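The 10M-early warning threshold amounts to simple unsigned arithmetic; the limits below are made-up values for illustration, and the real code also has to cope with wraparound of the counter itself.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t multiWrapLimit = 4000000000u;                 /* assumed stop point */
    uint32_t multiWarnLimit = multiWrapLimit - 10000000;   /* start complaining 10M early */
    uint32_t curMulti = 3995000000u;

    if (curMulti >= multiWarnLimit)
        printf("must be vacuumed before %u more MultiXactIds are used\n",
               multiWrapLimit - curMulti);
    return 0;
}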
@@ -2172,8 +2172,8 @@ SetMultiXactIdLimit(MultiXactId oldest_datminmxid, Oid oldest_datoid)
(errmsg_plural("database \"%s\" must be vacuumed before %u more MultiXactId is used",
"database \"%s\" must be vacuumed before %u more MultiXactIds are used",
multiWrapLimit - curMulti,
oldest_datname,
multiWrapLimit - curMulti),
errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions.")));
else

@@ -2181,8 +2181,8 @@ SetMultiXactIdLimit(MultiXactId oldest_datminmxid, Oid oldest_datoid)
(errmsg_plural("database with OID %u must be vacuumed before %u more MultiXactId is used",
"database with OID %u must be vacuumed before %u more MultiXactIds are used",
multiWrapLimit - curMulti,
oldest_datoid,
multiWrapLimit - curMulti),
errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
"You might also need to commit or roll back old prepared transactions.")));
}

@@ -2375,16 +2375,16 @@ GetOldestMultiXactId(void)

/*
* SlruScanDirectory callback.
* This callback deletes segments that are outside the range determined by
* the given page numbers.
*
* Both range endpoints are exclusive (that is, segments containing any of
* those pages are kept.)
*/
typedef struct MembersLiveRange
{
int rangeStart;
int rangeEnd;
} MembersLiveRange;

static bool

@@ -2392,15 +2392,15 @@ SlruScanDirCbRemoveMembers(SlruCtl ctl, char *filename, int segpage,
void *data)
{
MembersLiveRange *range = (MembersLiveRange *) data;
MultiXactOffset nextOffset;

if ((segpage == range->rangeStart) ||
(segpage == range->rangeEnd))
return false; /* easy case out */

/*
* To ensure that no segment is spuriously removed, we must keep track of
* new segments added since the start of the directory scan; to do this,
* we update our end-of-range point as we run.
*
* As an optimization, we can skip looking at shared memory if we know for

@@ -2473,10 +2473,10 @@ void
TruncateMultiXact(MultiXactId oldestMXact)
{
MultiXactOffset oldestOffset;
MultiXactOffset nextOffset;
mxtruncinfo trunc;
MultiXactId earliest;
MembersLiveRange range;

/*
* Note we can't just plow ahead with the truncation; it's possible that
@@ -15,7 +15,7 @@
*
* We use a control LWLock to protect the shared data structures, plus
* per-buffer LWLocks that synchronize I/O for each buffer. The control lock
* must be held to examine or modify any shared state. A process that is
* reading in or writing out a page buffer does not hold the control lock,
* only the per-buffer lock for the buffer it is working on.
*

@@ -34,7 +34,7 @@
* could have happened while we didn't have the lock).
*
* As with the regular buffer manager, it is possible for another process
* to re-dirty a page that is currently being written out. This is handled
* by re-setting the page's page_dirty flag.
*
*

@@ -96,7 +96,7 @@ typedef struct SlruFlushData *SlruFlush;
* page_lru_count entries to be "reset" to lower values than they should have,
* in case a process is delayed while it executes this macro. With care in
* SlruSelectLRUPage(), this does little harm, and in any case the absolute
* worst possible consequence is a nonoptimal choice of page to evict. The
* gain from allowing concurrent reads of SLRU pages seems worth it.
*/
#define SlruRecentlyUsed(shared, slotno) \

@@ -481,7 +481,7 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
*
* NOTE: only one write attempt is made here. Hence, it is possible that
* the page is still dirty at exit (if someone else re-dirtied it during
* the write). However, we *do* attempt a fresh write even if the page
* is already being written; this is for checkpoints.
*
* Control lock must be held at entry, and will be held at exit.

@@ -634,7 +634,7 @@ SlruPhysicalReadPage(SlruCtl ctl, int pageno, int slotno)
* In a crash-and-restart situation, it's possible for us to receive
* commands to set the commit status of transactions whose bits are in
* already-truncated segments of the commit log (see notes in
* SlruPhysicalWritePage). Hence, if we are InRecovery, allow the case
* where the file doesn't exist, and return zeroes instead.
*/
fd = OpenTransientFile(path, O_RDWR | PG_BINARY, S_IRUSR | S_IWUSR);

@@ -964,9 +964,9 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)

/*
* If we find any EMPTY slot, just select that one. Else choose a
* victim page to replace. We normally take the least recently used
* valid page, but we will never take the slot containing
* latest_page_number, even if it appears least recently used. We
* will select a slot that is already I/O busy only if there is no
* other choice: a read-busy slot will not be least recently used once
* the read finishes, and waiting for an I/O on a write-busy slot is
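A simplified victim-selection loop along the lines of the comment above, assuming a toy shared structure; the real SlruSelectLRUPage also handles cur_lru_count wraparound and the I/O-wait path, which are omitted here.

#include <stdint.h>

#define NSLOTS 16

typedef enum { SLOT_EMPTY, SLOT_VALID, SLOT_IO_BUSY } SlotStatus;

typedef struct
{
    SlotStatus  status[NSLOTS];
    int         pageno[NSLOTS];
    uint32_t    lru_count[NSLOTS];      /* higher = less recently used in this sketch */
    int         latest_page_number;
} SlruSharedSim;

/* Pick a slot to evict: any EMPTY slot wins; otherwise the least recently used
 * valid slot that is not I/O busy and does not hold latest_page_number. */
static int select_victim(SlruSharedSim *s)
{
    int best = -1;
    uint32_t best_count = 0;

    for (int i = 0; i < NSLOTS; i++)
    {
        if (s->status[i] == SLOT_EMPTY)
            return i;
        if (s->status[i] == SLOT_IO_BUSY || s->pageno[i] == s->latest_page_number)
            continue;
        if (best < 0 || s->lru_count[i] > best_count)
        {
            best = i;
            best_count = s->lru_count[i];
        }
    }
    return best;    /* -1 means everything is busy: wait for an I/O to finish and retry */
}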
@@ -1041,7 +1041,7 @@ SlruSelectLRUPage(SlruCtl ctl, int pageno)

/*
* If all pages (except possibly the latest one) are I/O busy, we'll
* have to wait for an I/O to complete and then retry. In that
* unhappy case, we choose to wait for the I/O on the least recently
* used slot, on the assumption that it was likely initiated first of
* all the I/Os in progress and may therefore finish first.

@@ -1193,7 +1193,7 @@ restart:;
/*
* Hmm, we have (or may have) I/O operations acting on the page, so
* we've got to wait for them to finish and then start again. This is
* the same logic as in SlruSelectLRUPage. (XXX if page is dirty,
* wouldn't it be OK to just discard it without writing it? For now,
* keep the logic the same as it was.)
*/

@@ -1293,7 +1293,7 @@ SlruScanDirectory(SlruCtl ctl, SlruScanCallback callback, void *data)
cldir = AllocateDir(ctl->Dir);
while ((clde = ReadDir(cldir, ctl->Dir)) != NULL)
{
size_t len;

len = strlen(clde->d_name);
@@ -5,7 +5,7 @@
*
* The pg_subtrans manager is a pg_clog-like manager that stores the parent
* transaction Id for each transaction. It is a fundamental part of the
* nested transactions implementation. A main transaction has a parent
* of InvalidTransactionId, and each subtransaction has its immediate parent.
* The tree can easily be walked from child to parent, but not in the
* opposite direction.
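The parent-pointer structure described above supports only child-to-parent traversal; a toy table stands in for the real pg_subtrans lookups in this sketch.

#include <stdint.h>

typedef uint32_t SimXid;
#define INVALID_XID 0

/* Hypothetical parent table: 7 -> 6 -> 5 -> 3, and 3 is a main transaction. */
static SimXid parent_of[] = { 0, 0, 0, 0, 0, 3, 5, 6 };

/* Walk child -> parent until a main transaction (parent == INVALID_XID) is reached;
 * the opposite walk is not possible with this representation. */
static SimXid topmost(SimXid xid)
{
    while (parent_of[xid] != INVALID_XID)
        xid = parent_of[xid];
    return xid;
}

With the table above, topmost(7) walks 7 -> 6 -> 5 -> 3 and stops there, since 3's parent is invalid.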
@@ -191,7 +191,7 @@ SUBTRANSShmemInit(void)
* must have been called already.)
*
* Note: it's not really necessary to create the initial segment now,
* since slru.c would create it on first write anyway. But we may as well
* do it to be sure the directory is set up correctly.
*/
void
@@ -66,7 +66,7 @@ restoreTimeLineHistoryFiles(TimeLineID begin, TimeLineID end)
* Try to read a timeline's history file.
*
* If successful, return the list of component TLIs (the given TLI followed by
* its ancestor TLIs). If we can't find the history file, assume that the
* timeline has no parents, and return a list of just the specified timeline
* ID.
*/

@@ -150,7 +150,7 @@ readTimeLineHistory(TimeLineID targetTLI)
if (nfields != 3)
ereport(FATAL,
(errmsg("syntax error in history file: %s", fline),
errhint("Expected a transaction log switchpoint location.")));

if (result && tli <= lasttli)
ereport(FATAL,

@@ -281,7 +281,7 @@ findNewestTimeLine(TimeLineID startTLI)
* reason: human-readable explanation of why the timeline was switched
*
* Currently this is only used at the end of recovery, and so there are no locking
* considerations. But we should be just as tense as XLogFileInit to avoid
* emplacing a bogus file.
*/
void

@@ -418,7 +418,7 @@ writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI,

/*
* Prefer link() to rename() here just to be really sure that we don't
* overwrite an existing file. However, there shouldn't be one, so
* rename() is an acceptable substitute except for the truly paranoid.
*/
#if HAVE_WORKING_LINK
@@ -145,7 +145,7 @@ TransactionIdDidCommit(TransactionId transactionId)
* be a window just after database startup where we do not have complete
* knowledge in pg_subtrans of the transactions after TransactionXmin.
* StartupSUBTRANS() has ensured that any missing information will be
* zeroed. Since this case should not happen under normal conditions, it
* seems reasonable to emit a WARNING for it.
*/
if (xidstatus == TRANSACTION_STATUS_SUB_COMMITTED)

@@ -301,7 +301,7 @@ TransactionIdPrecedes(TransactionId id1, TransactionId id2)
{
/*
* If either ID is a permanent XID then we can just do unsigned
* comparison. If both are normal, do a modulo-2^32 comparison.
*/
int32 diff;
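The modulo-2^32 comparison mentioned in this hunk is the usual circular-XID trick; a self-contained version (with an assumed cutoff for "permanent" IDs) looks like this.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define FIRST_NORMAL_XID 3   /* IDs below this count as "permanent" in this sketch */

/* Is id1 logically older than id2?  Permanent IDs compare unsigned; normal IDs
 * compare modulo 2^32, so the answer stays correct across wraparound. */
static bool xid_precedes(uint32_t id1, uint32_t id2)
{
    if (id1 < FIRST_NORMAL_XID || id2 < FIRST_NORMAL_XID)
        return id1 < id2;

    int32_t diff = (int32_t) (id1 - id2);
    return diff < 0;
}

int main(void)
{
    /* 4294967295 was assigned before the counter wrapped; 10 came after, so 10 is newer. */
    printf("%d\n", xid_precedes(4294967295u, 10));   /* prints 1 */
    return 0;
}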
@@ -443,7 +443,7 @@ LockGXact(const char *gid, Oid user)
/*
* Note: it probably would be possible to allow committing from
* another database; but at the moment NOTIFY is known not to work and
* there may be some other issues as well. Hence disallow until
* someone gets motivated to make it work.
*/
if (MyDatabaseId != proc->databaseId)

@@ -1031,7 +1031,7 @@ EndPrepare(GlobalTransaction gxact)
* out the correct state file CRC, we have an inconsistency: the xact is
* prepared according to WAL but not according to our on-disk state. We
* use a critical section to force a PANIC if we are unable to complete
* the write --- then, WAL replay should repair the inconsistency. The
* odds of a PANIC actually occurring should be very tiny given that we
* were able to write the bogus CRC above.
*

@@ -1069,7 +1069,7 @@ EndPrepare(GlobalTransaction gxact)
errmsg("could not close two-phase state file: %m")));

/*
* Mark the prepared transaction as valid. As soon as xact.c marks
* MyPgXact as not running our XID (which it will do immediately after
* this function returns), others can commit/rollback the xact.
*

@@ -1336,7 +1336,7 @@ FinishPreparedTransaction(const char *gid, bool isCommit)
/*
* In case we fail while running the callbacks, mark the gxact invalid so
* no one else will try to commit/rollback, and so it can be recycled
* properly later. It is still locked by our XID so it won't go away yet.
*
* (We assume it's safe to do this without taking TwoPhaseStateLock.)
*/

@@ -1540,7 +1540,7 @@ CheckPointTwoPhase(XLogRecPtr redo_horizon)
*
* This approach creates a race condition: someone else could delete a
* GXACT between the time we release TwoPhaseStateLock and the time we try
* to open its state file. We handle this by special-casing ENOENT
* failures: if we see that, we verify that the GXACT is no longer valid,
* and if so ignore the failure.
*/

@@ -1621,7 +1621,7 @@ CheckPointTwoPhase(XLogRecPtr redo_horizon)
*
* We throw away any prepared xacts with main XID beyond nextXid --- if any
* are present, it suggests that the DBA has done a PITR recovery to an
* earlier point in time without cleaning out pg_twophase. We dare not
* try to recover such prepared xacts since they likely depend on database
* state that doesn't exist now.
*

@@ -1713,7 +1713,7 @@ PrescanPreparedTransactions(TransactionId **xids_p, int *nxids_p)
* XID, and they may force us to advance nextXid.
*
* We don't expect anyone else to modify nextXid, hence we don't
* need to hold a lock while examining it. We still acquire the
* lock to modify it, though.
*/
subxids = (TransactionId *)
@@ -39,7 +39,7 @@ VariableCache ShmemVariableCache = NULL;
*
* Note: when this is called, we are actually already inside a valid
* transaction, since XIDs are now not allocated until the transaction
* does something. So it is safe to do a database lookup if we want to
* issue a warning about XID wrap.
*/
TransactionId

@@ -165,20 +165,20 @@ GetNewTransactionId(bool isSubXact)
/*
* Now advance the nextXid counter. This must not happen until after we
* have successfully completed ExtendCLOG() --- if that routine fails, we
* want the next incoming transaction to try it again. We cannot assign
* more XIDs until there is CLOG space for them.
*/
TransactionIdAdvance(ShmemVariableCache->nextXid);

/*
* We must store the new XID into the shared ProcArray before releasing
* XidGenLock. This ensures that every active XID older than
* latestCompletedXid is present in the ProcArray, which is essential for
* correct OldestXmin tracking; see src/backend/access/transam/README.
*
* XXX by storing xid into MyPgXact without acquiring ProcArrayLock, we
* are relying on fetch/store of an xid to be atomic, else other backends
* might see a partially-set xid here. But holding both locks at once
* would be a nasty concurrency hit. So for now, assume atomicity.
*
* Note that readers of PGXACT xid fields should be careful to fetch the

@@ -289,7 +289,7 @@ SetTransactionIdLimit(TransactionId oldest_datfrozenxid, Oid oldest_datoid)

/*
* We'll start complaining loudly when we get within 10M transactions of
* the stop point. This is kind of arbitrary, but if you let your gas
* gauge get down to 1% of full, would you be looking for the next gas
* station? We need to be fairly liberal about this number because there
* are lots of scenarios where most transactions are done by automatic

@@ -390,7 +390,7 @@ SetTransactionIdLimit(TransactionId oldest_datfrozenxid, Oid oldest_datoid)
* We primarily check whether oldestXidDB is valid. The cases we have in
* mind are that that database was dropped, or the field was reset to zero
* by pg_resetxlog. In either case we should force recalculation of the
* wrap limit. Also do it if oldestXid is old enough to be forcing
* autovacuums or other actions; this ensures we update our state as soon
* as possible once extra overhead is being incurred.
*/
@@ -270,7 +270,7 @@ static void CallSubXactCallbacks(SubXactEvent event,
SubTransactionId parentSubid);
static void CleanupTransaction(void);
static void CheckTransactionChain(bool isTopLevel, bool throwError,
const char *stmtType);
static void CommitTransaction(void);
static TransactionId RecordTransactionAbort(bool isSubXact);
static void StartTransaction(void);

@@ -450,7 +450,7 @@ AssignTransactionId(TransactionState s)
{
bool isSubXact = (s->parent != NULL);
ResourceOwner currentOwner;
bool log_unknown_top = false;

/* Assert that caller didn't screw up */
Assert(!TransactionIdIsValid(s->transactionId));

@@ -487,8 +487,8 @@ AssignTransactionId(TransactionState s)

/*
* When wal_level=logical, guarantee that a subtransaction's xid can only
* be seen in the WAL stream if its toplevel xid has been logged before.
* If necessary we log a xact_assignment record with fewer than
* PGPROC_MAX_CACHED_SUBXIDS. Note that it is fine if didLogXid isn't set
* for a transaction even though it appears in a WAL record, we just might
* superfluously log something. That can happen when an xid is included

@@ -637,7 +637,7 @@ SubTransactionIsActive(SubTransactionId subxid)
*
* "used" must be TRUE if the caller intends to use the command ID to mark
* inserted/updated/deleted tuples. FALSE means the ID is being fetched
* for read-only purposes (ie, as a snapshot validity cutoff). See
* CommandCounterIncrement() for discussion.
*/
CommandId

@@ -724,7 +724,7 @@ TransactionIdIsCurrentTransactionId(TransactionId xid)

/*
* We always say that BootstrapTransactionId is "not my transaction ID"
* even when it is (ie, during bootstrap). Along with the fact that
* transam.c always treats BootstrapTransactionId as already committed,
* this causes the tqual.c routines to see all tuples as committed, which
* is what we need during bootstrap. (Bootstrap mode only inserts tuples,

@@ -866,7 +866,7 @@ AtStart_Memory(void)
/*
* If this is the first time through, create a private context for
* AbortTransaction to work in. By reserving some space now, we can
* insulate AbortTransaction from out-of-memory scenarios. Like
* ErrorContext, we set it up with slow growth rate and a nonzero minimum
* size, so that space will be reserved immediately.
*/

@@ -969,7 +969,7 @@ AtSubStart_ResourceOwner(void)
Assert(s->parent != NULL);

/*
* Create a resource owner for the subtransaction. We make it a child of
* the immediate parent's resource owner.
*/
s->curTransactionOwner =

@@ -989,7 +989,7 @@ AtSubStart_ResourceOwner(void)
* RecordTransactionCommit
*
* Returns latest XID among xact and its children, or InvalidTransactionId
* if the xact has no XID. (We compute that here just because it's easier.)
*/
static TransactionId
RecordTransactionCommit(void)

@@ -1034,7 +1034,7 @@ RecordTransactionCommit(void)

/*
* If we didn't create XLOG entries, we're done here; otherwise we
* should flush those entries the same as a commit record. (An
* example of a possible record that wouldn't cause an XID to be
* assigned is a sequence advance record due to nextval() --- we want
* to flush that to disk before reporting commit.)

@@ -1051,7 +1051,7 @@ RecordTransactionCommit(void)
BufmgrCommit();

/*
* Mark ourselves as within our "commit critical section". This
* forces any concurrent checkpoint to wait until we've updated
* pg_clog. Without this, it is possible for the checkpoint to set
* REDO after the XLOG record but fail to flush the pg_clog update to

@@ -1059,7 +1059,7 @@ RecordTransactionCommit(void)
* crashes a little later.
*
* Note: we could, but don't bother to, set this flag in
* RecordTransactionAbort. That's because loss of a transaction abort
* is noncritical; the presumption would be that it aborted, anyway.
*
* It's safe to change the delayChkpt flag of our own backend without

@@ -1168,15 +1168,15 @@ RecordTransactionCommit(void)
/*
* Check if we want to commit asynchronously. We can allow the XLOG flush
* to happen asynchronously if synchronous_commit=off, or if the current
* transaction has not performed any WAL-logged operation. The latter
* case can arise if the current transaction wrote only to temporary
* and/or unlogged tables. In case of a crash, the loss of such a
* transaction will be irrelevant since temp tables will be lost anyway,
* and unlogged tables will be truncated. (Given the foregoing, you might
* think that it would be unnecessary to emit the XLOG record at all in
* this case, but we don't currently try to do that. It would certainly
* cause problems at least in Hot Standby mode, where the
* KnownAssignedXids machinery requires tracking every XID assignment. It
* might be OK to skip it only when wal_level < hot_standby, but for now
* we don't.)
*
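The decision described in this hunk reduces to a couple of boolean tests; the struct and field names below are illustrative only, not xact.c's.

#include <stdbool.h>

typedef struct
{
    bool synchronous_commit;   /* GUC: is synchronous commit requested? */
    bool wrote_xlog;           /* did this transaction write any WAL at all? */
    bool forced_sync;          /* something (e.g. a file drop) forces a sync commit */
} CommitCtxSim;

/* Should the commit record be flushed synchronously before reporting commit? */
static bool commit_must_flush_wal(const CommitCtxSim *c)
{
    if (!c->wrote_xlog)
        return false;          /* only temp/unlogged changes: nothing worth waiting for */
    return c->synchronous_commit || c->forced_sync;
    /* otherwise the WAL writer flushes later and the async-commit machinery
     * remembers the commit LSN for the CLOG page, as described in clog.c */
}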
@@ -1423,7 +1423,7 @@ AtSubCommit_childXids(void)
* RecordTransactionAbort
*
* Returns latest XID among xact and its children, or InvalidTransactionId
* if the xact has no XID. (We compute that here just because it's easier.)
*/
static TransactionId
RecordTransactionAbort(bool isSubXact)

@@ -1440,7 +1440,7 @@ RecordTransactionAbort(bool isSubXact)

/*
* If we haven't been assigned an XID, nobody will care whether we aborted
* or not. Hence, we're done in that case. It does not matter if we have
* rels to delete (note that this routine is not responsible for actually
* deleting 'em). We cannot have any child XIDs, either.
*/

@@ -1456,7 +1456,7 @@ RecordTransactionAbort(bool isSubXact)
* We have a valid XID, so we should write an ABORT record for it.
*
* We do not flush XLOG to disk here, since the default assumption after a
* crash would be that we aborted, anyway. For the same reason, we don't
* need to worry about interlocking against checkpoint start.
*/

@@ -1624,7 +1624,7 @@ AtSubAbort_childXids(void)

/*
* We keep the child-XID arrays in TopTransactionContext (see
* AtSubCommit_childXids). This means we'd better free the array
* explicitly at abort to avoid leakage.
*/
if (s->childXids != NULL)

@@ -1802,7 +1802,7 @@ StartTransaction(void)
VirtualXactLockTableInsert(vxid);

/*
* Advertise it in the proc array. We assume assignment of
* LocalTransactionID is atomic, and the backendId should be set already.
*/
Assert(MyProc->backendId == vxid.backendId);

@@ -1899,7 +1899,7 @@ CommitTransaction(void)

/*
* The remaining actions cannot call any user-defined code, so it's safe
* to start shutting down within-transaction services. But note that most
* of this stuff could still throw an error, which would switch us into
* the transaction-abort path.
*/

@@ -2104,7 +2104,7 @@ PrepareTransaction(void)

/*
* The remaining actions cannot call any user-defined code, so it's safe
* to start shutting down within-transaction services. But note that most
* of this stuff could still throw an error, which would switch us into
* the transaction-abort path.
*/

@@ -2224,7 +2224,7 @@ PrepareTransaction(void)
XactLastRecEnd = 0;

/*
* Let others know about no transaction in progress by me. This has to be
* done *after* the prepared transaction has been marked valid, else
* someone may think it is unlocked and recyclable.
*/

@@ -2233,7 +2233,7 @@ PrepareTransaction(void)
/*
* This is all post-transaction cleanup. Note that if an error is raised
* here, it's too late to abort the transaction. This should be just
* noncritical resource releasing. See notes in CommitTransaction.
*/

CallXactCallbacks(XACT_EVENT_PREPARE);

@@ -2411,7 +2411,7 @@ AbortTransaction(void)
ProcArrayEndTransaction(MyProc, latestXid);

/*
* Post-abort cleanup. See notes in CommitTransaction() concerning
* ordering. We can skip all of it if the transaction failed before
* creating a resource owner.
*/

@@ -2646,7 +2646,7 @@ CommitTransactionCommand(void)

/*
* Here we were in a perfectly good transaction block but the user
* told us to ROLLBACK anyway. We have to abort the transaction
* and then clean up.
*/
case TBLOCK_ABORT_PENDING:

@@ -2666,7 +2666,7 @@ CommitTransactionCommand(void)

/*
* We were just issued a SAVEPOINT inside a transaction block.
* Start a subtransaction. (DefineSavepoint already did
* PushTransaction, so as to have someplace to put the SUBBEGIN
* state.)
*/

@@ -2870,7 +2870,7 @@ AbortCurrentTransaction(void)
break;

/*
* Here, we failed while trying to COMMIT. Clean up the
* transaction and return to idle state (we do not want to stay in
* the transaction).
*/

@@ -2932,7 +2932,7 @@ AbortCurrentTransaction(void)

/*
* If we failed while trying to create a subtransaction, clean up
* the broken subtransaction and abort the parent. The same
* applies if we get a failure while ending a subtransaction.
*/
case TBLOCK_SUBBEGIN:

@@ -3485,7 +3485,7 @@ UserAbortTransactionBlock(void)
break;

/*
* We are inside a subtransaction. Mark everything up to top
* level as exitable.
*/
case TBLOCK_SUBINPROGRESS:

@@ -3619,7 +3619,7 @@ ReleaseSavepoint(List *options)
break;

/*
* We are in a non-aborted subtransaction. This is the only valid
* case.
*/
case TBLOCK_SUBINPROGRESS:

@@ -3676,7 +3676,7 @@ ReleaseSavepoint(List *options)

/*
* Mark "commit pending" all subtransactions up to the target
* subtransaction. The actual commits will happen when control gets to
* CommitTransactionCommand.
*/
xact = CurrentTransactionState;

@@ -3775,7 +3775,7 @@ RollbackToSavepoint(List *options)

/*
* Mark "abort pending" all subtransactions up to the target
* subtransaction. The actual aborts will happen when control gets to
* CommitTransactionCommand.
*/
xact = CurrentTransactionState;

@@ -4182,7 +4182,7 @@ CommitSubTransaction(void)
CommandCounterIncrement();

/*
* Prior to 8.4 we marked subcommit in clog at this point. We now only
* perform that step, if required, as part of the atomic update of the
* whole transaction tree at top level commit or abort.
*/

@@ -4641,7 +4641,7 @@ TransStateAsString(TransState state)
/*
* xactGetCommittedChildren
*
* Gets the list of committed children of the current transaction. The return
* value is the number of child transactions. *ptr is set to point to an
* array of TransactionIds. The array is allocated in TopTransactionContext;
* the caller should *not* pfree() it (this is a change from pre-8.4 code!).
@@ -101,7 +101,7 @@ bool XLOG_DEBUG = false;
|
||||
* future XLOG segment as long as there aren't already XLOGfileslop future
|
||||
* segments; else we'll delete it. This could be made a separate GUC
|
||||
* variable, but at present I think it's sufficient to hardwire it as
|
||||
* 2*CheckPointSegments+1. Under normal conditions, a checkpoint will free
|
||||
* 2*CheckPointSegments+1. Under normal conditions, a checkpoint will free
|
||||
* no more than 2*CheckPointSegments log segments, and we want to recycle all
|
||||
* of them; the +1 allows boundary cases to happen without wasting a
|
||||
* delete/create-segment cycle.
|
||||
@@ -190,7 +190,7 @@ static bool LocalHotStandbyActive = false;
|
||||
* 0: unconditionally not allowed to insert XLOG
|
||||
* -1: must check RecoveryInProgress(); disallow until it is false
|
||||
* Most processes start with -1 and transition to 1 after seeing that recovery
|
||||
* is not in progress. But we can also force the value for special cases.
|
||||
* is not in progress. But we can also force the value for special cases.
|
||||
* The coding in XLogInsertAllowed() depends on the first two of these states
|
||||
* being numerically the same as bool true and false.
|
||||
*/
|
||||
@@ -223,7 +223,7 @@ static bool recoveryPauseAtTarget = true;
|
||||
static TransactionId recoveryTargetXid;
|
||||
static TimestampTz recoveryTargetTime;
|
||||
static char *recoveryTargetName;
|
||||
static int min_recovery_apply_delay = 0;
|
||||
static int min_recovery_apply_delay = 0;
|
||||
static TimestampTz recoveryDelayUntilTime;
|
||||
|
||||
/* options taken from recovery.conf for XLOG streaming */
|
||||
@@ -261,7 +261,7 @@ static bool recoveryStopAfter;
|
||||
*
|
||||
* expectedTLEs: a list of TimeLineHistoryEntries for recoveryTargetTLI and the timelines of
|
||||
* its known parents, newest first (so recoveryTargetTLI is always the
|
||||
* first list member). Only these TLIs are expected to be seen in the WAL
|
||||
* first list member). Only these TLIs are expected to be seen in the WAL
|
||||
* segments we read, and indeed only these TLIs will be considered as
|
||||
* candidate WAL files to open at all.
|
||||
*
|
||||
@@ -290,7 +290,7 @@ XLogRecPtr XactLastRecEnd = InvalidXLogRecPtr;
|
||||
/*
|
||||
* RedoRecPtr is this backend's local copy of the REDO record pointer
|
||||
* (which is almost but not quite the same as a pointer to the most recent
|
||||
* CHECKPOINT record). We update this from the shared-memory copy,
|
||||
* CHECKPOINT record). We update this from the shared-memory copy,
|
||||
* XLogCtl->Insert.RedoRecPtr, whenever we can safely do so (ie, when we
|
||||
* hold an insertion lock). See XLogInsert for details. We are also allowed
|
||||
* to update from XLogCtl->RedoRecPtr if we hold the info_lck;
|
||||
@@ -418,11 +418,11 @@ typedef struct XLogCtlInsert
|
||||
slock_t insertpos_lck; /* protects CurrBytePos and PrevBytePos */
|
||||
|
||||
/*
|
||||
* CurrBytePos is the end of reserved WAL. The next record will be inserted
|
||||
* at that position. PrevBytePos is the start position of the previously
|
||||
* inserted (or rather, reserved) record - it is copied to the prev-link
|
||||
* of the next record. These are stored as "usable byte positions" rather
|
||||
* than XLogRecPtrs (see XLogBytePosToRecPtr()).
|
||||
* CurrBytePos is the end of reserved WAL. The next record will be
|
||||
* inserted at that position. PrevBytePos is the start position of the
|
||||
* previously inserted (or rather, reserved) record - it is copied to the
|
||||
* prev-link of the next record. These are stored as "usable byte
|
||||
* positions" rather than XLogRecPtrs (see XLogBytePosToRecPtr()).
|
||||
*/
|
||||
uint64 CurrBytePos;
|
||||
uint64 PrevBytePos;
|
||||
@@ -464,7 +464,7 @@ typedef struct XLogCtlInsert
/*
* WAL insertion locks.
*/
WALInsertLockPadded *WALInsertLocks;
WALInsertLockPadded *WALInsertLocks;
LWLockTranche WALInsertLockTranche;
int WALInsertLockTrancheId;
} XLogCtlInsert;
@@ -504,10 +504,11 @@ typedef struct XLogCtlData
* Latest initialized page in the cache (last byte position + 1).
*
* To change the identity of a buffer (and InitializedUpTo), you need to
* hold WALBufMappingLock. To change the identity of a buffer that's still
* dirty, the old page needs to be written out first, and for that you
* need WALWriteLock, and you need to ensure that there are no in-progress
* insertions to the page by calling WaitXLogInsertionsToFinish().
* hold WALBufMappingLock. To change the identity of a buffer that's
* still dirty, the old page needs to be written out first, and for that
* you need WALWriteLock, and you need to ensure that there are no
* in-progress insertions to the page by calling
* WaitXLogInsertionsToFinish().
*/
XLogRecPtr InitializedUpTo;

@@ -799,8 +800,8 @@ static void rm_redo_error_callback(void *arg);
static int get_sync_bit(int method);

static void CopyXLogRecordToWAL(int write_len, bool isLogSwitch,
XLogRecData *rdata,
XLogRecPtr StartPos, XLogRecPtr EndPos);
XLogRecData *rdata,
XLogRecPtr StartPos, XLogRecPtr EndPos);
static void ReserveXLogInsertLocation(int size, XLogRecPtr *StartPos,
XLogRecPtr *EndPos, XLogRecPtr *PrevPtr);
static bool ReserveXLogSwitch(XLogRecPtr *StartPos, XLogRecPtr *EndPos,
@@ -860,6 +861,7 @@ XLogInsert(RmgrId rmid, uint8 info, XLogRecData *rdata)
if (rechdr == NULL)
{
static char rechdrbuf[SizeOfXLogRecord + MAXIMUM_ALIGNOF];

rechdr = (XLogRecord *) MAXALIGN(&rechdrbuf);
MemSet(rechdr, 0, SizeOfXLogRecord);
}
@@ -1075,12 +1077,12 @@ begin:;
* record to the shared WAL buffer cache is a two-step process:
*
* 1. Reserve the right amount of space from the WAL. The current head of
* reserved space is kept in Insert->CurrBytePos, and is protected by
* insertpos_lck.
* reserved space is kept in Insert->CurrBytePos, and is protected by
* insertpos_lck.
*
* 2. Copy the record to the reserved WAL space. This involves finding the
* correct WAL buffer containing the reserved space, and copying the
* record in place. This can be done concurrently in multiple processes.
* correct WAL buffer containing the reserved space, and copying the
* record in place. This can be done concurrently in multiple processes.
*
* To keep track of which insertions are still in-progress, each concurrent
* inserter acquires an insertion lock. In addition to just indicating that
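The comment being reflowed above outlines the two-step insertion protocol: reserve space under insertpos_lck, then copy the record into the buffers outside that lock. A toy single-process sketch of that split, with a pthread mutex standing in for the spinlock and a made-up buffer; names and sizes are illustrative, not the real WAL code.

/* Step 1: bump the reservation head under a small lock.
 * Step 2: copy the payload with no lock held. */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static pthread_mutex_t insertpos_lck = PTHREAD_MUTEX_INITIALIZER;
static uint64_t CurrBytePos = 0;     /* head of reserved space */
static char walbuf[1 << 20];         /* stand-in for the WAL buffers */

static uint64_t
reserve(size_t len)
{
    pthread_mutex_lock(&insertpos_lck);
    uint64_t start = CurrBytePos;    /* step 1: reserve */
    CurrBytePos += len;
    pthread_mutex_unlock(&insertpos_lck);
    return start;
}

int
main(void)
{
    const char *rec = "a fake record";
    uint64_t start = reserve(strlen(rec));

    memcpy(walbuf + start, rec, strlen(rec));   /* step 2: copy, unlocked */
    printf("copied at %llu\n", (unsigned long long) start);
    return 0;
}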
@@ -1232,6 +1234,7 @@ begin:;
{
TRACE_POSTGRESQL_XLOG_SWITCH();
XLogFlush(EndPos);

/*
* Even though we reserved the rest of the segment for us, which is
* reflected in EndPos, we return a pointer to just the end of the
@@ -1272,7 +1275,7 @@ begin:;
rdt_lastnormal->next = NULL;

initStringInfo(&recordbuf);
for (;rdata != NULL; rdata = rdata->next)
for (; rdata != NULL; rdata = rdata->next)
appendBinaryStringInfo(&recordbuf, rdata->data, rdata->len);

appendStringInfoString(&buf, " - ");
@@ -1514,8 +1517,8 @@ CopyXLogRecordToWAL(int write_len, bool isLogSwitch, XLogRecData *rdata,

/*
* If this was an xlog-switch, it's not enough to write the switch record,
* we also have to consume all the remaining space in the WAL segment.
* We have already reserved it for us, but we still need to make sure it's
* we also have to consume all the remaining space in the WAL segment. We
* have already reserved it for us, but we still need to make sure it's
* allocated and zeroed in the WAL buffers so that when the caller (or
* someone else) does XLogWrite(), it can really write out all the zeros.
*/
@@ -1556,14 +1559,14 @@ WALInsertLockAcquire(void)

/*
* It doesn't matter which of the WAL insertion locks we acquire, so try
* the one we used last time. If the system isn't particularly busy,
* it's a good bet that it's still available, and it's good to have some
* the one we used last time. If the system isn't particularly busy, it's
* a good bet that it's still available, and it's good to have some
* affinity to a particular lock so that you don't unnecessarily bounce
* cache lines between processes when there's no contention.
*
* If this is the first time through in this backend, pick a lock
* (semi-)randomly. This allows the locks to be used evenly if you have
* a lot of very short connections.
* (semi-)randomly. This allows the locks to be used evenly if you have a
* lot of very short connections.
*/
static int lockToTry = -1;

@@ -1583,10 +1586,10 @@ WALInsertLockAcquire(void)
/*
* If we couldn't get the lock immediately, try another lock next
* time. On a system with more insertion locks than concurrent
* inserters, this causes all the inserters to eventually migrate
* to a lock that no-one else is using. On a system with more
* inserters than locks, it still helps to distribute the inserters
* evenly across the locks.
* inserters, this causes all the inserters to eventually migrate to a
* lock that no-one else is using. On a system with more inserters
* than locks, it still helps to distribute the inserters evenly
* across the locks.
*/
lockToTry = (lockToTry + 1) % num_xloginsert_locks;
}
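The two hunks above describe the insertion-lock affinity policy: keep using the lock acquired last time, and rotate to the next one only when that lock is contended. A hedged standalone sketch of that policy; try_lock() and NUM_LOCKS are placeholders for the illustration, not the real LWLock API.

/* Pick-a-lock policy: random first choice, affinity afterwards, advance on
 * contention so inserters spread out across the locks. */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_LOCKS 8

static bool
try_lock(int i)                 /* pretend acquisition that sometimes fails */
{
    (void) i;
    return (rand() % 4) != 0;
}

static int
acquire_some_lock(void)
{
    static int lockToTry = -1;

    if (lockToTry == -1)
        lockToTry = rand() % NUM_LOCKS;   /* first call: pick semi-randomly */

    for (;;)
    {
        if (try_lock(lockToTry))
            return lockToTry;             /* keep affinity for next time */
        lockToTry = (lockToTry + 1) % NUM_LOCKS;  /* contended: move on */
    }
}

int
main(void)
{
    printf("got lock %d\n", acquire_some_lock());
    return 0;
}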
@@ -1604,8 +1607,8 @@ WALInsertLockAcquireExclusive(void)
/*
* When holding all the locks, we only update the last lock's insertingAt
* indicator. The others are set to 0xFFFFFFFFFFFFFFFF, which is higher
* than any real XLogRecPtr value, to make sure that no-one blocks
* waiting on those.
* than any real XLogRecPtr value, to make sure that no-one blocks waiting
* on those.
*/
for (i = 0; i < num_xloginsert_locks - 1; i++)
{
@@ -1655,7 +1658,7 @@ WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt)
* WALInsertLockAcquireExclusive.
*/
LWLockUpdateVar(&WALInsertLocks[num_xloginsert_locks - 1].l.lock,
&WALInsertLocks[num_xloginsert_locks - 1].l.insertingAt,
&WALInsertLocks[num_xloginsert_locks - 1].l.insertingAt,
insertingAt);
}
else
@@ -1716,15 +1719,16 @@ WaitXLogInsertionsToFinish(XLogRecPtr upto)
* Loop through all the locks, sleeping on any in-progress insert older
* than 'upto'.
*
* finishedUpto is our return value, indicating the point upto which
* all the WAL insertions have been finished. Initialize it to the head
* of reserved WAL, and as we iterate through the insertion locks, back it
* finishedUpto is our return value, indicating the point upto which all
* the WAL insertions have been finished. Initialize it to the head of
* reserved WAL, and as we iterate through the insertion locks, back it
* out for any insertion that's still in progress.
*/
finishedUpto = reservedUpto;
for (i = 0; i < num_xloginsert_locks; i++)
{
XLogRecPtr insertingat = InvalidXLogRecPtr;
XLogRecPtr insertingat = InvalidXLogRecPtr;

do
{
/*
@@ -1797,9 +1801,9 @@ GetXLogBuffer(XLogRecPtr ptr)
}

/*
* The XLog buffer cache is organized so that a page is always loaded
* to a particular buffer. That way we can easily calculate the buffer
* a given page must be loaded into, from the XLogRecPtr alone.
* The XLog buffer cache is organized so that a page is always loaded to a
* particular buffer. That way we can easily calculate the buffer a given
* page must be loaded into, from the XLogRecPtr alone.
*/
idx = XLogRecPtrToBufIdx(ptr);
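The hunk above notes that a given WAL page always maps to one particular buffer, so the buffer index follows from the position alone. A small illustration of such a fixed mapping; the block size and buffer count here are assumed values, not the server's configuration, and this is not the real XLogRecPtrToBufIdx().

/* Fixed page-to-buffer mapping: divide by the block size to get the page
 * number, then wrap around the buffer ring. */
#include <stdint.h>
#include <stdio.h>

#define XLOG_BLCKSZ 8192    /* assumed block size */
#define NBUFFERS    64      /* assumed number of WAL buffers */

static int
recptr_to_buf_idx(uint64_t recptr)
{
    return (int) ((recptr / XLOG_BLCKSZ) % NBUFFERS);
}

int
main(void)
{
    printf("%d\n", recptr_to_buf_idx(123456789));
    return 0;
}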
@@ -1827,8 +1831,8 @@ GetXLogBuffer(XLogRecPtr ptr)
if (expectedEndPtr != endptr)
{
/*
* Let others know that we're finished inserting the record up
* to the page boundary.
* Let others know that we're finished inserting the record up to the
* page boundary.
*/
WALInsertLockUpdateInsertingAt(expectedEndPtr - XLOG_BLCKSZ);

@@ -1837,7 +1841,7 @@ GetXLogBuffer(XLogRecPtr ptr)

if (expectedEndPtr != endptr)
elog(PANIC, "could not find WAL buffer for %X/%X",
(uint32) (ptr >> 32) , (uint32) ptr);
(uint32) (ptr >> 32), (uint32) ptr);
}
else
{
@@ -1974,8 +1978,8 @@ XLogRecPtrToBytePos(XLogRecPtr ptr)
else
{
result = fullsegs * UsableBytesInSegment +
(XLOG_BLCKSZ - SizeOfXLogLongPHD) + /* account for first page */
(fullpages - 1) * UsableBytesInPage; /* full pages */
(XLOG_BLCKSZ - SizeOfXLogLongPHD) + /* account for first page */
(fullpages - 1) * UsableBytesInPage; /* full pages */
if (offset > 0)
{
Assert(offset >= SizeOfXLogShortPHD);
@@ -2170,8 +2174,8 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, bool opportunistic)
}

/*
* Now the next buffer slot is free and we can set it up to be the next
* output page.
* Now the next buffer slot is free and we can set it up to be the
* next output page.
*/
NewPageBeginPtr = XLogCtl->InitializedUpTo;
NewPageEndPtr = NewPageBeginPtr + XLOG_BLCKSZ;
@@ -2194,7 +2198,8 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, bool opportunistic)
/* NewPage->xlp_info = 0; */ /* done by memset */
NewPage ->xlp_tli = ThisTimeLineID;
NewPage ->xlp_pageaddr = NewPageBeginPtr;
/* NewPage->xlp_rem_len = 0; */ /* done by memset */

/* NewPage->xlp_rem_len = 0; */ /* done by memset */

/*
* If online backup is not in progress, mark the header to indicate
@@ -2202,12 +2207,12 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, bool opportunistic)
* blocks. This allows the WAL archiver to know whether it is safe to
* compress archived WAL data by transforming full-block records into
* the non-full-block format. It is sufficient to record this at the
* page level because we force a page switch (in fact a segment switch)
* when starting a backup, so the flag will be off before any records
* can be written during the backup. At the end of a backup, the last
* page will be marked as all unsafe when perhaps only part is unsafe,
* but at worst the archiver would miss the opportunity to compress a
* few records.
* page level because we force a page switch (in fact a segment
* switch) when starting a backup, so the flag will be off before any
* records can be written during the backup. At the end of a backup,
* the last page will be marked as all unsafe when perhaps only part
* is unsafe, but at worst the archiver would miss the opportunity to
* compress a few records.
*/
if (!Insert->forcePageWrites)
NewPage ->xlp_info |= XLP_BKP_REMOVABLE;
@@ -2329,7 +2334,8 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
* if we're passed a bogus WriteRqst.Write that is past the end of the
* last page that's been initialized by AdvanceXLInsertBuffer.
*/
XLogRecPtr EndPtr = XLogCtl->xlblocks[curridx];
XLogRecPtr EndPtr = XLogCtl->xlblocks[curridx];

if (LogwrtResult.Write >= EndPtr)
elog(PANIC, "xlog write request %X/%X is past end of log %X/%X",
(uint32) (LogwrtResult.Write >> 32),
@@ -2413,7 +2419,7 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
do
{
errno = 0;
written = write(openLogFile, from, nleft);
written = write(openLogFile, from, nleft);
if (written <= 0)
{
if (errno == EINTR)
@@ -2422,7 +2428,7 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
(errcode_for_file_access(),
errmsg("could not write to log file %s "
"at offset %u, length %zu: %m",
XLogFileNameP(ThisTimeLineID, openLogSegNo),
XLogFileNameP(ThisTimeLineID, openLogSegNo),
openLogOff, nbytes)));
}
nleft -= written;
@@ -2500,7 +2506,7 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
{
/*
* Could get here without iterating above loop, in which case we might
* have no open file or the wrong one. However, we do not need to
* have no open file or the wrong one. However, we do not need to
* fsync more than one file.
*/
if (sync_method != SYNC_METHOD_OPEN &&
@@ -2569,7 +2575,7 @@ XLogSetAsyncXactLSN(XLogRecPtr asyncXactLSN)

/*
* If the WALWriter is sleeping, we should kick it to make it come out of
* low-power mode. Otherwise, determine whether there's a full page of
* low-power mode. Otherwise, determine whether there's a full page of
* WAL available to write.
*/
if (!sleeping)
@@ -2616,7 +2622,8 @@ XLogGetReplicationSlotMinimumLSN(void)
{
/* use volatile pointer to prevent code rearrangement */
volatile XLogCtlData *xlogctl = XLogCtl;
XLogRecPtr retval;
XLogRecPtr retval;

SpinLockAcquire(&xlogctl->info_lck);
retval = xlogctl->replicationSlotMinLSN;
SpinLockRelease(&xlogctl->info_lck);
@@ -2883,9 +2890,9 @@ XLogFlush(XLogRecPtr record)
* We normally flush only completed blocks; but if there is nothing to do on
* that basis, we check for unflushed async commits in the current incomplete
* block, and flush through the latest one of those. Thus, if async commits
* are not being used, we will flush complete blocks only. We can guarantee
* are not being used, we will flush complete blocks only. We can guarantee
* that async commits reach disk after at most three cycles; normally only
* one or two. (When flushing complete blocks, we allow XLogWrite to write
* one or two. (When flushing complete blocks, we allow XLogWrite to write
* "flexibly", meaning it can stop at the end of the buffer ring; this makes a
* difference only with very high load or long wal_writer_delay, but imposes
* one extra cycle for the worst case for async commits.)
@@ -3060,7 +3067,7 @@ XLogNeedsFlush(XLogRecPtr record)
* log, seg: identify segment to be created/opened.
*
* *use_existent: if TRUE, OK to use a pre-existing file (else, any
* pre-existing file will be deleted). On return, TRUE if a pre-existing
* pre-existing file will be deleted). On return, TRUE if a pre-existing
* file was used.
*
* use_lock: if TRUE, acquire ControlFileLock while moving file into
@@ -3127,11 +3134,11 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
errmsg("could not create file \"%s\": %m", tmppath)));

/*
* Zero-fill the file. We have to do this the hard way to ensure that all
* Zero-fill the file. We have to do this the hard way to ensure that all
* the file space has really been allocated --- on platforms that allow
* "holes" in files, just seeking to the end doesn't allocate intermediate
* space. This way, we know that we have all the space and (after the
* fsync below) that all the indirect blocks are down on disk. Therefore,
* fsync below) that all the indirect blocks are down on disk. Therefore,
* fdatasync(2) or O_DSYNC will be sufficient to sync future writes to the
* log file.
*
@@ -3223,7 +3230,7 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
* a different timeline)
*
* Currently this is only used during recovery, and so there are no locking
* considerations. But we should be just as tense as XLogFileInit to avoid
* considerations. But we should be just as tense as XLogFileInit to avoid
* emplacing a bogus file.
*/
static void
@@ -3434,7 +3441,7 @@ XLogFileOpen(XLogSegNo segno)
if (fd < 0)
ereport(PANIC,
(errcode_for_file_access(),
errmsg("could not open transaction log file \"%s\": %m", path)));
errmsg("could not open transaction log file \"%s\": %m", path)));

return fd;
}
@@ -3541,13 +3548,13 @@ XLogFileReadAnyTLI(XLogSegNo segno, int emode, int source)
* the timelines listed in expectedTLEs.
*
* We expect curFileTLI on entry to be the TLI of the preceding file in
* sequence, or 0 if there was no predecessor. We do not allow curFileTLI
* sequence, or 0 if there was no predecessor. We do not allow curFileTLI
* to go backwards; this prevents us from picking up the wrong file when a
* parent timeline extends to higher segment numbers than the child we
* want to read.
*
* If we haven't read the timeline history file yet, read it now, so that
* we know which TLIs to scan. We don't save the list in expectedTLEs,
* we know which TLIs to scan. We don't save the list in expectedTLEs,
* however, unless we actually find a valid segment. That way if there is
* neither a timeline history file nor a WAL segment in the archive, and
* streaming replication is set up, we'll read the timeline history file
@@ -3611,7 +3618,7 @@ XLogFileClose(void)

/*
* WAL segment files will not be re-read in normal operation, so we advise
* the OS to release any cached pages. But do not do so if WAL archiving
* the OS to release any cached pages. But do not do so if WAL archiving
* or streaming is active, because archiver and walsender process could
* use the cache to read the WAL segment.
*/
@@ -3777,7 +3784,7 @@ RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr endptr)
{
/*
* We ignore the timeline part of the XLOG segment identifiers in
* deciding whether a segment is still needed. This ensures that we
* deciding whether a segment is still needed. This ensures that we
* won't prematurely remove a segment from a parent timeline. We could
* probably be a little more proactive about removing segments of
* non-parent timelines, but that would be a whole lot more
@@ -3828,6 +3835,7 @@ RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr endptr)
xlde->d_name)));

#ifdef WIN32

/*
* On Windows, if another process (e.g another backend)
* holds the file open in FILE_SHARE_DELETE mode, unlink
@@ -4310,7 +4318,7 @@ rescanLatestTimeLine(void)
* I/O routines for pg_control
*
* *ControlFile is a buffer in shared memory that holds an image of the
* contents of pg_control. WriteControlFile() initializes pg_control
* contents of pg_control. WriteControlFile() initializes pg_control
* given a preloaded buffer, ReadControlFile() loads the buffer from
* the pg_control file (during postmaster or standalone-backend startup),
* and UpdateControlFile() rewrites pg_control after we modify xlog state.
@@ -4715,7 +4723,7 @@ check_wal_buffers(int *newval, void **extra, GucSource source)
{
/*
* If we haven't yet changed the boot_val default of -1, just let it
* be. We'll fix it when XLOGShmemSize is called.
* be. We'll fix it when XLOGShmemSize is called.
*/
if (XLOGbuffers == -1)
return true;
@@ -4815,7 +4823,7 @@ XLOGShmemInit(void)

/* WAL insertion locks. Ensure they're aligned to the full padded size */
allocptr += sizeof(WALInsertLockPadded) -
((uintptr_t) allocptr) % sizeof(WALInsertLockPadded);
((uintptr_t) allocptr) %sizeof(WALInsertLockPadded);
WALInsertLocks = XLogCtl->Insert.WALInsertLocks =
(WALInsertLockPadded *) allocptr;
allocptr += sizeof(WALInsertLockPadded) * num_xloginsert_locks;
@@ -4836,8 +4844,8 @@ XLOGShmemInit(void)

/*
* Align the start of the page buffers to a full xlog block size boundary.
* This simplifies some calculations in XLOG insertion. It is also required
* for O_DIRECT.
* This simplifies some calculations in XLOG insertion. It is also
* required for O_DIRECT.
*/
allocptr = (char *) TYPEALIGN(XLOG_BLCKSZ, allocptr);
XLogCtl->pages = allocptr;
@@ -5233,7 +5241,7 @@ readRecoveryCommandFile(void)
const char *hintmsg;

if (!parse_int(item->value, &min_recovery_apply_delay, GUC_UNIT_MS,
&hintmsg))
&hintmsg))
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("parameter \"%s\" requires a temporal value", "min_recovery_apply_delay"),
@@ -5271,7 +5279,7 @@ readRecoveryCommandFile(void)

/*
* If user specified recovery_target_timeline, validate it or compute the
* "latest" value. We can't do this until after we've gotten the restore
* "latest" value. We can't do this until after we've gotten the restore
* command and set InArchiveRecovery, because we need to fetch timeline
* history files from the archive.
*/
@@ -5464,8 +5472,8 @@ recoveryStopsBefore(XLogRecord *record)
*
* when testing for an xid, we MUST test for equality only, since
* transactions are numbered in the order they start, not the order
* they complete. A higher numbered xid will complete before you
* about 50% of the time...
* they complete. A higher numbered xid will complete before you about
* 50% of the time...
*/
stopsHere = (record->xl_xid == recoveryTargetXid);
}
@@ -5525,8 +5533,8 @@ recoveryStopsAfter(XLogRecord *record)
record_info = record->xl_info & ~XLR_INFO_MASK;

/*
* There can be many restore points that share the same name; we stop
* at the first one.
* There can be many restore points that share the same name; we stop at
* the first one.
*/
if (recoveryTarget == RECOVERY_TARGET_NAME &&
record->xl_rmid == RM_XLOG_ID && record_info == XLOG_RESTORE_POINT)
@@ -5543,9 +5551,9 @@ recoveryStopsAfter(XLogRecord *record)
strlcpy(recoveryStopName, recordRestorePointData->rp_name, MAXFNAMELEN);

ereport(LOG,
(errmsg("recovery stopping at restore point \"%s\", time %s",
recoveryStopName,
timestamptz_to_str(recoveryStopTime))));
(errmsg("recovery stopping at restore point \"%s\", time %s",
recoveryStopName,
timestamptz_to_str(recoveryStopTime))));
return true;
}
}
@@ -5688,10 +5696,10 @@ recoveryApplyDelay(XLogRecord *record)
/*
* Is it a COMMIT record?
*
* We deliberately choose not to delay aborts since they have no effect
* on MVCC. We already allow replay of records that don't have a
* timestamp, so there is already opportunity for issues caused by early
* conflicts on standbys.
* We deliberately choose not to delay aborts since they have no effect on
* MVCC. We already allow replay of records that don't have a timestamp,
* so there is already opportunity for issues caused by early conflicts on
* standbys.
*/
record_info = record->xl_info & ~XLR_INFO_MASK;
if (!(record->xl_rmid == RM_XACT_ID &&
@@ -5711,7 +5719,7 @@ recoveryApplyDelay(XLogRecord *record)
*/
TimestampDifference(GetCurrentTimestamp(), recoveryDelayUntilTime,
&secs, &microsecs);
if (secs <= 0 && microsecs <=0)
if (secs <= 0 && microsecs <= 0)
return false;

while (true)
@@ -5731,15 +5739,15 @@ recoveryApplyDelay(XLogRecord *record)
TimestampDifference(GetCurrentTimestamp(), recoveryDelayUntilTime,
&secs, &microsecs);

if (secs <= 0 && microsecs <=0)
if (secs <= 0 && microsecs <= 0)
break;

elog(DEBUG2, "recovery apply delay %ld seconds, %d milliseconds",
secs, microsecs / 1000);
secs, microsecs / 1000);

WaitLatch(&XLogCtl->recoveryWakeupLatch,
WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
secs * 1000L + microsecs / 1000);
WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
secs * 1000L + microsecs / 1000);
}
return true;
}
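The hunks above show recoveryApplyDelay() recomputing the remaining delay each time it wakes up rather than sleeping once. A simplified sketch of that loop follows, with plain sleep() standing in for WaitLatch() and a wall-clock deadline in place of recoveryDelayUntilTime; it is an illustration of the pattern, not the server's code.

/* Re-check the deadline on every wakeup so an interrupted wait or a clock
 * change still converges on the target time. */
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static void
apply_delay(time_t apply_at)
{
    for (;;)
    {
        long remaining = (long) (apply_at - time(NULL));

        if (remaining <= 0)
            break;                              /* delay satisfied, replay */
        printf("waiting %ld more seconds\n", remaining);
        sleep(remaining > 1 ? 1 : (unsigned int) remaining);
    }
}

int
main(void)
{
    apply_delay(time(NULL) + 3);
    return 0;
}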
@@ -5978,7 +5986,7 @@ StartupXLOG(void)
ValidateXLOGDirectoryStructure();

/*
* Clear out any old relcache cache files. This is *necessary* if we do
* Clear out any old relcache cache files. This is *necessary* if we do
* any WAL replay, since that would probably result in the cache files
* being out of sync with database reality. In theory we could leave them
* in place if the database had been cleanly shut down, but it seems
@@ -6050,7 +6058,7 @@ StartupXLOG(void)
ereport(ERROR,
(errcode(ERRCODE_OUT_OF_MEMORY),
errmsg("out of memory"),
errdetail("Failed while allocating an XLog reading processor.")));
errdetail("Failed while allocating an XLog reading processor.")));
xlogreader->system_identifier = ControlFile->system_identifier;

if (read_backup_label(&checkPointLoc, &backupEndRequired,
@@ -6261,9 +6269,9 @@ StartupXLOG(void)
StartupReorderBuffer();

/*
* Startup MultiXact. We need to do this early for two reasons: one
* is that we might try to access multixacts when we do tuple freezing,
* and the other is we need its state initialized because we attempt
* Startup MultiXact. We need to do this early for two reasons: one is
* that we might try to access multixacts when we do tuple freezing, and
* the other is we need its state initialized because we attempt
* truncation during restartpoints.
*/
StartupMultiXact();
@@ -6517,9 +6525,9 @@ StartupXLOG(void)
}

/*
* Initialize shared variables for tracking progress of WAL replay,
* as if we had just replayed the record before the REDO location
* (or the checkpoint record itself, if it's a shutdown checkpoint).
* Initialize shared variables for tracking progress of WAL replay, as
* if we had just replayed the record before the REDO location (or the
* checkpoint record itself, if it's a shutdown checkpoint).
*/
SpinLockAcquire(&xlogctl->info_lck);
if (checkPoint.redo < RecPtr)
@@ -6646,17 +6654,17 @@ StartupXLOG(void)
}

/*
* If we've been asked to lag the master, wait on
* latch until enough time has passed.
* If we've been asked to lag the master, wait on latch until
* enough time has passed.
*/
if (recoveryApplyDelay(record))
{
/*
* We test for paused recovery again here. If
* user sets delayed apply, it may be because
* they expect to pause recovery in case of
* problems, so we must test again here otherwise
* pausing during the delay-wait wouldn't work.
* We test for paused recovery again here. If user sets
* delayed apply, it may be because they expect to pause
* recovery in case of problems, so we must test again
* here otherwise pausing during the delay-wait wouldn't
* work.
*/
if (xlogctl->recoveryPause)
recoveryPausesHere();
@@ -6893,8 +6901,8 @@ StartupXLOG(void)
/*
* Consider whether we need to assign a new timeline ID.
*
* If we are doing an archive recovery, we always assign a new ID. This
* handles a couple of issues. If we stopped short of the end of WAL
* If we are doing an archive recovery, we always assign a new ID. This
* handles a couple of issues. If we stopped short of the end of WAL
* during recovery, then we are clearly generating a new timeline and must
* assign it a unique new ID. Even if we ran to the end, modifying the
* current last segment is problematic because it may result in trying to
@@ -6969,7 +6977,7 @@ StartupXLOG(void)

/*
* Tricky point here: readBuf contains the *last* block that the LastRec
* record spans, not the one it starts in. The last block is indeed the
* record spans, not the one it starts in. The last block is indeed the
* one we want to use.
*/
if (EndOfLog % XLOG_BLCKSZ != 0)
@@ -6996,9 +7004,9 @@ StartupXLOG(void)
else
{
/*
* There is no partial block to copy. Just set InitializedUpTo,
* and let the first attempt to insert a log record to initialize
* the next buffer.
* There is no partial block to copy. Just set InitializedUpTo, and
* let the first attempt to insert a log record to initialize the next
* buffer.
*/
XLogCtl->InitializedUpTo = EndOfLog;
}
@@ -7162,7 +7170,7 @@ StartupXLOG(void)
XLogReportParameters();

/*
* All done. Allow backends to write WAL. (Although the bool flag is
* All done. Allow backends to write WAL. (Although the bool flag is
* probably atomic in itself, we use the info_lck here to ensure that
* there are no race conditions concerning visibility of other recent
* updates to shared memory.)
@@ -7200,7 +7208,7 @@ StartupXLOG(void)
static void
CheckRecoveryConsistency(void)
{
XLogRecPtr lastReplayedEndRecPtr;
XLogRecPtr lastReplayedEndRecPtr;

/*
* During crash recovery, we don't reach a consistent state until we've
@@ -7322,7 +7330,7 @@ RecoveryInProgress(void)
/*
* Initialize TimeLineID and RedoRecPtr when we discover that recovery
* is finished. InitPostgres() relies upon this behaviour to ensure
* that InitXLOGAccess() is called at backend startup. (If you change
* that InitXLOGAccess() is called at backend startup. (If you change
* this, see also LocalSetXLogInsertAllowed.)
*/
if (!LocalRecoveryInProgress)
@@ -7335,6 +7343,7 @@ RecoveryInProgress(void)
pg_memory_barrier();
InitXLOGAccess();
}

/*
* Note: We don't need a memory barrier when we're still in recovery.
* We might exit recovery immediately after return, so the caller
@@ -7594,7 +7603,7 @@ GetRedoRecPtr(void)
{
/* use volatile pointer to prevent code rearrangement */
volatile XLogCtlData *xlogctl = XLogCtl;
XLogRecPtr ptr;
XLogRecPtr ptr;

/*
* The possibly not up-to-date copy in XlogCtl is enough. Even if we
@@ -7983,7 +7992,7 @@ CreateCheckPoint(int flags)
/*
* If this isn't a shutdown or forced checkpoint, and we have not inserted
* any XLOG records since the start of the last checkpoint, skip the
* checkpoint. The idea here is to avoid inserting duplicate checkpoints
* checkpoint. The idea here is to avoid inserting duplicate checkpoints
* when the system is idle. That wastes log space, and more importantly it
* exposes us to possible loss of both current and previous checkpoint
* records if the machine crashes just as we're writing the update.
@@ -8120,7 +8129,7 @@ CreateCheckPoint(int flags)
* performing those groups of actions.
*
* One example is end of transaction, so we must wait for any transactions
* that are currently in commit critical sections. If an xact inserted
* that are currently in commit critical sections. If an xact inserted
* its commit record into XLOG just before the REDO point, then a crash
* restart from the REDO point would not replay that record, which means
* that our flushing had better include the xact's update of pg_clog. So
@@ -8131,9 +8140,8 @@ CreateCheckPoint(int flags)
* fuzzy: it is possible that we will wait for xacts we didn't really need
* to wait for. But the delay should be short and it seems better to make
* checkpoint take a bit longer than to hold off insertions longer than
* necessary.
* (In fact, the whole reason we have this issue is that xact.c does
* commit record XLOG insertion and clog update as two separate steps
* necessary. (In fact, the whole reason we have this issue is that xact.c
* does commit record XLOG insertion and clog update as two separate steps
* protected by different locks, but again that seems best on grounds of
* minimizing lock contention.)
*
@@ -8280,9 +8288,9 @@ CreateCheckPoint(int flags)

/*
* Truncate pg_subtrans if possible. We can throw away all data before
* the oldest XMIN of any running transaction. No future transaction will
* the oldest XMIN of any running transaction. No future transaction will
* attempt to reference any pg_subtrans entry older than that (see Asserts
* in subtrans.c). During recovery, though, we mustn't do this because
* in subtrans.c). During recovery, though, we mustn't do this because
* StartupSUBTRANS hasn't been called yet.
*/
if (!RecoveryInProgress())
@@ -8600,11 +8608,11 @@ CreateRestartPoint(int flags)
_logSegNo--;

/*
* Try to recycle segments on a useful timeline. If we've been promoted
* since the beginning of this restartpoint, use the new timeline
* chosen at end of recovery (RecoveryInProgress() sets ThisTimeLineID
* in that case). If we're still in recovery, use the timeline we're
* currently replaying.
* Try to recycle segments on a useful timeline. If we've been
* promoted since the beginning of this restartpoint, use the new
* timeline chosen at end of recovery (RecoveryInProgress() sets
* ThisTimeLineID in that case). If we're still in recovery, use the
* timeline we're currently replaying.
*
* There is no guarantee that the WAL segments will be useful on the
* current timeline; if recovery proceeds to a new timeline right
@@ -8636,9 +8644,9 @@ CreateRestartPoint(int flags)

/*
* Truncate pg_subtrans if possible. We can throw away all data before
* the oldest XMIN of any running transaction. No future transaction will
* the oldest XMIN of any running transaction. No future transaction will
* attempt to reference any pg_subtrans entry older than that (see Asserts
* in subtrans.c). When hot standby is disabled, though, we mustn't do
* in subtrans.c). When hot standby is disabled, though, we mustn't do
* this because StartupSUBTRANS hasn't been called yet.
*/
if (EnableHotStandby)
@@ -8697,7 +8705,7 @@ KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
/* then check whether slots limit removal further */
if (max_replication_slots > 0 && keep != InvalidXLogRecPtr)
{
XLogRecPtr slotSegNo;
XLogRecPtr slotSegNo;

XLByteToSeg(keep, slotSegNo);

@@ -8730,7 +8738,7 @@ XLogPutNextOid(Oid nextOid)
* We need not flush the NEXTOID record immediately, because any of the
* just-allocated OIDs could only reach disk as part of a tuple insert or
* update that would have its own XLOG record that must follow the NEXTOID
* record. Therefore, the standard buffer LSN interlock applied to those
* record. Therefore, the standard buffer LSN interlock applied to those
* records will ensure no such OID reaches disk before the NEXTOID record
* does.
*
@@ -8859,8 +8867,9 @@ XLogSaveBufferForHint(Buffer buffer, bool buffer_std)
* lsn updates. We assume pd_lower/upper cannot be changed without an
* exclusive lock, so the contents bkp are not racy.
*
* With buffer_std set to false, XLogCheckBuffer() sets hole_length and
* hole_offset to 0; so the following code is safe for either case.
* With buffer_std set to false, XLogCheckBuffer() sets hole_length
* and hole_offset to 0; so the following code is safe for either
* case.
*/
memcpy(copied_buffer, origdata, bkpb.hole_offset);
memcpy(copied_buffer + bkpb.hole_offset,
@@ -9072,7 +9081,7 @@ xlog_redo(XLogRecPtr lsn, XLogRecord *record)
/*
* We used to try to take the maximum of ShmemVariableCache->nextOid
* and the recorded nextOid, but that fails if the OID counter wraps
* around. Since no OID allocation should be happening during replay
* around. Since no OID allocation should be happening during replay
* anyway, better to just believe the record exactly. We still take
* OidGenLock while setting the variable, just in case.
*/
@@ -9262,10 +9271,10 @@ xlog_redo(XLogRecPtr lsn, XLogRecord *record)
BkpBlock bkpb;

/*
* Full-page image (FPI) records contain a backup block stored "inline"
* in the normal data since the locking when writing hint records isn't
* sufficient to use the normal backup block mechanism, which assumes
* exclusive lock on the buffer supplied.
* Full-page image (FPI) records contain a backup block stored
* "inline" in the normal data since the locking when writing hint
* records isn't sufficient to use the normal backup block mechanism,
* which assumes exclusive lock on the buffer supplied.
*
* Since the only change in these backup block are hint bits, there
* are no recovery conflicts generated.
@@ -9415,7 +9424,7 @@ get_sync_bit(int method)

/*
* Optimize writes by bypassing kernel cache with O_DIRECT when using
* O_SYNC/O_FSYNC and O_DSYNC. But only if archiving and streaming are
* O_SYNC/O_FSYNC and O_DSYNC. But only if archiving and streaming are
* disabled, otherwise the archive command or walsender process will read
* the WAL soon after writing it, which is guaranteed to cause a physical
* read if we bypassed the kernel cache. We also skip the
@@ -9619,7 +9628,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
* during an on-line backup even if not doing so at other times, because
* it's quite possible for the backup dump to obtain a "torn" (partially
* written) copy of a database page if it reads the page concurrently with
* our write to the same page. This can be fixed as long as the first
* our write to the same page. This can be fixed as long as the first
* write to the page in the WAL sequence is a full-page write. Hence, we
* turn on forcePageWrites and then force a CHECKPOINT, to ensure there
* are no dirty pages in shared memory that might get dumped while the
@@ -9663,7 +9672,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
* old timeline IDs. That would otherwise happen if you called
* pg_start_backup() right after restoring from a PITR archive: the
* first WAL segment containing the startup checkpoint has pages in
* the beginning with the old timeline ID. That can cause trouble at
* the beginning with the old timeline ID. That can cause trouble at
* recovery: we won't have a history file covering the old timeline if
* pg_xlog directory was not included in the base backup and the WAL
* archive was cleared too before starting the backup.
@@ -9686,7 +9695,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
bool checkpointfpw;

/*
* Force a CHECKPOINT. Aside from being necessary to prevent torn
* Force a CHECKPOINT. Aside from being necessary to prevent torn
* page problems, this guarantees that two successive backup runs
* will have different checkpoint positions and hence different
* history file names, even if nothing happened in between.
@@ -10339,7 +10348,7 @@ GetOldestRestartPoint(XLogRecPtr *oldrecptr, TimeLineID *oldtli)
*
* If we see a backup_label during recovery, we assume that we are recovering
* from a backup dump file, and we therefore roll forward from the checkpoint
* identified by the label file, NOT what pg_control says. This avoids the
* identified by the label file, NOT what pg_control says. This avoids the
* problem that pg_control might have been archived one or more checkpoints
* later than the start of the dump, and so if we rely on it as the start
* point, we will fail to restore a consistent database state.
@@ -10686,7 +10695,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
* Standby mode is implemented by a state machine:
*
* 1. Read from either archive or pg_xlog (XLOG_FROM_ARCHIVE), or just
* pg_xlog (XLOG_FROM_XLOG)
* pg_xlog (XLOG_FROM_XLOG)
* 2. Check trigger file
* 3. Read from primary server via walreceiver (XLOG_FROM_STREAM)
* 4. Rescan timelines
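The comment above lists the standby retry state machine (archive/pg_xlog, trigger file, streaming, timeline rescan). A toy rendering of that loop follows; the fetch_*() helpers are placeholders for the illustration, not the real XLogFileReadAnyTLI() or walreceiver interfaces, and the trigger-file check is only noted in a comment.

/* Cycle through the WAL sources until one of them yields data. */
#include <stdbool.h>
#include <stdio.h>

typedef enum { FROM_ARCHIVE, FROM_STREAM, RESCAN } WalSource;

static bool fetch_from_archive(void) { return false; }   /* placeholder */
static bool fetch_from_stream(void)  { return true;  }   /* placeholder */

int
main(void)
{
    WalSource state = FROM_ARCHIVE;

    for (;;)
    {
        switch (state)
        {
            case FROM_ARCHIVE:
                if (fetch_from_archive())
                    { puts("got WAL from archive/pg_xlog"); return 0; }
                /* the real code also checks a trigger file here */
                state = FROM_STREAM;
                break;
            case FROM_STREAM:
                if (fetch_from_stream())
                    { puts("got WAL from streaming"); return 0; }
                state = RESCAN;
                break;
            case RESCAN:
                puts("rescanning timelines, then retrying the archive");
                state = FROM_ARCHIVE;
                break;
        }
    }
}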
@@ -10887,8 +10896,8 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
* file from pg_xlog.
*/
readFile = XLogFileReadAnyTLI(readSegNo, DEBUG2,
currentSource == XLOG_FROM_ARCHIVE ? XLOG_FROM_ANY :
currentSource);
currentSource == XLOG_FROM_ARCHIVE ? XLOG_FROM_ANY :
currentSource);
if (readFile >= 0)
return true; /* success! */

@@ -10945,11 +10954,11 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
if (havedata)
{
/*
* Great, streamed far enough. Open the file if it's
* Great, streamed far enough. Open the file if it's
* not open already. Also read the timeline history
* file if we haven't initialized timeline history
* yet; it should be streamed over and present in
* pg_xlog by now. Use XLOG_FROM_STREAM so that
* pg_xlog by now. Use XLOG_FROM_STREAM so that
* source info is set correctly and XLogReceiptTime
* isn't changed.
*/
@@ -11014,7 +11023,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
HandleStartupProcInterrupts();
}

return false; /* not reached */
return false; /* not reached */
}

/*
@@ -11022,9 +11031,9 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
* in the current WAL page, previously read by XLogPageRead().
*
* 'emode' is the error mode that would be used to report a file-not-found
* or legitimate end-of-WAL situation. Generally, we use it as-is, but if
* or legitimate end-of-WAL situation. Generally, we use it as-is, but if
* we're retrying the exact same record that we've tried previously, only
* complain the first time to keep the noise down. However, we only do when
* complain the first time to keep the noise down. However, we only do when
* reading from pg_xlog, because we don't expect any invalid records in archive
* or in records streamed from master. Files in the archive should be complete,
* and we should never hit the end of WAL because we stop and wait for more WAL
@@ -300,8 +300,8 @@ RestoreArchivedFile(char *path, const char *xlogfname,
signaled = WIFSIGNALED(rc) || WEXITSTATUS(rc) > 125;

ereport(signaled ? FATAL : DEBUG2,
(errmsg("could not restore file \"%s\" from archive: %s",
xlogfname, wait_result_to_str(rc))));
(errmsg("could not restore file \"%s\" from archive: %s",
xlogfname, wait_result_to_str(rc))));

not_available:

@@ -429,7 +429,7 @@ pg_is_in_recovery(PG_FUNCTION_ARGS)
Datum
pg_xlog_location_diff(PG_FUNCTION_ARGS)
{
Datum result;
Datum result;

result = DirectFunctionCall2(pg_lsn_mi,
PG_GETARG_DATUM(0),

@@ -199,7 +199,7 @@ XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg)
randAccess = true;

/*
* RecPtr is pointing to end+1 of the previous WAL record. If we're
* RecPtr is pointing to end+1 of the previous WAL record. If we're
* at a page boundary, no more records can fit on the current page. We
* must skip over the page header, but we can't do that until we've
* read in the page, since the header size is variable.
@@ -277,7 +277,7 @@ XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg)
/*
* If the whole record header is on this page, validate it immediately.
* Otherwise do just a basic sanity check on xl_tot_len, and validate the
* rest of the header after reading it from the next page. The xl_tot_len
* rest of the header after reading it from the next page. The xl_tot_len
* check is necessary here to ensure that we enter the "Need to reassemble
* record" code path below; otherwise we might fail to apply
* ValidXLogRecordHeader at all.
@@ -572,7 +572,7 @@ err:
* Validate an XLOG record header.
*
* This is just a convenience subroutine to avoid duplicated code in
* XLogReadRecord. It's not intended for use from anywhere else.
* XLogReadRecord. It's not intended for use from anywhere else.
*/
static bool
ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr,
@@ -661,7 +661,7 @@ ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr,
* data to read in) until we've checked the CRCs.
*
* We assume all of the record (that is, xl_tot_len bytes) has been read
* into memory at *record. Also, ValidXLogRecordHeader() has accepted the
* into memory at *record. Also, ValidXLogRecordHeader() has accepted the
* record's header, which means in particular that xl_tot_len is at least
* SizeOfXlogRecord, so it is safe to fetch xl_len.
*/