Mirror of https://github.com/postgres/postgres.git (synced 2025-08-28 18:48:04 +03:00)
Remove tabs after spaces in C comments
This was not changed in HEAD, but will be done later as part of a pgindent run. Future pgindent runs will also do this.

Report by Tom Lane

Backpatch through all supported branches, but not HEAD
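As a rough illustration of the kind of whitespace change this commit describes, the sketch below rewrites one comment line so that a tab directly following a space becomes a space. The helper name `untab_after_space` is hypothetical, the caller is assumed to pass only comment lines, and the real pgindent rule may differ (for example in how many spaces it substitutes).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of pgindent: rewrite `line` in place so
 * that every tab directly following a space becomes a space.  A run of
 * tabs after a space is converted entirely, because each replaced tab
 * turns into the space that precedes the next one.  Tabs elsewhere
 * (e.g. leading indentation) are left alone.
 *
 * Returns the number of tabs replaced.
 */
static size_t
untab_after_space(char *line)
{
    size_t replaced = 0;

    assert(line != NULL);

    for (size_t i = 0; line[i] != '\0'; i++)
    {
        if (line[i] == '\t' && i > 0 && line[i - 1] == ' ')
        {
            line[i] = ' ';
            replaced++;
        }
    }
    return replaced;
}
```

Because the change is whitespace-only, the removed and added lines in the hunks below read identically once tabs and spaces are rendered the same way.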
@@ -44,7 +44,7 @@ int32 *PrivateRefCount;
  *
  * IO_IN_PROGRESS -- this is a flag in the buffer descriptor.
  * It must be set when an IO is initiated and cleared at
- * the end of the IO. It is there to make sure that one
+ * the end of the IO. It is there to make sure that one
  * process doesn't start to use a buffer while another is
  * faulting it in. see WaitIO and related routines.
  *
@@ -54,7 +54,7 @@ int32 *PrivateRefCount;
  *
  * PrivateRefCount -- Each buffer also has a private refcount that keeps
  * track of the number of times the buffer is pinned in the current
- * process. This is used for two purposes: first, if we pin a
+ * process. This is used for two purposes: first, if we pin a
  * a buffer more than once, we only need to change the shared refcount
  * once, thus only lock the shared state once; second, when a transaction
  * aborts, it should only unpin the buffers exactly the number of times it
@@ -3,7 +3,7 @@
  * buf_table.c
  * routines for mapping BufferTags to buffer indexes.
  *
- * Note: the routines in this file do no locking of their own. The caller
+ * Note: the routines in this file do no locking of their own. The caller
  * must hold a suitable lock on the appropriate BufMappingLock, as specified
  * in the comments. We can't do the locking inside these functions because
  * in most cases the caller needs to adjust the buffer header contents
@@ -112,7 +112,7 @@ BufTableLookup(BufferTag *tagPtr, uint32 hashcode)
  * Insert a hashtable entry for given tag and buffer ID,
  * unless an entry already exists for that tag
  *
- * Returns -1 on successful insertion. If a conflicting entry exists
+ * Returns -1 on successful insertion. If a conflicting entry exists
  * already, returns the buffer ID in that entry.
  *
  * Caller must hold exclusive lock on BufMappingLock for tag's partition
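The return convention in the comment above (-1 on success, otherwise the buffer ID already stored for the tag) can be sketched with a toy table. Everything here is an assumption for illustration: the names `toy_init` and `toy_insert`, the fixed-size open-addressing table, and the use of a plain unsigned integer as the tag; the real code uses a BufferTag and a shared hash table, and relies on the caller for locking.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_TABLE_SIZE 16       /* hypothetical capacity */
#define TOY_EMPTY      (-1)     /* marks an unused slot */

typedef struct
{
    unsigned int tag;           /* stand-in for a BufferTag */
    int          buf_id;        /* TOY_EMPTY if the slot is unused */
} ToyEntry;

static ToyEntry toy_table[TOY_TABLE_SIZE];

/* Mark every slot as unused. */
static void
toy_init(void)
{
    for (int i = 0; i < TOY_TABLE_SIZE; i++)
        toy_table[i].buf_id = TOY_EMPTY;
}

/*
 * Insert (tag, buf_id) unless an entry for tag already exists.
 * Returns -1 on successful insertion, else the buffer ID already
 * stored for that tag.  No locking is done here; as in the comment
 * above, that is left to the caller.
 */
static int
toy_insert(unsigned int tag, int buf_id)
{
    assert(buf_id >= 0);

    /* Linear probing, starting at the tag's home slot. */
    for (unsigned int probe = 0; probe < TOY_TABLE_SIZE; probe++)
    {
        unsigned int slot = (tag + probe) % TOY_TABLE_SIZE;

        if (toy_table[slot].buf_id == TOY_EMPTY)
        {
            toy_table[slot].tag = tag;
            toy_table[slot].buf_id = buf_id;
            return -1;          /* inserted */
        }
        if (toy_table[slot].tag == tag)
            return toy_table[slot].buf_id;  /* conflicting entry */
    }
    abort();                    /* table full: out of scope for the sketch */
}
```

Returning the existing buffer ID lets a caller that lost a race reuse the winner's buffer without a second lookup.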
@@ -111,7 +111,7 @@ static void AtProcExit_Buffers(int code, Datum arg);
  * PrefetchBuffer -- initiate asynchronous read of a block of a relation
  *
  * This is named by analogy to ReadBuffer but doesn't actually allocate a
- * buffer. Instead it tries to ensure that a future ReadBuffer for the given
+ * buffer. Instead it tries to ensure that a future ReadBuffer for the given
  * block will not be delayed by the I/O. Prefetching is optional.
  * No-op if prefetching isn't compiled in.
  */
@@ -201,7 +201,7 @@ ReadBuffer(Relation reln, BlockNumber blockNum)
  * Assume when this function is called, that reln has been opened already.
  *
  * In RBM_NORMAL mode, the page is read from disk, and the page header is
- * validated. An error is thrown if the page header is not valid. (But
+ * validated. An error is thrown if the page header is not valid. (But
  * note that an all-zero page is considered "valid"; see PageIsVerified().)
  *
  * RBM_ZERO_ON_ERROR is like the normal mode, but if the page header is not
@@ -209,7 +209,7 @@ ReadBuffer(Relation reln, BlockNumber blockNum)
  * for non-critical data, where the caller is prepared to repair errors.
  *
  * In RBM_ZERO mode, if the page isn't in buffer cache already, it's filled
- * with zeros instead of reading it from disk. Useful when the caller is
+ * with zeros instead of reading it from disk. Useful when the caller is
  * going to fill the page from scratch, since this saves I/O and avoids
  * unnecessary failure if the page-on-disk has corrupt page headers.
  * Caution: do not use this mode to read a page that is beyond the relation's
@@ -365,7 +365,7 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
  * This can happen because mdread doesn't complain about reads beyond
  * EOF (when zero_damaged_pages is ON) and so a previous attempt to
  * read a block beyond EOF could have left a "valid" zero-filled
- * buffer. Unfortunately, we have also seen this case occurring
+ * buffer. Unfortunately, we have also seen this case occurring
  * because of buggy Linux kernels that sometimes return an
  * lseek(SEEK_END) result that doesn't account for a recent write. In
  * that situation, the pre-existing buffer would contain valid data
@@ -575,7 +575,7 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,

  /*
  * Didn't find it in the buffer pool. We'll have to initialize a new
- * buffer. Remember to unlock the mapping lock while doing the work.
+ * buffer. Remember to unlock the mapping lock while doing the work.
  */
  LWLockRelease(newPartitionLock);

@@ -585,7 +585,7 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
  bool lock_held;

  /*
- * Select a victim buffer. The buffer is returned with its header
+ * Select a victim buffer. The buffer is returned with its header
  * spinlock still held! Also (in most cases) the BufFreelistLock is
  * still held, since it would be bad to hold the spinlock while
  * possibly waking up other processes.
@@ -634,7 +634,7 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
  * If using a nondefault strategy, and writing the buffer
  * would require a WAL flush, let the strategy decide whether
  * to go ahead and write/reuse the buffer or to choose another
- * victim. We need lock to inspect the page LSN, so this
+ * victim. We need lock to inspect the page LSN, so this
  * can't be done inside StrategyGetBuffer.
  */
  if (strategy != NULL &&
@@ -755,7 +755,7 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
  {
  /*
  * We can only get here if (a) someone else is still reading
- * in the page, or (b) a previous read attempt failed. We
+ * in the page, or (b) a previous read attempt failed. We
  * have to wait for any active read attempt to finish, and
  * then set up our own read attempt if the page is still not
  * BM_VALID. StartBufferIO does it all.
@@ -848,7 +848,7 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
  * This is used only in contexts such as dropping a relation. We assume
  * that no other backend could possibly be interested in using the page,
  * so the only reason the buffer might be pinned is if someone else is
- * trying to write it out. We have to let them finish before we can
+ * trying to write it out. We have to let them finish before we can
  * reclaim the buffer.
  *
  * The buffer could get reclaimed by someone else while we are waiting
@@ -947,7 +947,7 @@ retry:
  *
  * Marks buffer contents as dirty (actual write happens later).
  *
- * Buffer must be pinned and exclusive-locked. (If caller does not hold
+ * Buffer must be pinned and exclusive-locked. (If caller does not hold
  * exclusive lock, then somebody could be in process of writing the buffer,
  * leading to risk of bad data written to disk.)
  */
@@ -991,7 +991,7 @@ MarkBufferDirty(Buffer buffer)
  *
  * Formerly, this saved one cycle of acquiring/releasing the BufMgrLock
  * compared to calling the two routines separately. Now it's mainly just
- * a convenience function. However, if the passed buffer is valid and
+ * a convenience function. However, if the passed buffer is valid and
  * already contains the desired block, we just return it as-is; and that
  * does save considerable work compared to a full release and reacquire.
  *
@@ -1043,7 +1043,7 @@ ReleaseAndReadBuffer(Buffer buffer,
  * when we first pin it; for other strategies we just make sure the usage_count
  * isn't zero. (The idea of the latter is that we don't want synchronized
  * heap scans to inflate the count, but we need it to not be zero to discourage
- * other backends from stealing buffers from our ring. As long as we cycle
+ * other backends from stealing buffers from our ring. As long as we cycle
  * through the ring faster than the global clock-sweep cycles, buffers in
  * our ring won't be chosen as victims for replacement by other backends.)
  *
@@ -1051,7 +1051,7 @@ ReleaseAndReadBuffer(Buffer buffer,
  *
  * Note that ResourceOwnerEnlargeBuffers must have been done already.
  *
- * Returns TRUE if buffer is BM_VALID, else FALSE. This provision allows
+ * Returns TRUE if buffer is BM_VALID, else FALSE. This provision allows
  * some callers to avoid an extra spinlock cycle.
  */
  static bool
@@ -1204,7 +1204,7 @@ BufferSync(int flags)
  * have the flag set.
  *
  * Note that if we fail to write some buffer, we may leave buffers with
- * BM_CHECKPOINT_NEEDED still set. This is OK since any such buffer would
+ * BM_CHECKPOINT_NEEDED still set. This is OK since any such buffer would
  * certainly need to be written for the next checkpoint attempt, too.
  */
  num_to_write = 0;
@@ -1945,7 +1945,7 @@ RelationGetNumberOfBlocksInFork(Relation relation, ForkNumber forkNum)
  * specified relation that have block numbers >= firstDelBlock.
  * (In particular, with firstDelBlock = 0, all pages are removed.)
  * Dirty pages are simply dropped, without bothering to write them
- * out first. Therefore, this is NOT rollback-able, and so should be
+ * out first. Therefore, this is NOT rollback-able, and so should be
  * used only with extreme caution!
  *
  * Currently, this is called only from smgr.c when the underlying file
@@ -1954,7 +1954,7 @@ RelationGetNumberOfBlocksInFork(Relation relation, ForkNumber forkNum)
  * be deleted momentarily anyway, and there is no point in writing it.
  * It is the responsibility of higher-level code to ensure that the
  * deletion or truncation does not lose any data that could be needed
- * later. It is also the responsibility of higher-level code to ensure
+ * later. It is also the responsibility of higher-level code to ensure
  * that no other process could be trying to load more pages of the
  * relation into buffers.
  *
@@ -1997,9 +1997,9 @@ DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
  *
  * This function removes all the buffers in the buffer cache for a
  * particular database. Dirty pages are simply dropped, without
- * bothering to write them out first. This is used when we destroy a
+ * bothering to write them out first. This is used when we destroy a
  * database, to avoid trying to flush data to disk when the directory
- * tree no longer exists. Implementation is pretty similar to
+ * tree no longer exists. Implementation is pretty similar to
  * DropRelFileNodeBuffers() which is for destroying just one relation.
  * --------------------------------------------------------------------
  */
@@ -2298,9 +2298,9 @@ SetBufferCommitInfoNeedsSave(Buffer buffer)
  /*
  * This routine might get called many times on the same page, if we are
  * making the first scan after commit of an xact that added/deleted many
- * tuples. So, be as quick as we can if the buffer is already dirty. We
+ * tuples. So, be as quick as we can if the buffer is already dirty. We
  * do this by not acquiring spinlock if it looks like the status bits are
- * already OK. (Note it is okay if someone else clears BM_JUST_DIRTIED
+ * already OK. (Note it is okay if someone else clears BM_JUST_DIRTIED
  * immediately after we look, because the buffer content update is already
  * done and will be reflected in the I/O.)
  */
@@ -36,7 +36,7 @@ typedef struct
  */

  /*
- * Statistics. These counters should be wide enough that they can't
+ * Statistics. These counters should be wide enough that they can't
  * overflow during a single bgwriter cycle.
  */
  uint32 completePasses; /* Complete cycles of the clock sweep */
@@ -129,7 +129,7 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)

  /*
  * We count buffer allocation requests so that the bgwriter can estimate
- * the rate of buffer consumption. Note that buffers recycled by a
+ * the rate of buffer consumption. Note that buffers recycled by a
  * strategy object are intentionally not counted here.
  */
  StrategyControl->numBufferAllocs++;
@@ -248,7 +248,7 @@ StrategyFreeBuffer(volatile BufferDesc *buf)
  *
  * In addition, we return the completed-pass count (which is effectively
  * the higher-order bits of nextVictimBuffer) and the count of recent buffer
- * allocs if non-NULL pointers are passed. The alloc count is reset after
+ * allocs if non-NULL pointers are passed. The alloc count is reset after
  * being read.
  */
  int
@@ -442,7 +442,7 @@ GetBufferFromRing(BufferAccessStrategy strategy)

  /*
  * If the slot hasn't been filled yet, tell the caller to allocate a new
- * buffer with the normal allocation strategy. He will then fill this
+ * buffer with the normal allocation strategy. He will then fill this
  * slot by calling AddBufferToRing with the new buffer.
  */
  bufnum = strategy->buffers[strategy->current];
@@ -495,7 +495,7 @@ AddBufferToRing(BufferAccessStrategy strategy, volatile BufferDesc *buf)
  *
  * When a nondefault strategy is used, the buffer manager calls this function
  * when it turns out that the buffer selected by StrategyGetBuffer needs to
- * be written out and doing so would require flushing WAL too. This gives us
+ * be written out and doing so would require flushing WAL too. This gives us
  * a chance to choose a different victim.
  *
  * Returns true if buffer manager should ask for a new victim, and false
@@ -95,7 +95,7 @@ LocalPrefetchBuffer(SMgrRelation smgr, ForkNumber forkNum,
  * Find or create a local buffer for the given page of the given relation.
  *
  * API is similar to bufmgr.c's BufferAlloc, except that we do not need
- * to do any locking since this is all local. Also, IO_IN_PROGRESS
+ * to do any locking since this is all local. Also, IO_IN_PROGRESS
  * does not get set. Lastly, we support only default access strategy
  * (hence, usage_count is always advanced).
  */
@@ -286,7 +286,7 @@ MarkLocalBufferDirty(Buffer buffer)
  * specified relation that have block numbers >= firstDelBlock.
  * (In particular, with firstDelBlock = 0, all pages are removed.)
  * Dirty pages are simply dropped, without bothering to write them
- * out first. Therefore, this is NOT rollback-able, and so should be
+ * out first. Therefore, this is NOT rollback-able, and so should be
  * used only with extreme caution!
  *
  * See DropRelFileNodeBuffers in bufmgr.c for more notes.
@@ -413,7 +413,7 @@ GetLocalBufferStorage(void)
  /*
  * We allocate local buffers in a context of their own, so that the
  * space eaten for them is easily recognizable in MemoryContextStats
- * output. Create the context on first use.
+ * output. Create the context on first use.
  */
  if (LocalBufferContext == NULL)
  LocalBufferContext =
@@ -29,7 +29,7 @@
  * that was current at that time.
  *
  * BufFile also supports temporary files that exceed the OS file size limit
- * (by opening multiple fd.c temporary files). This is an essential feature
+ * (by opening multiple fd.c temporary files). This is an essential feature
  * for sorts and hashjoins on large amounts of data.
  *-------------------------------------------------------------------------
  */
@@ -72,7 +72,7 @@ struct BufFile
  bool dirty; /* does buffer need to be written? */

  /*
- * resowner is the ResourceOwner to use for underlying temp files. (We
+ * resowner is the ResourceOwner to use for underlying temp files. (We
  * don't need to remember the memory context we're using explicitly,
  * because after creation we only repalloc our arrays larger.)
  */
@@ -519,7 +519,7 @@ BufFileSeek(BufFile *file, int fileno, off_t offset, int whence)
  {
  /*
  * Seek is to a point within existing buffer; we can just adjust
- * pos-within-buffer, without flushing buffer. Note this is OK
+ * pos-within-buffer, without flushing buffer. Note this is OK
  * whether reading or writing, but buffer remains dirty if we were
  * writing.
  */
@@ -64,7 +64,7 @@
  * and other code that tries to open files without consulting fd.c. This
  * is the number left free. (While we can be pretty sure we won't get
  * EMFILE, there's never any guarantee that we won't get ENFILE due to
- * other processes chewing up FDs. So it's a bad idea to try to open files
+ * other processes chewing up FDs. So it's a bad idea to try to open files
  * without consulting fd.c. Nonetheless we cannot control all code.)
  *
  * Because this is just a fixed setting, we are effectively assuming that
@@ -154,8 +154,8 @@ typedef struct vfd
  } Vfd;

  /*
- * Virtual File Descriptor array pointer and size. This grows as
- * needed. 'File' values are indexes into this array.
+ * Virtual File Descriptor array pointer and size. This grows as
+ * needed. 'File' values are indexes into this array.
  * Note that VfdCache[0] is not a usable VFD, just a list header.
  */
  static Vfd *VfdCache;
@@ -221,7 +221,7 @@ static int nextTempTableSpace = 0;
  *
  * The Least Recently Used ring is a doubly linked list that begins and
  * ends on element zero. Element zero is special -- it doesn't represent
- * a file and its "fd" field always == VFD_CLOSED. Element zero is just an
+ * a file and its "fd" field always == VFD_CLOSED. Element zero is just an
  * anchor that shows us the beginning/end of the ring.
  * Only VFD elements that are currently really open (have an FD assigned) are
  * in the Lru ring. Elements that are "virtually" open can be recognized
@@ -379,7 +379,7 @@ InitFileAccess(void)
  * We stop counting if usable_fds reaches max_to_probe. Note: a small
  * value of max_to_probe might result in an underestimate of already_open;
  * we must fill in any "gaps" in the set of used FDs before the calculation
- * of already_open will give the right answer. In practice, max_to_probe
+ * of already_open will give the right answer. In practice, max_to_probe
  * of a couple of dozen should be enough to ensure good results.
  *
  * We assume stdin (FD 0) is available for dup'ing
@@ -456,7 +456,7 @@ count_usable_fds(int max_to_probe, int *usable_fds, int *already_open)
  pfree(fd);

  /*
- * Return results. usable_fds is just the number of successful dups. We
+ * Return results. usable_fds is just the number of successful dups. We
  * assume that the system limit is highestfd+1 (remember 0 is a legal FD
  * number) and so already_open is highestfd+1 - usable_fds.
  */
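The arithmetic in the comment above can be tried out with a small POSIX sketch. The function name `probe_fds` and the use of `malloc`/`abort` are assumptions for illustration; this is a simplified reading of the approach described in the comments, not the fd.c code itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical sketch: dup() FD 0 until dup fails or max_to_probe
 * duplicates exist, remember the highest descriptor seen, then close
 * all duplicates again.
 *
 * usable_fds   = number of successful dups
 * already_open = highestfd + 1 - usable_fds  (0 is a legal FD number)
 */
static void
probe_fds(int max_to_probe, int *usable_fds, int *already_open)
{
    int *fds;
    int  used = 0;
    int  highestfd = 0;

    assert(max_to_probe > 0);
    assert(usable_fds != NULL && already_open != NULL);

    fds = malloc(sizeof(int) * (size_t) max_to_probe);
    if (fds == NULL)
        abort();

    while (used < max_to_probe)
    {
        int thisfd = dup(0);

        if (thisfd < 0)
            break;              /* typically the per-process limit was hit */
        fds[used++] = thisfd;
        if (thisfd > highestfd)
            highestfd = thisfd;
    }

    /* Release the probe descriptors before reporting. */
    for (int j = 0; j < used; j++)
        close(fds[j]);
    free(fds);

    *usable_fds = used;
    *already_open = highestfd + 1 - used;
}
```

As the earlier hunk notes, a small max_to_probe can underestimate already_open, because gaps in the set of used descriptors are only filled in once enough dups have been made.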
@@ -950,7 +950,7 @@ OpenTemporaryFile(bool interXact)

  /*
  * If not, or if tablespace is bad, create in database's default
- * tablespace. MyDatabaseTableSpace should normally be set before we get
+ * tablespace. MyDatabaseTableSpace should normally be set before we get
  * here, but just in case it isn't, fall back to pg_default tablespace.
  */
  if (file <= 0)
@@ -1475,7 +1475,7 @@ reserveAllocatedDesc(void)
  /*
  * Routines that want to use stdio (ie, FILE*) should use AllocateFile
  * rather than plain fopen(). This lets fd.c deal with freeing FDs if
- * necessary to open the file. When done, call FreeFile rather than fclose.
+ * necessary to open the file. When done, call FreeFile rather than fclose.
  *
  * Note that files that will be open for any significant length of time
  * should NOT be handled this way, since they cannot share kernel file
@@ -1654,7 +1654,7 @@ TryAgain:
  * Read a directory opened with AllocateDir, ereport'ing any error.
  *
  * This is easier to use than raw readdir() since it takes care of some
- * otherwise rather tedious and error-prone manipulation of errno. Also,
+ * otherwise rather tedious and error-prone manipulation of errno. Also,
  * if you are happy with a generic error message for AllocateDir failure,
  * you can just do
  *
@@ -1770,7 +1770,7 @@ SetTempTablespaces(Oid *tableSpaces, int numSpaces)
  numTempTableSpaces = numSpaces;

  /*
- * Select a random starting point in the list. This is to minimize
+ * Select a random starting point in the list. This is to minimize
  * conflicts between backends that are most likely sharing the same list
  * of temp tablespaces. Note that if we create multiple temp files in the
  * same transaction, we'll advance circularly through the list --- this
@@ -1799,7 +1799,7 @@ TempTablespacesAreSet(void)
  /*
  * GetNextTempTableSpace
  *
- * Select the next temp tablespace to use. A result of InvalidOid means
+ * Select the next temp tablespace to use. A result of InvalidOid means
  * to use the current database's default tablespace.
  */
  Oid
@@ -52,7 +52,7 @@
  * Range         Category
  * 0 - 31        0
  * 32 - 63       1
- * ... ...       ...
+ * ... ...       ...
  * 8096 - 8127   253
  * 8128 - 8163   254
  * 8164 - 8192   255
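The range-to-category table above can be expressed as a small function. The constant and function names below are hypothetical, and the values (32-byte steps, 8164 as the smallest size in category 255, an 8192-byte page) are read directly off the table rather than taken from the source headers.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CAT_STEP     32     /* bytes per category, per the table */
#define TOY_MAX_REQUEST  8164   /* smallest size mapped to category 255 */
#define TOY_PAGE_BYTES   8192   /* upper bound of the last range */

/*
 * Map an amount of free space in bytes to a one-byte category,
 * following the table: 0-31 -> 0, 32-63 -> 1, ..., 8128-8163 -> 254,
 * 8164-8192 -> 255.
 */
static unsigned char
toy_space_to_category(size_t avail)
{
    size_t cat;

    assert(avail <= TOY_PAGE_BYTES);

    if (avail >= TOY_MAX_REQUEST)
        return 255;

    cat = avail / TOY_CAT_STEP;

    /* 8160-8163 would otherwise land in 255; the table puts them in 254. */
    if (cat > 254)
        cat = 254;

    return (unsigned char) cat;
}
```

The last two categories are irregular: category 254 spans 36 bytes and category 255 spans 29, which is why the function needs the two special cases instead of a plain division.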
@@ -127,7 +127,7 @@ static uint8 fsm_vacuum_page(Relation rel, FSMAddress addr, bool *eof);
  * will turn out to have too little space available by the time the caller
  * gets a lock on it. In that case, the caller should report the actual
  * amount of free space available on that page and then try again (see
- * RecordAndGetPageWithFreeSpace). If InvalidBlockNumber is returned,
+ * RecordAndGetPageWithFreeSpace). If InvalidBlockNumber is returned,
  * extend the relation.
  */
  BlockNumber
@@ -185,13 +185,13 @@ restart:

  /*----------
  * Start the search from the target slot. At every step, move one
- * node to the right, then climb up to the parent. Stop when we reach
+ * node to the right, then climb up to the parent. Stop when we reach
  * a node with enough free space (as we must, since the root has enough
  * space).
  *
  * The idea is to gradually expand our "search triangle", that is, all
  * nodes covered by the current node, and to be sure we search to the
- * right from the start point. At the first step, only the target slot
+ * right from the start point. At the first step, only the target slot
  * is examined. When we move up from a left child to its parent, we are
  * adding the right-hand subtree of that parent to the search triangle.
  * When we move right then up from a right child, we are dropping the
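The right-then-up walk described in this hunk's comment can be sketched on a tiny complete binary tree stored in an array, where each inner node holds the maximum of its children. All names and sizes here are hypothetical, and the wrap-around at the right edge of a level and the final descent to a leaf are assumptions of this sketch that the visible part of the comment does not spell out.

```c
#include <assert.h>

#define TOY_NON_LEAF 7          /* hypothetical: 7 inner nodes ... */
#define TOY_LEAVES   8          /* ... over 8 leaf slots */
#define TOY_NODES    (TOY_NON_LEAF + TOY_LEAVES)

#define toy_leftchild(x) (2 * (x) + 1)
#define toy_parentof(x)  (((x) - 1) / 2)

/*
 * Step one node to the right.  If that steps onto the leftmost node of
 * the next level (numbered 2^level - 1), go to its parent instead, so
 * the walk wraps around to the leftmost node of the current level.
 */
static int
toy_rightneighbor(int x)
{
    x++;
    if (((x + 1) & x) == 0)
        x = toy_parentof(x);
    return x;
}

/*
 * Return a leaf slot with value >= minvalue, searching from target_slot
 * as described above, or -1 if even the root is too small.
 */
static int
toy_search_from(const unsigned char nodes[TOY_NODES], int target_slot,
                unsigned char minvalue)
{
    int nodeno;

    assert(target_slot >= 0 && target_slot < TOY_LEAVES);

    if (nodes[0] < minvalue)
        return -1;

    /* Move right, then up, until a node with enough space is found. */
    nodeno = TOY_NON_LEAF + target_slot;
    while (nodeno > 0)
    {
        if (nodes[nodeno] >= minvalue)
            break;
        nodeno = toy_parentof(toy_rightneighbor(nodeno));
    }

    /* Descend to a leaf, preferring the left child when it suffices. */
    while (nodeno < TOY_NON_LEAF)
    {
        int child = toy_leftchild(nodeno);

        if (nodes[child] < minvalue)
            child++;            /* must be the right child then */
        assert(nodes[child] >= minvalue);
        nodeno = child;
    }
    return nodeno - TOY_NON_LEAF;
}
```

For leaf values 1, 0, 0, 5, 0, 3, 0, 0, a search from slot 4 for a value of at least 3 returns slot 5, the nearest qualifying slot to the right, while a search from slot 6 for at least 5 wraps around and returns slot 3.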
@@ -4,7 +4,7 @@
  * POSTGRES inter-process communication definitions.
  *
  * This file is misnamed, as it no longer has much of anything directly
- * to do with IPC. The functionality here is concerned with managing
+ * to do with IPC. The functionality here is concerned with managing
  * exit-time cleanup for either a postmaster or a backend.
  *
  *
@@ -84,7 +84,7 @@ static int on_proc_exit_index,
  * -cim 2/6/90
  *
  * Unfortunately, we can't really guarantee that add-on code
- * obeys the rule of not calling exit() directly. So, while
+ * obeys the rule of not calling exit() directly. So, while
  * this is the preferred way out of the system, we also register
  * an atexit callback that will make sure cleanup happens.
  * ----------------------------------------------------------------
@@ -103,7 +103,7 @@ proc_exit(int code)
  * fixed file name, each backend will overwrite earlier profiles. To
  * fix that, we create a separate subdirectory for each backend
  * (./gprof/pid) and 'cd' to that subdirectory before we exit() - that
- * forces mcleanup() to write each profile into its own directory. We
+ * forces mcleanup() to write each profile into its own directory. We
  * end up with something like: $PGDATA/gprof/8829/gmon.out
  * $PGDATA/gprof/8845/gmon.out ...
  *
@@ -257,7 +257,7 @@ atexit_callback(int exitstatus, void *arg)
  * on_proc_exit
  *
  * this function adds a callback function to the list of
- * functions invoked by proc_exit(). -cim 2/6/90
+ * functions invoked by proc_exit(). -cim 2/6/90
  * ----------------------------------------------------------------
  */
  void
@@ -288,7 +288,7 @@ on_proc_exit(pg_on_exit_callback function, Datum arg)
  * on_shmem_exit
  *
  * this function adds a callback function to the list of
- * functions invoked by shmem_exit(). -cim 2/6/90
+ * functions invoked by shmem_exit(). -cim 2/6/90
  * ----------------------------------------------------------------
  */
  void
@@ -51,7 +51,7 @@ static bool addin_request_allowed = true;
  * a loadable module.
  *
  * This is only useful if called from the _PG_init hook of a library that
- * is loaded into the postmaster via shared_preload_libraries. Once
+ * is loaded into the postmaster via shared_preload_libraries. Once
  * shared memory has been allocated, calls will be ignored. (We could
  * raise an error, but it seems better to make it a no-op, so that
  * libraries containing such calls can be reloaded if needed.)
@@ -81,7 +81,7 @@ RequestAddinShmemSpace(Size size)
  * This is a bit code-wasteful and could be cleaned up.)
  *
  * If "makePrivate" is true then we only need private memory, not shared
- * memory. This is true for a standalone backend, false for a postmaster.
+ * memory. This is true for a standalone backend, false for a postmaster.
  */
  void
  CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
@@ -26,9 +26,9 @@

  /*
  * The postmaster is signaled by its children by sending SIGUSR1. The
- * specific reason is communicated via flags in shared memory. We keep
+ * specific reason is communicated via flags in shared memory. We keep
  * a boolean flag for each possible "reason", so that different reasons
- * can be signaled by different backends at the same time. (However,
+ * can be signaled by different backends at the same time. (However,
  * if the same reason is signaled more than once simultaneously, the
  * postmaster will observe it only once.)
  *
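The one-flag-per-reason scheme in the comment above can be illustrated with a minimal sketch. The reason names, the function names, and the process-local array are assumptions: in the described design the flags live in shared memory and setting one is followed by sending SIGUSR1, which this sketch only mentions in a comment.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Hypothetical reasons; the real set is not shown in this hunk. */
typedef enum
{
    TOY_REASON_A,
    TOY_REASON_B,
    TOY_NUM_REASONS
} ToyReason;

/* One flag per reason.  Process-local here; shared memory in the design. */
static volatile sig_atomic_t toy_flags[TOY_NUM_REASONS];

/* Child side: record the reason (a real child would then send SIGUSR1). */
static void
toy_send_signal(ToyReason reason)
{
    assert((int) reason >= 0 && (int) reason < TOY_NUM_REASONS);
    toy_flags[reason] = 1;
}

/* Postmaster side: report whether the reason was raised, and clear it. */
static bool
toy_check_signal(ToyReason reason)
{
    assert((int) reason >= 0 && (int) reason < TOY_NUM_REASONS);
    if (toy_flags[reason])
    {
        toy_flags[reason] = 0;
        return true;
    }
    return false;
}
```

Because each reason is a single boolean, raising the same reason twice before it is checked is observed only once, while two different reasons raised together are both seen.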
@@ -42,7 +42,7 @@
  * have three possible states: UNUSED, ASSIGNED, ACTIVE. An UNUSED slot is
  * available for assignment. An ASSIGNED slot is associated with a postmaster
  * child process, but either the process has not touched shared memory yet,
- * or it has successfully cleaned up after itself. A ACTIVE slot means the
+ * or it has successfully cleaned up after itself. A ACTIVE slot means the
  * process is actively using shared memory. The slots are assigned to
  * child processes at random, and postmaster.c is responsible for tracking
  * which one goes with which PID.
@@ -297,7 +297,7 @@ PostmasterIsAlive(bool amDirectChild)
  }

  /*
- * Use kill() to see if the postmaster is still alive. This can sometimes
+ * Use kill() to see if the postmaster is still alive. This can sometimes
  * give a false positive result, since the postmaster's PID may get
  * recycled, but it is good enough for existing uses by indirect children
  * and in debugging environments.
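The kill()-based liveness probe mentioned in the comment above relies on the POSIX rule that signal number 0 performs only the existence and permission checks without delivering anything. A minimal sketch, with the hypothetical name `process_seems_alive`:

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns true if a signal could be sent to pid.
 * Signal 0 sends nothing; kill() only checks that the process exists
 * and that we are allowed to signal it.
 *
 * As the comment above warns, a true result can be a false positive
 * when the PID has been recycled by an unrelated process.  A failure
 * with errno == EPERM also means the process exists, but this sketch
 * keeps the simpler test and reports false in that case.
 */
static bool
process_seems_alive(pid_t pid)
{
    assert(pid > 0);
    return kill(pid, 0) == 0;
}
```

A caller would typically pass a PID it recorded earlier, for example `process_seems_alive(getppid())` right after startup.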
@@ -19,11 +19,11 @@
  *
  * During hot standby, we also keep a list of XIDs representing transactions
  * that are known to be running in the master (or more precisely, were running
- * as of the current point in the WAL stream). This list is kept in the
+ * as of the current point in the WAL stream). This list is kept in the
  * KnownAssignedXids array, and is updated by watching the sequence of
  * arriving XIDs. This is necessary because if we leave those XIDs out of
  * snapshots taken for standby queries, then they will appear to be already
- * complete, leading to MVCC failures. Note that in hot standby, the PGPROC
+ * complete, leading to MVCC failures. Note that in hot standby, the PGPROC
  * array represents standby processes, which by definition are not running
  * transactions that have XIDs.
  *
@@ -261,7 +261,7 @@ ProcArrayAdd(PGPROC *proc)
  if (arrayP->numProcs >= arrayP->maxProcs)
  {
  /*
- * Ooops, no room. (This really shouldn't happen, since there is a
+ * Ooops, no room. (This really shouldn't happen, since there is a
  * fixed supply of PGPROC structs too, and so we should have failed
  * earlier.)
  */
@@ -1054,9 +1054,9 @@ TransactionIdIsActive(TransactionId xid)
  * ignored.
  *
  * This is used by VACUUM to decide which deleted tuples must be preserved
- * in a table. allDbs = TRUE is needed for shared relations, but allDbs =
+ * in a table. allDbs = TRUE is needed for shared relations, but allDbs =
  * FALSE is sufficient for non-shared relations, since only backends in my
- * own database could ever see the tuples in them. Also, we can ignore
+ * own database could ever see the tuples in them. Also, we can ignore
  * concurrently running lazy VACUUMs because (a) they must be working on other
  * tables, and (b) they don't need to do snapshot-based lookups.
  *
@@ -1321,7 +1321,7 @@ GetSnapshotData(Snapshot snapshot)

  /*
  * If the transaction has been assigned an xid < xmax we add it to
- * the snapshot, and update xmin if necessary. There's no need to
+ * the snapshot, and update xmin if necessary. There's no need to
  * store XIDs >= xmax, since we'll treat them as running anyway.
  * We don't bother to examine their subxids either.
  *
@@ -1346,7 +1346,7 @@ GetSnapshotData(Snapshot snapshot)
  * do that much work while holding the ProcArrayLock.
  *
  * The other backend can add more subxids concurrently, but cannot
- * remove any. Hence it's important to fetch nxids just once.
+ * remove any. Hence it's important to fetch nxids just once.
  * Should be safe to use memcpy, though. (We needn't worry about
  * missing any xids added concurrently, because they must postdate
  * xmax.)
@@ -1782,7 +1782,7 @@ BackendPidGetProc(int pid)
  * Only main transaction Ids are considered. This function is mainly
  * useful for determining what backend owns a lock.
  *
- * Beware that not every xact has an XID assigned. However, as long as you
+ * Beware that not every xact has an XID assigned. However, as long as you
  * only call this using an XID found on disk, you're safe.
  */
  int
@@ -1842,7 +1842,7 @@ IsBackendPid(int pid)
  * some snapshot we have. Since we examine the procarray with only shared
  * lock, there are race conditions: a backend could set its xmin just after
  * we look. Indeed, on multiprocessors with weak memory ordering, the
- * other backend could have set its xmin *before* we look. We know however
+ * other backend could have set its xmin *before* we look. We know however
  * that such a backend must have held shared ProcArrayLock overlapping our
  * own hold of ProcArrayLock, else we would see its xmin update. Therefore,
  * any snapshot the other backend is taking concurrently with our scan cannot
@@ -2295,7 +2295,7 @@ CountOtherDBBackends(Oid databaseId, int *nbackends, int *nprepared)
  * XidCacheRemoveRunningXids
  *
  * Remove a bunch of TransactionIds from the list of known-running
- * subtransactions for my backend. Both the specified xid and those in
+ * subtransactions for my backend. Both the specified xid and those in
  * the xids[] array (of length nxids) are removed from the subxids cache.
  * latestXid must be the latest XID among the group.
  */
@@ -2401,7 +2401,7 @@ DisplayXidCache(void)
  * treated as running by standby transactions, even though they are not in
  * the standby server's PGPROC array.
  *
- * We record all XIDs that we know have been assigned. That includes all the
+ * We record all XIDs that we know have been assigned. That includes all the
  * XIDs seen in WAL records, plus all unobserved XIDs that we can deduce have
  * been assigned. We can deduce the existence of unobserved XIDs because we
  * know XIDs are assigned in sequence, with no gaps. The KnownAssignedXids
@@ -2410,7 +2410,7 @@ DisplayXidCache(void)
  *
  * During hot standby we do not fret too much about the distinction between
  * top-level XIDs and subtransaction XIDs. We store both together in the
- * KnownAssignedXids list. In backends, this is copied into snapshots in
+ * KnownAssignedXids list. In backends, this is copied into snapshots in
  * GetSnapshotData(), taking advantage of the fact that XidInMVCCSnapshot()
  * doesn't care about the distinction either. Subtransaction XIDs are
  * effectively treated as top-level XIDs and in the typical case pg_subtrans
@@ -2623,14 +2623,14 @@ ExpireOldKnownAssignedTransactionIds(TransactionId xid)
  * must hold shared ProcArrayLock to examine the array. To remove XIDs from
  * the array, the startup process must hold ProcArrayLock exclusively, for
  * the usual transactional reasons (compare commit/abort of a transaction
- * during normal running). Compressing unused entries out of the array
+ * during normal running). Compressing unused entries out of the array
  * likewise requires exclusive lock. To add XIDs to the array, we just insert
  * them into slots to the right of the head pointer and then advance the head
  * pointer. This wouldn't require any lock at all, except that on machines
  * with weak memory ordering we need to be careful that other processors
  * see the array element changes before they see the head pointer change.
  * We handle this by using a spinlock to protect reads and writes of the
- * head/tail pointers. (We could dispense with the spinlock if we were to
+ * head/tail pointers. (We could dispense with the spinlock if we were to
  * create suitable memory access barrier primitives and use those instead.)
  * The spinlock must be taken to read or write the head/tail pointers unless
  * the caller holds ProcArrayLock exclusively.
@@ -2727,7 +2727,7 @@ KnownAssignedXidsCompress(bool force)
  * If exclusive_lock is true then caller already holds ProcArrayLock in
  * exclusive mode, so we need no extra locking here. Else caller holds no
  * lock, so we need to be sure we maintain sufficient interlocks against
- * concurrent readers. (Only the startup process ever calls this, so no need
+ * concurrent readers. (Only the startup process ever calls this, so no need
  * to worry about concurrent writers.)
  */
  static void
@@ -2773,7 +2773,7 @@ KnownAssignedXidsAdd(TransactionId from_xid, TransactionId to_xid,
  Assert(tail >= 0 && tail < pArray->maxKnownAssignedXids);

  /*
- * Verify that insertions occur in TransactionId sequence. Note that even
+ * Verify that insertions occur in TransactionId sequence. Note that even
  * if the last existing element is marked invalid, it must still have a
  * correctly sequenced XID value.
  */
@@ -2876,7 +2876,7 @@ KnownAssignedXidsSearch(TransactionId xid, bool remove)
  }

  /*
- * Standard binary search. Note we can ignore the KnownAssignedXidsValid
+ * Standard binary search. Note we can ignore the KnownAssignedXidsValid
  * array here, since even invalid entries will contain sorted XIDs.
  */
  first = tail;
@@ -26,7 +26,7 @@
  * for a module and should never be allocated after the shared memory
  * initialization phase. Hash tables have a fixed maximum size, but
  * their actual size can vary dynamically. When entries are added
- * to the table, more space is allocated. Queues link data structures
+ * to the table, more space is allocated. Queues link data structures
  * that have been allocated either within fixed-size structures or as hash
  * buckets. Each shared data structure has a string name to identify
  * it (assigned in the module that declares it).
@@ -40,7 +40,7 @@
  * The shmem index has two purposes: first, it gives us
  * a simple model of how the world looks when a backend process
  * initializes. If something is present in the shmem index,
- * it is initialized. If it is not, it is uninitialized. Second,
+ * it is initialized. If it is not, it is uninitialized. Second,
  * the shmem index allows us to allocate shared memory on demand
  * instead of trying to preallocate structures and hard-wire the
  * sizes and locations in header files. If you are using a lot
@@ -55,8 +55,8 @@
  * pointers using the method described in (b) above.
  *
  * (d) memory allocation model: shared memory can never be
- * freed, once allocated. Each hash table has its own free list,
- * so hash buckets can be reused when an item is deleted. However,
+ * freed, once allocated. Each hash table has its own free list,
+ * so hash buckets can be reused when an item is deleted. However,
  * if one hash table grows very large and then shrinks, its space
  * cannot be redistributed to other tables. We could build a simple
  * hash bucket garbage collector if need be. Right now, it seems
@@ -116,7 +116,7 @@ InitShmemAllocation(void)
  Assert(shmhdr != NULL);

  /*
- * Initialize the spinlock used by ShmemAlloc. We have to do the space
+ * Initialize the spinlock used by ShmemAlloc. We have to do the space
  * allocation the hard way, since obviously ShmemAlloc can't be called
  * yet.
  */
@@ -217,7 +217,7 @@ InitShmemIndex(void)
  *
  * Since ShmemInitHash calls ShmemInitStruct, which expects the ShmemIndex
  * hashtable to exist already, we have a bit of a circularity problem in
- * initializing the ShmemIndex itself. The special "ShmemIndex" hash
+ * initializing the ShmemIndex itself. The special "ShmemIndex" hash
  * table name will tell ShmemInitStruct to fake it.
  */
  info.keysize = SHMEM_INDEX_KEYSIZE;
@@ -294,7 +294,7 @@ ShmemInitHash(const char *name, /* table string name for shmem index */
  * ShmemInitStruct -- Create/attach to a structure in shared memory.
  *
  * This is called during initialization to find or allocate
- * a data structure in shared memory. If no other process
+ * a data structure in shared memory. If no other process
  * has created the structure, this routine allocates space
  * for it. If it exists already, a pointer to the existing
  * structure is returned.
@@ -303,7 +303,7 @@ ShmemInitHash(const char *name, /* table string name for shmem index */
  * already in the shmem index (hence, already initialized).
  *
  * Note: before Postgres 9.0, this function returned NULL for some failure
- * cases. Now, it always throws error instead, so callers need not check
+ * cases. Now, it always throws error instead, so callers need not check
  * for NULL.
  */
  void *
@@ -335,7 +335,7 @@ ShmemInitStruct(const char *name, Size size, bool *foundPtr)
  * be trying to init the shmem index itself.
  *
  * Notice that the ShmemIndexLock is released before the shmem
- * index has been initialized. This should be OK because no other
+ * index has been initialized. This should be OK because no other
  * process can be accessing shared memory yet.
  */
  Assert(shmemseghdr->index == NULL);
@@ -14,7 +14,7 @@
  *
  * Package for managing doubly-linked lists in shared memory.
  * The only tricky thing is that SHM_QUEUE will usually be a field
- * in a larger record. SHMQueueNext has to return a pointer
+ * in a larger record. SHMQueueNext has to return a pointer
  * to the record itself instead of a pointer to the SHMQueue field
  * of the record. It takes an extra parameter and does some extra
  * pointer arithmetic to do this correctly.
@@ -26,7 +26,7 @@
  * Because backends sitting idle will not be reading sinval events, we
  * need a way to give an idle backend a swift kick in the rear and make
  * it catch up before the sinval queue overflows and forces it to go
- * through a cache reset exercise. This is done by sending
+ * through a cache reset exercise. This is done by sending
  * PROCSIG_CATCHUP_INTERRUPT to any backend that gets too far behind.
  *
  * State for catchup events consists of two flags: one saying whether
@@ -65,7 +65,7 @@ SendSharedInvalidMessages(const SharedInvalidationMessage *msgs, int n)
  * NOTE: it is entirely possible for this routine to be invoked recursively
  * as a consequence of processing inside the invalFunction or resetFunction.
  * Furthermore, such a recursive call must guarantee that all outstanding
- * inval messages have been processed before it exits. This is the reason
+ * inval messages have been processed before it exits. This is the reason
  * for the strange-looking choice to use a statically allocated buffer array
  * and counters; it's so that a recursive call can process messages already
  * sucked out of sinvaladt.c.
@@ -131,7 +131,7 @@ ReceiveSharedInvalidMessages(
  * We are now caught up. If we received a catchup signal, reset that
  * flag, and call SICleanupQueue(). This is not so much because we need
  * to flush dead messages right now, as that we want to pass on the
- * catchup signal to the next slowest backend. "Daisy chaining" the
+ * catchup signal to the next slowest backend. "Daisy chaining" the
  * catchup signal this way avoids creating spikes in system load for what
  * should be just a background maintenance activity.
  */
@@ -151,7 +151,7 @@ ReceiveSharedInvalidMessages(
  *
  * If we are idle (catchupInterruptEnabled is set), we can safely
  * invoke ProcessCatchupEvent directly. Otherwise, just set a flag
- * to do it later. (Note that it's quite possible for normal processing
+ * to do it later. (Note that it's quite possible for normal processing
  * of the current transaction to cause ReceiveSharedInvalidMessages()
  * to be run later on; in that case the flag will get cleared again,
  * since there's no longer any reason to do anything.)
@@ -227,7 +227,7 @@ HandleCatchupInterrupt(void)
  * EnableCatchupInterrupt
  *
  * This is called by the PostgresMain main loop just before waiting
- * for a frontend command. We process any pending catchup events,
+ * for a frontend command. We process any pending catchup events,
  * and enable the signal handler to process future events directly.
  *
  * NOTE: the signal handler starts out disabled, and stays so until
@@ -272,7 +272,7 @@ EnableCatchupInterrupt(void)
  * DisableCatchupInterrupt
  *
  * This is called by the PostgresMain main loop just after receiving
- * a frontend command. Signal handler execution of catchup events
+ * a frontend command. Signal handler execution of catchup events
  * is disabled until the next EnableCatchupInterrupt call.
  *
  * The PROCSIG_NOTIFY_INTERRUPT signal handler also needs to call this,
@@ -45,7 +45,7 @@
  * In reality, the messages are stored in a circular buffer of MAXNUMMESSAGES
  * entries. We translate MsgNum values into circular-buffer indexes by
  * computing MsgNum % MAXNUMMESSAGES (this should be fast as long as
- * MAXNUMMESSAGES is a constant and a power of 2). As long as maxMsgNum
+ * MAXNUMMESSAGES is a constant and a power of 2). As long as maxMsgNum
  * doesn't exceed minMsgNum by more than MAXNUMMESSAGES, we have enough space
  * in the buffer. If the buffer does overflow, we recover by setting the
  * "reset" flag for each backend that has fallen too far behind. A backend
@@ -58,7 +58,7 @@
  * normal behavior is that at most one such interrupt is in flight at a time;
  * when a backend completes processing a catchup interrupt, it executes
  * SICleanupQueue, which will signal the next-furthest-behind backend if
- * needed. This avoids undue contention from multiple backends all trying
+ * needed. This avoids undue contention from multiple backends all trying
  * to catch up at once. However, the furthest-back backend might be stuck
  * in a state where it can't catch up. Eventually it will get reset, so it
  * won't cause any more problems for anyone but itself. But we don't want
@@ -89,7 +89,7 @@
  * the writer wants to change maxMsgNum while readers need to read it.
  * We deal with that by having a spinlock that readers must take for just
  * long enough to read maxMsgNum, while writers take it for just long enough
- * to write maxMsgNum. (The exact rule is that you need the spinlock to
+ * to write maxMsgNum. (The exact rule is that you need the spinlock to
  * read maxMsgNum if you are not holding SInvalWriteLock, and you need the
  * spinlock to write maxMsgNum unless you are holding both locks.)
  *
@@ -404,7 +404,7 @@ SIInsertDataEntries(const SharedInvalidationMessage *data, int n)
  SISeg *segP = shmInvalBuffer;

  /*
- * N can be arbitrarily large. We divide the work into groups of no more
+ * N can be arbitrarily large. We divide the work into groups of no more
  * than WRITE_QUANTUM messages, to be sure that we don't hold the lock for
  * an unreasonably long time. (This is not so much because we care about
  * letting in other writers, as that some just-caught-up backend might be
@@ -426,7 +426,7 @@ SIInsertDataEntries(const SharedInvalidationMessage *data, int n)
  * If the buffer is full, we *must* acquire some space. Clean the
  * queue and reset anyone who is preventing space from being freed.
  * Otherwise, clean the queue only when it's exceeded the next
- * fullness threshold. We have to loop and recheck the buffer state
+ * fullness threshold. We have to loop and recheck the buffer state
  * after any call of SICleanupQueue.
  */
  for (;;)
@@ -480,11 +480,11 @@ SIInsertDataEntries(const SharedInvalidationMessage *data, int n)
  * executing on behalf of other backends, since each instance will modify only
  * fields of its own backend's ProcState, and no instance will look at fields
  * of other backends' ProcStates. We express this by grabbing SInvalReadLock
- * in shared mode. Note that this is not exactly the normal (read-only)
+ * in shared mode. Note that this is not exactly the normal (read-only)
  * interpretation of a shared lock! Look closely at the interactions before
  * allowing SInvalReadLock to be grabbed in shared mode for any other reason!
  *
- * NB: this can also run in parallel with SIInsertDataEntries. It is not
+ * NB: this can also run in parallel with SIInsertDataEntries. It is not
  * guaranteed that we will return any messages added after the routine is
  * entered.
  *
@@ -567,7 +567,7 @@ SIGetDataEntries(SharedInvalidationMessage *data, int datasize)
  *
  * Caution: because we transiently release write lock when we have to signal
  * some other backend, it is NOT guaranteed that there are still minFree
- * free message slots at exit. Caller must recheck and perhaps retry.
+ * free message slots at exit. Caller must recheck and perhaps retry.
  */
  void
  SICleanupQueue(bool callerHasWriteLock, int minFree)
@@ -588,7 +588,7 @@ SICleanupQueue(bool callerHasWriteLock, int minFree)
  /*
  * Recompute minMsgNum = minimum of all backends' nextMsgNum, identify the
  * furthest-back backend that needs signaling (if any), and reset any
- * backends that are too far back. Note that because we ignore sendOnly
+ * backends that are too far back. Note that because we ignore sendOnly
  * backends here it is possible for them to keep sending messages without
  * a problem even when they are the only active backend.
  */
@@ -131,7 +131,7 @@ GetStandbyLimitTime(void)

  /*
  * The cutoff time is the last WAL data receipt time plus the appropriate
- * delay variable. Delay of -1 means wait forever.
+ * delay variable. Delay of -1 means wait forever.
  */
  GetXLogReceiptTime(&rtime, &fromStream);
  if (fromStream)
@@ -801,7 +801,7 @@ inv_truncate(LargeObjectDesc *obj_desc, int len)

  /*
  * If we found the page of the truncation point we need to truncate the
- * data in it. Otherwise if we're in a hole, we need to create a page to
+ * data in it. Otherwise if we're in a hole, we need to create a page to
  * mark the end of data.
  */
  if (olddata != NULL && olddata->pageno == pageno)
@@ -51,7 +51,7 @@ typedef struct
  } WAIT_ORDER;

  /*
- * Information saved about each edge in a detected deadlock cycle. This
+ * Information saved about each edge in a detected deadlock cycle. This
  * is used to print a diagnostic message upon failure.
  *
  * Note: because we want to examine this info after releasing the lock
@@ -119,7 +119,7 @@ static PGPROC *blocking_autovacuum_proc = NULL;
  * InitDeadLockChecking -- initialize deadlock checker during backend startup
  *
  * This does per-backend initialization of the deadlock checker; primarily,
- * allocation of working memory for DeadLockCheck. We do this per-backend
+ * allocation of working memory for DeadLockCheck. We do this per-backend
  * since there's no percentage in making the kernel do copy-on-write
  * inheritance of workspace from the postmaster. We want to allocate the
  * space at startup because (a) the deadlock checker might be invoked when
@@ -291,10 +291,10 @@ GetBlockingAutoVacuumPgproc(void)
  * DeadLockCheckRecurse -- recursively search for valid orderings
  *
  * curConstraints[] holds the current set of constraints being considered
- * by an outer level of recursion. Add to this each possible solution
+ * by an outer level of recursion. Add to this each possible solution
  * constraint for any cycle detected at this level.
  *
- * Returns TRUE if no solution exists. Returns FALSE if a deadlock-free
+ * Returns TRUE if no solution exists. Returns FALSE if a deadlock-free
  * state is attainable, in which case waitOrders[] shows the required
  * rearrangements of lock wait queues (if any).
  */
@@ -429,7 +429,7 @@ TestConfiguration(PGPROC *startProc)
  *
  * Since we need to be able to check hypothetical configurations that would
  * exist after wait queue rearrangement, the routine pays attention to the
- * table of hypothetical queue orders in waitOrders[]. These orders will
+ * table of hypothetical queue orders in waitOrders[]. These orders will
  * be believed in preference to the actual ordering seen in the locktable.
  */
  static bool
@@ -505,7 +505,7 @@ FindLockCycleRecurse(PGPROC *checkProc,
  conflictMask = lockMethodTable->conflictTab[checkProc->waitLockMode];

  /*
- * Scan for procs that already hold conflicting locks. These are "hard"
+ * Scan for procs that already hold conflicting locks. These are "hard"
  * edges in the waits-for graph.
  */
  procLocks = &(lock->procLocks);
@@ -703,7 +703,7 @@ ExpandConstraints(EDGE *constraints,
  nWaitOrders = 0;

  /*
- * Scan constraint list backwards. This is because the last-added
+ * Scan constraint list backwards. This is because the last-added
  * constraint is the only one that could fail, and so we want to test it
  * for inconsistency first.
  */
@@ -757,7 +757,7 @@ ExpandConstraints(EDGE *constraints,
  * The initial queue ordering is taken directly from the lock's wait queue.
  * The output is an array of PGPROC pointers, of length equal to the lock's
  * wait queue length (the caller is responsible for providing this space).
- * The partial order is specified by an array of EDGE structs. Each EDGE
+ * The partial order is specified by an array of EDGE structs. Each EDGE
  * is one that we need to reverse, therefore the "waiter" must appear before
  * the "blocker" in the output array. The EDGE array may well contain
  * edges associated with other locks; these should be ignored.
@@ -827,7 +827,7 @@ TopoSort(LOCK *lock,
  afterConstraints[k] = i + 1;
  }
  /*--------------------
- * Now scan the topoProcs array backwards. At each step, output the
+ * Now scan the topoProcs array backwards. At each step, output the
  * last proc that has no remaining before-constraints, and decrease
  * the beforeConstraints count of each of the procs it was constrained
  * against.
@@ -65,7 +65,7 @@ SetLocktagRelationOid(LOCKTAG *tag, Oid relid)
  /*
  * LockRelationOid
  *
- * Lock a relation given only its OID. This should generally be used
+ * Lock a relation given only its OID. This should generally be used
  * before attempting to open the relation's relcache entry.
  */
  void
@@ -252,7 +252,7 @@ LockHasWaitersRelation(Relation relation, LOCKMODE lockmode)
  /*
  * LockRelationIdForSession
  *
- * This routine grabs a session-level lock on the target relation. The
+ * This routine grabs a session-level lock on the target relation. The
  * session lock persists across transaction boundaries. It will be removed
  * when UnlockRelationIdForSession() is called, or if an ereport(ERROR) occurs,
  * or if the backend exits.
@@ -455,7 +455,7 @@ XactLockTableInsert(TransactionId xid)
  *
  * Delete the lock showing that the given transaction ID is running.
  * (This is never used for main transaction IDs; those locks are only
- * released implicitly at transaction end. But we do use it for subtrans IDs.)
+ * released implicitly at transaction end. But we do use it for subtrans IDs.)
  */
  void
  XactLockTableDelete(TransactionId xid)
@@ -476,7 +476,7 @@ XactLockTableDelete(TransactionId xid)
  * subtransaction, we will exit as soon as it aborts or its top parent commits.
  * It takes some extra work to ensure this, because to save on shared memory
  * the XID lock of a subtransaction is released when it ends, whether
- * successfully or unsuccessfully. So we have to check if it's "still running"
+ * successfully or unsuccessfully. So we have to check if it's "still running"
  * and if so wait for its parent.
  */
  void
@@ -873,7 +873,7 @@ LockAcquireExtended(const LOCKTAG *locktag,

  /*
  * If lock requested conflicts with locks requested by waiters, must join
- * wait queue. Otherwise, check for conflict with already-held locks.
+ * wait queue. Otherwise, check for conflict with already-held locks.
  * (That's last because most complex check.)
  */
  if (lockMethodTable->conflictTab[lockmode] & lock->waitMask)
@@ -950,7 +950,7 @@ LockAcquireExtended(const LOCKTAG *locktag,

  /*
  * NOTE: do not do any material change of state between here and
- * return. All required changes in locktable state must have been
+ * return. All required changes in locktable state must have been
  * done when the lock was granted to us --- see notes in WaitOnLock.
  */

@@ -980,7 +980,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
  {
  /*
  * Decode the locktag back to the original values, to avoid sending
- * lots of empty bytes with every message. See lock.h to check how a
+ * lots of empty bytes with every message. See lock.h to check how a
  * locktag is defined for LOCKTAG_RELATION
  */
  LogAccessExclusiveLock(locktag->locktag_field1,
@@ -1045,7 +1045,7 @@ LockCheckConflicts(LockMethod lockMethodTable,
  }

  /*
- * Rats. Something conflicts. But it could still be my own lock. We have
+ * Rats. Something conflicts. But it could still be my own lock. We have
  * to construct a conflict mask that does not reflect our own locks, but
  * only lock types held by other processes.
  */
@@ -1137,7 +1137,7 @@ UnGrantLock(LOCK *lock, LOCKMODE lockmode,

  /*
  * We need only run ProcLockWakeup if the released lock conflicts with at
- * least one of the lock types requested by waiter(s). Otherwise whatever
+ * least one of the lock types requested by waiter(s). Otherwise whatever
  * conflict made them wait must still exist. NOTE: before MVCC, we could
  * skip wakeup if lock->granted[lockmode] was still positive. But that's
  * not true anymore, because the remaining granted locks might belong to
@@ -1157,7 +1157,7 @@ UnGrantLock(LOCK *lock, LOCKMODE lockmode,
  }

  /*
- * CleanUpLock -- clean up after releasing a lock. We garbage-collect the
+ * CleanUpLock -- clean up after releasing a lock. We garbage-collect the
  * proclock and lock objects if possible, and call ProcLockWakeup if there
  * are remaining requests and the caller says it's OK. (Normally, this
  * should be called after UnGrantLock, and wakeupNeeded is the result from
@@ -1516,7 +1516,7 @@ LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
  }

  /*
- * Decrease the total local count. If we're still holding the lock, we're
+ * Decrease the total local count. If we're still holding the lock, we're
  * done.
  */
  locallock->nLocks--;
@@ -2272,7 +2272,7 @@ PostPrepare_Locks(TransactionId xid)
  /*
  * We cannot simply modify proclock->tag.myProc to reassign
  * ownership of the lock, because that's part of the hash key and
- * the proclock would then be in the wrong hash chain. So, unlink
+ * the proclock would then be in the wrong hash chain. So, unlink
  * and delete the old proclock; create a new one with the right
  * contents; and link it into place. We do it in this order to be
  * certain we won't run out of shared memory (the way dynahash.c
@@ -2394,7 +2394,7 @@ GetLockStatusData(void)
  * view of the state.
  *
  * Since this is a read-only operation, we take shared instead of
- * exclusive lock. There's not a whole lot of point to this, because all
+ * exclusive lock. There's not a whole lot of point to this, because all
  * the normal operations require exclusive lock, but it doesn't hurt
  * anything either. It will at least allow two backends to do
  * GetLockStatusData in parallel.
@@ -6,7 +6,7 @@
  * Lightweight locks are intended primarily to provide mutual exclusion of
  * access to shared-memory data structures. Therefore, they offer both
  * exclusive and shared lock modes (to support read/write and read-only
- * access to a shared object). There are few other frammishes. User-level
+ * access to a shared object). There are few other frammishes. User-level
  * locking should be done with the full lock manager --- which depends on
  * LWLocks to protect its shared state.
  *
@@ -53,7 +53,7 @@ typedef struct LWLock
  * (LWLockIds are indexes into the array.) We force the array stride to
  * be a power of 2, which saves a few cycles in indexing, but more
  * importantly also ensures that individual LWLocks don't cross cache line
- * boundaries. This reduces cache contention problems, especially on AMD
+ * boundaries. This reduces cache contention problems, especially on AMD
  * Opterons. (Of course, we have to also ensure that the array start
  * address is suitably aligned.)
  *
@@ -200,7 +200,7 @@ NumLWLocks(void)
  * a loadable module.
  *
  * This is only useful if called from the _PG_init hook of a library that
- * is loaded into the postmaster via shared_preload_libraries. Once
+ * is loaded into the postmaster via shared_preload_libraries. Once
  * shared memory has been allocated, calls will be ignored. (We could
  * raise an error, but it seems better to make it a no-op, so that
  * libraries containing such calls can be reloaded if needed.)
@@ -378,7 +378,7 @@ LWLockAcquire(LWLockId lockid, LWLockMode mode)
  * in the presence of contention. The efficiency of being able to do that
  * outweighs the inefficiency of sometimes wasting a process dispatch
  * cycle because the lock is not free when a released waiter finally gets
- * to run. See pgsql-hackers archives for 29-Dec-01.
+ * to run. See pgsql-hackers archives for 29-Dec-01.
  */
  for (;;)
  {
@@ -32,11 +32,11 @@
  * examining the MVCC data.)
  *
  * (1) Besides tuples actually read, they must cover ranges of tuples
- * which would have been read based on the predicate. This will
+ * which would have been read based on the predicate. This will
  * require modelling the predicates through locks against database
  * objects such as pages, index ranges, or entire tables.
  *
- * (2) They must be kept in RAM for quick access. Because of this, it
+ * (2) They must be kept in RAM for quick access. Because of this, it
  * isn't possible to always maintain tuple-level granularity -- when
  * the space allocated to store these approaches exhaustion, a
  * request for a lock may need to scan for situations where a single
@@ -49,7 +49,7 @@
|
||||
*
|
||||
* (4) While they are associated with a transaction, they must survive
|
||||
* a successful COMMIT of that transaction, and remain until all
|
||||
* overlapping transactions complete. This even means that they
|
||||
* overlapping transactions complete. This even means that they
|
||||
* must survive termination of the transaction's process. If a
|
||||
* top level transaction is rolled back, however, it is immediately
|
||||
* flagged so that it can be ignored, and its SIREAD locks can be
|
||||
@@ -90,7 +90,7 @@
|
||||
* may yet matter because they overlap still-active transactions.
|
||||
*
|
||||
* SerializablePredicateLockListLock
|
||||
* - Protects the linked list of locks held by a transaction. Note
|
||||
* - Protects the linked list of locks held by a transaction. Note
|
||||
* that the locks themselves are also covered by the partition
|
||||
* locks of their respective lock targets; this lock only affects
|
||||
* the linked list connecting the locks related to a transaction.
|
||||
@@ -101,11 +101,11 @@
|
||||
* - It is relatively infrequent that another process needs to
|
||||
* modify the list for a transaction, but it does happen for such
|
||||
* things as index page splits for pages with predicate locks and
|
||||
* freeing of predicate locked pages by a vacuum process. When
|
||||
* freeing of predicate locked pages by a vacuum process. When
|
||||
* removing a lock in such cases, the lock itself contains the
|
||||
* pointers needed to remove it from the list. When adding a
|
||||
* lock in such cases, the lock can be added using the anchor in
|
||||
* the transaction structure. Neither requires walking the list.
|
||||
* the transaction structure. Neither requires walking the list.
|
||||
* - Cleaning up the list for a terminated transaction is sometimes
|
||||
* not done on a retail basis, in which case no lock is required.
|
||||
* - Due to the above, a process accessing its active transaction's
|
||||
@@ -348,7 +348,7 @@ int max_predicate_locks_per_xact; /* set by guc.c */
|
||||
|
||||
/*
|
||||
* This provides a list of objects in order to track transactions
|
||||
* participating in predicate locking. Entries in the list are fixed size,
|
||||
* participating in predicate locking. Entries in the list are fixed size,
|
||||
* and reside in shared memory. The memory address of an entry must remain
|
||||
* fixed during its lifetime. The list will be protected from concurrent
|
||||
* update externally; no provision is made in this code to manage that. The
|
||||
@@ -538,7 +538,7 @@ SerializationNeededForWrite(Relation relation)
|
||||
|
||||
/*
|
||||
* These functions are a simple implementation of a list for this specific
|
||||
* type of struct. If there is ever a generalized shared memory list, we
|
||||
* type of struct. If there is ever a generalized shared memory list, we
|
||||
* should probably switch to that.
|
||||
*/
|
||||
static SERIALIZABLEXACT *
|
||||
@@ -758,7 +758,7 @@ OldSerXidPagePrecedesLogically(int p, int q)
|
||||
int diff;
|
||||
|
||||
/*
|
||||
* We have to compare modulo (OLDSERXID_MAX_PAGE+1)/2. Both inputs should
|
||||
* We have to compare modulo (OLDSERXID_MAX_PAGE+1)/2. Both inputs should
|
||||
* be in the range 0..OLDSERXID_MAX_PAGE.
|
||||
*/
|
||||
Assert(p >= 0 && p <= OLDSERXID_MAX_PAGE);
|
||||
@@ -920,7 +920,7 @@ OldSerXidAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
|
||||
}
|
||||
|
||||
/*
|
||||
* Get the minimum commitSeqNo for any conflict out for the given xid. For
|
||||
* Get the minimum commitSeqNo for any conflict out for the given xid. For
|
||||
* a transaction which exists but has no conflict out, InvalidSerCommitSeqNo
|
||||
* will be returned.
|
||||
*/
|
||||
@@ -973,7 +973,7 @@ OldSerXidSetActiveSerXmin(TransactionId xid)
|
||||
/*
|
||||
* When no sxacts are active, nothing overlaps, set the xid values to
|
||||
* invalid to show that there are no valid entries. Don't clear headPage,
|
||||
* though. A new xmin might still land on that page, and we don't want to
|
||||
* though. A new xmin might still land on that page, and we don't want to
|
||||
* repeatedly zero out the same page.
|
||||
*/
|
||||
if (!TransactionIdIsValid(xid))
|
||||
@@ -1458,7 +1458,7 @@ SummarizeOldestCommittedSxact(void)
|
||||
|
||||
/*
|
||||
* Grab the first sxact off the finished list -- this will be the earliest
|
||||
* commit. Remove it from the list.
|
||||
* commit. Remove it from the list.
|
||||
*/
|
||||
sxact = (SERIALIZABLEXACT *)
|
||||
SHMQueueNext(FinishedSerializableTransactions,
|
||||
@@ -1974,7 +1974,7 @@ RemoveTargetIfNoLongerUsed(PREDICATELOCKTARGET *target, uint32 targettaghash)
|
||||
/*
|
||||
* Delete child target locks owned by this process.
|
||||
* This implementation is assuming that the usage of each target tag field
|
||||
* is uniform. No need to make this hard if we don't have to.
|
||||
* is uniform. No need to make this hard if we don't have to.
|
||||
*
|
||||
* We aren't acquiring lightweight locks for the predicate lock or lock
|
||||
* target structures associated with this transaction unless we're going
|
||||
@@ -2420,7 +2420,7 @@ PredicateLockTuple(Relation relation, HeapTuple tuple, Snapshot snapshot)
|
||||
}
|
||||
|
||||
/*
|
||||
* Do quick-but-not-definitive test for a relation lock first. This will
|
||||
* Do quick-but-not-definitive test for a relation lock first. This will
|
||||
* never cause a return when the relation is *not* locked, but will
|
||||
* occasionally let the check continue when there really *is* a relation
|
||||
* level lock.
|
||||
@@ -2732,7 +2732,7 @@ exit:
|
||||
* transaction which is not serializable.
|
||||
*
|
||||
* NOTE: This is currently only called with transfer set to true, but that may
|
||||
* change. If we decide to clean up the locks from a table on commit of a
|
||||
* change. If we decide to clean up the locks from a table on commit of a
|
||||
* transaction which executed DROP TABLE, the false condition will be useful.
|
||||
*/
|
||||
static void
|
||||
@@ -2813,7 +2813,7 @@ DropAllPredicateLocksFromTable(Relation relation, bool transfer)
|
||||
continue; /* already the right lock */
|
||||
|
||||
/*
|
||||
* If we made it here, we have work to do. We make sure the heap
|
||||
* If we made it here, we have work to do. We make sure the heap
|
||||
* relation lock exists, then we walk the list of predicate locks for
|
||||
* the old target we found, moving all locks to the heap relation lock
|
||||
* -- unless they already hold that.
|
||||
@@ -3158,7 +3158,7 @@ ReleasePredicateLocks(bool isCommit)
|
||||
* If this value is changing, we don't care that much whether we get the
|
||||
* old or new value -- it is just used to determine how far
|
||||
* GlobalSerizableXmin must advance before this transaction can be fully
|
||||
* cleaned up. The worst that could happen is we wait for one more
|
||||
* cleaned up. The worst that could happen is we wait for one more
|
||||
* transaction to complete before freeing some RAM; correctness of visible
|
||||
* behavior is not affected.
|
||||
*/
|
||||
@@ -3260,7 +3260,7 @@ ReleasePredicateLocks(bool isCommit)
|
||||
}
|
||||
|
||||
/*
|
||||
* Release all outConflicts to committed transactions. If we're rolling
|
||||
* Release all outConflicts to committed transactions. If we're rolling
|
||||
* back clear them all. Set SXACT_FLAG_CONFLICT_OUT if any point to
|
||||
* previously committed transactions.
|
||||
*/
|
||||
@@ -3579,7 +3579,7 @@ ClearOldPredicateLocks(void)
|
||||
* matter -- but keep the transaction entry itself and any outConflicts.
|
||||
*
|
||||
* When the summarize flag is set, we've run short of room for sxact data
|
||||
* and must summarize to the SLRU. Predicate locks are transferred to a
|
||||
* and must summarize to the SLRU. Predicate locks are transferred to a
|
||||
* dummy "old" transaction, with duplicate locks on a single target
|
||||
* collapsing to a single lock with the "latest" commitSeqNo from among
|
||||
* the conflicting locks..
|
||||
@@ -3772,7 +3772,7 @@ XidIsConcurrent(TransactionId xid)
|
||||
/*
|
||||
* CheckForSerializableConflictOut
|
||||
* We are reading a tuple which has been modified. If it is visible to
|
||||
* us but has been deleted, that indicates a rw-conflict out. If it's
|
||||
* us but has been deleted, that indicates a rw-conflict out. If it's
|
||||
* not visible and was created by a concurrent (overlapping)
|
||||
* serializable transaction, that is also a rw-conflict out,
|
||||
*
|
||||
@@ -3859,7 +3859,7 @@ CheckForSerializableConflictOut(bool visible, Relation relation,
|
||||
Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin));
|
||||
|
||||
/*
|
||||
* Find top level xid. Bail out if xid is too early to be a conflict, or
|
||||
* Find top level xid. Bail out if xid is too early to be a conflict, or
|
||||
* if it's our own xid.
|
||||
*/
|
||||
if (TransactionIdEquals(xid, GetTopTransactionIdIfAny()))
|
||||
@@ -3924,7 +3924,7 @@ CheckForSerializableConflictOut(bool visible, Relation relation,
|
||||
|
||||
/*
|
||||
* We have a conflict out to a transaction which has a conflict out to a
|
||||
* summarized transaction. That summarized transaction must have
|
||||
* summarized transaction. That summarized transaction must have
|
||||
* committed first, and we can't tell when it committed in relation to our
|
||||
* snapshot acquisition, so something needs to be canceled.
|
||||
*/
|
||||
@@ -3958,7 +3958,7 @@ CheckForSerializableConflictOut(bool visible, Relation relation,
|
||||
&& (!SxactHasConflictOut(sxact)
|
||||
|| MySerializableXact->SeqNo.lastCommitBeforeSnapshot < sxact->SeqNo.earliestOutConflictCommit))
|
||||
{
|
||||
/* Read-only transaction will appear to run first. No conflict. */
|
||||
/* Read-only transaction will appear to run first. No conflict. */
|
||||
LWLockRelease(SerializableXactHashLock);
|
||||
return;
|
||||
}
|
||||
@@ -4549,7 +4549,7 @@ OnConflict_CheckForSerializationFailure(const SERIALIZABLEXACT *reader,
|
||||
*
|
||||
* If a dangerous structure is found, the pivot (the near conflict) is
|
||||
* marked for death, because rolling back another transaction might mean
|
||||
* that we flail without ever making progress. This transaction is
|
||||
* that we flail without ever making progress. This transaction is
|
||||
* committing writes, so letting it commit ensures progress. If we
|
||||
* canceled the far conflict, it might immediately fail again on retry.
|
||||
*/
|
||||
|
@@ -355,7 +355,7 @@ InitProcess(void)
|
||||
|
||||
/*
|
||||
* We might be reusing a semaphore that belonged to a failed process. So
|
||||
* be careful and reinitialize its value here. (This is not strictly
|
||||
* be careful and reinitialize its value here. (This is not strictly
|
||||
* necessary anymore, but seems like a good idea for cleanliness.)
|
||||
*/
|
||||
PGSemaphoreReset(&MyProc->sem);
|
||||
@@ -405,7 +405,7 @@ InitProcessPhase2(void)
|
||||
*
|
||||
* Auxiliary processes are presently not expected to wait for real (lockmgr)
|
||||
* locks, so we need not set up the deadlock checker. They are never added
|
||||
* to the ProcArray or the sinval messaging mechanism, either. They also
|
||||
* to the ProcArray or the sinval messaging mechanism, either. They also
|
||||
* don't get a VXID assigned, since this is only useful when we actually
|
||||
* hold lockmgr locks.
|
||||
*
|
||||
@@ -502,7 +502,7 @@ InitAuxiliaryProcess(void)
|
||||
|
||||
/*
|
||||
* We might be reusing a semaphore that belonged to a failed process. So
|
||||
* be careful and reinitialize its value here. (This is not strictly
|
||||
* be careful and reinitialize its value here. (This is not strictly
|
||||
* necessary anymore, but seems like a good idea for cleanliness.)
|
||||
*/
|
||||
PGSemaphoreReset(&MyProc->sem);
|
||||
@@ -642,7 +642,7 @@ LockWaitCancel(void)
|
||||
|
||||
/*
|
||||
* We used to do PGSemaphoreReset() here to ensure that our proc's wait
|
||||
* semaphore is reset to zero. This prevented a leftover wakeup signal
|
||||
* semaphore is reset to zero. This prevented a leftover wakeup signal
|
||||
* from remaining in the semaphore if someone else had granted us the lock
|
||||
* we wanted before we were able to remove ourselves from the wait-list.
|
||||
* However, now that ProcSleep loops until waitStatus changes, a leftover
|
||||
@@ -758,7 +758,7 @@ ProcKill(int code, Datum arg)
|
||||
|
||||
/*
|
||||
* AuxiliaryProcKill() -- Cut-down version of ProcKill for auxiliary
|
||||
* processes (bgwriter, etc). The PGPROC and sema are not released, only
|
||||
* processes (bgwriter, etc). The PGPROC and sema are not released, only
|
||||
* marked as not-in-use.
|
||||
*/
|
||||
static void
|
||||
@@ -884,7 +884,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
|
||||
*
|
||||
* Special case: if I find I should go in front of some waiter, check to
|
||||
* see if I conflict with already-held locks or the requests before that
|
||||
* waiter. If not, then just grant myself the requested lock immediately.
|
||||
* waiter. If not, then just grant myself the requested lock immediately.
|
||||
* This is the same as the test for immediate grant in LockAcquire, except
|
||||
* we are only considering the part of the wait queue before my insertion
|
||||
* point.
|
||||
@@ -903,7 +903,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
|
||||
if (lockMethodTable->conflictTab[lockmode] & proc->heldLocks)
|
||||
{
|
||||
/*
|
||||
* Yes, so we have a deadlock. Easiest way to clean up
|
||||
* Yes, so we have a deadlock. Easiest way to clean up
|
||||
* correctly is to call RemoveFromWaitQueue(), but we
|
||||
* can't do that until we are *on* the wait queue. So, set
|
||||
* a flag to check below, and break out of loop. Also,
|
||||
@@ -1010,8 +1010,8 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
|
||||
|
||||
/*
|
||||
* If someone wakes us between LWLockRelease and PGSemaphoreLock,
|
||||
* PGSemaphoreLock will not block. The wakeup is "saved" by the semaphore
|
||||
* implementation. While this is normally good, there are cases where a
|
||||
* PGSemaphoreLock will not block. The wakeup is "saved" by the semaphore
|
||||
* implementation. While this is normally good, there are cases where a
|
||||
* saved wakeup might be leftover from a previous operation (for example,
|
||||
* we aborted ProcWaitForSignal just before someone did ProcSendSignal).
|
||||
* So, loop to wait again if the waitStatus shows we haven't been granted
|
||||
@@ -1031,7 +1031,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
|
||||
|
||||
/*
|
||||
* waitStatus could change from STATUS_WAITING to something else
|
||||
* asynchronously. Read it just once per loop to prevent surprising
|
||||
* asynchronously. Read it just once per loop to prevent surprising
|
||||
* behavior (such as missing log messages).
|
||||
*/
|
||||
myWaitStatus = MyProc->waitStatus;
|
||||
@@ -1425,10 +1425,10 @@ check_done:
|
||||
* This can share the semaphore normally used for waiting for locks,
|
||||
* since a backend could never be waiting for a lock and a signal at
|
||||
* the same time. As with locks, it's OK if the signal arrives just
|
||||
* before we actually reach the waiting state. Also as with locks,
|
||||
* before we actually reach the waiting state. Also as with locks,
|
||||
* it's necessary that the caller be robust against bogus wakeups:
|
||||
* always check that the desired state has occurred, and wait again
|
||||
* if not. This copes with possible "leftover" wakeups.
|
||||
* if not. This copes with possible "leftover" wakeups.
|
||||
*/
|
||||
void
|
||||
ProcWaitForSignal(void)
|
||||
@@ -1482,7 +1482,7 @@ ProcSendSignal(int pid)
|
||||
/*
|
||||
* Enable the SIGALRM interrupt to fire after the specified delay
|
||||
*
|
||||
* Delay is given in milliseconds. Caller should be sure a SIGALRM
|
||||
* Delay is given in milliseconds. Caller should be sure a SIGALRM
|
||||
* signal handler is installed before this is called.
|
||||
*
|
||||
* This code properly handles nesting of deadlock timeout alarms within
|
||||
@@ -1533,7 +1533,7 @@ enable_sig_alarm(int delayms, bool is_statement_timeout)
|
||||
* NOTE: in this case it is possible that this routine will be
|
||||
* interrupted by the previously-set timer alarm. This is okay
|
||||
* because the signal handler will do only what it should do according
|
||||
* to the state variables. The deadlock checker may get run earlier
|
||||
* to the state variables. The deadlock checker may get run earlier
|
||||
* than normal, but that does no harm.
|
||||
*/
|
||||
timeout_start_time = GetCurrentTimestamp();
|
||||
|
@@ -77,7 +77,7 @@ s_lock(volatile slock_t *lock, const char *file, int line)
|
||||
*
|
||||
* We time out and declare error after NUM_DELAYS delays (thus, exactly
|
||||
* that many tries). With the given settings, this will usually take 2 or
|
||||
* so minutes. It seems better to fix the total number of tries (and thus
|
||||
* so minutes. It seems better to fix the total number of tries (and thus
|
||||
* the probability of unintended failure) than to fix the total time
|
||||
* spent.
|
||||
*
|
||||
@@ -140,7 +140,7 @@ s_lock(volatile slock_t *lock, const char *file, int line)
|
||||
* Note: spins_per_delay is local within our current process. We want to
|
||||
* average these observations across multiple backends, since it's
|
||||
* relatively rare for this function to even get entered, and so a single
|
||||
* backend might not live long enough to converge on a good value. That
|
||||
* backend might not live long enough to converge on a good value. That
|
||||
* is handled by the two routines below.
|
||||
*/
|
||||
if (cur_delay == 0)
|
||||
@@ -179,7 +179,7 @@ update_spins_per_delay(int shared_spins_per_delay)
|
||||
/*
|
||||
* We use an exponential moving average with a relatively slow adaption
|
||||
* rate, so that noise in any one backend's result won't affect the shared
|
||||
* value too much. As long as both inputs are within the allowed range,
|
||||
* value too much. As long as both inputs are within the allowed range,
|
||||
* the result must be too, so we need not worry about clamping the result.
|
||||
*
|
||||
* We deliberately truncate rather than rounding; this is so that single
|
||||
|
@@ -5,7 +5,7 @@
|
||||
*
|
||||
*
|
||||
* For machines that have test-and-set (TAS) instructions, s_lock.h/.c
|
||||
* define the spinlock implementation. This file contains only a stub
|
||||
* define the spinlock implementation. This file contains only a stub
|
||||
* implementation for spinlocks using PGSemaphores. Unless semaphores
|
||||
* are implemented in a way that doesn't involve a kernel call, this
|
||||
* is too slow to be very useful :-(
|
||||
|
@@ -54,7 +54,7 @@ PageInit(Page page, Size pageSize, Size specialSize)
|
||||
* PageHeaderIsValid
|
||||
* Check that the header fields of a page appear valid.
|
||||
*
|
||||
* This is called when a page has just been read in from disk. The idea is
|
||||
* This is called when a page has just been read in from disk. The idea is
|
||||
* to cheaply detect trashed pages before we go nuts following bogus item
|
||||
* pointers, testing invalid transaction identifiers, etc.
|
||||
*
|
||||
@@ -99,7 +99,7 @@ PageHeaderIsValid(PageHeader page)
|
||||
/*
|
||||
* PageAddItem
|
||||
*
|
||||
* Add an item to a page. Return value is offset at which it was
|
||||
* Add an item to a page. Return value is offset at which it was
|
||||
* inserted, or InvalidOffsetNumber if there's not room to insert.
|
||||
*
|
||||
* If overwrite is true, we just store the item at the specified
|
||||
@@ -699,7 +699,7 @@ PageIndexTupleDelete(Page page, OffsetNumber offnum)
|
||||
* PageIndexMultiDelete
|
||||
*
|
||||
* This routine handles the case of deleting multiple tuples from an
|
||||
* index page at once. It is considerably faster than a loop around
|
||||
* index page at once. It is considerably faster than a loop around
|
||||
* PageIndexTupleDelete ... however, the caller *must* supply the array
|
||||
* of item numbers to be deleted in item number order!
|
||||
*/
|
||||
|
@@ -77,7 +77,7 @@
|
||||
* not needed because of an mdtruncate() operation. The reason for leaving
|
||||
* them present at size zero, rather than unlinking them, is that other
|
||||
* backends and/or the bgwriter might be holding open file references to
|
||||
* such segments. If the relation expands again after mdtruncate(), such
|
||||
* such segments. If the relation expands again after mdtruncate(), such
|
||||
* that a deactivated segment becomes active again, it is important that
|
||||
* such file references still be valid --- else data might get written
|
||||
* out to an unlinked old copy of a segment file that will eventually
|
||||
@@ -114,7 +114,7 @@ static MemoryContext MdCxt; /* context for all md.c allocations */
|
||||
* we keep track of pending fsync operations: we need to remember all relation
|
||||
* segments that have been written since the last checkpoint, so that we can
|
||||
* fsync them down to disk before completing the next checkpoint. This hash
|
||||
* table remembers the pending operations. We use a hash table mostly as
|
||||
* table remembers the pending operations. We use a hash table mostly as
|
||||
* a convenient way of eliminating duplicate requests.
|
||||
*
|
||||
* We use a similar mechanism to remember no-longer-needed files that can
|
||||
@@ -276,7 +276,7 @@ mdcreate(SMgrRelation reln, ForkNumber forkNum, bool isRedo)
|
||||
* During bootstrap, there are cases where a system relation will be
|
||||
* accessed (by internal backend processes) before the bootstrap
|
||||
* script nominally creates it. Therefore, allow the file to exist
|
||||
* already, even if isRedo is not set. (See also mdopen)
|
||||
* already, even if isRedo is not set. (See also mdopen)
|
||||
*/
|
||||
if (isRedo || IsBootstrapProcessingMode())
|
||||
fd = PathNameOpenFile(path, O_RDWR | PG_BINARY, 0600);
|
||||
@@ -318,7 +318,7 @@ mdcreate(SMgrRelation reln, ForkNumber forkNum, bool isRedo)
|
||||
* if the contents of the file were repopulated by subsequent WAL entries.
|
||||
* But if we didn't WAL-log insertions, but instead relied on fsyncing the
|
||||
* file after populating it (as for instance CLUSTER and CREATE INDEX do),
|
||||
* the contents of the file would be lost forever. By leaving the empty file
|
||||
* the contents of the file would be lost forever. By leaving the empty file
|
||||
* until after the next checkpoint, we prevent reassignment of the relfilenode
|
||||
* number until it's safe, because relfilenode assignment skips over any
|
||||
* existing file.
|
||||
@@ -470,7 +470,7 @@ mdextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
|
||||
/*
|
||||
* Note: because caller usually obtained blocknum by calling mdnblocks,
|
||||
* which did a seek(SEEK_END), this seek is often redundant and will be
|
||||
* optimized away by fd.c. It's not redundant, however, if there is a
|
||||
* optimized away by fd.c. It's not redundant, however, if there is a
|
||||
* partial page at the end of the file. In that case we want to try to
|
||||
* overwrite the partial page with a full page. It's also not redundant
|
||||
* if bufmgr.c had to dump another buffer of the same file to make room
|
||||
@@ -770,9 +770,9 @@ mdnblocks(SMgrRelation reln, ForkNumber forknum)
|
||||
* exactly RELSEG_SIZE long, and it's useless to recheck that each time.
|
||||
*
|
||||
* NOTE: this assumption could only be wrong if another backend has
|
||||
* truncated the relation. We rely on higher code levels to handle that
|
||||
* truncated the relation. We rely on higher code levels to handle that
|
||||
* scenario by closing and re-opening the md fd, which is handled via
|
||||
* relcache flush. (Since the bgwriter doesn't participate in relcache
|
||||
* relcache flush. (Since the bgwriter doesn't participate in relcache
|
||||
* flush, it could have segment chain entries for inactive segments;
|
||||
* that's OK because the bgwriter never needs to compute relation size.)
|
||||
*/
|
||||
@@ -965,7 +965,7 @@ mdsync(void)
|
||||
|
||||
/*
|
||||
* If we are in the bgwriter, the sync had better include all fsync
|
||||
* requests that were queued by backends up to this point. The tightest
|
||||
* requests that were queued by backends up to this point. The tightest
|
||||
* race condition that could occur is that a buffer that must be written
|
||||
* and fsync'd for the checkpoint could have been dumped by a backend just
|
||||
* before it was visited by BufferSync(). We know the backend will have
|
||||
@@ -1057,7 +1057,7 @@ mdsync(void)
|
||||
* have been deleted (unlinked) by the time we get to them. Rather
|
||||
* than just hoping an ENOENT (or EACCES on Windows) error can be
|
||||
* ignored, what we do on error is absorb pending requests and
|
||||
* then retry. Since mdunlink() queues a "revoke" message before
|
||||
* then retry. Since mdunlink() queues a "revoke" message before
|
||||
* actually unlinking, the fsync request is guaranteed to be
|
||||
* marked canceled after the absorb if it really was this case.
|
||||
* DROP DATABASE likewise has to tell us to forget fsync requests
|
||||
@@ -1450,7 +1450,7 @@ RememberFsyncRequest(RelFileNode rnode, ForkNumber forknum, BlockNumber segno)
|
||||
|
||||
/*
|
||||
* NB: it's intentional that we don't change cycle_ctr if the entry
|
||||
* already exists. The fsync request must be treated as old, even
|
||||
* already exists. The fsync request must be treated as old, even
|
||||
* though the new request will be satisfied too by any subsequent
|
||||
* fsync.
|
||||
*
|
||||
@@ -1458,7 +1458,7 @@ RememberFsyncRequest(RelFileNode rnode, ForkNumber forknum, BlockNumber segno)
|
||||
* act just as though it wasn't there. The only case where this could
|
||||
* happen would be if a file had been deleted, we received but did not
|
||||
* yet act on the cancel request, and the same relfilenode was then
|
||||
* assigned to a new file. We mustn't lose the new request, but it
|
||||
* assigned to a new file. We mustn't lose the new request, but it
|
||||
* should be considered new not old.
|
||||
*/
|
||||
}
|
||||
@@ -1621,7 +1621,7 @@ _mdfd_getseg(SMgrRelation reln, ForkNumber forknum, BlockNumber blkno,
|
||||
{
|
||||
/*
|
||||
* Normally we will create new segments only if authorized by the
|
||||
* caller (i.e., we are doing mdextend()). But when doing WAL
|
||||
* caller (i.e., we are doing mdextend()). But when doing WAL
|
||||
* recovery, create segments anyway; this allows cases such as
|
||||
* replaying WAL data that has a write into a high-numbered
|
||||
* segment of a relation that was later deleted. We want to go
|
||||
|
@@ -544,7 +544,7 @@ smgrtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks)
|
||||
* Send a shared-inval message to force other backends to close any smgr
|
||||
* references they may have for this rel. This is useful because they
|
||||
* might have open file pointers to segments that got removed, and/or
|
||||
* smgr_targblock variables pointing past the new rel end. (The inval
|
||||
* smgr_targblock variables pointing past the new rel end. (The inval
|
||||
* message will come back to our backend, too, causing a
|
||||
* probably-unnecessary local smgr flush. But we don't expect that this
|
||||
* is a performance-critical path.) As in the unlink code, we want to be
|
||||
|