In 4b4b680c I passed a buffer index number (starting from 0) instead of
a proper Buffer id (which start from 1 for shared buffers) in two
places.
This wasn't noticed so far as one of those locations isn't compiled at
all (PrintPinnedBufs) and the other one (InvalidBuffer) requires a
unlikely, but possible, set of circumstances to trigger a symptom.
To reduce the likelihood of such incidents a bit also convert existing
open coded mappings from buffer descriptors to buffer ids with
BufferDescriptorGetBuffer().
Author: Qingqing Zhou
Reported-By: Qingqing Zhou
Discussion: CAJjS0u2ai9ooUisKtkV8cuVUtEkMTsbK8c7juNAjv8K11zeCQg@mail.gmail.com
Backpatch: 9.5 where the private ref count infrastructure was introduced
If some, but not all, of the length word has already been read, and the
next attempt to read sees exactly the number of bytes needed to complete
the length word, or fewer, then we'll incorrectly read less than all of
the available data.
Antonin Houska
LWLockAttemptLock pointlessly read the lock's state in every loop
iteration, even though pg_atomic_compare_exchange_u32() returns the old
value. Instead do that only once before the loop iteration.
Additionally there's no need to have the expected_state variable,
old_state mostly had the same value anyway.
Noticed-By: Heikki Linnakangas
Backpatch: 9.5, no reason to let the branches diverge at this point
The lwlock scalability work introduced two race conditions into the
lwlock variable support provided for xlog.c. First, and harmlessly on
most platforms, it set/read the variable without the spinlock in some
places. Secondly, due to the removal of the spinlock, it was possible
that a backend missed changes to the variable's state if it changed in
the wrong moment because checking the lock's state, the variable's state
and the queuing are not protected by a single spinlock acquisition
anymore.
To fix first move resetting the variable's from LWLockAcquireWithVar to
WALInsertLockRelease, via a new function LWLockReleaseClearVar. That
prevents issues around waiting for a variable's value to change when a
new locker has acquired the lock, but not yet set the value. Secondly
re-check that the variable hasn't changed after enqueing, that prevents
the issue that the lock has been released and already re-acquired by the
time the woken up backend checks for the lock's state.
Reported-By: Jeff Janes
Analyzed-By: Heikki Linnakangas
Reviewed-By: Heikki Linnakangas
Discussion: 5592DB35.2060401@iki.fi
Backpatch: 9.5, where the lwlock scalability went in
Don't print a WARNING if we get ESRCH from a kill() that's attempting
to cancel an autovacuum worker. It's possible (and has been seen in the
buildfarm) that the worker is already gone by the time we are able to
execute the kill, in which case the failure is harmless. About the only
plausible reason for reporting such cases would be to help debug corrupted
lock table contents, but this is hardly likely to be the most important
symptom if that happens. Moreover issuing a WARNING might scare users
more than is warranted.
Also, since sending a signal to an autovacuum worker is now entirely a
routine thing, and the worker will log the query cancel on its end anyway,
reduce the message saying we're doing that from LOG to DEBUG1 level.
Very minor cosmetic cleanup as well.
Since the main practical reason for doing this is to avoid unnecessary
buildfarm failures, back-patch to all active branches.
1. Replay of the WAL record for setting a bit in the visibility map
contained an assertion that a full-page image of that record type can only
occur with checksums enabled. But it can also happen with wal_log_hints, so
remove the assertion. Unlike checksums, wal_log_hints can be changed on the
fly, so it would be complicated to figure out if it was enabled at the time
that the WAL record was generated.
2. wal_log_hints has the same effect on the locking needed to read the LSN
of a page as data checksums. BufferGetLSNAtomic() didn't get the memo.
Backpatch to 9.4, where wal_log_hints was added.
The argument that this is a sufficiently-expected case to be silently
ignored seems pretty thin. Andres had brought it up back when we were
still considering that most fsync failures should be hard errors, and it
probably would be legit not to fail hard for ETXTBSY --- but the same is
true for EROFS and other cases, which is why we gave up on hard failures.
ETXTBSY is surely not a normal case, so logging the failure seems fine
from here.
Commit 2ce439f337 introduced a rather serious
regression, namely that if its scan of the data directory came across any
un-fsync-able files, it would fail and thereby prevent database startup.
Worse yet, symlinks to such files also caused the problem, which meant that
crash restart was guaranteed to fail on certain common installations such
as older Debian.
After discussion, we agreed that (1) failure to start is worse than any
consequence of not fsync'ing is likely to be, therefore treat all errors
in this code as nonfatal; (2) we should not chase symlinks other than
those that are expected to exist, namely pg_xlog/ and tablespace links
under pg_tblspc/. The latter restriction avoids possibly fsync'ing a
much larger part of the filesystem than intended, if the user has left
random symlinks hanging about in the data directory.
This commit takes care of that and also does some code beautification,
mainly moving the relevant code into fd.c, which seems a much better place
for it than xlog.c, and making sure that the conditional compilation for
the pre_sync_fname pass has something to do with whether pg_flush_data
works.
I also relocated the call site in xlog.c down a few lines; it seems a
bit silly to be doing this before ValidateXLOGDirectoryStructure().
The similar logic in initdb.c ought to be made to match this, but that
change is noncritical and will be dealt with separately.
Back-patch to all active branches, like the prior commit.
Abhijit Menon-Sen and Tom Lane
Ensure that we null-terminate the result string (one place in pg_rewind).
Be paranoid about out-of-range results from readlink() (should not happen,
but there is no good reason for some call sites to be careful about it and
others not). Consistently use the whole buffer, not sometimes one byte
less. Ensure we emit an appropriate errcode() in all cases. Spell the
error messages the same way.
The only serious bug here is the missing null-termination in pg_rewind,
which is new code, so no need for a back-patch.
Abhijit Menon-Sen and Tom Lane
We want this struct to be exactly a series of 3 int16 words, no more
and no less. Historically, at least, some ARM compilers preferred to
pad it to 8 bytes unless coerced. Our old way of doing that was just
to use __attribute__((packed)), but as pointed out by Piotr Stefaniak,
that does too much: it also licenses the compiler to give the struct
only byte-alignment. We don't want that because it adds access overhead,
possibly quite significant overhead. According to the GCC manual, what
we want requires also specifying __attribute__((align(2))). It's not
entirely clear if all the relevant compilers accept this pragma as well,
but we can hope the buildfarm will tell us if not. We can also add a
static assertion that should fire if the compiler padded the struct.
Since the combination of these pragmas should define exactly what we
want on any compiler that accepts them, let's try using them wherever
we think they exist, not only for __arm__. (This is likely to expose
that the conditional definitions in c.h are inadequate, but finding
that out would be a good thing.)
The immediate motivation for this is that the current definition of
ExecRowMark allows its curCtid field to be misaligned. It is not clear
whether there are any other uses of ItemPointerData with a similar hazard.
We could change the definition of ExecRowMark if this doesn't work, but
it would be far better to have a future-proof fix.
Piotr Stefaniak, some further hacking by me
Use "a" and "an" correctly, mostly in comments. Two error messages were
also fixed (they were just elogs, so no translation work required). Two
function comments in pg_proc.h were also fixed. Etsuro Fujita reported one
of these, but I found a lot more with grep.
Also fix a few other typos spotted while grepping for the a/an typos.
For example, "consists out of ..." -> "consists of ...". Plus a "though"/
"through" mixup reported by Euler Taveira.
Many of these typos were in old code, which would be nice to backpatch to
make future backpatching easier. But much of the code was new, and I didn't
feel like crafting separate patches for each branch. So no backpatching.
The point of the assertion is to ensure that the arrays allocated in stack
are large enough, but the check was one item short.
This won't matter in practice because MaxIndexTuplesPerPage is an
overestimate, so you can't have that many items on a page in reality.
But let's be tidy.
Spotted by Anastasia Lubennikova. Backpatch to all supported versions, like
the patch that added the assertion.
The newly added ON CONFLICT clause allows to specify an alternative to
raising a unique or exclusion constraint violation error when inserting.
ON CONFLICT refers to constraints that can either be specified using a
inference clause (by specifying the columns of a unique constraint) or
by naming a unique or exclusion constraint. DO NOTHING avoids the
constraint violation, without touching the pre-existing row. DO UPDATE
SET ... [WHERE ...] updates the pre-existing tuple, and has access to
both the tuple proposed for insertion and the existing tuple; the
optional WHERE clause can be used to prevent an update from being
executed. The UPDATE SET and WHERE clauses have access to the tuple
proposed for insertion using the "magic" EXCLUDED alias, and to the
pre-existing tuple using the table name or its alias.
This feature is often referred to as upsert.
This is implemented using a new infrastructure called "speculative
insertion". It is an optimistic variant of regular insertion that first
does a pre-check for existing tuples and then attempts an insert. If a
violating tuple was inserted concurrently, the speculatively inserted
tuple is deleted and a new attempt is made. If the pre-check finds a
matching tuple the alternative DO NOTHING or DO UPDATE action is taken.
If the insertion succeeds without detecting a conflict, the tuple is
deemed inserted.
To handle the possible ambiguity between the excluded alias and a table
named excluded, and for convenience with long relation names, INSERT
INTO now can alias its target table.
Bumps catversion as stored rules change.
Author: Peter Geoghegan, with significant contributions from Heikki
Linnakangas and Andres Freund. Testing infrastructure by Jeff Janes.
Reviewed-By: Heikki Linnakangas, Andres Freund, Robert Haas, Simon Riggs,
Dean Rasheed, Stephen Frost and many others.
pg_win32_is_junction() was a typo for pgwin32_is_junction(). open()
was used not only in a two-argument form, which breaks on Windows,
but also where BasicOpenFile() should have been used.
Per reports from Andrew Dunstan and David Rowley.
Otherwise, if there's another crash, some writes from after the first
crash might make it to disk while writes from before the crash fail
to make it to disk. This could lead to data corruption.
Back-patch to all supported versions.
Abhijit Menon-Sen, reviewed by Andres Freund and slightly revised
by me.
This does four basic things. First, it provides convenience routines
to coordinate the startup and shutdown of parallel workers. Second,
it synchronizes various pieces of state (e.g. GUCs, combo CID
mappings, transaction snapshot) from the parallel group leader to the
worker processes. Third, it prohibits various operations that would
result in unsafe changes to that state while parallelism is active.
Finally, it propagates events that would result in an ErrorResponse,
NoticeResponse, or NotifyResponse message being sent to the client
from the parallel workers back to the master, from which they can then
be sent on to the client.
Robert Haas, Amit Kapila, Noah Misch, Rushabh Lathia, Jeevan Chalke.
Suggestions and review from Andres Freund, Heikki Linnakangas, Noah
Misch, Simon Riggs, Euler Taveira, and Jim Nasby.
When implementing a replication solution ontop of logical decoding, two
related problems exist:
* How to safely keep track of replication progress
* How to change replication behavior, based on the origin of a row;
e.g. to avoid loops in bi-directional replication setups
The solution to these problems, as implemented here, consist out of
three parts:
1) 'replication origins', which identify nodes in a replication setup.
2) 'replication progress tracking', which remembers, for each
replication origin, how far replay has progressed in a efficient and
crash safe manner.
3) The ability to filter out changes performed on the behest of a
replication origin during logical decoding; this allows complex
replication topologies. E.g. by filtering all replayed changes out.
Most of this could also be implemented in "userspace", e.g. by inserting
additional rows contain origin information, but that ends up being much
less efficient and more complicated. We don't want to require various
replication solutions to reimplement logic for this independently. The
infrastructure is intended to be generic enough to be reusable.
This infrastructure also replaces the 'nodeid' infrastructure of commit
timestamps. It is intended to provide all the former capabilities,
except that there's only 2^16 different origins; but now they integrate
with logical decoding. Additionally more functionality is accessible via
SQL. Since the commit timestamp infrastructure has also been introduced
in 9.5 (commit 73c986add) changing the API is not a problem.
For now the number of origins for which the replication progress can be
tracked simultaneously is determined by the max_replication_slots
GUC. That GUC is not a perfect match to configure this, but there
doesn't seem to be sufficient reason to introduce a separate new one.
Bumps both catversion and wal page magic.
Author: Andres Freund, with contributions from Petr Jelinek and Craig Ringer
Reviewed-By: Heikki Linnakangas, Petr Jelinek, Robert Haas, Steve Singer
Discussion: 20150216002155.GI15326@awork2.anarazel.de,
20140923182422.GA15776@alap3.anarazel.de,
20131114172632.GE7522@alap2.anarazel.de
BackendIdGetTransactionIds() neglected the possibility that the PROC
pointer in a ProcState array entry is null. In current usage, this could
only crash if the other backend had exited since pgstat_read_current_status
saw it as active, which is a pretty narrow window. But it's reachable in
the field, per bug #12918 from Vladimir Borodin.
Back-patch to 9.4 where the faulty code was introduced.
Even though the main benefit of the Lehman and Yao algorithm for
btrees is that no locks need be held between page reads in an
index search, we were holding a buffer pin on each leaf page after
it was read until we were ready to read the next one. The reason
was so that we could treat this as a weak lock to create an
"interlock" with vacuum's deletion of heap line pointers, even
though our README file pointed out that this was not necessary for
a scan using an MVCC snapshot.
The main goal of this patch is to reduce the blocking of vacuum
processes by in-progress btree index scans (including a cursor
which is idle), but the code rearrangement also allows for one
less buffer content lock to be taken when a forward scan steps from
one page to the next, which results in a small but consistent
performance improvement in many workloads.
This patch leaves behavior unchanged for some cases, which can be
addressed separately so that each case can be evaluated on its own
merits. These unchanged cases are when a scan uses a non-MVCC
snapshot, an index-only scan, and a scan of a btree index for which
modifications are not WAL-logged. If later patches allow all of
these cases to drop the buffer pin after reading a leaf page, then
the btree vacuum process can be simplified; it will no longer need
the "super-exclusive" lock to delete tuples from a page.
Reviewed by Heikki Linnakangas and Kyotaro Horiguchi
dsm_control->nitems never decreases, so this is testing whether the
server has *ever* run out of DSM segments, not whether it is
*currently* out of DSM segments.
Reported off-list by Amit Kapila.
Right now, there's only one flag, DSM_CREATE_NULL_IF_MAXSEGMENTS,
which suppresses the error that would normally be thrown when the
maximum number of segments already exists, instead returning NULL.
It might be useful to add more flags in the future, such as one to
ignore allocation errors, but I haven't done that here.
When LockBufferForCleanup() has to wait for getting a cleanup lock on a
buffer it does so by setting a flag in the buffer header and then wait
for other backends to signal it using ProcWaitForSignal().
Unfortunately LockBufferForCleanup() missed that ProcWaitForSignal() can
return for other reasons than the signal it is hoping for. If such a
spurious signal arrives the wait flags on the buffer header will still
be set. That then triggers "ERROR: multiple backends attempting to wait
for pincount 1".
The fix is simple, unset the flag if still set when retrying. That
implies an additional spinlock acquisition/release, but that's unlikely
to matter given the cost of waiting for a cleanup lock. Alternatively
it'd have been possible to move responsibility for maintaining the
relevant flag to the waiter all together, but that might have had
negative consequences due to possible floods of signals. Besides being
more invasive.
This looks to be a very longstanding bug. The relevant code in
LockBufferForCleanup() hasn't changed materially since its introduction
and ProcWaitForSignal() was documented to return for unrelated reasons
since 8.2. The master only patch series removing ImmediateInterruptOK
made it much easier to hit though, as ProcSendSignal/ProcWaitForSignal
now uses a latch shared with other tasks.
Per discussion with Kevin Grittner, Tom Lane and me.
Backpatch to all supported branches.
Discussion: 11553.1423805224@sss.pgh.pa.us
The remaining caller (lwlocks) doesn't need that facility, and we plan
to remove ImmedidateInterruptOK entirely. That means that interrupts
can't be serviced race-free and portably anyway, so there's little
reason for keeping the feature.
Reviewed-By: Heikki Linnakangas
Deadlock checking was performed inside signal handlers up to
now. While it's a remarkable feat to have made this work reliably,
it's quite complex to understand why that is the case. Partially it
worked due to the assumption that semaphores are signal safe - which
is not actually documented to be the case for sysv semaphores.
The reason we had to rely on performing this work inside signal
handlers is that semaphores aren't guaranteed to be interruptable by
signals on all platforms. But now that latches provide a somewhat
similar API, which actually has the guarantee of being interruptible,
we can avoid doing so.
Signalling between ProcSleep, ProcWakeup, ProcWaitForSignal and
ProcSendSignal is now done using latches. This increases the
likelihood of spurious wakeups. As spurious wakeup already were
possible and aren't likely to be frequent enough to be an actual
problem, this seems acceptable.
This change would allow for further simplification of the deadlock
checking, now that it doesn't have to run in a signal handler. But
even if I were motivated to do so right now, it would still be better
to do that separately. Such a cleanup shouldn't have to be reviewed a
the same time as the more fundamental changes in this commit.
There is one possible usability regression due to this commit. Namely
it is more likely than before that log_lock_waits messages are output
more than once.
Reviewed-By: Heikki Linnakangas
Up to now large swathes of backend code ran inside signal handlers
while reading commands from the client, to allow for speedy reaction to
asynchronous events. Most prominently shared invalidation and NOTIFY
handling. That means that complex code like the starting/stopping of
transactions is run in signal handlers... The required code was
fragile and verbose, and is likely to contain bugs.
That approach also severely limited what could be done while
communicating with the client. As the read might be from within
openssl it wasn't safely possible to trigger an error, e.g. to cancel
a backend in idle-in-transaction state. We did that in some cases,
namely fatal errors, nonetheless.
Now that FE/BE communication in the backend employs non-blocking
sockets and latches to block, we can quite simply interrupt reads from
signal handlers by setting the latch. That allows us to signal an
interrupted read, which is supposed to be retried after returning from
within the ssl library.
As signal handlers now only need to set the latch to guarantee timely
interrupt processing, remove a fair amount of complicated & fragile
code from async.c and sinval.c.
We could now actually start to process some kinds of interrupts, like
sinval ones, more often that before, but that seems better done
separately.
This work will hopefully allow to handle cases like being blocked by
sending data, interrupting idle transactions and similar to be
implemented without too much effort. In addition to allowing getting
rid of ImmediateInterruptOK, that is.
Author: Andres Freund
Reviewed-By: Heikki Linnakangas
The logic to compact away removed tuples from page was duplicated with
small differences in PageRepairFragmentation, PageIndexMultiDelete, and
PageIndexDeleteNoCompact. Put it into a common function.
Reviewed by Peter Geoghegan.
If any error occurred while we were in the middle of reading a protocol
message from the client, we could lose sync, and incorrectly try to
interpret a part of another message as a new protocol message. That will
usually lead to an "invalid frontend message" error that terminates the
connection. However, this is a security issue because an attacker might
be able to deliberately cause an error, inject a Query message in what's
supposed to be just user data, and have the server execute it.
We were quite careful to not have CHECK_FOR_INTERRUPTS() calls or other
operations that could ereport(ERROR) in the middle of processing a message,
but a query cancel interrupt or statement timeout could nevertheless cause
it to happen. Also, the V2 fastpath and COPY handling were not so careful.
It's very difficult to recover in the V2 COPY protocol, so we will just
terminate the connection on error. In practice, that's what happened
previously anyway, as we lost protocol sync.
To fix, add a new variable in pqcomm.c, PqCommReadingMsg, that is set
whenever we're in the middle of reading a message. When it's set, we cannot
safely ERROR out and continue running, because we might've read only part
of a message. PqCommReadingMsg acts somewhat similarly to critical sections
in that if an error occurs while it's set, the error handler will force the
connection to be terminated, as if the error was FATAL. It's not
implemented by promoting ERROR to FATAL in elog.c, like ERROR is promoted
to PANIC in critical sections, because we want to be able to use
PG_TRY/CATCH to recover and regain protocol sync. pq_getmessage() takes
advantage of that to prevent an OOM error from terminating the connection.
To prevent unnecessary connection terminations, add a holdoff mechanism
similar to HOLD/RESUME_INTERRUPTS() that can be used hold off query cancel
interrupts, but still allow die interrupts. The rules on which interrupts
are processed when are now a bit more complicated, so refactor
ProcessInterrupts() and the calls to it in signal handlers so that the
signal handlers always call it if ImmediateInterruptOK is set, and
ProcessInterrupts() can decide to not do anything if the other conditions
are not met.
Reported by Emil Lenngren. Patch reviewed by Noah Misch and Andres Freund.
Backpatch to all supported versions.
Security: CVE-2015-0244
GetLockConflicts() has for a long time not properly terminated the
returned array. During normal processing the returned array is zero
initialized which, while not pretty, is sufficient to be recognized as
a invalid virtual transaction id. But the HotStandby case is more than
aesthetically broken: The allocated (and reused) array is neither
zeroed upon allocation, nor reinitialized, nor terminated.
Not having a terminating element means that the end of the array will
not be recognized and that recovery conflict handling will thus read
ahead into adjacent memory. Only terminating when hitting memory
content that looks like a invalid virtual transaction id. Luckily
this seems so far not have caused significant problems, besides making
recovery conflict more expensive.
Discussion: 20150127142713.GD29457@awork2.anarazel.de
Backpatch into all supported branches.
Benchmarks has shown that aligning the buffer descriptor array to
cache lines is important for scalability; especially on bigger,
multi-socket, machines.
Currently the array sometimes already happens to be aligned by
happenstance, depending how large previous shared memory allocations
were. That can lead to wildly varying performance results after minor
configuration changes.
In addition to aligning the start of descriptor array, also force the
size of individual descriptors to be of a common cache line size (64
bytes). That happens to already be the case on 64bit platforms, but
this way we can change the struct BufferDesc more easily.
As the alignment primarily matters in highly concurrent workloads
which probably all are 64bit these days, and the space wastage of
element alignment would be a bit more noticeable on 32bit systems, we
don't force the stride to be cacheline sized on 32bit platforms for
now. If somebody does actual performance testing, we can reevaluate
that decision by changing the definition of BUFFERDESC_PADDED_SIZE.
Discussion: 20140202151319.GD32123@awork2.anarazel.de
Per discussion with Bruce Momjan, Tom Lane, Robert Haas, and Peter
Geoghegan.
As noted by Tom Lane the improvements in 4b4b680c3d had the problem
that in some situations we searched, entered and modified entries in
the private refcount hash while holding a spinlock. I had tried to
keep the logic entirely local to PinBuffer_Locked(), but that's not
really possible given it's called with a spinlock held...
Besides being disadvantageous from a performance point of view, this
also has problems with error handling safety. If we failed inserting
an entry into the hashtable due to an out of memory error, we'd error
out with a held spinlock. Not good.
Change the way private refcounts are manipulated: Before a buffer can
be tracked an entry has to be reserved using
ReservePrivateRefCountEntry(); then, if a entry is not found using
GetPrivateRefCountEntry(), it can be entered with
NewPrivateRefCountEntry().
Also take advantage of the fact that PinBuffer_Locked() currently is
never called for buffers that already have been pinned by the current
backend and don't search the private refcount entries for preexisting
local pins. That results in a small, but measurable, performance
improvement.
Additionally make ReleaseBuffer() always call UnpinBuffer() for shared
buffers. That avoids duplicating work in an eventual UnpinBuffer()
call that already has been done in ReleaseBuffer() and also saves some
code.
Per discussion with Tom Lane.
Discussion: 15028.1418772313@sss.pgh.pa.us
To do so, move InitializeLatchSupport() into the new common process
initialization functions, and add a new global variable MyLatch.
MyLatch is usable as soon InitPostmasterChild() has been called
(i.e. very early during startup). Initially it points to a process
local latch that exists in all processes. InitProcess/InitAuxiliaryProcess
then replaces that local latch with PGPROC->procLatch. During shutdown
the reverse happens.
This is primarily advantageous for two reasons: For one it simplifies
dealing with the shared process latch, especially in signal handlers,
because instead of having to check for MyProc, MyLatch can be used
unconditionally. For another, a later patch that makes FEs/BE
communication use latches, now can rely on the existence of a latch,
even before having gone through InitProcess.
Discussion: 20140927191243.GD5423@alap3.anarazel.de
Since their introduction latches have required barriers in SetLatch
and ResetLatch - but when they were introduced there wasn't any
barrier abstraction. Instead latches were documented to rely on the
callsites to provide barrier semantics.
Now that the barrier support looks halfway complete, add the necessary
barriers to both latch implementations.
Also remove a now superflous lock acquisition from syncrep.c and a
superflous (and insufficient) barrier from freelist.c. There might be
other cases that can now be simplified, but those are the only ones
I've seen on a quick scan.
We might want to backpatch this at some later point, but right now the
barrier infrastructure in the backbranches isn't totally on par with
master.
Discussion: 20150112154026.GB2092@awork2.anarazel.de
Back in ed0b409, PGPROC was split and moved to static variables in
procarray.c, with procs in ProcArrayStruct replaced by an array of
integers representing process numbers (pgprocnos), with -1 indicating a
dead process which has yet to be removed. Access to procArray is
generally done under ProcArrayLock and therefore most code does not have
to concern itself with -1 entries.
However, MinimumActiveBackends intentionally does not take
ProcArrayLock, which means it has to be extra careful when accessing
procArray. Prior to ed0b409, this was handled by checking for a NULL
in the pointer array, but that check was no longer valid after the
split. Coverity pointed out that the check could never happen and so
it was removed in 5592eba. That didn't make anything worse, but it
didn't fix the issue either.
The correct fix is to check for pgprocno == -1 and skip over that entry
if it is encountered.
Back-patch to 9.2, since there can be attempts to access the arrays
prior to their start otherwise. Note that the changes prior to 9.4 will
look a bit different due to the change in 5592eba.
Note that MinimumActiveBackends only returns a bool for heuristic
purposes and any pre-array accesses are strictly read-only and so there
is no security implication and the lack of fields complaints indicates
it's very unlikely to run into issues due to this.
Pointed out by Noah.