Since commit 2f1d2b7a we have set PAM_RHOST to "[local]" for Unix
sockets. This caused Linux PAM's libaudit integration to make DNS
requests for that name. It's not exactly clear what value PAM_RHOST
should have in that case, but it seems clear that we shouldn't set it
to an unresolvable name, so don't do that.
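In code terms, the fix is simply to skip the pam_set_item() call for local
connections, roughly (a sketch of the auth.c logic):

    /* sketch: only set PAM_RHOST for actual network connections */
    if (port->hba->conntype != ctLocal)
    {
        /* resolve or stringify the client address into hostinfo ... */
        retval = pam_set_item(pamh, PAM_RHOST, hostinfo);
    }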
Back-patch to 9.6. Bug #15520.
Author: Thomas Munro
Reviewed-by: Peter Eisentraut
Reported-by: Albert Schabhuetl
Discussion: https://postgr.es/m/15520-4c266f986998e1c5%40postgresql.org
During table rewrites (VACUUM FULL and CLUSTER), the main heap is logged
using XLOG / FPI records, and thus (correctly) ignored in decoding.
But the associated TOAST table is WAL-logged as plain INSERT records,
and so was logically decoded and passed to reorder buffer.
That has severe consequences with TOAST tables of non-trivial size.
Firstly, reorder buffer has to keep all those changes, possibly spilling
them to a file, incurring I/O costs and disk space.
Secondly, ReorderBufferCommit() was stashing all those TOAST chunks into
a hash table, which got discarded only after processing the row from the
main heap. But as the main heap is not decoded for rewrites, this never
happened, so all the TOAST data accumulated in memory, resulting either
in excessive memory consumption or OOM.
The fix is simple, as commit e9edc1ba already introduced infrastructure
(namely HEAP_INSERT_NO_LOGICAL flag) to skip logical decoding of TOAST
tables, but it only applied it to system tables. So simply use it for
all TOAST data in raw_heap_insert().
That would however solve only the memory consumption issue - the TOAST
changes would still be decoded and added to the reorder buffer, and
spilled to disk (although without TOAST tuple data, so much smaller).
But we can solve that by tweaking DecodeInsert() to ignore INSERT
records that lack the XLH_INSERT_CONTAINS_NEW_TUPLE flag altogether,
instead of skipping them later in ReorderBufferCommit().
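The decode-side change is essentially an early exit (a sketch; names as in
src/backend/replication/logical/decode.c):

    /* In DecodeInsert(): skip records carrying no logically-decodable tuple */
    xl_heap_insert *xlrec = (xl_heap_insert *) XLogRecGetData(buf->record);

    /*
     * Ignore insert records without new tuples; this happens when
     * raw_heap_insert marks the TOAST record with HEAP_INSERT_NO_LOGICAL.
     */
    if (!(xlrec->flags & XLH_INSERT_CONTAINS_NEW_TUPLE))
        return;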
Reviewed-by: Masahiko Sawada
Discussion: https://www.postgresql.org/message-id/flat/1a17c643-e9af-3dba-486b-fbe31bc1823a%402ndquadrant.com
Backpatch: 9.4-, where logical decoding was introduced
Unfortunately ac218aa4f6 missed the fact that a reference to
'pg_catalog.regnamespace'::regtype wouldn't work before that type is
known. Fix that by replacing the regtype usage with a join to
pg_type.
Reported-By: Tom Lane
Author: Andres Freund
Discussion: https://postgr.es/m/8863.1543297423@sss.pgh.pa.us
Backpatch: 9.5-, like ac218aa4f6
When the regrole (0c90f6769) and regnamespace (cb9fa802b) types were
added in 9.5, pg_upgrade's check for reg* types wasn't updated. While
regrole currently is safe, regnamespace is not.
It seems unlikely that anybody uses regnamespace inside catalog tables
across a pg_upgrade, but the tests should be correct nevertheless.
While at it, reorder the types checked in the query to be
alphabetical. Otherwise it's annoying to compare the existing types
with the ones being tested for.
Author: Andres Freund
Discussion: https://postgr.es/m/037e152a-cb25-3bcb-4f35-bdc9988f8204@2ndQuadrant.com
Backpatch: 9.5-, as regrole/regnamespace
latex_escaped_print() mistranslated \ and failed to provide any translation
for #, ^, and ~, all of which would typically lead to LaTeX document syntax
errors. In addition it didn't translate <, >, or |, which would typically
render as unexpected characters.
To some extent this represents shortcomings in ancient versions of LaTeX,
which if memory serves had no easy way to render these control characters
as ASCII text. But that's been fixed for, um, decades. In any case there
is no value in emitting guaranteed-to-fail output for these characters.
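For illustration, the corrected escaping looks roughly like this (a condensed
sketch, not the exact psql code):

    /* condensed sketch of latex_escaped_print()'s per-character handling */
    switch (ch)
    {
        case '\\':
            /* not "\\\\", which LaTeX reads as a line break */
            fputs("\\textbackslash{}", fout);
            break;
        case '#':
        case '%':
        case '&':
        case '_':
            fprintf(fout, "\\%c", ch);
            break;
        case '^':
            fputs("\\^{}", fout);
            break;
        case '~':
            fputs("\\~{}", fout);
            break;
        case '<':
            fputs("\\textless{}", fout);
            break;
        case '>':
            fputs("\\textgreater{}", fout);
            break;
        case '|':
            fputs("\\textbar{}", fout);
            break;
        default:
            fputc(ch, fout);
            break;
    }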
Noted while fooling with test cases added by commit 9a98984f4. Back-patch
the code change to all supported versions.
Hstore data generated on pg 8.4 and pg_upgraded to current versions
remains in its original on-disk format unless modified. The same goes
for values generated by the addon hstore-new module on pre-9.0
versions. (The hstoreUpgrade function converts old values on the fly
when read in, but the on-disk value is not modified by this.)
Since old-format empty hstores (and hstore-new hstores) have
representations compatible with the new format, hstoreUpgrade thought
it could get away without modifying such values; but this breaks
hstore_hash (and the new hstore_hash_extended) which assumes
bit-perfect matching between semantically identical hstore values.
Only one bit actually differs (the "new version" flag in the count
field) but that of course is enough to break the hash.
Fix by making hstoreUpgrade unconditionally convert all old values to
new format.
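In outline (a simplified sketch, with a hypothetical helper name for the
rewrite step):

    /* simplified sketch of hstoreUpgrade(), not the exact contrib code */
    HStore *
    hstoreUpgrade(Datum orig)
    {
        HStore *hs = (HStore *) PG_DETOAST_DATUM_COPY(orig);

        /* New-format values carry HS_FLAG_NEWVERSION in the count field. */
        if (hs->size_ & HS_FLAG_NEWVERSION)
            return hs;

        /*
         * Previously, old-format values whose bits happened to be compatible
         * (e.g. empty hstores) were returned unchanged here; now they are
         * always rewritten, so the flag bit gets set and hashing sees
         * bit-identical representations of identical values.
         */
        convert_old_hstore(hs);     /* hypothetical name for the rewrite */
        return hs;
    }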
Backpatch all the way, even though this changes a hash value in some
cases, because in those cases the hash value is already failing - for
example, a hash join between old- and new-format empty hstores will be
failing to match, or a hash index on an hstore column containing an
old-format empty value will be failing to find the value since it will
be searching for a hash derived from a new-format datum. (There are no
known field reports of this happening, probably because hashing of
hstores has only been useful in limited circumstances and there
probably isn't much upgraded data being used this way.)
Per concerns arising from discussion of commit eb6f29141be. Original
bug is my fault.
Discussion: https://postgr.es/m/60b1fd3b-7332-40f0-7e7f-f2f04f777747%402ndquadrant.com
ftoi4 and its sibling coercion functions did their overflow checks in
a way that looked superficially plausible, but actually depended on an
assumption that the MIN and MAX comparison constants can be represented
exactly in the float4 or float8 domain. That fails in ftoi4, ftoi8,
and dtoi8, resulting in a possibility that values near the MAX limit will
be wrongly converted (to negative values) when they need to be rejected.
Also, because we compared before rounding off the fractional part,
the other three functions threw errors for values that really ought
to get rounded to the min or max integer value.
Fix by doing rint() first (requiring an assumption that it handles
NaN and Inf correctly; but dtoi8 and ftoi8 were assuming that already),
and by comparing to values that should coerce to float exactly, namely
INTxx_MIN and -INTxx_MIN. Also remove some random cosmetic discrepancies
between these six functions.
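As a sketch, ftoi4 now looks roughly like this; note that (float4)
PG_INT32_MIN is exact because -2^31 is a power of two, while (float4)
PG_INT32_MAX would not be:

    /* rough sketch of the fixed ftoi4 */
    Datum
    ftoi4(PG_FUNCTION_ARGS)
    {
        float4      num = PG_GETARG_FLOAT4(0);

        /*
         * Round off the fraction first, so that just-out-of-range values
         * that round into range are accepted.  We assume rint() passes
         * NaN and Inf through unchanged.
         */
        num = rint(num);

        /* Compare against bounds that are exactly representable in float4. */
        if (isnan(num) ||
            num < (float4) PG_INT32_MIN ||
            num >= -((float4) PG_INT32_MIN))
            ereport(ERROR,
                    (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
                     errmsg("integer out of range")));

        PG_RETURN_INT32((int32) num);
    }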
This back-patches commits cbdb8b4c0 and 452b637d4. In the 9.4 branch,
also back-patch the portion of 62e2a8dc2 that added PG_INTnn_MIN and
related constants to c.h, so that these functions can rely on them.
Per bug #15519 from Victor Petrovykh.
Patch by me; thanks to Andrew Gierth for analysis and discussion.
Discussion: https://postgr.es/m/15519-4fc785b483201ff1@postgresql.org
1. Integer overflow in internal_size could result in memory corruption
in decompression since a zero-length array would be allocated and then
written to. This leads to crashes or corruption when traversing an
index which has been populated with sufficiently sparse values. Fix by
using int64 for computations and checking for overflow (see the sketch
after this list).
2. Integer overflow in g_int_compress could cause pessimal merge
choices, resulting in unnecessarily large ranges (which would in turn
trigger issue 1 above). Fix by using int64 again.
3. Even without overflow, array sizes could become large enough to
cause unexplained memory allocation errors. Fix by capping the sizes
to a safe limit and reporting actual errors that point at
gist__intbig_ops as needed.
4. Large inputs to the compression function always consist of large
runs of consecutive integers, and the compression loop was processing
these one at a time in an O(N^2) manner with a lot of overhead. The
expected runtime of this function could easily exceed 6 months for a
single call as a result. Fix by performing a linear-time first pass,
which reduces the worst case to something on the order of seconds.
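A minimal sketch of the fix for issue 1, with the size arithmetic done in
int64:

    /* sketch: internal_size() computing the total range size in int64 */
    static int64
    internal_size(int *a, int len)
    {
        int64       size = 0;

        for (int i = 0; i < len; i += 2)
        {
            if (!i || a[i] != a[i - 1]) /* don't count a repeated boundary */
                size += (int64) a[i + 1] - (int64) a[i] + 1;
        }
        return size;
    }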
Backpatch all the way, since this has been wrong forever.
Per bug #15518 reported by irc user "dymk"; analysis and patch by me.
Discussion: https://postgr.es/m/15518-799e426c3b4f8358@postgresql.org
The documentation claimed that an enum type requires "one or more"
labels, but since 1fd9883ff49, zero labels are also allowed.
Reported-by: Lukas Eder <lukas.eder@gmail.com>
Bug: #15356
The test case that Autoconf uses to discover whether a function has
been declared doesn't work reliably with clang, because clang reports
a warning not an error if the name is a known built-in function.
On some platforms, this results in a lot of compile-time warnings about
strlcpy and related functions not having been declared.
There is a fix for this (by Noah Misch) in the upstream Autoconf sources,
but since they've not made a release in years and show no indication of
doing so anytime soon, let's just absorb their fix directly. We can
revert this when and if we update to a newer Autoconf release.
Back-patch to all supported branches.
Discussion: https://postgr.es/m/26819.1542515567@sss.pgh.pa.us
On some operating systems, it doesn't make sense to retry fsync(),
because dirty data cached by the kernel may have been dropped on
write-back failure. In that case the only remaining copy of the
data is in the WAL. A subsequent fsync() could appear to succeed,
but not have flushed the data. That means that a future checkpoint
could apparently complete successfully but have lost data.
Therefore, violently prevent any future checkpoint attempts by
panicking on the first fsync() failure. Note that we already
did the same for WAL data; this change extends that behavior to
non-temporary data files.
Provide a GUC data_sync_retry to control this new behavior, for
users of operating systems that don't eject dirty data, and possibly
forensic/testing uses. If it is set to on and the write-back error
was transient, a later checkpoint might genuinely succeed (on a
system that does not throw away buffers on failure); if the error is
permanent, later checkpoints will continue to fail. The GUC defaults
to off, meaning that we panic.
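The heart of the mechanism is a tiny helper that chooses the error level for
data-file sync failures (a sketch matching the behavior described above):

    /* sketch: promote fsync() failure to PANIC unless retries are allowed */
    int
    data_sync_elevel(int elevel)
    {
        return data_sync_retry ? elevel : PANIC;
    }

Callers then report sync failures with ereport(data_sync_elevel(ERROR), ...)
rather than a hard-coded ERROR.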
Back-patch to all supported releases.
There is still a narrow window for error-loss on some operating
systems: if the file is closed and later reopened and a write-back
error occurs in the intervening time, but the inode has the bad
luck to be evicted due to memory pressure before we reopen, we could
miss the error. A later patch will address that with a scheme
for keeping files with dirty data open at all times, but we judge
that to be too complicated to back-patch.
Author: Craig Ringer, with some adjustments by Thomas Munro
Reported-by: Craig Ringer
Reviewed-by: Robert Haas, Thomas Munro, Andres Freund
Discussion: https://postgr.es/m/20180427222842.in2e4mibx45zdth5%40alap3.anarazel.de
If fsync() fails, md.c must keep the request in its bitmap, so that
future attempts will try again.
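Schematically, the error path must not clear the pending-request bit (a
sketch, not the exact md.c code):

    /* sketch: report the failure while leaving the request in the bitmap */
    if (FileSync(seg->mdfd_vfd, WAIT_EVENT_DATA_FILE_SYNC) < 0)
        ereport(data_sync_elevel(ERROR),
                (errcode_for_file_access(),
                 errmsg("could not fsync file \"%s\": %m", path)));
    /* only on success is the request forgotten */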
Back-patch to all supported releases.
Author: Thomas Munro
Reviewed-by: Amit Kapila
Reported-by: Andrew Gierth
Discussion: https://postgr.es/m/87y3i1ia4w.fsf%40news-spur.riddles.org.uk
wcsrtombs (called through wchar2char from common functions like lower,
upper, etc.) uses various optimizations that may look like access to
uninitialized data, triggering valgrind reports.
For example AVX2 instructions load data in 256-bit chunks, and gconv
does something similar with 32-bit chunks. This is faster than accessing
the bytes one by one, and the uninitialized part of the buffer is not
actually used. So suppress the bogus reports.
The exact stack depends on possible optimizations - it might be AVX, SSE
(as in the report by Aleksander Alekseev) or something else. Hence the
last frame is wildcarded, to deal with this.
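The added suppressions have roughly this shape (illustrative only; see
src/tools/valgrind.supp for the real entries):

    {
       wcsrtombs_optimized_read_ok
       Memcheck:Cond
       ...
       fun:wcsrtombs
       obj:*
    }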
Backpatch all the way back to 9.4.
Author: Tomas Vondra
Discussion: https://www.postgresql.org/message-id/flat/90ac0452-e907-e7a4-b3c8-15bd33780e62%402ndquadrant.com
Discussion: https://www.postgresql.org/message-id/20180220150838.GD18315@e733.localdomain
This hasn't been correct since 9.3 added "latex-longtable".
I left the phraseology "Unique abbreviations are allowed" alone.
It's correct as far as it goes, and we are studiously refraining
from specifying exactly what happens if you give a non-unique
abbreviation. (The answer in the back branches is "you get a
backwards-compatible choice", and the answer in HEAD will shortly
be "you get an error", but there seems no need to mention such
details here.)
Daniel Vérité
Discussion: https://postgr.es/m/cb7e1caf-3ea6-450d-af28-f524903a030c@manitou-mail.org
In commit ecfd55795, I removed sqlda.c's checks for ndigits != 0 on the
grounds that we should duplicate the state of the numeric value's digit
buffer even when all the digits are zeroes. However, that still isn't
quite right, because another possible state of the digit buffer is
buf == digits == NULL (this occurs for a NaN). As the code now stands,
it'll invoke memcpy with a NULL source address and zero bytecount,
which we know a few platforms crash on. Hence, reinstate the no-copy
short-circuit, but make it test specifically for buf != NULL rather than
some other condition. In hindsight, the ndigits test (added by commit
f2ae9f9c3) was almost certainly meant to fix the NaN case not the
all-zeroes case as the associated thread alleged.
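The reinstated short-circuit is essentially this (a sketch of the sqlda.c
logic):

    /* sketch: skip the digit-buffer copy when there is no buffer (NaN) */
    if (num->buf != NULL)
    {
        memcpy((char *) sqlda + offset, num->buf,
               num->digits - num->buf + num->ndigits);
        /* ... then relocate the copied value's buf/digits pointers ... */
    }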
As before, back-patch to all supported versions.
Discussion: https://postgr.es/m/1803D792815FC24D871C00D17AE95905C71161@g01jpexmbkw24
If a failure happens when a transaction is starting between the moment
the transaction status is changed from TRANS_DEFAULT to TRANS_START and
the moment the current user ID and security context flags are fetched
via GetUserIdAndSecContext(), or before initializing its basic fields,
then those may get reset to incorrect values when the transaction
aborts, leaving the session in an inconsistent state.
One problem reported is that failing a starting transaction at the first
query of a session could cause several kinds of system crashes on the
follow-up queries.
In order to solve that, move the initialization of the transaction state
fields and the GetUserIdAndSecContext() call that fetches the current
user ID next to the point where the transaction status is switched to
TRANS_START, so that no error can be triggered in between, per an idea
of Tom Lane. This properly ensures that the current user ID, the
security context flags and the basic fields of TransactionState remain
consistent even if the transaction fails while starting.
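Schematically, StartTransaction() now begins like this (a sketch):

    /* sketch of the reordered sequence in StartTransaction() */
    s->state = TRANS_START;
    s->transactionId = InvalidTransactionId;    /* until assigned */

    /*
     * Initialize the basic fields and fetch the current user ID and
     * security context immediately; nothing that can elog(ERROR) runs
     * between the TRANS_START assignment and this point.
     */
    GetUserIdAndSecContext(&s->prevUser, &s->prevSecContext);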
Reported-by: Richard Guo
Diagnosed-By: Richard Guo
Author: Michael Paquier
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/CAN_9JTxECSb=pEPcb0a8d+6J+bDcOZ4=DgRo_B7Y5gRHJUM=Rw@mail.gmail.com
Backpatch-through: 9.4
Numeric values with leading zeroes were incorrectly copied into a
SQLDA (SQL Descriptor Area), leading to wrong results in ECPG programs.
Report and patch by Daisuke Higuchi. Back-patch to all supported
versions.
Discussion: https://postgr.es/m/1803D792815FC24D871C00D17AE95905C71161@g01jpexmbkw24
Fix the initialization of the atomic variable introduced by the
group clearing mechanism.
Commit 0e141c0fbb introduced initialization of that atomic variable in
InitProcess, which means that it's not safe to look at it for backends that
aren't currently in use. Fix that by initializing it during postmaster
startup instead.
Reported-by: Andres Freund
Author: Amit Kapila
Backpatch-through: 9.6
Discussion: https://postgr.es/m/20181027104138.qmbbelopvy7cw2qv@alap3.anarazel.de
classify_index_clause_usage() is O(N^2) in the number of distinct index
qual clauses it considers, because of its use of a simple search list to
store them. For nearly all queries, that's fine because only a few clauses
will be considered. But Alexander Kuzmenkov reported a machine-generated
query with 80000 (!) index qual clauses, which caused this code to take
forever. Somewhat remarkably, this is the only O(N^2) behavior we now
have for such a query, so let's fix it.
We can get rid of the O(N^2) runtime for cases like this without much
damage to the functionality of choose_bitmap_and() by separating out
paths with "too many" qual or pred clauses, and deeming them to always
be nonredundant with other paths. Then their clauses needn't go into
the search list, so it doesn't get too long, but we don't lose the
ability to consider bitmap AND plans altogether. I set the threshold
for "too many" to be 100 clauses per path, which should be plenty to
ensure no change in planning behavior for normal queries.
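The short-circuit itself is simple (a sketch of the check added to
classify_index_clause_usage()):

    /* sketch: give up on classifying paths with too many clauses */
    if (list_length(result->quals) + list_length(result->preds) > 100)
    {
        result->unclassifiable = true;  /* deemed nonredundant with others */
        return result;
    }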
There are other things we could do to make this go faster, but it's not
clear that it's worth any additional effort. 80000 qual clauses require
a whole lot of work in many other places, too.
The code's been like this for a long time, so back-patch to all supported
branches. The troublesome query only works back to 9.5 (in 9.4 it fails
with stack overflow in the parser); so I'm not sure that fixing this in
9.4 has any real-world benefit, but perhaps it does.
Discussion: https://postgr.es/m/90c5bdfa-d633-dabe-9889-3cf3e1acd443@postgrespro.ru
This patch fixes several related cases in which pg_shdepend entries were
never made, or were lost, for references to roles appearing in the ACLs of
schemas and/or types. While that did no immediate harm, if a referenced
role were later dropped, the drop would be allowed and would leave a
dangling reference in the object's ACL. That still wasn't a big problem
for normal database usage, but it would cause obscure failures in
subsequent dump/reload or pg_upgrade attempts, taking the form of
attempts to grant privileges to all-numeric role names. (I think I've
seen field reports matching that symptom, but can't find any right now.)
Several cases are fixed here:
1. ALTER DOMAIN SET/DROP DEFAULT would lose the dependencies for any
existing ACL entries for the domain. This case is ancient, dating
back as far as we've had pg_shdepend tracking at all.
2. If a default type privilege applies, CREATE TYPE recorded the
ACL properly but forgot to install dependency entries for it.
This dates to the addition of default privileges for types in 9.2.
3. If a default schema privilege applies, CREATE SCHEMA recorded the
ACL properly but forgot to install dependency entries for it.
This dates to the addition of default privileges for schemas in v10
(commit ab89e465c).
Another somewhat-related problem is that when creating a relation
rowtype or implicit array type, TypeCreate would apply any available
default type privileges to that type, which we don't really want
since such an object isn't supposed to have privileges of its own.
(You can't, for example, drop such privileges once they've been added
to an array type.)
ab89e465c is also to blame for a race condition in the regression tests:
privileges.sql transiently installed globally-applicable default
privileges on schemas, which sometimes got absorbed into the ACLs of
schemas created by concurrent test scripts. This should have resulted
in failures when privileges.sql tried to drop the role holding such
privileges; but thanks to the bug fixed here, it instead led to dangling
ACLs in the final state of the regression database. We'd managed not to
notice that, but it became obvious in the wake of commit da906766c, which
allowed the race condition to occur in pg_upgrade tests.
To fix, add a function recordDependencyOnNewAcl to encapsulate what
callers of get_user_default_acl need to do; while the original call
sites got that right via ad-hoc code, none of the later-added ones
have. Also change GenerateTypeDependencies to generate these
dependencies, which requires adding the typacl to its parameter list.
(That might be annoying if there are any extensions calling that
function directly; but if there are, they're most likely buggy in the
same way as the core callers were, so they need work anyway.) While
I was at it, I changed GenerateTypeDependencies to accept most of its
parameters in the form of a Form_pg_type pointer, making its parameter
list a bit less unwieldy and mistake-prone.
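The new helper is small; a sketch close to the committed shape:

    /* sketch of recordDependencyOnNewAcl() */
    void
    recordDependencyOnNewAcl(Oid classId, Oid objectId, int32 objsubId,
                             Oid ownerId, Acl *acl)
    {
        int         nmembers;
        Oid        *members;

        /* Nothing to do if ACL is defaulted */
        if (acl == NULL)
            return;

        /* Extract the roles mentioned in the ACL ... */
        nmembers = aclmembers(acl, &members);

        /* ... and record shared dependencies on each of them */
        updateAclDependencies(classId, objectId, objsubId,
                              ownerId, 0, NULL, nmembers, members);
    }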
The test race condition is fixed just by wrapping the addition and
removal of default privileges into a single transaction, so that that
state is never visible externally. We might eventually prefer to
separate out tests of default privileges into a script that runs by
itself, but that would be a bigger change and would make the tests
run slower overall.
Back-patch relevant parts to all supported branches.
Discussion: https://postgr.es/m/15719.1541725287@sss.pgh.pa.us
Previously it was possible to set client_min_messages to FATAL or PANIC,
which had the effect of suppressing transmission of regular ERROR messages
to the client. Perhaps that seemed like a useful option in the past, but
the trouble with it is that it breaks guarantees that are explicitly made
in our FE/BE protocol spec about how a query cycle can end. While libpq
and psql manage to cope with the omission, that's mostly because they
are not very bright; client libraries that have more semantic knowledge
are likely to get confused. Notably, pgODBC doesn't behave very sanely.
Let's fix this by getting rid of the ability to set client_min_messages
above ERROR.
In HEAD, just remove the FATAL and PANIC options from the set of allowed
enum values for client_min_messages. (This change also affects
trace_recovery_messages, but that's OK since these aren't useful values
for that variable either.)
In the back branches, there was concern that rejecting these values might
break applications that are explicitly setting things that way. I'm
pretty skeptical of that argument, but accommodate it by accepting these
values and then internally setting the variable to ERROR anyway.
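One plausible way to express that clamping (a hypothetical sketch, not
necessarily the committed code):

    /* hypothetical sketch: accept FATAL/PANIC but clamp to ERROR internally */
    static void
    assign_client_min_messages(int newval, void *extra)
    {
        client_min_messages = (newval > ERROR) ? ERROR : newval;
    }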
In all branches, this allows a couple of tiny simplifications in the
logic in elog.c, so do that.
Also respond to the point that was made that client_min_messages has
exactly nothing to do with the server's logging behavior, and therefore
does not belong in the "When To Log" subsection of the documentation.
The "Statement Behavior" subsection is a better match, so move it there.
Jonah Harris and Tom Lane
Discussion: https://postgr.es/m/7809.1541521180@sss.pgh.pa.us
Discussion: https://postgr.es/m/15479-ef0f4cc2fd995ca2@postgresql.org
On Windows, in UTF8 database encoding, what char2wchar() produces is
UTF16 not UTF32, ie, characters above U+FFFF will be represented by
surrogate pairs. t_isdigit() and siblings did not account for this
and failed to provide a large enough result buffer. That in turn
led to bogus "invalid multibyte character for locale" errors, because
contrary to what you might think from char2wchar()'s documentation,
its Windows code path doesn't cope sanely with buffer overflow.
The solution for t_isdigit() and siblings is pretty clear: provide
a 3-wchar_t result buffer not 2.
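That is, something along these lines (a sketch of the pattern in
ts_locale.c; locale-handling details elided):

    /* sketch: room for a UTF16 surrogate pair plus terminating null */
    #define WC_BUF_LEN  3

    int
    t_isdigit(const char *ptr)
    {
        int         clen = pg_mblen(ptr);
        wchar_t     character[WC_BUF_LEN];

        /* ... single-byte and C-locale fast paths elided ... */
        char2wchar(character, WC_BUF_LEN, ptr, clen, mylocale);
        return iswdigit((wint_t) character[0]);
    }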
char2wchar() also needs some work to provide more consistent, and more
accurately documented, buffer overrun behavior. But that's a bigger job
and it doesn't actually have any immediate payoff, so leave it for later.
Per bug #15476 from Kenji Uno, who deserves credit for identifying the
cause of the problem. Back-patch to all active branches.
Discussion: https://postgr.es/m/15476-4314f480acf0f114@postgresql.org
The solution arrived at in commit e74dd00f5 presumes that the compiler
has a suitable default -isysroot setting ... but further experience
shows that in many combinations of macOS version, XCode version, Xcode
command line tools version, and phase of the moon, Apple's compiler
will *not* supply a default -isysroot value.
We could potentially go back to the approach used in commit 68fc227dd,
but I don't have a lot of faith in the reliability or life expectancy of
that either. Let's just revert to the approach already shipped in 11.0,
namely specifying an -isysroot switch globally. As a partial response to
the concerns raised by Jakob Egger, adjust the contents of Makefile.global
to look like

    CPPFLAGS = -isysroot $(PG_SYSROOT) ...
    PG_SYSROOT = /path/to/sysroot
This allows overriding the sysroot path at build time in a relatively
painless way.
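For instance, one can presumably point the build at the currently selected
SDK with something like:

    make PG_SYSROOT="$(xcrun --show-sdk-path)"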
Add documentation to installation.sgml about how to use the PG_SYSROOT
option. I also took the opportunity to document how to work around
macOS's "System Integrity Protection" feature.
As before, back-patch to all supported versions.
Discussion: https://postgr.es/m/20840.1537850987@sss.pgh.pa.us
Previously it was possible to create a slot, change wal_level, and
restart, even if the new wal_level was insufficient for the
slot. That's a problem for both logical and physical slots, because
the necessary WAL records are not generated.
This also removes a few tests in newer versions that, somewhat
inexplicably, checked whether restarting with a too-low wal_level
worked (a buggy behaviour!).
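The startup-time check is along these lines (a sketch of the logic run while
restoring a slot from disk; message wording approximate):

    /* sketch: refuse to start if wal_level can't support an existing slot */
    if (cp.slotdata.database != InvalidOid && wal_level < WAL_LEVEL_LOGICAL)
        ereport(FATAL,
                (errmsg("logical replication slot \"%s\" exists, but wal_level < logical",
                        NameStr(cp.slotdata.name))));
    if (wal_level < WAL_LEVEL_REPLICA)
        ereport(FATAL,
                (errmsg("physical replication slot \"%s\" exists, but wal_level < replica",
                        NameStr(cp.slotdata.name))));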
Reported-By: Joshua D. Drake
Author: Andres Freund
Discussion: https://postgr.es/m/20181029191304.lbsmhshkyymhw22w@alap3.anarazel.de
Backpatch: 9.4-, where replication slots were introduced
spgendscan neglected to pfree all the memory allocated by spgbeginscan.
It's possible to get away with that in most normal queries, since the
memory is allocated in the executor's per-query context which is about
to get deleted anyway; but it causes severe memory leakage during
creation or filling of large exclusion-constraint indexes.
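A sketch of the added cleanup (subsidiary allocations elided):

    /* sketch: spgendscan() now frees what spgbeginscan() allocated */
    void
    spgendscan(IndexScanDesc scan)
    {
        SpGistScanOpaque so = (SpGistScanOpaque) scan->opaque;

        MemoryContextDelete(so->tempCxt);
        /* ... pfree() the opaque struct's arrays, then the struct itself ... */
        pfree(so);
    }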
Also, document that amendscan is supposed to free what ambeginscan
allocates. The docs' lack of clarity on that point probably caused this
bug to begin with. (There is discussion of changing that API spec going
forward, but I don't think it'd be appropriate for the back branches.)
Per report from Bruno Wolff. It's been like this since the beginning,
so back-patch to all active branches.
In HEAD, also fix an independent leak caused by commit 2a6368343
(allocating memory during spgrescan instead of spgbeginscan, which
might be all right if it got cleaned up, but it didn't). And do a bit
of code beautification on that commit, too.
Discussion: https://postgr.es/m/20181024012314.GA27428@wolff.to
This patch absorbs an upstream fix to "zic" for a recently-introduced
bug that made it output data that some 32-bit clients couldn't read.
Given the current source data, the bug only manifests in zones with
leap seconds, which we don't generate, so that there's no actual
change in our installed timezone data files from this. Still, in
case somebody uses our copy of "zic" to do something else, it seems
best to apply the fix promptly.
Also, update the README's notes about converting upstream code to
our conventions.
Modern versions of perl no longer include the current directory in the
perl searchpath, as it's insecure. Instead of adding the current
directory, we get around the problem by adding the directory where the
script lives.
Problem noted by Victor Wagner.
Solution adapted from buildfarm client code.
Backpatch to all live versions.
On Windows this means that the regression tests can now safely and
successfully run as Administrator, which is useful in situations like
Appveyor. Elsewhere it's a no-op.
Backpatch to 9.5 - this is harder in earlier branches and not worth the
trouble.
Discussion: https://postgr.es/m/650b0c29-9578-8571-b1d2-550d7f89f307@2ndQuadrant.com
PQnotifies() is defined to just process already-read data, not try to read
any more from the socket. (This is a debatable decision, perhaps, but I'm
hesitant to change longstanding library behavior.) The documentation has
long recommended calling PQconsumeInput() before PQnotifies() to ensure
that any already-arrived message would get absorbed and processed.
However, psql did not get that memo, which explains why it's not very
reliable about reporting notifications promptly.
Also, most (not quite all) callers called PQconsumeInput() just once before
a PQnotifies() loop. Taking this recommendation seriously implies that we
should do PQconsumeInput() before each call. This is more important now
that we have "payload" strings in notification messages than it was before;
that increases the probability of having more than one packet's worth
of notify messages. Hence, adjust code as well as documentation examples
to do it like that.
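The recommended pattern thus becomes (a minimal libpq sketch):

    /* minimal sketch of a notification-polling loop */
    PGnotify   *notify;

    if (!PQconsumeInput(conn))          /* absorb any newly-arrived data */
        fprintf(stderr, "connection trouble: %s", PQerrorMessage(conn));

    while ((notify = PQnotifies(conn)) != NULL)
    {
        fprintf(stderr, "NOTIFY \"%s\" (payload \"%s\") from PID %d\n",
                notify->relname, notify->extra, notify->be_pid);
        PQfreemem(notify);
        PQconsumeInput(conn);           /* again, before each PQnotifies() */
    }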
Back-patch to 9.5 to match related server fixes. In principle we could
probably go back further with these changes, but given lack of field
complaints I doubt it's worthwhile.
Discussion: https://postgr.es/m/CAOYf6ec-TmRYjKBXLLaGaB-jrd=mjG1Hzn1a1wufUAR39PQYhw@mail.gmail.com
Commit 4f85fde8e introduced some code that was meant to ensure that we'd
process cancel, die, sinval catchup, and notify interrupts while waiting
for client input. But there was a flaw: it supposed that the process
latch would be set upon arrival at secure_read() if any such interrupt
was pending. In reality, we might well have cleared the process latch
at some earlier point while those flags remained set -- particularly
notifyInterruptPending, which can't be handled as long as we're within
a transaction.
To fix the NOTIFY case, also attempt to process signals (except
ProcDiePending) before trying to read.
Also, if we see that ProcDiePending is set before we read, forcibly set the
process latch to ensure that we will handle that signal promptly if no data
is available. I also made it set the process latch on the way out, in case
there is similar logic elsewhere. (It remains true that we won't service
ProcDiePending here unless we need to wait for input.)
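Condensed, the read-side interrupt handling behaves like this (a sketch of
ProcessClientReadInterrupt(); details elided):

    /* sketch: called before reading and while blocked waiting for input */
    void
    ProcessClientReadInterrupt(bool blocked)
    {
        if (DoingCommandRead)
        {
            /* Safe to handle sinval catchup and notify interrupts here */
            if (catchupInterruptPending)
                ProcessCatchupInterrupt();
            if (notifyInterruptPending)
                ProcessNotifyInterrupt();
        }
        else if (ProcDiePending)
        {
            /*
             * If we're actually blocked, dying now is safe; otherwise set
             * the latch so any subsequent wait returns promptly.
             */
            if (blocked)
                CHECK_FOR_INTERRUPTS();
            else
                SetLatch(MyLatch);
        }
    }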
The code for handling ProcDiePending during a write needs those changes,
too.
Also be a little more careful about when to reset whereToSendOutput,
and improve related comments.
Back-patch to 9.5 where this code was added. I'm not entirely convinced
that older branches don't have similar issues, but the complaint at hand
is just about the >= 9.5 code.
Jeff Janes and Tom Lane
Discussion: https://postgr.es/m/CAOYf6ec-TmRYjKBXLLaGaB-jrd=mjG1Hzn1a1wufUAR39PQYhw@mail.gmail.com