The ExecReScan machinery contains various optimizations for postponing
or skipping rescans of plan subtrees; for example a HashAgg node may
conclude that it can re-use the table it built before, instead of
re-reading its input subtree. But that is wrong if the input contains
a parallel-aware table scan node, since the portion of the table scanned
by the leader process is likely to vary from one rescan to the next.
This explains the timing-dependent buildfarm failures we saw after
commit a2b70c89c.
The established mechanism for showing that a plan node's output is
potentially variable is to mark it as depending on some runtime Param.
Hence, to fix this, invent a dummy Param (one that has a PARAM_EXEC
parameter number, but carries no actual value) associated with each Gather
or GatherMerge node, mark parallel-aware nodes below that node as dependent
on that Param, and arrange for ExecReScanGather[Merge] to flag that Param
as changed whenever the Gather[Merge] node is rescanned.
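For illustration, a rough sketch of what the Gather rescan hook ends up
doing (the field name rescan_param is assumed per the description above;
the GatherMerge case is analogous):

    void
    ExecReScanGather(GatherState *node)
    {
        Gather     *gather = (Gather *) node->ps.plan;
        PlanState  *outerPlan = outerPlanState(node);

        /* Make sure any existing workers are gracefully shut down */
        ExecShutdownGatherWorkers(node);

        /* Mark node so parallel scan will be re-initialized on next use */
        node->initialized = false;

        /*
         * Flag the dummy Param as changed, so that parallel-aware nodes
         * below us won't assume their previous output is still valid.
         */
        if (gather->rescan_param >= 0)
            outerPlan->chgParam = bms_add_member(outerPlan->chgParam,
                                                 gather->rescan_param);

        /* If no chgParam was set, we must rescan the child ourselves */
        if (outerPlan->chgParam == NULL)
            ExecReScan(outerPlan);
    }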
This solution breaks an undocumented assumption made by the parallel
executor logic, namely that all rescans of nodes below a Gather[Merge]
will happen synchronously during the ReScan of the top node itself.
But that's fundamentally contrary to the design of the ExecReScan code,
and so was doomed to fail someday anyway (even if you want to argue
that the bug being fixed here wasn't a failure of that assumption).
A follow-on patch will address that issue. In the meantime, the worst
that's expected to happen is that given very bad timing luck, the leader
might have to do all the work during a rescan, because workers think
they have nothing to do, if they are able to start up before the eventual
ReScan of the leader's parallel-aware table scan node has reset the
shared scan state.
Although this problem exists in 9.6, there does not seem to be any way
for it to manifest there. Without GatherMerge, it seems that a plan tree
that has a rescan-short-circuiting node below Gather will always also
have one above it that will short-circuit in the same cases, preventing
the Gather from being rescanned. Hence we won't take the risk of
back-patching this change into 9.6. But v10 needs it.
Discussion: https://postgr.es/m/CAA4eK1JkByysFJNh9M349u_nNjqETuEnY_y1VUc_kJiU0bxtaQ@mail.gmail.com
Adding more than 1 billion rows to a PGresult would overflow its ntups and
tupArrSize fields, leading to client crashes. It'd be desirable to use
wider fields on 64-bit machines, but because all of libpq's external APIs
use plain "int" for row counters, that's going to be hard to accomplish
without an ABI break. Given the lack of complaints so far, and the general
pain that would be involved in using such huge PGresults, let's settle for
just preventing the overflow and reporting a useful error message if it
does happen. Also, for a couple more lines of code we can increase the
threshold of trouble from INT_MAX/2 to INT_MAX rows.
To do that, refactor pqAddTuple() so that it can return a specific error
message, overriding the default assumption that it failed because of
out-of-memory.
Along the way, fix PQsetvalue() so that it reports all failures via
pqInternalNotice(). It already did so in the case of bad field number,
but neglected to report anything for other error causes.
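A sketch of the array-growth logic this implies (structure approximate;
errmsgp is the new way for pqAddTuple to hand back a specific message):

    if (res->ntups >= res->tupArrSize)
    {
        int         newSize;

        /* Grow by doubling, but clamp at INT_MAX rather than overflowing */
        if (res->tupArrSize <= INT_MAX / 2)
            newSize = (res->tupArrSize > 0) ? res->tupArrSize * 2 : 128;
        else if (res->tupArrSize < INT_MAX)
            newSize = INT_MAX;
        else
        {
            /* Not an out-of-memory condition; report something more useful */
            *errmsgp = libpq_gettext("PGresult cannot support more than INT_MAX tuples");
            return false;
        }

        /* ... then repalloc/realloc res->tuples to newSize entries ... */
    }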
Because of the potential for crashes, this seems like a back-patchable
bug fix, despite the lack of field reports.
Michael Paquier, per a complaint from Igor Korot.
Discussion: https://postgr.es/m/CA+FnnTxyLWyjY1goewmJNxC==HQCCF4fKkoCTa9qR36oRAHDPw@mail.gmail.com
Fix thinko in commit 8be8510cf: it's okay to have dbid == 0 in normal
(non-pin) entries in pg_shdepend, because global objects such as
databases are entered that way. The test would pass so long as it
was run in a cluster containing no databases/tablespaces owned by,
or granted to, roles other than the bootstrap superuser. That's the
expected situation for "make check", but for "make installcheck", not
so much.
Reported by Ryan Murphy.
Discussion: https://postgr.es/m/CAHeEsBc6EQe0mxGBKDXAwJbntgfvoAd5MQC-5362SmC3Tng_6g@mail.gmail.com
The string "% of total" was marked by xgettext to be a c-format, but it
is actually not, so mark up the source to prevent that.
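The usual way to do that is an xgettext flag comment immediately before
the string; a minimal sketch (the surrounding printf is illustrative only):

    /* xgettext:no-c-format */
    printf("%*s", width, _("% of total"));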
Compute the column widths of the final display dynamically based on the
translated strings, so that translations don't mess up the display
accidentally.
Commit 16be2fd100199bdf284becfcee02c5eb20d8a11d added DSA_ALLOC_HUGE,
DSA_ALLOC_ZERO and DSA_ALLOC_NO_OOM, which have the same numerical
values and meanings as the similarly named MCXT_... macros. In one
place we accidentally used MCXT_ALLOC_NO_OOM where DSA_ALLOC_NO_OOM was
wanted, so tidy that up.
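For context, a sketch of why the mixup was harmless in practice (values
as recalled; the point is only that the two families coincide numerically):

    /* memutils.h */
    #define MCXT_ALLOC_HUGE    0x01    /* allow huge allocation (> 1 GB) */
    #define MCXT_ALLOC_NO_OOM  0x02    /* no failure if out-of-memory */
    #define MCXT_ALLOC_ZERO    0x04    /* zero allocated memory */

    /* dsa.h -- same values, so the wrong macro still "worked" */
    #define DSA_ALLOC_HUGE     0x01
    #define DSA_ALLOC_NO_OOM   0x02
    #define DSA_ALLOC_ZERO     0x04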
Author: Thomas Munro
Discussion: http://postgr.es/m/CAEepm=2AimHxVkkxnMfQvbZMkXy0uKbVa0-D38c5-qwrCm4CMQ@mail.gmail.com
Backpatch: 10, where dsa was introduced.
Remove code meant for upgrading to a particular version of PostgreSQL
9.0. Since pg_upgrade only supports upgrading to the current major
version, this code is no longer useful.
Commits 2cd70845240 / c6293249d changed the way individual attributes in
a TupleDesc are stored and accessed. To reduce the effort of making
extensions compatible with PostgreSQL 11, and to ease future
backpatching, backpatch the introduction of TupleDescAttr() to all
releases. Do not backpatch the change in storage, as that'd be a
breaking change for existing and working extensions.
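For extension code, the practical upshot is a sketch like this:

    /* Old style, tied to the old TupleDesc storage layout: */
    Form_pg_attribute attr = tupdesc->attrs[i];

    /*
     * Works on all branches once TupleDescAttr() is available, and keeps
     * working after the storage change in PostgreSQL 11:
     */
    Form_pg_attribute attr = TupleDescAttr(tupdesc, i);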
Author: Andres Freund
Discussion: https://postgr.es/m/20170820181723.tdswdinzptbcwhrr@alap3.anarazel.de
Backpatch: 9.2-
Users can still create them themselves. Instead, document the Unicode
TR 35 collation options for ICU, so that users can construct any such
collation on their own.
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Install language+region combinations even if they are not distinct from
the language's base locale. This gives better long-term stability of
the set of predefined locales and makes the predefined locales less
implementation-dependent and more practical for users.
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
It appeared in a conditional that excludes AIX, Cygwin and MinGW. Give
ICU support a chance to work on those platforms. Back-patch to v10,
where ICU support was introduced.
Commit 3eb9a5e7c unintentionally introduced an ordering dependency
into restore_toc_entries_prefork(). The existing coding of
reduce_dependencies() contains a check to skip moving a TOC entry
to the ready_list if it wasn't initially in the pending_list.
This used to suffice to prevent reduce_dependencies() from trying to
move anything into the ready_list during restore_toc_entries_prefork(),
because the pending_list stayed empty throughout that phase; but it no
longer does. The problem doesn't manifest unless the TOC has been
reordered by SortTocFromFile, which is how I missed it in testing.
To fix, just add a test for ready_list == NULL, converting the call
with NULL from a poor man's sanity check into an explicit command
not to touch TOC items' list membership. Clarify some of the comments
around this; in particular, note the primary purpose of the check for
pending_list membership, which is to ensure that we can't try to restore
the same item twice, in case a TOC list forces it to be restored before
its dependency count goes to zero.
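The relevant test, roughly (helper names as recalled from
pg_backup_archiver.c; treat as a sketch):

    /*
     * The entry becomes ready only if it has no remaining dependencies and
     * it's still in the pending_list; and if ready_list == NULL, the caller
     * is telling us not to touch list membership at all.
     */
    if (otherte->depCount == 0 &&
        otherte->par_prev != NULL &&
        ready_list != NULL)
    {
        par_list_remove(otherte);               /* off the pending list */
        par_list_append(ready_list, otherte);   /* onto the ready list */
    }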
Per report from Fabrízio de Royes Mello. Back-patch to 9.3, like the
previous commit.
Discussion: https://postgr.es/m/CAFcNs+pjuv0JL_x4+=71TPUPjdLHOXA4YfT32myj_OrrZb4ohA@mail.gmail.com
Add a new EState member es_leaf_result_relations, so that the trigger
code knows about ResultRelInfos created by tuple routing. Also make
sure ExplainPrintTriggers knows about partition-related
ResultRelInfos.
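In sketch form, ExplainPrintTriggers can then simply walk the new list in
addition to es_result_relations (names per the description above; details
may vary):

    ListCell   *l;

    /* also report triggers fired on partitions reached via tuple routing */
    foreach(l, estate->es_leaf_result_relations)
    {
        ResultRelInfo *rInfo = (ResultRelInfo *) lfirst(l);

        report_triggers(rInfo, show_relname, es);
    }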
Etsuro Fujita, reviewed by Amit Langote
Discussion: http://postgr.es/m/57163e18-8e56-da83-337a-22f2c0008051@lab.ntt.co.jp
That code patch was good as far as it went, but the associated test case
has exposed fundamental brain damage in the parallel scan mechanism,
which is going to take nontrivial work to correct. In the interests of
getting the buildfarm back to green so that unrelated work can proceed,
let's temporarily remove the test case.
Instead, lock them in the caller using find_all_inheritors so that
they get locked in the standard order, minimizing deadlock risks.
Also in RelationGetPartitionDispatchInfo, avoid opening tables which
are not partitioned; there's no need.
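Sketched, the caller-side locking looks about like this (lock level and
variable names illustrative):

    /*
     * Lock the whole partition tree up front, parent first and children in
     * find_all_inheritors' deterministic order, to minimize deadlock risk.
     */
    List *inheritors = find_all_inheritors(RelationGetRelid(rel),
                                           RowExclusiveLock, NULL);

    /*
     * RelationGetPartitionDispatchInfo() can then open the partitioned
     * tables with NoLock, since we already hold locks on all of them.
     */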
Amit Langote, reviewed by Ashutosh Bapat and Amit Khandekar
Discussion: http://postgr.es/m/91b36fa1-c197-b72f-ca6e-56c593bae68c@lab.ntt.co.jp
It now emerges that we can only rely on Perl to tell us we must use
-D_USE_32BIT_TIME_T if it's Perl 5.13.4 or later. For older versions,
revert to our previous practice of assuming we need that symbol in
all 32-bit Windows builds. This is not ideal, but inquiring into
which compiler version Perl was built with seems far too fragile.
In any case, we had not previously had complaints about these old
Perl versions, so let's assume this is Good Enough. (It's still
better than the situation ante commit 5a5c2feca, in that at least
the effects are confined to PL/Perl rather than the whole PG build.)
Back-patch to all supported versions, like 5a5c2feca and predecessors.
Discussion: https://postgr.es/m/CANFyU97OVQ3+Mzfmt3MhuUm5NwPU=-FtbNH5Eb7nZL9ua8=rcA@mail.gmail.com
This was made possible by commit
6c2003f8a1bbc7c192a2e83ec51581c018aa162f. This change just makes pg_dump
aware of it and updates the documentation.
Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
If the newly allocated memory is not already zero, this could possibly
have caused some problems, though that is not confirmed and is probably
rare.
Also reorder the initializations slightly so that they match the order
of the struct definition.
Author: Wong, Yi Wen <yiwong@amazon.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
This appears to have been an omission in the original commit
0d692a0dc9f. All related information_schema views already include
foreign tables.
Reported-by: Nicolas Thauvin <nicolas.thauvin@dalibo.com>
The initial implementation of autovacuum work-items used a dynamic
shared memory area (DSA). However, it has been argued that dynamic shared
memory is not portable enough, so we cannot rely on it being supported
everywhere; at the same time, autovacuum work-items are now a critical
part of the server, so it's not acceptable for them not to work in the
cases where dynamic shared memory is disabled. Therefore, let's fall
back to a simpler implementation of work-items that just uses
autovacuum's main shared memory segment for storage.
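In sketch form (slot count and field names are illustrative, not
necessarily what the patch uses):

    #define NUM_WORKITEMS   256

    typedef struct AutoVacuumWorkItem
    {
        AutoVacuumWorkItemType avw_type;
        bool        avw_used;        /* slot is allocated */
        bool        avw_active;      /* being handled by a worker */
        Oid         avw_database;
        Oid         avw_relation;
        BlockNumber avw_blockNumber;
    } AutoVacuumWorkItem;

    /*
     * AutoVacuumShmemStruct then simply embeds a fixed-size array:
     *     AutoVacuumWorkItem av_workItems[NUM_WORKITEMS];
     */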
Discussion: https://postgr.es/m/CA+TgmobQVbz4K_+RSmiM9HeRKpy3vS5xnbkL95gSEnWijzprKQ@mail.gmail.com
Since we currently only have one protocol, this doesn't make much of a
difference other than the error message.
Author: Yugo Nagata <nagata@sraoss.co.jp>
The original code (since 00e6a16d01) assumed that aborting the
transaction in the autovacuum launcher was sufficient to release all
resources, but in reality the launcher runs quite a lot of code outside
any transaction. Re-introduce individual cleanup calls to make abort
more robust.
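A rough sketch of the kind of cleanup re-introduced in the launcher's
error-recovery path (the exact list of calls is per the patch; sketch
only):

    /* In AutoVacLauncherMain()'s sigsetjmp() recovery block: */
    AbortCurrentTransaction();

    /*
     * Release resources we may hold even when no transaction is open,
     * since much of the launcher runs outside any transaction.
     */
    LWLockReleaseAll();
    pgstat_report_wait_end();
    AbortBufferIO();
    UnlockBuffers();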
Reported-by: Robert Haas
Discussion: https://postgr.es/m/CA+TgmobQVbz4K_+RSmiM9HeRKpy3vS5xnbkL95gSEnWijzprKQ@mail.gmail.com
The API for WaitLatch and friends followed the Unix convention in which
waiting for a socket connection to complete is identical to waiting for
the socket to accept a write. While Windows provides a select(2)
emulation that agrees with that, the native WaitForMultipleObjects API
treats them as quite different --- and for some bizarre reason, it will
report a not-yet-connected socket as write-ready. libpq itself has so
far escaped dealing with this because it waits with select(), but in
libpqwalreceiver.c we want to wait using WaitLatchOrSocket. The semantics
mismatch resulted in replication connection failures on Windows, but only
for remote connections (apparently, localhost connections complete
immediately, or at least too fast for anyone to have noticed the problem
in single-machine testing).
To fix, introduce an additional WL_SOCKET_CONNECTED wait flag for
WaitLatchOrSocket, which is identical to WL_SOCKET_WRITEABLE on
non-Windows, but results in waiting for FD_CONNECT events on Windows.
Ideally, we would also distinguish the two conditions in the API for
PQconnectPoll(), but changing that API at this point seems infeasible.
Instead, cheat by checking for PQstatus() == CONNECTION_STARTED to
determine that we're still waiting for the connection to complete.
(This is a cheat mainly because CONNECTION_STARTED is documented as an
internal state rather than something callers should rely on. Perhaps
we ought to change the documentation ... but this patch doesn't.)
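Sketched, the wait-flag selection in libpqwalreceiver comes out roughly
like this (simplified; status is the PQconnectPoll result):

    int     io_flag;

    if (status == PGRES_POLLING_READING)
        io_flag = WL_SOCKET_READABLE;
    #ifdef WIN32
    /* the "cheat": still connecting, so wait for FD_CONNECT instead */
    else if (PQstatus(conn) == CONNECTION_STARTED)
        io_flag = WL_SOCKET_CONNECTED;
    #endif
    else
        io_flag = WL_SOCKET_WRITEABLE;

    /* io_flag is then OR'd with WL_LATCH_SET etc. for WaitLatchOrSocket() */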
Per reports from Jobin Augustine and Igor Neyman. Back-patch to v10
where commit 1e8a85009 exposed this longstanding shortcoming.
Andres Freund, minor fix and some code review/beautification by me
Discussion: https://postgr.es/m/CAHBggj8g2T+ZDcACZ2FmzX9CTxkWjKBsHd6NkYB4i9Ojf6K1Fw@mail.gmail.com
Before commit d3cc37f1d801a6b5cad9bf179274a8, an inheritance parent
whose only children were temp tables of other sessions would end up
as a simple scan of the parent; but with that commit, we end up with
an Append node, per a report from Ashutosh Bapat. Tweak the logic
so that we go back to the old way, and update the function header
comment for partitioning while we're at it.
Ashutosh Bapat, reviewed by Amit Langote and adjusted by me.
Discussion: http://postgr.es/m/CAFjFpReWJr1yTkHU=OqiMBmcYCMoSW3VPR39RBuQ_ovwDFBT5Q@mail.gmail.com
Stress testing by Andreas Seltenreich disclosed longstanding problems that
occur if a FATAL exit (e.g. due to receipt of SIGTERM) occurs while we are
trying to execute a ROLLBACK of an already-failed transaction. In such a
case, xact.c is in TBLOCK_ABORT state, so that AbortOutOfAnyTransaction
would skip AbortTransaction and go straight to CleanupTransaction. This
led to an assert failure in an assert-enabled build (due to the ROLLBACK's
portal still having a cleanup hook) or without assertions, to a FATAL exit
complaining about "cannot drop active portal". The latter's not
disastrous, perhaps, but it's messy enough to want to improve it.
We don't really want to run all of AbortTransaction in this code path.
The minimum required to clean up the open portal safely is to do
AtAbort_Memory and AtAbort_Portals. It seems like a good idea to
do AtAbort_Memory unconditionally, to be entirely sure that we are
starting with a safe CurrentMemoryContext. That means that if the
main loop in AbortOutOfAnyTransaction does nothing, we need an extra
step at the bottom to restore CurrentMemoryContext = TopMemoryContext,
which I chose to do by invoking AtCleanup_Memory. This'll result in
calling AtCleanup_Memory twice in many of the paths through this function,
but that seems harmless and reasonably inexpensive.
The original motivation for the assertion in AtCleanup_Portals was that
we wanted to be sure that any user-defined code executed as a consequence
of the cleanup hook runs during AbortTransaction not CleanupTransaction.
That still seems like a valid concern, and now that we've seen one case
of the assertion firing --- which means that exactly that would have
happened in a production build --- let's replace the Assert with a runtime
check. If we see the cleanup hook still set, we'll emit a WARNING and
just drop the hook unexecuted.
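Roughly, in AtCleanup_Portals (message wording approximate):

    /* formerly: Assert(portal->cleanup == NULL); */
    if (PointerIsValid(portal->cleanup))
    {
        elog(WARNING, "skipping cleanup for portal \"%s\"", portal->name);
        portal->cleanup = NULL;
    }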
This has been like this a long time, so back-patch to all supported
branches.
Discussion: https://postgr.es/m/877ey7bmun.fsf@ansel.ydns.eu
Commit 3c163a7fc's original choice to ignore all #define symbols whose
names begin with underscore turns out to be too simplistic. On Windows,
some Perl installations are built with -D_USE_32BIT_TIME_T, and we must
absorb that or we get the wrong result for sizeof(PerlInterpreter).
This effectively re-reverts commit ef58b87df, which injected that symbol
in a hacky way, making it apply to all of Postgres not just PL/Perl.
More significantly, it did so on *all* 32-bit Windows builds, even when
the Perl build to be used did not select this option; so that it fails
to work properly with some newer Perl builds.
By making this change, we would be introducing an ABI break in 32-bit
Windows builds; but fortunately we have not used type time_t in any
exported Postgres APIs in a long time. So it should be OK, both for
PL/Perl itself and for third-party extensions, if an extension library
is built with a different _USE_32BIT_TIME_T setting than the core code.
Patch by me, based on research by Ashutosh Sharma and Robert Haas.
Back-patch to all supported branches, as commit 3c163a7fc was.
Discussion: https://postgr.es/m/CANFyU97OVQ3+Mzfmt3MhuUm5NwPU=-FtbNH5Eb7nZL9ua8=rcA@mail.gmail.com
The sole useful effect of this function, to check that no catcache
entries have positive refcounts at transaction end, has really been
obsolete since we introduced ResourceOwners in PG 8.1. We reduced the
checks to assertions years ago, so that the function was a complete
no-op in production builds. There have been previous discussions about
removing it entirely, but consensus up to now was that it had some small
value as a cross-check for bugs in the ResourceOwner logic.
However, it now emerges that it's possible to trigger these assertions
if you hit an assert-enabled backend with SIGTERM during a call to
SearchCatCacheList, because that function temporarily increases the
refcounts of entries it's intending to add to a catcache list construct.
In a normal ERROR scenario, the extra refcounts are cleaned up by
SearchCatCacheList's PG_CATCH block; but in a FATAL exit we do a
transaction abort and exit without ever executing PG_CATCH handlers.
There's a case to be made that this is a generic hazard and we should
consider restructuring elog(FATAL) handling so that pending PG_CATCH
handlers do get run. That's pretty scary though: it could easily create
more problems than it solves. Preliminary stress testing by Andreas
Seltenreich suggests that there are not many live problems of this ilk,
so we rejected that idea.
There are more-localized ways to fix the problem; the most principled
one would be to use PG_ENSURE_ERROR_CLEANUP instead of plain PG_TRY.
But adding cycles to SearchCatCacheList isn't very appealing. We could
also weaken the assertions in AtEOXact_CatCache in some more or less
ad-hoc way, but that just makes its raison d'etre even less compelling.
In the end, the most reasonable solution seems to be to just remove
AtEOXact_CatCache altogether, on the grounds that it's not worth trying
to fix it. It hasn't found any bugs for us in many years.
Per report from Jeevan Chalke. Back-patch to all supported branches.
Discussion: https://postgr.es/m/CAM2+6=VEE30YtRQCZX7_sCFsEpoUkFBV1gZazL70fqLn8rcvBA@mail.gmail.com
At least on my machine, a run with code coverage enabled produces some
".gcov" files whose names begin with ".". "rm -f *.gcov" fails to match
those, so they don't get cleaned up by "make clean". Fix it.