The majority of error exit cases in json_lex_string() failed to
set lex->token_terminator, causing problems for the error context
reporting code: it would see token_terminator less than token_start
and do something more or less nuts. In v14 and up the end result
could be as bad as a crash in report_json_context(). Older
versions accidentally avoided that fate; but all versions produce
error context lines that are far less useful than intended,
because they'd stop at the end of the prior token instead of
continuing to where the actually-bad input is.
To fix, invent some macros that make it less notationally painful
to do the right thing. Also add documentation about what the
function is actually required to do; and in >= v14, add an assertion
in report_json_context about token_terminator being sufficiently
far advanced.
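For illustration, a sketch of the kind of macro involved (names and
details assumed, not necessarily the committed ones):

    #define FAIL_AT_CHAR_START(code) \
        do { \
            lex->token_terminator = s; \
            return (code); \
        } while (0)

    /* include the whole (possibly multibyte) character in the context */
    #define FAIL_AT_CHAR_END(code) \
        do { \
            lex->token_terminator = \
                s + pg_encoding_mblen(lex->input_encoding, s); \
            return (code); \
        } while (0)

With every error exit routed through such macros, report_json_context()
can safely Assert(lex->token_terminator >= lex->token_start).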
Per report from Nikolay Shaplov. Back-patch to all supported
versions.
Discussion: https://postgr.es/m/7332649.x5DLKWyVIX@thinkpad-pgpro
check_agg_arguments_walker() supposed that it needn't descend into
the arguments of a lower-level aggregate function, but this is
just wrong in the presence of multiple levels of sub-select. The
oversight would lead to executor failures on queries that should
be rejected. (Prior to v11, they actually were rejected, thanks
to a "redundant" execution-time check.)
Per bug #17835 from Anban Company. Back-patch to all supported
branches.
Discussion: https://postgr.es/m/17835-4f29f3098b2d0ba4@postgresql.org
The error cases for TLS and GSS encryption were inconsistent. After TLS
fails, the connection is marked as dead and follow-up calls of
PQconnectPoll() would return immediately, but GSS encryption was not
doing that, so the connection would still have been allowed to enter the
GSS handling code. This was handled incorrectly when gssencmode was set
to "require". "prefer" was working correctly, and this could not happen
under "disable" as GSS encryption would not be attempted.
This commit brings the error handling of GSS encryption in line with the
TLS portion, fixing the case of gssencmode=require.
Reported-by: Jacob Champion
Author: Michael Paquier
Reviewed-by: Jacob Champion, Stephen Frost
Discussion: https://postgr.es/m/23787477-5fe1-a161-6d2a-e459f74c4713@timescale.com
Backpatch-through: 12
This was an omission in the original creation of the module.
Also slightly adjust some wording to avoid a double "is".
Backpatch the non-meson piece of this to release 12, where the module
was introduced.
Discussion: https://postgr.es/m/be869e1c-8e3f-4cde-8609-212c899cccf9@dunslane.net
64bit xids can't represent xids before epoch 0 (see also be504a3e974). When
FullTransactionIdFromXidAndCtx() was passed such an xid, it'd create a 64bit
xid far into the future. Noticed while adding assertions in the course of
investigating be504a3e974, as amcheck's tests create such xids.
To fix the issue, just return FirstNormalFullTransactionId in this case. A
freshly initdb'd cluster already has a newer horizon. The most minimal
version of this fix would leave the messages for some detected corruptions
inaccurate in a different way. To make those cases accurate, switch
FullTransactionIdFromXidAndCtx() to use the 32bit modulo difference between
xid and nextxid to compute the 64bit xid, yielding sensible "in the future" /
"in the past" answers.
Reviewed-by: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://postgr.es/m/20230108002923.cyoser3ttmt63bfn@awork3.anarazel.de
Backpatch: 14-, where heapam verification was introduced
The initialization order in update_cached_xid_range() was wrong, calling
FullTransactionIdFromXidAndCtx() before setting
->next_xid. FullTransactionIdFromXidAndCtx() uses ->next_xid.
In most situations this will not cause visible issues, because the next call
to update_cached_xid_range() will use a less wrong ->next_xid. It's rare that
xids advance fast enough for this to be a problem.
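A sketch of the corrected ordering (field names assumed):

    /* Set ->next_xid first; FullTransactionIdFromXidAndCtx() reads it. */
    ctx->next_xid = next_xid;
    ctx->next_fxid = FullTransactionIdFromXidAndCtx(next_xid, ctx);
    ctx->oldest_fxid = FullTransactionIdFromXidAndCtx(oldest_xid, ctx);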
Found while adding more asserts to the 64bit xid infrastructure.
Reviewed-by: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://postgr.es/m/20230108002923.cyoser3ttmt63bfn@awork3.anarazel.de
Backpatch: 14-, where heapam verification was introduced
If the regex compiler can see that a regex is unsatisfiable
(for example, '$foo') then it may emit an NFA having no arcs.
pg_trgm's packGraph function did the wrong thing in this case;
it would access off the end of a work array, and with bad luck
could produce a corrupted output data structure causing more
problems later. This could end with wrong answers or crashes
in queries using a pg_trgm GIN or GiST index with such a regex.
Fix by not trying to de-duplicate if there aren't at least 2 arcs.
Per bug #17830 from Alexander Lakhin. Back-patch to all supported
branches.
Discussion: https://postgr.es/m/17830-57ff5f89bdb02b09@postgresql.org
The COPY documentation is quite clear that "COPY relation TO" copies
rows from only the named table, not any inheritance children it may
have. However, if you enabled row-level security on the table then
this stopped being true, because the code forgot to apply the ONLY
modifier in the "SELECT ... FROM relation" query that it constructs
in order to allow RLS predicates to be attached. Fix that.
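In essence, the constructed query now looks like this (construction
simplified, names assumed):

    StringInfoData query;

    initStringInfo(&query);
    /* Match COPY's documented semantics: no inheritance expansion */
    appendStringInfoString(&query, "SELECT ");
    /* ... column list ... */
    appendStringInfo(&query, " FROM ONLY %s",
                     quote_qualified_identifier(nspname, relname));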
Report and patch by Antonin Houska (comment adjustments and test case
by me). Back-patch to all supported branches.
Discussion: https://postgr.es/m/3472.1675251957@antos
Commit bdaabb9b started skipping doomed transactions when building the
list of possible conflicts for SERIALIZABLE READ ONLY. That makes
sense, because doomed transactions won't commit, but a couple of subtle
things broke:
1. If all uncommitted r/w transactions are doomed, a READ ONLY
transaction would arbitrarily not benefit from the safe snapshot
optimization. It would not be taken immediately, and yet no other
transaction would set SXACT_FLAG_RO_SAFE later.
2. In the same circumstances but with DEFERRABLE, GetSafeSnapshot()
would correctly exit its wait loop without sleeping and then take the
optimization in non-assert builds, but assert builds would fail a sanity
check that SXACT_FLAG_RO_SAFE had been set by another transaction.
This is similar to the case for PredXact->WritableSxactCount == 0. We
should opt out immediately if our possibleUnsafeConflicts list is empty
after filtering.
The code to maintain the serializable global xmin is moved down below
the new opt out site, because otherwise we'd have to reverse its effects
before returning.
Back-patch to all supported releases. Bug #17368.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/17116-d6ca217acc180e30%40postgresql.org
Discussion: https://postgr.es/m/20110707212159.GF76634%40csail.mit.edu
When vacuum_defer_cleanup_age is bigger than the current xid, including the
epoch, the subtraction of vacuum_defer_cleanup_age would lead to a wrapped
around xid. While that normally is not a problem, the subsequent conversion to
a 64bit xid results in a 64bit xid very far into the future. As that xid is
used as a horizon to detect whether row versions are old enough to be
removed, that allows removal of rows that are still visible (i.e. corruption).
If vacuum_defer_cleanup_age was never changed from the default, there is no
chance of this bug occurring.
This bug was introduced in dc7420c2c92. A lesser version of it exists in
12-13, introduced by fb5344c969a, affecting only GiST.
The 12-13 version of the issue can, in rare cases, lead to pages in a gist
index getting recycled too early, potentially causing index entries to be
found multiple times.
The fix is fairly simple - don't allow vacuum_defer_cleanup_age to retreat
further than FirstNormalTransactionId.
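A sketch of the clamped retreat (helper name hypothetical):

    /* Retreat xid by "amount", but never past FirstNormalTransactionId. */
    static TransactionId
    retreat_clamped(TransactionId xid, uint32 amount)
    {
        /* modulo-2^32 distance down to FirstNormalTransactionId */
        if (xid - FirstNormalTransactionId < amount)
            return FirstNormalTransactionId;
        return xid - amount;
    }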
Patches to make similar bugs easier to find, by adding asserts to the 64bit
xid infrastructure, have been proposed, but are not suitable for backpatching.
Currently there are no tests for vacuum_defer_cleanup_age. A patch introducing
infrastructure to make writing a test easier has been posted to the list.
Reported-by: Michail Nikolaev <michail.nikolaev@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20230108002923.cyoser3ttmt63bfn@awork3.anarazel.de
Backpatch: 12-, but impact/fix is smaller for 12-13
If a view is defined atop another view, and then CREATE OR REPLACE
VIEW is used to add columns to the lower view, then when the upper
view's referencing RTE is expanded by ApplyRetrieveRule we will have
a subquery RTE with fewer eref->colnames than output columns. This
confuses various code that assumes those lists are always in sync,
as they are in plain parser output.
We have seen such problems before (cf commit d5b760ecb), and now
I think the time has come to do what was speculated about in that
commit: let's make ApplyRetrieveRule synthesize some column names to
preserve the invariant that holds in parser output. Otherwise we'll
be chasing this class of bugs indefinitely. Moreover, it appears from
testing that this actually gives us better results in the test case
d5b760ecb added, and likely in other corner cases that we lack
coverage for.
In HEAD, I replaced d5b760ecb's hack to make expandRTE exit early with
an elog(ERROR) call, since the case is now presumably unreachable.
But it seems like changing that in back branches would bring more risk
than benefit, so there I just updated the comment.
Per bug #17811 from Alexander Lakhin. Back-patch to all supported
branches.
Discussion: https://postgr.es/m/17811-d31686b78f0dffc9@postgresql.org
If UPDATE is forced to retry after an EvalPlanQual check, it neglected
to repeat GENERATED-column computations, even though those might well
have changed since we're dealing with a different tuple than before.
Fixing this is mostly a matter of looping back a bit further when
we retry. In v15 and HEAD that's most easily done by altering the API
of ExecUpdateAct so that it includes computing GENERATED expressions.
Also, if an UPDATE in a partitioned table turns into a cross-partition
INSERT operation, we failed to recompute GENERATED columns. That's a
bug since 8bf6ec3ba allowed partitions to have different generation
expressions; although it seems to have no ill effects before that.
Fixing this is messier because we can now have situations where the same
query needs both the UPDATE-aligned set of GENERATED columns and the
INSERT-aligned set, and it's unclear which set will be generated first
(else we could hack things by forcing the INSERT-aligned set to be
generated, which is indeed how fe9e658f4 made it work for MERGE).
The best fix seems to be to build and store separate sets of expressions
for the INSERT and UPDATE cases. That would create ABI issues in the
back branches, but so far it seems we can leave this alone in the back
branches.
Per bug #17823 from Hisahiro Kauchi. The first part of this affects all
branches back to v12 where GENERATED columns were added.
Discussion: https://postgr.es/m/17823-b64909cf7d63de84@postgresql.org
1. Make sure that we don't decrement SxactGlobalXminCount twice when
the SXACT_FLAG_RO_SAFE optimization is reached in a parallel query.
This could trigger a sanity check failure in assert builds. Non-assert
builds recompute the count in SetNewSxactGlobalXmin(), so the problem
was hidden, explaining the lack of field reports. Add a new isolation
test to exercise that case.
2. Remove an assertion that the DOOMED flag can't be set on a partially
released SERIALIZABLEXACT. Instead, ignore the flag (our transaction
was already determined to be read-only safe, and DOOMED is in fact set
during partial release, and there was already an assertion that it
wasn't set sooner). Improve an existing isolation test so that it
reaches that case (previously it wasn't quite testing what it was
supposed to be testing; see discussion).
Back-patch to 12. Bug #17116. Defects in commit 47a338cf.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/17116-d6ca217acc180e30%40postgresql.org
Attempting to use this function with a raw page not coming from a GiST
index would cause a crash, as it was missing the same sanity checks as
gist_page_items_bytea(). This slightly refactors the code so that all the
basic validation checks for GiST pages are done in a single routine,
in the same fashion as the pageinspect functions for hash and BRIN.
This fixes an issue similar to 076f4d9. A test is added to stress
this case. While on it, I have added a similar test for
brin_page_items() with a combination of a valid BRIN index and a raw
btree page. This one was already protected, but it was not tested.
Reported-by: Egor Chindyaskin
Author: Dmitry Koval
Discussion: https://postgr.es/m/17815-fc4a2d3b74705703@postgresql.org
Backpatch-through: 14
This is usually harmless, but if you were very unlucky it could
provoke a segfault due to the "to" string being right up against
the end of memory. Found via valgrind testing (so we might've
found it earlier, except that our regression tests lacked any
exercise of translate()'s deletion feature).
Fix by switching the order of the test-for-end-of-string and
advance-pointer steps. While here, compute "to_ptr + tolen"
just once. (Smarter compilers might figure that out for
themselves, but let's just make sure.)
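A sketch of the corrected scan (variable names assumed):

    const char *to_end = to_ptr + tolen;    /* computed just once */
    const char *p = to_ptr;

    while (p < to_end)              /* test for end before dereferencing */
    {
        /* ... examine the character at p ... */
        p += pg_mblen(p);           /* advance only after the test */
    }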
Report and fix by Daniil Anisimov, in bug #17816.
Discussion: https://postgr.es/m/17816-70f3d2764e88a108@postgresql.org
postgres_fdw will close its remote session if an sinval cache reset
occurs, since it's possible that that means some FDW parameters
changed. We had two tests that were trying to ensure that the
session remains alive by setting debug_discard_caches = 0; but
that's not sufficient. Even though the tests seem stable enough
in the buildfarm, they flap a lot under CI.
In the first test, which is checking the ability to recover from
a lost connection, we can stabilize the results by just not
caring whether pg_terminate_backend() finds a victim backend.
If a reset did happen, there won't be a session to terminate
anymore, but the test can proceed anyway. (Arguably, we are
then not testing the unintentional-disconnect case, but as long
as that scenario is exercised in most runs I think it's fine;
testing the reset-driven case is of value too.)
In the second test, which is trying to verify the application_name
displayed in pg_stat_activity by a remote session, we had a race
condition in that the remote session might go away before we can
fetch its pg_stat_activity entry. We can close that race and make
the test more certainly test what it intends to by arranging things
so that the remote session itself fetches its pg_stat_activity entry
(based on PID rather than a somewhat-circular assumption about the
application name).
Both tests now demonstrably pass under debug_discard_caches = 1,
so we can remove that hack.
Back-patch into relevant back branches.
Discussion: https://postgr.es/m/20230226194340.u44bkfgyz64c67i6@awork3.anarazel.de
It's been this way for a very long time, but it appears to have been
masking an issue that only manifests with different settings. Therefore,
run the tests in the installation's default encoding/locale.
Backpatch to all live branches.
We already tried to fix this in commits 3f7323cbb et al (and follow-on
fixes), but now it emerges that there are still unfixed cases;
moreover, these cases affect all branches, not only pre-v14. I thought
we had eliminated all cases of making multiple clones of an UPDATE's
target list when we nuked inheritance_planner. But it turns out we
still do that in some partitioned-UPDATE cases, notably including
INSERT ... ON CONFLICT UPDATE, because ExecInitPartitionInfo thinks
it's okay to clone and modify the parent's targetlist.
This fix is based on a suggestion from Andres Freund: let's stop
abusing the ParamExecData.execPlan mechanism, which was only ever
meant to handle initplans, and instead solve the execution timing
problem by having the expression compiler move MULTIEXPR_SUBLINK steps
to the front of their expression step lists. This is feasible because
(a) all branches still in support compile the entire targetlist of
an UPDATE into a single ExprState, and (b) we know that all
MULTIEXPR_SUBLINKs do need to be evaluated --- none could be buried
inside a CASE, for example. There is a minor semantics change
concerning the order of execution of the MULTIEXPR's subquery versus
other parts of the parent targetlist, but that seems like something
we can get away with. By doing that, we no longer need to worry
about whether different clones of a MULTIEXPR_SUBLINK share output
Params; their usage of that data structure won't overlap.
Per bug #17800 from Alexander Lakhin. Back-patch to all supported
branches. In v13 and earlier, we can revert 3f7323cbb and follow-on
fixes; however, I chose to keep the SubPlan.subLinkId field added
in ccbb54c72. We don't need that anymore in the core code, but it's
cheap enough to fill, and removing a plan node field in a minor
release seems like it'd be asking for trouble.
Andres Freund and Tom Lane
Discussion: https://postgr.es/m/17800-ff90866b3906c964@postgresql.org
If a rule action contains a subquery that refers to columns from OLD
or NEW, then those are really lateral references, and the planner will
complain if it sees such things in a subquery that isn't marked as
lateral. However, at rule-definition time, the user isn't required to
mark the subquery with LATERAL, and so it can fail when the rule is
used.
Fix this by marking such subqueries as lateral in the rewriter, at the
point where they're used.
Dean Rasheed and Tom Lane, per report from Alexander Lakhin.
Back-patch to all supported branches.
Discussion: https://postgr.es/m/5e09da43-aaba-7ea7-0a51-a2eb981b058b%40gmail.com
Multiple cycles of starting up and shutting down the plugin within a
single session would eventually lead to "out of relcache_callback_list
slots", because pgoutput_startup blindly re-registered its cache
callbacks each time. Fix it to register them only once, as all other
users of cache callbacks already take care to do.
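A sketch of the one-time registration (the guard variable is assumed):

    static bool callbacks_registered = false;

    if (!callbacks_registered)
    {
        CacheRegisterSyscacheCallback(PUBLICATIONOID,
                                      publication_invalidation_cb,
                                      (Datum) 0);
        CacheRegisterRelcacheCallback(rel_sync_cache_relation_cb,
                                      (Datum) 0);
        callbacks_registered = true;
    }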
This has been broken all along, so back-patch to all supported branches.
Shi Yu
Discussion: https://postgr.es/m/OSZPR01MB631004A78D743D68921FFAD3FDA79@OSZPR01MB6310.jpnprd01.prod.outlook.com
Given an updatable view with a DO ALSO INSERT ... SELECT rule, a
multi-row INSERT ... VALUES query on the view fails if the VALUES list
contains any DEFAULTs that are not replaced by view defaults. This
manifests as an "unrecognized node type" error, or an Assert failure,
in an assert-enabled build.
The reason is that when RewriteQuery() attempts to replace the
remaining DEFAULT items with NULLs in any product queries, using
rewriteValuesRTEToNulls(), it assumes that the VALUES RTE is located
at the same rangetable index in each product query. However, if the
product query is an INSERT ... SELECT, then the VALUES RTE is actually
in the SELECT part of that query (at the same index), rather than the
top-level product query itself.
Fix, by descending to the SELECT in such cases. Note that we can't
simply use getInsertSelectQuery() for this, since that expects to be
given a raw rule action with OLD and NEW placeholder entries, so we
duplicate its logic instead.
While at it, beef up the checks in getInsertSelectQuery() by checking
that the jointree->fromlist node is indeed a RangeTblRef, and that the
RTE it points to has rtekind == RTE_SUBQUERY.
Per bug #17803, from Alexander Lakhin. Back-patch to all supported
branches.
Dean Rasheed, reviewed by Tom Lane.
Discussion: https://postgr.es/m/17803-53c63ed4ecb4eac6%40postgresql.org
When decoding a transactional logical message, logicalmsg_decode called
SnapBuildGetOrBuildSnapshot. But we may not have a consistent snapshot
yet at that point. We don't actually need the snapshot in this case
(during replay we'll have the snapshot from the transaction), so in
practice this is harmless. But in assert-enabled builds this crashes.
Fixed by requesting the snapshot only in the non-transactional case, where
we are guaranteed to have SNAPBUILD_CONSISTENT.
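A sketch of the guard (variable names assumed; during replay of a
transactional message the transaction's own snapshot is used instead):

    /* Only non-transactional messages need (and can get) a snapshot here */
    if (!message->transactional)
        snapshot = SnapBuildGetOrBuildSnapshot(builder, xid);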
Backpatch to 11. The issue exists since 9.6.
Backpatch-through: 11
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/84d60912-6eab-9b84-5de3-41765a5449e8@enterprisedb.com
If asked to decrease the size of a large (>8K) palloc chunk,
AllocSetRealloc could improperly change the Valgrind state of memory
beyond the new end of the chunk: it would mark data UNDEFINED as far
as the old end of the chunk after having done the realloc(3) call,
thus tromping on the state of memory that no longer belongs to it.
One would normally expect that memory to now be marked NOACCESS,
so that this mislabeling might prevent detection of later errors.
If realloc() had chosen to move the chunk someplace else (unlikely,
but well within its rights) we could also mismark perfectly-valid
DEFINED data as UNDEFINED, causing false-positive valgrind reports
later. Also, any malloc bookkeeping data placed within this area
might now be wrongly marked, causing additional problems.
Fix by replacing relevant uses of "oldsize" with "Min(size, oldsize)".
It's sufficient to mark as far as "size" when that's smaller, because
whatever remains in the new chunk size will be marked NOACCESS below,
and we expect realloc() to have taken care of marking the memory
beyond the new official end of the chunk.
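A simplified sketch of the pattern:

    /* Before: could mislabel memory beyond the new chunk end */
    VALGRIND_MAKE_MEM_UNDEFINED(pointer, oldsize);

    /* After: never reach past the new requested size; anything beyond
     * it is realloc()'s business and gets marked NOACCESS below. */
    VALGRIND_MAKE_MEM_UNDEFINED(pointer, Min(size, oldsize));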
While we're here, also rename the function's "oldsize" variable
to "oldchksize" to more clearly explain what it actually holds,
namely the distance to the end of the chunk (that is, requested size
plus trailing padding). This is more consistent with the use of
"size" and "chksize" to hold the new requested size and chunk size.
Add a new variable "oldsize" in the one stanza where we're actually
talking about the old requested size.
Oversight in commit c477f3e44. Back-patch to all supported branches,
as that was, just in case anybody wants to do valgrind testing on back
branches.
Karina Litskevich
Discussion: https://postgr.es/m/CACiT8iaAET-fmzjjZLjaJC4zwSJmrFyL7LAdHwaYyjjQOQ4hcg@mail.gmail.com
Failing to do so results in an error when a pgbench script tries to
start a serializable transaction inside a pipeline, because by the time
BEGIN ISOLATION LEVEL SERIALIZABLE is executed, we're already in a
transaction that has acquired a snapshot, so the server rightfully
complains.
We can work around that by preparing all commands in the pipeline before
actually starting the pipeline. This changes the existing code in two
aspects: first, we now prepare each command individually at the point
where that command is about to be executed; previously, we would prepare
all commands in a script as soon as the first command of that script
would be executed. It's hard to see that this would make much of a
difference (particularly since it only affects the first execution of each
script in a client), but I didn't actually try to measure it.
Secondly, we no longer use PQsendPrepare() in pipeline mode, but only
PQprepare(). There's no specific reason for this change other than no
longer needing to do things differently in pipeline mode. (Previously we
had no choice, because in pipeline mode PQprepare() could not be used.)
Backpatch to 14, where pgbench got support for pipeline mode.
Reported-by: Yugo NAGATA <nagata@sraoss.co.jp>
Discussion: https://postgr.es/m/20210716153013.fc53b1c780b06fccc07a7f0d@sraoss.co.jp
When evaluating clauses on multiple scan keys of a multi-column BRIN
index, we can stop processing as soon as we find a scan key eliminating
the range, and the range should not be added to the bitmap.
That's how it worked before 14, but since a681e3c107a the code treated
the range as matching if it matched at least the last scan key.
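In outline, the corrected logic is (per-key check hypothetical):

    bool    matches = true;

    for (int keyno = 0; keyno < nkeys; keyno++)
    {
        if (!range_matches_key(range, &keys[keyno]))    /* hypothetical */
        {
            matches = false;
            break;          /* one failing key eliminates the range */
        }
    }

    /* add the range's pages to the bitmap only if all keys matched */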
Backpatch to 14, where this code was introduced.
Backpatch-through: 14
Discussion: https://postgr.es/m/ebc18613-125e-60df-7520-fcbe0f9274fc%40enterprisedb.com
ruleutils.c blindly printed the user-given alias (or nothing if there
hadn't been one) for the target table of INSERT/UPDATE/DELETE queries.
That works a large percentage of the time, but not always: for queries
appearing in WITH, it's possible that we chose a different alias to
avoid conflict with outer-scope names. Since the chosen alias would
be used in any Var references to the target table, this'd lead to an
inconsistent printout with consequences such as dump/restore failures.
The correct logic for printing (or not) a relation alias was embedded
in get_from_clause_item. Factor it out to a separate function so that
we don't need a jointree node to use it. (Only a limited part of that
function can be reached from these new call sites, but this seems like
the cleanest non-duplicative factorization.)
In passing, I got rid of a redundant "\d+ rules_src" step in rules.sql.
Initial report from Jonathan Katz; thanks to Vignesh C for analysis.
This has been broken for a long time, so back-patch to all supported
branches.
Discussion: https://postgr.es/m/e947fa21-24b2-f922-375a-d4f763ef3e4b@postgresql.org
Discussion: https://postgr.es/m/CALDaNm1MMntjmT_NJGp-Z=xbF02qHGAyuSHfYHias3TqQbPF2w@mail.gmail.com
OpenSSL 1.1.1 and newer versions have added support for RSA-PSS
certificates, which requires the use of a specific OpenSSL routine to
determine which hash function to use when computing the certificate hash
for channel binding in SCRAM-SHA-256. X509_get_signature_nid(), the
routine the channel binding code originally relied on, is not able to
determine which hash algorithm to use for such certificates. However,
X509_get_signature_info(), new to OpenSSL 1.1.1, is able to do it. This
commit switches the channel binding logic to rely on
X509_get_signature_info() over X509_get_signature_nid(), which would be
the choice when building with 1.1.1 or newer.
The error could have been triggered on the client or the server, hence
libpq and the backend need to have their related code paths patched.
Note that attempting to load an RSA-PSS certificate with OpenSSL 1.1.0
or older leads to a failure due to an unsupported algorithm.
The discovery of relying on X509_get_signature_info() comes from Jacob,
the tests have been written by Heikki (with a few tweaks from me), and I
have bundled the whole together while adding the bits needed for MSVC
and meson.
This issue exists since channel binding exists, so backpatch all the way
down. Some tests are added in 15~, triggered if compiling with OpenSSL
1.1.1 or newer, where the certificate and key files can easily be
generated for RSA-PSS.
Reported-by: Gunnar "Nick" Bluth
Author: Jacob Champion, Heikki Linnakangas
Discussion: https://postgr.es/m/17760-b6c61e752ec07060@postgresql.org
Backpatch-through: 11
When an aggregate function is used as a WindowFunc and a tuple transitions
out of the window frame, we ordinarily try to make use of the aggregate
function's inverse transition function to "unaggregate" the exiting tuple.
This optimization is disabled for various cases, including when the
aggregate contains a volatile function. In such a case we'd be unable to
ensure that the transition value was calculated to the same value during
transitions and inverse transitions. Unfortunately, we did this check by
calling contain_volatile_functions() which does not recursively search
SubPlans for volatile functions. If the aggregate function's arguments or
its FILTER clause contained a subplan with volatile functions then we'd
fail to notice this.
Here we fix this by just disabling the optimization when the WindowFunc
contains any subplans. Volatile functions are not the only reason that a
subplan may have nonrepeatable results.
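A sketch of the new test (surrounding names assumed):

    /*
     * A subplan's results may be nonrepeatable even without any
     * directly visible volatile function, so be conservative.
     */
    if (contain_subplans((Node *) wfunc))
        can_use_inverse_transitions = false;    /* flag name assumed */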
Bug: #17777
Reported-by: Anban Company
Discussion: https://postgr.es/m/17777-860b739b6efde977%40postgresql.org
Reviewed-by: Tom Lane
Backpatch-through: 11
It appears no longer possible to build the SGML docs without a local
installation of the DocBook DTD, because sourceforge.net now only
permits HTTPS access, and no common version of xsltproc supports that.
Hence, remove the bits of our documentation suggesting that that's
possible or useful.
In fact, we might as well add the --nonet option to the build recipes
automatically, for a bit of extra security.
Also fix our documentation-tool-installation recipes for macOS to
ensure that xmllint and xsltproc are pulled in from MacPorts or
Homebrew. The previous recipes assumed you could use the
Apple-supplied versions of these tools; which still works, except that
you'd need to set an environment variable to ensure that they would
find DTD files provided by those package managers. Simpler and easier
to just recommend pulling in the additional packages.
In HEAD, also document how to build docs using Meson, and adjust
"ninja docs" to just build the HTML docs, for consistency with the
default behavior of doc/src/sgml/Makefile.
In a fit of neatnik-ism, I also made the ordering of the package
lists match the order in which the tools are described at the head
of the appendix.
Aleksander Alekseev, Peter Eisentraut, Tom Lane
Discussion: https://postgr.es/m/CAJ7c6TO8Aro2nxg=EQsVGiSDe-TstP4EsSvDHd7DSRsP40PgGA@mail.gmail.com
Try to disable ASLR when building in EXEC_BACKEND mode, to avoid random
memory mapping failures while testing. For developer use only, no
effect on regular builds.
This was originally applied as of f3e7806 for v15~, but
recently-added buildfarm member gokiburi tests this configuration on
older branches as well, causing it to fail randomly as ASLR would be
enabled.
Suggested-by: Andres Freund <andres@anarazel.de>
Tested-by: Bossart, Nathan <bossartn@amazon.com>
Discussion: https://postgr.es/m/20210806032944.m4tz7j2w47mant26%40alap3.anarazel.de
Backpatch-through: 12
pqsecure_open_gss() includes a code path handling error messages with
v2-style protocol messages coming from the server. The client-side
buffer holding the error message did not force NULL-termination, and the
server's data was copied as-is into the connection's errorMessage. Hence,
it was possible for a server to send an unterminated string and have
arbitrary bytes copied into the buffer receiving the error message on the
client, opening the door to a crash or even data exposure.
As at this stage of the authentication process the exchange has not been
completed yet, this could be abused by an attacker without Kerberos
credentials. Clients that have a valid Kerberos cache are vulnerable, as
libpq opportunistically requests GSS encryption unless gssencmode is
disabled.
Author: Jacob Champion
Backpatch-through: 12
Security: CVE-2022-41862
The prior coding of int64_div_fast_to_numeric() had a number of bugs
that would cause it to fail under different circumstances, such as
with log10val2 <= 0, or log10val2 a multiple of 4, or in the "slow"
numeric path with log10val2 >= 10.
None of those could be triggered by any of our current code, which
only uses log10val2 = 3 or 6. However, they made it a hazard for any
future code that might use it. Also, since this is exported by
numeric.c, users writing their own C code might choose to use it.
Therefore fix, and back-patch to v14, where it was introduced.
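For reference, a hedged usage sketch (converting a microsecond total to
numeric milliseconds, i.e. dividing by 10^3):

    #include "utils/numeric.h"

    static Numeric
    usecs_to_ms_numeric(int64 usecs)
    {
        /* log10val2 = 3, as in the current pg_stat callers */
        return int64_div_fast_to_numeric(usecs, 3);
    }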
Dean Rasheed, reviewed by Tom Lane.
Discussion: https://postgr.es/m/CAEZATCW8gXgW0tgPxPgHDPhVX71%2BSWFRkhnXy%2BTfGDsKLepu2g%40mail.gmail.com
Breaking <phrase> over two lines is not handled by psql's
create_help.pl. (It creates faulty \help output.)
Undo the formatting change introduced by
9bdad1b5153e5d6b77a8f9c6e32286d6bafcd76d to fix this for now.
An early release of AF_UNIX in Windows apparently supported Linux-style
"abstract" Unix sockets, but they do not seem to work in current Windows
versions and there is no mention of any of this in the Winsock
documentation. Remove the mention of Windows from the documentation.
Back-patch to 14, where commit c9f0624b landed.
Discussion: https://postgr.es/m/CA%2BhUKGKrYbSZhrk4NGfoQGT_3LQS5pC5KNE1g0tvE_pPBZ7uew%40mail.gmail.com
DST law changes in Greenland and Mexico. Notably, a new timezone
America/Ciudad_Juarez has been split off from America/Ojinaga.
Historical corrections for northern Canada, Colombia, and Singapore.
This was only mentioned in the description of the text/label, which is
marked as being in quotes in the synopsis, which can cause confusion
(as witnessed on IRC).
Also separate the literal and NULL cases in the parameter list, per
suggestion from Tom Lane.
Also add an example of dropping a security label.
Dagfinn Ilmari Mannsåker, with some tweaks by me
Discussion: https://postgr.es/m/87sffqk4zp.fsf@wibble.ilmari.org
This test was added as of 857ee8e, which introduced the SQL function
txid_status(), with the purpose of checking that a transaction
ID still in-progress during a crash is correctly marked as aborted after
recovery finishes.
This test is unstable, and some configuration scenarios may make it easier
to reproduce (wal_level=minimal, wal_compression=on) because the WAL
holding the information about the in-progress transaction ID may not
have made it to disk yet, hence a post-crash recovery may cause the same
XID to be reused, triggering a test failure.
We have discussed a few approaches, like making this function force a
WAL flush to make it reliable across crashes, but we don't want to pay a
performance penalty in some scenarios, either. The test could have
been tweaked to enforce a checkpoint but that actually breaks the
promise of the test to rely on a stable result of txid_status() after
a crash.
This issue has been reported a few times across the past years, with an
original report from Kyotaro Horiguchi. The buildfarm machines tanager,
hachi and gokiburi enable wal_compression, and fail on this test
periodically.
Discussion: https://postgr.es/m/3163112.1674762209@sss.pgh.pa.us
Discussion: https://postgr.es/m/20210305.115011.558061052471425531.horikyota.ntt@gmail.com
Backpatch-through: 11
If the final chunk of an oversized tuple being written out to disk was
exactly 32760 bytes, it would be corrupted due to a fencepost bug.
Bug #17619. Back-patch to 11 where the code arrived.
While testing that (see test module in archives), I (tmunro) noticed
that the per-participant page counter was not initialized to zero as it
should have been. That wasn't a live bug when the code was written, since
DSM memory was originally always zeroed; but since 14,
min_dynamic_shared_memory might be configured, and it supplies non-zeroed
memory, so that is also fixed here.
Author: Dmitry Astapov <dastapov@gmail.com>
Discussion: https://postgr.es/m/17619-0de62ceda812b8b5%40postgresql.org
network_ops is an SP-GiST operator family, and the opclass able to work
on the inet type is named inet_ops.
Oversight in 7a1cd52, that reworked the design of the table listing all
the operators available.
Reported-by: Laurence Parry
Reviewed-by: Tom Lane, David G. Johnston
Discussion: https://postgr.es/m/167458110639.2667300.14741268666497110766@wrigleys.postgresql.org
Backpatch-through: 14
The drop database command waits for the logical replication sync worker to
accept ProcSignalBarrier and the worker's slot creation waits for the drop
database to finish which leads to a deadlock. This happens because the
tablesync worker holds interrupts while creating a slot.
We prevent cancel/die interrupts while creating a slot in the table sync
worker because it is possible that before the server finishes this
command, a concurrent drop subscription happens which would complete
without removing this slot and that leads to the slot existing until the
end of walsender. However, the slot will eventually get dropped at
walsender exit time, so there is no danger of a dangling slot.
This patch reallows cancel/die interrupts while creating a slot and
modifies the test to wait for the number of slots to become zero, to
prevent finding an ephemeral slot.
The reported hang doesn't happen in PG14 as the drop database starts to
wait for ProcSignalBarrier with PG15 (commits 4eb2176318 and e2f65f4255)
but it is good to backpatch this till PG14 as it is not a good idea to
prevent interrupts during a network call that could block indefinitely.
Reported-by: Lakshmi Narayanan Sreethar
Diagnosed-by: Andres Freund
Author: Hou Zhijie
Reviewed-by: Vignesh C, Amit Kapila
Backpatch-through: 14, where it was introduced in commit 6b67d72b60
Discussion: https://postgr.es/m/CA+kvmZELXQ4ZD3U=XCXuG3KvFgkuPoN1QrEj8c-rMRodrLOnsg@mail.gmail.com
When libpqrcv_connect (also known as walrcv_connect()) failed, it leaked the
libpq connection. In most paths that's fairly harmless, as the calling process
will exit soon after. But e.g. CREATE SUBSCRIPTION could lead to a somewhat
longer lived leak.
Fix by releasing resources, including the libpq connection, on error.
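A sketch of the error-path structure (simplified):

    conn->streamConn = PQconnectStartParams(keys, vals,
                                            /* expand_dbname = */ true);
    if (PQstatus(conn->streamConn) == CONNECTION_BAD)
        goto bad_connection;

    /* ... polling loop, also jumping to bad_connection on failure ... */

bad_connection:
    PQfinish(conn->streamConn);
    pfree(conn);
    return NULL;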
Add a test exercising the error code path. To make it reliable and safe, the
test tries to connect to port=-1, which happens to fail during connection
establishment, rather than during connection string parsing.
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20230121011237.q52apbvlarfv6jm6@awork3.anarazel.de
Backpatch: 11-
b762fed64 recently changed this test to prevent subquery pullup to allow
us to test Memoize with lateral_vars. As pointed out by Tom Lane, OFFSET
0 is our standard way of preventing subquery pullups, so do it that way
instead.
Discussion: https://postgr.es/m/2144818.1674517061@sss.pgh.pa.us
Backpatch-through: 14, same as b762fed64
The test in question was meant to be testing Memoize to ensure it worked
correctly when the inner side of the join contained lateral vars, however,
nothing in the lateral subquery stopped it from being pulled up into the
main query, so the planner did that, and that meant no more lateral vars.
Here we add a simple ORDER BY to stop the planner from being able to
pull up the lateral subquery.
Author: Richard Guo
Discussion: https://postgr.es/m/CAMbWs4_LHJaN4L-tXpKMiPFnsCJWU1P8Xh59o0W7AA6UN99=cQ@mail.gmail.com
Backpatch-through: 14, where Memoize was added.
The motivation for this change is that when pg_dump dumps a
partitioned index that's marked REPLICA IDENTITY, it generates a
command sequence that applies REPLICA IDENTITY before the partitioned
index has been marked valid, causing restore to fail. We could
perhaps change pg_dump to not do it like that, but that would be
difficult and would not fix existing dump files with the problem.
There seems to be very little reason for the backend to disallow
this anyway --- the code ignores indisreplident when the index
isn't valid --- so instead let's fix it by allowing the case.
Commit 9511fb37a previously expressed a concern that allowing
indisreplident to be set on invalid indexes might allow us to
wind up in a situation where a table could have indisreplident
set on multiple indexes. I'm not sure I follow that concern
exactly, but in any case the only way that could happen is because
relation_mark_replica_identity is too trusting about the existing set
of markings being valid. Let's just rip out its early-exit code path
(which sure looks like premature optimization anyway; what are we
doing expending code to make redundant ALTER TABLE ... REPLICA
IDENTITY commands marginally faster and not-redundant ones marginally
slower?) and fix it to positively guarantee that no more than one
index is marked indisreplident.
The pg_dump failure can be demonstrated in all supported branches,
so back-patch all the way. I chose to back-patch 9511fb37a as well,
just to keep indisreplident handling the same in all branches.
Per bug #17756 from Sergey Belyashov.
Discussion: https://postgr.es/m/17756-dd50e8e0c8dd4a40@postgresql.org
When the length was too short, the server read outside the allocation.
That yielded the same log noise as sending the correct length with
(backendPID,cancelAuthCode) matching nothing. Change to a message about
the unexpected length. Given the attacker's lack of control over the
memory layout and the general lack of diversity in memory layouts at the
code in question, we doubt a would-be attacker could cause a segfault.
Hence, while the report arrived via security@postgresql.org, this is not
a vulnerability. Back-patch to v11 (all supported versions).
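A sketch of the tightened check (message wording may differ):

    if (len != sizeof(CancelRequestPacket))
    {
        ereport(COMMERROR,
                (errcode(ERRCODE_PROTOCOL_VIOLATION),
                 errmsg("invalid length of query cancel packet")));
        return;
    }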
Andrey Borodin, reviewed by Tom Lane. Reported by Andrey Borodin.