pg_rewind looks at the control file to check what timeline a server is on.
But promotion doesn't immediately write a checkpoint, it merely writes
an end-of-recovery WAL record. If pg_rewind runs immediately after
promotion, before the checkpoint has completed, it will think think that
the server is still on the earlier timeline. We ran into this issue a long
time ago already, see commit 484a848a73f.
It's a bit bogus that pg_rewind doesn't determine the timeline correctly
until the end-of-recovery checkpoint has completed. We probably should
fix that. But for now work around it by waiting for the checkpoint
to complete before running pg_rewind, like we did in commit 484a848a73f.
In the passing, tidy up the new test a little bit. Rerder the INSERTs so
that the comments make more sense, remove a spurious CHECKPOINT call after
pg_rewind has already run, and add --debug option, so that if this fails
again, we'll have more data.
Per buildfarm failure at https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=rorqual&dt=2020-12-06%2018%3A32%3A19&stg=pg_rewind-check.
Backpatch to all supported versions.
Discussion: https://www.postgresql.org/message-id/1713707e-e318-761c-d287-5b6a4aa807e8@iki.fi
Buildfarm has been failing sporadically on the new test. I was able to
reproduce this by adding a random 0-10 s delay in the walreceiver, just
before it connects to the primary. There's a race condition where node_3
is promoted before it has fully caught up with node_1, leading to diverged
timelines. When node_1 is later reconfigured as standby following node_3,
it fails to catch up:
LOG: primary server contains no more WAL on requested timeline 1
LOG: new timeline 2 forked off current database system timeline 1 before current recovery point 0/30000A0
That's the situation where you'd need to use pg_rewind, but in this case
it happens already when we are just setting up the actual pg_rewind
scenario we want to test, so change the test so that it waits until
node_3 is connected and fully caught up before promoting it, so that you
get a clean, controlled failover.
Also rewrite some of the comments, for clarity. The existing comments
detailed what each step in the test did, but didn't give a good overview
of the situation the steps were trying to create.
For reasons I don't understand, the test setup had to be written slightly
differently in 9.6 and 9.5 than in later versions. The 9.5/9.6 version
needed node 1 to be reinitialized from backup, whereas in later versions
it could be shut down and reconfigured to be a standby. But even 9.5 should
support "clean switchover", where primary makes sure that pending WAL is
replicated to standby on shutdown. It would be nice to figure out what's
going on there, but that's independent of pg_rewind and the scenario that
this test tests.
Discussion: https://www.postgresql.org/message-id/b0a3b95b-82d2-6089-6892-40570f8c5e60%40iki.fi
In a few places, the long-version options were listed before the
single-letter ones in the command summary of a few commands. This
didn't match other commands, and didn't match the option ordering later
in the same reference page.
Backpatch-through: 9.5
If the target is a standby server, its WAL doesn't end at the last
checkpoint record, but at minRecoveryPoint. We must scan all the
WAL from the last common checkpoint all the way up to minRecoveryPoint
for modified pages, and also consider that portion when determining
whether the server needs rewinding.
Backpatch to all supported versions.
Author: Ian Barwick and me
Discussion: https://www.postgresql.org/message-id/CABvVfJU-LDWvoz4-Yow3Ay5LZYTuPD7eSjjE4kGyNZpXC6FrVQ%40mail.gmail.com
As it stood, expandTableLikeClause() re-did the same relation_openrv
call that transformTableLikeClause() had done. However there are
scenarios where this would not find the same table as expected.
We hold lock on the LIKE source table, so it can't be renamed or
dropped, but another table could appear before it in the search path.
This explains the odd behavior reported in bug #16758 when cloning a
table as a temp table of the same name. This case worked as expected
before commit 502898192 introduced the need to open the source table
twice, so we should fix it.
To make really sure we get the same table, let's re-open it by OID not
name. That requires adding an OID field to struct TableLikeClause,
which is a little nervous-making from an ABI standpoint, but as long
as it's at the end I don't think there's any serious risk.
Per bug #16758 from Marc Boeren. Like the previous patch,
back-patch to all supported branches.
Discussion: https://postgr.es/m/16758-840e84a6cfab276d@postgresql.org
When committing a transaction that dropped a relation, we previously
truncated only the first segment file to free up disk space (the one
that won't be unlinked until the next checkpoint).
Truncate higher numbered segments too, even though we unlink them on
commit. This frees the disk space immediately, even if other backends
have open file descriptors and might take a long time to get around to
handling shared invalidation events and closing them. Also extend the
same behavior to the first segment, in recovery.
Back-patch to all supported releases.
Bug: #16663
Reported-by: Denis Patron <denis.patron@previnet.it>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Reviewed-by: Neil Chen <carpenter.nail.cz@gmail.com>
Reviewed-by: David Zhang <david.zhang@highgo.ca>
Discussion: https://postgr.es/m/16663-fe97ccf9932fc800%40postgresql.org
Because regular CREATE INDEX commands are independent, and there's no
logical data dependency, it's not immediately obvious that transactions
held by concurrent index builds on one table will block the second phase
of concurrent index creation on an unrelated table, so document this
caveat.
Backpatch this all the way back. In branch master, mention that only
some indexes are involved.
Author: James Coleman <jtc331@gmail.com>
Reviewed-by: David Johnston <david.g.johnston@gmail.com>
Discussion: https://postgr.es/m/CAAaqYe994=PUrn8CJZ4UEo_S-FfRr_3ogERyhtdgHAb2WG_Ufg@mail.gmail.com
If a PlaceHolderVar is to be evaluated at a join relation, but
its value is only needed there and not at higher levels, we neglected
to update the joinrel's direct_lateral_relids to include the PHV's
source rel. This causes problems because join_is_legal() then won't
allow joining the joinrel to the PHV's source rel at all, leading
to "failed to build any N-way joins" planner failures.
Per report from Andreas Seltenreich. Back-patch to 9.5
where the problem originated.
Discussion: https://postgr.es/m/87blfgqa4t.fsf@aurora.ydns.eu
Through my misreading of what the existing code actually did,
commits 85c54287a et al. broke psql's behavior for the case where
"\c connstring" provides a password in the connstring. We should
use that password in such a case, but as of 85c54287a we ignored it
(and instead, prompted for a password).
Commit 94929f1cf fixed that in HEAD, but since I thought it was
cleaning up a longstanding misbehavior and not one I'd just created,
I didn't back-patch it.
Hence, back-patch the portions of 94929f1cf having to do with
password management. In addition to fixing the introduced bug,
this means that "\c -reuse-previous=on connstring" will allow
re-use of an existing connection's password if the connstring
doesn't change user/host/port. That didn't happen before, but
it seems like a bug fix, and anyway I'm loath to have significant
differences in this code across versions.
Also fix an error with the same root cause about whether or not to
override a connstring's setting of client_encoding. As of 85c54287a
we always did so; restore the previous behavior of overriding only
when stdin/stdout are a terminal and there's no environment setting
of PGCLIENTENCODING. (I find that definition a bit surprising, but
right now doesn't seem like the time to revisit it.)
Per bug #16746 from Krzysztof Gradek. As with the previous patch,
back-patch to all supported branches.
Discussion: https://postgr.es/m/16746-44b30e2edf4335d4@postgresql.org
The documentation omitted the critical tidbit that a keyword-array entry
is simply ignored if its corresponding value-array entry is NULL or an
empty string; it will *not* override any previously-obtained value for
the parameter. (See conninfo_array_parse().) I'd supposed that would
force the setting back to default, which is what led me into bug #16746;
but it doesn't.
While here, I couldn't resist the temptation to do some copy-editing,
both in the description of PQconnectdbParams() and in the section
about connection URI syntax.
Discussion: https://postgr.es/m/931505.1606618746@sss.pgh.pa.us
Commit 566372b3d fixed some race conditions involving concurrent
SimpleLruTruncate calls, but it introduced new ones in async.c.
A newly-listening backend could attempt to read Notify SLRU pages that
were in process of being truncated, possibly causing an error. Also,
the QUEUE_TAIL pointer could become set to a value that's not equal to
the queue position of any backend. While that's fairly harmless in
v13 and up (thanks to commit 51004c717), in older branches it resulted
in near-permanent disabling of the queue truncation logic, so that
continued use of NOTIFY led to queue-fill warnings and eventual
inability to send any more notifies. (A server restart is enough to
make that go away, but it's still pretty unpleasant.)
The core of the problem is confusion about whether QUEUE_TAIL
represents the "logical" tail of the queue (i.e., the oldest
still-interesting data) or the "physical" tail (the oldest data we've
not yet truncated away). To fix, split that into two variables.
QUEUE_TAIL regains its definition as the logical tail, and we
introduce a new variable to track the oldest un-truncated page.
Per report from Mikael Gustavsson. Like the previous patch,
back-patch to all supported branches.
Discussion: https://postgr.es/m/1b8561412e8a4f038d7a491c8b922788@smhi.se
Previously this code assumed that all IndexScan nodes supported
mark/restore, which is not true since it depends on optional index AM
support functions. This could lead to errors about missing support
functions in rare edge cases of mergejoins with no sort keys, where an
unordered non-btree index scan was placed on the inner path without a
protecting Materialize node. (Normally, the fact that merge join
requires ordered input would avoid this error.)
Backpatch all the way since this bug is ancient.
Per report from Eugen Konkov on irc.
Discussion: https://postgr.es/m/87o8jn50be.fsf@news-spur.riddles.org.uk
We previously put the -isysroot switch only into CPPFLAGS, theorizing
that it was only needed to find the right copies of include files.
However, it seems that we also need to use it while linking programs,
to find the right stub ".tbd" files for libraries. We got away
without that up to now, but apparently that was mostly luck. It may
also be that failures are only observed when the Xcode version is
noticeably out of sync with the host macOS version; the case that's
prompting action right now is that builds fail when using latest Xcode
(12.2) on macOS Catalina, even though it's fine on Big Sur.
Hence, add -isysroot to LDFLAGS as well. (It seems that the more
common practice is to put it in CFLAGS, whence it'd be included at
both compile and link steps. However, we can't mess with CFLAGS in
the platform template file without confusing configure's logic for
choosing default CFLAGS.)
Back-patch of 49407dc32 into all supported branches.
Report and patch by James Hilliard (some cosmetic mods by me)
Discussion: https://postgr.es/m/20201120003314.20560-1-james.hilliard1@gmail.com
Commit 502898192 was too careless about the order of execution of the
additional ALTER TABLE operations generated by expandTableLikeClause.
It just stuck them all at the end, which seems okay for most purposes.
But it falls down in the case where LIKE is importing a primary key
or unique index and the outer CREATE TABLE includes a FOREIGN KEY
constraint that needs to depend on that index. Weird as that is,
it used to work, so we ought to keep it working.
To fix, make parse_utilcmd.c insert LIKE clauses between index-creation
and FK-creation commands in the transformed list of commands, and change
utility.c so that the commands generated by expandTableLikeClause are
executed immediately not at the end. One could imagine scenarios where
this wouldn't work either; but currently expandTableLikeClause only
makes column default expressions, CHECK constraints, and indexes, and
this ordering seems fine for those.
Per bug #16730 from Sofoklis Papasofokli. Like the previous patch,
back-patch to all supported branches.
Discussion: https://postgr.es/m/16730-b902f7e6e0276b30@postgresql.org
Otherwise, if FDDEBUG is enabled, the debugging output fails because
it tries to read the fileName, which isn't set up yet (and should in
fact always be NULL).
AFAICT, this has been wrong since Berkeley. Before 96bf88d52,
it would accidentally fail to crash on platforms where snprintf()
is forgiving about being passed a NULL pointer for %s; but the
file name intended to be included in the debug output wouldn't
ever have shown up.
Report and fix by Greg Nancarrow. Although this is only visibly
broken in custom-made builds, it still seems worth back-patching
to all supported branches, as the FDDEBUG code is pretty useless
as it stands.
Discussion: https://postgr.es/m/CAJcOf-cUDgm9qYtC_B6XrC6MktMPNRby2p61EtSGZKnfotMArw@mail.gmail.com
Document that though the history file content is marked as bytea, it is
the same a text, and neither is btyea-escaped or encoding converted.
Reported-by: Brar Piening
Discussion: https://postgr.es/m/6a1b9cd9-17e3-df67-be55-86102af6bdf5@gmx.de
Backpatch-through: 13 - 9.5 (not master)
Whether from size overflow in gistSplit or from secondary splits,
picksplit is (rarely) called with exactly two items to split.
Formerly, due to special-case handling of the last item, this would
lead to access to an uninitialized cache entry; prior to PG 13 this
might have been harmless or at worst led to an incorrect union datum,
but in 13 onwards it can cause a backend crash from using an
uninitialized pointer.
Repair by removing the special case, which was deemed not to have been
appropriate anyway. Backpatch all the way, because this bug has
existed since pg_trgm was added.
Per report on IRC from user "ftzdomino". Analysis and testing by me,
patch from Alexander Korotkov.
Discussion: https://postgr.es/m/87k0usfdxg.fsf@news-spur.riddles.org.uk
Commit 8bf74967dab moved some of the code from brin_new_memtuple to
brin_memtuple_initialize, but this resulted in some of the code being
duplicate. Fix by removing the duplicate lines and backpatch to 10.
Author: Tomas Vondra
Backpatch-through: 10
Discussion: https://postgr.es/m/5eb50c97-9a8e-b691-8c40-1b2a55611c4c%40enterprisedb.com
Introduce TimestampDifferenceMilliseconds() to simplify callers
that would rather have the difference in milliseconds, instead of
the select()-oriented seconds-and-microseconds format. This gets
rid of at least one integer division per call, and it eliminates
some apparently-easy-to-mess-up arithmetic.
Two of these call sites were in fact wrong:
* pg_prewarm's autoprewarm_main() forgot to multiply the seconds
by 1000, thus ending up with a delay 1000X shorter than intended.
That doesn't quite make it a busy-wait, but close.
* postgres_fdw's pgfdw_get_cleanup_result() thought it needed to compute
microseconds not milliseconds, thus ending up with a delay 1000X longer
than intended. Somebody along the way had noticed this problem but
misdiagnosed the cause, and imposed an ad-hoc 60-second limit rather
than fixing the units. This was relatively harmless in context, because
we don't care that much about exactly how long this delay is; still,
it's wrong.
There are a few more callers of TimestampDifference() that don't
have a direct need for seconds-and-microseconds, but can't use
TimestampDifferenceMilliseconds() either because they do need
microsecond precision or because they might possibly deal with
intervals long enough to overflow 32-bit milliseconds. It might be
worth inventing another API to improve that, but that seems outside
the scope of this patch; so those callers are untouched here.
Given the fact that we are fixing some bugs, and the likelihood
that future patches might want to back-patch code that uses this
new API, back-patch to all supported branches.
Alexey Kondratov and Tom Lane
Discussion: https://postgr.es/m/3b1c053a21c07c1ed5e00be3b2b855ef@postgrespro.ru
Summarily changing the STYPE of regression-test aggregates that
depend on array_append or array_cat is an issue for the buildfarm's
cross-version-upgrade tests, because those aggregates (as defined
in the back branches) now won't load into HEAD. Although this seems
like only a minimal risk for genuine user-defined aggregates, we
need to do something for the buildfarm. Hence, adjust the aggregate
definitions, in both HEAD and the back branches.
Discussion: https://postgr.es/m/1401824.1604537031@sss.pgh.pa.us
Discussion: https://postgr.es/m/E1kaQ2c-0005lx-Eg@gemulon.postgresql.org
If an interactive psql session used \gset when querying a compromised
server, the attacker could execute arbitrary code as the operating
system account running psql. Using a prefix not found among specially
treated variables, e.g. every lowercase string, precluded the attack.
Fix by issuing a warning and setting no variable for the column in
question. Users wanting the old behavior can use a prefix and then a
meta-command like "\set HISTSIZE :prefix_HISTSIZE". Back-patch to 9.5
(all supported versions).
Reviewed by Robert Haas. Reported by Nick Cleaton.
Security: CVE-2020-25696
Specifically, this blocks DECLARE ... WITH HOLD and firing of deferred
triggers within index expressions and materialized view queries. An
attacker having permission to create non-temp objects in at least one
schema could execute arbitrary SQL functions under the identity of the
bootstrap superuser. One can work around the vulnerability by disabling
autovacuum and not manually running ANALYZE, CLUSTER, REINDEX, CREATE
INDEX, VACUUM FULL, or REFRESH MATERIALIZED VIEW. (Don't restore from
pg_dump, since it runs some of those commands.) Plain VACUUM (without
FULL) is safe, and all commands are fine when a trusted user owns the
target object. Performance may degrade quickly under this workaround,
however. Back-patch to 9.5 (all supported versions).
Reviewed by Robert Haas. Reported by Etienne Stalmans.
Security: CVE-2020-25695
brin_form_tuple failed to consider the values may be toasted, inserting
the toast pointer into the index. This may easily result in index
corruption, as the toast data may be deleted and cleaned up by vacuum.
The cleanup however does not care about indexes, leaving invalid toast
pointers behind, which triggers errors like this:
ERROR: missing chunk number 0 for toast value 16433 in pg_toast_16426
A less severe consequence are inconsistent failures due to the index row
being too large, depending on whether brin_form_tuple operated on plain
or toasted version of the row. For example
CREATE TABLE t (val TEXT);
INSERT INTO t VALUES ('... long value ...')
CREATE INDEX idx ON t USING brin (val);
would likely succeed, as the row would likely include toast pointer.
Switching the order of INSERT and CREATE INDEX would likely fail:
ERROR: index row size 8712 exceeds maximum 8152 for index "idx"
because this happens before the row values are toasted.
The bug exists since PostgreSQL 9.5 where BRIN indexes were introduced.
So backpatch all the way back.
Author: Tomas Vondra
Reviewed-by: Alvaro Herrera
Backpatch-through: 9.5
Discussion: https://postgr.es/m/20201001184133.oq5uq75sb45pu3aw@development
Discussion: https://postgr.es/m/20201104010544.zexj52mlldagzowv%40development
Revert 59ab4ac32, as well as the followup fix 33862cb9c, in all
branches. We need to think a bit harder about what the behavior
of LOCK TABLE on views should be, and there's no time for that
before next week's releases. We'll take another crack at this
later.
Discussion: https://postgr.es/m/16703-e348f58aab3cf6cc@postgresql.org
Revert 403a3d91c, as well as the followup fix 7f4235032, in all
branches. We need to think a bit harder about what the behavior
of LOCK TABLE on views should be, and there's no time for that
before next week's releases. We'll take another crack at this
later.
Discussion: https://postgr.es/m/16703-e348f58aab3cf6cc@postgresql.org
Commits 59ab4ac32 et al mistakenly copied-and-pasted some text about
how LOCK on a view recurses to referenced tables into pre-v11 branches,
which in fact don't do that. Undo that, and instead state clearly
that they don't. (I also chose to add a note that this behavior
changed in v11. We usually don't back-patch such statements, but
since it's easy to add the warning now, might as well.)
Noted while considering followup fixes for bug #16703.
Discussion: https://postgr.es/m/16703-e348f58aab3cf6cc@postgresql.org
If the planner erroneously puts a non-parallel-safe SubPlan into
a parallelized portion of the query tree, nodeSubplan.c will fail
in the worker processes because it finds a null in es_subplanstates,
which it's unable to cope with. It seems worth a test-and-elog to
make that an error case rather than a core dump case.
This probably should have been included in commit 16ebab688, which
was responsible for allowing nulls to appear in es_subplanstates
to begin with. So, back-patch to v10 where that came in.
Discussion: https://postgr.es/m/924226.1604422326@sss.pgh.pa.us
The intention in commit 491c029db was to require superuserness to
change the BYPASSRLS property, but the actual effect of the coding
in AlterRole() was to require superuserness to change anything at all
about a BYPASSRLS role. Other properties of a BYPASSRLS role should
be changeable under the same rules as for a normal role, though.
Fix that, and also take care of some documentation omissions related
to BYPASSRLS and REPLICATION role properties.
Tom Lane and Stephen Frost, per bug report from Wolfgang Walther.
Back-patch to all supported branches.
Discussion: https://postgr.es/m/a5548a9f-89ee-3167-129d-162b5985fcf8@technowledgy.de
fill_hba_line() thought it could get away with passing sizeof(struct
sockaddr_storage) rather than the actual addrlen previously returned
by getaddrinfo(). While that appears to work on many platforms,
it does not work on FreeBSD 11: you get back a failure, which leads
to the view showing NULL for the address and netmask columns in all
rows. The POSIX spec for getnameinfo() is pretty clearly on
FreeBSD's side here: you should pass the actual address length.
So it seems plausible that there are other platforms where this
coding also fails, and we just hadn't noticed.
Also, IMO the fact that getnameinfo() failure leads to a NULL output
is pretty bogus in itself. Our pg_getnameinfo_all() wrapper is
careful to emit "???" on failure, and we should use that in such
cases. NULL should only be emitted in rows that don't have IP
addresses.
Per bug #16695 from Peter Vandivier. Back-patch to v10 where this
code was added.
Discussion: https://postgr.es/m/16695-a665558e2f630be7@postgresql.org
Although error results received from the backend should always have
a SQLSTATE field, ones generated by libpq won't, making this code
vulnerable to a crash after, say, untimely loss of connection.
Noted by Coverity.
Oversight in commit 403a3d91c. Back-patch to 9.5, as that was.
The timetz test cases I added in commit a9632830b were unintentionally
sensitive to whether or not DST is active in the PST8PDT time zone.
Thus, they'll start failing this coming weekend, as reported by
Bernhard M. Wiedemann in bug #16689. Fortunately, DST-awareness is
not significant to the purpose of these test cases, so we can just
force them all to PDT (DST hours) to preserve stability of the
results.
Back-patch to v10, as the prior patch was.
Discussion: https://postgr.es/m/16689-57701daa23b377bf@postgresql.org
In almost all other places, we use plain "r" or "w" mode in popen()
calls (the exceptions being for COPY data). This one has been
overlooked (possibly because it's buried in a ".l" flex file?),
but it's using PG_BINARY_R.
Kensuke Okamura complained in bug #16688 that we fail to strip \r
when stripping the trailing newline from a backtick result string.
That's true enough, but we'd also fail to convert embedded \r\n
cleanly, which also seems undesirable. Fixing the popen() mode
seems like the best way to deal with this.
It's been like this for a long time, so back-patch to all supported
branches.
Discussion: https://postgr.es/m/16688-c649c7b69cd7e6f8@postgresql.org