This reverts commit cb2fd7eac285b1b0a24eeb2b8ed4456b66c5a09f. Per
numerous buildfarm members, it was incompatible with parallel query, and
a test case assumed LP64. Back-patch to 9.5 (all supported versions).
Discussion: https://postgr.es/m/20200321224920.GB1763544@rfd.leadboat.com
Until now, only selected bulk operations (e.g. COPY) did this. If a
given relfilenode received both a WAL-skipping COPY and a WAL-logged
operation (e.g. INSERT), recovery could lose tuples from the COPY. See
src/backend/access/transam/README section "Skipping WAL for New
RelFileNode" for the new coding rules. Maintainers of table access
methods should examine that section.
To maintain data durability, just before commit, we choose between an
fsync of the relfilenode and copying its contents to WAL. A new GUC,
wal_skip_threshold, guides that choice. If this change slows a workload
that creates small, permanent relfilenodes under wal_level=minimal, try
adjusting wal_skip_threshold. Users setting a timeout on COMMIT may
need to adjust that timeout, and log_min_duration_statement analysis
will reflect time consumption moving to COMMIT from commands like COPY.
Internally, this requires a reliable determination of whether
RollbackAndReleaseCurrentSubTransaction() would unlink a relation's
current relfilenode. Introduce rd_firstRelfilenodeSubid. Amend the
specification of rd_createSubid such that the field is zero when a new
rel has an old rd_node. Make relcache.c retain entries for certain
dropped relations until end of transaction.
Back-patch to 9.5 (all supported versions). This introduces a new WAL
record type, XLOG_GIST_ASSIGN_LSN, without bumping XLOG_PAGE_MAGIC. As
always, update standby systems before master systems. This changes
sizeof(RelationData) and sizeof(IndexStmt), breaking binary
compatibility for affected extensions. (The most recent commit to
affect the same class of extensions was
089e4d405d0f3b94c74a2c6a54357a84a681754b.)
Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert
Haas. Heikki Linnakangas and Michael Paquier implemented earlier
designs that materially clarified the problem. Reviewed, in earlier
designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane,
Fujii Masao, and Simon Riggs. Reported by Martijn van Oosterhout.
Discussion: https://postgr.es/m/20150702220524.GA9392@svana.org
Back-patch a subset of commit 9155580fd5fc2a0cbb23376dfca7cd21f59c2c7b
to v11, v10, 9.6, and 9.5. Include the latest repairs to this function.
Use a new XLOG_FPI_MULTI value instead of reusing XLOG_FPI. That way,
if an older server reads WAL from this function, that server will PANIC
instead of applying just one page of the record. The next commit adds a
call to this function.
Discussion: https://postgr.es/m/20200304.162919.898938381201316571.horikyota.ntt@gmail.com
swap_relation_files() calls toast_get_valid_index() to find and lock
this index, just before swapping with the rebuilt TOAST index. The
latter function releases the lock before returning. Potential for
mischief is low; a concurrent session can issue ALTER INDEX ... SET
(fillfactor = ...), which is not alarming. Nonetheless, changing
pg_class.relfilenode without a lock is unconventional. Back-patch to
9.5 (all supported versions), because another fix needs this.
Discussion: https://postgr.es/m/20191226001521.GA1772687@rfd.leadboat.com
Remove an obsolete comment from AtEOXact_cleanup(). Restore formatting
of a comment in struct RelationData, mangled by the pgindent run in
commit 9af4159fce6654aa0e081b00d02bca40b978745c. Back-patch to 9.5 (all
supported versions), because another fix stacks on this.
This patch fixes the error message in get_major_server_version() to be
"could not parse version file", and uses the full file path name, rather
than just the data directory path.
Also, commit 4109bb5de4 added the cause of the failure to the "could
not open" error message, and improved quoting. This patch backpatches
the "could not open" cause to PG 12, where it was first widely used, and
backpatches the quoting fix in that patch to all supported releases.
Reported-by: Tom Lane
Discussion: https://postgr.es/m/87pne2w98h.fsf@wibble.ilmari.org
Author: Dagfinn Ilmari Mannsåker
Backpatch-through: 9.5
This commit corrects the descriptions of RecoveryWalAll and RecoveryWalStream
wait events in the documentation.
Back-patch to v10 where those wait events were added.
Author: Fujii Masao
Reviewed-by: Kyotaro Horiguchi, Atsushi Torikoshi
Discussion: https://postgr.es/m/124997ee-096a-5d09-d8da-2c7a57d0816e@oss.nttdata.com
I noticed that we completely failed to document the restriction
that an "anyrange" result type has to be inferred from an "anyrange"
input. The docs also were less clear than they could be about the
relationship between "anyrange" and "anyarray".
It's been like this all along, so back-patch.
If pkg-config is installed and knows about libxml2, use its information
rather than asking xml2-config. Otherwise proceed as before. This
patch allows "configure --with-libxml" to succeed on platforms that
have pkg-config but not xml2-config, which is likely to soon become
a typical situation.
The old mechanism can be forced by setting XML2_CONFIG explicitly
(hence, build processes that were already doing so will certainly
not need adjustment). Also, it's now possible to set XML2_CFLAGS
and XML2_LIBS explicitly to override both programs.
There is a small risk of this breaking existing build processes,
if there are multiple libxml2 installations on the machine and
pkg-config disagrees with xml2-config about which to use. The
only case where that seems really likely is if a builder has tried
to select a non-default xml2-config by putting it early in his PATH
rather than setting XML2_CONFIG. Plan to warn against that in the
minor release notes.
Back-patch to v10; before that we had no pkg-config infrastructure,
and it doesn't seem worth adding it for this.
Hugh McMaster and Tom Lane; Peter Eisentraut also made an earlier
attempt at this, from which I lifted most of the docs changes.
Discussion: https://postgr.es/m/CAN9BcdvfUwc9Yx5015bLH2TOiQ-M+t_NADBSPhMF7dZ=pLa_iw@mail.gmail.com
This extends the fixes made in commit 085b6b667 to other SRFs with the
same bug, namely pg_logdir_ls(), pgrowlocks(), pg_timezone_names(),
pg_ls_dir(), and pg_tablespace_databases().
Also adjust various comments and documentation to warn against
expecting to clean up resources during a ValuePerCall SRF's final
call.
Back-patch to all supported branches, since these functions were
all born broken.
Justin Pryzby, with cosmetic tweaks by me
Discussion: https://postgr.es/m/20200308173103.GC1357@telsasoft.com
resolve_polymorphic_tupdesc() and resolve_polymorphic_argtypes() failed to
cover the case of having to resolve anyarray given only an anyrange input.
The bug was masked if anyelement was also used (as either input or
output), which probably helps account for our not having noticed.
While looking at this I noticed that resolve_generic_type() would produce
the wrong answer if asked to make that same resolution. ISTM that
resolve_generic_type() is confusingly defined and overly complex, so
rather than fix it, let's just make funcapi.c do the specific lookups
it requires for itself.
With this change, resolve_generic_type() is not used anywhere, so remove
it in HEAD. In the back branches, leave it alone (complete with bug)
just in case any external code is using it.
While we're here, make some other refactoring adjustments in funcapi.c
with an eye to upcoming future expansion of the set of polymorphic types:
* Simplify quick-exit tests by adding an overall have_polymorphic_result
flag. This is about a wash now but will be a win when there are more
flags.
* Reduce duplication of code between resolve_polymorphic_tupdesc() and
resolve_polymorphic_argtypes().
* Don't bother to validate correct matching of anynonarray or anyenum;
the parser should have done that, and even if it didn't, just doing
"return false" here would lead to a very confusing, off-point error
message. (Really, "return false" in these two functions should only
occur if the call_expr isn't supplied or we can't obtain data type
info from it.)
* For the same reason, throw an elog rather than "return false" if
we fail to resolve a polymorphic type.
The bug's been there since we added anyrange, so back-patch to
all supported branches.
Discussion: https://postgr.es/m/6093.1584202130@sss.pgh.pa.us
This should of course be just "PG_ARGISNULL()".
Also reorder a couple of paras to make the discussion of PG_ARGISNULL
less disjointed.
Back-patch to v10 where the error was introduced.
Laurenz Albe and Tom Lane, per an anonymous docs comment
Discussion: https://postgr.es/m/158399487096.5708.10696365251766477013@wrigleys.postgresql.org
If an index was explicitly set as replica identity index, this setting
was lost when a table was rewritten by ALTER TABLE. Because this
setting is part of pg_index but actually controlled by ALTER
TABLE (not part of CREATE INDEX, say), we have to do some extra work
to restore it.
Based-on-patch-by: Quan Zongliang <quanzongliang@gmail.com>
Reviewed-by: Euler Taveira <euler.taveira@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/c70fcab2-4866-0d9f-1d01-e75e189db342@gmail.com
RecordKnownAssignedTransactionIds() should never move
nextXid backwards. Before this commit, that could happen
if some other code path had advanced it without advancing
latestObservedXid.
One consequence is that a well timed XLOG_CHECKPOINT_ONLINE
could cause hot standby feedback messages to get confused
and report an xmin from a future epoch, potentially allowing
vacuum to run too soon on the primary.
Repair, by making sure RecordKnownAssignedTransactionIds()
can only move nextXid forwards.
In release 12 and master, this was already done by commit
2fc7af5e, which consolidated similar code and straightened
out this bug. Back-patch to supported releases before that.
Author: Eka Palamadai <ekanatha@amazon.com>
Discussion: https://postgr.es/m/98BB4805-D0A2-48E1-96F4-15014313EADC@amazon.com
I forgot that the WAL directory might hold other files besides WAL
segments, notably including new segments still being filled.
That means a blind test for the first file's size being 16MB can
fail. Restrict based on file name length to make it more robust.
Per buildfarm.
pg_dump is oblivious to this kind of dependency, so they're lost on
dump/restores (and pg_upgrade). Have pg_dump emit ALTER lines so that
they're preserved. Add some pg_dump tests for the whole thing, also.
Reviewed-by: Tom Lane (offlist)
Reviewed-by: Ibrar Ahmed
Reviewed-by: Ahsan Hadi (who also reviewed commit 899a04f5ed61)
Discussion: https://postgr.es/m/20200217225333.GA30974@alvherre.pgsql
This coding technique is undesirable because (a) it leaks the FD for
the rest of the transaction if the SRF is not run to completion, and
(b) allocated FDs are a scarce resource, but multiple interleaved
uses of the relevant functions could eat many such FDs.
In v11 and later, a query such as "SELECT pg_ls_waldir() LIMIT 1"
yields a warning about the leaked FD, and the only reason there's
no warning in earlier branches is that fd.c didn't whine about such
leaks before commit 9cb7db3f0. Even disregarding the warning, it
wouldn't be too hard to run a backend out of FDs with careless use
of these SQL functions.
Hence, rewrite the function so that it reads the directory within
a single call, returning the results as a tuplestore rather than
via value-per-call mode.
There are half a dozen other built-in SRFs with similar problems,
but let's fix this one to start with, just to see if the buildfarm
finds anything wrong with the code.
In passing, fix bogus error report for stat() failure: it was
whining about the directory when it should be fingering the
individual file. Doubtless a copy-and-paste error.
Back-patch to v10 where this function was added.
Justin Pryzby, with cosmetic tweaks and test cases by me
Discussion: https://postgr.es/m/20200308173103.GC1357@telsasoft.com
If the command is attempted for an extension that the object already
depends on, silently do nothing.
In particular, this means that if a database containing multiple such
entries is dumped, the restore will silently do the right thing and
record just the first one. (At least, in a world where pg_dump does
dump such entries -- which it doesn't currently, but it will.)
Backpatch to 9.6, where this kind of dependency was introduced.
Reviewed-by: Ibrar Ahmed, Tom Lane (offlist)
Discussion: https://postgr.es/m/20200217225333.GA30974@alvherre.pgsql
Previously, event triggers were restored just after regular triggers
(and FK constraints, which are basically triggers). This is risky
since an event trigger, once installed, could interfere with subsequent
restore commands. Worse, because event triggers don't have any
particular dependencies on any post-data objects, a parallel restore
would consider them eligible to be restored the moment the post-data
phase starts, allowing them to also interfere with restoration of a
whole bunch of objects that would have been restored before them in
a serial restore. There's no way to completely remove the risk of a
misguided event trigger breaking the restore, since if nothing else
it could break other event triggers. But we can certainly push them
to later in the process to minimize the hazard.
To fix, tweak the RestorePass mechanism introduced by commit 3eb9a5e7c
so that event triggers are handled as part of the post-ACL processing
pass (renaming the "REFRESH" pass to "POST_ACL" to reflect its more
general use). This will cause them to restore after everything except
matview refreshes, which seems OK since matview refreshes really ought
to run in the post-restore state of the database. In a parallel
restore, event triggers and matview refreshes might be intermixed,
but that seems all right as well.
Also update the code and comments in pg_dump_sort.c so that its idea
of how things are sorted agrees with what actually happens due to
the RestorePass mechanism. This is mostly cosmetic: it'll affect the
order of objects in a dump's TOC, but not the actual restore order.
But not changing that would be quite confusing to somebody reading
the code.
Back-patch to all supported branches.
Fabrízio de Royes Mello, tweaked a bit by me
Discussion: https://postgr.es/m/CAFcNs+ow1hmFox8P--3GSdtwz-S3Binb6ZmoP6Vk+Xg=K6eZNA@mail.gmail.com
Previously "waiting" could appear twice via PS in case of lock conflict
in hot standby mode. Specifically this issue happend when the delay
in WAL application determined by max_standby_archive_delay and
max_standby_streaming_delay had passed but it took more than 500 msec
to cancel all the conflicting transactions. Especially we can observe this
easily by setting those delay parameters to -1.
The cause of this issue was that WaitOnLock() and
ResolveRecoveryConflictWithVirtualXIDs() added "waiting" to
the process title in that case. This commit prevents
ResolveRecoveryConflictWithVirtualXIDs() from reporting waiting
in case of lock conflict, to fix the bug.
Back-patch to all back branches.
Author: Masahiko Sawada
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/CA+fd4k4mXWTwfQLS3RPwGr4xnfAEs1ysFfgYHvmmoUgv6Zxvmg@mail.gmail.com
At the end of recovery, standby mode is turned off to re-fetch the last
valid record from archive or pg_wal. Previously, if recovery target was
reached and standby mode was turned off while the current WAL source
was stream, recovery could try to retrieve WAL file containing the last
valid record unexpectedly from stream even though not in standby mode.
This caused an assertion failure. That is, the assertion test confirms that
WAL file should not be retrieved from stream if standby mode is not true.
This commit moves back the current WAL source to archive if it's stream
even though not in standby mode, to avoid that assertion failure.
This issue doesn't cause the server to crash when built with assertion
disabled. In this case, the attempt to retrieve WAL file from stream not
in standby mode just fails. And then recovery tries to retrieve WAL file
from archive or pg_wal.
Back-patch to all supported branches.
Author: Kyotaro Horiguchi
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/20200227.124830.2197604521555566121.horikyota.ntt@gmail.com
Previously the documentation explains that WAL segment files
start at 000000010000000000000000. But the first WAL segment file
that initdb creates is 000000010000000000000001 not
000000010000000000000000. This change was caused by old
commit 8c843fff2d, but the documentation had not been updated
a long time.
Back-patch to all supported branches.
Author: Fujii Masao
Reviewed-by: David Zhang
Discussion: https://postgr.es/m/CAHGQGwHOmGe2OqGOmp8cOfNVDivq7dbV74L5nUGr+3eVd2CU2Q@mail.gmail.com
Access to this module is granted to the pg_monitor role, not
pg_read_all_stats. (Given the view's performance impact,
it seems wise to be restrictive, so I think this was the
correct decision --- and anyway it was clearly intentional.)
Per bug #16279 from Philip Semanchuk.
Discussion: https://postgr.es/m/16279-fcaac33c68aab0ab@postgresql.org
The original coding failed to properly quote those arguments, leading to
failures when using quotes in the values used. As the quoting can be
encoding-sensitive, the connection to the backend needs to be taken
before applying the correct quoting.
Author: Michael Paquier
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/20200214041004.GB1998@paquier.xyz
Backpatch-through: 9.5
Creating a bunch of non-overlapping partial indexes is generally
a bad idea, so add an example saying not to do that.
Back-patch to v10. Before that, the alternative of using (real)
partitioning wasn't available, so that the tradeoff isn't quite
so clear cut.
Discussion: https://postgr.es/m/CAKVFrvFY-f7kgwMRMiPLbPYMmgjc8Y2jjUGK_Y0HVcYAmU6ymg@mail.gmail.com
The function hash table keys made by compute_function_hashkey() failed
to distinguish event-trigger call context from regular call context.
This meant that once we'd successfully made a hash entry for an event
trigger (either by validation, or by normal use as an event trigger),
an attempt to call the trigger function as a plain function would
find this hash entry and thereby bypass the you-can't-do-that check in
do_compile(). Thus we'd attempt to execute the function, leading to
strange errors or even crashes, depending on function contents and
server version.
To fix, add an isEventTrigger field to PLpgSQL_func_hashkey,
paralleling the longstanding infrastructure for regular triggers.
This fits into what had been pad space, so there's no risk of an ABI
break, even assuming that any third-party code is looking at these
hash keys. (I considered replacing isTrigger with a PLpgSQL_trigtype
enum field, but felt that that carried some API/ABI risk. Maybe we
should change it in HEAD though.)
Per bug #16266 from Alexander Lakhin. This has been broken since
event triggers were invented, so back-patch to all supported branches.
Discussion: https://postgr.es/m/16266-fcd7f838e97ba5d4@postgresql.org
VACUUM may truncate heap in several batches. The activity report
is logged for each batch, and contains the number of pages in the table
before and after the truncation, and also the elapsed time during
the truncation. Previously the elapsed time reported in each batch was
the total elapsed time since starting the truncation until finishing
each batch. For example, if the truncation was processed dividing into
three batches, the second batch reported the accumulated time elapsed
during both first and second batches. This is strange and confusing
because the number of pages in the table reported together is not
total. Instead, each batch should report the time elapsed during
only that batch.
The cause of this issue was that the resource usage snapshot was
initialized only at the beginning of the truncation and was never
reset later. This commit fixes the issue by changing VACUUM so that
the resource usage snapshot is reset at each batch.
Back-patch to all supported branches.
Reported-by: Tatsuhito Kasahara
Author: Tatsuhito Kasahara
Reviewed-by: Masahiko Sawada, Fujii Masao
Discussion: https://postgr.es/m/CAP0=ZVJsf=NvQuy+QXQZ7B=ZVLoDV_JzsVC1FRsF1G18i3zMGg@mail.gmail.com
Manifested as
ERROR: subtransaction logged without previous top-level txn record
this check forbids legit behaviours like
- First xl_xact_assignment record is beyond reading, i.e. earlier
restart_lsn.
- After restart_lsn there is some change of a subxact.
- After that, there is second xl_xact_assignment (for another subxact)
revealing the relationship between top and first subxact.
Such a transaction won't be streamed anyway because we hadn't seen it in
full. Saying for sure whether xact of some record encountered after
the snapshot was deserialized can be streamed or not requires to know
whether it wrote something before deserialization point --if yes, it
hasn't been seen in full and can't be decoded. Snapshot doesn't have such
info, so there is no easy way to relax the check.
Reported-by: Hsu, John
Diagnosed-by: Arseny Sher
Author: Arseny Sher, Amit Kapila
Reviewed-by: Amit Kapila, Dilip Kumar
Backpatch-through: 9.5
Discussion: https://postgr.es/m/AB5978B2-1772-4FEE-A245-74C91704ECB0@amazon.com
This was unaccountably omitted in the original RLS patch.
The SQL syntax is basically the same as for comments on triggers,
so crib code from dumpTrigger().
Per report from Marc Munro. Back-patch to all supported branches.
Discussion: https://postgr.es/m/1581889298.18009.15.camel@bloodnok.com
The GRANTED BY clause in GRANT/REVOKE ROLE has been there since 2005
but was never documented. I'm not sure now whether that was just an
oversight or was intentional (given the limited capability of the
option). But seeing that pg_dumpall does emit code that uses this
option, it seems like not documenting it at all is a bad idea.
Also, when we upgraded the syntax to allow CURRENT_USER/SESSION_USER
as the privilege recipient, the role form of GRANT was incorrectly
not modified to show that, and REVOKE's docs weren't touched at all.
Although I'm not that excited about GRANTED BY, the other oversight
seems serious enough to justify a back-patch.
Discussion: https://postgr.es/m/3070.1581526786@sss.pgh.pa.us
Marking an object as dependant on an extension did not have any
privilege check whatsoever; this allowed any user to mark objects as
droppable by anyone able to DROP EXTENSION, which could be used to cause
system-wide havoc. Disallow by checking that the calling user owns the
mentioned object.
(No constraints are placed on the extension.)
Security: CVE-2020-1720
Reported-by: Tom Lane
Discussion: 31605.1566429043@sss.pgh.pa.us
The previous coding forgot to apply shell quoting to the socket
directory and the data folder, leading to failures when running
pg_upgrade. This refactors the code generating the pg_ctl command
starting clusters to use a more correct shell quoting. Failures are
easier to trigger in 12 and newer versions by using a value of
--socketdir that includes quotes, but it is also possible to cause
failures with quotes included in the default socket directory used by
pg_upgrade or the data folders of the clusters involved in the
upgrade.
As 9.4 is going to be EOL'd with the next minor release, nobody is
likely going to upgrade to it now so this branch is not included in the
set of branches fixed.
Author: Michael Paquier
Reviewed-by: Álvaro Herrera, Noah Misch
Backpatch-through: 9.5