Soon the stats collector will be no more, with statistics instead getting
stored in shared memory. There are a lot of references to the stats collector
in comments. This commit replaces most of these references with "cumulative
statistics system", with the remaining ones getting replaced as part of
subsequent commits.
This is done separately from the - quite large - shared memory statistics
patch to make review easier.
Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de
Discussion: https://postgr.es/m/20220308205351.2xcn6k4x5yivcxyd@alap3.anarazel.de
The transactional integration code is largely independent from the rest of
pgstat.c. Subsequent commits will add more related code.
Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/20220404041516.cctrvpadhuriawlq@alap3.anarazel.de
It might seem pointless to allow use of dsm in single user mode, but otherwise
subsystems might need dedicated single user mode code paths.
Besides changing the assert, all that's needed is to make some windows code
assuming the presence of postmaster conditional.
Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/CA+hUKGL9hY_VY=+oUK+Gc1iSRx-Ls5qeYJ6q=dQVZnT3R63Taw@mail.gmail.com
Exclusive-mode backups have been deprecated since 9.6 (when
non-exclusive backups were introduced) due to the issues
they can cause should the system crash while one is running and
generally because non-exclusive provides a much better interface.
Further, exclusive backup mode wasn't really being tested (nor was most
of the related code- like being able to log in just to stop an exclusive
backup and the bits of the state machine related to that) and having to
possibly deal with an exclusive backup and the backup_label file
existing during pg_basebackup, pg_rewind, etc, added other complexities
that we are better off without.
This patch removes the exclusive backup mode, the various special cases
for dealing with it, and greatly simplifies the online backup code and
documentation.
Authors: David Steele, Nathan Bossart
Reviewed-by: Chapman Flack
Discussion: https://postgr.es/m/ac7339ca-3718-3c93-929f-99e725d1172c@pgmasters.nethttps://postgr.es/m/CAHg+QDfiM+WU61tF6=nPZocMZvHDzCK47Kneyb0ZRULYzV5sKQ@mail.gmail.com
This patch allows "PGC_SUSET" parameters to be set by non-superusers
if they have been explicitly granted the privilege to do so.
The privilege to perform ALTER SYSTEM SET/RESET on a specific parameter
can also be granted.
Such privileges are cluster-wide, not per database. They are tracked
in a new shared catalog, pg_parameter_acl.
Granting and revoking these new privileges works as one would expect.
One caveat is that PGC_USERSET GUCs are unaffected by the SET privilege
--- one could wish that those were handled by a revocable grant to
PUBLIC, but they are not, because we couldn't make it robust enough
for GUCs defined by extensions.
Mark Dilger, reviewed at various times by Andrew Dunstan, Robert Haas,
Joshua Brindle, and myself
Discussion: https://postgr.es/m/3D691E20-C1D5-4B80-8BA5-6BEB63AF3029@enterprisedb.com
The test created a 1m row table in order to test parallel operation of
JSON_VALUE. However, this was more than were needed for the test, so
save time by halving it, and also by making the table unlogged.
Experimentation shows that this size is only a little above the number
required to generate the expected output.
Per gripe from Andres Freund
Discussion: https://postgr.es/m/20220406022118.3ocqvhxr6kciw5am@alap3.anarazel.de
In commit 27e1f1456, create_append_plan() only allowed the subplan
created from a given subpath to be executed asynchronously when it was
an async-capable ForeignPath. To extend coverage, this patch handles
cases when the given subpath includes some other Path types as well that
can be omitted in the plan processing, such as a ProjectionPath directly
atop an async-capable ForeignPath, allowing asynchronous execution in
partitioned-scan/partitioned-join queries with non-Var tlist expressions
and more UNION queries.
Andrey Lepikhov and Etsuro Fujita, reviewed by Alexander Pyhalov and
Zhihong Yu.
Discussion: https://postgr.es/m/659c37a8-3e71-0ff2-394c-f04428c76f08%40postgrespro.ru
Commit 4a39f87acd changed the aggregated log format. Problem is, now
the explanatory paragraph for the log line in the document is too
long. Also the log format included more optional columns, and it's
harder to parse the log lines. This commit tries to solve the
problems.
- There's no optional log columns anymore. If a column is not
meaningful with provided pgbench option, it will be presented as 0.
- Reorder the log columns so that it's easier to parse them.
- Adjust explanatory paragraph for the log line in the doc.
Discussion: https://postgr.es/m/flat/202203280757.3tu4ovs3petm%40alvherre.pgsql
It's possible for the query that "waits for restart" to complete a
successful iteration before the postmaster has noticed its SIGKILL'd
child and begun the restart cycle. (This is a bit hard to believe
perhaps, but it's been seen at least twice in the buildfarm, mainly
on ancient platforms that likely have quirky schedulers.)
To provide a more secure interlock, wait for the other session
we're using to report that it's been forcibly shut down.
Patch by me, based on a suggestion from Andres Freund.
Back-patch to v14 where this test case came in.
Discussion: https://postgr.es/m/1801850.1649047827@sss.pgh.pa.us
The pg_fatal log which included filesizes were using UINT64_FORMAT for
the size_t variables, which failed on 32 bit buildfarm animals. Change
to using plain int instead, which is in line with how digestControlFile
is doing it already.
Per buildfarm animals florican and lapwing.
Discussion: https://postgr.es/m/13C2BF64-4A6D-47E4-9181-3A658F00C9B7@yesql.se
These clauses allow the user to specify how data from nested paths are
joined, allowing considerable freedom in shaping the tabular output of
JSON_TABLE.
PLAN DEFAULT allows the user to specify the global strategies when
dealing with sibling or child nested paths. The is often sufficient to
achieve the necessary goal, and is considerably simpler than the full
PLAN clause, which allows the user to specify the strategy to be used
for each named nested path.
Nikita Glukhov
Reviewers have included (in no particular order) Andres Freund, Alexander
Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zhihong Yu,
Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby.
Discussion: https://postgr.es/m/7e2cb85d-24cf-4abb-30a5-1a33715959bd@postgrespro.ru
Commits 74cf7d46 and a61daa14 fixed pg_upgrade bugs involving oversights
in how relfrozenxid or relminmxid are carried forward or initialized.
Corruption caused by bugs of this nature was ameliorated by commit
78db307bb2, which taught VACUUM to always overwrite existing invalid
relfrozenxid or relminmxid values that are apparently "in the future".
Extend that work now by showing a warning in the event of overwriting
either relfrozenxid or relminmxid due to an existing value that is "in
the future". There is probably a decent chance that the sanity checks
added by commit 699bf7d05c will raise an error before VACUUM reaches
this point, but we shouldn't rely on that.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-WzmRZEzeGvLv8yDW0AbFmSvJjTziORqjVUrf74mL4GL0Ww@mail.gmail.com
There's a race condition if a file changes in the source system
after we have collected the file list. If the file becomes larger,
we only fetched up to its original size. That can easily result in
a truncated file. That's not a problem for relation files, files
in pg_xact, etc. because any actions on them will be replayed from
the WAL. However, configuration files are affected.
This commit mitigates the race condition by fetching small files in
whole, even if they have grown. A test is added in which an extra
file copied is concurrently grown with the output of pg_rewind thus
guaranteeing it to have changed in size during the operation. This
is not a full fix: we still believe the original file size for files
larger than 1 MB. That should be enough for configuration files,
and doing more than that would require big changes to the chunking
logic in libpq_source.c.
This mitigates the race condition if the file is modified between
the original scan of files and copying the file, but there's still
a race condition if a file is changed while it's being copied.
That's a much smaller window, though, and pg_basebackup has the
same issue.
This race can be seen with pg_auto_failover, which frequently uses
ALTER SYSTEM, which updates postgresql.auto.conf. Often, pg_rewind
will fail, because the postgresql.auto.conf file changed concurrently
and a partial version of it was copied to the target. The partial
file would fail to parse, preventing the server from starting up.
Author: Heikki Linnakangas
Reviewed-by: Cary Huang
Discussion: https://postgr.es/m/f67feb24-5833-88cb-1020-19a4a2b83ac7%40iki.fi
The test logic is extended with two new concepts:
- Addition of a compression command called compress_cmd, executed
between restore_cmd and dump_cmd to control the contents of the dumps.
In the case of this commit, this is used to compress or decompress
elements of a dump to test new code paths.
- Addition of a new flag called compile_option, to check if a set of
tests can be executed depending on the ./configure options used in a
given build.
The tests introduced here are for gzip, but they are designed so as they
can easily be extended for new compression methods.
Author: Georgios Kokolatos, Rachel Heaton
Discussion: https://postgr.es/m/faUNEOpts9vunEaLnmxmG-DldLSg_ql137OC3JYDmgrOMHm1RvvWY2IdBkv_CRxm5spCCb_OmKNk2T03TMm0fBEWveFF9wA1WizPuAgB7Ss=@protonmail.com
* Move the execution pruning initialization steps that are common
between both ExecInitAppend() and ExecInitMergeAppend() into a new
function ExecInitPartitionPruning() defined in execPartition.c.
Those steps include creation of a PartitionPruneState to be used for
all instances of pruning and determining the minimal set of child
subplans that need to be initialized by performing initial pruning if
needed, and finally adjusting the subplan_map arrays in the
PartitionPruneState to reflect the new set of subplans remaining
after initial pruning if it was indeed performed.
ExecCreatePartitionPruneState() is no longer exported out of
execPartition.c and has been renamed to CreatePartitionPruneState()
as a local sub-routine of ExecInitPartitionPruning().
* Likewise, ExecFindInitialMatchingSubPlans() that was in charge of
performing initial pruning no longer needs to be exported. In fact,
since it would now have the same body as the more generally named
ExecFindMatchingSubPlans(), except differing in the value of
initial_prune passed to the common subroutine
find_matching_subplans_recurse(), it seems better to remove it and add
an initial_prune argument to ExecFindMatchingSubPlans().
* Add an ExprContext field to PartitionPruneContext to remove the
implicit assumption in the runtime pruning code that the ExprContext to
use to compute pruning expressions that need one can always rely on the
PlanState providing it. A future patch will allow runtime pruning (at
least the initial pruning steps) to be performed without the
corresponding PlanState yet having been created, so this will help.
Author: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqEYCpEqh2LMDOp9mT+4-QoVe8HgFMKBjntEMCTZLpcCCA@mail.gmail.com
The expected backend message after SIGQUIT changed in commit
7e784d1dc, but we missed updating this test case. Also, experience
shows that we might sometimes get "could not send data to server"
instead of either of the libpq messages the test is looking for.
Per report from Mark Dilger. Back-patch to v14 where the
backend message changed.
Discussion: https://postgr.es/m/17BD82D7-49AC-40C9-8204-E7ADD30321A0@enterprisedb.com
The previous coding of dshash_seq_next(), on the first call, accessed
status->hash_table->size_log2 without holding a partition lock and without
guaranteeing that ensure_valid_bucket_pointers() had ever been called.
That oversight turns out to not have immediately visible effects, because
bucket 0 is always in partition 0, and ensure_valid_bucket_pointers() was
called after acquiring the partition lock. However,
PARTITION_FOR_BUCKET_INDEX() with a size_log2 of 0 ends up triggering formally
undefined behaviour.
Simplify by accessing partition 0, without using PARTITION_FOR_BUCKET_INDEX().
While at it, remove dshash_get_current(), there is no convincing use
case. Also polish a few comments.
Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/CA+hUKGL9hY_VY=+oUK+Gc1iSRx-Ls5qeYJ6q=dQVZnT3R63Taw@mail.gmail.com
These would all need to be rephrased when moving to shared memory stats, but
since they don't provide actual information right now, remove them instead.
The comments for PgStat_Msg* are left in, because they will all be removed as
part of the shared memory stats patch.
Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de
This feature allows jsonb data to be treated as a table and thus used in
a FROM clause like other tabular data. Data can be selected from the
jsonb using jsonpath expressions, and hoisted out of nested structures
in the jsonb to form multiple rows, more or less like an outer join.
Nikita Glukhov
Reviewers have included (in no particular order) Andres Freund, Alexander
Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zhihong Yu (whose
name I previously misspelled), Himanshu Upadhyaya, Daniel Gustafsson,
Justin Pryzby.
Discussion: https://postgr.es/m/7e2cb85d-24cf-4abb-30a5-1a33715959bd@postgrespro.ru
Previously, psql printed only the last result if a command string
returned multiple result sets. Now it prints all of them. The
previous behavior can be obtained by setting the psql variable
SHOW_ALL_RESULTS to off.
This is a significantly enhanced version of
3a5130672296ed4e682403a77a9a3ad3d21cef75 (that was later reverted).
There is also much more test coverage for various psql features now.
Author: Fabien COELHO <coelho@cri.ensmp.fr>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: "Iwata, Aya" <iwata.aya@jp.fujitsu.com> (earlier version)
Reviewed-by: Daniel Verite <daniel@manitou-mail.org> (earlier version)
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> (earlier version)
Reviewed-by: vignesh C <vignesh21@gmail.com> (earlier version)
Discussion: https://www.postgresql.org/message-id/flat/alpine.DEB.2.21.1904132231510.8961@lancre
This script runs the core regression tests with quite a small value of
shared_buffers, making it prone to breakage due to synchronize_seqscans
kicking in where the tests don't expect that. Disable that feature to
stabilize the tests.
Discussion: https://postgr.es/m/1258185.1648876239@sss.pgh.pa.us
Commit f4fb45d15c tried to free memory during aggregate finalization.
This cause issues, particularly when used as a window function, so stop
doing that.
Per complaint by Jaime Casanova and diagnosis by Andres Freund
Discussion: https://postgr.es/m/YkfeMNYRCGhySKyg@ahch-to
Before this patch, there was some code that tried to make sure that the
buffer was always big enough at the start, and then asserted that it
didn't need to be enlarged later. However, the code to make sure it was
big enough at the start doesn't actually work, and therefore it was
possible to fail an assertion and crash later.
Remove the code that tries to make sure the buffer is always big enough
at the start in favor of enlarging the buffer as we go along whenever
that is necessary.
The mistake probably happened because, on the server side, we do
actually need to guarantee that the buffer is big enough at the start
to avoid subsequent resizings. However, in that case, the calling
code makes promises about how much data it will provide at once, but
here, that's not the case.
Report by Justin Pryzby. Analysis by me. Patch by Dipesh Pandit.
Discussion: http://postgr.es/m/20220330143536.GG28503@telsasoft.com
The general usage pattern when we store tuples in tuplesort.c is that
we store a series of tuples one by one then either perform a sort or spill
them to disk. In the common case, there is no pfreeing of already stored
tuples. For the common case since we do not individually pfree tuples, we
have very little need for aset.c memory allocation behavior which
maintains freelists and always rounds allocation sizes up to the next
power of 2 size.
Here we conditionally use generation.c contexts for storing tuples in
tuplesort.c when the sort will never be bounded. Unfortunately, the
memory context to store tuples is already created by the time any calls
would be made to tuplesort_set_bound(), so here we add a new sort option
that allows callers to specify if they're going to need a bounded sort or
not. We'll use a standard aset.c allocator when this sort option is not
set.
Extension authors must ensure that the TUPLESORT_ALLOWBOUNDED flag is
used when calling tuplesort_begin_* for any sorts that make a call to
tuplesort_set_bound().
Author: David Rowley
Reviewed-by: Andy Fan
Discussion: https://postgr.es/m/CAApHDvoH4ASzsAOyHcxkuY01Qf++8JJ0paw+03dk+W25tQEcNQ@mail.gmail.com
This replaces the bool flag for randomAccess. An upcoming patch requires
adding another option, so instead of breaking the API for that, then
breaking it again one day if we add more options, let's just break it
once. Any boolean options we add in the future will just make use of an
unused bit in the flags.
Any extensions making use of tuplesorts will need to update their code
to pass TUPLESORT_RANDOMACCESS instead of true for randomAccess.
TUPLESORT_NONE can be used for a set of empty options.
Author: David Rowley
Reviewed-by: Justin Pryzby
Discussion: https://postgr.es/m/CAApHDvoH4ASzsAOyHcxkuY01Qf%2B%2B8JJ0paw%2B03dk%2BW25tQEcNQ%40mail.gmail.com
Here we make a series of improvements to the generation memory
allocator, namely:
1. Allow generation contexts to have a minimum, initial and maximum block
sizes. The standard allocator allows this already but when the generation
context was added, it only allowed fixed-sized blocks. The problem with
fixed-sized blocks is that it's difficult to choose how large to make the
blocks. If the chosen size is too small then we'd end up with a large
number of blocks and a large number of malloc calls. If the block size is
made too large, then memory is wasted.
2. Add support for "keeper" blocks. This is a special block that is
allocated along with the context itself but is never freed. Instead,
when the last chunk in the keeper block is freed, we simply mark the block
as empty to allow new allocations to make use of it.
3. Add facility to "recycle" newly empty blocks instead of freeing them
and having to later malloc an entire new block again. We do this by
recording a single GenerationBlock which has become empty of any chunks.
When we run out of space in the current block, we check to see if there is
a "freeblock" and use that if it contains enough space for the allocation.
Author: David Rowley, Tomas Vondra
Reviewed-by: Andy Fan
Discussion: https://postgr.es/m/d987fd54-01f8-0f73-af6c-519f799a0ab8@enterprisedb.com
When dispatching sort operations to specialized variants, commit
69749243 failed to handle the case where CLUSTER-sort decides not to
initialize datum1 and isnull1. Fix by hoisting that decision up a level
and advertising whether datum1 can be relied on, in the Tuplesortstate
object.
Per reports from UBsan and Valgrind build farm animals, while running
the cluster.sql test.
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAFBsxsF1TeK5Fic0M%2BTSJXzbKsY6aBqJGNj6ptURuB09ZF6k_w%40mail.gmail.com
datetime.c's parsing logic has assumed that strtod() will accept
a string that looks like ".", which it does in glibc, but not on
some less-common platforms such as AIX. The result of this was
that datetime fields like "123." would be accepted on some platforms
but not others; which is a sufficiently odd case that it's not that
surprising we've heard no field complaints. But commit e39f99046
extended that assumption to new places, and happened to add a test
case that exposed the platform dependency. Remove this dependency
by special-casing situations without any digits after the decimal
point.
(Again, this is in part a pre-existing bug but I don't feel a
compulsion to back-patch.)
Also, rearrange e39f99046's changes in formatting.c to avoid a
Coverity complaint that we were copying an uninitialized field.
Discussion: https://postgr.es/m/1592893.1648969747@sss.pgh.pa.us
Non-aggressive VACUUMs were at a gratuitous disadvantage (relative to
aggressive VACUUMs) around advancing relfrozenxid and relminmxid before
now. The issue only came up when concurrent activity unset some heap
page's visibility map bit right as VACUUM was considering if the page
should get counted in frozenskipped_pages. The non-aggressive case
would recheck the all-frozen bit at this point. The aggressive case
reasoned that the page (a skippable page) must have at least been
all-frozen in the recent past, so skipping it won't make relfrozenxid
advancement unsafe (which is never okay for aggressive VACUUMs).
The recheck created a window for some other backend to confuse matters
for VACUUM. If the page's VM bit turned out to be unset, VACUUM would
conclude that the page was _never_ all-frozen. frozenskipped_pages was
not incremented, and yet VACUUM couldn't back out of skipping at this
late stage (it couldn't choose to scan the page instead). This made it
unsafe to advance relfrozenxid later on.
Consistently avoid the issue by generalizing how we skip frozen pages
during aggressive VACUUMs: take the same approach when skipping any
skippable page range during aggressive and non-aggressive VACUUMs alike.
The new approach makes ranges (not individual pages) the fundamental
unit of skipping using the visibility map. frozenskipped_pages is
replaced with a boolean flag that represents whether some skippable
range with one or more all-visible pages was actually skipped.
It is safe for VACUUM to treat a page as all-frozen provided it at least
had its all-frozen bit set after the OldestXmin cutoff was established.
VACUUM is only required to scan pages that might have XIDs < OldestXmin
(unfrozen XIDs) to be able to safely advance relfrozenxid. Tuples
concurrently inserted on "skipped" pages can be thought of as equivalent
to tuples concurrently inserted on a block >= rel_pages.
It's possible that the issue this commit fixes hardly ever came up in
practice. But we only had to be unlucky once to lose out on advancing
relfrozenxid -- a single affected heap page was enough to throw VACUUM
off. That seems like something to avoid on general principle. This is
similar to an issue fixed by commit 44fa8488, which taught vacuumlazy.c
to not give up on non-aggressive relfrozenxid advancement just because a
cleanup lock wasn't immediately available on some heap page.
Skipping an all-visible range is now explicitly structured as a choice
made by non-aggressive VACUUMs, by weighing known costs (scanning extra
skippable pages to freeze their tuples early) against known benefits
(advancing relfrozenxid early). This works in essentially the same way
as it always has (don't skip ranges < SKIP_PAGES_THRESHOLD). We could
do much better here in the future by considering other relevant factors.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wzn6bGJGfOy3zSTJicKLw99PHJeSOQBOViKjSCinaxUKDQ@mail.gmail.com
Discussion: https://postgr.es/m/CA%2BTgmoZiSOY6H7aadw5ZZGm7zYmfDzL6nwmL5V7GL4HgJgLF_w%40mail.gmail.com
When VACUUM set relfrozenxid before now, it set it to whatever value was
used to determine which tuples to freeze -- the FreezeLimit cutoff.
This approach was very naive. The relfrozenxid invariant only requires
that new relfrozenxid values be <= the oldest extant XID remaining in
the table (at the point that the VACUUM operation ends), which in
general might be much more recent than FreezeLimit.
VACUUM now carefully tracks the oldest remaining XID/MultiXactId as it
goes (the oldest remaining values _after_ lazy_scan_prune processing).
The final values are set as the table's new relfrozenxid and new
relminmxid in pg_class at the end of each VACUUM. The oldest XID might
come from a tuple's xmin, xmax, or xvac fields. It might even come from
one of the table's remaining MultiXacts.
Final relfrozenxid values must still be >= FreezeLimit in an aggressive
VACUUM (FreezeLimit still acts as a lower bound on the final value that
aggressive VACUUM can set relfrozenxid to). Since standard VACUUMs
still make no guarantees about advancing relfrozenxid, they might as
well set relfrozenxid to a value from well before FreezeLimit when the
opportunity presents itself. In general standard VACUUMs may now set
relfrozenxid to any value > the original relfrozenxid and <= OldestXmin.
Credit for the general idea of using the oldest extant XID to set
pg_class.relfrozenxid at the end of VACUUM goes to Andres Freund.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CAH2-WzkymFbz6D_vL+jmqSn_5q1wsFvFrE+37yLgL_Rkfd6Gzg@mail.gmail.com
VACUUM VERBOSE and autovacuum log reports were taught to report the
details of how VACUUM advanced relfrozenxid (and relminmxid) by commit
872770fd. Highlight this by adding a "Tip" to the documentation, next
to related discussion of age(relfrozenxid) monitoring.
Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-Wzk0C1O-MKkOrj4YAfsGRru2=cA2VQpqM-9R1HNuG3nFaQ@mail.gmail.com
DecodeInterval (interval input) was careless about integer-overflow
hazards, allowing bogus results to be obtained for sufficiently
large input values. Also, since it initially converted the input
to a "struct tm", it was impossible to produce the full range of
representable interval values.
Meanwhile, EncodeInterval (interval output) and a few other
functions could suffer failures if asked to process sufficiently
large interval values, because they also relied on being able to
represent an interval in "struct tm" which is not designed to
handle that.
Fix all this stuff by introducing new struct types that are more
fit for purpose.
While this is clearly a bug fix, it's also an API break for any
code that's calling these functions directly. So back-patching
doesn't seem wise, especially in view of the lack of field
complaints.
Joe Koshakow, editorialized a bit by me
Discussion: https://postgr.es/m/CAAvxfHff0JLYHwyBrtMx_=6wr=k2Xp+D+-X3vEhHjJYMj+mQcg@mail.gmail.com
Adjust the documentation's coverage of aggressive VACUUMs to make it
clearer that aggressive vacuuming isn't always strictly necessary. It's
possible for non-aggressive VACUUMs to advance relfrozenxid/relminmxid
without fail, given the right workload conditions.
While this has always been true, it matters more with recent and pending
improvements to VACUUM. These improvements make non-aggressive
vacuuming more likely to advance relfrozenxid/relminmxid in practice.
While this is an unrelated improvement to the docs, formally speaking,
it still doesn't seem worth backpatching. So don't backpatch.
Cover some cases that would have been broken by a proposed patch,
but we failed to notice for lack of test coverage. I'm pushing
this separately mainly to memorialize that it *is* our historical
behavior.
Discussion: https://postgr.es/m/1344498.1648920056@sss.pgh.pa.us
Move some of the heap_vacuum_rel() instrumentation related variables to
the scope where they're actually needed. Also reorder some of the
variable declarations at the start of heap_vacuum_rel() so that related
variables appear together.
This is essentially the same as applying VACUUM FULL to a partitioned
table, which has been supported since commit 3c3bb99330aa (March 2017).
While there's no great use case in applying CLUSTER to partitioned
tables, we don't have any strong reason not to allow it either.
For now, partitioned indexes cannot be marked clustered, so an index
must always be specified.
While at it, rename some variables that were RangeVars during the
development that led to 8bc717cb8878 but never made it that way to the
source tree; there's no need to perpetuate names that have always been
more confusing than helpful.
Author: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/20201028003312.GU9241@telsasoft.com
Discussion: https://postgr.es/m/20200611153502.GT14879@telsasoft.com
Remove circular "25.1.5.1. Multixacts And Wraparound" link that
references the section that the link itself appears in. An explanation
of MultiXactId age appears only a few sentences before the link, so
there's no question that the link is superfluous at best.
Oversight in commit d5409295.
Author: Peter Geoghegan <pg@bowt.ie>
Backpatch: 14-
The buildfarm has revealed some instability in results from catalog
queries in tests from commit 1a36bc9dba8. Cure this by adding ORDER BY
to such queries.