During pruning and freezing in phase I of vacuum, we delay clearing
all_visible and all_frozen in the presence of dead items. This allows
opportunistic freezing if the page would otherwise be fully frozen,
since those dead items are later removed in vacuum phase III.
To move the VM update into the same WAL record that
prunes and freezes tuples, we must know whether the page will
be marked all-visible/all-frozen before emitting WAL. Previously we
waited until after emitting WAL to update all_visible/all_frozen to
their correct values.
The only barrier to updating these flags immediately after deciding
whether to opportunistically freeze was that while emitting WAL for a
record freezing tuples, we use the pre-corrected value of all_frozen to
compute the snapshot conflict horizon. By determining the conflict
horizon earlier, we can update the flags immediately after making the
opportunistic freeze decision.
This is required to set the VM in the XLOG_HEAP2_PRUNE_VACUUM_SCAN
record emitted by pruning and freezing.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
When running in Docker, the container may not have privileges needed by
get_mempolicy(). This is called by numa_available() in libnuma, but
versions prior to 2.0.19 did not expect that. The numa_available() call
seemingly succeeds, but then we get unexpected failures when trying to
query status of pages:
postgres =# select * from pg_shmem_allocations_numa;
ERROR: XX000: failed NUMA pages inquiry status: Operation not
permitted
LOCATION: pg_get_shmem_allocations_numa, shmem.c:691
The best solution is to call get_mempolicy() first, and proceed to
numa_available() only when it does not fail with EPERM. Otherwise we'd
need to treat older libnuma versions as insufficient, which seems a bit
too harsh, as this only affects containerized systems.
Fix by me, based on suggestions by Christoph. Backpatch to 18, where the
NUMA functions were introduced.
Reported-by: Christoph Berg <myon@debian.org>
Reviewed-by: Christoph Berg <myon@debian.org>
Discussion: https://postgr.es/m/aPDZOxjrmEo_1JRG@msg.df7cb.de
Backpatch-through: 18
The upstream timezone code uses a bool variable as an array subscript.
Back when PostgreSQL's bool was char, this would have caused a warning
from gcc -Wchar-subscripts, which is included in -Wall. But this has
been obsolete since probably commit d26a810ebf, but certainly since
bool is now the C standard bool. So we can remove this deviation from
the upstream code, to make future code merges simpler.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/9ad2749f-77ab-4ecb-a321-1ca915480b05%40eisentraut.org
Commit 792353f7d5 updated the pg_dump and pg_dumpall documentation to
clarify which statistics are not included in their output. The pg_upgrade
documentation contained a nearly identical description, but it was not updated
at the same time.
This commit updates the pg_upgrade documentation to match those changes.
Backpatch to v18, where commit 792353f7d5 was backpatched to.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Bruce Momjian <bruce@momjian.us>
Discussion: https://postgr.es/m/CAHGQGwFnfgdGz8aGWVzgFCFwoWQU7KnFFjmxinf4RkQAkzmR+w@mail.gmail.com
Backpatch-through: 18
MSVCRT predated C99 and invented non-standard placeholders for 64-bit
numbers, and then later used them in standard macros when C99
<inttypes.h> arrived. The macros just use %lld etc when building with
UCRT, so there should be no way for our interposed sprintf.c code to
receive the pre-standard kind these days. Time to drop the code that
parses them.
That code was in fact already dead when commit 962da900 landed, as we'd
disclaimed MSVCRT support a couple of weeks earlier in commit 1758d424,
but patch development overlapped and the history of these macros hadn't
been investigated.
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/4d8b1a67-aab2-4429-b44b-f03988095939%40eisentraut.org
If both sides of the operator have most-common-value statistics,
eqjoinsel wants to check which MCVs have matches on the other side.
Formerly it did this with a dumb compare-all-the-entries loop,
which had O(N^2) behavior for long MCV lists. When that code was
written, twenty-plus years ago, that seemed tolerable; but nowadays
people frequently use much larger statistics targets, so that the
O(N^2) behavior can hurt quite a bit.
To add insult to injury, when asked for semijoin semantics, the
entire comparison loop was done over, even though we frequently
know that it will yield exactly the same results.
To improve matters, switch to using a hash table to perform the
matching. Testing suggests that depending on the data type, we may
need up to about 100 MCVs on each side to amortize the extra costs
of setting up the hash table and performing hash-value computations;
so continue to use the old looping method when there are fewer MCVs
than that.
Also, refactor so that we don't repeat the matching work unless
we really need to, which occurs only in the uncommon case where
eqjoinsel_semi decides to truncate the set of inner MCVs it
considers. The refactoring also got rid of the need to use the
presented operator's commutator. Real-world operators that are
using eqjoinsel should pretty much always have commutators, but
at the very least this saves a few syscache lookups.
Author: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>
Co-authored-by: David Geier <geidav.pg@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/20ea8bf5-3569-4e46-92ef-ebb2666debf6@tantorlabs.com
The fix for bug #19055 (commit b0cc0a71e) allowed CTE references in
sub-selects within aggregate functions to affect the semantic levels
assigned to such aggregates. It turns out this broke some related
cases, leading to assertion failures or strange planner errors such
as "unexpected outer reference in CTE query". After experimenting
with some alternative rules for assigning the semantic level in
such cases, we've come to the conclusion that changing the level
is more likely to break things than be helpful.
Therefore, this patch undoes what b0cc0a71e changed, and instead
installs logic to throw an error if there is any reference to a
CTE that's below the semantic level that standard SQL rules would
assign to the aggregate based on its contained Var and Aggref nodes.
(The SQL standard disallows sub-selects within aggregate functions,
so it can't reach the troublesome case and hence has no rule for
what to do.)
Perhaps someone will come along with a legitimate query that this
logic rejects, and if so probably the example will help us craft
a level-adjustment rule that works better than what b0cc0a71e did.
I'm not holding my breath for that though, because the previous
logic had been there for a very long time before bug #19055 without
complaints, and that bug report sure looks to have originated from
fuzzing not from real usage.
Like b0cc0a71e, back-patch to all supported branches, though
sadly that no longer includes v13.
Bug: #19106
Reported-by: Kamil Monicz <kamil@monicz.dev>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19106-9dd3668a0734cd72@postgresql.org
Backpatch-through: 14
This file is written for 8-space tabs, since we expect that most
users who edit their configuration files use 8-space tabs.
However, most of PostgreSQL is written for 4-space tabs, and at
least one popular web interface defaults to 4-space tabs. Rather
than trying to standardize on a particular tab width for this file,
let's just switch to spaces.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aReNUKdMgKxLqmq7%40nathan
This patch renames the sync_error_count column to sync_table_error_count
in the pg_stat_subscription_stats view. The new name makes the purpose
explicit now that a separate column exists to track sequence
synchronization errors.
Additionally, the column seq_sync_error_count is renamed to
sync_seq_error_count to maintain a consistent naming pattern, making it
easier for users to group, and query synchronization related counters.
Author: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CALDaNm3WwJmz=-4ybTkhniB-Nf3qmFG9Zx1uKjyLLoPF5NYYXA@mail.gmail.com
Remove bogus stripping of RelabelTypes: that can result in building
an output SAOP tree with incorrect exposed exprType for the operands,
which might confuse polymorphic operators. Moreover it demonstrably
prevents folding some OR-trees to SAOPs when the RHS expressions
have different base types that were coerced to the same type by
RelabelTypes.
Reduce prohibition on type_is_rowtype to just disallow type RECORD.
We need that because otherwise we would happily fold multiple RECORD
Consts into a RECORDARRAY Const even if they aren't the same record
type. (We could allow that perhaps, if we checked that they all have
the same typmod, but the case doesn't seem worth that much effort.)
However, there is no reason at all to disallow the transformation
for named composite types, nor domains over them: as long as we can
find a suitable array type we're good.
Remove some assertions that seem rather out of place (it's not
this code's duty to verify that the RestrictInfo structure is
sane). Rewrite some comments.
The issues with RelabelType stripping seem severe enough to
back-patch this into v18 where the code was introduced.
Author: Tender Wang <tndrwang@gmail.com>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAHewXN=aH7GQBk4fXU-WaEeVmQWUmBAeNyBfJ3VKzPphyPKUkQ@mail.gmail.com
Backpatch-through: 18
The documentation did not previously mention the default values for
the --fsync-interval and --plugin options, even though pg_recvlogical --help
shows them. This omission made it harder for users to understand
the tool's behavior from the documentation alone.
This commit adds the missing default value descriptions for both options
to the pg_recvlogical documentation.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Discussion: https://postgr.es/m/CAHGQGwFqssPBjkWMFofGq32e_tANOeWN-cM=6biAP3nnFUXMRw@mail.gmail.com
PostgreSQL 18 deprecated password_encryption='md5', but the
comments for this GUC in the sample configuration file did
not mention the deprecation. Update comments with a notice
to make as many users as possible aware of it. Also add a
comment to the related md5_password_warnings GUC while there.
Author: Michael Banck <mbanck@gmx.net>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Backpatch-through: 18
The existing format of pg_dependencies uses a single-object JSON
structure, with each key value embedding all the knowledge about the
set attributes tracked, like:
{"1 => 5": 1.000000, "5 => 1": 0.423130}
While this is a very compact format, it is confusing to read and it is
difficult to manipulate the values within the object, particularly when
tracking multiple attributes.
The new output format introduced in this commit is a JSON array of
objects, with:
- A key named "degree", with a float value.
- A key named "attributes", with an array of attribute numbers.
- A key named "dependency", with an attribute number.
The values use the same underlying type as previously when printed, with
a new output format that shows now as follows:
[{"degree": 1.000000, "attributes": [1], "dependency": 5},
{"degree": 0.423130, "attributes": [5], "dependency": 1}]
This new format will become handy for a follow-up set of changes, so as
it becomes possible to inject extended statistics rather than require an
ANALYZE, like in a dump/restore sequence or after pg_upgrade on a new
cluster.
This format has been suggested by Tomas Vondra. The key names are
defined in the header introduced by 1f927cce44, to ease the
integration of frontend-specific changes that are still under
discussion. (Again a personal note: if anybody comes up with better
name for the keys, of course feel free.)
The bulk of the changes come from the regression tests, where
jsonb_pretty() is now used to make the outputs generated easier to
parse.
Author: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
The existing format of pg_ndistinct uses a single-object JSON structure
where each key is itself a comma-separated list of attnums, like:
{"3, 4": 11, "3, 6": 11, "4, 6": 11, "3, 4, 6": 11}
While this is a very compact format, it is confusing to read and it is
difficult to manipulate the values within the object.
The new output format introduced in this commit is an array of objects,
with:
- A key named "attributes", that contains an array of attribute numbers.
- A key named "ndistinct", represented as an integer.
The values use the same underlying type as previously when printed, with
a new output format that shows now as follows:
[{"ndistinct": 11, "attributes": [3,4]},
{"ndistinct": 11, "attributes": [3,6]},
{"ndistinct": 11, "attributes": [4,6]},
{"ndistinct": 11, "attributes": [3,4,6]}]
This new format will become handy for a follow-up set of changes, so as
it becomes possible to inject extended statistics rather than require an
ANALYZE, like in a dump/restore sequence or after pg_upgrade on a new
cluster.
This format has been suggested by Tomas Vondra. The key names are
defined in a new header, to ease with the integration of
frontend-specific changes that are still under discussion. (Personal
note: I am not specifically wedded to these key names, but if there are
better name suggestions for this release, feel free.)
The bulk of the changes come from the regression tests, where
jsonb_pretty() is now used to make the outputs generated easier to
parse.
Author: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
Until d2ea2d310d, the PS_USE_PS_STRINGS
option was used on the GNU/Hurd. As this option got removed and
PS_USE_CLOBBER_ARGV appears to work fine nowadays on the Hurd, define
this one to re-enable process title changes on this platform.
In the 14 and 15 branches, the existing test for __hurd__ (added 25
years ago by commit 209aa77d, removed in 16 by the above commit) is left
unchanged for now as it was activating slightly different code paths and
would need investigation by a Hurd user.
Author: Michael Banck <mbanck@debian.org>
Discussion: https://postgr.es/m/CA%2BhUKGJMNGUAqf27WbckYFrM-Mavy0RKJvocfJU%3DJ2XcAZyv%2Bw%40mail.gmail.com
Backpatch-through: 16
This new test, added in 009_log_temp_files, checks that the temporary
files created by a WITH HOLD cursor are dropped at the end of the
transaction where the transaction has been created.
The portal's executor is shutdown in PersistHoldablePortal(), after for
example some forced detoast, so as the cursor data can be accessed
without requiring a snapshot.
Author: Mircea Cadariu <cadariu.mircea@gmail.com>
Discussion: https://postgr.es/m/0a666d28-9080-4239-90d6-f6345bb43468@gmail.com
When instrumenting a MERGE command containing both WHEN NOT MATCHED BY
SOURCE and WHEN NOT MATCHED BY TARGET actions using EXPLAIN ANALYZE, a
concurrent update of the target relation could lead to an Assert
failure in show_modifytable_info(). In a non-assert build, this would
lead to an incorrect value for "skipped" tuples in the EXPLAIN output,
rather than a crash.
This could happen if the concurrent update caused a matched row to no
longer match, in which case ExecMerge() treats the single originally
matched row as a pair of not matched rows, and potentially executes 2
not-matched actions for the single source row. This could then lead to
a state where the number of rows processed by the ModifyTable node
exceeds the number of rows produced by its source node, causing
"skipped_path" in show_modifytable_info() to be negative, triggering
the Assert.
Fix this in ExecMergeMatched() by incrementing the instrumentation
tuple count on the source node whenever a concurrent update of this
kind is detected, if both kinds of merge actions exist, so that the
number of source rows matches the number of actions potentially
executed, and the "skipped" tuple count is correct.
Back-patch to v17, where support for WHEN NOT MATCHED BY SOURCE
actions was introduced.
Bug: #19111
Reported-by: Dilip Kumar <dilipbalaut@gmail.com>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Discussion: https://postgr.es/m/19111-5b06624513d301b3@postgresql.org
Backpatch-through: 17
WaitLSNWakeup() incorrectly returned early when called with
InvalidXLogRecPtr (meaning "wake all waiters"), because the fast-path
check compared minWaitedLSN > 0 without validating currentLSN first.
This caused WAIT FOR LSN commands to wait indefinitely during standby
promotion until random signals woke them.
Add an XLogRecPtrIsValid() check before the comparison so that
InvalidXLogRecPtr bypasses the fast-path and wakes all waiters immediately.
Discussion: https://postgr.es/m/CABPTF7UieOYbOgH3EnQCasaqcT1T4N6V2wammwrWCohQTnD_Lw%40mail.gmail.com
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>