mirror of https://github.com/postgres/postgres.git synced 2025-10-16 17:07:43 +03:00
Commit Graph

62191 Commits

Author SHA1 Message Date
Michael Paquier
85e0ff62b6 Improve stability of btree page split on ERRORs
This improves the stability of VACUUM when processing btree indexes,
which was previously able to trigger an assertion failure in
_bt_lock_subtree_parent() when an error was previously thrown outside
the scope of _bt_split() when splitting a btree page.  VACUUM would
consider the index as in a corrupted state as the right page would not
be zeroed for the error thrown (allocation failure is one pattern).

In a non-assert build, VACUUM is able to succeed, reporting what it sees
as a corruption while attempting to fix the index.  This would manifest
as a LOG message, as of:
LOG: failed to re-find parent key in index "idx" for deletion target
page N
CONTEXT:  while vacuuming index "idx" of relation "public.tab"

This commit improves the code to rely on two PGAlignedBlocks that are
used as a temporary space for the left and right pages.  The main change
concerns the right page, whose contents are now copied into the
"temporary" PGAlignedBlock page while its original space is zeroed.  Its
contents are moved from the PGAlignedBlock page back to the page once we
enter the critical section used for the split.  This simplifies the
split logic, as it is not necessary to zero the right page before
throwing an error anymore.  Hence errors can now be thrown outside the
split code.  For the left page, this shaves one allocation, with
PageGetTempPage() being previously used.

The previous logic originates from commit 8fa30f906b, at a point where
PGAlignedBlock did not exist yet.  This could be argued as something
that should be backpatched, but the lack of complaints indicates that it
may not be necessary.

Author: Konstantin Knizhnik <knizhnik@garret.ru>
Discussion: https://postgr.es/m/566dacaf-5751-47e4-abc6-73de17a5d42a@garret.ru
2025-09-26 08:41:06 +09:00
David Rowley
3760d278dc Fix misleading comment in pg_get_statisticsobjdef_string()
The comment claimed that a TABLESPACE reference was added to the
resulting string, but that's not true.  Looks like the comment was
copied from pg_get_indexdef_string() without being adjusted correctly.

Reported-by: jian he <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CACJufxHwVPgeu8o9D8oUeDQYEHTAZGt-J5uaJNgYMzkAW7MiCA@mail.gmail.com
2025-09-26 11:04:15 +12:00
David Rowley
4be9024d57 Remove unused parameter from check_and_push_window_quals
... and find_window_run_conditions.

This seems to have been around and unused ever since the Run Condition
feature was added in 9d9c02ccd.  Let's remove it to clean things up a
bit.

Author: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/DD26NJ0Y34ZS.2ZOJPHSY12PFI@gmail.com
2025-09-26 10:21:30 +12:00
Masahiko Sawada
76418a0b67 psql: Add COMPLETE_WITH_FILES and COMPLETE_WITH_GENERATOR macros.
While most tab completions in match_previous_words() use
COMPLETE_WITH* macros to wrap rl_completion_matches(), some direct
calls to rl_completion_matches() still remained.

This commit introduces COMPLETE_WITH_FILES and COMPLETE_WITH_GENERATOR
macros to replace these direct calls, enhancing both code consistency
and readability.

Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/20250605100835.b396f9d656df1018f65a4556@sraoss.co.jp
2025-09-25 14:28:01 -07:00
Tom Lane
02c4bc8830 Try to avoid floating-point roundoff error in pg_sleep().
I noticed the surprising behavior that pg_sleep(0.001) will sleep
for 2ms not the expected 1ms.  Apparently the float8 calculation of
time-to-sleep is managing to produce something a hair over 1, which
ceil() rounds up to 2, and then WaitLatch() faithfully waits 2ms.
It could be that this works as-expected for some ranges of current
timestamp but not others, which would account for not having seen
it before.  In any case, let's try to avoid it by removing the
float arithmetic in the delay calculation.  We're stuck with the
declared input type being float8, but we can convert that to integer
microseconds right away, and then work strictly with integral values.
There might still be roundoff surprises for certain input values,
but at least the behavior won't be time-varying.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/3879137.1758825752@sss.pgh.pa.us
2025-09-25 17:02:15 -04:00
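
The roundoff described in 02c4bc8830 can be reproduced outside the server. The following is a minimal Python sketch, not the actual pg_sleep() code: both function names are hypothetical, the timestamp is an arbitrary value of roughly the magnitude of a current PostgreSQL-epoch time in seconds, and rounding to the nearest microsecond is this sketch's own choice.

```python
import math


def sleep_ms_float(now_s: float, secs: float) -> int:
    """Float-based delay: add the sleep time to a large timestamp, subtract
    the timestamp again, and round the remainder up to whole milliseconds."""
    end = now_s + secs                      # roundoff enters here
    remaining_ms = (end - now_s) * 1000.0
    return math.ceil(remaining_ms)


def sleep_ms_integer(secs: float) -> int:
    """Integer-based delay: convert to microseconds once, then use only
    integer arithmetic to round up to whole milliseconds."""
    usecs = round(secs * 1_000_000)         # assumption: round to nearest
    return (usecs + 999) // 1000


if __name__ == "__main__":
    now = 812_000_000.0                     # hypothetical timestamp, seconds
    print(sleep_ms_float(now, 0.001), sleep_ms_integer(0.001))
```

At a timestamp of that size, adjacent float8 values are about 0.12 microseconds apart, so the sum can land slightly above the intended end time and ceil() then adds a full millisecond. The integer variant gives the same answer regardless of the current time.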
Tom Lane
e849bd551c Add minimal sleep to stats isolation test functions.
The functions test_stat_func() and test_stat_func2() had empty
function bodies, so that they took very little time to run.  This made
it possible that on machines with relatively low timer resolution the
functions could return before the clock advanced, making the test fail
(as seen on buildfarm members fruitcrow and hamerkop).

To avoid that, pg_sleep for 10us during the functions.  As far as we
can tell, all current hardware has clock resolution much less than
that.  (The current implementation of pg_sleep will round it up to
1ms anyway, but someday that might get improved.)

Author: Michael Banck <mbanck@gmx.net>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/68d413a3.a70a0220.24c74c.8be9@mx.google.com
Backpatch-through: 15
2025-09-25 13:29:37 -04:00
Robert Haas
803ef0ed49 Fix array allocation bugs in SetExplainExtensionState.
If we already have an extension_state array but see a new extension_id
much larger than the highest extension_id we've previously seen,
the old code might have failed to expand the array to a large enough
size, leading to disaster. Also, if we don't have an extension array
at all and need to create one, we should make sure that it's big enough
that we don't have to resize it instantly.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: http://postgr.es/m/2949591.1758570711@sss.pgh.pa.us
Backpatch-through: 18
2025-09-25 11:43:52 -04:00
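
The sizing problem described in 803ef0ed49 can be illustrated with a small capacity calculation. This is only a sketch under stated assumptions: the function name, the doubling policy, and the minimum of 8 slots are hypothetical and are not taken from SetExplainExtensionState().

```python
def grown_capacity(current: int, extension_id: int, minimum: int = 8) -> int:
    """Return an array capacity that is guaranteed to hold extension_id.

    Growing by a single fixed step is not enough when the new ID is much
    larger than anything seen before, so keep growing until it fits.
    """
    needed = extension_id + 1               # IDs are zero-based indexes
    capacity = max(current, minimum)        # also covers "no array yet"
    while capacity < needed:
        capacity *= 2
    return capacity


if __name__ == "__main__":
    print(grown_capacity(0, 0))             # first allocation
    print(grown_capacity(8, 100))           # ID far beyond the current size
```

The loop matters for the first bug: a single doubling from 8 to 16 would still leave index 100 out of bounds. The `max()` addresses the second point, so a freshly created array does not need an immediate resize.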
Tom Lane
507aa16125 Doc: clean up documentation for new UUID functions.
Fix assorted failures to conform to our normal style for function
documentation, such as lack of parentheses and incorrect markup.

Author: Marcos Pegoraro <marcos@f10.com.br>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAB-JLwbocrFjKfGHoKY43pHTf49Ca2O0j3WVebC8z-eQBMPJyw@mail.gmail.com
Backpatch-through: 18
2025-09-25 11:23:27 -04:00
Tom Lane
170a8a3f46 Teach doc/src/sgml/Makefile about the new func/*.sgml files.
These were omitted from build dependencies and also tab/nbsp
checks, with the result that "make" did nothing after modifying
a func/*.sgml file.

Oversight in 4e23c9ef6.  AFAICT we don't need any comparable
changes in meson.build, or at least I don't see it doing anything
special for the pre-existing ref/*.sgml files.
2025-09-25 11:09:26 -04:00
Daniel Gustafsson
0b3ce7878a Remove preprocessor guards from injection points
When defining an injection point there is no need to wrap the definition
with USE_INJECTION_POINT guards, the INJECTION_POINT macro is available
in all builds.  Remove to make the code consistent.

Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/OSCPR01MB14966C8015DEB05ABEF2CE077F51FA@OSCPR01MB14966.jpnprd01.prod.outlook.com
Backpatch-through: 17
2025-09-25 15:27:33 +02:00
Daniel Gustafsson
d8f07dbb81 Fix comments in recovery tests
Commit 4464fddf removed the large insertions but failed to remove
all the comments referring to them.  Also remove a superfluous ')'
in another comment.

Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/OSCPR01MB149663A99DAF2826BE691C23DF51FA@OSCPR01MB14966.jpnprd01.prod.outlook.com
2025-09-25 15:24:41 +02:00
Álvaro Herrera
7e638d7f50 Don't include execnodes.h in replication/conflict.h
... which silently propagates a lot of headers into many places
via pgstat.h, as evidenced by the variety of headers that this patch
needs to add to seemingly random places.  Add a minimum of typedefs to
conflict.h to be able to remove execnodes.h, and fix the fallout.

Backpatch to 18, where conflict.h first appeared.

Discussion: https://postgr.es/m/202509191927.uj2ijwmho7nv@alvherre.pgsql
2025-09-25 14:52:41 +02:00
Álvaro Herrera
81fc3e28e3 Update some more forward declarations to use typedef
As commit d4d1fc527b.

Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/202509191025.22agk3fvpilc@alvherre.pgsql
2025-09-25 14:33:19 +02:00
Fujii Masao
668de04309 pgbench: Fix typo in documentation.
This commit fixes a typo introduced in commit b6290ea48e.

Reported off-list by Erik Rijkers <er@xs4all.nl>
2025-09-25 14:06:12 +09:00
Fujii Masao
b6290ea48e pgbench: Clarify documentation for \gset and \aset.
This commit updates the pgbench documentation to list \gset and \aset
as separate terms for easier reading. It also clarifies that \gset raises
an error if the query returns zero or multiple rows, and explains how to
detect cases where the query with \aset returned no rows.

Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/20250626180125.5b896902a3d0bcd93f86c240@sraoss.co.jp
2025-09-25 12:09:32 +09:00
Fujii Masao
879c492480 vacuumdb: Do not run VACUUM (ONLY_DATABASE_STATS) when --analyze-only.
Previously, vacuumdb --analyze-only issued VACUUM (ONLY_DATABASE_STATS)
at the end. Since --analyze-only is meant to update optimizer statistics only,
this extra VACUUM command is unnecessary.

This commit prevents vacuumdb --analyze-only from running that redundant
VACUUM command.

Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Mircea Cadariu <cadariu.mircea@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwEqHGa-k=wbRMucUVihHVXk4NQkK94GNN=ym9cQ5HBSHg@mail.gmail.com
2025-09-25 01:38:54 +09:00
Melanie Plageman
ae8ea7278c Correct prune WAL record opcode name in comment
f83d709760 incorrectly refers to a XLOG_HEAP2_PRUNE_FREEZE WAL record
opcode. No such code exists. The relevant opcodes are
XLOG_HEAP2_PRUNE_ON_ACCESS, XLOG_HEAP2_PRUNE_VACUUM_SCAN, and
XLOG_HEAP2_PRUNE_VACUUM_CLEANUP. Correct it.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/yn4zp35kkdsjx6wf47zcfmxgexxt4h2og47pvnw2x5ifyrs3qc%407uw6jyyxuyf7
2025-09-24 12:29:56 -04:00
Tom Lane
aadbcc40bc Ensure guc_tables.o's dependency on guc_tables.inc.c is known.
Without this, rebuilds can malfunction unless --enable-depend is used.
Historically we've expected that you can get away without
--enable-depend as long as you manually clean after changing *.h
files; the makefiles are supposed to handle other sorts of
dependencies.  So add this one.

Follow-on to 635998965, so no need for back-patch.

Discussion: https://postgr.es/m/3121329.1758650878@sss.pgh.pa.us
2025-09-24 12:28:20 -04:00
Tom Lane
7ccbf6d8b5 Include pg_test_timing's full output in the TAP test log.
We were already doing a short (1-second) pg_test_timing run during
check-world and buildfarm runs.  But we weren't doing anything
with the result except for a basic regex-based sanity check.
Collecting that output from buildfarm runs is seeming very
attractive though, because it would help us determine what sort
of timing resolution is available on supported platforms.
It's not very long, so let's just note it verbatim in the TAP log.

Discussion: https://postgr.es/m/3321785.1758728271@sss.pgh.pa.us
2025-09-24 12:09:11 -04:00
Fujii Masao
7fcb32ad02 Fix incorrect and inconsistent comments in tableam.h and heapam.c.
This commit corrects several issues in function comments:

* The parameter "rel" was incorrectly referred to as "relation" in the comments
   for table_tuple_delete(), table_tuple_update(), and table_tuple_lock().
* In table_tuple_delete(), "changingPart" was listed as an output parameter
   in the comments but is actually input.
* In table_tuple_update(), "slot" was listed as an input parameter
   in the comments but is actually output.
* The comment for "update_indexes" in table_tuple_update() was mis-indented.
* The comments for heap_lock_tuple() incorrectly referenced a non-existent
   "tid" parameter.

Author: Chao Li <lic@highgo.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAEoWx2nB6Ay8g=KEn7L3qbYX_4+sLk9XOMkV0XZqHR4cTY8ZvQ@mail.gmail.com
2025-09-25 00:51:59 +09:00
Peter Eisentraut
a5b35fcedb Remove PointerIsValid()
This doesn't provide any value over the standard style of checking the
pointer directly or comparing against NULL.

Also remove related:
- AllocPointerIsValid() [unused]
- IndexScanIsValid() [had one user]
- HeapScanIsValid() [unused]
- InvalidRelation [unused]

Leaving HeapTupleIsValid(), ItemIdIsValid(), PortalIsValid(),
RelationIsValid for now, to reduce code churn.

Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/ad50ab6b-6f74-4603-b099-1cd6382fb13d%40eisentraut.org
Discussion: https://www.postgresql.org/message-id/CA+hUKG+NFKnr=K4oybwDvT35dW=VAjAAfiuLxp+5JeZSOV3nBg@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/bccf2803-5252-47c2-9ff0-340502d5bd1c@iki.fi
2025-09-24 15:17:20 +02:00
Daniel Gustafsson
0fba25eb72 Fix incorrect option name in usage screen
The usage screen incorrectly referred to the --docs option as --sgml.
Backpatch down to v17 where this script was introduced.

Author: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/20250729.135638.1148639539103758555.horikyota.ntt@gmail.com
Backpatch-through: 17
2025-09-24 14:58:18 +02:00
Daniel Gustafsson
711ccce38f Consistently handle tab delimiters for wait event names
Format validation and element extraction for intermediate line
strings were inconsistent in their handling of tab delimiters,
which resulted in an unclear error when multiple tab characters
were used as a delimiter.  This fixes it by using captures from
the validation regex instead of a separate split() to avoid the
inconsistency.  Also, it ensures that \t+ is used consistently
when inspecting the strings.

Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/20250729.135638.1148639539103758555.horikyota.ntt@gmail.com
2025-09-24 14:57:26 +02:00
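
The inconsistency fixed in 711ccce38f is easy to reproduce in miniature. The script in question is written in Perl; the sketch below uses Python, and the two-field line format and the regex are hypothetical stand-ins rather than the real wait event file format.

```python
import re

# Assumed line shape: NAME, one or more tabs, then a quoted description.
LINE_RE = re.compile(r'(\w+)\t+("[^"]*")')


def parse_with_split(line: str) -> list:
    """Validate with \\t+ elsewhere but extract with a single-tab split:
    repeated tabs then yield empty fields."""
    return line.split("\t")


def parse_with_captures(line: str) -> tuple:
    """Validate and extract with the same regex, so both steps agree on
    what counts as a delimiter."""
    match = LINE_RE.fullmatch(line)
    if match is None:
        raise ValueError(f"unexpected line format: {line!r}")
    return match.group(1), match.group(2)


if __name__ == "__main__":
    line = 'SOME_EVENT\t\t"Waiting for something."'
    print(parse_with_split(line))
    print(parse_with_captures(line))
```

With two tabs between the fields, the split-based version returns three elements with an empty string in the middle, while the capture-based version returns exactly the two fields that validation accepted.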
John Naylor
5334620eef Update GB18030 encoding from version 2000 to 2022
Mappings for 18 characters have changed, affecting 36 code points. This
is a break in compatibility, but these characters are rarely used.

U+E5E5 (Private Use Area) was previously mapped to \xA3A0. This code
point now maps to \x65356535. Attempting to convert \xA3A0 will now
raise an error.

Separate from the 2022 update, the following mappings were previously
swapped, and subsequently corrected in 2000 and later versions:
 * U+E7C7 (Private Use Area) now maps to \x8135F437
 * U+1E3F (Latin Small Letter M with Acute) now maps to \xA8BC

The 2022 standard mentions the following policy changes, but they
have no effect in our implementation:

66 new ideographs are now required, but these are mapped
algorithmically so were already handled by utf8_and_gb18030.c.

Nine CJK compatibility ideographs are no longer required, but
implementations may retain them, as does the source we use from
the Unicode Consortium.

Release notes: Compatibility section

For further details, see:
https://www.unicode.org/L2/L2022/22274-disruptive-changes.pdf
https://ken-lunde.medium.com/the-gb-18030-2022-standard-3d0ebaeb4132

Author: Chao Li <lic@highgo.com>
Author: Zheng Tao <taoz@highgo.com>
Discussion: https://postgr.es/m/966d9fc.169.198741fe60b.Coremail.jiaoshuntian%40highgo.com
2025-09-24 13:26:05 +07:00
Amit Kapila
e41d954da6 Fix LOCK_TIMEOUT handling during parallel apply.
Previously, the parallel apply worker used SIGINT to receive a graceful
shutdown signal from the leader apply worker. However, SIGINT is also used
by the LOCK_TIMEOUT handler to trigger a query-cancel interrupt. This
overlap caused the parallel apply worker to miss LOCK_TIMEOUT signals,
leading to incorrect behavior during lock wait/contention.

This patch resolves the conflict by switching the graceful shutdown signal
from SIGINT to SIGUSR2.

Reported-by: Zane Duffield <duffieldzane@gmail.com>
Diagnosed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 16, where it was introduced
Discussion: https://postgr.es/m/CACMiCkXyC4au74kvE2g6Y=mCEF8X6r-Ne_ty4r7qWkUjRE4+oQ@mail.gmail.com
2025-09-24 04:11:53 +00:00
Michael Paquier
f83fe65f3f Fix compiler warnings in test_bitmapset
The macros doing conversions of/from "text" from/to Bitmapset were using
arbitrary casts with Datum, something that is not fine since
2a600a93c7.

These macros do not actually need casts with Datum, as they are given
already "text" and Bitmapset data in input.  They are updated to use
cstring_to_text() and text_to_cstring(), fixing the compiler warnings
reported by the buildfarm.  Note that appending a -m32 to gcc to trigger
32-bit builds was enough to reproduce the warnings here.

While on it, outer parentheses are added to TEXT_TO_BITMAPSET(), and
inner parentheses are removed from BITMAPSET_TO_TEXT(), to make these
macros more consistent with the style used in the tree, based on
suggestions by Tom Lane.

Oversights in commit 00c3d87a5c.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Greg Burd <greg@burd.me>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/3027069.1758606227@sss.pgh.pa.us
2025-09-24 08:20:23 +09:00
Robert Haas
f2bae51dfd Keep track of what RTIs a Result node is scanning.
Result nodes now include an RTI set, which is only non-NULL when they
have no subplan, and is taken from the relid set of the RelOptInfo that
the Result is generating. ExplainPreScanNode now takes notice of these
RTIs, which means that a few things get schema-qualified in the
regression tests that previously did not. This makes the output more
consistent between cases where some part of the plan tree is replaced by
a Result node and those where this does not happen.

Likewise, pg_overexplain's EXPLAIN (RANGE_TABLE) now displays the RTIs
stored in a Result node just as it already does for other RTI-bearing
node types.

Result nodes also now include a result_reason, which tells us something
about why the Result node was inserted.  Using that information, EXPLAIN
now emits, where relevant, a "Replaces" line describing the origin of
a Result node.

The purpose of these changes is to allow code that inspects a Plan
tree to understand the origin of Result nodes that appear therein.

Discussion: http://postgr.es/m/CA+TgmoYeUZePZWLsSO+1FAN7UPePT_RMEZBKkqYBJVCF1s60=w@mail.gmail.com
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
2025-09-23 09:07:55 -04:00
Daniel Gustafsson
a48d1ef586 doc: Remove trailing whitespace in xref
Remove stray whitespace in xref tag.

This was found due to a regression in xmllint 2.15.0 which flagged
this as an error, and at the time of this commit no fix for xmllint
has shipped.

Author: Erik Wienhold <ewie@ewie.name>
Discussion: https://postgr.es/m/f4c4661b-4e60-4c10-9336-768b7b55c084@ewie.name
Backpatch-through: 17
2025-09-22 10:12:31 +02:00
Michael Paquier
00c3d87a5c Add a test module for Bitmapset
Bitmapset has a complex set of APIs, defined in bitmapset.h, and it can
be hard to test edge cases with the backend core code only.

This test module is aimed at closing the gap, and implements a set of
SQL functions that act as wrappers of the low-level C functions of the
same names.  These functions rely on text as data type for the input and
the output as Bitmapset as a node has support for these.  An extra
function, named test_random_operations(), can be used to stress bitmaps
with random member values and a defined number of operations potentially
useful for other purposes than only tests.

The coverage increases from 85.2% to 93.4%.  It should be possible to
cover more code paths, but at least it's a beginning.

Author: Greg Burd <greg@burd.me>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/7BD1ABDB-B03A-464A-9BA9-A73B55AD8A1F@getmailspring.com
2025-09-22 16:53:00 +09:00
David Rowley
9fc7f6ab72 Fix various incorrect filename references
Author: Chao Li <li.evan.chao@gmail.com>
Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAEoWx2=hOBCPm-Z=F15twr_23XjHeoXSbifP5GdEdtWona97wQ@mail.gmail.com
2025-09-22 13:33:17 +12:00
Richard Guo
e3a0304eba Fix misleading comment in RangeTblEntry
The comment describing join_using_alias incorrectly referred to the
alias field as being defined "below", when it actually appears earlier
in the RangeTblEntry struct.  This patch fixes that.

Author: Steve Lau <stevelauc@outlook.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/TYWPR01MB10612B020C33FD08F729415CEB613A@TYWPR01MB10612.jpnprd01.prod.outlook.com
2025-09-22 10:04:39 +09:00
Michael Paquier
293a3286d7 Fix meson build with -Duuid=ossp when using version older than 0.60
The package for the UUID library may be named "uuid" or "ossp-uuid", and
meson.build has been using a single call of dependency() with multiple
names, something only supported since meson 0.60.0.

The minimum version of meson supported by Postgres is 0.57.2 on HEAD,
since f039c22441, and 0.54 on stable branches down to 16.

Author: Oreo Yang <oreo.yang@hotmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/OS3P301MB01656E6F91539770682B1E77E711A@OS3P301MB0165.JPNP301.PROD.OUTLOOK.COM
Backpatch-through: 16
2025-09-22 08:03:23 +09:00
Daniel Gustafsson
e1d917182c Add support for base64url encoding and decoding
This adds support for base64url encoding and decoding, a base64
variant which is safe to use in filenames and URLs.  base64url
replaces '+' in the base64 alphabet with '-' and '/' with '_',
thus making it safe for URL addresses and file systems.

Support for base64url was originally suggested by Przemysław Sztoch.

Author: Florents Tselai <florents.tselai@gmail.com>
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-by: David E. Wheeler <david@justatheory.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Chao Li (Evan) <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/70f2b6a8-486a-4fdb-a951-84cef35e22ab@sztoch.pl
2025-09-20 23:19:32 +02:00
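
The alphabet substitution described in e1d917182c can be shown with Python's standard library. This is an illustration of the base64url alphabet only, not of the PostgreSQL implementation; `to_base64url` is a hypothetical helper, and padding is left out of scope because the commit message does not discuss it (the sample input is three bytes, so no padding arises).

```python
import base64


def to_base64url(data: bytes) -> str:
    """Encode with the standard base64 alphabet, then swap the two
    characters that are unsafe in URLs and file names."""
    standard = base64.b64encode(data).decode("ascii")
    return standard.replace("+", "-").replace("/", "_")


if __name__ == "__main__":
    raw = bytes([0xFB, 0xFF, 0xFE])         # encodes to only '+' and '/'
    print(base64.b64encode(raw).decode("ascii"))
    print(to_base64url(raw))
```

The result matches `base64.urlsafe_b64encode()`, which applies the same two substitutions.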
Tom Lane
261f89a976 Track the maximum possible frequency of non-MCE array elements.
The lossy-counting algorithm that ANALYZE uses to identify most-common
array elements has a notion of cutoff frequency: elements with
frequency greater than that are guaranteed to be collected, elements
with smaller frequencies are not.  In cases where we find fewer MCEs
than the stats target would permit us to store, the cutoff frequency
provides valuable additional information, to wit that there are no
non-MCEs with frequency greater than that.  What the selectivity
estimation functions actually use the "minfreq" entry for is as a
ceiling on the possible frequency of non-MCEs, so using the cutoff
rather than the lowest stored MCE frequency provides a tighter bound
and more accurate estimates.

Therefore, instead of redundantly storing the minimum observed MCE
frequency, store the cutoff frequency when there are fewer tracked
values than we want.  (When there are more, then of course we cannot
assert that no non-stored elements are above the cutoff frequency,
since we're throwing away some that are; so we still use the
minimum stored frequency in that case.)

Notably, this works even when none of the values are common enough
to be called MCEs.  In such cases we previously stored nothing in
the STATISTIC_KIND_MCELEM pg_statistic slot, which resulted in the
selectivity functions falling back to default estimates.  So in that
case we want to construct a STATISTIC_KIND_MCELEM entry that contains
no "values" but does have "numbers", to wit the three extra numbers
that the MCELEM entry type defines.  A small obstacle is that
update_attstats() has traditionally stored a null, not an empty array,
when passed zero "values" for a slot.  That gives rise to an MCELEM
entry that get_attstatsslot() will spit up on.  The least risky
solution seems to be to adjust update_attstats() so that it will emit
a non-null (but possibly empty) array when the passed stavalues array
pointer isn't NULL, rather than conditioning that on numvalues > 0.
In other existing cases I don't believe that that changes anything.
For consistency, handle the stanumbers array the same way.

In passing, improve the comments in routines that use
STATISTIC_KIND_MCELEM data.  Particularly, explain why we use
minfreq / 2 not minfreq as the estimate for non-MCE values.

Thanks to Matt Long for the suggestion that we could apply this
idea even when there are more than zero MCEs.

Reported-by: Mark Frost <FROSTMAR@uk.ibm.com>
Reported-by: Matt Long <matt@mattlong.org>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/PH3PPF1C905D6E6F24A5C1A1A1D8345B593E16FA@PH3PPF1C905D6E6.namprd15.prod.outlook.com
2025-09-20 14:48:16 -04:00
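
The choice of bound described in 261f89a976 can be sketched as follows. This is an assumption-laden illustration, not the ANALYZE code: `stored_minfreq` and its parameters are hypothetical names, and treating "exactly as many candidates as slots" like the "fewer" case is this sketch's own reading, since nothing is discarded in that situation.

```python
def stored_minfreq(candidate_freqs, max_stored, cutoff_freq):
    """Pick the MCE frequencies to keep and the "minfreq" ceiling that
    applies to every element that was not kept.

    candidate_freqs: frequencies of elements that passed the cutoff
    max_stored:      how many entries the stats target allows
    cutoff_freq:     cutoff frequency of the lossy-counting pass
    """
    ranked = sorted(candidate_freqs, reverse=True)
    if len(ranked) <= max_stored:
        # Nothing was thrown away, so no unstored element can exceed
        # the cutoff: that is the tighter bound.
        return ranked, cutoff_freq
    # Some candidates were discarded and may sit just below the last
    # stored one, so only the minimum stored frequency is a safe bound.
    kept = ranked[:max_stored]
    return kept, kept[-1]


if __name__ == "__main__":
    print(stored_minfreq([0.4, 0.2], 5, 0.05))
    print(stored_minfreq([], 5, 0.05))
    print(stored_minfreq([0.4, 0.3, 0.2], 2, 0.05))
```

The second call corresponds to the case the commit highlights: no values qualify as MCEs, yet a usable ceiling is still produced instead of nothing at all.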
Tom Lane
1eccb93150 Re-allow using statistics for bool-valued functions in WHERE.
Commit a391ff3c3, which added the ability for a function's support
function to provide a custom selectivity estimate for "WHERE f(...)",
unintentionally removed the possibility of applying expression
statistics after finding there's no applicable support function.
That happened because we no longer fell through to boolvarsel()
as before.  Refactor to do so again, putting the 0.3333333 default
back into boolvarsel() where it had been (cf. commit 39df0f150).

I surely wouldn't have made this error if 39df0f150 had included
a test case, so add one now.  At the time we did not have the
"extended statistics" infrastructure, but we do now, and it is
also unable to work in this scenario because of this error.
So make use of that for the test case.

This is very clearly a bug fix, but I'm afraid to put it into
released branches because of the likelihood of altering plan
choices, which we avoid doing in minor releases.  So, master only.

Reported-by: Frédéric Yhuel <frederic.yhuel@dalibo.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/a8b99dce-1bfb-4d97-af73-54a32b85c916@dalibo.com
2025-09-20 12:44:52 -04:00
Nathan Bossart
18cdf5932a Fix obsolete references to postgres.h in comments.
Oversights in commits d08741eab5 and d952373a98.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aMxbfSJ2wLWd32x-%40nathan
2025-09-19 09:19:03 -05:00
David Rowley
ac7c8e412c Improve wording in a few comments
Initially this was to fix the "catched" typo, but I (David) wasn't quite
clear on what the previous comment meant about being "effective".  I
expect this means efficiency, so I've reworded the comment to indicate
that.

While this is only a comment fixup, for the sake of possibly minimizing
possible future backpatching pain, I've opted to backpatch to 18 since
this code is new to that version and the release isn't out the door yet.

Author: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/CAHewXNmSYWPud1sfBvpKbCJeRkWeZYuqatxtV9U9LvAFXBEiBw@mail.gmail.com
Backpatch-through: 18
2025-09-19 23:35:23 +12:00
Amit Kapila
5b148706c5 Add optional pid parameter to pg_replication_origin_session_setup().
Commit 216a784829 introduced parallel apply workers, allowing multiple
processes to share a replication origin. To support this,
replorigin_session_setup() was extended to accept a pid argument
identifying the process using the origin.

This commit exposes that capability through the SQL interface function
pg_replication_origin_session_setup() by adding an optional pid parameter.
This enables multiple processes to coordinate replication using the same
origin when using SQL-level replication functions.

This change allows the non-builtin logical replication solutions to
implement parallel apply for large transactions.

Additionally, an existing internal error was made user-facing, as it can
now be triggered via the exposed SQL API.

Author: Doruk Yilmaz <doruk@mixrank.com>
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Discussion: https://postgr.es/m/CAMPB6wfe4zLjJL8jiZV5kjjpwBM2=rTRme0UCL7Ra4L8MTVdOg@mail.gmail.com
Discussion: https://postgr.es/m/CAE2gYzyTSNvHY1+iWUwykaLETSuAZsCWyryokjP6rG46ZvRgQA@mail.gmail.com
2025-09-19 05:38:40 +00:00
Amit Kapila
8aac5923a3 Improve a few errdetail messages introduced in commit 0d48d393d4.
Based on suggestions by Tom Lane

Reported-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/20250916.114644.275726106301941878.horikyota.ntt@gmail.com
2025-09-19 04:52:59 +00:00
Michael Paquier
deb208df45 Make XLogFlush() and XLogNeedsFlush() decision-making more consistent
When deciding which code path to use depending on the state of recovery,
XLogFlush() and XLogNeedsFlush() have been relying on different
criteria:
- XLogFlush() relied on XLogInsertAllowed().
- XLogNeedsFlush() relied on RecoveryInProgress().

Currently, the checkpointer is allowed to insert WAL records while
RecoveryInProgress() returns true for an end-of-recovery checkpoint,
where XLogInsertAllowed() matters.  Using RecoveryInProgress() in
XLogNeedsFlush() did not really matter for its existing callers, as the
checkpointer only called XLogFlush().  However, a feature under
discussion, by Melanie Plageman, needs XLogNeedsFlush() to be able to
work in more contexts, the end-of-recovery checkpoint being one.

This commit changes XLogNeedsFlush() to use XLogInsertAllowed() instead
of RecoveryInProgress(), making the checks in both routines more
consistent.  While on it, an assertion based on XLogNeedsFlush() is
added at the end of XLogFlush(), triggered when flushing a physical
position (not for the normal recovery path that checks for updates of
the minimum recovery point).  This assertion would fail for example in
the recovery test 015_promotion_pages if XLogNeedsFlush() is changed to
use RecoveryInProgress().  This should hopefully be enough to ensure
that the checks done in both routines remain consistent.

Author: Melanie Plageman <melanieplageman@gmail.com>
Co-authored-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAAKRu_a1vZRZRWO3_jv_X13RYoqLRVipGO0237g5PKzPa2YX6g@mail.gmail.com
2025-09-19 13:47:28 +09:00
Amit Langote
8741e48e5d Fix EPQ crash from missing partition pruning state in EState
Commit bb3ec16e14 moved partition pruning metadata into PlannedStmt.
At executor startup this metadata is used to initialize the EState
fields es_part_prune_infos, es_part_prune_states, and
es_part_prune_results.  EvalPlanQualStart() failed to copy those
fields into the child EState, causing a NULL pointer dereference when
Append ran partition pruning during a recheck.  This can occur with
DELETE or UPDATE on partitioned tables that use runtime pruning, e.g.
with generic plans.

Fix by copying all partition pruning state into the EPQ estate.

Add an isolation test that reproduces the crash with concurrent
UPDATE and DELETE on a partitioned table, where the DELETE session
hits the crash during its EPQ recheck after the UPDATE commits.

Bug: #19056
Reported-by: Fei Changhong <feichanghong@qq.com>
Diagnosed-by: Fei Changhong <feichanghong@qq.com>
Author: David Rowley <dgrowleyml@gmail.com>
Co-authored-by: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/19056-a677cef9b54d76a0%40postgresql.org
2025-09-19 11:38:29 +09:00
Michael Paquier
3cd3a039da Document and check that PgStat_HashKey has no padding
This change is a tighter rework of 7d85d87f4d, which tried to improve
the code so that it would keep working should PgStat_HashKey gain new
fields that create padding bytes.  However, that change has proven
insufficient, as some code paths of pgstats do not pass PgStat_HashKey
by reference (valgrind would warn when padding is added to the
structure through a new field).

Per discussion, let's document and check that PgStat_HashKey has no
padding rather than complicate the pgstats code to work around it.

This removes a couple of memset(0) calls that should not be required.
While at it, this commit adds a static assertion checking that no
padding is introduced in the structure, by checking that the size of
PgStat_HashKey matches the sum of the sizes of all its fields.

The object ID part of the hash key is already 8 bytes, which should be
plenty.  A comment is added to discourage the addition of new fields.
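
The padding check described above can be sketched as follows.  This is a
minimal illustration, not the committed code: DemoHashKey and
demo_key_equal() are hypothetical, and the field types are assumptions
chosen so that the members pack without gaps on common platforms.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key layout; the field names only loosely follow the text. */
typedef struct DemoHashKey
{
    uint32_t kind;   /* statistics kind */
    uint32_t dboid;  /* database identifier */
    uint64_t objid;  /* 8-byte object identifier */
} DemoHashKey;

/* Compilation fails if the struct is larger than the sum of its fields. */
static_assert(sizeof(DemoHashKey) ==
              sizeof(uint32_t) + sizeof(uint32_t) + sizeof(uint64_t),
              "DemoHashKey must not contain padding bytes");

/*
 * With no padding, a bytewise comparison is well defined even when the
 * keys are passed by value, so no memset() of the key is needed first.
 */
int
demo_key_equal(DemoHashKey a, DemoHashKey b)
{
    return memcmp(&a, &b, sizeof(DemoHashKey)) == 0;
}
```

If a later field introduced a gap (say, a single uint8_t after kind),
the assertion would stop the build instead of letting uninitialized
padding bytes leak into hashing or comparison.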

Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0t9omat+HVSakJXwTMWvhpYFcAZb41RPWKwrKFUgmAFBQ@mail.gmail.com
2025-09-19 09:54:05 +09:00
Nathan Bossart
16607718c0 Add a test harness for the LWLock tranche code.
This code is heavily used and already has decent test coverage, but
it lacks a dedicated test suite.  This commit changes that.

Author: Sami Imseih <samimseih@gmail.com>
Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0tQ%2BEYSTOd2hQ8RXdsNfGBLAtOe-YmnsTE6ZVg0E-4qew%40mail.gmail.com
Discussion: https://postgr.es/m/CAA5RZ0vpr0P2rbA%3D_K0_SCHM7bmfVX4wEO9FAyopN1eWCYORhA%40mail.gmail.com
2025-09-18 15:23:11 -05:00
Nathan Bossart
c3cc2ab87d Fix re-initialization of LWLock-related shared memory.
When shared memory is re-initialized after a crash, the named
LWLock tranche request array that was copied to shared memory will
no longer be accessible.  To fix, save the pointer to the original
array in postmaster's local memory, and switch to it when
re-initializing the LWLock-related shared memory.

Oversight in commit ed1aad15e0.  Per buildfarm member batta.

Reported-by: Michael Paquier <michael@paquier.xyz>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aMoejB3iTWy1SxfF%40paquier.xyz
Discussion: https://postgr.es/m/f8ca018f-3479-49f6-a92c-e31db9f849d7%40gmail.com
2025-09-18 09:55:39 -05:00
Fujii Masao
2e66cae935 pgbench: Remove unused argument from create_sql_command().
Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Steven Niu <niushiji@gmail.com>
Discussion: https://postgr.es/m/20250917112814.096f660ea4c3c64630475e62@sraoss.co.jp
2025-09-18 11:22:21 +09:00
Fujii Masao
45f50c995f pg_restore: Fix security label handling with --no-publications/subscriptions.
Previously, pg_restore did not skip security labels on publications or
subscriptions even when --no-publications or --no-subscriptions was specified.
As a result, it could issue SECURITY LABEL commands for objects that were
never created, causing those commands to fail.

This commit fixes the issue by ensuring that security labels on publications
and subscriptions are also skipped when the corresponding options are used.

Backpatch to all supported versions.

Author: Jian He <jian.universality@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CACJufxHCt00pR9h51AVu6+yPD5J7JQn=7dQXxqacj0XyDhc-fA@mail.gmail.com
Backpatch-through: 13
2025-09-18 11:09:15 +09:00
Andres Freund
0110e2ec5c Mark shared buffer lookup table HASH_FIXED_SIZE
StrategyInitialize() calls InitBufTable() with the maximum number of entries
that the buffer lookup table can ever have. Thus there should not be any need
to allocate more elements after initialization. Hence mark the hash table as
fixed-size.

Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/CAExHW5v0jh3F_wj86yC=qBfWk0uiT94qy=Z41uzAHLHh0SerRA@mail.gmail.com
2025-09-17 20:28:43 -04:00
Tom Lane
b0cc0a71e0 Calculate agglevelsup correctly when Aggref contains a CTE.
If an aggregate function call contains a sub-select that has
an RTE referencing a CTE outside the aggregate, we must treat
that reference like a Var referencing the CTE's query level
for purposes of determining the aggregate's level.  Otherwise
we might reach the nonsensical conclusion that the aggregate
should be evaluated at some query level higher than the CTE,
ending in a planner error or a broken plan tree that causes
executor failures.

Bug: #19055
Reported-by: BugForge <dllggyx@outlook.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19055-6970cfa8556a394d@postgresql.org
Backpatch-through: 13
2025-09-17 16:32:57 -04:00
Thomas Munro
0951942bba jit: Fix type used for Datum values in LLVM IR.
Commit 2a600a93 made Datum 8 bytes wide everywhere.  It was no longer
appropriate to use TypeSizeT on 32-bit systems, and JIT compilation
would fail with various type check errors.  Introduce a separate
LLVMTypeRef with the name TypeDatum.  TypeSizeT is still used in some
places for actual size_t values.

Reported-by: Dmitry Mityugov <d.mityugov@postgrespro.ru>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Tested-by: Dmitry Mityugov <d.mityugov@postgrespro.ru>
Discussion: https://postgr.es/m/0a9f0be59171c2e8f1b3bc10f4fcf267%40postgrespro.ru
2025-09-17 13:38:35 +12:00
Michael Paquier
39f67d9b55 injection_points: Fix incrementation of variable-numbered stats
The pending entry was not used when incrementing the stats data; the
code manipulated the shared memory pointer directly, without even
locking it.  This could mean losing statistics under concurrent
activity.  The flush callback was a no-op.

This code serves as a base template for extensions for the custom
cumulative statistics, so let's be clean and use a pending entry for the
incrementations, whose data is then flushed to the corresponding entry
in the shared hashtable when all the stats are reported, in its own
flush callback.
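
The pending-entry pattern described above can be sketched roughly as
below.  This is a simplified model, not the injection_points code: all
Demo* names are hypothetical, and a C11 atomic add stands in for the
lock taken on the shared entry.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared entry, visible to every backend. */
typedef struct DemoSharedEntry
{
    atomic_uint_fast64_t numcalls;
} DemoSharedEntry;

/* Hypothetical pending entry, private to one backend. */
typedef struct DemoPendingEntry
{
    uint64_t numcalls;
} DemoPendingEntry;

void
demo_shared_init(DemoSharedEntry *shared)
{
    atomic_init(&shared->numcalls, 0);
}

/* Hot path: touch only backend-local memory, so no synchronization. */
void
demo_count_call(DemoPendingEntry *pending)
{
    pending->numcalls++;
}

/*
 * Flush callback: fold the pending counts into the shared entry in one
 * synchronized step, then reset the pending entry.  Returns false when
 * there was nothing to flush.
 */
bool
demo_flush(DemoPendingEntry *pending, DemoSharedEntry *shared)
{
    if (pending->numcalls == 0)
        return false;
    atomic_fetch_add(&shared->numcalls, pending->numcalls);
    pending->numcalls = 0;
    return true;
}

uint64_t
demo_read(DemoSharedEntry *shared)
{
    return atomic_load(&shared->numcalls);
}
```

The point of the split is that increments never race on shared memory;
only the flush does, and it happens once per reporting cycle rather than
once per increment.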

Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0v0U0yhPbY+bqChomkPbyUrRQ3rQXnZf_SB-svDiQOpgQ@mail.gmail.com
Backpatch-through: 18
2025-09-17 10:15:13 +09:00