1
0
mirror of https://github.com/postgres/postgres.git synced 2026-01-26 09:41:40 +03:00
Commit Graph

63142 Commits

Author SHA1 Message Date
Tom Lane
4576208454 Force standard_conforming_strings to always be ON.
Continuing to support this backwards-compatibility feature has
nontrivial costs; in particular it is potentially a security hazard
if an application somehow gets confused about which setting the
server is using.  We changed the default to ON fifteen years ago,
which seems like enough time for applications to have adapted.
Let's remove support for the legacy string syntax.

We should not remove the GUC altogether, since client-side code will
still test it, pg_dump scripts will attempt to set it to ON, etc.
Instead, just prevent it from being set to OFF.  There is precedent
for this approach (see commit de66987ad).

This patch does remove the related GUC escape_string_warning, however.
That setting does nothing when standard_conforming_strings is on,
so it's now useless.  We could leave it in place as a do-nothing
setting to avoid breaking clients that still set it, if there are any.
But it seems likely that any such client is also trying to turn off
standard_conforming_strings, so it'll need work anyway.

The client-side changes in this patch are pretty minimal, because even
though we are dropping the server's support, most of our clients still
need to be able to talk to older server versions.  We could remove
dead client code only once we disclaim compatibility with pre-v19
servers, which is surely years away.  One change of note is that
pg_dump/pg_dumpall now set standard_conforming_strings = on in their
source session, rather than accepting the source server's default.
This ensures that literals in view definitions and such will be
printed in a way that's acceptable to v19+.  In particular,
pg_upgrade will work transparently even if the source installation has
standard_conforming_strings = off.  (However, pg_restore will behave
the same as before if given an archive file containing
standard_conforming_strings = off.  Such an archive will not be safely
restorable into v19+, but we shouldn't break the ability to extract
valid data from it for use with an older server.)

Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3279216.1767072538@sss.pgh.pa.us
2026-01-21 15:08:38 -05:00
Álvaro Herrera
4d6a66f675 Allow Boolean reloptions to have ternary values
From the user's point of view these are just Boolean values; from the
implementation side we can now distinguish an option that hasn't been
set.  Reimplement the vacuum_truncate reloption using this type.

This could also be used for reloptions vacuum_index_cleanup and
buffering, but those additionally need a per-option "alias" for the
state where the variable is unset (currently the value "auto").

Author: Nikolay Shaplov <dhyan@nataraj.su>
Reviewed-by: Timur Magomedov <t.magomedov@postgrespro.ru>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/3474141.usfYGdeWWP@thinkpad-pgpro
2026-01-21 20:06:01 +01:00
Tom Lane
cec5fe0d1e Remove useless flag PVC_INCLUDE_CONVERTROWTYPES.
This was introduced in the SJE patch (fc069a3a6), but it doesn't
do anything because pull_var_clause() never tests it.  Apparently
it snuck in from somebody's private fork.  Remove it again, but
only in HEAD -- seems best to let it be in v18.

Author: Alexander Pyhalov <a.pyhalov@postgrespro.ru>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/70008c19d22e3dd1565ca57f8436c0ba@postgrespro.ru
2026-01-21 13:26:19 -05:00
Álvaro Herrera
1f28982e40 amcheck: Fix snapshot usage in bt_index_parent_check
We were using SnapshotAny to do some index checks, but that's wrong and
causes spurious errors when used on indexes created by CREATE INDEX
CONCURRENTLY.  Fix it to use an MVCC snapshot, and add a test for it.

Backpatch of 6bd469d26a to branches 14-16.  I previously misidentified
the bug's origin: it came in with commit 7f563c09f8 (pg11-era, not
5ae2087202 as claimed previously), so all live branches are affected.

Also take the opportunity to fix some comments that we failed to update
in the original commits and apply pgperltidy.  In branch 14, remove the
unnecessary test plan specification (which would have need to have been
changed anyway; c.f. commit 549ec201d613.)

Diagnosed-by: Donghang Lin <donghanglin@gmail.com>
Author: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Backpatch-through: 17
Discussion: https://postgr.es/m/CANtu0ojmVd27fEhfpST7RG2KZvwkX=dMyKUqg0KM87FkOSdz8Q@mail.gmail.com
2026-01-21 18:55:43 +01:00
Peter Eisentraut
e6bb491bf2 Remove more leftovers of AIX support
The make variables MKLDEXPORT and POSTGRES_IMP were only used for AIX,
so they should have been removed with commit 0b16bb8776.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/7a48b624-2236-4e11-9b9d-6a3c658d77a1%40eisentraut.org
2026-01-21 14:51:05 +01:00
Michael Paquier
1572ea96e6 pg_stat_statements: Add more tests for level tracking
This commit adds tests to verify the computation of the nesting level
for two code paths: the planner hook and the ExecutorFinish() hook.  The
nesting level is essential to save a correct "toplevel" status for the
added PGSS entries.

The author has noticed that removing the manipulations of nesting_level
in these two code paths did not cause the tests to complain, meaning
that we never had coverage for the assumptions taken by the code.

Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0uK1PSrgf52bWCtDpzaqbWt04o6ZA7zBm6UQyv7vyvf9w@mail.gmail.com
2026-01-21 18:18:15 +09:00
Peter Eisentraut
b4555cb070 Fix for C++ compatibility
After commit 476b35d4e3, some buildfarm members are complaining about
not recognizing _Noreturn when building the new C++ module
test_cplusplusext.  This is not a C++ feature, but it was gated like

    #if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
    #define pg_noreturn _Noreturn

But apparently that was not sufficient.  Some platforms define
__STDC_VERSION__ even in C++ mode.  (In this particular case, it was
g++ on Solaris, but apparently this is also done by some other
platforms, and it is allowed by the C++ standard.)  To fix, add a

    ... && !defined(__cplusplus)

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/CAGECzQR21OnnKiZO_1rLWO0-16kg1JBxnVq-wymYW0-_1cUNtg@mail.gmail.com
2026-01-21 08:54:35 +01:00
John Naylor
7892e25924 Update some comments for fasthash
- Add advice about hashing multiple inputs with the incremental API
- Generalize statements that were specific to C strings to include
  all variable length inputs, where applicable.
- Update comments about the standalone functions and make it easy to
  find them.

Reported-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: zengman <zengman@halodbtech.com>
Discussion: https://postgr.es/m/CANWCAZZgKnf8dNOd_w03n88NqOfmMnMv2=D8_Oy6ADGyiMq+cg@mail.gmail.com
Discussion: https://postgr.es/m/CANWCAZa-2mEUY27xBw2TpsybpvVu3Ez4ABrHCBqZpAs_UDTj2Q@mail.gmail.com
2026-01-21 14:11:40 +07:00
Amit Kapila
48efefa6ca Improve errdetail for logical replication conflict messages.
This change enhances the clarity and usefulness of error detail messages
generated during logical replication conflicts. The following improvements
have been made:

1. Eliminate redundant output: Avoid printing duplicate remote row and
replica identity values for the multiple_unique_conflicts conflict type.
2. Improve message structure: Append tuple values directly to the main
error message, separated by a colon (:), for better readability.
3. Simplify local row terminology: Remove the word 'existing' when
referring to the local row, as this is already implied by context.
4. General code refinements: Apply miscellaneous code cleanups to improve
how conflict detail messages are constructed and formatted.

Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Discussion: https://postgr.es/m/CAHut+Psgkwy5-yGRJC15izecySGGysrbCszv_z93ess8XtCDOQ@mail.gmail.com
2026-01-21 04:58:03 +00:00
Michael Paquier
905ef401d5 pg_stat_statements: Clean up REGRESS list in Makefile
The "wal" and "entry_timestamp" items were still on the same line, which
was not intentional.

Thinko in f9afd56218.

Reported-by: Man Zeng <zengman@halodbtech.com>
Discussion: https://postgr.es/m/aW6_Xc8auuu5iAPi@paquier.xyz
2026-01-21 11:29:34 +09:00
Michael Paquier
f9afd56218 pg_stat_statements: Rework test order
The test "squashing" was the last item of the REGRESS list, but
"cleanup" should be the second to last, dropping the extension.
"oldextversions" is the last item.

In passing, the REGRESS list is cleaned up to include one item per line,
so as diffs are minimized when adding new test files.

Noticed while playing with this area of the code.

Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Man Zeng <zengman@halodbtech.com>
Discussion: https://postgr.es/m/aW6_Xc8auuu5iAPi@paquier.xyz
2026-01-21 07:47:38 +09:00
Peter Eisentraut
476b35d4e3 tests: Add a test C++ extension module
While we already test that our headers are valid C++ using
headerscheck, it turns out that the macros we define might still
expand to invalid C++ code.  This adds a minimal test extension that
is compiled using C++ to test that it's actually possible to build and
run extensions written in C++.  Future commits will improve C++
compatibility of some of our macros and add usage of them to this
extension make sure that they don't regress in the future.

The test module is for the moment disabled when using MSVC.  In
particular, the use of designated initializers in PG_MODULE_MAGIC
would require C++20, for which we are currently not set up.  (GCC and
Clang support it as extensions.)  It is planned to fix this.

Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://www.postgresql.org/message-id/flat/CAGECzQR21OnnKiZO_1rLWO0-16kg1JBxnVq-wymYW0-_1cUNtg@mail.gmail.com
2026-01-20 16:42:30 +01:00
Álvaro Herrera
f1cd34f952 Use integer backend type when exec'ing a postmaster child
This way we don't have to walk the entire process type array and
strcmp() the string with the names therein.  The integer value can be
directly used as array index instead.

Author: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Discussion: https://postgr.es/m/202512090935.k3xrtr44hxkn@alvherre.pgsql
2026-01-20 16:41:04 +01:00
Alexander Korotkov
30776ca468 Remove redundant pg_unreachable() after elog(ERROR) from ExecWaitStmt()
elog(ERROR) never returns.  Compilers don't always understand this.  So,
sometimes, we have to append pg_unreachable() to keep the compiler quiet
about returning from a non-void function without a value.  But
pg_unreachable() is redundant for ExecWaitStmt(), which is void.

Reported-by: Peter Eisentraut <peter@eisentraut.org>
Author: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/8d72a2b3-7423-4a15-a981-e130bf60b1a6%40eisentraut.org
Discussion: https://postgr.es/m/CABPTF7UcuVD0L-X%3DjZFfeygjPaZWWkVRwtWOaJw2tcXbEN2xsA%40mail.gmail.com
2026-01-20 16:10:25 +02:00
Amit Kapila
1ba3eee89a Fix concurrent sequence drops during sequence synchronization.
A recent BF failure showed that commit 7a485bd641 did not handle the case
where a sequence is dropped concurrently during sequence synchronization
on the subscriber. Previously, pg_get_sequence_data() would ERROR out if
the sequence was dropped concurrently. After 7a485bd641, it instead
returns NULL, which leads to an assertion failure on the subscriber.

To handle this change, update sequence synchronization to skip sequences
for which pg_get_sequence_data() returns NULL.

Author: vignesh C <vignesh21@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CALDaNm0FoGdt+1mzua0t-=wYdup5_zmFrvfNf-L=MGBnj9HAcg@mail.gmail.com
2026-01-20 09:40:13 +00:00
Michael Paquier
7ebb64c557 Add routine to free MCVList
This addition is in the same spirit as 32e27bd320 for MVNDistinct and
MVDependencies, except that we were missing a free routine for the third
type of extended statistics, MCVList.  I was not sure if we needed an
equivalent for MCVList, but after more review of the main patch set for
the import of extended statistics, it has become clear that we do.

This is introduced as its own change as this routine can be useful on
its own.  This one is a piece that has not been written by Corey
Huinker, I have just noticed it by myself on the way.

Author: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
2026-01-20 13:13:47 +09:00
Bruce Momjian
2e937eeb93 doc: revert "xreflabel" used for PL/Python & libpq chapters
This reverts d8aa21b74f, which was added for the PG 18 release notes,
and adjusts the PG 18 release notes for this change.  This is necessary
since the "xreflabel" affected other references to these chapters.

Reported-by: Robert Treat

Author: Robert Treat

Discussion: https://postgr.es/m/CABV9wwNEZDdp5QtrW5ut0H+MOf6U1PvrqBqmgSTgcixqk+Q73A@mail.gmail.com

Backpatch-through: 18
2026-01-19 22:59:10 -05:00
Michael Paquier
5d95219faa pg_stat_statements: Fix crash in list squashing with Vars
When IN/ANY clauses contain both constants and variable expressions, the
optimizer transforms them into separate structures: constants become
an array expression while variables become individual OR conditions.

This transformation was creating an overlap with the token locations,
causing pg_stat_statements query normalization to crash because it
could not calculate the amount of bytes remaining to write for the
normalized query.

This commit disables squashing for mixed IN list expressions when
constructing a scalar array op, by setting list_start and list_end
to -1 when both variables and non-variables are present.  Some
regression tests are added to PGSS to verify these patterns.

Author: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0ts9qiONnHjjHxPxtePs22GBo4d3jZ_s2BQC59AN7XbAA@mail.gmail.com
Backpatch-through: 18
2026-01-20 08:11:12 +09:00
Robert Haas
ecd275718b Don't set the truncation block length greater than RELSEG_SIZE.
When faced with a relation containing more than 1 physical segment
(i.e. >1GB, with normal settings), the previous code could compute a
truncation block length greater than RELSEG_SIZE, which could lead to
restore failures of this form:

file "%s" has truncation block length %u in excess of segment size %u

The fix is simply to clamp the maximum computed truncation_block_length
to RELSEG_SiZE. I have also added some comments to clarify the logic.

The test case was written by Oleg Tkachenko, but I have rewritten its
comments.

Reported-by: Oleg Tkachenko <oatkachenko@gmail.com>
Diagnosed-by: Oleg Tkachenko <oatkachenko@gmail.com>
Co-authored-by: Robert Haas <rhaas@postgresql.org>
Co-authored-by: Oleg Tkachenko <oatkachenko@gmail.com>
Reviewed-by: Amul Sul <sulamul@gmail.com>
Backpatch-through: 17
Discussion: http://postgr.es/m/00FEFC88-EA1D-4271-B38F-EB741733A84A@gmail.com
2026-01-19 12:09:32 -05:00
Richard Guo
34740b90bc Fix unsafe pushdown of quals referencing grouping Vars
When checking a subquery's output expressions to see if it's safe to
push down an upper-level qual, check_output_expressions() previously
treated grouping Vars as opaque Vars.  This implicitly assumed they
were stable and scalar.

However, a grouping Var's underlying expression corresponds to the
grouping clause, which may be volatile or set-returning.  If an
upper-level qual references such an output column, pushing it down
into the subquery is unsafe.  This can cause strange results due to
multiple evaluation of a volatile function, or introduce SRFs into
the subquery's WHERE/HAVING quals.

This patch teaches check_output_expressions() to look through grouping
Vars to their underlying expressions.  This ensures that any
volatility or set-returning properties in the grouping clause are
detected, preventing the unsafe pushdown.

We do not need to recursively examine the Vars contained in these
underlying expressions.  Even if they reference outputs from
lower-level subqueries (at any depth), those references are guaranteed
not to expand to volatile or set-returning functions, because
subqueries containing such functions in their targetlists are never
pulled up.

Backpatch to v18, where this issue was introduced.

Reported-by: Eric Ridge <eebbrr@gmail.com>
Diagnosed-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/7900964C-F99E-481E-BEE5-4338774CEB9F@gmail.com
Backpatch-through: 18
2026-01-19 11:13:23 +09:00
Tom Lane
228fe0c3e6 Update time zone data files to tzdata release 2025c.
This is pretty pro-forma for our purposes, as the only change
is a historical correction for pre-1976 DST laws in
Baja California.  (Upstream made this release mostly to update
their leap-second data, which we don't use.)  But with minor
releases coming up, we should be up-to-date.

Backpatch-through: 14
2026-01-18 14:54:33 -05:00
Michael Paquier
6bca4b50d0 Fix error message related to end TLI in backup manifest
The code adding the WAL information included in a backup manifest is
cross-checked with the contents of the timeline history file of the end
timeline.  A check based on the end timeline, when it fails, reported
the value of the start timeline in the error message.  This error is
fixed to show the correct timeline number in the report.

This error report would be confusing for users if seen, because it would
provide an incorrect information, so backpatch all the way down.

Oversight in 0d8c9c1210.

Author: Man Zeng <zengman@halodbtech.com>
Discussion: https://postgr.es/m/tencent_0F2949C4594556F672CF4658@qq.com
Backpatch-through: 14
2026-01-18 17:24:25 +09:00
Michael Paquier
2a6ce34b55 Remove useless asserts in report_namespace_conflict()
An assertion is used in this routine to check that a valid namespace OID
is given by the caller, but it was repeated twice: once at the top of
the routine and a second time multiple times in a switch/case.  This
commit removes the assertions within the switch/case.

Thinko in commit 765cbfdc92.

Author: Man Zeng <zengman@halodbtech.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/tencent_40F8C1D82E2EE28065009AAA@qq.com
2026-01-18 16:11:46 +09:00
Peter Eisentraut
6831cd9e3b Fix PL/Python build on MSVC with older Meson
Amendment for commit 2bc60f8621.  With older Meson versions, we need
to specify the Python include directory directly to cc.check_header
instead of relying on the dependency to pass it through.

Author: Bryan Green <dbryan.green@gmail.com>
Discussion: https://www.postgresql.org/message-id/0de98c41-4145-44c1-aac5-087cf5b3e4a9%40gmail.com
2026-01-16 17:25:05 +01:00
Heikki Linnakangas
71379663fe Fix crash in test function on removable_cutoff(NULL)
The function is part of the injection_points test module and only used
in tests. None of the current tests call it with a NULL argument, but
it is supposed to work.

Backpatch-through: 17
2026-01-16 14:42:22 +02:00
Heikki Linnakangas
1c64d2fcbe Fix rare test failure in nbtree_half_dead_pages
If auto-analyze kicks in at just the right moment, it can hold a
snapshot and prevent the VACUUM command in the test from removing the
deleted tuples. The test needs the tuples to be removed, otherwise no
half-dead page is generated. To fix, introduce a helper procedure to
wait for the removable cutoff to advance, like the one used in the
syscache-update-pruned test for similar purposes.

Thanks to Alexander Lakhin for reproducing and analyzing the test
failure, and Tom Lane for the report.

Discussion: https://www.postgresql.org/message-id/307198.1767408023@sss.pgh.pa.us
2026-01-16 14:38:20 +02:00
Andres Freund
84705b3727 bufmgr: Avoid spurious compiler warning after fcb9c977aa
Some compilers, e.g. gcc with -Og or -O1, warn about the wait_event in
BufferLockAcquire() possibly being uninitialized. That can't actually happen,
as the switch() covers all legal lock mode values, but we still need to
silence the warning.  We could add a default:, but we'd like to get a warning
if we were to get a new lock mode in the future.  So just initialize
wait_event to 0.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/934395.1768518154@sss.pgh.pa.us
2026-01-16 06:58:35 -05:00
Michael Paquier
395b73c045 Improve pg_clear_extended_stats() with incorrect relation/stats combination
Issue fat-fingered in d756fa1019, noticed while doing more review of
the main patch set proposed.  I have missed the fact that this can be
triggered by specifying an extended stats object that does not match
with the relation specified and already locked.  Like the cases where
an object defined in input is missing, the code is changed to issue a
WARNING instead of a confusing cache lookup failure.

A regression test is added to cover this case.

Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
2026-01-16 15:24:59 +09:00
Amit Langote
889676a0d5 Fix rowmark handling for non-relation RTEs during executor init
Commit cbc127917e introduced tracking of unpruned relids to skip
processing of pruned partitions. PlannedStmt.unprunableRelids is
computed as the difference between PlannerGlobal.allRelids and
prunableRelids, but allRelids only contains RTE_RELATION entries.
This means non-relation RTEs (VALUES, subqueries, CTEs, etc.) are
never included in unprunableRelids, and consequently not in
es_unpruned_relids at runtime.

As a result, rowmarks attached to non-relation RTEs were incorrectly
skipped during executor initialization. This affects any DML statement
that has rowmarks on such RTEs, including MERGE with a VALUES or
subquery source, and UPDATE/DELETE with joins against subqueries or
CTEs. When a concurrent update triggers an EPQ recheck, the missing
rowmark leads to incorrect results.

Fix by restricting the es_unpruned_relids membership check to
RTE_RELATION entries only, since partition pruning only applies to
actual relations. Rowmarks for other RTE kinds are now always
processed.

Bug: #19355
Reported-by: Bihua Wang <wangbihua.cn@gmail.com>
Diagnosed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Diagnosed-by: Tender Wang <tndrwang@gmail.com>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/19355-57d7d52ea4980dc6@postgresql.org
Backpatch-through: 18
2026-01-16 14:53:50 +09:00
Amit Langote
9cbb1d21d6 Fix segfault from releasing locks in detached DSM segments
If a FATAL error occurs while holding a lock in a DSM segment (such
as a dshash lock) and the process is not in a transaction, a
segmentation fault can occur during process exit.

The problem sequence is:

 1. Process acquires a lock in a DSM segment (e.g., via dshash)
 2. FATAL error occurs outside transaction context
 3. proc_exit() begins, calling before_shmem_exit callbacks
 4. dsm_backend_shutdown() detaches all DSM segments
 5. Later, on_shmem_exit callbacks run
 6. ProcKill() calls LWLockReleaseAll()
 7. Segfault: the lock being released is in unmapped memory

This only manifests outside transaction contexts because
AbortTransaction() calls LWLockReleaseAll() during transaction
abort, releasing locks before DSM cleanup. Background workers and
other non-transactional code paths are vulnerable.

Fix by calling LWLockReleaseAll() unconditionally at the start of
shmem_exit(), before any callbacks run. Releasing locks before
callbacks prevents the segfault - locks must be released before
dsm_backend_shutdown() detaches their memory. This is safe because
after an error, held locks are protecting potentially inconsistent
data anyway, and callbacks can acquire fresh locks if needed.

Also add a comment noting that LWLockReleaseAll() must be safe to
call before LWLock initialization (which it is, since
num_held_lwlocks will be 0), plus an Assert for the post-condition.

This fix aligns with the original design intent from commit
001a573a2, which noted that backends must clean up shared memory
state (including releasing lwlocks) before unmapping dynamic shared
memory segments.

Reported-by: Rahila Syed <rahilasyed90@gmail.com>
Author: Rahila Syed <rahilasyed90@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/CAH2L28uSvyiosL+kaic9249jRVoQiQF6JOnaCitKFq=xiFzX3g@mail.gmail.com
Backpatch-through: 14
2026-01-16 13:02:42 +09:00
Fujii Masao
b98cc4a14e pg_recvlogical: remove unnecessary OutputFsync() return value checks.
Commit 1e2fddfa33 changed OutputFsync() so that it always returns true.
However, pg_recvlogical.c still contained checks of its boolean return
value, which are now redundant.

This commit removes those checks and changes the type of return value of
OutputFsync() to void, simplifying the code.

Suggested-by: Yilin Zhang <jiezhilove@126.com>
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Mircea Cadariu <cadariu.mircea@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwFeTymZQ7RLvMU6WuDGar8bUQCazg=VOfA-9GeBkg-FzA@mail.gmail.com
2026-01-16 12:37:05 +09:00
Fujii Masao
d89b1d8175 Add test for pg_recvlogical reconnection behavior.
This commit adds a test to verify that data already received and flushed by
pg_recvlogical is not streamed again even after the connection is lost,
reestablished, and logical replication is restarted.

Author: Mircea Cadariu <cadariu.mircea@gmail.com>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwFeTymZQ7RLvMU6WuDGar8bUQCazg=VOfA-9GeBkg-FzA@mail.gmail.com
2026-01-16 12:36:34 +09:00
Fujii Masao
0b10969db6 Add a new helper function wait_for_file() to Utils.pm.
wait_for_file() waits for the contents of a specified file, starting at an
optional offset, to match a given regular expression. If no offset is
provided, the entire file is checked. The function times out after
$PostgreSQL::Test::Utils::timeout_default seconds. It returns the total
file length on success.

The existing wait_for_log() function contains almost identical logic, but
is limited to reading the cluster's log file. This commit also refactors
wait_for_log() to call wait_for_file() instead, avoiding code duplication.

This helper will be used by upcoming changes.

Suggested-by: Mircea Cadariu <cadariu.mircea@gmail.com>
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Mircea Cadariu <cadariu.mircea@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwFeTymZQ7RLvMU6WuDGar8bUQCazg=VOfA-9GeBkg-FzA@mail.gmail.com
2026-01-16 12:35:56 +09:00
Fujii Masao
41cbdab0ab pg_recvlogical: Prevent flushed data from being re-sent.
Previously, when pg_recvlogical lost connection, reconnected, and restarted
replication, data that had already been flushed could be streamed again.
This happened because the replication start position used when restarting
replication was taken from the last standby status message, which could be
older than the position of the last flushed data. As a result, some flushed
data newer than the replication start position could exist and be re-sent.

This commit fixes the issue by ensuring all written data is flushed to disk
before restarting replication, and by using the last flushed position as
the replication start point. This prevents already flushed data from being
re-sent.

Additionally, previously when the --no-loop option was used, pg_recvlogical
could exit without flushing written data, potentially losing data. To fix
this issue, this commit also ensures all data is flushed to disk before
exiting due to --no-loop.

Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Mircea Cadariu <cadariu.mircea@gmail.com>
Reviewed-by: Yilin Zhang <jiezhilove@126.com>
Reviewed-by: Dewei Dai <daidewei1970@163.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwFeTymZQ7RLvMU6WuDGar8bUQCazg=VOfA-9GeBkg-FzA@mail.gmail.com
2026-01-16 12:35:26 +09:00
Michael Paquier
a7c63e4860 Fix stability issue with new TAP test of pg_createsubscriber
The test introduced in 639352d904 has added a direct pg_ctl command to
start a node, a method that is incompatible with the teardown() routine
used at the end of the test as the PID saved in the Cluster object would
prevent the node to be shut down.  This can ultimately prevent the test
to perform its cleanup, failing on timeout.

Like pg_ctl's 001_start_stop or ssl_passphrase_callback's 001_testfunc,
this commit changes the test so a direct pg_ctl command is used to stop
the rogue node.  That should be hopefully enough to cool down the
buildfarm.

Per report from buildfarm member fairywren, which is the only animal
that is showing this issue.

Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/TY7PR01MB1455452AE9053DD2B77B74FEAF58CA@TY7PR01MB14554.jpnprd01.prod.outlook.com
2026-01-16 12:12:26 +09:00
Michael Paquier
d756fa1019 Add pg_clear_extended_stats()
This function is able to clear the data associated to an extended
statistics object, making things so as the object looks as
newly-created.

The caller of this function needs the following arguments for the
extended stats to clear:
- The name of the relation.
- The schema name of the relation.
- The name of the extended stats object.
- The schema name of the extended stats object.
- If the stats are inherited or not.

The first two parameters are especially important to ensure a consistent
lookup and ACL checks for the relation on which is based the extended
stats object that will be cleared, relying first on a RangeVar lookup
where permissions are checked without locking a relation, critical to
prevent denial-of-service attacks when using this kind of function (see
also 688dc6299a for a similar concern).  The third to fifth arguments
give a way to target the extended stats records to clear.

This has been extracted from a larger patch by the same author, for a
piece which is again useful on its own.  I have rewritten large portions
of it.  The tests have been extended while discussing this piece,
resulting on what this commit includes.  The intention behind this
feature is to add support for the import of extended statistics across
dumps and upgrades, this change building one piece that we will be able
to rely on for the rest of the changes.

Bump catalog version.

Author: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
2026-01-16 08:13:30 +09:00
Andres Freund
d40fd85187 lwlock: Remove support for disowned lwlwocks
This reverts commit f8d7f29b3e, plus parts of
subsequent commits fixing a typo in a parameter name.

Support for disowned lwlocks was added for the benefit of AIO, to be able to
have content locks "owned" by the AIO subsystem. But as of commit fcb9c977aa,
content locks do not use lwlocks anymore.

It does not seem particularly likely that we need this facility outside of the
AIO use-case, therefore remove the now unused functions.

I did choose to keep the comment added in the aforementioned commit about
lock->owner intentionally being left pointing to the last owner.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/cj5mcjdpucvw4a54hehslr3ctukavrbnxltvuzzhqnimvpju5e@cy3g3mnsefwz
2026-01-15 14:57:45 -05:00
Andres Freund
55fbfb738b lwlock: Remove ForEachLWLockHeldByMe
As of commit fcb9c977aa, ForEachLWLockHeldByMe(), introduced in f4ece891fc,
is not used anymore, as content locks are now implemented in bufmgr.c.  It
doesn't seem that likely that a new user of the functionality will appear all
that soon, making removal of the function seem like the most sensible path. It
can easily be added back if necessary.

Discussion: https://postgr.es/m/lneuyxqxamqoayd2ntau3lqjblzdckw6tjgeu4574ezwh4tzlg%40noioxkquezdw
2026-01-15 14:57:45 -05:00
Andres Freund
335f2231a3 pgindent fix for 8077649907
Per buildfarm member koel.

Backpatch-through: 18
2026-01-15 14:57:45 -05:00
Andres Freund
fcb9c977aa bufmgr: Implement buffer content locks independently of lwlocks
Until now buffer content locks were implemented using lwlocks. That has the
obvious advantage of not needing a separate efficient implementation of
locks. However, the time for a dedicated buffer content lock implementation
has come:

1) Hint bits are currently set while holding only a share lock. This leads to
   having to copy pages while they are being written out if checksums are
   enabled, which is not cheap. We would like to add AIO writes, however once
   many buffers can be written out at the same time, it gets a lot more
   expensive to copy them, particularly because that copy needs to reside in
   shared buffers (for worker mode to have access to the buffer).

   In addition, modifying buffers while they are being written out can cause
   issues with unbuffered/direct-IO, as some filesystems (like btrfs) do not
   like that, due to filesystem internal checksums getting corrupted.

   The solution to this is to require a new share-exclusive lock-level to set
   hint bits and to write out buffers, making those operations mutually
   exclusive. We could introduce such a lock-level into the generic lwlock
   implementation, however it does not look like there would be other users,
   and it does add some overhead into important code paths.

2) For AIO writes we need to be able to race-freely check whether a buffer is
   undergoing IO and whether an exclusive lock on the page can be acquired. That
   is rather hard to do efficiently when the buffer state and the lock state
   are separate atomic variables. This is a major hindrance to allowing writes
   to be done asynchronously.

3) Buffer locks are by far the most frequently taken locks. Optimizing them
   specifically for their use case is worth the effort. E.g. by merging
   content locks into buffer locks we will be able to release a buffer lock
   and pin in one atomic operation.

4) There are more complicated optimizations, like long-lived "super pinned &
   locked" pages, that cannot realistically be implemented with the generic
   lwlock implementation.

Therefore implement content locks inside bufmgr.c. The lockstate is stored as
part of BufferDesc.state. The implementation of buffer content locks is fairly
similar to lwlocks, with a few important differences:

1) An additional lock-level share-exclusive has been added. This lock-level
   conflicts with exclusive locks and itself, but not share locks.

2) Error recovery for content locks is implemented as part of the already
   existing private-refcount tracking mechanism in combination with resowners,
   instead of a bespoke mechanism as the case for lwlocks. This means we do
   not need to add dedicated error-recovery code paths to release all content
   locks (like done with LWLockReleaseAll() for lwlocks).

3) The lock state is embedded in BufferDesc.state instead of having its own
   struct.

4) The wakeup logic is a tad more complicated due to needing to support the
   additional lock-level

This commit unfortunately introduces some code that is very similar to the
code in lwlock.c, however the code is not equivalent enough to easily merge
it. The future wins that this commit makes possible seem worth the cost.

As of this commit nothing uses the new share-exclusive lock mode. It will be
used in a future commit. It seemed too complicated to introduce the lock-level
in a separate commit.

It's worth calling out one wart in this commit: Despite content locks not
being lwlocks anymore, they continue to use PGPROC->lw* - that seemed better
than duplicating the relevant infrastructure.

Another thing worth pointing out is that, after this change, content locks are
not reported as LWLock wait events anymore, but as new wait events in the
"Buffer" wait event class (see also 6c5c393b74). The old BufferContent lwlock
tranche has been removed.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
2026-01-15 14:26:53 -05:00
Andres Freund
dac328c8a6 bufmgr: Change BufferDesc.state to be a 64-bit atomic
This is motivated by wanting to merge buffer content locks into
BufferDesc.state in a future commit, rather than having a separate lwlock (see
commit c75ebc657f for more details). As this change is rather mechanical, it
seems to make sense to split it out into a separate commit, for easier review.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
2026-01-15 14:20:41 -05:00
Tom Lane
282b1cde9d Optimize LISTEN/NOTIFY via shared channel map and direct advancement.
This patch reworks LISTEN/NOTIFY to avoid waking backends that have
no need to process the notification messages we just sent.

The primary change is to create a shared hash table that tracks
which processes are listening to which channels (where a "channel" is
defined by a database OID and channel name).  This allows a notifying
process to accurately determine which listeners are interested,
replacing the previous weak approximation that listeners in other
databases couldn't be interested.

Secondly, if a listener is known not to be interested and is
currently stopped at the old queue head, we avoid waking it at all
and just directly advance its queue pointer past the notifications
we inserted.

These changes permit very significant improvements (integer multiples)
in NOTIFY throughput, as well as a noticeable reduction in latency,
when there are many listeners but only a few are interested in any
specific message.  There is no improvement for the simplest case where
every listener reads every message, but any loss seems below the noise
level.

Author: Joel Jacobson <joel@compiler.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/6899c044-4a82-49be-8117-e6f669765f7e@app.fastmail.com
2026-01-15 14:12:15 -05:00
Heikki Linnakangas
23b25586dc Fix 'unexpected data beyond EOF' on replica restart
On restart, a replica can fail with an error like 'unexpected data
beyond EOF in block 200 of relation T/D/R'. These are the steps to
reproduce it:

- A relation has a size of 400 blocks.
  - Blocks 201 to 400 are empty.
  - Block 200 has two rows.
  - Blocks 100 to 199 are empty.
- A restartpoint is done
- Vacuum truncates the relation to 200 blocks
- A FPW deletes a row in block 200
- A checkpoint is done
- A FPW deletes the last row in block 200
- Vacuum truncates the relation to 100 blocks
- The replica restarts

When the replica restarts:

- The relation on disk starts at 100 blocks, because all the
  truncations were applied before restart.
- The first truncate to 200 blocks is replayed. It silently fails, but
  it will still (incorrectly!) update the cache size to 200 blocks
- The first FPW on block 200 is applied. XLogReadBufferForRead relies
  on the cached size and incorrectly assumes that the page already
  exists in the file, and thus won't extend the relation.
- The online checkpoint record is replayed, calling smgrdestroyall
  which causes the cached size to be discarded
- The second FPW on block 200 is applied. This time, the detected size
  is 100 blocks, an extend is attempted. However, the block 200 is
  already present in the buffer cache due to the first FPW. This
  triggers the 'unexpected data beyond EOF'.

To fix, update the cached size in SmgrRelation with the current size
rather than the requested new size, when the requested new size is
greater.

Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Discussion: https://www.postgresql.org/message-id/CAO6_Xqrv-snNJNhbj1KjQmWiWHX3nYGDgAc=vxaZP3qc4g1Siw@mail.gmail.com
Backpatch-through: 14
2026-01-15 21:02:49 +02:00
Álvaro Herrera
35e3fae738 Remove #include <math.h> where not needed
Liujinyang reported the one in binaryheap.c, I then found and analyzed
the rest.

For future patches, we require git archaelogical analysis before we
accept patches of this nature.

Co-authored-by: liujinyang <21043272@qq.com>
Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/tencent_6B302BFCAF6F010E00AB5C2C0ECB7AA3F205@qq.com
2026-01-15 19:09:47 +01:00
Andres Freund
8077649907 aio: io_uring: Fix danger of completion getting reused before being read
We called io_uring_cqe_seen(..., cqe) before reading cqe->res. That allows the
completion to be reused, which in turn could lead to cqe->res being
overwritten. The window for that is very narrow and the likelihood of it
happening is very low, as we should never actually utilize all CQEs, but the
consequences would be bad.

This bug was reported to me privately.

Backpatch-through: 18
Discussion: https://postgr.es/m/bwo3e5lj2dgi2wzq4yvbyzu7nmwueczvvzioqsqo6azu6lm5oy@pbx75g2ach3p
2026-01-15 11:09:07 -05:00
Heikki Linnakangas
d9c3c94365 Wake up autovacuum launcher from postmaster when a worker exits
When an autovacuum worker exits, the launcher needs to be notified
with SIGUSR2, so that it can rebalance and possibly launch a new
worker. The launcher must be notified only after the worker has
finished ProcKill(), so that the worker slot is available for a new
worker. Before this commit, the autovacuum worker was responsible for
that, which required a slightly complicated dance to pass the
launcher's PID from FreeWorkerInfo() to ProcKill() in a global
variable.

Simplify that by moving the responsibility of the signaling to the
postmaster. The postmaster was already doing it when it failed to fork
a worker process, so it seems logical to make it responsible for
notifying the launcher on worker exit too. That's also how the
notification on background worker exit is done.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: li carol <carol.li2025@outlook.com>
Discussion: https://www.postgresql.org/message-id/a5e27d25-c7e7-45d5-9bac-a17c8f462def@iki.fi
2026-01-15 18:02:25 +02:00
Heikki Linnakangas
102bdaa9be Add check for invalid offset at multixid truncation
If a multixid with zero offset is left behind after a crash, and that
multixid later becomes the oldest multixid, truncation might try to
look up its offset and read the zero value. In the worst case, we
might incorrectly use the zero offset to truncate valid SLRU segments
that are still needed. I'm not sure if that can happen in practice, or
if there are some other lower-level safeguards or incidental reasons
that prevent the caller from passing an unwritten multixid as the
oldest multi. But better safe than sorry, so let's add an explicit
check for it.

In stable branches, we should perhaps do the same check for
'oldestOffset', i.e. the offset of the old oldest multixid (in master,
'oldestOffset' is gone). But if the old oldest multixid has an invalid
offset, the damage has been done already, and we would never advance
past that point. It's not clear what we should do in that case. The
check that this commit adds will prevent such an multixid with invalid
offset from becoming the oldest multixid in the first place, which
seems enough for now.

Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: Discussion: https://www.postgresql.org/message-id/000301b2-5b81-4938-bdac-90f6eb660843@iki.fi
Backpatch-through: 14
2026-01-15 16:48:45 +02:00
Heikki Linnakangas
c4b71e6f60 Remove some unnecessary code from multixact truncation
With 64-bit multixact offsets, PerformMembersTruncation() doesn't need
the starting offset anymore. The 'oldestOffset' value that
TruncateMultiXact() calculates is no longer used for anything. Remove
it, and the code to calculate it.

'oldestOffset' was included in the WAL record as 'startTruncMemb',
which sounds nice if you e.g. look at the WAL with pg_waldump, but it
was also confusing because we didn't actually use the value for
determining what to truncate. Replaying the WAL would remove all
segments older than 'endTruncMemb', regardless of
'startTruncMemb'. The 'startTruncOff' stored in the WAL record was
similarly unnecessary even before 64-bit multixid offsets, it was
stored just for the sake of symmetry with 'startTruncMemb'. Remove
both from the WAL record, and rename the remaining 'endTruncOff' to
'oldestMulti' and 'endTruncMemb' to 'oldestOffset', for consistency
with the variable names used for them in other places.

Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://www.postgresql.org/message-id/000301b2-5b81-4938-bdac-90f6eb660843@iki.fi
2026-01-15 13:34:50 +02:00
Peter Eisentraut
da265a8717 plpython: Streamline initialization
The initialization of PL/Python (the Python interpreter, the global
state, the plpy module) was arranged confusingly across different
functions with unclear and confusing boundaries.  For example,
PLy_init_interp() said "Initialize the Python interpreter ..." but it
didn't actually do this, and PLy_init_plpy() said "initialize plpy
module" but it didn't do that either.  After this change, all the
global initialization is called directly from _PG_init(), and the plpy
module initialization is all called from its registered initialization
function PyInit_plpy().

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: li carol <carol.li2025@outlook.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://www.postgresql.org/message-id/f31333f1-fbb7-4098-b209-bf2d71fbd4f3%40eisentraut.org
2026-01-15 12:11:52 +01:00
Peter Eisentraut
3263a893fb plpython: Remove duplicate PyModule_Create()
This seems to have existed like this since Python 3 support was
added (commit dd4cd55c15), but it's unclear what this second call is
supposed to accomplish.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: li carol <carol.li2025@outlook.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://www.postgresql.org/message-id/f31333f1-fbb7-4098-b209-bf2d71fbd4f3%40eisentraut.org
2026-01-15 10:32:41 +01:00