1
0
mirror of https://github.com/postgres/postgres.git synced 2026-01-27 21:43:08 +03:00
Commit Graph

12639 Commits

Author SHA1 Message Date
Peter Eisentraut
a9bdb63bba Work around buggy alignas in older g++
Older g++ (<9.3) mishandle the alignas specifier (raise warnings that
the alignment is too large), but the more or less equivalent attribute
works.  So as a workaround, #define alignas to that attribute for
those versions.

see <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89357>

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/3119480.1769189606%40sss.pgh.pa.us
2026-01-25 11:32:47 +01:00
Dean Rasheed
b4307ae2e5 Fix trigger transition table capture for MERGE in CTE queries.
When executing a data-modifying CTE query containing MERGE and some
other DML operation on a table with statement-level AFTER triggers,
the transition tables passed to the triggers would fail to include the
rows affected by the MERGE.

The reason is that, when initializing a ModifyTable node for MERGE,
MakeTransitionCaptureState() would create a TransitionCaptureState
structure with a single "tcs_private" field pointing to an
AfterTriggersTableData structure with cmdType == CMD_MERGE. Tuples
captured there would then not be included in the sets of tuples
captured when executing INSERT/UPDATE/DELETE ModifyTable nodes in the
same query.

Since there are no MERGE triggers, we should only create
AfterTriggersTableData structures for INSERT/UPDATE/DELETE. Individual
MERGE actions should then use those, thereby sharing the same capture
tuplestores as any other DML commands executed in the same query.

This requires changing the TransitionCaptureState structure, replacing
"tcs_private" with 3 separate pointers to AfterTriggersTableData
structures, one for each of INSERT, UPDATE, and DELETE. Nominally,
this is an ABI break to a public structure in commands/trigger.h.
However, since this is a private field pointing to an opaque data
structure, the only way to create a valid TransitionCaptureState is by
calling MakeTransitionCaptureState(), and no extensions appear to be
doing that anyway, so it seems safe for back-patching.

Backpatch to v15, where MERGE was introduced.

Bug: #19380
Reported-by: Daniel Woelfel <dwwoelfel@gmail.com>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19380-4e293be2b4007248%40postgresql.org
Backpatch-through: 15
2026-01-24 11:30:48 +00:00
Jacob Champion
f7521bf721 pqcomm.h: Explicitly reserve protocol v3.1
Document this unused version alongside the other special protocol
numbers.

Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAOYmi%2BkKyw%3Dh-5NKqqpc7HC5M30_QmzFx3kgq2AdipyNj47nUw%40mail.gmail.com
2026-01-23 12:57:15 -08:00
Michael Paquier
a36164e746 Add WALRCV_CONNECTING state to the WAL receiver
Previously, a WAL receiver freshly started would set its state to
WALRCV_STREAMING immediately at startup, before actually establishing a
replication connection.

This commit introduces a new state called WALRCV_CONNECTING, which is
the state used when the WAL receiver freshly starts, or when a restart
is requested, with a switch to WALRCV_STREAMING once the connection to
the upstream server has been established with COPY_BOTH, meaning that
the WAL receiver is ready to stream changes.  This change is useful for
monitoring purposes, especially in environments with a high latency
where a connection could take some time to be established, giving some
room between the [re]start phase and the streaming activity.

From the point of view of the startup process, that flips the shared
memory state of the WAL receiver when it needs to be stopped, the
existing WALRCV_STREAMING and the new WALRCV_CONNECTING states have the
same semantics: the WAL receiver has started and it can be stopped.

Based on an initial suggestion from Noah Misch, with some input from me
about the design.

Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Rahila Syed <rahilasyed90@gmail.com>
Discussion: https://postgr.es/m/CABPTF7VQ5tGOSG5TS-Cg+Fb8gLCGFzxJ_eX4qg+WZ3ZPt=FtwQ@mail.gmail.com
2026-01-23 14:17:28 +09:00
Álvaro Herrera
69f98fce5b Make some use of anonymous unions [reloptions]
In the spirit of commit 4b7e6c73b0 and following, which see for more
details; it appears to have been quite an uncontroversial C11 feature to
use and it makes the code nicer to read.

This commit changes the relopt_value struct.

Author: Peter Eisentraut <peter@eisentraut.org>
Author: Álvaro Herrera <alvherre@kurilemu.de>
Note: Yes, this was written twice independently.
Discussion: https://postgr.es/m/202601192106.zcdi3yu2gzti@alvherre.pgsql
2026-01-22 17:04:59 +01:00
Peter Eisentraut
c257ba8397 Record range constructor functions in pg_range
When a range type is created, several construction functions are also
created, two for the range type and three for the multirange type.
These have an internal dependency, so they "belong" to the range type.
But there was no way to identify those functions when given a range
type.  An upcoming patch needs access to the two- or possibly the
three-argument range constructor function for a given range type.  The
only way to do that would be with fragile workarounds like matching
names and argument types.  The correct way to do that kind of thing is
to record to the links in the system catalogs.  This is what this
patch does, it records the OIDs of these five constructor functions in
the pg_range catalog.  (Currently, there is no code that makes use of
this.)

Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://www.postgresql.org/message-id/7d63ddfa-c735-4dfe-8c7a-4f1e2a621058%40eisentraut.org
2026-01-22 15:56:29 +01:00
Nathan Bossart
25dc485074 Refactor some SIMD and popcount macros.
This commit does the following:

* Removes TRY_POPCNT_X86_64.  We now assume that the required CPUID
intrinsics are available when HAVE_X86_64_POPCNTQ is defined, as we
have done since v16 for meson builds when
USE_SSE42_CRC32C_WITH_RUNTIME_CHECK is defined and since v17 when
USE_AVX512_POPCNT_WITH_RUNTIME_CHECK is defined.

* Moves the MSVC check for HAVE_X86_64_POPCNTQ to configure-time.
This way, we set it for all relevant platforms in one place.

* Moves the #defines for USE_SSE2 and USE_NEON to c.h so that they
can be used elsewhere without including simd.h.  Consequently, we
can remove the POPCNT_AARCH64 macro.

* Moves the #includes for pg_bitutils.h to below the system headers
in pg_popcount_{aarch64,x86}.c, since we no longer depend on macros
from pg_bitutils.h to decide which system headers to use.

Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/aWf_InS1VrbeXAfP%40nathan
2026-01-21 14:21:00 -06:00
Nathan Bossart
8c6653516c Rename "fast" and "slow" popcount functions.
Since we now have several implementations of the popcount
functions, let's give them more descriptive names.  This commit
replaces "slow" with "portable" and "fast" with "sse42".  While the
POPCNT instruction is technically not part of SSE4.2, this naming
scheme is close enough in practice and is arguably easier to
understand than using "popcnt" instead.

Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/aWf_InS1VrbeXAfP%40nathan
2026-01-21 14:21:00 -06:00
Nathan Bossart
79e232ca01 Move x86-64-specific popcount code to pg_popcount_x86.c.
This moves the remaining x86-64-specific popcount implementations
in pg_bitutils.c to pg_popcount_x86.c.

Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/aWf_InS1VrbeXAfP%40nathan
2026-01-21 14:21:00 -06:00
Tom Lane
4576208454 Force standard_conforming_strings to always be ON.
Continuing to support this backwards-compatibility feature has
nontrivial costs; in particular it is potentially a security hazard
if an application somehow gets confused about which setting the
server is using.  We changed the default to ON fifteen years ago,
which seems like enough time for applications to have adapted.
Let's remove support for the legacy string syntax.

We should not remove the GUC altogether, since client-side code will
still test it, pg_dump scripts will attempt to set it to ON, etc.
Instead, just prevent it from being set to OFF.  There is precedent
for this approach (see commit de66987ad).

This patch does remove the related GUC escape_string_warning, however.
That setting does nothing when standard_conforming_strings is on,
so it's now useless.  We could leave it in place as a do-nothing
setting to avoid breaking clients that still set it, if there are any.
But it seems likely that any such client is also trying to turn off
standard_conforming_strings, so it'll need work anyway.

The client-side changes in this patch are pretty minimal, because even
though we are dropping the server's support, most of our clients still
need to be able to talk to older server versions.  We could remove
dead client code only once we disclaim compatibility with pre-v19
servers, which is surely years away.  One change of note is that
pg_dump/pg_dumpall now set standard_conforming_strings = on in their
source session, rather than accepting the source server's default.
This ensures that literals in view definitions and such will be
printed in a way that's acceptable to v19+.  In particular,
pg_upgrade will work transparently even if the source installation has
standard_conforming_strings = off.  (However, pg_restore will behave
the same as before if given an archive file containing
standard_conforming_strings = off.  Such an archive will not be safely
restorable into v19+, but we shouldn't break the ability to extract
valid data from it for use with an older server.)

Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3279216.1767072538@sss.pgh.pa.us
2026-01-21 15:08:38 -05:00
Álvaro Herrera
4d6a66f675 Allow Boolean reloptions to have ternary values
From the user's point of view these are just Boolean values; from the
implementation side we can now distinguish an option that hasn't been
set.  Reimplement the vacuum_truncate reloption using this type.

This could also be used for reloptions vacuum_index_cleanup and
buffering, but those additionally need a per-option "alias" for the
state where the variable is unset (currently the value "auto").

Author: Nikolay Shaplov <dhyan@nataraj.su>
Reviewed-by: Timur Magomedov <t.magomedov@postgrespro.ru>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/3474141.usfYGdeWWP@thinkpad-pgpro
2026-01-21 20:06:01 +01:00
Tom Lane
cec5fe0d1e Remove useless flag PVC_INCLUDE_CONVERTROWTYPES.
This was introduced in the SJE patch (fc069a3a6), but it doesn't
do anything because pull_var_clause() never tests it.  Apparently
it snuck in from somebody's private fork.  Remove it again, but
only in HEAD -- seems best to let it be in v18.

Author: Alexander Pyhalov <a.pyhalov@postgrespro.ru>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/70008c19d22e3dd1565ca57f8436c0ba@postgrespro.ru
2026-01-21 13:26:19 -05:00
Peter Eisentraut
b4555cb070 Fix for C++ compatibility
After commit 476b35d4e3, some buildfarm members are complaining about
not recognizing _Noreturn when building the new C++ module
test_cplusplusext.  This is not a C++ feature, but it was gated like

    #if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
    #define pg_noreturn _Noreturn

But apparently that was not sufficient.  Some platforms define
__STDC_VERSION__ even in C++ mode.  (In this particular case, it was
g++ on Solaris, but apparently this is also done by some other
platforms, and it is allowed by the C++ standard.)  To fix, add a

    ... && !defined(__cplusplus)

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/CAGECzQR21OnnKiZO_1rLWO0-16kg1JBxnVq-wymYW0-_1cUNtg@mail.gmail.com
2026-01-21 08:54:35 +01:00
John Naylor
7892e25924 Update some comments for fasthash
- Add advice about hashing multiple inputs with the incremental API
- Generalize statements that were specific to C strings to include
  all variable length inputs, where applicable.
- Update comments about the standalone functions and make it easy to
  find them.

Reported-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: zengman <zengman@halodbtech.com>
Discussion: https://postgr.es/m/CANWCAZZgKnf8dNOd_w03n88NqOfmMnMv2=D8_Oy6ADGyiMq+cg@mail.gmail.com
Discussion: https://postgr.es/m/CANWCAZa-2mEUY27xBw2TpsybpvVu3Ez4ABrHCBqZpAs_UDTj2Q@mail.gmail.com
2026-01-21 14:11:40 +07:00
Amit Kapila
48efefa6ca Improve errdetail for logical replication conflict messages.
This change enhances the clarity and usefulness of error detail messages
generated during logical replication conflicts. The following improvements
have been made:

1. Eliminate redundant output: Avoid printing duplicate remote row and
replica identity values for the multiple_unique_conflicts conflict type.
2. Improve message structure: Append tuple values directly to the main
error message, separated by a colon (:), for better readability.
3. Simplify local row terminology: Remove the word 'existing' when
referring to the local row, as this is already implied by context.
4. General code refinements: Apply miscellaneous code cleanups to improve
how conflict detail messages are constructed and formatted.

Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Discussion: https://postgr.es/m/CAHut+Psgkwy5-yGRJC15izecySGGysrbCszv_z93ess8XtCDOQ@mail.gmail.com
2026-01-21 04:58:03 +00:00
Michael Paquier
7ebb64c557 Add routine to free MCVList
This addition is in the same spirit as 32e27bd320 for MVNDistinct and
MVDependencies, except that we were missing a free routine for the third
type of extended statistics, MCVList.  I was not sure if we needed an
equivalent for MCVList, but after more review of the main patch set for
the import of extended statistics, it has become clear that we do.

This is introduced as its own change as this routine can be useful on
its own.  This one is a piece that has not been written by Corey
Huinker, I have just noticed it by myself on the way.

Author: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
2026-01-20 13:13:47 +09:00
Michael Paquier
d756fa1019 Add pg_clear_extended_stats()
This function is able to clear the data associated to an extended
statistics object, making things so as the object looks as
newly-created.

The caller of this function needs the following arguments for the
extended stats to clear:
- The name of the relation.
- The schema name of the relation.
- The name of the extended stats object.
- The schema name of the extended stats object.
- If the stats are inherited or not.

The first two parameters are especially important to ensure a consistent
lookup and ACL checks for the relation on which is based the extended
stats object that will be cleared, relying first on a RangeVar lookup
where permissions are checked without locking a relation, critical to
prevent denial-of-service attacks when using this kind of function (see
also 688dc6299a for a similar concern).  The third to fifth arguments
give a way to target the extended stats records to clear.

This has been extracted from a larger patch by the same author, for a
piece which is again useful on its own.  I have rewritten large portions
of it.  The tests have been extended while discussing this piece,
resulting on what this commit includes.  The intention behind this
feature is to add support for the import of extended statistics across
dumps and upgrades, this change building one piece that we will be able
to rely on for the rest of the changes.

Bump catalog version.

Author: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
2026-01-16 08:13:30 +09:00
Andres Freund
d40fd85187 lwlock: Remove support for disowned lwlwocks
This reverts commit f8d7f29b3e, plus parts of
subsequent commits fixing a typo in a parameter name.

Support for disowned lwlocks was added for the benefit of AIO, to be able to
have content locks "owned" by the AIO subsystem. But as of commit fcb9c977aa,
content locks do not use lwlocks anymore.

It does not seem particularly likely that we need this facility outside of the
AIO use-case, therefore remove the now unused functions.

I did choose to keep the comment added in the aforementioned commit about
lock->owner intentionally being left pointing to the last owner.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/cj5mcjdpucvw4a54hehslr3ctukavrbnxltvuzzhqnimvpju5e@cy3g3mnsefwz
2026-01-15 14:57:45 -05:00
Andres Freund
55fbfb738b lwlock: Remove ForEachLWLockHeldByMe
As of commit fcb9c977aa, ForEachLWLockHeldByMe(), introduced in f4ece891fc,
is not used anymore, as content locks are now implemented in bufmgr.c.  It
doesn't seem that likely that a new user of the functionality will appear all
that soon, making removal of the function seem like the most sensible path. It
can easily be added back if necessary.

Discussion: https://postgr.es/m/lneuyxqxamqoayd2ntau3lqjblzdckw6tjgeu4574ezwh4tzlg%40noioxkquezdw
2026-01-15 14:57:45 -05:00
Andres Freund
fcb9c977aa bufmgr: Implement buffer content locks independently of lwlocks
Until now buffer content locks were implemented using lwlocks. That has the
obvious advantage of not needing a separate efficient implementation of
locks. However, the time for a dedicated buffer content lock implementation
has come:

1) Hint bits are currently set while holding only a share lock. This leads to
   having to copy pages while they are being written out if checksums are
   enabled, which is not cheap. We would like to add AIO writes, however once
   many buffers can be written out at the same time, it gets a lot more
   expensive to copy them, particularly because that copy needs to reside in
   shared buffers (for worker mode to have access to the buffer).

   In addition, modifying buffers while they are being written out can cause
   issues with unbuffered/direct-IO, as some filesystems (like btrfs) do not
   like that, due to filesystem internal checksums getting corrupted.

   The solution to this is to require a new share-exclusive lock-level to set
   hint bits and to write out buffers, making those operations mutually
   exclusive. We could introduce such a lock-level into the generic lwlock
   implementation, however it does not look like there would be other users,
   and it does add some overhead into important code paths.

2) For AIO writes we need to be able to race-freely check whether a buffer is
   undergoing IO and whether an exclusive lock on the page can be acquired. That
   is rather hard to do efficiently when the buffer state and the lock state
   are separate atomic variables. This is a major hindrance to allowing writes
   to be done asynchronously.

3) Buffer locks are by far the most frequently taken locks. Optimizing them
   specifically for their use case is worth the effort. E.g. by merging
   content locks into buffer locks we will be able to release a buffer lock
   and pin in one atomic operation.

4) There are more complicated optimizations, like long-lived "super pinned &
   locked" pages, that cannot realistically be implemented with the generic
   lwlock implementation.

Therefore implement content locks inside bufmgr.c. The lockstate is stored as
part of BufferDesc.state. The implementation of buffer content locks is fairly
similar to lwlocks, with a few important differences:

1) An additional lock-level share-exclusive has been added. This lock-level
   conflicts with exclusive locks and itself, but not share locks.

2) Error recovery for content locks is implemented as part of the already
   existing private-refcount tracking mechanism in combination with resowners,
   instead of a bespoke mechanism as the case for lwlocks. This means we do
   not need to add dedicated error-recovery code paths to release all content
   locks (like done with LWLockReleaseAll() for lwlocks).

3) The lock state is embedded in BufferDesc.state instead of having its own
   struct.

4) The wakeup logic is a tad more complicated due to needing to support the
   additional lock-level

This commit unfortunately introduces some code that is very similar to the
code in lwlock.c, however the code is not equivalent enough to easily merge
it. The future wins that this commit makes possible seem worth the cost.

As of this commit nothing uses the new share-exclusive lock mode. It will be
used in a future commit. It seemed too complicated to introduce the lock-level
in a separate commit.

It's worth calling out one wart in this commit: Despite content locks not
being lwlocks anymore, they continue to use PGPROC->lw* - that seemed better
than duplicating the relevant infrastructure.

Another thing worth pointing out is that, after this change, content locks are
not reported as LWLock wait events anymore, but as new wait events in the
"Buffer" wait event class (see also 6c5c393b74). The old BufferContent lwlock
tranche has been removed.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
2026-01-15 14:26:53 -05:00
Andres Freund
dac328c8a6 bufmgr: Change BufferDesc.state to be a 64-bit atomic
This is motivated by wanting to merge buffer content locks into
BufferDesc.state in a future commit, rather than having a separate lwlock (see
commit c75ebc657f for more details). As this change is rather mechanical, it
seems to make sense to split it out into a separate commit, for easier review.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
2026-01-15 14:20:41 -05:00
Tom Lane
282b1cde9d Optimize LISTEN/NOTIFY via shared channel map and direct advancement.
This patch reworks LISTEN/NOTIFY to avoid waking backends that have
no need to process the notification messages we just sent.

The primary change is to create a shared hash table that tracks
which processes are listening to which channels (where a "channel" is
defined by a database OID and channel name).  This allows a notifying
process to accurately determine which listeners are interested,
replacing the previous weak approximation that listeners in other
databases couldn't be interested.

Secondly, if a listener is known not to be interested and is
currently stopped at the old queue head, we avoid waking it at all
and just directly advance its queue pointer past the notifications
we inserted.

These changes permit very significant improvements (integer multiples)
in NOTIFY throughput, as well as a noticeable reduction in latency,
when there are many listeners but only a few are interested in any
specific message.  There is no improvement for the simplest case where
every listener reads every message, but any loss seems below the noise
level.

Author: Joel Jacobson <joel@compiler.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/6899c044-4a82-49be-8117-e6f669765f7e@app.fastmail.com
2026-01-15 14:12:15 -05:00
Álvaro Herrera
35e3fae738 Remove #include <math.h> where not needed
Liujinyang reported the one in binaryheap.c, I then found and analyzed
the rest.

For future patches, we require git archaelogical analysis before we
accept patches of this nature.

Co-authored-by: liujinyang <21043272@qq.com>
Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/tencent_6B302BFCAF6F010E00AB5C2C0ECB7AA3F205@qq.com
2026-01-15 19:09:47 +01:00
Heikki Linnakangas
d9c3c94365 Wake up autovacuum launcher from postmaster when a worker exits
When an autovacuum worker exits, the launcher needs to be notified
with SIGUSR2, so that it can rebalance and possibly launch a new
worker. The launcher must be notified only after the worker has
finished ProcKill(), so that the worker slot is available for a new
worker. Before this commit, the autovacuum worker was responsible for
that, which required a slightly complicated dance to pass the
launcher's PID from FreeWorkerInfo() to ProcKill() in a global
variable.

Simplify that by moving the responsibility of the signaling to the
postmaster. The postmaster was already doing it when it failed to fork
a worker process, so it seems logical to make it responsible for
notifying the launcher on worker exit too. That's also how the
notification on background worker exit is done.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: li carol <carol.li2025@outlook.com>
Discussion: https://www.postgresql.org/message-id/a5e27d25-c7e7-45d5-9bac-a17c8f462def@iki.fi
2026-01-15 18:02:25 +02:00
Heikki Linnakangas
c4b71e6f60 Remove some unnecessary code from multixact truncation
With 64-bit multixact offsets, PerformMembersTruncation() doesn't need
the starting offset anymore. The 'oldestOffset' value that
TruncateMultiXact() calculates is no longer used for anything. Remove
it, and the code to calculate it.

'oldestOffset' was included in the WAL record as 'startTruncMemb',
which sounds nice if you e.g. look at the WAL with pg_waldump, but it
was also confusing because we didn't actually use the value for
determining what to truncate. Replaying the WAL would remove all
segments older than 'endTruncMemb', regardless of
'startTruncMemb'. The 'startTruncOff' stored in the WAL record was
similarly unnecessary even before 64-bit multixid offsets, it was
stored just for the sake of symmetry with 'startTruncMemb'. Remove
both from the WAL record, and rename the remaining 'endTruncOff' to
'oldestMulti' and 'endTruncMemb' to 'oldestOffset', for consistency
with the variable names used for them in other places.

Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://www.postgresql.org/message-id/000301b2-5b81-4938-bdac-90f6eb660843@iki.fi
2026-01-15 13:34:50 +02:00
Michael Paquier
32e27bd320 Introduce routines to validate and free MVNDistinct and MVDependencies
These routines are useful to perform some basic validation checks on
each object structure, working currently on attribute numbers for
non-expression and expression attnums.  These checks could be extended
in the future.

Note that this code is not used yet in the tree, and that these
functions will become handy for an upcoming patch for the import of
extended statistics data.  However, they are worth their own independent
change as they are actually useful by themselves, with at least the
extension code argument in mind (or perhaps I am just feeling more
pedantic today).

Extracted from a larger patch by the same author, with many adjustments
and fixes by me.

Author: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
2026-01-15 09:36:05 +09:00
Peter Eisentraut
fa16e7fd84 Revert "Replace pg_restrict by standard restrict"
This reverts commit f0f2c0c1ae.

The original problem that led to the use of pg_restrict was that MSVC
couldn't handle plain restrict, and defining it to something else
would conflict with its __declspec(restrict) that is used in system
header files.  In C11 mode, this is no longer a problem, as MSVC
handles plain restrict.  This led to the commit to replace pg_restrict
with restrict.  But this did not take C++ into account.  Standard C++
does not have restrict, so we defined it as something else (for
example, MSVC supports __restrict).  But this then again conflicts
with __declspec(restrict) in system header files.  So we have to
revert this attempt.  The comments are updated to clarify that the
reason for this is now C++ only.

Reported-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/CAGECzQRoD7chJP1-dneSrhxUJv%2BBRcigoGOO4UwGzaShLot2Yw%40mail.gmail.com
2026-01-14 15:12:25 +01:00
Andres Freund
ff219c1987 bufmgr: Make definitions related to buffer descriptor easier to modify
This is in preparation to widening the buffer state to 64 bits, which in turn
is preparation for implementing content locks in bufmgr. This commit aims to
make the subsequent commits a bit easier to review, by separating out
reformatting etc from the actual changes.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/4csodkvvfbfloxxjlkgsnl2lgfv2mtzdl7phqzd4jxjadxm4o5@usw7feyb5bzf
2026-01-13 19:38:29 -05:00
Michael Paquier
e217dc7484 Fix query jumbling with GROUP BY clauses
RangeTblEntry.groupexprs was marked with the node attribute
query_jumble_ignore, causing a list of GROUP BY expressions to be
ignored during the query jumbling.  For example, these two queries could
be grouped together within the same query ID:
SELECT count(*) FROM t GROUP BY a;
SELECT count(*) FROM t GROUP BY b;

However, as such queries use different GROUP BY clauses, they should be
split across multiple entries.

This fixes an oversight in 247dea89f7, that has introduced an RTE for
GROUP BY clauses.  Query IDs are documented as being stable across minor
releases, but as this is a regression new to v18 and that we are still
early in its support cycle, a backpatch is exceptionally done as this
has broken a behavior that exists since query jumbling is supported in
core, since its introduction in pg_stat_statements.

The tests of pg_stat_statements are expanded to cover this area, with
patterns involving GROUP BY and GROUPING clauses.

Author: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CACJufxEy2W+tCqC7XuJ94r3ivWsM=onKJp94kRFx3hoARjBeFQ@mail.gmail.com
Backpatch-through: 18
2026-01-14 08:44:12 +09:00
Andres Freund
0b96e734c5 heapam: Add batch mode mvcc check and use it in page mode
There are two reasons for doing so:

1) It is generally faster to perform checks in a batched fashion and making
   sequential scans faster is nice.

2) We would like to stop setting hint bits while pages are being written
   out. The necessary locking becomes visible for page mode scans, if done for
   every tuple. With batching, the overhead can be amortized to only happen
   once per page.

There are substantial further optimization opportunities along these
lines:

- Right now HeapTupleSatisfiesMVCCBatch() simply uses the single-tuple
  HeapTupleSatisfiesMVCC(), relying on the compiler to inline it. We could
  instead write an explicitly optimized version that avoids repeated xid
  tests.

- Introduce batched version of the serializability test

- Introduce batched version of HeapTupleSatisfiesVacuum

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/6rgb2nvhyvnszz4ul3wfzlf5rheb2kkwrglthnna7qhe24onwr@vw27225tkyar
2026-01-12 13:22:04 -05:00
Álvaro Herrera
225d1df1d2 Stop including {brin,gin}_tuple.h in tuplesort.h
Doing this meant that those two headers, which are supposed to be
internal to their corresponding index AMs, were being included pretty
much universally, because tuplesort.h is included by execnodes.h which
is very widely used.  Stop that, and fix fallout.

We also change indexing.h to no longer include execnodes.h (tuptable.h
is sufficient), and relscan.h to no longer include buf.h (pointless
since c2fe139c20).

Author: Mario González <gonzalemario@gmail.com>
Discussion: https://postgr.es/m/CAFsReFUcBFup=Ohv_xd7SNQ=e73TXi8YNEkTsFEE2BW7jS1noQ@mail.gmail.com
2026-01-12 18:09:49 +01:00
Álvaro Herrera
2defd00062 Move instrumentation-related structs to instrument_node.h
Some structs and enums related to parallel query instrumentation had
organically grown scattered across various files, and were causing
header pollution especially through execnodes.h.  Create a single file
where they can live together.

This only moves the structs to the new file; cleaning up the pollution
by removing no-longer-necessary cross-header inclusion will be done in
future commits.

Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de>
Co-authored-by: Mario González <gonzalemario@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/202510051642.wwmn4mj77wch@alvherre.pgsql
Discussion: https://postgr.es/m/CAFsReFUr4KrQ60z+ck9cRM4WuUw1TCghN7EFwvV0KvuncTRc2w@mail.gmail.com
2026-01-12 16:59:28 +01:00
Andres Freund
e5a5e0a907 instrumentation: Keep time fields as instrtime, convert in callers
Previously the instrumentation logic always converted to seconds, only for
many of the callers to do unnecessary division to get to milliseconds. As an
upcoming refactoring will split the Instrumentation struct, utilize instrtime
always to keep things simpler. It's also a bit faster to not have to first
convert to a double in functions like InstrEndLoop(), InstrAggNode().

Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAP53PkzZ3UotnRrrnXWAv=F4avRq9MQ8zU+bxoN9tpovEu6fGQ@mail.gmail.com
2026-01-09 13:38:00 -05:00
Heikki Linnakangas
bba81f9d3d Inline ginCompareAttEntries for speed
It is called in tight loops during GIN index build.

Author: David Geier <geidav.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/5d366878-2007-4d31-861e-19294b7a583b@gmail.com
2026-01-09 20:31:43 +02:00
Peter Eisentraut
69d76fb2ab Decouple C++ support in Meson's PGXS from LLVM enablement
This is important for Postgres extensions that are written in C++,
such as pg_duckdb, which uses PGXS as the build system currently.  In
the autotools build, C++ is not coupled to LLVM.  If the autotools
build is configured without --with-llvm, the C++ compiler and the
various flags get persisted into the Makefile.global.

Author: Tristan Partin <tristan@partin.io>
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://www.postgresql.org/message-id/flat/D98JHQF7H2A8.VSE3I4CJBTAB%40partin.io
2026-01-09 10:25:02 +01:00
Tom Lane
b352d3d80b Mark GiST inet_ops as opcdefault, and deal with ensuing fallout.
This patch completes the transition to making inet_ops be default for
inet/cidr columns, rather than btree_gist's opclasses.  Once we do
that, though, pg_upgrade has a big problem.  A dump from an older
version will see btree_gist's opclasses as being default, so it will
not mention the opclass explicitly in CREATE INDEX commands, which
would cause the restore to create the indexes using inet_ops.  Since
that's not compatible with what's actually in the files, havoc would
ensue.

This isn't readily fixable, because the CREATE INDEX command strings
are built by the older server's pg_get_indexdef() function; pg_dump
hasn't nearly enough knowledge to modify those strings successfully.
Even if we cared to put in the work to make that happen in pg_dump,
it would be counterproductive because the end goal here is to get
people off of these opclasses.  Allowing such indexes to persist
through pg_upgrade wouldn't advance that goal.

Therefore, this patch just adds code to pg_upgrade to detect indexes
that would be problematic and refuse to upgrade.

There's another issue too: even without any indexes to worry about,
pg_dump in binary-upgrade mode will reproduce the "CREATE OPERATOR
CLASS ... DEFAULT" commands for btree_gist's opclasses, and those
will fail because now we have a built-in opclass that provides a
conflicting default.  We could ask users to drop the btree_gist
extension altogether before upgrading, but that would carry very
severe penalties.  It would affect perfectly-valid indexes for other
data types, and it would drop operators that might be relied on in
views or other database objects.  Instead, put a hack in DefineOpClass
to ignore the DEFAULT clauses for these opclasses when in
binary-upgrade mode.  This will result in installing a version of
btree_gist that isn't quite the version it claims to be, but that can
be fixed by issuing ALTER EXTENSION UPDATE afterwards.

Since we don't apply that hack when not in binary-upgrade mode,
it is now impossible to install any version of btree_gist
less than 1.9 via CREATE EXTENSION.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/2483812.1754072263@sss.pgh.pa.us
2026-01-08 14:03:56 -05:00
Heikki Linnakangas
ad853bb877 Fix misc typos, mostly in comments
The only user-visible change is the fix in the "malformed
pg_dependencies" error detail. That one is new in commit e1405aa5e3,
so no backpatching required.
2026-01-08 18:10:08 +02:00
Peter Eisentraut
5e7abdac99 strnlen() is now required
Remove all configure checks and workarounds for strnlen() missing.  It
is required by POSIX 2008.

Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/98ce805c-6103-421b-adc3-fcf8f3dddbe3%40eisentraut.org
2026-01-08 08:51:20 +01:00
Nathan Bossart
a516b3f00d MSVC: Support building for AArch64.
This commit does the following to get tests passing for
MSVC/AArch64:

* Implements spin_delay() with an ISB instruction (like we do for
gcc/clang on AArch64).

* Sets USE_ARMV8_CRC32C unconditionally.  Vendor-supported versions
of Windows for AArch64 require at least ARMv8.1, which is where CRC
extension support became mandatory.

* Implements S_UNLOCK() with _InterlockedExchange().  The existing
implementation for MSVC uses _ReadWriteBarrier() (a compiler
barrier), which is insufficient for this purpose on non-TSO
architectures.

There are likely other changes required to take full advantage of
the hardware (e.g., atomics/arch-arm.h, simd.h,
pg_popcount_aarch64.c), but those can be dealt with later.

Author: Niyas Sait <niyas.sait@linaro.org>
Co-authored-by: Greg Burd <greg@burd.me>
Co-authored-by: Dave Cramer <davecramer@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Tested-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/A6152C7C-F5E3-4958-8F8E-7692D259FF2F%40greg.burd.me
Discussion: https://postgr.es/m/CAFPTBD-74%2BAEuN9n7caJ0YUnW5A0r-KBX8rYoEJWqFPgLKpzdg%40mail.gmail.com
2026-01-07 13:42:57 -06:00
Michael Paquier
b139bd3b6e Add data type oid8, 64-bit unsigned identifier
This new identifier type provides support for 64-bit unsigned values,
to be used in catalogs, like OIDs.  An advantage of a new data type is
that it becomes easier to grep for it in the code when assigning this
type to a catalog attribute, linking it to dedicated APIs and internal
structures.

The following operators are added in this commit, with dedicated tests:
- Casts with integer types and OID.
- btree and hash operators
- min/max functions.
- C type with related macros and defines, named around "Oid8".

This has been mentioned as useful on its own on the thread to add
support for 64-bit TOAST values, so as it becomes possible to attach
this data type to the TOAST code and catalog definitions.  However, as
this concept can apply to many more areas, it is implemented as its own
independent change.  This is based on a discussion with Andres Freund
and Tom Lane.

Bump catalog version.

Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Nikhil Kumar Veldanda <veldanda.nikhilkumar17@gmail.com>
Discussion: https://postgr.es/m/1891064.1754681536@sss.pgh.pa.us
2026-01-07 11:37:00 +09:00
Jeff Davis
af2d4ca191 Clean up ICU includes.
Remove ICU includes from pg_locale.h, and instead include them in the
few C files that need ICU. Clean up a few other includes in passing.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/48911db71d953edec66df0d2ce303563d631fbe0.camel@j-davis.com
2026-01-06 17:19:51 -08:00
Jeff Davis
c4ff35f104 ICU: use UTF8-optimized case conversion API
Initializes a UCaseMap object once for use across calls, and uses
UTF8-optimized APIs.

Author: Andreas Karlsson <andreas@proxel.se>
Reviewed-by: zengman <zengman@halodbtech.com>
Discussion: https://postgr.es/m/5a010b27-8ed9-4739-86fe-1562b07ba564@proxel.se
2026-01-06 14:09:07 -08:00
Michael Paquier
f1e251be80 Allow bgworkers to be terminated for database-related commands
Background workers gain a new flag, called BGWORKER_INTERRUPTIBLE, that
offers the possibility to terminate the workers when these are connected
to a database that is involved in one of the following commands:
ALTER DATABASE RENAME TO
ALTER DATABASE SET TABLESPACE
CREATE DATABASE
DROP DATABASE

This is useful to give background workers the same behavior as backends
and autovacuum workers, which are stopped when these commands are
executed.  The default behavior, that exists since 9.3, is still to
never terminate bgworkers connected to the database involved in any of
these commands.  The new flag has to be set to terminate the workers.

A couple of tests are added to worker_spi to track the commands that
impact the termination of the workers.  There is a test case for a
non-interruptible worker, additionally, that relies on an injection
point to make the wait time in CountOtherDBBackends() reduced from 5s to
0.3s for faster test runs.  The tests rely on the contents of the server
logs to check if a worker has been started or terminated:
- LOG generated by worker_spi_main() at startup, once connection to
database is done.
- FATAL in bgworker_die() when terminated.
A couple of tests run in the CI have showed that this method is stable
enough.  The safe_psql() calls that scan pg_stat_activity could be
replaced with some poll_query_until() for more stability, if the current
method proves to be an issue in the buildfarm.

Author: Aya Iwata <iwata.aya@fujitsu.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Ryo Matsumura <matsumura.ryo@fujitsu.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Discussion: https://postgr.es/m/OS7PR01MB11964335F36BE41021B62EAE8EAE4A@OS7PR01MB11964.jpnprd01.prod.outlook.com
2026-01-06 14:24:29 +09:00
David Rowley
c5af141cd4 Clarify where various catcache.h dlist_nodes are used
Also remove a comment which mentions we don't currently divide the
per-cache lists into hash buckets.  Since 473182c95, we do.

Author: ChangAo Chen <cca5507@qq.com>
Discussion: https://postgr.es/m/tencent_7732789707C8768EA13785A7B5EA29103208@qq.com
2026-01-06 14:39:36 +13:00
Tom Lane
7dc95cc3b9 Update to latest Snowball sources.
It's been almost a year since we last did this, and upstream has
been busy.  They've added stemmers for Polish and Esperanto,
and also deprecated their old Dutch stemmer in favor of the
Kraaij-Pohlmann algorithm.  (The "dutch" stemmer is now the
latter, and "dutch_porter" is the old algorithm.)

Upstream also decided to rename their internal header "header.h"
to something less generic: "snowball_runtime.h".  Seems like a good
thing, but it complicates this patch a bit because we were relying on
interposing our own version of "header.h" to control system header
inclusion order.  (We're partially failing at that now, because now the
generated stemmer files include <stddef.h> before snowball_runtime.h.
I think that'll be okay, but if the buildfarm complains then we'll
have to do more-extensive editing of the generated files.)

I realized that we weren't documenting the available stemmers in
any user-visible place, except indirectly through sample \dFd output.
That's incomplete because we only provide built-in dictionaries for
the recommended stemmers for each language, not alternative stemmers
such as dutch_porter.  So I added a list to the documentation.

I did not do anything with the stopword lists.  If those are still
available from snowballstem.org, they are mighty well hidden.

Discussion: https://postgr.es/m/1185975.1767569534@sss.pgh.pa.us
2026-01-05 15:22:37 -05:00
Alexander Korotkov
7a39f43d88 Extend xlogwait infrastructure with write and flush wait types
Add support for waiting on WAL write and flush LSNs in addition to the
existing replay LSN wait type. This provides the foundation for
extending the WAIT FOR command with MODE parameter.

Key changes are following.
- Add WAIT_LSN_TYPE_STANDBY_WRITE and WAIT_LSN_TYPE_STANDBY_FLUSH to
  WaitLSNType.
- Add GetCurrentLSNForWaitType() to retrieve the current LSN for each wait
  type.
- Add new wait events WAIT_EVENT_WAIT_FOR_WAL_WRITE and
  WAIT_EVENT_WAIT_FOR_WAL_FLUSH for pg_stat_activity visibility.
- Update WaitForLSN() to use GetCurrentLSNForWaitType() internally.

Discussion: https://postgr.es/m/CABPTF7UiArgW-sXj9CNwRzUhYOQrevLzkYcgBydmX5oDes1sjg%40mail.gmail.com
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@kurilemu.de>
2026-01-05 19:56:19 +02:00
Michael Paquier
7d854bdc5b Remove unneeded defines from pg_config.h.in
This commit removes HAVE_ATOMIC_H and HAVE_MBARRIER_H from
pg_config.h.in, cleanup that could have been done in 25f36066dd.

Author: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/b2c0d0b7-3944-487d-a03d-d155851958ff@gmail.com
2026-01-05 09:27:19 +09:00
Michael Paquier
b8cfcb9e00 Fix typos and inconsistencies in code and comments
This change is a cocktail of harmonization of function argument names,
grammar typos, renames for better consistency and unused code (see
ltree).  All of these have been spotted by the author.

Author: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/b2c0d0b7-3944-487d-a03d-d155851958ff@gmail.com
2026-01-05 09:19:15 +09:00
Tom Lane
ba75f71752 Include error location in errors from ComputeIndexAttrs().
Make use of IndexElem's new location field to localize these
errors better.

Author: jian he <jian.universality@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CACJufxH3OgXF1hrzGAaWyNtye2jHEmk9JbtrtGv-KJK6tsGo5w@mail.gmail.com
2026-01-04 14:16:20 -05:00
Tom Lane
62299bbd90 Add parse location to IndexElem.
This patch mostly just fills in the field, although a few error
reports in resolve_unique_index_expr() are adjusted to use it.
The next commit will add more uses.

catversion bump out of an abundance of caution: I'm not sure
IndexElem can appear in stored rules, but I'm not sure it can't
either.

Author: Álvaro Herrera <alvherre@kurilemu.de>
Co-authored-by: jian he <jian.universality@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CACJufxH3OgXF1hrzGAaWyNtye2jHEmk9JbtrtGv-KJK6tsGo5w@mail.gmail.com
Discussion: https://postgr.es/m/202512121327.f2zimsr6guso@alvherre.pgsql
2026-01-04 14:16:20 -05:00