1
0
mirror of https://github.com/postgres/postgres.git synced 2025-12-21 05:21:08 +03:00
Commit Graph

1699 Commits

Author SHA1 Message Date
Peter Eisentraut
9d115b9e11 Message wording improvements
Use "row" instead of "tuple" for user-facing information for
logical replication conflicts.
2025-08-25 22:59:00 +02:00
Michael Paquier
ea1c6b0b0a Fix assertion failure with replication slot release in single-user mode
Some replication slot manipulations (logical decoding via SQL,
advancing) were failing an assertion when releasing a slot in
single-user mode, because active_pid was not set in a ReplicationSlot
when its slot is acquired.

ReplicationSlotAcquire() has some logic to be able to work with the
single-user mode.  This commit sets ReplicationSlot->active_pid to
MyProcPid, to let the slot-related logic fall-through, considering the
single process as the one holding the slot.

Some TAP tests are added for various replication slot functions with the
single-user mode, while on it, for slot creation, drop, advancing, copy
and logical decoding with multiple slot types (temporary, physical vs
logical).  These tests are skipped on Windows, as direct calls of
postgres --single would fail on permission failures.  There is no
platform-specific behavior that needs to be checked, so living with this
restriction should be fine.  The CI is OK with that, now let's see what
the buildfarm tells.

Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Mutaamba Maasha <maasha@gmail.com>
Discussion: https://postgr.es/m/OSCPR01MB14966ED588A0328DAEBE8CB25F5FA2@OSCPR01MB14966.jpnprd01.prod.outlook.com
Backpatch-through: 13
2025-08-20 15:00:08 +09:00
Amit Kapila
a5d4c04150 Fix self-deadlock during DROP SUBSCRIPTION.
The DROP SUBSCRIPTION command performs several operations: it stops the
subscription workers, removes subscription-related entries from system
catalogs, and deletes the replication slot on the publisher server.
Previously, this command acquired an AccessExclusiveLock on
pg_subscription before initiating these steps.

However, while holding this lock, the command attempts to connect to the
publisher to remove the replication slot. In cases where the connection is
made to a newly created database on the same server as subscriber, the
cache-building process during connection tries to acquire an
AccessShareLock on pg_subscription, resulting in a self-deadlock.

To resolve this issue, we reduce the lock level on pg_subscription during
DROP SUBSCRIPTION from AccessExclusiveLock to RowExclusiveLock. Earlier,
the higher lock level was used to prevent the launcher from starting a new
worker during the drop operation, as a restarted worker could become
orphaned.

Now, instead of relying on a strict lock, we acquire an AccessShareLock on
the specific subscription being dropped and re-validate its existence
after acquiring the lock. If the subscription is no longer valid, the
worker exits gracefully. This approach avoids the deadlock while still
ensuring that orphan workers are not created.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 13
Discussion: https://postgr.es/m/18988-7312c868be2d467f@postgresql.org
2025-08-19 05:18:24 +00:00
Fujii Masao
2d81a246f4 Avoid unexpected shutdown when sync_replication_slots is enabled.
Previously, enabling sync_replication_slots while wal_level was not set
to logical could cause the server to shut down. This was because
the postmaster performed a configuration check before launching
the slot synchronization worker and raised an ERROR if the settings
were incompatible. Since ERROR is treated as FATAL in the postmaster,
this resulted in the entire server shutting down unexpectedly.

This commit changes the postmaster to log that message with a LOG-level
instead of raising an ERROR, allowing the server to continue running
even with the misconfiguration.

Back-patch to v17, where slot synchronization was introduced.

Reported-by: Hugo DUBOIS <hdubois@scaleway.com>
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Hugo DUBOIS <hdubois@scaleway.com>
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Discussion: https://postgr.es/m/CAH0PTU_pc3oHi__XESF9ZigCyzai1Mo3LsOdFyQA4aUDkm01RA@mail.gmail.com
Backpatch-through: 17
2025-08-04 20:52:59 +09:00
Michael Paquier
11de339aad Fix use-after-free with INSERT ON CONFLICT changes in reorderbuffer.c
In ReorderBufferProcessTXN(), used to send the data of a transaction to
an output plugin, INSERT ON CONFLICT changes (INTERNAL_SPEC_INSERT) are
delayed until a confirmation record arrives (INTERNAL_SPEC_CONFIRM),
updating the change being processed.

8c58624df4 has added an extra step after processing a change to update
the progress of the transaction, by calling the callback
update_progress_txn() based on the LSN stored in a change after a
threshold of CHANGES_THRESHOLD (100) is reached.  This logic has missed
the fact that for an INSERT ON CONFLICT change the data is freed once
processed, hence update_progress_txn() could be called pointing to a LSN
value that's already been freed.  This could result in random crashes,
depending on the workload.

Per discussion, this issue is fixed by reusing in update_progress_txn()
the LSN from the change processed found at the beginning of the loop,
meaning that for a INTERNAL_SPEC_CONFIRM change the progress is updated
using the LSN of the INTERNAL_SPEC_CONFIRM change, and not the LSN from
its INTERNAL_SPEC_INSERT change.  This is actually more correct, as we
want to update the progress to point to the INTERNAL_SPEC_CONFIRM
change.

Masahiko Sawada has found a nice trick to reproduce the issue: hardcode
CHANGES_THRESHOLD at 1 and run test_decoding (test "ddl" being enough)
on an instance running valgrind.  The bug has been analyzed by Ethan
Mertz, who also originally suggested the solution used in this patch.

Issue introduced by 8c58624df4, so backpatch down to v16.

Author: Ethan Mertz <ethan.mertz@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/aIsQqDZ7x4LAQ6u1@paquier.xyz
Backpatch-through: 16
2025-08-02 17:08:48 +09:00
Amit Kapila
d9f01a287a Fix a deadlock during ALTER SUBSCRIPTION ... DROP PUBLICATION.
A deadlock can occur when the DDL command and the apply worker acquire
catalog locks in different orders while dropping replication origins.

The issue is rare in PG16 and higher branches because, in most cases, the
tablesync worker performs the origin drop in those branches, and its
locking sequence does not conflict with DDL operations.

This patch ensures consistent lock acquisition to prevent such deadlocks.

As per buildfarm.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Ajin Cherian <itsajin@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 14, where it was introduced
Discussion: https://postgr.es/m/bab95e12-6cc5-4ebb-80a8-3e41956aa297@gmail.com
2025-08-01 07:46:22 +00:00
Robert Haas
44e135ad57 Avoid throwing away the error message in syncrep_yyerror.
Commit 473a575e05 purported to make this
function stash the error message in *syncrep_parse_result_p, but
it didn't actually.

As a result, an attempt to set synchronous_standby_names to any value
that does not parse resulted in a generic "parser failed." message
rather than anything more specific. This fixes that.

Discussion: http://postgr.es/m/CA+TgmoYF9wPNZ-Q_EMfib_espgHycY-eX__6Tzo2GpYpVXqCdQ@mail.gmail.com
Backpatch-through: 18
2025-07-28 10:57:10 -04:00
Amit Kapila
f36e577451 Fix the handling of two GUCs during upgrade.
Previously, the check_hook functions for max_slot_wal_keep_size and
idle_replication_slot_timeout would incorrectly raise an ERROR for values
set in postgresql.conf during upgrade, even though those values were not
actively used in the upgrade process.

To prevent logical slot invalidation during upgrade, we used to set
special values for these GUCs. Now, instead of relying on those values, we
directly prevent WAL removal and logical slot invalidation caused by
max_slot_wal_keep_size and idle_replication_slot_timeout.

Note: PostgreSQL 17 does not include the idle_replication_slot_timeout
GUC, so related changes were not backported.

BUG #18979
Reported-by: jorsol <jorsol@gmail.com>
Author: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed by: vignesh C <vignesh21@gmail.com>
Reviewed by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Backpatch-through: 17, where it was introduced
Discussion: https://postgr.es/m/219561.1751826409@sss.pgh.pa.us
Discussion: https://postgr.es/m/18979-a1b7fdbb7cd181c6@postgresql.org
2025-07-11 10:28:29 +05:30
Fujii Masao
afb64a56d9 doc: Clarify meaning of "idle" in idle_replication_slot_timeout.
This commit updates the documentation to clarify that "idle" in
idle_replication_slot_timeout means the replication slot is inactive,
that is, not currently used by any replication connection.

Without this clarification, "idle" could be misinterpreted to mean
that the slot is not advancing or that no data is being streamed,
even if a connection exists.

Back-patch to v18 where idle_replication_slot_timeout was added.

Author: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Gunnar Morling <gunnar.morling@googlemail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CADGJaX_0+FTguWpNSpgVWYQP_7MhoO0D8=cp4XozSQgaZ40Odw@mail.gmail.com
Backpatch-through: 18
2025-07-11 08:45:56 +09:00
Fujii Masao
37c76aeb9a Change unit of idle_replication_slot_timeout to seconds.
Previously, the idle_replication_slot_timeout parameter used minutes
as its unit, based on the assumption that values would typically exceed
one minute in production environments. However, this caused unexpected
behavior: specifying a value below 30 seconds would round down to 0,
effectively disabling the timeout. This could be surprising to users.

To allow finer-grained control and avoid such confusion, this commit changes
the unit of idle_replication_slot_timeout to seconds. Larger values can
still be specified easily using standard time suffixes, for example,
'24h' for 24 hours.

Back-patch to v18 where idle_replication_slot_timeout was added.

Reported-by: Gunnar Morling <gunnar.morling@googlemail.com>
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CADGJaX_0+FTguWpNSpgVWYQP_7MhoO0D8=cp4XozSQgaZ40Odw@mail.gmail.com
Backpatch-through: 18
2025-07-11 08:42:16 +09:00
Peter Eisentraut
50fd428b2b Message style improvements 2025-06-28 19:18:06 +02:00
Alexander Korotkov
7195c804bd Fix CheckPointReplicationSlots() with max_replication_slots == 0
ca307d5cec made CheckPointReplicationSlots() unconditionally call
ReplicationSlotsComputeRequiredLSN().  It causes an assertion trap when
max_replication_slots equals 0.  This commit makes
CheckPointReplicationSlots() call ReplicationSlotsComputeRequiredLSN() only
when at least one slot gets its last_saved_restart_lsn updated.  That avoids
an assert trap and also saves some cycles when no one slot has
last_saved_restart_lsn updated.

Based on ideas from Dilip Kumar <dilipbalaut@gmail.com> and
Hayato Kuroda <kuroda.hayato@fujitsu.com>.

Reported-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Discussion: https://postgr.es/m/OS0PR01MB5716BB506AF934376FF3A8BB947BA%40OS0PR01MB5716.jpnprd01.prod.outlook.com
2025-06-27 11:49:00 +03:00
Tom Lane
fd519419c9 Prevent excessive delays before launching new logrep workers.
The logical replication launcher process would sometimes sleep
for as much as 3 minutes before noticing that it is supposed
to launch a new worker.  This could happen if
(1) WaitForReplicationWorkerAttach absorbed a process latch wakeup
that was meant to cause ApplyLauncherMain to do work, or
(2) logicalrep_worker_launch reported failure, either because of
resource limits or because the new worker terminated immediately.

In case (2), the expected behavior is that we retry the launch after
wal_retrieve_retry_interval, but that didn't reliably happen.

It's not clear how often such conditions would occur in the field,
but in our subscription test suite they are somewhat common,
especially in tests that exercise cases that cause quick worker
failure.  That causes the tests to take substantially longer than
they ought to do on typical setups.

To fix (1), make WaitForReplicationWorkerAttach re-set the latch
before returning if it cleared it while looping.  To fix (2), ensure
that we reduce wait_time to no more than wal_retrieve_retry_interval
when logicalrep_worker_launch reports failure.  In passing, fix a
couple of perhaps-hypothetical race conditions, e.g. examining
worker->in_use without a lock.

Backpatch to v16.  Problem (2) didn't exist before commit 5a3a95385
because the previous code always set wait_time to
wal_retrieve_retry_interval when launching a worker, regardless of
success or failure of the launch.  That behavior also greatly
mitigated problem (1), so I'm not excited about adapting the remainder
of the patch to the substantially-different code in older branches.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/817604.1750723007@sss.pgh.pa.us
Backpatch-through: 16
2025-06-24 14:14:07 -04:00
Amit Kapila
6531f36283 Fix missing comment update in 1462aad2e4.
Remove the part of comment that says we don't allow toggling two_phase
option as that is supported in commit 1462aad2e4.

Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Author: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/OSCPR01MB1496656725F3951AEE8749EBDF579A@OSCPR01MB14966.jpnprd01.prod.outlook.com
2025-06-24 09:51:07 +05:30
Alexander Korotkov
70d8a91f82 Remove excess assert from InvalidatePossiblyObsoleteSlot()
ca307d5cec introduced keeping WAL segments by slot's last saved restart LSN.
It also added an assertion that the slot's restart LSN never goes backward.
However, situations when the restart LSN goes backward have been spotted by
buildfarm animals and investigated in the thread.

When pg_receivewal starts the replication, it sets the last replayed LSN to
the beginning of the segment, which is older than what
ReplicationSlotReserveWal() set for the slot.  A similar situation can happen
to pg_basebackup.  When standby reconnects to the primary, it sends the last
replayed LSN, which might be older than the last confirmed flush LSN.  In
both these situations, a concurrent checkpoint may trigger an assert trap.

Based on ideas from Vitaly Davydov <v.davydov@postgrespro.ru>,
Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com>,
Vignesh C <vignesh21@gmail.com>,
Amit Kapila <amit.kapila16@gmail.com>.

Reported-by: Vignesh C <vignesh21@gmail.com>
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CALDaNm3s-jpQTe1MshsvQ8GO%3DTLj233JCdkQ7uZ6pwqRVpxAdw%40mail.gmail.com
Reviewed-by: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
2025-06-23 21:27:42 +03:00
Amit Kapila
1546e17f9d Improve log messages and docs for slot synchronization.
Improve the clarity of LOG messages when a failover logical slot
synchronization fails, making the reasons more explicit for easier
debugging.

Update the documentation to outline scenarios where slot synchronization
can fail, especially during the initial sync, and emphasize that
pg_sync_replication_slot() is primarily intended for testing and
debugging purposes.

We also discussed improving the functionality of
pg_sync_replication_slot() so that it can be used reliably, but we would
take up that work for next version after some more discussion and review.

Reported-by: Suraj Kharage <suraj.kharage@enterprisedb.com>
Author: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 17, where it was introduced
Discussion: https://postgr.es/m/CAF1DzPWTcg+m+x+oVVB=y4q9=PYYsL_mujVp7uJr-_oUtWNGbA@mail.gmail.com
2025-06-19 09:48:08 +05:30
Masahiko Sawada
d87d07b7ad Fix re-distributing previously distributed invalidation messages during logical decoding.
Commit 4909b38af0 introduced logic to distribute invalidation messages
from catalog-modifying transactions to all concurrent in-progress
transactions. However, since each transaction distributes not only its
original invalidation messages but also previously distributed
messages to other transactions, this leads to an exponential increase
in allocation request size for invalidation messages, ultimately
causing memory allocation failure.

This commit fixes this issue by tracking distributed invalidation
messages separately per decoded transaction and not redistributing
these messages to other in-progress transactions. The maximum size of
distributed invalidation messages that one transaction can store is
limited to MAX_DISTR_INVAL_MSG_PER_TXN (8MB). Once the size of the
distributed invalidation messages exceeds this threshold, we
invalidate all caches in locations where distributed invalidation
messages need to be executed.

Back-patch to all supported versions where we introduced the fix by
commit 4909b38af0.

Note that this commit adds two new fields to ReorderBufferTXN to store
the distributed transactions. This change breaks ABI compatibility in
back branches, affecting third-party extensions that depend on the
size of the ReorderBufferTXN struct, though this scenario seems
unlikely.

Additionally, it adds a new flag to the txn_flags field of
ReorderBufferTXN to indicate distributed invalidation message
overflow. This should not affect existing implementations, as it is
unlikely that third-party extensions use unused bits in the txn_flags
field.

Bug: #18938 #18942
Author: vignesh C <vignesh21@gmail.com>
Reported-by: Duncan Sands <duncan.sands@deepbluecap.com>
Reported-by: John Hutchins <john.hutchins@wicourts.gov>
Reported-by: Laurence Parry <greenreaper@hotmail.com>
Reported-by: Max Madden <maxmmadden@gmail.com>
Reported-by: Braulio Fdo Gonzalez <brauliofg@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/680bdaf6-f7d1-4536-b580-05c2760c67c6@deepbluecap.com
Discussion: https://postgr.es/m/18942-0ab1e5ae156613ad@postgresql.org
Discussion: https://postgr.es/m/18938-57c9a1c463b68ce0@postgresql.org
Discussion: https://postgr.es/m/CAD1FGCT2sYrP_70RTuo56QTizyc+J3wJdtn2gtO3VttQFpdMZg@mail.gmail.com
Discussion: https://postgr.es/m/CANO2=B=2BT1hSYCE=nuuTnVTnjidMg0+-FfnRnqM6kd23qoygg@mail.gmail.com
Backpatch-through: 13
2025-06-16 17:36:01 -07:00
Alexander Korotkov
eb124c3d6d Add TAP tests to check replication slot advance during the checkpoint
The new tests verify that logical and physical replication slots are still
valid after an immediate restart on checkpoint completion when the slot was
advanced during the checkpoint.

This commit introduces two new injection points to make these tests possible:

* checkpoint-before-old-wal-removal - triggered in the checkpointer process
  just before old WAL segments cleanup;
* logical-replication-slot-advance-segment - triggered in
  LogicalConfirmReceivedLocation() when restart_lsn was changed enough to
  point to the next WAL segment.

Discussion: https://postgr.es/m/flat/1d12d2-67235980-35-19a406a0%4063439497
Author: Vitaly Davydov <v.davydov@postgrespro.ru>
Author: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 17
2025-06-14 03:55:21 +03:00
Alexander Korotkov
ca307d5cec Keep WAL segments by slot's last saved restart LSN
The patch fixes the issue with the unexpected removal of old WAL segments
after checkpoint, followed by an immediate restart.  The issue occurs when
a slot is advanced after the start of the checkpoint and before old WAL
segments are removed at the end of the checkpoint.

The patch introduces a new in-memory state for slots: last_saved_restart_lsn,
which is used to calculate the oldest LSN for removing WAL segments. This
state is updated every time with the current restart_lsn at the moment when
the slot is saved to disk.

This fix changes the shared memory layout.  It's applied to HEAD only because
we don't have to preserve ABI compatibility during the beta stage.  Another
fix that doesn't affect the ABI is committed to back branches.

Discussion: https://postgr.es/m/1d12d2-67235980-35-19a406a0%4063439497
Author: Vitaly Davydov <v.davydov@postgrespro.ru>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
2025-06-14 03:36:04 +03:00
Nathan Bossart
a31767fc09 Use NULL instead of 0 for pointer arguments.
Commit 5fe08c006c fixed this for calls to dshash_create().  This
commit fixes calls to dshash_attach() and dsa_create_in_place().

Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aECi_gSD9JnVWQ8T%40nathan
2025-06-06 12:08:17 -05:00
Michael Paquier
5231ed8262 Use replay LSN as target for cascading logical WAL senders
A cascading WAL sender doing logical decoding (as known as doing its
work on a standby) has been using as flush LSN the value returned by
GetStandbyFlushRecPtr() (last position safely flushed to disk).  This is
incorrect as such processes are only able to decode changes up to the
LSN that has been replayed by the startup process.

This commit changes cascading logical WAL senders to use the replay LSN,
as returned by GetXLogReplayRecPtr().  This distinction is important
particularly during shutdown, when WAL senders need to send any
remaining available data to their clients, switching WAL senders to a
caught-up state.  Using the latest flush LSN rather than the replay LSN
could cause the WAL senders to be stuck in an infinite loop preventing
them to shut down, as the startup process does not run when WAL senders
attempt to catch up, so they could keep waiting for work that would
never happen.

Backpatch down to v16, where logical decoding on standbys has been
introduced.

Author: Alexey Makhmutov <a.makhmutov@postgrespro.ru>
Reviewed-by: Ajin Cherian <itsajin@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/52138028-7246-421c-9161-4fa108b88070@postgrespro.ru
Backpatch-through: 16
2025-06-02 12:03:59 +09:00
Nathan Bossart
706054b11b Ensure we have a snapshot when updating various system catalogs.
A few places that access system catalogs don't set up an active
snapshot before potentially accessing their TOAST tables.  To fix,
push an active snapshot just before each section of code that might
require accessing one of these TOAST tables, and pop it shortly
afterwards.  While at it, this commit adds some rather strict
assertions in an attempt to prevent such issues in the future.

Commit 16bf24e0e4 recently removed pg_replication_origin's TOAST
table in order to fix the same problem for that catalog.  On the
back-branches, those bugs are left in place.  We cannot easily
remove a catalog's TOAST table on released major versions, and only
replication origins with extremely long names are affected.  Given
the low severity of the issue, fixing older versions doesn't seem
worth the trouble of significantly modifying the patch.

Also, on v13 and v14, the aforementioned strict assertions have
been omitted because commit 2776922201, which added
HaveRegisteredOrActiveSnapshot(), was not back-patched.  While we
could probably back-patch it now, I've opted against it because it
seems unlikely that new TOAST snapshot issues will be introduced in
the oldest supported versions.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/18127-fe54b6a667f29658%40postgresql.org
Discussion: https://postgr.es/m/18309-c0bf914950c46692%40postgresql.org
Discussion: https://postgr.es/m/ZvMSUPOqUU-VNADN%40nathan
Backpatch-through: 13
2025-05-30 15:17:28 -05:00
Amit Kapila
ad5eaf390c Don't retreat slot's confirmed_flush LSN.
Prevent moving the confirmed_flush backwards, as this could lead to data
duplication issues caused by replicating already replicated changes.

This can happen when a client acknowledges an LSN it doesn't have to do
anything for, and thus didn't store persistently. After a restart, the
client can send the prior LSN that it stored persistently as an
acknowledgement, but we need to ignore such an LSN to avoid retreating
confirm_flush LSN.

Diagnosed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Author: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Tested-by: Nisha Moond <nisha.moond412@gmail.com>
Backpatch-through: 13
Discussion: https://postgr.es/m/CAJpy0uDZ29P=BYB1JDWMCh-6wXaNqMwG1u1mB4=10Ly0x7HhwQ@mail.gmail.com
Discussion: https://postgr.es/m/OS0PR01MB57164AB5716AF2E477D53F6F9489A@OS0PR01MB5716.jpnprd01.prod.outlook.com
2025-05-19 12:13:06 +05:30
Nathan Bossart
16bf24e0e4 Remove pg_replication_origin's TOAST table.
A few places that access this catalog don't set up an active
snapshot before potentially accessing its TOAST table.  However,
roname (the replication origin name) is the only varlena column, so
this is only a problem if the name requires out-of-line storage.
This commit removes its TOAST table to avoid needing to set up a
snapshot.  It also places a limit on replication origin names so
that attempts to set long names will fail with a more user-friendly
error.  Those chosen limit of 512 bytes should be sufficient to
avoid "row is too big" errors independent of BLCKSZ, but it should
also be lenient enough for all reasonable use-cases.

Bumps catversion.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/ZvMSUPOqUU-VNADN%40nathan
2025-05-07 14:47:36 -05:00
Amit Kapila
3ff2a1f0c9 Fix assertion failure during decoding from synced slots.
The slot synchronization skips updating the confirmed_flush LSN of the
local slot if the local slot has a newer catalog_xmin or restart_lsn, but
still allows updating the two_phase and two_phase_at fields of the slot.
This opens up a window for the prepared transactions between old
confirmed_flush LSN and two_phase_at to unexpectedly get decoded and sent
to the downstream after promotion. Then, while decoding the commit
prepared the assert will fail, which expects that the prepare hasn't been
sent to the downstream.

The fix is to skip updating the other slot fields when we are skipping to
update the confirmed_flush LSN of the slot.

We didn't backpatch this commit as two_phase_at was not synced in back
branches, which means prepared transactions won't be unexpectedly sent to
downstream.

We discovered this problem while analyzing BF failure reported in the
discussion link.

Reliably reproducing this issue without a debugger is difficult. Given
its rarity, adding specific injection point to test it doesn't seem
worthwhile, so we won't be adding a dedicated test case.

Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/OS0PR01MB5716B44052000EB91EFAE60E94BC2@OS0PR01MB5716.jpnprd01.prod.outlook.com
2025-04-29 12:52:05 +05:30
Amit Kapila
aaf9e95e87 Fix xmin advancement during fast_forward decoding.
During logical decoding, we advance catalog_xmin of logical too early in
fast_forward mode, resulting in required catalog data being removed by
vacuum. This mode is normally used to advance the slot without processing
the changes, but we still can't let the slot's xmin to advance to an
incorrect value.

Commit f49a80c481 fixed a similar issue where the logical slot's
catalog_xmin was getting advanced prematurely during non-fast-forward
mode. During xl_running_xacts processing, instead of directly advancing
the slot's xmin to the oldest running xid in the record, it allowed the
xmin to be held back for snapshots that can be used for
not-yet-replayed transactions, as those might consider older txns as
running too. However, it missed the fact that the same problem can happen
during fast_forward mode decoding, as we won't build a base snapshot in
that mode, and the future call to get_changes from the same slot can miss
seeing the required catalog changes leading to incorrect reslts.

This commit allows building the base snapshot even in fast_forward mode to
prevent the early advancement of xmin.

Reported-by: Amit Kapila <amit.kapila16@gmail.com>
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 13
Discussion: https://postgr.es/m/CAA4eK1LqWncUOqKijiafe+Ypt1gQAQRjctKLMY953J79xDBgAg@mail.gmail.com
Discussion: https://postgr.es/m/OS0PR01MB57163087F86621D44D9A72BF94BB2@OS0PR01MB5716.jpnprd01.prod.outlook.com
2025-04-28 11:35:54 +05:30
Amit Kapila
0e091ce409 Fix an oversight in 3f28b2fcac.
Commit 3f28b2fcac tried to ensure that the replication origin shouldn't be
advanced in case of an ERROR in the apply worker, so that it can request
the same data again after restart. However, it is possible that an ERROR
was caught and handled by a (say PL/pgSQL) function, and  the apply worker
continues to apply further changes, in which case, we shouldn't reset the
replication origin.

Ensure to reset the origin only when the apply worker exits after an
ERROR.

Commit 3f28b2fcac added new function geterrlevel, which we removed in HEAD
as part of this commit, but kept it in backbranches to avoid breaking any
applications. A separate case can be made to have such a function even for
HEAD.

Reported-by: Shawn McCoy <shawn.the.mccoy@gmail.com>
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 16, where it was introduced
Discussion: https://postgr.es/m/CALsgZNCGARa2mcYNVTSj9uoPcJo-tPuWUGECReKpNgTpo31_Pw@mail.gmail.com
2025-04-23 11:08:24 +05:30
Tom Lane
80b727eb9d Use the same cmd_context throughout a walsender's lifetime.
exec_replication_command created a cmd_context to work in and
then deleted it on exit.  This is pretty dangerous because
some replication commands start/finish transactions.  In the
wake of commit 1afe31f03, that could lead to re-selecting a
CurrentMemoryContext that's already been deleted, leading to
hilarity such as a memory context that is its own parent.

To fix, let's make the cmd_context persist across
exec_replication_command calls; instead of deleting it, we'll just
reset it each time.  In this way it retains the same identity and
there's no problem if transaction abort restores it as the working
context.  It probably even saves a few microseconds to do this.

This fix also ensures that exec_replication_command returns to the
caller (PostgresMain) with the same context active that had been
when it was called (probably MessageContext).  The previous
coding could get that wrong too.

Reported-by: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAO6_XqoJA7-_G6t7Uqe5nWF3nj+QBGn4F6Ptp=rUGDr0zo+KvA@mail.gmail.com
2025-04-21 12:09:36 -04:00
Michael Paquier
2e57790836 Fix race with synchronous_standby_names at startup
synchronous_standby_names cannot be reloaded safely by backends, and the
checkpointer is in charge of updating a state in shared memory if the
GUC is enabled in WalSndCtl, to let the backends know if they should
wait or not for a given LSN.  This provides a strict control on the
timing of the waiting queues if the GUC is enabled or disabled, then
reloaded.  The checkpointer is also in charge of waking up the backends
that could be waiting for a LSN when the GUC is disabled.

This logic had a race condition at startup, where it would be possible
for backends to not wait for a LSN even if synchronous_standby_names is
enabled.  This would cause visibility issues with transactions that we
should be waiting for but they were not.  The problem lasts until the
checkpointer does its initial update of the shared memory state when it
loads synchronous_standby_names.

In order to take care of this problem, the shared memory state in
WalSndCtl is extended to detect if it has been initialized by the
checkpointer, and not only check if synchronous_standby_names is
defined.  In WalSndCtlData, sync_standbys_defined is renamed to
sync_standbys_status, a bits8 able to know about two states:
- If the shared memory state has been initialized.  This flag is set by
the checkpointer at startup once, and never removed.
- If synchronous_standby_names is known as defined in the shared memory
state.  This is the same as the previous sync_standbys_defined in
WalSndCtl.

This method gives a way for backends to decide what they should do until
the shared memory area is initialized, and they now ultimately fall back
to a check on the GUC value in this case, which is the best thing that
can be done.

Fortunately, SyncRepUpdateSyncStandbysDefined() is called immediately by
the checkpointer when this process starts, so the window is very narrow.
It is possible to enlarge the problematic window by making the
checkpointer wait at the beginning of SyncRepUpdateSyncStandbysDefined()
with a hardcoded sleep for example, and doing so has showed that a 2PC
visibility test is indeed failing.  On machines slow enough, this bug
would cause spurious failures.

In 17~, we have looked at the possibility of adding an injection point
to have a reproducible test, but as the problematic window happens at
early startup, we would need to invent a way to make an injection point
optionally persistent across restarts when attached, something that
would be fine for this case as it would involve the checkpointer.  This
issue is quite old, and can be reproduced on all the stable branches.

Author: Melnikov Maksim <m.melnikov@postgrespro.ru>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/163fcbec-900b-4b07-beaa-d2ead8634bec@postgrespro.ru
Backpatch-through: 13
2025-04-11 10:00:21 +09:00
David Rowley
928394b664 Improve various new-to-v18 appendStringInfo calls
Similar to 8461424fd, here we adjust a few new locations which were not
using the most suitable appendStringInfo* function for the intended
purpose.

Author: David Rowley <drowleyml@gmail.com
Discussion: https://postgr.es/m/CAApHDvqJnNjueb=Eoj8K+8n0g7nj_AcPWSiCj5RNV4fDejAfqA@mail.gmail.com
2025-04-11 10:07:22 +12:00
Amit Kapila
4909b38af0 Fix data loss in logical replication.
Data loss can happen when the DDLs like ALTER PUBLICATION ... ADD TABLE ...
or ALTER TYPE ...  that don't take a strong lock on table happens
concurrently to DMLs on the tables involved in the DDL. This happens
because logical decoding doesn't distribute invalidations to concurrent
transactions and those transactions use stale cache data to decode the
changes. The problem becomes bigger because we keep using the stale cache
even after those in-progress transactions are finished and skip the
changes required to be sent to the client.

This commit fixes the issue by distributing invalidation messages from
catalog-modifying transactions to all concurrent in-progress transactions.
This allows the necessary rebuild of the catalog cache when decoding new
changes after concurrent DDL.

We observed performance regression primarily during frequent execution of
*publication DDL* statements that modify the published tables. The
regression is minor or nearly nonexistent for DDLs that do not affect the
published tables or occur infrequently, making this a worthwhile cost to
resolve a longstanding data loss issue.

An alternative approach considered was to take a strong lock on each
affected table during publication modification. However, this would only
address issues related to publication DDLs (but not the ALTER TYPE ...)
and require locking every relation in the database for publications
created as FOR ALL TABLES, which is impractical.

The bug exists in all supported branches, but we are backpatching till 14.
The fix for 13 requires somewhat bigger changes than this fix, so the fix
for that branch is still under discussion.

Reported-by: hubert depesz lubaczewski <depesz@depesz.com>
Reported-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Tested-by: Benoit Lobréau <benoit.lobreau@dalibo.com>
Backpatch-through: 14
Discussion: https://postgr.es/m/de52b282-1166-1180-45a2-8d8917ca74c6@enterprisedb.com
Discussion: https://postgr.es/m/CAD21AoAenVqiMjpN-PvGHL1N9DWnHSq673bfgr6phmBUzx=kLQ@mail.gmail.com
2025-04-10 13:14:40 +05:30
Amit Kapila
12eece5fd5 Fix uninitialized index information access during apply.
The issue happens when building conflict information during apply of
INSERT or UPDATE operations that violate unique constraints on leaf
partitions.

The problem was introduced in commit 9ff68679b5, which removed the
redundant calls to ExecOpenIndices/ExecCloseIndices. The previous code was
relying on the redundant ExecOpenIndices call in
apply_handle_tuple_routing() to build the index information required for
unique key conflict detection.

The fix is to delay building the index information until a conflict is
detected instead of relying on ExecOpenIndices to do the same. The
additional benefit of this approach is that it avoids building index
information when there is no conflict.

Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by:Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/TYAPR01MB57244ADA33DDA57119B9D26494A62@TYAPR01MB5724.jpnprd01.prod.outlook.com
2025-04-08 15:35:42 +05:30
Michael Paquier
039549d70f Flush the IO statistics of active WAL senders more frequently
WAL senders do not flush their statistics until they exit, limiting the
monitoring possible for live processes.  This is penalizing when WAL
senders are running for a long time, like in streaming or logical
replication setups, because it is not possible to know the amount of IO
they generate while running.

This commit makes WAL senders more aggressive with their statistics
flush, using an internal of 1 second, with the flush timing calculated
based on the existing GetCurrentTimestamp() done before the sleeps done
to wait for some activity.  Note that the sleep done for logical and
physical WAL senders happens in two different code paths, so the stats
flushes need to happen in these two places.

One test is added for the physical WAL sender case, and one for the
logical WAL sender case.  This can be done in a stable fashion by
relying on the WAL generated by the TAP tests in combination with a
stats reset while a server is running, but only on HEAD as WAL data has
been added to pg_stat_io in a051e71e28.

This issue exists since a9c70b46db and the introduction of pg_stat_io,
so backpatch down to v16.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/Z73IsKBceoVd4t55@ip-10-97-1-34.eu-west-3.compute.internal
Backpatch-through: 16
2025-04-08 07:57:19 +09:00
Heikki Linnakangas
6e9c81836e Use standard die() signal handler in walreceiver
This gets rid of the bespoken ProcessWalRcvInterrupts() function,
which lets walreceiver terminate at any CHECK_FOR_INTERRUPTS() call.
And it's less code anyway.

We can now use the standard libpqsrv_connect_params() libpq wrapper
from libpq-be-fe-helpers.h, removing more code. We attempted to do
that earlier already in commit 728f86fec6, but that was reverted
because it didn't call ProcessWalRcvInterrupts() and therefore didn't
react to shutdown requests. Now that ProcessWalRcvInterrupts() is
gone, it works. As stated in that commit, this also leads to
libpqwalreceiver reserving file descriptors for libpq conncetions,
which is nice.

Author: Andres Freund <andres@anarazel.de> (the earlier commit)
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Yura Sokolov <y.sokolov@postgrespro.ru>
2025-04-04 12:38:32 +03:00
Masahiko Sawada
fd09c1316b Restrict copying of invalidated replication slots.
Previously, invalidated logical and physical replication slots could
be copied using the pg_copy_logical_replication_slot and
pg_copy_physical_replication_slot functions. Replication slots that
were invalidated for reasons other than WAL removal retained their
restart_lsn. This meant that a new slot copied from an invalidated
slot could have a restart_lsn pointing to a WAL segment that might
have already been removed.

This commit restricts the copying of invalidated replication slots.

Backpatch to v16, where slots could retain their restart_lsn when
invalidated for reasons other than WAL removal.

For v15 and earlier, this check is not required since slots can only
be invalidated due to WAL removal, and existing checks already handle
this issue.

Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CANhcyEU65aH0VYnLiu%3DOhNNxhnhNhwcXBeT-jvRe1OiJTo_Ayg%40mail.gmail.com
Backpatch-through: 16
2025-04-03 10:30:00 -07:00
Amit Kapila
4868c96bc8 Fix slot synchronization for two_phase enabled slots.
The issue is that the transactions prepared before two-phase decoding is
enabled can fail to replicate to the subscriber after being committed on a
promoted standby following a failover. This is because the two_phase_at
field of a slot, which tracks the LSN from which two-phase decoding
starts, is not synchronized to standby servers. Without two_phase_at, the
logical decoding might incorrectly identify prepared transaction as
already replicated to the subscriber after promotion of standby server,
causing them to be skipped.

To address the issue on HEAD, the two_phase_at field of the slot is
exposed by the pg_replication_slots view and allows the slot
synchronization to copy this value to the corresponding synced slot on the
standby server.

This bug is likely to occur if the user toggles the two_phase option to
true after initial slot creation. Given that altering the two_phase option
of a replication slot is not allowed in PostgreSQL 17, this bug is less
likely to occur. We can't change the view/function definition in
backbranch so we can't push the same fix but we are brainstorming an
appropriate solution for PG17.

Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/TYAPR01MB5724CC7C288535BBCEEE65DA94A72@TYAPR01MB5724.jpnprd01.prod.outlook.com
2025-04-03 12:26:54 +05:30
Peter Eisentraut
a0ed19e0a9 Use PRI?64 instead of "ll?" in format strings (continued).
Continuation of work started in commit 15a79c73, after initial trial.

Author: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/b936d2fb-590d-49c3-a615-92c3a88c6c19%40eisentraut.org
2025-03-29 10:43:57 +01:00
Daniel Gustafsson
058b5152f0 Fix guc_malloc calls for consistency and OOM checks
check_createrole_self_grant and check_synchronized_standby_slots
were allocating memory on a LOG elevel without checking if the
allocation succeeded or not, which would have led to a segfault
on allocation failure.

On top of that, a number of callsites were using the ERROR level,
relying on erroring out rather than returning false to allow the
GUC machinery handle it gracefully.  Other callsites used WARNING
instead of LOG.  While neither being not wrong, this changes all
check_ functions do it consistently with LOG.

init_custom_variable gets a promoted elevel to FATAL to keep
the guc_malloc error handling in line with the rest of the
error handling in that function which already call FATAL.  If
we encounter an OOM in this callsite there is no graceful
handling to be had, better to error out hard.

Backpatch the fix to check_createrole_self_grant down to v16
and the fix to check_synchronized_standby_slots down to v17
where they were introduced.

Author: Daniel Gustafsson <daniel@yesql.se>
Reported-by: Nikita <pm91.arapov@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Bug: #18845
Discussion: https://postgr.es/m/18845-582c6e10247377ec@postgresql.org
Backpatch-through: 16
2025-03-27 22:57:34 +01:00
Tom Lane
55527368bd Use PG_MODULE_MAGIC_EXT in our installable shared libraries.
It seems potentially useful to label our shared libraries with version
information, now that a facility exists for retrieving that.  This
patch labels them with the PG_VERSION string.  There was some
discussion about using semantic versioning conventions, but that
doesn't seem terribly helpful for modules with no SQL-level presence;
and for those that do have SQL objects, we typically expect them
to support multiple revisions of the SQL definitions, so it'd still
not be very helpful.

I did not label any of src/test/modules/.  It seems unnecessary since
we don't install those, and besides there ought to be someplace that
still provides test coverage for the original PG_MODULE_MAGIC macro.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/dd4d1b59-d0fe-49d5-b28f-1e463b68fa32@gmail.com
2025-03-26 11:11:02 -04:00
Amit Kapila
b87ced747d Fix an oversight in 3abe9dc188.
Forgot to update the comment atop one of the functions.

Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/OSCPR01MB1496623BE1125B44614494E7AF5A72@OSCPR01MB14966.jpnprd01.prod.outlook.com
2025-03-25 09:26:23 +05:30
Amit Kapila
73eba5004a Detect and Log multiple_unique_conflicts type conflict.
Introduce a new conflict type, multiple_unique_conflicts, to handle cases
where an incoming row during logical replication violates multiple UNIQUE
constraints.

Previously, the apply worker detected and reported only the first
encountered key conflict (insert_exists/update_exists), causing repeated
failures as each constraint violation needs to be handled one by one
making the process slow and error-prone.

With this patch, the apply worker checks all unique constraints upfront
once the first key conflict is detected and reports
multiple_unique_conflicts if multiple violations exist. This allows users
to resolve all conflicts at once by deleting all conflicting tuples rather
than dealing with them individually or skipping the transaction.

In the future, this will also allow us to specify different resolution
handlers for such a conflict type.

Add the stats for this conflict type in pg_stat_subscription_stats.

Author: Nisha Moond <nisha.moond412@gmail.com>
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Discussion: https://postgr.es/m/CABdArM7FW-_dnthGkg2s0fy1HhUB8C3ELA0gZX1kkbs1ZZoV3Q@mail.gmail.com
2025-03-24 12:30:44 +05:30
Masahiko Sawada
04ff636cbc Add GUC option to control maximum active replication origins.
This commit introduces a new GUC option max_active_replication_origins
to control the maximum number of active replication
origins. Previously, this was controlled by
'max_replication_slots'. Having a separate GUC option provides better
flexibility for setting up subscribers, as they may not require
replication slots (for cascading replication) but always require
replication origins.

Author: Euler Taveira <euler@eulerto.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Discussion: https://postgr.es/m/b81db436-8262-4575-b7c4-bc0c1551000b@app.fastmail.com
2025-03-21 12:20:15 -07:00
Andres Freund
02844012b3 aio: Basic subsystem initialization
This commit just does the minimal wiring up of the AIO subsystem, added in the
next commit, to the rest of the system. The next commit contains more details
about motivation and architecture.

This commit is kept separate to make it easier to review, separating the
changes across the tree, from the implementation of the new subsystem.

We discussed squashing this commit with the main commit before merging AIO,
but there has been a mild preference for keeping it separate.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
2025-03-17 18:51:33 -04:00
Amit Kapila
7c99dc587a Fix ALTER SUBSCRIPTION ... SET PUBLICATION ... command.
The problem is that ALTER SUBSCRIPTION ... SET PUBLICATION ... will lead
to restarting of apply worker and after the restart, the apply worker will
use the existing slot and replication origin corresponding to the
subscription. Now, it is possible that before the restart, the origin has
not been updated, and the WAL start location points to a location before
where PUBLICATION pointed to by SET PUBLICATION doesn't exist, and that
can lead to an error like: "ERROR:  publication "pub1" does not exist".
Once this error occurs, apply worker will never be able to proceed and
will always return the same error.

We decided to skip loading the publication if the publication does not
exist. The publication is loaded later and updates the relation entry when
the publication gets created.

We decided not to backpatch this as this is a behaviour change, and we don't
see field reports. This problem has been found by intermittent buildfarm
failures.

Author: vignesh C <vignesh21@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/flat/CALDaNm0-n8FGAorM%2BbTxkzn%2BAOUyx5%3DL_XmnvOP6T24%2B-NcBKg%40mail.gmail.com
Discussion: https://postgr.es/m/CAA4eK1+T-ETXeRM4DHWzGxBpKafLCp__5bPA_QZfFQp7-0wj4Q@mail.gmail.com
2025-03-14 08:57:40 +05:30
Peter Eisentraut
3691edfab9 pg_noreturn to replace pg_attribute_noreturn()
We want to support a "noreturn" decoration on more compilers besides
just GCC-compatible ones, but for that we need to move the decoration
in front of the function declaration instead of either behind it or
wherever, which is the current style afforded by GCC-style attributes.
Also rename the macro to "pg_noreturn" to be similar to the C11
standard "noreturn".

pg_noreturn is now supported on all compilers that support C11 (using
_Noreturn), as well as GCC-compatible ones (using __attribute__, as
before), as well as MSVC (using __declspec).  (When PostgreSQL
requires C11, the latter two variants can be dropped.)

Now, all supported compilers effectively support pg_noreturn, so the
extra code for !HAVE_PG_ATTRIBUTE_NORETURN can be dropped.

This also fixes a possible problem if third-party code includes
stdnoreturn.h, because then the current definition of

    #define pg_attribute_noreturn() __attribute__((noreturn))

would cause an error.

Note that the C standard does not support a noreturn attribute on
function pointer types.  So we have to drop these here.  There are
only two instances at this time, so it's not a big loss.  In one case,
we can make up for it by adding the pg_noreturn to a wrapper function
and adding a pg_unreachable(), in the other case, the latter was
already done before.

Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/pxr5b3z7jmkpenssra5zroxi7qzzp6eswuggokw64axmdixpnk@zbwxuq7gbbcw
2025-03-13 12:37:26 +01:00
Amit Kapila
3abe9dc188 Avoid invalidating all RelationSyncCache entries on publication rename.
On Publication rename, we need to only invalidate the RelationSyncCache
entries corresponding to relations that are part of the publication being
renamed.

As part of this patch, we introduce a new invalidation message to
invalidate the cache maintained by the logical decoding output plugin. We
can't use existing relcache invalidation for this purpose, as that would
unnecessarily cause relcache invalidations in other backends.

This will improve performance by building fewer relation cache entries
during logical replication.

Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/OSCPR01MB14966C09AA201EFFA706576A7F5C92@OSCPR01MB14966.jpnprd01.prod.outlook.com
2025-03-13 09:16:33 +05:30
Heikki Linnakangas
ac4494646d Rename alloc/free functions in reorderbuffer.c
There used to be bespoken pools for these structs to reduce the
palloc/pfree overhead, but that was ripped out a long time ago and
replaced with the generic, cheaper generational memory allocator
(commit a4ccc1cef5). The Get/Return terminology made sense with the
pools, as you "got" an object from the pool and "returned" it later,
but now it just looks weird. Rename to Alloc/Free.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/c9e43d2d-8e83-444f-b111-430377368989@iki.fi
2025-03-12 22:03:39 +02:00
Masahiko Sawada
bd65cb3cd4 pg_logicalinspect: Fix possible crash when passing a directory path.
Previously, pg_logicalinspect functions were too trusting of their
input and blindly passed it to SnapBuildRestoreSnapshot(). If the
input pointed to a directory, the server could a PANIC error while
attempting to fsync_fname() with isdir=false on a directory.

This commit adds validation checks for input filenames and passes the
LSN extracted from the filename to SnapBuildRestoreSnapshot() instead
of the filename itself. It also adds regression tests for various
input patterns and permission checks.

Bug: #18828
Reported-by: Robins Tharakan <tharakan@gmail.com>
Co-authored-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Co-authored-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/18828-0f4701c635064211@postgresql.org
2025-03-11 09:56:40 -07:00
Heikki Linnakangas
03f8e9a7fe Fix incorrect assertion in libpqwalreceiver
Was supposed to check the length of the array, but was checking its
size in bytes.

Author: Jacob Brazeal <jacob.brazeal@gmail.com>
Discussion: https://www.postgresql.org/message-id/CA%2BCOZaA_9afJxj9ZuO73U5P7WXP%2BZM9NGnZvTDCmBFz0FGP%2BwA@mail.gmail.com
2025-03-09 20:40:45 +02:00
Amit Kapila
588acf6d0e Avoid invalidating all RelationSyncCache entries on publication change.
On change of publication via ALTER PUBLICATION ... SET/ADD/DROP commands,
we were invalidating all the relations present in relation sync cache
maintained by pgoutput. We need to invalidate only the relation entries
that are changed as part of publication DDL.

We have ensured that the publication DDL execution generated the
invalidations required to invalidate impacted relation sync entries in
RelationSyncCache.

This improves the performance by avoiding building the cache entries for
the cases where a publication has many tables but only one of them is
dropped.

Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/OSCPR01MB14966C09AA201EFFA706576A7F5C92@OSCPR01MB14966.jpnprd01.prod.outlook.com
2025-03-06 14:19:38 +05:30