Commit 883a95646a introduced overflow entries in the replication lag tracker
to fix an issue where lag columns in pg_stat_replication could stall when
the replay LSN stopped advancing.
This commit adds comments clarifying the purpose and behavior of overflow
entries to improve code readability and understanding.
Since commit 883a95646a was recently applied and backpatched to all
supported branches, this follow-up commit is also backpatched accordingly.
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CABPTF7VxqQA_DePxyZ7Y8V+ErYyXkmwJ1P6NC+YC+cvxMipWKw@mail.gmail.com
Backpatch-through: 13
We had several places that used cast-to-unsigned-int as a substitute
for properly checking for overflow. Coverity has started objecting
to that practice as likely introducing Y2038 bugs. An extra
comparison is surely not much compared to the cost of time(NULL), nor
is this coding practice particularly readable. Let's do it honestly,
with explicit logic covering the cases of first-time-through and
clock-went-backwards.
I don't feel a need to back-patch though: our released versions
will be out of support long before 2038, and besides which I think
the code would accidentally work anyway for another 70 years or so.
Previously, if primary_slot_name was set to an invalid slot name and
the configuration file was reloaded, both the postmaster and all other
backend processes reported a WARNING. With many processes running,
this could produce a flood of duplicate messages. The problem was that
the GUC check hook for primary_slot_name reported errors at WARNING
level via ereport().
This commit changes the check hook to use GUC_check_errdetail() and
GUC_check_errhint() for error reporting. As with other GUC parameters,
this causes non-postmaster processes to log the message at DEBUG3,
so by default, only the postmaster's message appears in the log file.
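A sketch of the resulting hook shape (the validation test and messages
here are simplified; only the GUC_check_* calls reflect the change):

    static bool
    check_primary_slot_name(char **newval, void **extra, GucSource source)
    {
        if (*newval && !slot_name_is_valid(*newval))    /* hypothetical check */
        {
            GUC_check_errdetail("Replication slot name \"%s\" is invalid.",
                                *newval);
            GUC_check_errhint("Slot names may only contain lower case letters, "
                              "numbers, and the underscore character.");
            return false;   /* GUC machinery picks the report level */
        }
        return true;
    }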
Backpatch to all supported versions.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Chao Li <lic@highgo.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/CAHGQGwFud-cvthCTfusBfKHBS6Jj6kdAPTdLWKvP2qjUX6L_wA@mail.gmail.com
Backpatch-through: 13
Previously, when the replay LSN reported in feedback messages from a standby
stopped advancing, for example, due to a recovery conflict, the write_lag and
flush_lag columns in pg_stat_replication would initially update but then stop
progressing. This prevented users from correctly monitoring replication lag.
The problem occurred because when any LSN stopped updating, the lag tracker's
cyclic buffer became full (the write head reached the slowest read head).
In that state, the lag tracker could no longer compute round-trip lag values
correctly.
This commit fixes the issue by handling the slowest read entry (the one
causing the buffer to fill) as a separate overflow entry and freeing space
so the write and other read heads can continue advancing in the buffer.
As a result, write_lag and flush_lag now continue updating even if the reported
replay LSN remains stalled.
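Conceptually, the overflow handling works like this (field and variable
names are hypothetical, not the committed code):

    int new_write_head = (lt->write_head + 1) % LAG_TRACKER_BUFFER_SIZE;

    if (new_write_head == lt->read_heads[slowest])
    {
        /*
         * Buffer full: stash the slowest reader's sample in a dedicated
         * overflow entry and advance that read head, freeing a slot so
         * the write head and the faster read heads can keep moving.
         */
        lt->overflow[slowest] = lt->buffer[lt->read_heads[slowest]];
        lt->read_heads[slowest] =
            (lt->read_heads[slowest] + 1) % LAG_TRACKER_BUFFER_SIZE;
    }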
Backpatch to all supported versions.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Chao Li <lic@highgo.com>
Reviewed-by: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwGdGQ=1-X-71Caee-LREBUXSzyohkoQJd4yZZCMt24C0g@mail.gmail.com
Backpatch-through: 13
5983a4cff added CompactAttribute for storing commonly used fields from
FormData_pg_attribute. 5983a4cff didn't go to the trouble of adjusting
every location where we can use CompactAttribute rather than
FormData_pg_attribute, so here we change the remaining ones.
There are some locations where I've left the code using
FormData_pg_attribute. These are mostly in the ALTER TABLE code. Using
CompactAttribute here seems more risky as often the TupleDesc is being
changed and those changes may not have been flushed to the
CompactAttribute yet.
I've also left record_recv(), record_send(), record_cmp(), record_eq()
and record_image_eq() alone as it's not clear to me that accessing the
CompactAttribute is a win here due to the FormData_pg_attribute still
having to be accessed for most cases. Switching the relevant parts to
use CompactAttribute would result in having to access both for common
cases. Careful benchmarking may reveal that something can be done to
make this better, but in absence of that, the safer option is to leave
these alone.
In ReorderBufferToastReplace(), there was a check to skip attnums < 0
while looping over the TupleDesc. Doing this is redundant since
TupleDescs don't store < 0 attnums. Removing that code allows us to
move to using CompactAttribute.
The change in validateDomainCheckConstraint() just moves fetching the
FormData_pg_attribute into the ERROR path, which is cold due to calling
errstart_cold() and results in code being moved out of the common path.
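The typical mechanical change is, in sketch form:

    for (int i = 0; i < tupdesc->natts; i++)
    {
        /* Before: Form_pg_attribute att = TupleDescAttr(tupdesc, i); */
        CompactAttribute *att = TupleDescCompactAttr(tupdesc, i);

        if (att->attisdropped)
            continue;
        /* ... attlen/attbyval etc. are read from the compact struct ... */
    }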
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAApHDvrMy90o1Lgkt31F82tcSuwRFHq3vyGewSRN=-QuSEEvyQ@mail.gmail.com
An error happening while a slot's data is saved to disk in
SaveSlotToPath() could cause a state.tmp file (the temporary file holding
the slot state data, renamed to its permanent name at the end of the
function) to remain around after it has been created. This temporary
file is created with O_EXCL, meaning that if an existing state.tmp is
found, its creation would fail. This would prevent the slot data from
being saved, requiring manual intervention to remove state.tmp before
being able to save the slot again. Possible scenarios where this
temporary file could remain on disk are, for example, an ENOSPC case
(no disk space) while writing, syncing or renaming it. The bug reports
point to a write failure as the principal cause of the problems.
Using O_TRUNC was argued about back in 2019 as a potential solution to
discard any temporary file that could exist. This solution was rejected
as O_EXCL can also act as a safety measure when saving the slot state,
crash recovery offering cleanup guarantees post-crash. This commit uses
the alternative approach suggested by Andres Freund back in 2019. When
the temporary state file cannot be written, synced, closed or renamed
(note: not when created!), an unlink() is used to remove the temporary
state file while holding the in-progress I/O LWLock, so that any
follow-up attempt to save a slot's data does not choke on an existing
file left around by a previous failure.
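A simplified sketch of one such error path (the real function handles
more cases; buffer and path variables are illustrative):

    if (write(fd, buf, len) != len)
    {
        int     save_errno = errno;

        CloseTransientFile(fd);
        unlink(tmppath);    /* clean up while still holding io_in_progress_lock */
        LWLockRelease(&slot->io_in_progress_lock);
        errno = save_errno;
        ereport(elevel,
                (errcode_for_file_access(),
                 errmsg("could not write file \"%s\": %m", tmppath)));
        return;
    }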
This problem has been reported a few times across the years, going back
to 2019, but for some reason I have never come back to do something
about it and it has been forgotten. A recent report has reminded me
that this was still a problem.
Reported-by: Kevin K Biju <kevinkbiju@gmail.com>
Reported-by: Sergei Kornilov <sk@zsrv.org>
Reported-by: Grigory Smolkin <g.smolkin@postgrespro.ru>
Discussion: https://postgr.es/m/CAM45KeHa32soKL_G8Vk38CWvTBeOOXcsxAPAs7Jt7yPRf2mbVA@mail.gmail.com
Discussion: https://postgr.es/m/3559061693910326@qy4q4a6esb2lebnz.sas.yp-c.yandex.net
Discussion: https://postgr.es/m/08bbfab1-a61d-3750-fc18-4ab2c1aa7f09@postgrespro.ru
Backpatch-through: 13
While pgoutput caches relation synchronization information in
RelationSyncCache that resides in CacheMemoryContext, each entry's
information (such as row filter expressions and column lists) is
stored in the entry's private memory context (entry_cxt in
RelationSyncEntry), which is a descendant memory context of the
decoding context. If logical decoding invoked via SQL functions like
pg_logical_slot_get_binary_changes fails with an error, subsequent
logical decoding executions could access already-freed memory of the
entry's cache, resulting in a crash.
This change ensures that RelationSyncCache is cleaned up even in error
cases, by using a memory context reset callback function.
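A sketch of the mechanism (the context variable is hypothetical and the
callback body is elided):

    static void
    rel_sync_cache_cleanup_cb(void *arg)
    {
        /* invalidate RelationSyncCache entries tied to the dying context */
    }

    /* ... when setting up the context, register the callback ... */
    MemoryContextCallback *cb =
        MemoryContextAlloc(decoding_cxt, sizeof(MemoryContextCallback));

    cb->func = rel_sync_cache_cleanup_cb;
    cb->arg = NULL;
    MemoryContextRegisterResetCallback(decoding_cxt, cb);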
Backpatch to 15, where entry_cxt was introduced for column filtering
and row filtering.
While the backbranches v13 and v14 have a similar issue where
RelationSyncCache persists even after an error when pgoutput is used
via SQL API, we decided not to backport this fix. This decision was
made because v13 is approaching its final minor release, and we won't
have a chance to fix any new issues that might arise. Additionally,
since using pgoutput via SQL API is not a common use case, the risk
outweighs the benefit. If we receive bug reports, we can consider
backporting the fixes then.
Author: vignesh C <vignesh21@gmail.com>
Co-authored-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Discussion: https://postgr.es/m/CALDaNm0x-aCehgt8Bevs2cm=uhmwS28MvbYq1=s2Ekf0aDPkOA@mail.gmail.com
Backpatch-through: 15
Previously, the parallel apply worker used SIGINT to receive a graceful
shutdown signal from the leader apply worker. However, SIGINT is also used
by the LOCK_TIMEOUT handler to trigger a query-cancel interrupt. This
overlap caused the parallel apply worker to miss LOCK_TIMEOUT signals,
leading to incorrect behavior during lock wait/contention.
This patch resolves the conflict by switching the graceful shutdown signal
from SIGINT to SIGUSR2.
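In sketch form (the shutdown handler name is the one used elsewhere in
the tree; the leader-side variable is hypothetical):

    /* worker side: graceful shutdown now arrives on SIGUSR2 */
    pqsignal(SIGUSR2, SignalHandlerForShutdownRequest);     /* was SIGINT */

    /* leader side: request shutdown of the parallel apply worker */
    kill(worker_pid, SIGUSR2);                              /* was SIGINT */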
Reported-by: Zane Duffield <duffieldzane@gmail.com>
Diagnosed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 16, where it was introduced
Discussion: https://postgr.es/m/CACMiCkXyC4au74kvE2g6Y=mCEF8X6r-Ne_ty4r7qWkUjRE4+oQ@mail.gmail.com
Commit 216a784829 introduced parallel apply workers, allowing multiple
processes to share a replication origin. To support this,
replorigin_session_setup() was extended to accept a pid argument
identifying the process using the origin.
This commit exposes that capability through the SQL interface function
pg_replication_origin_session_setup() by adding an optional pid parameter.
This enables multiple processes to coordinate replication using the same
origin when using SQL-level replication functions.
This change allows non-builtin logical replication solutions to
implement parallel apply for large transactions.
Additionally, an existing internal error was made user-facing, as it can
now be triggered via the exposed SQL API.
Author: Doruk Yilmaz <doruk@mixrank.com>
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Discussion: https://postgr.es/m/CAMPB6wfe4zLjJL8jiZV5kjjpwBM2=rTRme0UCL7Ra4L8MTVdOg@mail.gmail.com
Discussion: https://postgr.es/m/CAE2gYzyTSNvHY1+iWUwykaLETSuAZsCWyryokjP6rG46ZvRgQA@mail.gmail.com
This commit resumes automatic retention of conflict-relevant data for a
subscription. Previously, retention would stop if the apply process failed
to advance its xmin (oldest_nonremovable_xid) within the configured
max_retention_duration, and the user needed to manually re-enable the
retain_dead_tuples option. With this change, retention will resume
automatically once the apply worker catches up and begins advancing its
xmin (oldest_nonremovable_xid) within the configured threshold.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/OS0PR01MB5716BE80DAEB0EE2A6A5D1F5949D2@OS0PR01MB5716.jpnprd01.prod.outlook.com
Users of logical decoding can encounter an unexpected change of
CurrentResourceOwner and CurrentMemoryContext. The problem is that,
unlike other call sites of RollbackAndReleaseCurrentSubTransaction(), in
reorderbuffer.c we fail to restore the original values of these global
variables after being clobbered by subtransaction abort. This patch
saves the values prior to the call and restores them eventually.
In addition, logical.c and logicalfuncs.c had a hack to restore resource
owner, presumably because of lack of this restore. Remove that.
Instead, because the test coverage here is not very consistent, add an
Assert() to ensure that the resowner is kept identical; this would make
it easy to detect other cases of bugs where we fail to restore the resowner
properly. This could be removed later.
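The pattern, in sketch form:

    ResourceOwner saved_resowner = CurrentResourceOwner;
    MemoryContext saved_cxt = CurrentMemoryContext;

    /* ... subtransaction work that ends up aborting ... */
    RollbackAndReleaseCurrentSubTransaction();

    /* restore the globals clobbered by the abort */
    CurrentResourceOwner = saved_resowner;
    MemoryContextSwitchTo(saved_cxt);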
This is arguably an old bug, but there appears to be no reason to
backpatch it and it's risky to do so, so refrain for now.
Author: Antonin Houska <ah@cybertec.at>
Reported-by: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Discussion: https://postgr.es/m/119497.1756892972@localhost
All the calls replaced by this commit use 4-byte integers for the
variables given as input to my_log2(). Hence, the guard against
too-large inputs does not really apply. Upper bounds are also enforced
at each call site:
- In nodeAgg.c, the number of partitions is limited by
HASHAGG_MAX_PARTITIONS.
- In nodeHash.c, ExecChooseHashTableSize() caps its maximum number of
buckets based on HashJoinTuple and palloc() allocation limit.
- In worker.c, the number of subxacts tracked by ApplySubXactData uses
uint32, making pg_ceil_log2_64() safe to use directly.
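The mechanical change, in sketch form:

    /* Before, via dynahash's helper: */
    nbuckets = 1 << my_log2(nelems);

    /* After, using the inline routines from pg_bitutils.h: */
    nbuckets = 1 << pg_ceil_log2_32(nelems);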
Several approaches have been discussed, like an integration with
thresholds in pg_bitutils.h, but that was found confusing. This uses
Dean's idea, which gives a simpler result than what I came up with to be
able to remove dynahash.h. dynahash.h will be removed in a follow-up
commit, removing some duplication with the ceil log2 routines.
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/CAEZATCUJPQD_7sC-wErak2CQGNa6bj2hY-mr8wsBki=kX7f2_A@mail.gmail.com
Address a potential SIGSEGV that may occur when the tablesync worker
attempts to locate a deleted row while applying changes. This situation
arises during conflict detection for update-deleted scenarios.
To prevent this crash, ensure that the operation is errored out early if
the leader apply worker is unavailable. Since the leader worker maintains
the necessary conflict detection metadata, proceeding without it serves no
purpose and risks reporting an incorrect conflict type.
In passing, improve a nearby comment.
Reported by Tom Lane as per Coverity
Author: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/334468.1757280992@sss.pgh.pa.us
This commit fixes three issues:
1) When a disabled subscription is created with retain_dead_tuples set to true,
the launcher is not woken up immediately, which may lead to delays in creating
the conflict detection slot.
Creating the conflict detection slot is essential even when the subscription is
not enabled. This ensures that dead tuples are retained, which is necessary for
accurately identifying the type of conflict during replication.
2) Conflict-related data was unnecessarily retained when the subscription does
not have a table.
3) Conflict-relevant data could be prematurely removed before applying
prepared transactions on the publisher that are in the commit critical section.
This issue occurred because the backend executing COMMIT PREPARED was not
accounted for during the computation of oldestXid in the commit phase on
the publisher. As a result, the subscriber could advance the conflict
slot's xmin without waiting for such COMMIT PREPARED transactions to
complete.
We fixed this issue by identifying prepared transactions that are in the
commit critical section during the computation of oldestXid in the commit
phase.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/OS9PR01MB16913DACB64E5721872AA5C02943BA@OS9PR01MB16913.jpnprd01.prod.outlook.com
Discussion: https://postgr.es/m/OS9PR01MB16913F67856B0DA2A909788129400A@OS9PR01MB16913.jpnprd01.prod.outlook.com
This commit introduces a new subscription parameter,
max_retention_duration, aimed at mitigating excessive accumulation of dead
tuples when retain_dead_tuples is enabled and the apply worker lags behind
the publisher.
When the time spent advancing a non-removable transaction ID exceeds the
max_retention_duration threshold, the apply worker will stop retaining
conflict detection information. In such cases, the conflict slot's xmin
will be set to InvalidTransactionId, provided that all apply workers
associated with the subscription (with retain_dead_tuples enabled) confirm
the retention duration has been exceeded.
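Conceptually, the per-worker check is (names hypothetical; the duration
is assumed to be tracked in milliseconds):

    if (max_retention_duration > 0 &&
        TimestampDifferenceExceeds(candidate_xid_time,
                                   GetCurrentTimestamp(),
                                   max_retention_duration))
    {
        /* threshold exceeded: stop retaining conflict detection info */
        stop_conflict_info_retention();
    }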
To ensure retention status persists across server restarts, a new column
subretentionactive has been added to the pg_subscription catalog. This
prevents unnecessary reactivation of retention logic after a restart.
The conflict detection slot will not be automatically re-initialized
unless a new subscription is created with retain_dead_tuples = true, or
the user manually re-enables retain_dead_tuples.
A future patch will introduce support for automatic slot re-initialization
once at least one apply worker confirms that the retention duration is
within the configured max_retention_duration.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/OS0PR01MB5716BE80DAEB0EE2A6A5D1F5949D2@OS0PR01MB5716.jpnprd01.prod.outlook.com
This has been done historically because of get_database_name (which
since commit cb98e6fb8f belongs in lsyscache.c/h, so let's move it
there) and get_database_oid (which is in the right place, but whose
declaration should appear in pg_database.h rather than dbcommands.h).
Clean this up.
Also, xlogreader.h and stringinfo.h are no longer needed by dbcommands.h
since commit f1fd515b39, so remove them.
Author: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/202508191031.5ipojyuaswzt@alvherre.pgsql
Some replication slot manipulations (logical decoding via SQL,
advancing) were failing an assertion when releasing a slot in
single-user mode, because active_pid was not set in the ReplicationSlot
when the slot was acquired.
ReplicationSlotAcquire() has some logic to be able to work with the
single-user mode. This commit sets ReplicationSlot->active_pid to
MyProcPid, to let the slot-related logic fall through, considering the
single process as the one holding the slot.
Some TAP tests are added for various replication slot functions with the
single-user mode, covering slot creation, drop, advancing, copy and
logical decoding with multiple slot types (temporary, physical vs
logical). These tests are skipped on Windows, as direct calls of
postgres --single would fail on permission grounds. There is no
platform-specific behavior that needs to be checked, so living with this
restriction should be fine. The CI is OK with that; now let's see what
the buildfarm says.
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Mutaamba Maasha <maasha@gmail.com>
Discussion: https://postgr.es/m/OSCPR01MB14966ED588A0328DAEBE8CB25F5FA2@OSCPR01MB14966.jpnprd01.prod.outlook.com
Backpatch-through: 13
The DROP SUBSCRIPTION command performs several operations: it stops the
subscription workers, removes subscription-related entries from system
catalogs, and deletes the replication slot on the publisher server.
Previously, this command acquired an AccessExclusiveLock on
pg_subscription before initiating these steps.
However, while holding this lock, the command attempts to connect to the
publisher to remove the replication slot. In cases where the connection is
made to a newly created database on the same server as the subscriber, the
cache-building process during connection tries to acquire an
AccessShareLock on pg_subscription, resulting in a self-deadlock.
To resolve this issue, we reduce the lock level on pg_subscription during
DROP SUBSCRIPTION from AccessExclusiveLock to RowExclusiveLock. Earlier,
the higher lock level was used to prevent the launcher from starting a new
worker during the drop operation, as a restarted worker could become
orphaned.
Now, instead of relying on a strict lock, we acquire an AccessShareLock on
the specific subscription being dropped and re-validate its existence
after acquiring the lock. If the subscription is no longer valid, the
worker exits gracefully. This approach avoids the deadlock while still
ensuring that orphan workers are not created.
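In sketch form, the worker-side re-validation looks like:

    LockSharedObject(SubscriptionRelationId, subid, 0, AccessShareLock);

    if (!SearchSysCacheExists1(SUBSCRIPTIONOID, ObjectIdGetDatum(subid)))
    {
        /* subscription was concurrently dropped: exit instead of
         * becoming an orphaned worker */
        proc_exit(0);
    }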
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 13
Discussion: https://postgr.es/m/18988-7312c868be2d467f@postgresql.org
Commit 2633dae2e4 standardized LSN formatting but mistakenly changed
the logical snapshot filename format in SnapBuildSnapshotExists() from
"%X-%X.snap" to "%08X-%08X.snap". Other code still used the original
"%X-%X.snap" format, causing the replication slot synchronization worker
to fail to find existing snapshot files and produce excessive log messages.
This commit restores the original "%X-%X.snap" format
in SnapBuildSnapshotExists() to resolve the issue.
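For reference, the restored format is simply (sketch):

    snprintf(path, sizeof(path), "%s/%X-%X.snap",
             "pg_logical/snapshots",
             (uint32) (lsn >> 32), (uint32) lsn);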
Author: Shveta Malik <shveta.malik@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwHuHPB-ucAk_Tq3uSs4Fdziu1Jp_AA_RD3m5Ycky7m48w@mail.gmail.com
Macros like VARDATA() and VARSIZE() should be thought of as taking
values of type pointer to struct varlena or some other related struct.
The way they are implemented, you can pass anything to them and they will
cast it as needed. But this is in principle incorrect. To fix, add the
required DatumGetPointer() calls. Or, in a couple of cases, remove
superfluous PointerGetDatum() calls.
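The shape of the change, in sketch form (with d a Datum holding a
varlena value):

    char *p;

    /* Before: compiled only because the macro casts anything. */
    p = VARDATA(d);

    /* After: the Datum-to-pointer conversion is explicit. */
    p = VARDATA(DatumGetPointer(d));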
It is planned in a subsequent patch to change macros like VARDATA()
and VARSIZE() to inline functions, which will enforce stricter typing.
This is in preparation for that.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/928ea48f-77c6-417b-897c-621ef16685a6%40eisentraut.org
Previously, enabling sync_replication_slots while wal_level was not set
to logical could cause the server to shut down. This was because
the postmaster performed a configuration check before launching
the slot synchronization worker and raised an ERROR if the settings
were incompatible. Since ERROR is treated as FATAL in the postmaster,
this resulted in the entire server shutting down unexpectedly.
This commit changes the postmaster to log that message at LOG level
instead of raising an ERROR, allowing the server to continue running
even with the misconfiguration.
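In sketch form, the postmaster-side launch check becomes:

    /* validate at LOG so a misconfiguration is reported without
     * escalating to FATAL and shutting the server down */
    if (!ValidateSlotSyncParams(LOG))
        return;     /* don't launch the slot sync worker */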
Back-patch to v17, where slot synchronization was introduced.
Reported-by: Hugo DUBOIS <hdubois@scaleway.com>
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Hugo DUBOIS <hdubois@scaleway.com>
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Discussion: https://postgr.es/m/CAH0PTU_pc3oHi__XESF9ZigCyzai1Mo3LsOdFyQA4aUDkm01RA@mail.gmail.com
Backpatch-through: 17
This enhancement builds upon the infrastructure introduced in commit
228c370868, which enables the preservation of deleted tuples and their
origin information on the subscriber. This capability is crucial for
handling concurrent transactions replicated from remote nodes.
The update introduces support for detecting update_deleted conflicts
during the application of update operations on the subscriber. When an
update operation fails to locate the target row, typically because it has
been concurrently deleted, we perform an additional table scan. This scan
uses the SnapshotAny mechanism, and is done only when the
retain_dead_tuples option is enabled for the relevant subscription.
The goal of this scan is to locate the most recently deleted tuple, one
matching the old column values from the remote update, that has not yet
been removed by VACUUM and is still visible according to our slot (i.e.,
its deletion is not older than the conflict detection slot's xmin). If
such a tuple is
found, the system reports an update_deleted conflict, including the origin
and transaction details responsible for the deletion.
This lays the groundwork for a more robust and accurate conflict
resolution process, preventing unexpected behavior by correctly
identifying cases where a remote update clashes with a deletion from
another origin.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/OS0PR01MB5716BE80DAEB0EE2A6A5D1F5949D2@OS0PR01MB5716.jpnprd01.prod.outlook.com
In the current system architecture, none of these are worth obsessing
over; most are once-per-process leaks. However, Valgrind complains
about all of them, and if we get to using threads rather than
processes for backend sessions, it will become more interesting to
avoid per-session leaks.
* Fix leaks in StartupXLOG() and ShutdownWalRecovery().
* Fix leakage of pq_mq_handle in a parallel worker.
While at it, move mq_putmessage's "Assert(pq_mq_handle != NULL)"
to someplace where it's not trivially useless.
* Fix leak in logicalrep_worker_detach().
* Don't leak the startup-packet buffer in ProcessStartupPacket().
* Fix leak in evtcache.c's DecodeTextArrayToBitmapset().
If the presented array is toasted, this neglected to free the
detoasted copy, which was then leaked into EventTriggerCacheContext.
(A sketch of the fix appears after this list.)
* I'm distressed by the amount of code that BuildEventTriggerCache
is willing to run while switched into a long-lived cache context.
Although the detoasted array is the only leak that Valgrind reports,
let's tighten things up while we're here. (DecodeTextArrayToBitmapset
is still run in the cache context, so doing this doesn't remove the
need for the detoast fix. But it reduces the surface area for other
leaks.)
* load_domaintype_info() intentionally leaked some intermediate cruft
into the long-lived DomainConstraintCache's memory context, reasoning
that the amount of leakage will typically not be much so it's not
worth doing a copyObject() of the final tree to avoid that. But
Valgrind knows nothing of engineering tradeoffs and complains anyway.
On the whole, the copyObject doesn't cost that much and this is surely
not a performance-critical code path, so let's do it the clean way.
* MarkGUCPrefixReserved didn't bother to clean up removed placeholder
GUCs at all, which shows up as a leak in one regression test.
It seems appropriate for it to do as much cleanup as
define_custom_variable does when replacing placeholders, so factor
that code out into a helper function. define_custom_variable's logic
was one brick shy of a load too: it forgot to free the separate
allocation for the placeholder's name.
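For the DecodeTextArrayToBitmapset() item, the usual free-the-detoasted-
copy idiom looks like (sketch; variable names illustrative):

    ArrayType  *arr = DatumGetArrayTypeP(arraydatum);   /* may detoast */

    /* ... decode the array elements into a Bitmapset ... */

    if ((Pointer) arr != DatumGetPointer(arraydatum))
        pfree(arr);             /* free the detoasted copy, if any */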
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
In ReorderBufferProcessTXN(), used to send the data of a transaction to
an output plugin, INSERT ON CONFLICT changes (INTERNAL_SPEC_INSERT) are
delayed until a confirmation record arrives (INTERNAL_SPEC_CONFIRM),
updating the change being processed.
8c58624df4 added an extra step after processing a change to update
the progress of the transaction, by calling the callback
update_progress_txn() based on the LSN stored in a change after a
threshold of CHANGES_THRESHOLD (100) is reached. This logic missed
the fact that for an INSERT ON CONFLICT change the data is freed once
processed, hence update_progress_txn() could be called pointing to an LSN
value that has already been freed. This could result in random crashes,
depending on the workload.
Per discussion, this issue is fixed by making update_progress_txn() reuse
the LSN of the change being processed, captured at the beginning of the
loop, meaning that for an INTERNAL_SPEC_CONFIRM change the progress is
updated using the LSN of the INTERNAL_SPEC_CONFIRM change, and not the LSN
from
its INTERNAL_SPEC_INSERT change. This is actually more correct, as we
want to update the progress to point to the INTERNAL_SPEC_CONFIRM
change.
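In sketch form (simplified from the loop in ReorderBufferProcessTXN()):

    XLogRecPtr  prev_lsn = change->lsn;     /* captured at loop top */

    /* ... process the change; an INTERNAL_SPEC_INSERT's data may be
     * freed once its INTERNAL_SPEC_CONFIRM is processed ... */

    if (++changes_count >= CHANGES_THRESHOLD)
    {
        rb->update_progress_txn(rb, txn, prev_lsn);
        changes_count = 0;
    }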
Masahiko Sawada has found a nice trick to reproduce the issue: hardcode
CHANGES_THRESHOLD at 1 and run test_decoding (test "ddl" being enough)
on an instance running valgrind. The bug has been analyzed by Ethan
Mertz, who also originally suggested the solution used in this patch.
Issue introduced by 8c58624df4, so backpatch down to v16.
Author: Ethan Mertz <ethan.mertz@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/aIsQqDZ7x4LAQ6u1@paquier.xyz
Backpatch-through: 16
A deadlock can occur when the DDL command and the apply worker acquire
catalog locks in different orders while dropping replication origins.
The issue is rare in PG16 and higher branches because, in most cases, the
tablesync worker performs the origin drop there, and its
locking sequence does not conflict with DDL operations.
This patch ensures consistent lock acquisition to prevent such deadlocks.
As per buildfarm.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Ajin Cherian <itsajin@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 14, where it was introduced
Discussion: https://postgr.es/m/bab95e12-6cc5-4ebb-80a8-3e41956aa297@gmail.com
Commit 473a575e05 purported to make this
function stash the error message in *syncrep_parse_result_p, but
it didn't actually do so.
As a result, an attempt to set synchronous_standby_names to any value
that does not parse resulted in a generic "parser failed." message
rather than anything more specific. This fixes that.
Discussion: http://postgr.es/m/CA+TgmoYF9wPNZ-Q_EMfib_espgHycY-eX__6Tzo2GpYpVXqCdQ@mail.gmail.com
Backpatch-through: 18
Remove a bunch of PG_TRY constructs, de-volatilize related
variables, remove some PQclear calls in error paths.
Aside from making the code simpler and shorter, this should
provide some marginal performance gains.
For ease of review, I did not re-indent code within the removed
PG_TRY constructs. That'll be done in a separate patch.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/2976982.1748049023@sss.pgh.pa.us
Logical replication requires reliable conflict detection to maintain data
consistency across nodes. To achieve this, we must prevent premature
removal of tuples deleted by other origins and their associated commit_ts
data by VACUUM, which could otherwise lead to incorrect conflict reporting
and resolution.
This patch introduces a mechanism to retain deleted tuples on the
subscriber during the application of concurrent transactions from remote
nodes. Retaining these tuples allows us to correctly ignore concurrent
updates to the same tuple. Without this, an UPDATE might be misinterpreted
as an INSERT during resolution due to the absence of the original tuple.
Additionally, we ensure that origin metadata is not prematurely removed by
vacuum freeze, which is essential for detecting update_origin_differs and
delete_origin_differs conflicts.
To support this, a new replication slot named pg_conflict_detection is
created and maintained by the launcher on the subscriber. Each apply
worker tracks its own non-removable transaction ID, which the launcher
aggregates to determine the appropriate xmin for the slot, thereby
retaining necessary tuples.
Conflict information retention (deleted tuples and commit_ts) can be
enabled per subscription via the retain_conflict_info option. This is
disabled by default to avoid unnecessary overhead for configurations that
do not require conflict resolution or logging.
During upgrades, if any subscription on the old cluster has
retain_conflict_info enabled, a conflict detection slot will be created to
protect relevant tuples from deletion when the new cluster starts.
This is a foundational work to correctly detect update_deleted conflict
which will be done in a follow-up patch.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/OS0PR01MB5716BE80DAEB0EE2A6A5D1F5949D2@OS0PR01MB5716.jpnprd01.prod.outlook.com
Commit 112faf1378 introduced a translation marker in libpq-be-fe-helpers.h,
but this caused build failures on some platforms, such as the one reported
by buildfarm member indri, due to linker issues with dblink. This is the same
problem previously addressed in commit 213c959a29.
To fix the issue, this commit removes the translation marker from
libpq-be-fe-helpers.h, following the approach used in 213c959a29.
It also removes the associated gettext_noop() calls added in commit
112faf1378, as they are no longer needed.
During review, a gettext_noop() call was also found in
contrib/basic_archive. Since contrib modules don't support translation,
this call has been removed as well.
Per buildfarm member indri.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/0e6299d9-608a-4ffa-aeb1-40cb8a99000b@oss.nttdata.com
Previously, NOTICE, WARNING, and similar messages received from remote
servers over replication, postgres_fdw, or dblink connections were printed
directly to stderr on the local server (e.g., the subscriber). As a result,
these messages lacked log prefixes (e.g., timestamp), making them harder
to trace and correlate with other log entries.
This commit addresses the issue by introducing a custom notice receiver
for replication, postgres_fdw, and dblink connections. These messages
are now logged via ereport(), ensuring they appear in the logs with proper
formatting and context, which improves clarity and aids in debugging.
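A sketch of such a receiver (simplified; the real helper may also
preserve the remote severity):

    static void
    notice_receiver(void *arg, const PGresult *res)
    {
        const char *msg = PQresultErrorField(res, PG_DIAG_MESSAGE_PRIMARY);

        ereport(LOG,
                (errmsg_internal("%s", msg ? msg : "(no message)")));
    }

    /* installed once the connection is established */
    PQsetNoticeReceiver(conn, notice_receiver, NULL);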
Author: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CALDaNm2xsHpWRtLm-VL_HJCsaE3+1Y_n-jDEAr3-suxVqc3xoQ@mail.gmail.com
The pgoutput plugin initializes optional parameters like "binary" with
default values at the start of processing. However, the "origin"
parameter was previously missed and left without explicit initialization.
Although the PGOutputData struct, which holds these settings,
is zero-initialized at allocation (resulting in the publish_no_origin field
for the "origin" parameter being false by default), this default was not
set explicitly, unlike other parameters.
This commit adds explicit initialization of the "origin" parameter to
ensure consistency and clarity in how defaults are handled.
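In sketch form, the parameter-parsing defaults now read (field names
from PGOutputData; surrounding code elided):

    /* defaults, overridden while parsing the option list */
    data->binary = false;
    data->messages = false;
    data->publish_no_origin = false;    /* "origin": now set explicitly */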
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Discussion: https://postgr.es/m/d2790f10-238d-4cb5-a743-d9d2a9dd900f@oss.nttdata.com
Previously, the check_hook functions for max_slot_wal_keep_size and
idle_replication_slot_timeout would incorrectly raise an ERROR for values
set in postgresql.conf during upgrade, even though those values were not
actively used in the upgrade process.
To prevent logical slot invalidation during upgrade, we used to set
special values for these GUCs. Now, instead of relying on those values, we
directly prevent WAL removal and logical slot invalidation caused by
max_slot_wal_keep_size and idle_replication_slot_timeout.
Note: PostgreSQL 17 does not include the idle_replication_slot_timeout
GUC, so related changes were not backported.
BUG #18979
Reported-by: jorsol <jorsol@gmail.com>
Author: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Backpatch-through: 17, where it was introduced
Discussion: https://postgr.es/m/219561.1751826409@sss.pgh.pa.us
Discussion: https://postgr.es/m/18979-a1b7fdbb7cd181c6@postgresql.org
Previously, the idle_replication_slot_timeout parameter used minutes
as its unit, based on the assumption that values would typically exceed
one minute in production environments. However, this caused unexpected
behavior: specifying a value below 30 seconds would round down to 0,
effectively disabling the timeout. This could be surprising to users.
To allow finer-grained control and avoid such confusion, this commit changes
the unit of idle_replication_slot_timeout to seconds. Larger values can
still be specified easily using standard time suffixes, for example,
'24h' for 24 hours.
Back-patch to v18 where idle_replication_slot_timeout was added.
Reported-by: Gunnar Morling <gunnar.morling@googlemail.com>
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CADGJaX_0+FTguWpNSpgVWYQP_7MhoO0D8=cp4XozSQgaZ40Odw@mail.gmail.com
Backpatch-through: 18
This commit standardizes the output format for LSNs to ensure consistent
representation across various tools and messages. Previously, LSNs were
inconsistently printed as `%X/%X` in some contexts, while others used
zero-padding. This often led to confusion when comparing LSN values.
To address this, the LSN format is now uniformly set to `%X/%08X`,
ensuring the lower 32-bit part is always zero-padded to eight
hexadecimal digits.
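For example, with the LSN_FORMAT_ARGS() macro from the tree (sketch):

    appendStringInfo(&buf, "%X/%08X", LSN_FORMAT_ARGS(lsn));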
Author: Japin Li <japinli@hotmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/ME0P300MB0445CA53CA0E4B8C1879AF84B641A@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM