mirror of https://github.com/postgres/postgres.git synced 2025-07-28 23:42:10 +03:00

Invalidate inactive replication slots.

This commit introduces the idle_replication_slot_timeout GUC, which allows
inactive slots to be invalidated at the time of checkpoint. Because
checkpoints happen at checkpoint_timeout intervals, there can be some lag
between when idle_replication_slot_timeout was exceeded and when the
slot invalidation is triggered at the next checkpoint. To avoid such lags,
users can force a checkpoint to promptly invalidate inactive slots.

Note that the idle timeout invalidation mechanism is not applicable for
slots that do not reserve WAL or for slots on the standby server that are
synced from the primary server (i.e., standby slots having 'synced' field
'true'). Synced slots are always considered to be inactive because they
don't perform logical decoding to produce changes.
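
The rules in the two paragraphs above can be sketched as a single check
that a checkpoint would run for each slot. This is a minimal illustration,
not the code added by this commit: the SlotSketch type, its fields, and
the function name are hypothetical, times are plain seconds rather than
PostgreSQL timestamps, and the strict "longer than" comparison is an
assumption taken from the wording of the documentation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a slot; not PostgreSQL's ReplicationSlot. */
typedef struct SlotSketch
{
    bool    reserves_wal;    /* slot holds back WAL */
    bool    synced;          /* standby slot synced from the primary */
    int64_t inactive_since;  /* seconds since epoch; 0 means slot is in use */
} SlotSketch;

/*
 * Return true if a checkpoint-time check should invalidate the slot.
 * timeout_mins mirrors idle_replication_slot_timeout, whose default
 * unit is minutes; zero disables the mechanism.
 */
bool
slot_idle_timed_out(const SlotSketch *slot, int64_t now, int timeout_mins)
{
    if (timeout_mins <= 0)
        return false;           /* mechanism disabled */
    if (!slot->reserves_wal || slot->synced)
        return false;           /* exempt slots */
    if (slot->inactive_since == 0)
        return false;           /* slot is currently active */

    /* Assumption: "idle longer than" is a strict comparison. */
    return (now - slot->inactive_since) > (int64_t) timeout_mins * 60;
}
```

Because the check only runs at checkpoint time, a slot that crosses the
threshold between checkpoints stays valid until the next one, which is
the lag described above.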

The slots can become inactive for a long period if a subscriber is down
due to a system error or inaccessible because of network issues. If such a
situation persists, it might be more practical to recreate the subscriber
rather than attempt to recover the node and wait for it to catch up, which
could be time-consuming.

In addition, external tools that create replication slots (e.g., for
migrations or upgrades) may fail to remove them if an error occurs, leaving
behind unused slots that take up space and resources. Manually cleaning
them up can be tedious and error-prone, and without intervention, these
lingering slots can cause unnecessary WAL retention and system bloat.

As the duration of idle_replication_slot_timeout is in minutes, any test
that exercises it would be time-consuming. We are planning to commit a
follow-up patch that adds tests using the injection point framework.

Author: Nisha Moond <nisha.moond412@gmail.com>
Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com
Discussion: https://postgr.es/m/OS0PR01MB5716C131A7D80DAE8CB9E88794FC2@OS0PR01MB5716.jpnprd01.prod.outlook.com
This commit is contained in:
Amit Kapila
2025-02-19 09:29:50 +05:30
parent b464e51ab3
commit ac0e33136a
15 changed files with 368 additions and 87 deletions


@@ -4429,6 +4429,46 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"' # Windows
</listitem>
</varlistentry>
<varlistentry id="guc-idle-replication-slot-timeout" xreflabel="idle_replication_slot_timeout">
<term><varname>idle_replication_slot_timeout</varname> (<type>integer</type>)
<indexterm>
<primary><varname>idle_replication_slot_timeout</varname> configuration parameter</primary>
</indexterm>
</term>
<listitem>
<para>
Invalidate replication slots that have remained idle longer than this
duration. If this value is specified without units, it is taken as
minutes. A value of zero (the default) disables the idle timeout
invalidation mechanism. This parameter can only be set in the
<filename>postgresql.conf</filename> file or on the server command
line.
</para>
<para>
Slot invalidation due to idle timeout occurs during checkpoint.
Because checkpoints happen at <varname>checkpoint_timeout</varname>
intervals, there can be some lag between when the
<varname>idle_replication_slot_timeout</varname> was exceeded and when
the slot invalidation is triggered at the next checkpoint.
To avoid such lags, users can force a checkpoint to promptly invalidate
inactive slots. The duration of slot inactivity is calculated using the
slot's <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>inactive_since</structfield>
value.
</para>
<para>
Note that the idle timeout invalidation mechanism is not applicable
for slots that do not reserve WAL or for slots on the standby server
that are being synced from the primary server (i.e., standby slots
having <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>synced</structfield>
value <literal>true</literal>). Synced slots are always considered to
be inactive because they don't perform logical decoding to produce
changes.
</para>
</listitem>
</varlistentry>
<varlistentry id="guc-wal-sender-timeout" xreflabel="wal_sender_timeout">
<term><varname>wal_sender_timeout</varname> (<type>integer</type>)
<indexterm>


@@ -2390,6 +2390,11 @@ CONTEXT: processing remote data for replication origin "pg_16395" during "INSER
plus some reserve for table synchronization.
</para>
<para>
Logical replication slots are also affected by
<link linkend="guc-idle-replication-slot-timeout"><varname>idle_replication_slot_timeout</varname></link>.
</para>
<para>
<link linkend="guc-max-wal-senders"><varname>max_wal_senders</varname></link>
should be set to at least the same as


@@ -2619,6 +2619,13 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
perform logical decoding. It is set only for logical slots.
</para>
</listitem>
<listitem>
<para>
<literal>idle_timeout</literal> means that the slot has remained
idle longer than the configured
<xref linkend="guc-idle-replication-slot-timeout"/> duration.
</para>
</listitem>
</itemizedlist>
</para></entry>
</row>