mirror of https://github.com/postgres/postgres.git synced 2025-11-06 07:49:08 +03:00

Keep WAL segments by the flushed value of the slot's restart LSN

This patch fixes the unexpected removal of old WAL segments when a checkpoint
is followed by an immediate restart.  The issue occurs when a slot is advanced
after the checkpoint starts but before old WAL segments are removed at its
end: the advanced restart_lsn may not have reached disk yet, so after the
restart the slot can still require segments that were already removed.

The idea of the patch is to get the minimal restart_lsn at the beginning
of checkpoint (or restart point) creation and use this value when calculating
the oldest LSN for WAL segments removal at the end of checkpoint.  This idea
was proposed by Tomas Vondra in the discussion.  Unlike 291221c46575, this
fix doesn't affect ABI and is intended for back branches.
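
The idea described above can be sketched roughly as follows.  This is a
minimal standalone illustration, not the code from the patch: the types and
the names Lsn, Slot, SEG_SIZE, slots_min_restart_lsn() and
oldest_segment_to_keep() are hypothetical stand-ins, and the 16 MB segment
size is an assumption made only for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not PostgreSQL's real types. */
typedef uint64_t Lsn;
#define INVALID_LSN ((Lsn) 0)
#define SEG_SIZE ((Lsn) 16 * 1024 * 1024)	/* assumed segment size */

typedef struct Slot
{
	Lsn			restart_lsn;	/* in-memory value; may be ahead of disk */
} Slot;

/* Return the minimum valid restart_lsn across slots, or INVALID_LSN. */
static Lsn
slots_min_restart_lsn(const Slot *slots, size_t nslots)
{
	Lsn			min = INVALID_LSN;

	for (size_t i = 0; i < nslots; i++)
	{
		Lsn			lsn = slots[i].restart_lsn;

		if (lsn != INVALID_LSN && (min == INVALID_LSN || lsn < min))
			min = lsn;
	}
	return min;
}

/*
 * Return the number of the oldest segment that must be kept.  The caller
 * passes the slots' minimum as sampled at the START of the checkpoint, so
 * a slot that advances while the checkpoint runs cannot release segments
 * that its on-disk restart_lsn may still need.
 */
static uint64_t
oldest_segment_to_keep(Lsn redo_lsn, Lsn slots_min_at_start)
{
	Lsn			keep = redo_lsn;

	assert(redo_lsn != INVALID_LSN);
	if (slots_min_at_start != INVALID_LSN && slots_min_at_start < keep)
		keep = slots_min_at_start;
	return keep / SEG_SIZE;
}
```

In this sketch, a checkpoint would call slots_min_restart_lsn() once before
doing any work and pass the saved value to oldest_segment_to_keep() at the
end, instead of re-reading the slots at removal time.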

Discussion: https://postgr.es/m/flat/1d12d2-67235980-35-19a406a0%4063439497
Author: Vitaly Davydov <v.davydov@postgrespro.ru>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 13
Alexander Korotkov
2025-06-14 03:33:15 +03:00
parent d2ec671092
commit dd9bc1a17d
3 changed files with 60 additions and 9 deletions

@@ -1803,7 +1803,15 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
SpinLockRelease(&MyReplicationSlot->mutex);
-/* first write new xmin to disk, so we know what's up after a crash */
+/*
+ * First, write new xmin and restart_lsn to disk so we know what's up
+ * after a crash. Even when we do this, the checkpointer can see the
+ * updated restart_lsn value in the shared memory; then, a crash can
+ * happen before we manage to write that value to the disk. Thus,
+ * checkpointer still needs to make special efforts to keep WAL
+ * segments required by the restart_lsn written to the disk. See
+ * CreateCheckPoint() and CreateRestartPoint() for details.
+ */
if (updated_xmin || updated_restart)
{
ReplicationSlotMarkDirty();

@@ -2065,6 +2065,10 @@ PhysicalConfirmReceivedLocation(XLogRecPtr lsn)
* be energy wasted - the worst lost information can do here is give us
* wrong information in a statistics view - we'll just potentially be more
* conservative in removing files.
+ *
+ * Checkpointer makes special efforts to keep the WAL segments required by
+ * the restart_lsn written to the disk. See CreateCheckPoint() and
+ * CreateRestartPoint() for details.
*/
}