mirror of https://github.com/postgres/postgres.git synced 2025-11-24 00:23:06 +03:00

Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.

If TRUNCATE causes some buffers to be invalidated and thus the
checkpoint does not flush them, TRUNCATE must also ensure that the
corresponding files are truncated on disk. Otherwise, a replay
from the checkpoint might find that the buffers exist but have
the wrong contents, which may cause replay to fail.

Report by Teja Mupparti. Patch by Kyotaro Horiguchi, per a design
suggestion from Heikki Linnakangas, with some changes to the
comments by me. Review of this and a prior patch that approached
the issue differently by Heikki Linnakangas, Andres Freund, Álvaro
Herrera, Masahiko Sawada, and Tom Lane.

Discussion: http://postgr.es/m/BYAPR06MB6373BF50B469CA393C614257ABF00@BYAPR06MB6373.namprd06.prod.outlook.com
Robert Haas
2022-03-24 14:32:06 -04:00
parent 86459b3296
commit 412ad7a556
11 changed files with 120 additions and 28 deletions

@@ -325,6 +325,22 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
RelationPreTruncate(rel);
/*
* Make sure that a concurrent checkpoint can't complete while truncation
* is in progress.
*
* The truncation operation might drop buffers that the checkpoint
* otherwise would have flushed. If it does, then it's essential that
* the files actually get truncated on disk before the checkpoint record
* is written. Otherwise, if replay begins from that checkpoint, the
* to-be-truncated blocks might still exist on disk but have older
* contents than expected, which can cause replay to fail. It's OK for
* the blocks to not exist on disk at all, but not for them to have the
* wrong contents.
*/
Assert((MyProc->delayChkpt & DELAY_CHKPT_COMPLETE) == 0);
MyProc->delayChkpt |= DELAY_CHKPT_COMPLETE;
/*
* We WAL-log the truncation before actually truncating, which means
* trouble if the truncation fails. If we then crash, the WAL replay
@@ -363,13 +379,24 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
XLogFlush(lsn);
}
/* Do the real work to truncate relation forks */
/*
* This will first remove any buffers from the buffer pool that should no
* longer exist after truncation is complete, and then truncate the
* corresponding files on disk.
*/
smgrtruncate(RelationGetSmgr(rel), forks, nforks, blocks);
/* We've done all the critical work, so checkpoints are OK now. */
MyProc->delayChkpt &= ~DELAY_CHKPT_COMPLETE;
/*
* Update upper-level FSM pages to account for the truncation. This is
* important because the just-truncated pages were likely marked as
* all-free, and would be preferentially selected.
*
* NB: There's no point in delaying checkpoints until this is done.
* Because the FSM is not WAL-logged, we have to be prepared for the
* possibility of corruption after a crash anyway.
*/
if (need_fsm_vacuum)
FreeSpaceMapVacuumRange(rel, nblocks, InvalidBlockNumber);
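
The flag handling in the hunks above can be illustrated outside the server with a toy model. What follows is only a sketch under stated assumptions, not PostgreSQL code: `ToyProc`, `ToyRelation`, and every `toy_*` function are hypothetical names invented for illustration, and the bit values given to the `DELAY_CHKPT_*` macros are assumed rather than taken from this diff. It shows the ordering the patch relies on: set the flag, drop buffers and truncate on disk, then clear the flag, while a checkpointer-like check refuses to complete as long as any process has the flag set.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed bit values; only the macro names appear in the diff above. */
#define DELAY_CHKPT_START    (1 << 0)
#define DELAY_CHKPT_COMPLETE (1 << 1)

/* Hypothetical stand-in for the per-backend state holding delayChkpt. */
typedef struct ToyProc
{
    int delayChkpt;
} ToyProc;

/* Hypothetical relation: block counts in the buffer pool and on disk. */
typedef struct ToyRelation
{
    unsigned buffered_blocks;
    unsigned disk_blocks;
} ToyRelation;

/* Checkpointer side: completion must wait while any proc holds the flag. */
static bool
toy_checkpoint_may_complete(const ToyProc *procs, int nprocs)
{
    for (int i = 0; i < nprocs; i++)
    {
        if (procs[i].delayChkpt & DELAY_CHKPT_COMPLETE)
            return false;
    }
    return true;
}

/* Truncating side, step 1: forbid checkpoint completion. */
static void
toy_truncate_begin(ToyProc *proc)
{
    assert((proc->delayChkpt & DELAY_CHKPT_COMPLETE) == 0);
    proc->delayChkpt |= DELAY_CHKPT_COMPLETE;
}

/* Step 2: drop buffers past nblocks first, then shrink the file. */
static void
toy_truncate_work(ToyRelation *rel, unsigned nblocks)
{
    if (rel->buffered_blocks > nblocks)
        rel->buffered_blocks = nblocks;
    if (rel->disk_blocks > nblocks)
        rel->disk_blocks = nblocks;
}

/* Step 3: critical work is done; clear only our bit, keep the others. */
static void
toy_truncate_end(ToyProc *proc)
{
    proc->delayChkpt &= ~DELAY_CHKPT_COMPLETE;
}
```

Calling `toy_truncate_begin`, `toy_truncate_work`, and `toy_truncate_end` in that order mirrors the sequence in `RelationTruncate`. Between the first and last call, `toy_checkpoint_may_complete` returns false, so in this model a checkpoint cannot finish in the window where buffers are already gone but the file has not yet been shortened.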