mirror of
https://github.com/postgres/postgres.git
synced 2025-11-24 00:23:06 +03:00
Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.

If TRUNCATE causes some buffers to be invalidated and thus the checkpoint does not flush them, TRUNCATE must also ensure that the corresponding files are truncated on disk. Otherwise, a replay from the checkpoint might find that the buffers exist but have the wrong contents, which may cause replay to fail.

Report by Teja Mupparti. Patch by Kyotaro Horiguchi, per a design suggestion from Heikki Linnakangas, with some changes to the comments by me. Review of this and a prior patch that approached the issue differently by Heikki Linnakangas, Andres Freund, Álvaro Herrera, Masahiko Sawada, and Tom Lane.

Discussion: http://postgr.es/m/BYAPR06MB6373BF50B469CA393C614257ABF00@BYAPR06MB6373.namprd06.prod.outlook.com
@@ -325,6 +325,22 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 
 	RelationPreTruncate(rel);
 
+	/*
+	 * Make sure that a concurrent checkpoint can't complete while truncation
+	 * is in progress.
+	 *
+	 * The truncation operation might drop buffers that the checkpoint
+	 * otherwise would have flushed. If it does, then it's essential that
+	 * the files actually get truncated on disk before the checkpoint record
+	 * is written. Otherwise, if replay begins from that checkpoint, the
+	 * to-be-truncated blocks might still exist on disk but have older
+	 * contents than expected, which can cause replay to fail. It's OK for
+	 * the blocks to not exist on disk at all, but not for them to have the
+	 * wrong contents.
+	 */
+	Assert((MyProc->delayChkpt & DELAY_CHKPT_COMPLETE) == 0);
+	MyProc->delayChkpt |= DELAY_CHKPT_COMPLETE;
+
 	/*
 	 * We WAL-log the truncation before actually truncating, which means
 	 * trouble if the truncation fails. If we then crash, the WAL replay
@@ -363,13 +379,24 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 			XLogFlush(lsn);
 	}
 
-	/* Do the real work to truncate relation forks */
+	/*
+	 * This will first remove any buffers from the buffer pool that should no
+	 * longer exist after truncation is complete, and then truncate the
+	 * corresponding files on disk.
+	 */
 	smgrtruncate(RelationGetSmgr(rel), forks, nforks, blocks);
 
+	/* We've done all the critical work, so checkpoints are OK now. */
+	MyProc->delayChkpt &= ~DELAY_CHKPT_COMPLETE;
+
 	/*
 	 * Update upper-level FSM pages to account for the truncation. This is
 	 * important because the just-truncated pages were likely marked as
 	 * all-free, and would be preferentially selected.
+	 *
+	 * NB: There's no point in delaying checkpoints until this is done.
+	 * Because the FSM is not WAL-logged, we have to be prepared for the
+	 * possibility of corruption after a crash anyway.
 	 */
 	if (need_fsm_vacuum)
 		FreeSpaceMapVacuumRange(rel, nblocks, InvalidBlockNumber);