Mirror of https://github.com/postgres/postgres.git (synced 2025-11-03 09:13:20 +03:00)

RelationTruncate() must set DELAY_CHKPT_START.

Previously, it set only DELAY_CHKPT_COMPLETE. That was important,
because it meant that if the XLOG_SMGR_TRUNCATE record preceded an
XLOG_CHECKPOINT_ONLINE record in the WAL, then the truncation would also
happen on disk before the XLOG_CHECKPOINT_ONLINE record was
written.

However, it didn't guarantee that the sync request for the truncation
was processed before the XLOG_CHECKPOINT_ONLINE record was written. By
setting DELAY_CHKPT_START, we guarantee that if an XLOG_SMGR_TRUNCATE
record is written to WAL before the redo pointer of a concurrent
checkpoint, the sync request queued by that operation must be processed
by that checkpoint, rather than being left for the following one.

This is a refinement of commit 412ad7a556.  Back-patch to all supported
releases, like that commit.

Author: Robert Haas <robertmhaas@gmail.com>
Reported-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKG%2B-2rjGZC2kwqr2NMLBcEBp4uf59QT1advbWYF_uc%2B0Aw%40mail.gmail.com
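
The ordering argument in the commit message can be illustrated with a toy, single-threaded model. This is only a sketch under stated assumptions, not PostgreSQL code: `ToyBackend`, `ToySyncQueue`, `toy_backend_finish()`, `toy_process_syncs()` and `toy_run()` are hypothetical names invented for illustration, and the interleaving is fixed by hand to the problematic case in which the checkpointer picks its redo pointer just after the truncation's WAL record but before the backend has queued its sync request.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: every name below is hypothetical, not a PostgreSQL API. */
typedef struct
{
	bool		delay_start;	/* stands in for DELAY_CHKPT_START */
	bool		delay_complete; /* stands in for DELAY_CHKPT_COMPLETE */
	bool		truncate_done;	/* backend has finished its critical work */
} ToyBackend;

typedef struct
{
	int			pending_syncs;	/* queued, not yet processed */
	int			processed_syncs;	/* handled by the current checkpoint */
} ToySyncQueue;

/* Backend side: truncate, queue the sync request, then clear both flags. */
static void
toy_backend_finish(ToyBackend *b, ToySyncQueue *q)
{
	q->pending_syncs++;			/* models the sync request from truncation */
	b->truncate_done = true;
	b->delay_start = false;
	b->delay_complete = false;
}

/* Checkpointer side: absorb every sync request queued so far. */
static void
toy_process_syncs(ToySyncQueue *q)
{
	q->processed_syncs += q->pending_syncs;
	q->pending_syncs = 0;
}

/*
 * Run one fixed interleaving and return how many sync requests are left
 * over for the following checkpoint.
 */
int
toy_run(bool set_delay_start)
{
	ToyBackend	b = {false, false, false};
	ToySyncQueue q = {0, 0};

	/* Backend: set the flags, then WAL-log the truncation. */
	assert(!b.delay_start);
	b.delay_start = set_delay_start;
	assert(!b.delay_complete);
	b.delay_complete = true;

	/*
	 * Checkpointer: the redo pointer is chosen here, after the truncation
	 * record. If the backend advertised delay_start, the checkpointer waits,
	 * which this model expresses by letting the backend run to completion.
	 */
	if (b.delay_start)
		toy_backend_finish(&b, &q);

	toy_process_syncs(&q);

	/* Without delay_start, the backend queues its request only now. */
	if (!b.truncate_done)
		toy_backend_finish(&b, &q);

	/*
	 * The checkpointer would now wait for delay_complete to clear (it has)
	 * and write its checkpoint record.
	 */
	return q.pending_syncs;
}
```

In this model, `toy_run(true)` leaves no sync request behind, while `toy_run(false)` leaves one queued for the following checkpoint, which is the situation the commit is meant to rule out.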
			
			
@@ -326,20 +326,35 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 	RelationPreTruncate(rel);
 
 	/*
-	 * Make sure that a concurrent checkpoint can't complete while truncation
-	 * is in progress.
+	 * The code which follows can interact with concurrent checkpoints in two
+	 * separate ways.
 	 *
-	 * The truncation operation might drop buffers that the checkpoint
-	 * otherwise would have flushed. If it does, then it's essential that
-	 * the files actually get truncated on disk before the checkpoint record
-	 * is written. Otherwise, if reply begins from that checkpoint, the
+	 * First, the truncation operation might drop buffers that the checkpoint
+	 * otherwise would have flushed. If it does, then it's essential that the
+	 * files actually get truncated on disk before the checkpoint record is
+	 * written. Otherwise, if reply begins from that checkpoint, the
 	 * to-be-truncated blocks might still exist on disk but have older
-	 * contents than expected, which can cause replay to fail. It's OK for
-	 * the blocks to not exist on disk at all, but not for them to have the
-	 * wrong contents.
+	 * contents than expected, which can cause replay to fail. It's OK for the
+	 * blocks to not exist on disk at all, but not for them to have the wrong
+	 * contents. For this reason, we need to set DELAY_CHKPT_COMPLETE while
+	 * this code executes.
+	 *
+	 * Second, the call to smgrtruncate() below will in turn call
+	 * RegisterSyncRequest(). We need the sync request created by that call to
+	 * be processed before the checkpoint completes. CheckPointGuts() will
+	 * call ProcessSyncRequests(), but if we register our sync request after
+	 * that happens, then the WAL record for the truncation could end up
+	 * preceding the checkpoint record, while the actual sync doesn't happen
+	 * until the next checkpoint. To prevent that, we need to set
+	 * DELAY_CHKPT_START here. That way, if the XLOG_SMGR_TRUNCATE precedes
+	 * the redo pointer of a concurrent checkpoint, we're guaranteed that the
+	 * corresponding sync request will be processed before the checkpoint
+	 * completes.
 	 */
+	Assert(!MyProc->delayChkpt);
+	MyProc->delayChkpt = true;			/* DELAY_CHKPT_START */
 	Assert(!MyProc->delayChkptEnd);
-	MyProc->delayChkptEnd = true;
+	MyProc->delayChkptEnd = true;		/* DELAY_CHKPT_COMPLETE */
 
 	/*
 	 * We WAL-log the truncation before actually truncating, which means
@@ -387,6 +402,7 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 	smgrtruncate(RelationGetSmgr(rel), forks, nforks, blocks);
 
 	/* We've done all the critical work, so checkpoints are OK now. */
+	MyProc->delayChkpt = false;
 	MyProc->delayChkptEnd = false;
 
 	/*