During recovery, if we reach consistent state and still have entries in the invalid-page hash table, PANIC immediately. Immediate PANIC is much better than waiting for end-of-recovery, which is what we did before, because the end-of-recovery might not come until months later if this is a standby server.

Also refrain from creating a restartpoint if there are invalid-page entries in the hash table. Restarting recovery from such a restartpoint would not see the invalid references, and wouldn't be able to cross-check them when consistency is reached. That wouldn't matter when things are going smoothly, but the more sanity checks you have the better.

Fujii Masao
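
The two rules described above can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the PostgreSQL implementation: the fixed-size array stands in for the invalid-page hash table, and every name in it (note_invalid_page, forget_invalid_pages, have_invalid_pages, check_consistency, try_restartpoint) is hypothetical. Where the real code would PANIC, the model just returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the invalid-page hash table. */
#define MAX_INVALID 16

typedef struct
{
	unsigned	rel;			/* relation identifier (simplified) */
	unsigned	blk;			/* block number within the relation */
} InvalidPageRef;

static InvalidPageRef invalid_pages[MAX_INVALID];
static int	n_invalid = 0;

/* Remember that replay touched a page that does not exist. */
static void
note_invalid_page(unsigned rel, unsigned blk)
{
	assert(n_invalid < MAX_INVALID);
	invalid_pages[n_invalid].rel = rel;
	invalid_pages[n_invalid].blk = blk;
	n_invalid++;
}

/* A later record dropped the relation: its entries are resolved. */
static void
forget_invalid_pages(unsigned rel)
{
	int			kept = 0;

	for (int i = 0; i < n_invalid; i++)
	{
		if (invalid_pages[i].rel != rel)
			invalid_pages[kept++] = invalid_pages[i];
	}
	n_invalid = kept;
}

static bool
have_invalid_pages(void)
{
	return n_invalid > 0;
}

/*
 * Called when the consistent state is reached.  Returns false in the
 * case where the real server would PANIC: unresolved entries remain.
 */
static bool
check_consistency(void)
{
	if (have_invalid_pages())
	{
		fprintf(stderr, "%d unresolved invalid-page reference(s)\n",
				n_invalid);
		return false;
	}
	return true;
}

/*
 * Skip the restartpoint while entries are unresolved, so that a restart
 * from it cannot lose the cross-check.  Returns true if it was taken.
 */
static bool
try_restartpoint(void)
{
	if (have_invalid_pages())
		return false;
	return true;
}
```

In this model a restartpoint attempt is refused between note_invalid_page() and the matching forget_invalid_pages(), and check_consistency() fails as soon as consistency is reached with an entry still outstanding, rather than at the end of recovery.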
2025-10-16 17:07:43 +03:00 · 2011-12-02 10:49:54 +02:00
parent 15a5006aac
commit 1e616f6391
4 changed files with 70 additions and 28 deletions
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -562,7 +562,7 @@ static TimeLineID lastPageTLI = 0;
 static XLogRecPtr minRecoveryPoint;		/* local copy of
 										 * ControlFile->minRecoveryPoint */
 static bool updateMinRecoveryPoint = true;
-static bool reachedMinRecoveryPoint = false;
+bool reachedMinRecoveryPoint = false;

 static bool InRedo = false;

@@ -6758,12 +6758,6 @@ StartupXLOG(void)
 		/* Disallow XLogInsert again */
 		LocalXLogInsertAllowed = -1;

-		/*
-		 * Check to see if the XLOG sequence contained any unresolved
-		 * references to uninitialized pages.
-		 */
-		XLogCheckInvalidPages();
-
 		/*
 		 * Perform a checkpoint to update all our recovery activity to disk.
 		 *
@@ -6906,6 +6900,12 @@ CheckRecoveryConsistency(void)
 		XLByteLE(minRecoveryPoint, EndRecPtr) &&
 		XLogRecPtrIsInvalid(ControlFile->backupStartPoint))
 	{
+		/*
+		 * Check to see if the XLOG sequence contained any unresolved
+		 * references to uninitialized pages.
+		 */
+		XLogCheckInvalidPages();
+
 		reachedMinRecoveryPoint = true;
 		ereport(LOG,
 				(errmsg("consistent recovery state reached at %X/%X",
@@ -7907,7 +7907,7 @@ RecoveryRestartPoint(const CheckPoint *checkPoint)
 	volatile XLogCtlData *xlogctl = XLogCtl;

 	/*
-	 * Is it safe to checkpoint?  We must ask each of the resource managers
+	 * Is it safe to restartpoint?  We must ask each of the resource managers
 	 * whether they have any partial state information that might prevent a
 	 * correct restart from this point.  If so, we skip this opportunity, but
 	 * return at the next checkpoint record for another try.
@@ -7926,6 +7926,22 @@ RecoveryRestartPoint(const CheckPoint *checkPoint)
 			}
 	}

+	/*
+	 * Also refrain from creating a restartpoint if we have seen any references
+	 * to non-existent pages. Restarting recovery from the restartpoint would
+	 * not see the references, so we would lose the cross-check that the pages
+	 * belonged to a relation that was dropped later.
+	 */
+	if (XLogHaveInvalidPages())
+	{
+		elog(trace_recovery(DEBUG2),
+			 "could not record restart point at %X/%X because there "
+			 "are unresolved references to invalid pages",
+			 checkPoint->redo.xlogid,
+			 checkPoint->redo.xrecoff);
+		return;
+	}
+
 	/*
 	 * Copy the checkpoint record to shared memory, so that checkpointer
 	 * can work out the next time it wants to perform a restartpoint.