Dept. of further reflection: I looked around to see if any other callers of XLogInsert had the same sort of checkpoint interlock problem as RecordTransactionCommit, and indeed I found some. Btree index build and ALTER TABLE SET TABLESPACE write data outside the friendly confines of the buffer manager, and therefore they have to take their own responsibility for checkpoint interlock. The easiest solution seems to be to force smgrimmedsync at the end of the index build or table copy, even when the operation is being WAL-logged. This is sufficient since the new index or table will be of interest to no one if we don't get as far as committing the current transaction.
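
As a rough illustration of the pattern described above (write pages directly, then force them to disk before the transaction may commit), here is a minimal standalone sketch using plain POSIX calls. It is not PostgreSQL code: the function name `write_pages_then_sync`, the constant `SKETCH_PAGE_SIZE`, and the `is_temp` flag are hypothetical stand-ins for the smgrwrite loop, BLCKSZ, and rel->rd_istemp, and fsync() stands in for smgrimmedsync.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical page size for this sketch; not taken from PostgreSQL headers. */
#define SKETCH_PAGE_SIZE 8192

/*
 * Write npages pages straight to a file, bypassing any buffer cache layer
 * of our own, then fsync the file unless it is "temporary".
 * Returns 0 on success, -1 on any failure.
 */
int
write_pages_then_sync(const char *path, int npages, int is_temp)
{
	char		page[SKETCH_PAGE_SIZE];
	int			fd;
	int			blkno;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	for (blkno = 0; blkno < npages; blkno++)
	{
		ssize_t		written;

		/* Fill the page with a recognizable per-block byte pattern. */
		memset(page, blkno & 0xFF, sizeof(page));

		/* No fsync is requested per write; one sync at the end covers all. */
		written = write(fd, page, sizeof(page));
		if (written != (ssize_t) sizeof(page))
		{
			close(fd);
			return -1;
		}
	}

	/*
	 * A checkpoint-like mechanism that only knows about cached buffers
	 * cannot flush these pages for us, so sync them here before the
	 * caller is allowed to "commit".  Temp data is skipped because it
	 * would be discarded after a crash anyway.
	 */
	if (!is_temp && fsync(fd) != 0)
	{
		close(fd);
		return -1;
	}

	return (close(fd) == 0) ? 0 : -1;
}
```

The single sync at the end mirrors the design choice in this commit: individual writes skip fsync scheduling, and one unconditional sync for non-temp data makes the result durable regardless of whether the pages were also WAL-logged.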
2025-10-18 04:29:09 +03:00 · 2004-08-15 23:44:46 +00:00
parent 057ea3471f
commit 1a3de15a3a
3 changed files with 46 additions and 24 deletions
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8,7 +8,7 @@
 *
 *
 * IDENTIFICATION
- *	  $PostgreSQL: pgsql/src/backend/commands/tablecmds.c,v 1.125 2004/08/13 04:50:28 tgl Exp $
+ *	  $PostgreSQL: pgsql/src/backend/commands/tablecmds.c,v 1.126 2004/08/15 23:44:46 tgl Exp $
 *
 *-------------------------------------------------------------------------
 */
@@ -5479,18 +5479,29 @@ copy_relation_data(Relation rel, SMgrRelation dst)
 		}

 		/*
-		 * Now write the page.  If not using WAL, say isTemp = true, to
-		 * suppress duplicate fsync.  If we are using WAL, it surely isn't a
-		 * temp rel, so !use_wal is a sufficient condition.
+		 * Now write the page.  We say isTemp = true even if it's not a
+		 * temp rel, because there's no need for smgr to schedule an fsync
+		 * for this write; we'll do it ourselves below.
 		 */
-		smgrwrite(dst, blkno, buf, !use_wal);
+		smgrwrite(dst, blkno, buf, true);
 	}

 	/*
-	 * If we weren't using WAL, and the rel isn't temp, we must fsync it
-	 * down to disk before it's safe to commit the transaction.
+	 * If the rel isn't temp, we must fsync it down to disk before it's
+	 * safe to commit the transaction.  (For a temp rel we don't care
+	 * since the rel will be uninteresting after a crash anyway.)
+	 *
+	 * It's obvious that we must do this when not WAL-logging the copy.
+	 * It's less obvious that we have to do it even if we did WAL-log the
+	 * copied pages.  The reason is that since we're copying outside
+	 * shared buffers, a CHECKPOINT occurring during the copy has no way
+	 * to flush the previously written data to disk (indeed it won't know
+	 * the new rel even exists).  A crash later on would replay WAL from the
+	 * checkpoint, therefore it wouldn't replay our earlier WAL entries.
+	 * If we do not fsync those pages here, they might still not be on disk
+	 * when the crash occurs.
 	 */
-	if (!use_wal && !rel->rd_istemp)
+	if (!rel->rd_istemp)
 		smgrimmedsync(dst);
 }