For inplace update, send nontransactional invalidations.
The inplace update survives ROLLBACK.  The inval didn't, so another
backend's DDL could then update the row without incorporating the
inplace update.  In the test this fixes, a mix of CREATE INDEX and ALTER
TABLE resulted in a table with an index, yet relhasindex=f.  That is a
source of index corruption.  Back-patch to v12 (all supported versions).
The back branch versions don't change WAL, because those branches just
added end-of-recovery SIResetAll().  All branches change the ABI of
extern function PrepareToInvalidateCacheTuple().  No PGXN extension
calls that, and there's no apparent use case in extensions.

Reviewed by Nitin Motiani and (in earlier versions) Andres Freund.

Discussion: https://postgr.es/m/20240523000548.58.nmisch@google.com
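To make the race concrete: the inplace write itself is permanent, but under the old behavior the invalidation announcing it was queued transactionally and died with ROLLBACK, so another session could keep using a stale cached copy of the row and later overwrite the flag. The following self-contained C toy models only that ordering; Session, accept_invals, and inplace_set_relhasindex are invented names for illustration, not PostgreSQL APIs.

/*
 * Toy single-process model of the race this commit fixes.  One shared
 * "catalog row" (relhasindex) plus a shared invalidation counter; each
 * session caches the row and refreshes it only when the counter moves.
 */
#include <stdbool.h>
#include <stdio.h>

static bool shared_relhasindex = false;
static int	shared_inval_counter = 0;

typedef struct Session
{
	bool		cached_relhasindex;
	int			seen_inval_counter;
} Session;

/* Re-read the shared row if any invalidation arrived since our last look. */
static void
accept_invals(Session *s)
{
	if (s->seen_inval_counter != shared_inval_counter)
	{
		s->cached_relhasindex = shared_relhasindex;
		s->seen_inval_counter = shared_inval_counter;
	}
}

/*
 * Inplace update: the write survives ROLLBACK.  With immediate_inval
 * false (the old behavior), the inval is queued transactionally and is
 * then discarded by the ROLLBACK that follows; with true (this commit),
 * it is published at once.
 */
static void
inplace_set_relhasindex(bool immediate_inval)
{
	shared_relhasindex = true;
	if (immediate_inval)
		shared_inval_counter++;
}

static void
run_scenario(bool fixed)
{
	Session		b = {false, shared_inval_counter};	/* B cached the row early */

	inplace_set_relhasindex(fixed); /* session A: CREATE INDEX, then ROLLBACK */

	accept_invals(&b);			/* session B: refreshes only if an inval came */
	shared_relhasindex = b.cached_relhasindex;	/* B's DDL rewrites the row */
	shared_inval_counter++;

	printf("%s: relhasindex=%s (an index exists, so t is correct)\n",
		   fixed ? "fixed " : "broken", shared_relhasindex ? "t" : "f");
}

int
main(void)
{
	run_scenario(false);		/* lost inval: prints relhasindex=f */
	shared_relhasindex = false;
	run_scenario(true);			/* immediate inval: prints relhasindex=t */
	return 0;
}

Running it prints relhasindex=f for the old ordering and relhasindex=t once the inval is sent immediately, mirroring the CREATE INDEX vs. ALTER TABLE mix described above.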
@@ -6302,6 +6302,24 @@ heap_inplace_update_and_unlock(Relation relation,
 	if (oldlen != newlen || htup->t_hoff != tuple->t_data->t_hoff)
 		elog(ERROR, "wrong tuple length");
 
+	/*
+	 * Construct shared cache inval if necessary.  Note that because we only
+	 * pass the new version of the tuple, this mustn't be used for any
+	 * operations that could change catcache lookup keys.  But we aren't
+	 * bothering with index updates either, so that's true a fortiori.
+	 */
+	CacheInvalidateHeapTupleInplace(relation, tuple, NULL);
+
+	/*
+	 * Unlink relcache init files as needed.  If unlinking, acquire
+	 * RelCacheInitLock until after associated invalidations.  By doing this
+	 * in advance, if we checkpoint and then crash between inplace
+	 * XLogInsert() and inval, we don't rely on StartupXLOG() ->
+	 * RelationCacheInitFileRemove().  That uses elevel==LOG, so replay would
+	 * neglect to PANIC on EIO.
+	 */
+	PreInplace_Inval();
+
 	/* NO EREPORT(ERROR) from here till changes are logged */
 	START_CRIT_SECTION();
 
@@ -6345,17 +6363,28 @@ heap_inplace_update_and_unlock(Relation relation,
 		PageSetLSN(BufferGetPage(buffer), recptr);
 	}
 
-	END_CRIT_SECTION();
-
-	heap_inplace_unlock(relation, oldtup, buffer);
+	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
 
 	/*
-	 * Send out shared cache inval if necessary.  Note that because we only
-	 * pass the new version of the tuple, this mustn't be used for any
-	 * operations that could change catcache lookup keys.  But we aren't
-	 * bothering with index updates either, so that's true a fortiori.
+	 * Send invalidations to shared queue.  SearchSysCacheLocked1() assumes we
+	 * do this before UnlockTuple().
 	 *
-	 * XXX ROLLBACK discards the invalidation.  See test inplace-inval.spec.
+	 * If we're mutating a tuple visible only to this transaction, there's an
+	 * equivalent transactional inval from the action that created the tuple,
+	 * and this inval is superfluous.
 	 */
+	AtInplace_Inval();
+
+	END_CRIT_SECTION();
+	UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
+
+	AcceptInvalidationMessages();	/* local processing of just-sent inval */
+
+	/*
+	 * Queue a transactional inval.  The immediate invalidation we just sent
+	 * is the only one known to be necessary.  To reduce risk from the
+	 * transition to immediate invalidation, continue sending a transactional
+	 * invalidation like we've long done.  Third-party code might rely on it.
+	 */
 	if (!IsBootstrapProcessingMode())
 		CacheInvalidateHeapTuple(relation, tuple, NULL);
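Taken together, the two hunks above pin down a specific order in heap_inplace_update_and_unlock(): build the invals while errors are still allowed, write and WAL-log the tuple inside the critical section, publish the invals before releasing the tuple lock, and only then fall back to the legacy transactional inval. Below is a compilable sketch of that order, using invented lowercase stubs for the real routines (which of course take arguments):

#include <stdio.h>

static void pre_inplace_inval(void) { }		/* PreInplace_Inval() */
static void start_crit_section(void) { }	/* START_CRIT_SECTION() */
static void write_tuple_and_xlog(void) { }	/* inplace write + XLogInsert() */
static void unlock_buffer(void) { }			/* LockBuffer(..., BUFFER_LOCK_UNLOCK) */
static void at_inplace_inval(void) { }		/* AtInplace_Inval() */
static void end_crit_section(void) { }		/* END_CRIT_SECTION() */
static void unlock_tuple(void) { }			/* UnlockTuple(...) */
static void accept_invalidation_messages(void) { }	/* AcceptInvalidationMessages() */
static void cache_invalidate_heap_tuple(void) { }	/* CacheInvalidateHeapTuple(...) */

int
main(void)
{
	pre_inplace_inval();		/* may still ereport(ERROR); if unlinking init
								 * files, holds RelCacheInitLock until after
								 * the invalidations below */
	start_crit_section();		/* no ERROR from here until changes are logged */
	write_tuple_and_xlog();
	unlock_buffer();
	at_inplace_inval();			/* invals reach the shared queue even if the
								 * transaction later rolls back */
	end_crit_section();
	unlock_tuple();				/* SearchSysCacheLocked1() assumes invals were
								 * sent before this */
	accept_invalidation_messages();		/* process our own invals locally */
	cache_invalidate_heap_tuple();		/* transactional inval kept for
										 * third-party code */
	puts("ordering sketch only; see the hunks above for the real code");
	return 0;
}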
@@ -1333,14 +1333,24 @@ RecordTransactionCommit(void)
 
 		/*
 		 * Transactions without an assigned xid can contain invalidation
-		 * messages (e.g. explicit relcache invalidations or catcache
-		 * invalidations for inplace updates); standbys need to process those.
-		 * We can't emit a commit record without an xid, and we don't want to
-		 * force assigning an xid, because that'd be problematic for e.g.
-		 * vacuum.  Hence we emit a bespoke record for the invalidations.  We
-		 * don't want to use that in case a commit record is emitted, so they
-		 * happen synchronously with commits (besides not wanting to emit more
-		 * WAL records).
+		 * messages.  While inplace updates do this, this is not known to be
+		 * necessary; see comment at inplace CacheInvalidateHeapTuple().
+		 * Extensions might still rely on this capability, and standbys may
+		 * need to process those invals.  We can't emit a commit record
+		 * without an xid, and we don't want to force assigning an xid,
+		 * because that'd be problematic for e.g. vacuum.  Hence we emit a
+		 * bespoke record for the invalidations.  We don't want to use that in
+		 * case a commit record is emitted, so they happen synchronously with
+		 * commits (besides not wanting to emit more WAL records).
+		 *
+		 * XXX Every known use of this capability is a defect.  Since an XID
+		 * isn't controlling visibility of the change that prompted invals,
+		 * other sessions need the inval even if this transaction aborts.
+		 *
+		 * ON COMMIT DELETE ROWS does a nontransactional index_build(), which
+		 * queues a relcache inval, including in transactions without an xid
+		 * that had read the (empty) table.  Standbys don't need any ON COMMIT
+		 * DELETE ROWS invals, but we've not done the work to withhold them.
 		 */
 		if (nmsgs != 0)
 		{
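The "bespoke record" this comment keeps referring to is the invalidations-only WAL record written just below it when no xid was assigned; in current sources that is an XLOG_XACT_INVALIDATIONS record emitted via LogStandbyInvalidations(). A minimal sketch of the branch, with invented stub names standing in for the real calls:

#include <stdbool.h>

static void log_commit_record(void) { }			/* commit record carries the invals */
static void log_standby_invalidations(void) { }	/* bespoke XLOG_XACT_INVALIDATIONS */

static void
record_commit_invals(bool xid_assigned, int nmsgs)
{
	if (xid_assigned)
		log_commit_record();			/* normal path: invals ride the commit */
	else if (nmsgs != 0)
		log_standby_invalidations();	/* xid-less path: standbys still need
										 * to replay the invals */
}

int
main(void)
{
	record_commit_invals(false, 3);		/* e.g. invals queued without an xid */
	record_commit_invals(true, 3);
	return 0;
}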