mirror of https://github.com/postgres/postgres.git
Fix creation of partition descriptor during concurrent detach+drop
If a partition undergoes DETACH CONCURRENTLY immediately followed by DROP,
this could cause a problem for a concurrent transaction recomputing the
partition descriptor when running a prepared statement, because it tries to
dereference a pointer to a tuple that's not found in a catalog scan.

The existing retry logic added in commit dbca3469ebf8 is sufficient to cope
with the overall problem, provided we don't try to dereference a
non-existent heap tuple.

Arguably, the code in RelationBuildPartitionDesc() has been wrong all along,
since no check was added in commit 898e5e3290a7 against receiving a NULL
tuple from the catalog scan; that bug has only become user-visible with
DETACH CONCURRENTLY which was added in branch 14.  Therefore, even though
there's no known mechanism to cause a crash because of this, backpatch the
addition of such a check to all supported branches.  In branches prior to
14, this would cause the code to fail with a "missing relpartbound for
relation XYZ" error instead of crashing; that's okay, because there are no
reports of such behavior anyway.

Author: Kuntal Ghosh <kuntalghosh.2007@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/18559-b48286d2eacd9a4e@postgresql.org
parent 9db6650a5a
commit 1b9dd6b05a
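
For orientation before the hunks below, here is the fixed pg_class lookup
assembled into one piece.  This is a hand-assembled sketch, not code copied
from the tree: the helper name fetch_partbound_or_null() and the exact
header list are illustrative, and in the real code this logic sits inline
in RelationBuildPartitionDesc().

#include "postgres.h"

#include "access/genam.h"
#include "access/htup_details.h"
#include "access/table.h"
#include "catalog/pg_class.h"
#include "nodes/parsenodes.h"
#include "utils/builtins.h"
#include "utils/fmgroids.h"
#include "utils/rel.h"

/*
 * Sketch of the patched lookup: read the partition's relpartbound directly
 * from pg_class, and return NULL both when the tuple is missing (partition
 * dropped) and when relpartbound is null (DETACH CONCURRENTLY completed).
 */
static PartitionBoundSpec *
fetch_partbound_or_null(Oid inhrelid)
{
	Relation	pg_class;
	SysScanDesc scan;
	ScanKeyData key[1];
	HeapTuple	tuple;
	PartitionBoundSpec *boundspec = NULL;

	/* Read pg_class directly rather than trusting the syscache. */
	pg_class = table_open(RelationRelationId, AccessShareLock);
	ScanKeyInit(&key[0],
				Anum_pg_class_oid,
				BTEqualStrategyNumber, F_OIDEQ,
				ObjectIdGetDatum(inhrelid));
	scan = systable_beginscan(pg_class, ClassOidIndexId, true,
							  NULL, 1, key);

	/*
	 * We could get one tuple (the normal case) or zero tuples if the
	 * partition has been dropped meanwhile; the validity check here is
	 * exactly what this commit adds.
	 */
	tuple = systable_getnext(scan);
	if (HeapTupleIsValid(tuple))
	{
		Datum		datum;
		bool		isnull;

		datum = heap_getattr(tuple, Anum_pg_class_relpartbound,
							 RelationGetDescr(pg_class), &isnull);
		if (!isnull)
			boundspec = stringToNode(TextDatumGetCString(datum));
	}

	systable_endscan(scan);
	table_close(pg_class, AccessShareLock);

	/* NULL means "dropped or detached concurrently"; the caller retries. */
	return boundspec;
}

Returning NULL for both the missing-tuple and nulled-relpartbound cases
lets the caller treat them identically: restart the descriptor build, after
which the departed partition no longer appears in the pg_inherits scan.
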
src/backend/partitioning/partdesc.c

@@ -210,6 +210,10 @@ retry:
 		 * shared queue.  We solve this problem by reading pg_class directly
 		 * for the desired tuple.
 		 *
+		 * If the partition recently detached is also dropped, we get no tuple
+		 * from the scan.  In that case, we also retry, and next time through
+		 * here, we don't see that partition anymore.
+		 *
 		 * The other problem is that DETACH CONCURRENTLY is in the process of
 		 * removing a partition, which happens in two steps: first it marks it
 		 * as "detach pending", commits, then unsets relpartbound.  If
@@ -224,8 +228,6 @@ retry:
 			Relation	pg_class;
 			SysScanDesc scan;
 			ScanKeyData key[1];
-			Datum		datum;
-			bool		isnull;
 
 			pg_class = table_open(RelationRelationId, AccessShareLock);
 			ScanKeyInit(&key[0],
@@ -234,17 +236,29 @@ retry:
 						ObjectIdGetDatum(inhrelid));
 			scan = systable_beginscan(pg_class, ClassOidIndexId, true,
 									  NULL, 1, key);
+
+			/*
+			 * We could get one tuple from the scan (the normal case), or zero
+			 * tuples if the table has been dropped meanwhile.
+			 */
 			tuple = systable_getnext(scan);
-			datum = heap_getattr(tuple, Anum_pg_class_relpartbound,
-								 RelationGetDescr(pg_class), &isnull);
-			if (!isnull)
-				boundspec = stringToNode(TextDatumGetCString(datum));
+			if (HeapTupleIsValid(tuple))
+			{
+				Datum		datum;
+				bool		isnull;
+
+				datum = heap_getattr(tuple, Anum_pg_class_relpartbound,
+									 RelationGetDescr(pg_class), &isnull);
+				if (!isnull)
+					boundspec = stringToNode(TextDatumGetCString(datum));
+			}
 			systable_endscan(scan);
 			table_close(pg_class, AccessShareLock);
 
 			/*
-			 * If we still don't get a relpartbound value, then it must be
-			 * because of DETACH CONCURRENTLY.  Restart from the top, as
+			 * If we still don't get a relpartbound value (either because
+			 * boundspec is null or because there was no tuple), then it must
+			 * be because of DETACH CONCURRENTLY.  Restart from the top, as
 			 * explained above.  We only do this once, for two reasons: first,
 			 * only one DETACH CONCURRENTLY session could affect us at a time,
 			 * since each of them would have to wait for the snapshot under
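
To show where that NULL result lands, here is a hedged sketch of the
caller's retry shape, following the structure the commit message attributes
to dbca3469ebf8.  The wrapper build_descriptor_sketch() is a placeholder,
find_inheritance_children() stands in for the more elaborate pg_inherits
scan the real function performs, and the descriptor bookkeeping is elided.

#include "postgres.h"

#include "catalog/pg_inherits.h"
#include "nodes/parsenodes.h"
#include "nodes/pg_list.h"
#include "storage/lockdefs.h"
#include "utils/rel.h"

/* the lookup sketched above the diff */
extern PartitionBoundSpec *fetch_partbound_or_null(Oid inhrelid);

static void
build_descriptor_sketch(Relation rel)
{
	bool		retried = false;
	List	   *inhoids;
	ListCell   *lc;

retry:
	/* Collect the partitions currently attached to the parent. */
	inhoids = find_inheritance_children(RelationGetRelid(rel), NoLock);

	foreach(lc, inhoids)
	{
		Oid			inhrelid = lfirst_oid(lc);
		PartitionBoundSpec *boundspec;

		boundspec = fetch_partbound_or_null(inhrelid);
		if (boundspec == NULL)
		{
			/*
			 * Either relpartbound was nulled by a completed DETACH
			 * CONCURRENTLY, or the pg_class tuple is gone because the
			 * partition was dropped.  One retry suffices: the next
			 * pg_inherits scan no longer lists the departed partition.
			 */
			if (!retried)
			{
				retried = true;
				goto retry;
			}
			/* per the commit message, the pre-14 failure mode */
			elog(ERROR, "missing relpartbound for relation %u", inhrelid);
		}

		/* ... fold boundspec into the descriptor being built ... */
	}
}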