
Fix creation of partition descriptor during concurrent detach+drop

If a partition undergoes DETACH CONCURRENTLY immediately followed by
DROP, this could cause a problem for a concurrent transaction
recomputing the partition descriptor when running a prepared statement,
because it tries to dereference a pointer to a tuple that's not found in
a catalog scan.

The existing retry logic added in commit dbca3469ebf8 is sufficient to
cope with the overall problem, provided we don't try to dereference a
non-existent heap tuple.
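
In essence, the fix validates the tuple returned by the catalog scan before
dereferencing it, and lets a NULL boundspec trigger the existing retry.  A
condensed sketch, using the variable names from RelationBuildPartitionDesc
(scan setup and the retry loop are omitted here; see the hunks below):

    /* Zero tuples is possible if the partition was dropped meanwhile. */
    tuple = systable_getnext(scan);
    if (HeapTupleIsValid(tuple))
    {
        Datum       datum;
        bool        isnull;

        datum = heap_getattr(tuple, Anum_pg_class_relpartbound,
                             RelationGetDescr(pg_class), &isnull);
        if (!isnull)
            boundspec = stringToNode(TextDatumGetCString(datum));
    }
    /*
     * A NULL boundspec at this point means DETACH CONCURRENTLY (possibly
     * followed by DROP); the code goes back to the retry label above.
     */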

Arguably, the code in RelationBuildPartitionDesc() has been wrong all
along, since no check was added in commit 898e5e3290a7 against receiving
a NULL tuple from the catalog scan; that bug has only become
user-visible with DETACH CONCURRENTLY which was added in branch 14.
Therefore, even though there's no known mechanism to cause a crash
because of this, backpatch the addition of such a check to all supported
branches.  In branches prior to 14, this would cause the code to fail
with a "missing relpartbound for relation XYZ" error instead of
crashing; that's okay, because there are no reports of such behavior
anyway.

Author: Kuntal Ghosh <kuntalghosh.2007@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/18559-b48286d2eacd9a4e@postgresql.org
Alvaro Herrera 2024-08-12 18:17:56 -04:00
parent 9db6650a5a
commit 1b9dd6b05a


@@ -210,6 +210,10 @@ retry:
          * shared queue.  We solve this problem by reading pg_class directly
          * for the desired tuple.
          *
+         * If the partition recently detached is also dropped, we get no tuple
+         * from the scan.  In that case, we also retry, and next time through
+         * here, we don't see that partition anymore.
+         *
          * The other problem is that DETACH CONCURRENTLY is in the process of
          * removing a partition, which happens in two steps: first it marks it
          * as "detach pending", commits, then unsets relpartbound.  If
@@ -224,8 +228,6 @@ retry:
             Relation    pg_class;
             SysScanDesc scan;
             ScanKeyData key[1];
-            Datum       datum;
-            bool        isnull;
 
             pg_class = table_open(RelationRelationId, AccessShareLock);
             ScanKeyInit(&key[0],
@@ -234,17 +236,29 @@ retry:
                         ObjectIdGetDatum(inhrelid));
             scan = systable_beginscan(pg_class, ClassOidIndexId, true,
                                       NULL, 1, key);
+
+            /*
+             * We could get one tuple from the scan (the normal case), or zero
+             * tuples if the table has been dropped meanwhile.
+             */
             tuple = systable_getnext(scan);
-            datum = heap_getattr(tuple, Anum_pg_class_relpartbound,
-                                 RelationGetDescr(pg_class), &isnull);
-            if (!isnull)
-                boundspec = stringToNode(TextDatumGetCString(datum));
+            if (HeapTupleIsValid(tuple))
+            {
+                Datum       datum;
+                bool        isnull;
+
+                datum = heap_getattr(tuple, Anum_pg_class_relpartbound,
+                                     RelationGetDescr(pg_class), &isnull);
+                if (!isnull)
+                    boundspec = stringToNode(TextDatumGetCString(datum));
+            }
             systable_endscan(scan);
             table_close(pg_class, AccessShareLock);
 
             /*
-             * If we still don't get a relpartbound value, then it must be
-             * because of DETACH CONCURRENTLY.  Restart from the top, as
+             * If we still don't get a relpartbound value (either because
+             * boundspec is null or because there was no tuple), then it must
+             * be because of DETACH CONCURRENTLY.  Restart from the top, as
              * explained above.  We only do this once, for two reasons: first,
              * only one DETACH CONCURRENTLY session could affect us at a time,
              * since each of them would have to wait for the snapshot under