Mirror of https://github.com/postgres/postgres.git (synced 2025-11-21 00:42:43 +03:00)
Fix relcache inconsistency hazard in partition detach
During queries coming from ri_triggers.c, we need to omit partitions that are marked pending detach -- otherwise, the RI query is tricked into allowing a row into the referencing table whose corresponding row is in the detached partition. That is bogus: once the detach operation completes, the row becomes an orphan.

However, the code was not doing that in repeatable-read transactions, because relcache kept a copy of the partition descriptor that included the partition, and used it in the RI query. This commit changes the partdesc cache code to only keep descriptors that aren't dependent on a snapshot (namely: those where no detached partitions exist, and those where detached partitions are included). When a partdesc-without-detached-partitions is requested, we create one afresh each time; also, those partdescs are stored in PortalContext instead of CacheMemoryContext.

find_inheritance_children gets a new output *detached_exist boolean, which indicates whether any partition marked pending-detach is found. Its "include_detached" input flag is changed to "omit_detached", because that name captures the desired semantics more naturally. CreatePartitionDirectory() and RelationGetPartitionDesc() arguments are identically renamed.

This was noticed because a buildfarm member that runs with relcache clobbering, which would not keep the improperly cached partdesc, broke one test, which led us to realize that the expected output of that test was bogus. This commit also corrects that expected output.

Author: Amit Langote <amitlangote09@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/3269784.1617215412@sss.pgh.pa.us
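The caching rule described in the message can be sketched as a small truth table. This is only an illustration under stated assumptions: the `Toy*` names and both functions are hypothetical and are not PostgreSQL's actual relcache code; the two lifetimes merely stand in for the CacheMemoryContext and PortalContext mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels standing in for the two memory contexts named in the
 * commit message; these are not PostgreSQL identifiers. */
typedef enum ToyLifetime
{
    TOY_CACHE_LIFETIME,     /* long-lived, may be reused by later queries */
    TOY_PORTAL_LIFETIME     /* rebuilt on every request, short-lived */
} ToyLifetime;

/*
 * A descriptor depends on a snapshot only when the caller asked to omit
 * detach-pending partitions AND at least one such partition exists.
 * Every other combination is snapshot-independent and may be cached.
 */
static bool
toy_partdesc_is_cacheable(bool omit_detached, bool detached_exist)
{
    return !(omit_detached && detached_exist);
}

/* Pick where a freshly built descriptor would live in this toy model. */
static ToyLifetime
toy_partdesc_lifetime(bool omit_detached, bool detached_exist)
{
    return toy_partdesc_is_cacheable(omit_detached, detached_exist)
        ? TOY_CACHE_LIFETIME
        : TOY_PORTAL_LIFETIME;
}

/* Walk all four input combinations; aborts if the rule is violated. */
static void
toy_check_policy(void)
{
    assert(toy_partdesc_lifetime(false, false) == TOY_CACHE_LIFETIME);
    assert(toy_partdesc_lifetime(false, true) == TOY_CACHE_LIFETIME);
    assert(toy_partdesc_lifetime(true, false) == TOY_CACHE_LIFETIME);
    assert(toy_partdesc_lifetime(true, true) == TOY_PORTAL_LIFETIME);
}
```

Only the last combination (omit requested, a detach-pending partition exists) yields a result that can differ between snapshots, which is why it is the only one kept out of the long-lived cache in this sketch.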
@@ -52,13 +52,19 @@ typedef struct SeenRelsEntry
  * then no locks are acquired, but caller must beware of race conditions
  * against possible DROPs of child relations.
  *
- * include_detached says to include all partitions, even if they're marked
- * detached. Passing it as false means they might or might not be included,
- * depending on the visibility of the pg_inherits row for the active snapshot.
+ * If a partition's pg_inherits row is marked "detach pending",
+ * *detached_exist (if not null) is set true, otherwise it is set false.
+ *
+ * If omit_detached is true and there is an active snapshot (not the same as
+ * the catalog snapshot used to scan pg_inherits!) and a pg_inherits tuple
+ * marked "detach pending" is visible to that snapshot, then that partition is
+ * omitted from the output list. This makes partitions invisible depending on
+ * whether the transaction that marked those partitions as detached appears
+ * committed to the active snapshot.
  */
 List *
-find_inheritance_children(Oid parentrelId, bool include_detached,
-						  LOCKMODE lockmode)
+find_inheritance_children(Oid parentrelId, bool omit_detached,
+						  LOCKMODE lockmode, bool *detached_exist)
 {
 	List *list = NIL;
 	Relation relation;
@@ -78,6 +84,9 @@ find_inheritance_children(Oid parentrelId, bool include_detached,
 	if (!has_subclass(parentrelId))
 		return NIL;
 
+	if (detached_exist)
+		*detached_exist = false;
+
 	/*
 	 * Scan pg_inherits and build a working array of subclass OIDs.
 	 */
@@ -99,29 +108,35 @@ find_inheritance_children(Oid parentrelId, bool include_detached,
 	{
 		/*
 		 * Cope with partitions concurrently being detached. When we see a
-		 * partition marked "detach pending", we only include it in the set of
-		 * visible partitions if caller requested all detached partitions, or
-		 * if its pg_inherits tuple's xmin is still visible to the active
-		 * snapshot.
+		 * partition marked "detach pending", we omit it from the returned set
+		 * of visible partitions if caller requested that and the tuple's xmin
+		 * does not appear in progress to the active snapshot. (If there's no
+		 * active snapshot set, that means we're not running a user query, so
+		 * it's OK to always include detached partitions in that case; if the
+		 * xmin is still running to the active snapshot, then the partition
+		 * has not been detached yet and so we include it.)
 		 *
-		 * The reason for this check is that we want to avoid seeing the
+		 * The reason for this hack is that we want to avoid seeing the
 		 * partition as alive in RI queries during REPEATABLE READ or
-		 * SERIALIZABLE transactions. (If there's no active snapshot set,
-		 * that means we're not running a user query, so it's OK to always
-		 * include detached partitions in that case.)
+		 * SERIALIZABLE transactions: such queries use a different snapshot
+		 * than the one used by regular (user) queries.
 		 */
-		if (((Form_pg_inherits) GETSTRUCT(inheritsTuple))->inhdetachpending &&
-			!include_detached &&
-			ActiveSnapshotSet())
+		if (((Form_pg_inherits) GETSTRUCT(inheritsTuple))->inhdetachpending)
 		{
-			TransactionId xmin;
-			Snapshot snap;
+			if (detached_exist)
+				*detached_exist = true;
 
-			xmin = HeapTupleHeaderGetXmin(inheritsTuple->t_data);
-			snap = GetActiveSnapshot();
+			if (omit_detached && ActiveSnapshotSet())
+			{
+				TransactionId xmin;
+				Snapshot snap;
 
-			if (!XidInMVCCSnapshot(xmin, snap))
-				continue;
+				xmin = HeapTupleHeaderGetXmin(inheritsTuple->t_data);
+				snap = GetActiveSnapshot();
+
+				if (!XidInMVCCSnapshot(xmin, snap))
+					continue;
+			}
 		}
 
 		inhrelid = ((Form_pg_inherits) GETSTRUCT(inheritsTuple))->inhrelid;
@@ -235,8 +250,8 @@ find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, List **numparents)
 		ListCell *lc;
 
 		/* Get the direct children of this rel */
-		currentchildren = find_inheritance_children(currentrel, false,
-													lockmode);
+		currentchildren = find_inheritance_children(currentrel, true,
+													lockmode, NULL);
 
 		/*
 		 * Add to the queue only those children not already seen. This avoids
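The filtering rule in the hunks above can be exercised outside the backend with a toy model. The sketch below is an assumption-laden simplification, not the real function: `ToySnapshot`, `ToyInheritsRow`, `toy_xid_in_snapshot` and `toy_find_children` are hypothetical names, a snapshot is reduced to a plain list of in-progress xids, and a NULL snapshot pointer stands in for "no active snapshot set".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int ToyXid;

/* Toy snapshot: just the xids it considers still in progress. */
typedef struct ToySnapshot
{
    const ToyXid *in_progress;
    size_t        n_in_progress;
} ToySnapshot;

/* Toy pg_inherits row: child id, detach-pending flag, and tuple xmin. */
typedef struct ToyInheritsRow
{
    unsigned int child;
    bool         detach_pending;
    ToyXid       xmin;
} ToyInheritsRow;

/* Simplified stand-in for an "is this xid in progress per snapshot" test. */
static bool
toy_xid_in_snapshot(ToyXid xid, const ToySnapshot *snap)
{
    for (size_t i = 0; i < snap->n_in_progress; i++)
        if (snap->in_progress[i] == xid)
            return true;
    return false;
}

/*
 * Copy visible children into out[] (which must hold nrows entries) and
 * return how many were kept.  active == NULL models "no active snapshot".
 * detached_exist may be NULL if the caller does not care.
 */
static size_t
toy_find_children(const ToyInheritsRow *rows, size_t nrows,
                  bool omit_detached, const ToySnapshot *active,
                  unsigned int *out, bool *detached_exist)
{
    size_t n = 0;

    assert(out != NULL);

    if (detached_exist)
        *detached_exist = false;

    for (size_t i = 0; i < nrows; i++)
    {
        if (rows[i].detach_pending)
        {
            /* Reported regardless of whether the row ends up omitted. */
            if (detached_exist)
                *detached_exist = true;

            /* Omit only if the detaching xid no longer looks in progress. */
            if (omit_detached && active != NULL &&
                !toy_xid_in_snapshot(rows[i].xmin, active))
                continue;
        }
        out[n++] = rows[i].child;
    }
    return n;
}
```

In this model a detach-pending child is dropped only when all three conditions hold: the caller asked for omission, a snapshot is active, and that snapshot no longer sees the row's xmin as running. The `find_all_inheritors` call site in the last hunk corresponds to passing `true` for `omit_detached` and `NULL` for `detached_exist`.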