Mirror of https://github.com/postgres/postgres.git (synced 2025-07-18 17:42:25 +03:00)
Load relcache entries' partitioning data on-demand, not immediately.

Formerly the rd_partkey and rd_partdesc data structures were always populated immediately when a relcache entry was built or rebuilt. This patch changes things so that they are populated only when they are first requested. (Hence, callers *must* now always use RelationGetPartitionKey or RelationGetPartitionDesc; just fetching the pointer directly is no longer acceptable.)

This seems to have some performance benefits, but the main reason to do it is that it eliminates a recursive-reload failure that occurs if the partkey or partdesc expressions contain any references to the relation's rowtype (as discovered by Amit Langote). In retrospect, since loading these data structures might result in execution of nearly-arbitrary code via eval_const_expressions, it was a dumb idea to require that to happen during relcache entry rebuild.

Also, fix things so that old copies of a relcache partition descriptor will be dropped when the cache entry's refcount goes to zero. In the previous coding it was possible for such copies to survive for the lifetime of the session, as I'd complained of in a previous discussion. (This management technique still isn't perfect, but it's better than before.) Improve the commentary explaining how that works and why it's safe to hand out direct pointers to these relcache substructures.

In passing, improve RelationBuildPartitionDesc by using the same memory-context-parent-swap approach used by RelationBuildPartitionKey, thereby making it less dependent on strong assumptions about what partition_bounds_copy does. Avoid doing get_rel_relkind in the critical section, too.

Patch by Amit Langote and Tom Lane; Robert Haas deserves some credit for prior work in the area, too. Although this is a pre-existing problem, no back-patch: the patch seems too invasive to be safe to back-patch, and the bug it fixes is a corner case that seems relatively unlikely to cause problems in the field.

Discussion: https://postgr.es/m/CA+HiwqFUzjfj9HEsJtYWcr1SgQ_=iCAvQ=O2Sx6aQxoDu4OiHw@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoY3bRmGB6-DUnoVy5fJoreiBJ43rwMrQRCdPXuKt4Ykaw@mail.gmail.com
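
The on-demand rule described in the message (callers go through an accessor, and the accessor builds the data the first time it is asked) can be sketched in isolation. This is only an illustrative model, not PostgreSQL code: DemoRelation, DemoPartDesc, demo_build_partdesc and demo_get_partdesc are hypothetical names, and plain malloc stands in for the relcache's memory contexts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the relcache structures (not the real types). */
typedef struct DemoPartDesc
{
    int         nparts;
} DemoPartDesc;

typedef struct DemoRelation
{
    bool        is_partitioned; /* models relkind == RELKIND_PARTITIONED_TABLE */
    int         nchildren;      /* what a catalog scan would discover */
    DemoPartDesc *partdesc;     /* stays NULL until first requested */
    int         build_count;    /* number of times the descriptor was built */
} DemoRelation;

/* The expensive step, deferred until somebody actually needs the data. */
static void
demo_build_partdesc(DemoRelation *rel)
{
    DemoPartDesc *pd;

    assert(rel->partdesc == NULL);  /* never build over an existing copy */
    pd = malloc(sizeof(DemoPartDesc));
    if (pd == NULL)
        abort();
    pd->nparts = rel->nchildren;
    rel->partdesc = pd;
    rel->build_count++;
}

/* Accessor: callers use this instead of reading rel->partdesc directly. */
static DemoPartDesc *
demo_get_partdesc(DemoRelation *rel)
{
    if (!rel->is_partitioned)
        return NULL;
    if (rel->partdesc == NULL)
        demo_build_partdesc(rel);
    return rel->partdesc;
}
```

In this model, reading rel->partdesc directly before the first demo_get_partdesc call would yield NULL, which is the reason the message insists that callers always use the accessor.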
@@ -35,7 +35,6 @@
 #include "utils/hashutils.h"
 #include "utils/lsyscache.h"
 #include "utils/partcache.h"
-#include "utils/rel.h"
 #include "utils/ruleutils.h"
 #include "utils/snapmgr.h"
 #include "utils/syscache.h"
@@ -775,6 +774,11 @@ partition_bounds_equal(int partnatts, int16 *parttyplen, bool *parttypbyval,
 /*
  * Return a copy of given PartitionBoundInfo structure. The data types of bounds
  * are described by given partition key specification.
+ *
+ * Note: it's important that this function and its callees not do any catalog
+ * access, nor anything else that would result in allocating memory other than
+ * the returned data structure. Since this is called in a long-lived context,
+ * that would result in unwanted memory leaks.
  */
 PartitionBoundInfo
 partition_bounds_copy(PartitionBoundInfo src,

@@ -47,17 +47,48 @@ typedef struct PartitionDirectoryEntry
 	PartitionDesc pd;
 } PartitionDirectoryEntry;
 
+static void RelationBuildPartitionDesc(Relation rel);
+
+
+/*
+ * RelationGetPartitionDesc -- get partition descriptor, if relation is partitioned
+ *
+ * Note: we arrange for partition descriptors to not get freed until the
+ * relcache entry's refcount goes to zero (see hacks in RelationClose,
+ * RelationClearRelation, and RelationBuildPartitionDesc). Therefore, even
+ * though we hand back a direct pointer into the relcache entry, it's safe
+ * for callers to continue to use that pointer as long as (a) they hold the
+ * relation open, and (b) they hold a relation lock strong enough to ensure
+ * that the data doesn't become stale.
+ */
+PartitionDesc
+RelationGetPartitionDesc(Relation rel)
+{
+	if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
+		return NULL;
+
+	if (unlikely(rel->rd_partdesc == NULL))
+		RelationBuildPartitionDesc(rel);
+
+	return rel->rd_partdesc;
+}
+
 /*
  * RelationBuildPartitionDesc
  *		Form rel's partition descriptor, and store in relcache entry
  *
- * Note: the descriptor won't be flushed from the cache by
- * RelationClearRelation() unless it's changed because of
- * addition or removal of a partition. Hence, code holding a lock
- * that's sufficient to prevent that can assume that rd_partdesc
- * won't change underneath it.
+ * Partition descriptor is a complex structure; to avoid complicated logic to
+ * free individual elements whenever the relcache entry is flushed, we give it
+ * its own memory context, a child of CacheMemoryContext, which can easily be
+ * deleted on its own. To avoid leaking memory in that context in case of an
+ * error partway through this function, the context is initially created as a
+ * child of CurTransactionContext and only re-parented to CacheMemoryContext
+ * at the end, when no further errors are possible. Also, we don't make this
+ * context the current context except in very brief code sections, out of fear
+ * that some of our callees allocate memory on their own which would be leaked
+ * permanently.
  */
-void
+static void
 RelationBuildPartitionDesc(Relation rel)
 {
 	PartitionDesc partdesc;
@@ -65,10 +96,12 @@ RelationBuildPartitionDesc(Relation rel)
 	List *inhoids;
 	PartitionBoundSpec **boundspecs = NULL;
 	Oid *oids = NULL;
+	bool *is_leaf = NULL;
 	ListCell *cell;
 	int i,
 		nparts;
 	PartitionKey key = RelationGetPartitionKey(rel);
+	MemoryContext new_pdcxt;
 	MemoryContext oldcxt;
 	int *mapping;
 
@@ -81,10 +114,11 @@ RelationBuildPartitionDesc(Relation rel)
 	inhoids = find_inheritance_children(RelationGetRelid(rel), NoLock);
 	nparts = list_length(inhoids);
 
-	/* Allocate arrays for OIDs and boundspecs. */
+	/* Allocate working arrays for OIDs, leaf flags, and boundspecs. */
 	if (nparts > 0)
 	{
-		oids = palloc(nparts * sizeof(Oid));
+		oids = (Oid *) palloc(nparts * sizeof(Oid));
+		is_leaf = (bool *) palloc(nparts * sizeof(bool));
 		boundspecs = palloc(nparts * sizeof(PartitionBoundSpec *));
 	}
 
@@ -172,65 +206,73 @@ RelationBuildPartitionDesc(Relation rel)
 
 		/* Save results. */
 		oids[i] = inhrelid;
+		is_leaf[i] = (get_rel_relkind(inhrelid) != RELKIND_PARTITIONED_TABLE);
 		boundspecs[i] = boundspec;
 		++i;
 	}
 
-	/* Assert we aren't about to leak any old data structure */
-	Assert(rel->rd_pdcxt == NULL);
-	Assert(rel->rd_partdesc == NULL);
+	/*
+	 * Create PartitionBoundInfo and mapping, working in the caller's context.
+	 * This could fail, but we haven't done any damage if so.
+	 */
+	if (nparts > 0)
+		boundinfo = partition_bounds_create(boundspecs, nparts, key, &mapping);
 
 	/*
-	 * Now build the actual relcache partition descriptor. Note that the
-	 * order of operations here is fairly critical. If we fail partway
-	 * through this code, we won't have leaked memory because the rd_pdcxt is
-	 * attached to the relcache entry immediately, so it'll be freed whenever
-	 * the entry is rebuilt or destroyed. However, we don't assign to
-	 * rd_partdesc until the cached data structure is fully complete and
-	 * valid, so that no other code might try to use it.
+	 * Now build the actual relcache partition descriptor, copying all the
+	 * data into a new, small context. As per above comment, we don't make
+	 * this a long-lived context until it's finished.
 	 */
-	rel->rd_pdcxt = AllocSetContextCreate(CacheMemoryContext,
-										  "partition descriptor",
-										  ALLOCSET_SMALL_SIZES);
-	MemoryContextCopyAndSetIdentifier(rel->rd_pdcxt,
+	new_pdcxt = AllocSetContextCreate(CurTransactionContext,
+									  "partition descriptor",
+									  ALLOCSET_SMALL_SIZES);
+	MemoryContextCopyAndSetIdentifier(new_pdcxt,
 									  RelationGetRelationName(rel));
 
 	partdesc = (PartitionDescData *)
-		MemoryContextAllocZero(rel->rd_pdcxt, sizeof(PartitionDescData));
+		MemoryContextAllocZero(new_pdcxt, sizeof(PartitionDescData));
 	partdesc->nparts = nparts;
 	/* If there are no partitions, the rest of the partdesc can stay zero */
 	if (nparts > 0)
 	{
-		/* Create PartitionBoundInfo, using the caller's context. */
-		boundinfo = partition_bounds_create(boundspecs, nparts, key, &mapping);
-
-		/* Now copy all info into relcache's partdesc. */
-		oldcxt = MemoryContextSwitchTo(rel->rd_pdcxt);
+		oldcxt = MemoryContextSwitchTo(new_pdcxt);
 		partdesc->boundinfo = partition_bounds_copy(boundinfo, key);
 		partdesc->oids = (Oid *) palloc(nparts * sizeof(Oid));
 		partdesc->is_leaf = (bool *) palloc(nparts * sizeof(bool));
-		MemoryContextSwitchTo(oldcxt);
 
 		/*
 		 * Assign OIDs from the original array into mapped indexes of the
 		 * result array. The order of OIDs in the former is defined by the
 		 * catalog scan that retrieved them, whereas that in the latter is
 		 * defined by canonicalized representation of the partition bounds.
-		 *
-		 * Also record leaf-ness of each partition. For this we use
-		 * get_rel_relkind() which may leak memory, so be sure to run it in
-		 * the caller's context.
+		 * Also save leaf-ness of each partition.
 		 */
 		for (i = 0; i < nparts; i++)
 		{
 			int index = mapping[i];
 
 			partdesc->oids[index] = oids[i];
-			partdesc->is_leaf[index] =
-				(get_rel_relkind(oids[i]) != RELKIND_PARTITIONED_TABLE);
+			partdesc->is_leaf[index] = is_leaf[i];
 		}
+		MemoryContextSwitchTo(oldcxt);
 	}
 
+	/*
+	 * We have a fully valid partdesc ready to store into the relcache.
+	 * Reparent it so it has the right lifespan.
+	 */
+	MemoryContextSetParent(new_pdcxt, CacheMemoryContext);
+
+	/*
+	 * But first, a kluge: if there's an old rd_pdcxt, it contains an old
+	 * partition descriptor that may still be referenced somewhere. Preserve
+	 * it, while not leaking it, by reattaching it as a child context of the
+	 * new rd_pdcxt. Eventually it will get dropped by either RelationClose
+	 * or RelationClearRelation.
+	 */
+	if (rel->rd_pdcxt != NULL)
+		MemoryContextSetParent(rel->rd_pdcxt, new_pdcxt);
+	rel->rd_pdcxt = new_pdcxt;
 	rel->rd_partdesc = partdesc;
 }
 
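
The build-then-reparent sequence at the end of RelationBuildPartitionDesc can be modeled with a toy context tree. The sketch below is an assumption-laden illustration rather than PostgreSQL's memory-context implementation: DemoContext and every demo_* function are hypothetical, a context here holds no allocations, and deleting a context simply deletes its children first. It shows the two properties the comments rely on: a context still parented to the short-lived (transaction) context is cleaned up if construction is abandoned, and an old descriptor context reattached under the new one lives exactly as long as the new one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy context node; hypothetical, far simpler than a real MemoryContext. */
typedef struct DemoContext
{
    struct DemoContext *parent;
    struct DemoContext *firstchild;
    struct DemoContext *nextsibling;
    int        *deleted_flag;   /* set to 1 when this context is deleted */
} DemoContext;

static DemoContext *
demo_context_create(DemoContext *parent, int *deleted_flag)
{
    DemoContext *cxt = calloc(1, sizeof(DemoContext));

    if (cxt == NULL)
        abort();
    cxt->deleted_flag = deleted_flag;
    if (parent != NULL)
    {
        cxt->parent = parent;
        cxt->nextsibling = parent->firstchild;
        parent->firstchild = cxt;
    }
    return cxt;
}

/* Unlink cxt from its current parent, if any, and attach it to new_parent. */
static void
demo_context_set_parent(DemoContext *cxt, DemoContext *new_parent)
{
    if (cxt->parent != NULL)
    {
        DemoContext **link = &cxt->parent->firstchild;

        while (*link != cxt)
            link = &(*link)->nextsibling;
        *link = cxt->nextsibling;
    }
    cxt->parent = new_parent;
    cxt->nextsibling = NULL;
    if (new_parent != NULL)
    {
        cxt->nextsibling = new_parent->firstchild;
        new_parent->firstchild = cxt;
    }
}

static void demo_context_delete(DemoContext *cxt);

/* Delete every descendant of cxt but keep cxt itself (end of transaction). */
static void
demo_context_delete_children(DemoContext *cxt)
{
    while (cxt->firstchild != NULL)
        demo_context_delete(cxt->firstchild);
}

/* Delete cxt and, before that, everything below it. */
static void
demo_context_delete(DemoContext *cxt)
{
    demo_context_delete_children(cxt);
    demo_context_set_parent(cxt, NULL);
    if (cxt->deleted_flag != NULL)
        *cxt->deleted_flag = 1;
    free(cxt);
}

/*
 * Build under the short-lived parent, then move the finished context under
 * the long-lived parent and keep any old context alive as its child.
 */
static DemoContext *
demo_install_descriptor(DemoContext *txn, DemoContext *cache,
                        DemoContext *old_cxt, int *deleted_flag)
{
    DemoContext *new_cxt = demo_context_create(txn, deleted_flag);

    /* Fallible construction work would happen here, still under txn. */

    demo_context_set_parent(new_cxt, cache);
    if (old_cxt != NULL)
        demo_context_set_parent(old_cxt, new_cxt);
    return new_cxt;
}
```

Calling demo_install_descriptor twice for the same relation leaves the first context as a child of the second, so one demo_context_delete on the newest context releases both; a context created under txn and never reparented goes away with demo_context_delete_children(txn).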