mirror of https://github.com/postgres/postgres.git synced 2025-07-18 17:42:25 +03:00

Load relcache entries' partitioning data on-demand, not immediately.

Formerly the rd_partkey and rd_partdesc data structures were always
populated immediately when a relcache entry was built or rebuilt.
This patch changes things so that they are populated only when they
are first requested.  (Hence, callers *must* now always use
RelationGetPartitionKey or RelationGetPartitionDesc; just fetching
the pointer directly is no longer acceptable.)
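
The on-demand pattern described above can be sketched in isolation. This is a minimal illustration, not the PostgreSQL code: ToyRelation, ToyPartDesc, toy_build_partdesc and toy_get_partdesc are hypothetical stand-ins, and malloc replaces the memory-context machinery. It shows why reading the pointer directly is unsafe (it may still be NULL) while the accessor always returns a populated descriptor for a partitioned relation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not the PostgreSQL structs. */
typedef struct ToyPartDesc
{
    int         nparts;
} ToyPartDesc;

typedef struct ToyRelation
{
    bool        is_partitioned;
    int         nchildren;      /* pretend catalog contents */
    ToyPartDesc *partdesc;      /* NULL until first requested */
    int         build_count;    /* how many times we built the descriptor */
} ToyRelation;

/* Build the descriptor and store it in the cache entry. */
static void
toy_build_partdesc(ToyRelation *rel)
{
    ToyPartDesc *pd = malloc(sizeof(ToyPartDesc));

    assert(pd != NULL);
    pd->nparts = rel->nchildren;
    rel->partdesc = pd;
    rel->build_count++;
}

/* Accessor: the only supported way to obtain the descriptor. */
ToyPartDesc *
toy_get_partdesc(ToyRelation *rel)
{
    if (!rel->is_partitioned)
        return NULL;
    if (rel->partdesc == NULL)  /* first request: populate lazily */
        toy_build_partdesc(rel);
    return rel->partdesc;
}
```

In this sketch the expensive build step runs at most once per cache entry, and never for entries whose descriptor is not requested.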

This seems to have some performance benefits, but the main reason to do
it is that it eliminates a recursive-reload failure that occurs if the
partkey or partdesc expressions contain any references to the relation's
rowtype (as discovered by Amit Langote).  In retrospect, since loading
these data structures might result in execution of nearly-arbitrary code
via eval_const_expressions, it was a dumb idea to require that to happen
during relcache entry rebuild.

Also, fix things so that old copies of a relcache partition descriptor
will be dropped when the cache entry's refcount goes to zero.  In the
previous coding it was possible for such copies to survive for the
lifetime of the session, as I'd complained of in a previous discussion.
(This management technique still isn't perfect, but it's better than
before.)  Improve the commentary explaining how that works and why
it's safe to hand out direct pointers to these relcache substructures.
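
The lifetime rule for superseded descriptors can be modeled roughly as follows. This is a simplified sketch under stated assumptions: ToyRel, ToyDesc and the toy_* functions are hypothetical, and a singly linked "older" chain stands in for the child memory contexts the patch actually uses. The point it illustrates is that a pointer obtained while the relation is open stays valid across a rebuild, and old copies are released once the reference count reaches zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model; the real code uses memory contexts, not a list. */
typedef struct ToyDesc
{
    int         version;
    struct ToyDesc *older;      /* superseded copies, possibly still in use */
} ToyDesc;

typedef struct ToyRel
{
    int         refcount;
    int         next_version;
    ToyDesc    *desc;           /* current descriptor */
} ToyRel;

static int  toy_live_descs = 0; /* number of descriptors not yet freed */

/* Install a new descriptor, keeping the old one reachable from it. */
void
toy_rebuild(ToyRel *rel)
{
    ToyDesc    *d = malloc(sizeof(ToyDesc));

    assert(d != NULL);
    d->version = ++rel->next_version;
    d->older = rel->desc;
    rel->desc = d;
    toy_live_descs++;
}

/* Free every superseded descriptor hanging off the current one. */
static void
toy_drop_older(ToyRel *rel)
{
    ToyDesc    *d;

    if (rel->desc == NULL)
        return;
    d = rel->desc->older;
    rel->desc->older = NULL;
    while (d != NULL)
    {
        ToyDesc    *next = d->older;

        free(d);
        toy_live_descs--;
        d = next;
    }
}

void
toy_open(ToyRel *rel)
{
    rel->refcount++;
}

void
toy_close(ToyRel *rel)
{
    assert(rel->refcount > 0);
    rel->refcount--;
    if (rel->refcount == 0)     /* nobody can still hold an old pointer */
        toy_drop_older(rel);
}
```

The model omits the lock requirement stated in the patch's comments: holding the relation open keeps an old pointer valid, but only a sufficiently strong lock keeps its contents from going stale.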

In passing, improve RelationBuildPartitionDesc by using the same
memory-context-parent-swap approach used by RelationBuildPartitionKey,
thereby making it less dependent on strong assumptions about what
partition_bounds_copy does.  Avoid doing get_rel_relkind in the
critical section, too.
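
The parent-swap idea can be sketched with a toy context tree. Everything here is an assumption made for illustration: ToyContext and the toy_* functions are hypothetical and only mimic the shape of a memory-context hierarchy. The new context starts as a child of a short-lived parent, so a failure partway through leaves it to be cleaned up with that parent; only after the build succeeds is it re-parented to the long-lived one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy context tree; not the PostgreSQL MemoryContext API. */
typedef struct ToyContext
{
    struct ToyContext *parent;
    struct ToyContext *first_child;
    struct ToyContext *next_sibling;
} ToyContext;

static int  toy_live_contexts = 0;

/* Detach cxt from its current parent's child list, if it has a parent. */
static void
toy_unlink(ToyContext *cxt)
{
    ToyContext **link;

    if (cxt->parent == NULL)
        return;
    link = &cxt->parent->first_child;
    while (*link != cxt)
        link = &(*link)->next_sibling;
    *link = cxt->next_sibling;
    cxt->parent = NULL;
    cxt->next_sibling = NULL;
}

/* Move cxt (and implicitly its subtree) under new_parent. */
void
toy_set_parent(ToyContext *cxt, ToyContext *new_parent)
{
    toy_unlink(cxt);
    cxt->parent = new_parent;
    if (new_parent != NULL)
    {
        cxt->next_sibling = new_parent->first_child;
        new_parent->first_child = cxt;
    }
}

ToyContext *
toy_create(ToyContext *parent)
{
    ToyContext *cxt = calloc(1, sizeof(ToyContext));

    assert(cxt != NULL);
    toy_live_contexts++;
    toy_set_parent(cxt, parent);
    return cxt;
}

/* Recursively delete all children, as happens at end of the short-lived scope. */
void
toy_delete_children(ToyContext *cxt)
{
    while (cxt->first_child != NULL)
    {
        ToyContext *child = cxt->first_child;

        toy_unlink(child);
        toy_delete_children(child);
        free(child);
        toy_live_contexts--;
    }
}

/* Build under the transient parent; re-parent only once nothing can fail. */
ToyContext *
toy_build(ToyContext *transient, ToyContext *cache, int fail)
{
    ToyContext *newcxt = toy_create(transient);

    if (fail)
        return NULL;            /* simulated error: transient still owns newcxt */
    toy_set_parent(newcxt, cache);
    return newcxt;
}
```

With this ordering the long-lived parent never owns a half-built context, so the builder needs no error-cleanup path of its own.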

Patch by Amit Langote and Tom Lane; Robert Haas deserves some credit
for prior work in the area, too.  Although this is a pre-existing
problem, no back-patch: the patch seems too invasive to be safe to
back-patch, and the bug it fixes is a corner case that seems
relatively unlikely to cause problems in the field.

Discussion: https://postgr.es/m/CA+HiwqFUzjfj9HEsJtYWcr1SgQ_=iCAvQ=O2Sx6aQxoDu4OiHw@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoY3bRmGB6-DUnoVy5fJoreiBJ43rwMrQRCdPXuKt4Ykaw@mail.gmail.com
Tom Lane
2019-12-25 14:43:13 -05:00
parent 8ce3aa9b59
commit 5b9312378e
12 changed files with 186 additions and 121 deletions


@@ -35,7 +35,6 @@
#include "utils/hashutils.h"
#include "utils/lsyscache.h"
#include "utils/partcache.h"
#include "utils/rel.h"
#include "utils/ruleutils.h"
#include "utils/snapmgr.h"
#include "utils/syscache.h"
@@ -775,6 +774,11 @@ partition_bounds_equal(int partnatts, int16 *parttyplen, bool *parttypbyval,
/*
* Return a copy of given PartitionBoundInfo structure. The data types of bounds
* are described by given partition key specification.
*
* Note: it's important that this function and its callees not do any catalog
* access, nor anything else that would result in allocating memory other than
* the returned data structure. Since this is called in a long-lived context,
* that would result in unwanted memory leaks.
*/
PartitionBoundInfo
partition_bounds_copy(PartitionBoundInfo src,


@@ -47,17 +47,48 @@ typedef struct PartitionDirectoryEntry
PartitionDesc pd;
} PartitionDirectoryEntry;
static void RelationBuildPartitionDesc(Relation rel);
/*
* RelationGetPartitionDesc -- get partition descriptor, if relation is partitioned
*
* Note: we arrange for partition descriptors to not get freed until the
* relcache entry's refcount goes to zero (see hacks in RelationClose,
* RelationClearRelation, and RelationBuildPartitionDesc). Therefore, even
* though we hand back a direct pointer into the relcache entry, it's safe
* for callers to continue to use that pointer as long as (a) they hold the
* relation open, and (b) they hold a relation lock strong enough to ensure
* that the data doesn't become stale.
*/
PartitionDesc
RelationGetPartitionDesc(Relation rel)
{
if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
return NULL;
if (unlikely(rel->rd_partdesc == NULL))
RelationBuildPartitionDesc(rel);
return rel->rd_partdesc;
}
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor, and store in relcache entry
*
* Note: the descriptor won't be flushed from the cache by
* RelationClearRelation() unless it's changed because of
* addition or removal of a partition. Hence, code holding a lock
* that's sufficient to prevent that can assume that rd_partdesc
* won't change underneath it.
* Partition descriptor is a complex structure; to avoid complicated logic to
* free individual elements whenever the relcache entry is flushed, we give it
* its own memory context, a child of CacheMemoryContext, which can easily be
* deleted on its own. To avoid leaking memory in that context in case of an
* error partway through this function, the context is initially created as a
* child of CurTransactionContext and only re-parented to CacheMemoryContext
* at the end, when no further errors are possible. Also, we don't make this
* context the current context except in very brief code sections, out of fear
* that some of our callees allocate memory on their own which would be leaked
* permanently.
*/
void
static void
RelationBuildPartitionDesc(Relation rel)
{
PartitionDesc partdesc;
@@ -65,10 +96,12 @@ RelationBuildPartitionDesc(Relation rel)
List *inhoids;
PartitionBoundSpec **boundspecs = NULL;
Oid *oids = NULL;
bool *is_leaf = NULL;
ListCell *cell;
int i,
nparts;
PartitionKey key = RelationGetPartitionKey(rel);
MemoryContext new_pdcxt;
MemoryContext oldcxt;
int *mapping;
@@ -81,10 +114,11 @@ RelationBuildPartitionDesc(Relation rel)
inhoids = find_inheritance_children(RelationGetRelid(rel), NoLock);
nparts = list_length(inhoids);
/* Allocate arrays for OIDs and boundspecs. */
/* Allocate working arrays for OIDs, leaf flags, and boundspecs. */
if (nparts > 0)
{
oids = palloc(nparts * sizeof(Oid));
oids = (Oid *) palloc(nparts * sizeof(Oid));
is_leaf = (bool *) palloc(nparts * sizeof(bool));
boundspecs = palloc(nparts * sizeof(PartitionBoundSpec *));
}
@@ -172,65 +206,73 @@ RelationBuildPartitionDesc(Relation rel)
/* Save results. */
oids[i] = inhrelid;
is_leaf[i] = (get_rel_relkind(inhrelid) != RELKIND_PARTITIONED_TABLE);
boundspecs[i] = boundspec;
++i;
}
/* Assert we aren't about to leak any old data structure */
Assert(rel->rd_pdcxt == NULL);
Assert(rel->rd_partdesc == NULL);
/*
* Create PartitionBoundInfo and mapping, working in the caller's context.
* This could fail, but we haven't done any damage if so.
*/
if (nparts > 0)
boundinfo = partition_bounds_create(boundspecs, nparts, key, &mapping);
/*
* Now build the actual relcache partition descriptor. Note that the
* order of operations here is fairly critical. If we fail partway
* through this code, we won't have leaked memory because the rd_pdcxt is
* attached to the relcache entry immediately, so it'll be freed whenever
* the entry is rebuilt or destroyed. However, we don't assign to
* rd_partdesc until the cached data structure is fully complete and
* valid, so that no other code might try to use it.
* Now build the actual relcache partition descriptor, copying all the
* data into a new, small context. As per above comment, we don't make
* this a long-lived context until it's finished.
*/
rel->rd_pdcxt = AllocSetContextCreate(CacheMemoryContext,
"partition descriptor",
ALLOCSET_SMALL_SIZES);
MemoryContextCopyAndSetIdentifier(rel->rd_pdcxt,
new_pdcxt = AllocSetContextCreate(CurTransactionContext,
"partition descriptor",
ALLOCSET_SMALL_SIZES);
MemoryContextCopyAndSetIdentifier(new_pdcxt,
RelationGetRelationName(rel));
partdesc = (PartitionDescData *)
MemoryContextAllocZero(rel->rd_pdcxt, sizeof(PartitionDescData));
MemoryContextAllocZero(new_pdcxt, sizeof(PartitionDescData));
partdesc->nparts = nparts;
/* If there are no partitions, the rest of the partdesc can stay zero */
if (nparts > 0)
{
/* Create PartitionBoundInfo, using the caller's context. */
boundinfo = partition_bounds_create(boundspecs, nparts, key, &mapping);
/* Now copy all info into relcache's partdesc. */
oldcxt = MemoryContextSwitchTo(rel->rd_pdcxt);
oldcxt = MemoryContextSwitchTo(new_pdcxt);
partdesc->boundinfo = partition_bounds_copy(boundinfo, key);
partdesc->oids = (Oid *) palloc(nparts * sizeof(Oid));
partdesc->is_leaf = (bool *) palloc(nparts * sizeof(bool));
MemoryContextSwitchTo(oldcxt);
/*
* Assign OIDs from the original array into mapped indexes of the
* result array. The order of OIDs in the former is defined by the
* catalog scan that retrieved them, whereas that in the latter is
* defined by canonicalized representation of the partition bounds.
*
* Also record leaf-ness of each partition. For this we use
* get_rel_relkind() which may leak memory, so be sure to run it in
* the caller's context.
* Also save leaf-ness of each partition.
*/
for (i = 0; i < nparts; i++)
{
int index = mapping[i];
partdesc->oids[index] = oids[i];
partdesc->is_leaf[index] =
(get_rel_relkind(oids[i]) != RELKIND_PARTITIONED_TABLE);
partdesc->is_leaf[index] = is_leaf[i];
}
MemoryContextSwitchTo(oldcxt);
}
/*
* We have a fully valid partdesc ready to store into the relcache.
* Reparent it so it has the right lifespan.
*/
MemoryContextSetParent(new_pdcxt, CacheMemoryContext);
/*
* But first, a kluge: if there's an old rd_pdcxt, it contains an old
* partition descriptor that may still be referenced somewhere. Preserve
* it, while not leaking it, by reattaching it as a child context of the
* new rd_pdcxt. Eventually it will get dropped by either RelationClose
* or RelationClearRelation.
*/
if (rel->rd_pdcxt != NULL)
MemoryContextSetParent(rel->rd_pdcxt, new_pdcxt);
rel->rd_pdcxt = new_pdcxt;
rel->rd_partdesc = partdesc;
}