1
0
mirror of https://github.com/postgres/postgres.git synced 2025-07-03 20:02:46 +03:00

Fix several recently introduced issues around handling new relation forks.

Most of these stem from d25f519107 "tableam: relation creation, VACUUM
FULL/CLUSTER, SET TABLESPACE.".

1) To pass data to the relation_set_new_filenode()
   RelationSetNewRelfilenode() was made to update RelationData.rd_rel
   directly. That's not OK however, as it makes the relcache entries
   temporarily inconsistent. Which among other scenarios is a problem
   if a REINDEX targets an index on pg_class - the
   CatalogTupleUpdate() in RelationSetNewRelfilenode().  Presumably
   that was introduced because other places in the code do so - while
   those aren't "good practice" they don't appear to be actively
   buggy (e.g. because system tables may not be targeted).

   I (Andres) should have caught this while reviewing and signficantly
   evolving the code in that commit, mea culpa.

   Fix that by instead passing in the new RelFileNode as separate
   argument to relation_set_new_filenode() and rely on the relcache to
   update the catalog entry. Also revert that the
   RelationMapUpdateMap() call was changed to immediate, and undo some
   other more unnecessary changes.

2) Document that the relation_set_new_filenode cannot rely on the
   whole relcache entry to be valid. It might be worthwhile to
   refactor the code to never have to rely on that, but given the way
   heap_create() is currently coded, that'd be a large change.

3) ATExecSetTableSpace() shouldn't do FlushRelationBuffers() itself. A
   table AM might not use shared buffers at all. Move to
   index_copy_data() and heapam_relation_copy_data().

4) heapam_relation_set_new_filenode() previously sometimes accessed
   rel->rd_rel->relpersistence rather than the `persistence`
   argument. Code movement mistake.

5) Previously heapam_relation_set_new_filenode() re-opened the smgr
   relation to create the init for, if necesary. Instead have
   RelationCreateStorage() return the SMgrRelation and use it to
   create the init fork.

6) Add a note about the danger of modifying the relcache directly to
   ATExecSetTableSpace() - it's currently not a bug because there's a
   check ERRORing for catalog tables.

Regression tests and assertion improvements that together trigger the
bug described in 1) will be added in a later commit, as there is a
related bug on all branches.

Reported-By: Michael Paquier
Diagnosed-By: Tom Lane and Andres Freund
Author: Andres Freund
Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/20190418011430.GA19133@paquier.xyz
This commit is contained in:
Andres Freund
2019-04-29 19:28:05 -07:00
parent 9ee7414ed0
commit 5c1560606d
8 changed files with 101 additions and 61 deletions

View File

@ -3440,6 +3440,7 @@ RelationSetNewRelfilenode(Relation relation, char persistence)
Form_pg_class classform;
MultiXactId minmulti = InvalidMultiXactId;
TransactionId freezeXid = InvalidTransactionId;
RelFileNode newrnode;
/* Allocate a new relfilenode */
newrelfilenode = GetNewRelFileNode(relation->rd_rel->reltablespace, NULL,
@ -3462,39 +3463,23 @@ RelationSetNewRelfilenode(Relation relation, char persistence)
*/
RelationDropStorage(relation);
/*
* Now update the pg_class row. However, if we're dealing with a mapped
* index, pg_class.relfilenode doesn't change; instead we have to send the
* update to the relation mapper.
*/
if (RelationIsMapped(relation))
RelationMapUpdateMap(RelationGetRelid(relation),
newrelfilenode,
relation->rd_rel->relisshared,
true);
else
{
relation->rd_rel->relfilenode = newrelfilenode;
classform->relfilenode = newrelfilenode;
}
RelationInitPhysicalAddr(relation);
/* initialize new relfilenode from old relfilenode */
newrnode = relation->rd_node;
/*
* Create storage for the main fork of the new relfilenode. If it's
* table-like object, call into table AM to do so, which'll also create
* the table's init fork.
*
* NOTE: any conflict in relfilenode value will be caught here, if
* GetNewRelFileNode messes up for any reason.
* NOTE: If relevant for the AM, any conflict in relfilenode value will be
* caught here, if GetNewRelFileNode messes up for any reason.
*/
newrnode = relation->rd_node;
newrnode.relNode = newrelfilenode;
/*
* Create storage for relation.
*/
switch (relation->rd_rel->relkind)
{
/* shouldn't be called for these */
/* shouldn't be called for these */
case RELKIND_VIEW:
case RELKIND_COMPOSITE_TYPE:
case RELKIND_FOREIGN_TABLE:
@ -3505,18 +3490,36 @@ RelationSetNewRelfilenode(Relation relation, char persistence)
case RELKIND_INDEX:
case RELKIND_SEQUENCE:
RelationCreateStorage(relation->rd_node, persistence);
RelationOpenSmgr(relation);
{
SMgrRelation srel;
srel = RelationCreateStorage(newrnode, persistence);
smgrclose(srel);
}
break;
case RELKIND_RELATION:
case RELKIND_TOASTVALUE:
case RELKIND_MATVIEW:
table_relation_set_new_filenode(relation, persistence,
table_relation_set_new_filenode(relation, &newrnode,
persistence,
&freezeXid, &minmulti);
break;
}
/*
* However, if we're dealing with a mapped index, pg_class.relfilenode
* doesn't change; instead we have to send the update to the relation
* mapper.
*/
if (RelationIsMapped(relation))
RelationMapUpdateMap(RelationGetRelid(relation),
newrelfilenode,
relation->rd_rel->relisshared,
false);
else
classform->relfilenode = newrelfilenode;
/* These changes are safe even for a mapped relation */
if (relation->rd_rel->relkind != RELKIND_SEQUENCE)
{