1
0
mirror of https://github.com/postgres/postgres.git synced 2025-07-05 07:21:24 +03:00

Fix logical decoding error when system table w/ toast is repeatedly rewritten.

Repeatedly rewriting a mapped catalog table with VACUUM FULL or
CLUSTER could cause logical decoding to fail with:
ERROR, "could not map filenode \"%s\" to relation OID"

To trigger the problem the rewritten catalog had to have live tuples
with toasted columns.

The problem was triggered as during catalog table rewrites the
heap_insert() check that prevents logical decoding information to be
emitted for system catalogs, failed to treat the new heap's toast table
as a system catalog (because the new heap is not recognized as a
catalog table via RelationIsLogicallyLogged()). The relmapper, in
contrast to the normal catalog contents, does not contain historical
information. After a single rewrite of a mapped table the new relation
is known to the relmapper, but if the table is rewritten twice before
logical decoding occurs, the relfilenode cannot be mapped to a
relation anymore.  Which then leads us to error out.   This only
happens for toast tables, because the main table contents aren't
re-inserted with heap_insert().

The fix is simple, add a new heap_insert() flag that prevents logical
decoding information from being emitted, and accept during decoding
that there might not be tuple data for toast tables.

Unfortunately that does not fix pre-existing logical decoding
errors. Doing so would require not throwing an error when a filenode
cannot be mapped to a relation during decoding, and that seems too
likely to hide bugs.  If it's crucial to fix decoding for an existing
slot, temporarily changing the ERROR in ReorderBufferCommit() to a
WARNING appears to be the best fix.

Author: Andres Freund
Discussion: https://postgr.es/m/20180914021046.oi7dm4ra3ot2g2kt@alap3.anarazel.de
Backpatch: 9.4-, where logical decoding was introduced
This commit is contained in:
Andres Freund
2018-10-10 13:53:03 -07:00
parent fd66ca083d
commit 0a0c255942
6 changed files with 163 additions and 10 deletions

View File

@ -1528,8 +1528,16 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
change->data.tp.relnode.relNode);
/*
* Catalog tuple without data, emitted while catalog was
* in the process of being rewritten.
* Mapped catalog tuple without data, emitted while
* catalog table was in the process of being rewritten. We
* can fail to look up the relfilenode, because the the
* relmapper has no "historic" view, in contrast to normal
* the normal catalog during decoding. Thus repeated
* rewrites can cause a lookup failure. That's OK because
* we do not decode catalog changes anyway. Normally such
* tuples would be skipped over below, but we can't
* identify whether the table should be logically logged
* without mapping the relfilenode to the oid.
*/
if (reloid == InvalidOid &&
change->data.tp.newtuple == NULL &&
@ -1584,10 +1592,17 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
* transaction's changes. Otherwise it will get
* freed/reused while restoring spooled data from
* disk.
*
* But skip doing so if there's no tuple-data. That
* happens if a non-mapped system catalog with a toast
* table is rewritten.
*/
dlist_delete(&change->node);
ReorderBufferToastAppendChunk(rb, txn, relation,
change);
if (change->data.tp.newtuple != NULL)
{
dlist_delete(&change->node);
ReorderBufferToastAppendChunk(rb, txn, relation,
change);
}
}
change_done: