mirror of https://github.com/postgres/postgres.git synced 2025-07-30 11:03:19 +03:00

Fix a couple of bugs in MultiXactId freezing

Both heap_freeze_tuple() and heap_tuple_needs_freeze() neglected to look
into a multixact to check the members against cutoff_xid.  This means
that a very old Xid could survive hidden within a multi, possibly
outliving its CLOG storage.  In the distant future, this would cause
clog lookup failures:
ERROR:  could not access status of transaction 3883960912
DETAIL:  Could not open file "pg_clog/0E78": No such file or directory.
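
As an illustration of the missing check, here is a minimal sketch, not the code from this patch. It assumes a simplified model in which a multixact is just an array of member Xids. The names `Xid`, `xid_precedes` and `multi_needs_freeze` are hypothetical stand-ins, and special Xids (invalid, bootstrap, frozen) are ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;			/* hypothetical stand-in for TransactionId */

/*
 * Wraparound-aware "a is older than b" for normal Xids: compare the
 * modulo-2^32 difference as a signed value.
 */
static bool
xid_precedes(Xid a, Xid b)
{
	return (int32_t) (a - b) < 0;
}

/*
 * Return true if any member of the multixact is older than cutoff_xid.
 * Checking only the multixact ID itself would miss such a member.
 */
static bool
multi_needs_freeze(const Xid *members, size_t nmembers, Xid cutoff_xid)
{
	for (size_t i = 0; i < nmembers; i++)
	{
		if (xid_precedes(members[i], cutoff_xid))
			return true;
	}
	return false;
}
```

In this model, a multi with a single member older than the cutoff is flagged even when every other member is recent.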

This mostly was problematic when the updating transaction aborted, since
in that case the row wouldn't get pruned away earlier in vacuum and the
multixact could possibly survive for a long time.  In many cases, data
that is inaccessible for this reason can be brought back
heuristically.

As a second bug, heap_freeze_tuple() didn't properly handle multixacts
that need to be frozen according to cutoff_multi, but whose updater xid
is still alive.  Instead of preserving the update Xid, it just set Xmax
invalid, which leads to both old and new tuple versions becoming
visible.  This is pretty rare in practice, but a real threat
nonetheless.  Existing corrupted rows, unfortunately, cannot be repaired
in an automated fashion.
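
The intended behavior for the second bug can be sketched roughly as follows. This is a deliberately simplified model rather than the patch itself. The `Member` struct, `replacement_xmax` and `INVALID_XID` are hypothetical names, and the `aborted` flag stands in for a real transaction status lookup.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;			/* hypothetical stand-in for TransactionId */
#define INVALID_XID ((Xid) 0)

typedef struct
{
	Xid			xid;
	bool		is_update;		/* updater, as opposed to a plain locker */
	bool		aborted;		/* assumed known; the real code looks this up */
} Member;

/* Wraparound-aware "a is older than b" for normal Xids. */
static bool
xid_precedes(Xid a, Xid b)
{
	return (int32_t) (a - b) < 0;
}

/*
 * Choose the Xmax for a tuple whose multixact must be removed.  If the
 * multi carries an updater that has not aborted and is not older than
 * cutoff_xid, keep that Xid; otherwise the Xmax can be set invalid.
 */
static Xid
replacement_xmax(const Member *members, size_t nmembers, Xid cutoff_xid)
{
	for (size_t i = 0; i < nmembers; i++)
	{
		const Member *m = &members[i];

		if (m->is_update && !m->aborted && !xid_precedes(m->xid, cutoff_xid))
			return m->xid;
	}
	return INVALID_XID;
}
```

Returning `INVALID_XID` unconditionally corresponds to the buggy behavior described above: the update is forgotten, so both tuple versions appear live.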

Existing physical replicas might have already incorrectly frozen tuples
because of different behavior than in master, which might only become
apparent in the future once pg_multixact/ is truncated; it is
recommended that all clones be rebuilt after upgrading.

This follows code analysis prompted by a bug report from J Smith in message
CADFUPgc5bmtv-yg9znxV-vcfkb+JPRqs7m2OesQXaM_4Z1JpdQ@mail.gmail.com
and by a private report from F-Secure.

Backpatch to 9.3, where freezing of MultiXactIds was introduced.

Analysis and patch by Andres Freund, with some tweaks by Álvaro.
Alvaro Herrera
2013-11-28 19:17:21 -03:00
parent 1ce150b7bb
commit 2393c7d102
2 changed files with 151 additions and 23 deletions


@@ -434,11 +434,14 @@ MultiXactIdExpand(MultiXactId multi, TransactionId xid, MultiXactStatus status)
 	 * Determine which of the members of the MultiXactId are still of
 	 * interest. This is any running transaction, and also any transaction
 	 * that grabbed something stronger than just a lock and was committed. (An
-	 * update that aborted is of no interest here.)
+	 * update that aborted is of no interest here; and having more than one
+	 * update Xid in a multixact would cause errors elsewhere.)
 	 *
-	 * (Removing dead members is just an optimization, but a useful one. Note
-	 * we have the same race condition here as above: j could be 0 at the end
-	 * of the loop.)
+	 * Removing dead members is not just an optimization: freezing of tuples
+	 * whose Xmax are multis depends on this behavior.
+	 *
+	 * Note we have the same race condition here as above: j could be 0 at the
+	 * end of the loop.
 	 */
 	newMembers = (MultiXactMember *)
 		palloc(sizeof(MultiXactMember) * (nmembers + 1));
@@ -1052,7 +1055,8 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	debug_elog3(DEBUG2, "GetMembers: asked for %u", multi);
-	Assert(MultiXactIdIsValid(multi));
+	if (!MultiXactIdIsValid(multi))
+		return -1;
 	/* See if the MultiXactId is in the local cache */
 	length = mXactCacheGetById(multi, members);