mirror of
https://github.com/postgres/postgres.git
synced 2025-07-05 07:21:24 +03:00
Fix replay of create database records on standby
Crash recovery on standby may encounter missing directories when replaying create database WAL records. Prior to this patch, the standby would fail to recover in such a case. However, the directories could be legitimately missing. Consider a sequence of WAL records as follows:

    CREATE DATABASE
    DROP DATABASE
    DROP TABLESPACE

If, after replaying the last WAL record and removing the tablespace directory, the standby crashes and has to replay the create database record again, the crash recovery must be able to move on.

This patch adds a mechanism similar to invalid-page tracking, to keep a tally of missing directories during crash recovery. If all the missing directory references are matched with corresponding drop records at the end of crash recovery, the standby can safely continue following the primary.

Backpatch to 13, at least for now. The bug is older, but fixing it in older branches requires more careful study of the interactions with commit e6d8069522, which appeared in 13.

A new TAP test file is added to verify the condition. However, because it depends on commit d6d317dbf6, it can only be added to branch master. I (Álvaro) manually verified that the code behaves as expected in branch 14. It's a bit nervous-making to leave the code uncovered by tests in older branches, but leaving the bug unfixed is even worse. Also, the main reason this fix took so long is precisely that we couldn't agree on a good strategy to approach testing for the bug, so perhaps this is the best we can do.

Diagnosed-by: Paul Guo <paulguo@gmail.com>
Author: Paul Guo <paulguo@gmail.com>
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Author: Asim R Praveen <apraveen@pivotal.io>
Discussion: https://postgr.es/m/CAEET0ZGx9AvioViLf7nbR_8tH9-=27DN5xWJ2P9-ROH16e4JUA@mail.gmail.com
@@ -57,6 +57,7 @@
 #include "access/tableam.h"
 #include "access/xact.h"
 #include "access/xloginsert.h"
+#include "access/xlogrecovery.h"
 #include "access/xlogutils.h"
 #include "catalog/catalog.h"
 #include "catalog/dependency.h"
@@ -1574,6 +1575,22 @@ tblspc_redo(XLogReaderState *record)
 	{
 		xl_tblspc_drop_rec *xlrec = (xl_tblspc_drop_rec *) XLogRecGetData(record);
 
+		if (!reachedConsistency)
+			XLogForgetMissingDir(xlrec->ts_id, InvalidOid);
+
+		/*
+		 * Before we remove the tablespace directory, update minimum recovery
+		 * point to cover this WAL record. Once the tablespace is removed,
+		 * there's no going back. This manually enforces the WAL-first rule.
+		 * Doing this before the removal means that if the removal fails for
+		 * some reason, the directory is left alone and needs to be manually
+		 * removed. Alternatively we could update the minimum recovery point
+		 * after removal, but that would leave a small window where the
+		 * WAL-first rule could be violated.
+		 */
+		if (!reachedConsistency)
+			XLogFlush(record->EndRecPtr);
+
 		/*
 		 * If we issued a WAL record for a drop tablespace it implies that
 		 * there were no files in it at all when the DROP was done. That means
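
The tally of missing directories described in the commit message can be illustrated with a small standalone sketch. This is not the PostgreSQL implementation: the fixed-size array, the `MissingDir` struct, and the functions `remember_missing_dir`, `forget_missing_dir`, and `all_missing_dirs_resolved` are hypothetical names chosen for illustration. The sketch also assumes that passing `InvalidOid` as the database OID means "forget every entry in this tablespace", which is how the `XLogForgetMissingDir(xlrec->ts_id, InvalidOid)` call in the hunk above appears to be used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the PostgreSQL types; defined here so the sketch is self-contained. */
typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical fixed capacity; a real implementation would likely use a hash table. */
#define MAX_MISSING_DIRS 64

typedef struct MissingDir
{
	Oid			spcOid;			/* tablespace OID */
	Oid			dbOid;			/* database OID */
} MissingDir;

static MissingDir missing_dirs[MAX_MISSING_DIRS];
static size_t n_missing_dirs = 0;

/* Record that replay referenced a directory that does not exist. */
void
remember_missing_dir(Oid spcOid, Oid dbOid)
{
	for (size_t i = 0; i < n_missing_dirs; i++)
	{
		if (missing_dirs[i].spcOid == spcOid && missing_dirs[i].dbOid == dbOid)
			return;				/* already tallied */
	}

	assert(n_missing_dirs < MAX_MISSING_DIRS);
	missing_dirs[n_missing_dirs].spcOid = spcOid;
	missing_dirs[n_missing_dirs].dbOid = dbOid;
	n_missing_dirs++;
}

/*
 * Remove entries matched by a replayed drop record.  Assumption: a dbOid of
 * InvalidOid matches every database directory within the tablespace.
 */
void
forget_missing_dir(Oid spcOid, Oid dbOid)
{
	size_t		kept = 0;

	for (size_t i = 0; i < n_missing_dirs; i++)
	{
		bool		match = missing_dirs[i].spcOid == spcOid &&
			(dbOid == InvalidOid || missing_dirs[i].dbOid == dbOid);

		if (!match)
			missing_dirs[kept++] = missing_dirs[i];
	}
	n_missing_dirs = kept;
}

/* At the end of crash recovery, every remembered directory must have been dropped. */
bool
all_missing_dirs_resolved(void)
{
	return n_missing_dirs == 0;
}
```

In this sketch, replaying a create database record against a missing directory would call `remember_missing_dir`, a later drop database or drop tablespace record would call `forget_missing_dir`, and recovery would be allowed to continue only if `all_missing_dirs_resolved` returns true once the end of crash recovery is reached.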