mirror of https://github.com/postgres/postgres.git synced 2025-11-25 12:03:53 +03:00

Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.

If TRUNCATE causes some buffers to be invalidated and thus the
checkpoint does not flush them, TRUNCATE must also ensure that the
corresponding files are truncated on disk. Otherwise, a replay
from the checkpoint might find that the buffers exist but have
the wrong contents, which may cause replay to fail.

Report by Teja Mupparti. Patch by Kyotaro Horiguchi, per a design
suggestion from Heikki Linnakangas, with some changes to the
comments by me. Review of this and a prior patch that approached
the issue differently by Heikki Linnakangas, Andres Freund, Álvaro
Herrera, Masahiko Sawada, and Tom Lane.

Discussion: http://postgr.es/m/BYAPR06MB6373BF50B469CA393C614257ABF00@BYAPR06MB6373.namprd06.prod.outlook.com
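
The flag scheme that this commit introduces for PGPROC.delayChkpt can be illustrated with a small standalone sketch. Only the two DELAY_CHKPT_* values are taken from the patch; DemoProc and the demo_* helpers are hypothetical names invented for this illustration and are not PostgreSQL APIs. The sketch also ignores locking and memory ordering, which real shared-memory code would have to handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag values as defined in the patch. */
#define DELAY_CHKPT_START		(1<<0)
#define DELAY_CHKPT_COMPLETE	(1<<1)

/* Hypothetical stand-in for PGPROC; only the field under discussion. */
typedef struct DemoProc
{
	int			delayChkpt;		/* for DELAY_CHKPT_* flags */
} DemoProc;

/* Hypothetical helper: a backend sets one flag before its critical step. */
static void
demo_set_delay(DemoProc *proc, int type)
{
	proc->delayChkpt |= type;
}

/* Hypothetical helper: the backend clears that flag, leaving the other bit alone. */
static void
demo_clear_delay(DemoProc *proc, int type)
{
	proc->delayChkpt &= ~type;
}

/*
 * Hypothetical helper: returns true if any backend currently has the given
 * flag set.  A checkpointer-like loop would wait while this returns true
 * before moving on to the next phase.
 */
static bool
demo_any_delaying(const DemoProc *procs, size_t nprocs, int type)
{
	assert(type == DELAY_CHKPT_START || type == DELAY_CHKPT_COMPLETE);

	for (size_t i = 0; i < nprocs; i++)
	{
		if ((procs[i].delayChkpt & type) != 0)
			return true;
	}
	return false;
}
```

Under these assumptions, a truncation-like operation would call demo_set_delay(proc, DELAY_CHKPT_COMPLETE) before writing WAL and demo_clear_delay(proc, DELAY_CHKPT_COMPLETE) only after the file has been truncated on disk. Because the two flags occupy separate bits, a wait for one kind of delay is not affected by backends that hold only the other.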
Robert Haas
2022-03-24 14:32:06 -04:00
parent 86459b3296
commit 412ad7a556
11 changed files with 120 additions and 28 deletions


@@ -86,6 +86,41 @@ struct XidCache
*/
#define INVALID_PGPROCNO PG_INT32_MAX
/*
* Flags for PGPROC.delayChkpt
*
* These flags can be used to delay the start or completion of a checkpoint
* for short periods. A flag is in effect if the corresponding bit is set in
* the PGPROC of any backend.
*
* For our purposes here, a checkpoint has three phases: (1) determine the
* location to which the redo pointer will be moved, (2) write all the
* data durably to disk, and (3) WAL-log the checkpoint.
*
* Setting DELAY_CHKPT_START prevents the system from moving from phase 1
* to phase 2. This is useful when we are performing a WAL-logged modification
* of data that will be flushed to disk in phase 2. By setting this flag
* before writing WAL and clearing it after we've both written WAL and
* performed the corresponding modification, we ensure that if the WAL record
* is inserted prior to the new redo point, the corresponding data changes will
* also be flushed to disk before the checkpoint can complete. (In the
* extremely common case where the data being modified is in shared buffers
* and we acquire an exclusive content lock on the relevant buffers before
* writing WAL, this mechanism is not needed, because phase 2 will block
* until we release the content lock and then flush the modified data to
* disk.)
*
* Setting DELAY_CHKPT_COMPLETE prevents the system from moving from phase 2
* to phase 3. This is useful if we are performing a WAL-logged operation that
* might invalidate buffers, such as relation truncation. In this case, we need
* to ensure that any buffers which were invalidated and thus not flushed by
* the checkpoint are actually destroyed on disk. Replay can cope with a file
* or block that doesn't exist, but not with a block that has the wrong
* contents.
*/
#define DELAY_CHKPT_START (1<<0)
#define DELAY_CHKPT_COMPLETE (1<<1)
typedef enum
{
PROC_WAIT_STATUS_OK,
@@ -191,7 +226,7 @@ struct PGPROC
pg_atomic_uint64 waitStart; /* time at which wait for lock acquisition
* started */
-bool delayChkpt; /* true if this proc delays checkpoint start */
+int delayChkpt; /* for DELAY_CHKPT_* flags */
uint8 statusFlags; /* this backend's status flags, see PROC_*
* above. mirrored in