Mirror of https://github.com/postgres/postgres.git (synced 2025-11-25 12:03:53 +03:00)
Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.

If TRUNCATE causes some buffers to be invalidated and thus the checkpoint does not flush them, TRUNCATE must also ensure that the corresponding files are truncated on disk. Otherwise, a replay from the checkpoint might find that the buffers exist but have the wrong contents, which may cause replay to fail.

Report by Teja Mupparti. Patch by Kyotaro Horiguchi, per a design suggestion from Heikki Linnakangas, with some changes to the comments by me. Review of this and a prior patch that approached the issue differently by Heikki Linnakangas, Andres Freund, Álvaro Herrera, Masahiko Sawada, and Tom Lane.

Discussion: http://postgr.es/m/BYAPR06MB6373BF50B469CA393C614257ABF00@BYAPR06MB6373.namprd06.prod.outlook.com
@@ -86,6 +86,41 @@ struct XidCache
  */
 #define INVALID_PGPROCNO        PG_INT32_MAX
 
+/*
+ * Flags for PGPROC.delayChkpt
+ *
+ * These flags can be used to delay the start or completion of a checkpoint
+ * for short periods. A flag is in effect if the corresponding bit is set in
+ * the PGPROC of any backend.
+ *
+ * For our purposes here, a checkpoint has three phases: (1) determine the
+ * location to which the redo pointer will be moved, (2) write all the
+ * data durably to disk, and (3) WAL-log the checkpoint.
+ *
+ * Setting DELAY_CHKPT_START prevents the system from moving from phase 1
+ * to phase 2. This is useful when we are performing a WAL-logged modification
+ * of data that will be flushed to disk in phase 2. By setting this flag
+ * before writing WAL and clearing it after we've both written WAL and
+ * performed the corresponding modification, we ensure that if the WAL record
+ * is inserted prior to the new redo point, the corresponding data changes will
+ * also be flushed to disk before the checkpoint can complete. (In the
+ * extremely common case where the data being modified is in shared buffers
+ * and we acquire an exclusive content lock on the relevant buffers before
+ * writing WAL, this mechanism is not needed, because phase 2 will block
+ * until we release the content lock and then flush the modified data to
+ * disk.)
+ *
+ * Setting DELAY_CHKPT_COMPLETE prevents the system from moving from phase 2
+ * to phase 3. This is useful if we are performing a WAL-logged operation that
+ * might invalidate buffers, such as relation truncation. In this case, we need
+ * to ensure that any buffers which were invalidated and thus not flushed by
+ * the checkpoint are actually destroyed on disk. Replay can cope with a file
+ * or block that doesn't exist, but not with a block that has the wrong
+ * contents.
+ */
+#define DELAY_CHKPT_START       (1<<0)
+#define DELAY_CHKPT_COMPLETE    (1<<1)
+
 typedef enum
 {
     PROC_WAIT_STATUS_OK,
@@ -191,7 +226,7 @@ struct PGPROC
     pg_atomic_uint64 waitStart; /* time at which wait for lock acquisition
                                  * started */
 
-    bool        delayChkpt;     /* true if this proc delays checkpoint start */
+    int         delayChkpt;     /* for DELAY_CHKPT_* flags */
 
     uint8       statusFlags;    /* this backend's status flags, see PROC_*
                                  * above. mirrored in
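
The change of delayChkpt from bool to int lets one field carry both delay flags independently. The following is a minimal, self-contained sketch of how such a bitmask might be set, cleared, and tested across backends. Only the two DELAY_CHKPT_* values come from the diff; ToyProc, any_backend_delays, set_delay_flag, and clear_delay_flag are hypothetical names invented for illustration and are not the functions PostgreSQL uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values copied from the diff. */
#define DELAY_CHKPT_START       (1<<0)
#define DELAY_CHKPT_COMPLETE    (1<<1)

/* Hypothetical stand-in for PGPROC: only the field changed by this patch. */
typedef struct ToyProc
{
    int delayChkpt;     /* bitmask of DELAY_CHKPT_* flags */
} ToyProc;

/*
 * Hypothetical helper: returns true if any backend has at least one bit
 * of 'type' set. A toy checkpointer would wait while this returns true,
 * passing DELAY_CHKPT_START before entering phase 2 and
 * DELAY_CHKPT_COMPLETE before entering phase 3.
 */
static bool
any_backend_delays(const ToyProc *procs, int nprocs, int type)
{
    assert(type != 0);
    for (int i = 0; i < nprocs; i++)
    {
        if ((procs[i].delayChkpt & type) != 0)
            return true;
    }
    return false;
}

/* Hypothetical helper: set one flag without disturbing the other. */
static void
set_delay_flag(ToyProc *proc, int flag)
{
    proc->delayChkpt |= flag;
}

/* Hypothetical helper: clear one flag without disturbing the other. */
static void
clear_delay_flag(ToyProc *proc, int flag)
{
    proc->delayChkpt &= ~flag;
}
```

Under these assumptions, a truncation-like operation would presumably set DELAY_CHKPT_COMPLETE before it starts and clear it only once the file has been truncated on disk, so that the checkpoint cannot reach phase 3 in between. With a plain bool, a backend could not express that it is delaying completion but not start, or the reverse.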