1
0
mirror of https://github.com/postgres/postgres.git synced 2025-08-17 01:02:17 +03:00
Files
postgres/src/backend/storage/smgr
Thomas Munro 6534d544cd PANIC on fsync() failure.
On some operating systems, it doesn't make sense to retry fsync(),
because dirty data cached by the kernel may have been dropped on
write-back failure.  In that case the only remaining copy of the
data is in the WAL.  A subsequent fsync() could appear to succeed,
but not have flushed the data.  That means that a future checkpoint
could apparently complete successfully but have lost data.

Therefore, violently prevent any future checkpoint attempts by
panicking on the first fsync() failure.  Note that we already
did the same for WAL data; this change extends that behavior to
non-temporary data files.

Provide a GUC data_sync_retry to control this new behavior, for
users of operating systems that don't eject dirty data, and possibly
forensic/testing uses.  If it is set to on and the write-back error
was transient, a later checkpoint might genuinely succeed (on a
system that does not throw away buffers on failure); if the error is
permanent, later checkpoints will continue to fail.  The GUC defaults
to off, meaning that we panic.

Back-patch to all supported releases.

There is still a narrow window for error-loss on some operating
systems: if the file is closed and later reopened and a write-back
error occurs in the intervening time, but the inode has the bad
luck to be evicted due to memory pressure before we reopen, we could
miss the error.  A later patch will address that with a scheme
for keeping files with dirty data open at all times, but we judge
that to be too complicated to back-patch.

Author: Craig Ringer, with some adjustments by Thomas Munro
Reported-by: Craig Ringer
Reviewed-by: Robert Haas, Thomas Munro, Andres Freund
Discussion: https://postgr.es/m/20180427222842.in2e4mibx45zdth5%40alap3.anarazel.de
2018-11-19 13:37:59 +13:00
..
2010-09-20 22:08:53 +02:00
2018-11-19 13:37:59 +13:00
2014-07-28 16:30:14 -04:00
2018-01-02 23:30:12 -05:00
2018-01-02 23:30:12 -05:00

src/backend/storage/smgr/README

Storage Managers
================

In the original Berkeley Postgres system, there were several storage managers,
of which only the "magnetic disk" manager remains.  (At Berkeley there were
also managers for the Sony WORM optical disk jukebox and persistent main
memory, but these were never supported in any externally released Postgres,
nor in any version of PostgreSQL.)  The "magnetic disk" manager is itself
seriously misnamed, because actually it supports any kind of device for
which the operating system provides standard filesystem operations; which
these days is pretty much everything of interest.  However, we retain the
notion of a storage manager switch in case anyone ever wants to reintroduce
other kinds of storage managers.  Removing the switch layer would save
nothing noticeable anyway, since storage-access operations are surely far
more expensive than one extra layer of C function calls.

In Berkeley Postgres each relation was tagged with the ID of the storage
manager to use for it.  This is gone.  It would be probably more reasonable
to associate storage managers with tablespaces, should we ever re-introduce
multiple storage managers into the system catalogs.

The files in this directory, and their contents, are

    smgr.c	The storage manager switch dispatch code.  The routines in
		this file call the appropriate storage manager to do storage
		accesses requested by higher-level code.  smgr.c also manages
		the file handle cache (SMgrRelation table).

    md.c	The "magnetic disk" storage manager, which is really just
		an interface to the kernel's filesystem operations.

    smgrtype.c	Storage manager type -- maps string names to storage manager
		IDs and provides simple comparison operators.  This is the
		regproc support for type "smgr" in the system catalogs.
		(This is vestigial since no columns of type smgr exist
		in the catalogs anymore.)

Note that md.c in turn relies on src/backend/storage/file/fd.c.


Relation Forks
==============

Since 8.4, a single smgr relation can be comprised of multiple physical
files, called relation forks. This allows storing additional metadata like
Free Space information in additional forks, which can be grown and truncated
independently of the main data file, while still treating it all as a single
physical relation in system catalogs.

It is assumed that the main fork, fork number 0 or MAIN_FORKNUM, always
exists. Fork numbers are assigned in src/include/common/relpath.h.
Functions in smgr.c and md.c take an extra fork number argument, in addition
to relfilenode and block number, to identify which relation fork you want to
access. Since most code wants to access the main fork, a shortcut version of
ReadBuffer that accesses MAIN_FORKNUM is provided in the buffer manager for
convenience.