1
0
mirror of https://github.com/postgres/postgres.git synced 2025-11-06 07:49:08 +03:00

Generate backup manifests for base backups, and validate them.

A manifest is a JSON document which includes (1) the file name, size,
last modification time, and an optional checksum for each file backed
up, (2) timelines and LSNs for whatever WAL will need to be replayed
to make the backup consistent, and (3) a checksum for the manifest
itself. By default, we use CRC-32C when checksumming data files,
because we are trying to detect corruption and user error, not foil an
adversary. However, pg_basebackup and the server-side BASE_BACKUP
command now have options to select a different algorithm, so users
wanting a cryptographic hash function can select SHA-224, SHA-256,
SHA-384, or SHA-512. Users not wanting file checksums at all can
disable them, or disable generating of the backup manifest altogether.
Using a cryptographic hash function in place of CRC-32C consumes
significantly more CPU cycles, which may slow down backups in some
cases.

A new tool called pg_validatebackup can validate a backup against the
manifest. If no checksums are present, it can still check that the
right files exist and that they have the expected sizes. If checksums
are present, it can also verify that each file has the expected
checksum. Additionally, it calls pg_waldump to verify that the
expected WAL files are present and parseable. Only plain format
backups can be validated directly, but tar format backups can be
validated after extracting them.

Robert Haas, with help, ideas, review, and testing from David Steele,
Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar
Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan
Chalke, Amit Kapila, Andres Freund, and Noah Misch.

Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
This commit is contained in:
Robert Haas
2020-04-03 14:59:47 -04:00
parent ce77abe63c
commit 0d8c9c1210
27 changed files with 3614 additions and 32 deletions

View File

@@ -561,6 +561,70 @@ PostgreSQL documentation
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--no-manifest</option></term>
<listitem>
<para>
Disables generation of a backup manifest. If this option is not
specified, the server will generate and send a backup manifest
which can be verified using <xref linkend="app-pgvalidatebackup" />.
The manifest is a list of every file present in the backup with the
exception of any WAL files that may be included. It also stores the
size, last modification time, and an optional checksum for each file.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--manifest-force-encode</option></term>
<listitem>
<para>
Forces all filenames in the backup manifest to be hex-encoded.
If this option is not specified, only non-UTF8 filenames are
hex-encoded. This option is mostly intended to test that tools which
read a backup manifest file properly handle this case.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--manifest-checksums=<replaceable class="parameter">algorithm</replaceable></option></term>
<listitem>
<para>
Specifies the checksum algorithm that should be applied to each file
included in the backup manifest. Currently, the available
algorithms are <literal>NONE</literal>, <literal>CRC32C</literal>,
<literal>SHA224</literal>, <literal>SHA256</literal>,
<literal>SHA384</literal>, and <literal>SHA512</literal>.
The default is <literal>CRC32C</literal>.
</para>
<para>
If <literal>NONE</literal> is selected, the backup manifest will
not contain any checksums. Otherwise, it will contain a checksum
of each file in the backup using the specified algorithm. In addition,
the manifest will always contain a <literal>SHA256</literal>
checksum of its own contents. The <literal>SHA</literal> algorithms
are significantly more CPU-intensive than <literal>CRC32C</literal>,
so selecting one of them may increase the time required to complete
the backup.
</para>
<para>
Using a SHA hash function provides a cryptographically secure digest
of each file for users who wish to verify that the backup has not been
tampered with, while the CRC32C algorithm provides a checksum which is
much faster to calculate and good at catching errors due to accidental
changes but is not resistant to targeted modifications. Note that, to
be useful against an adversary who has access to the backup, the backup
manifest would need to be stored securely elsewhere or otherwise
verified not to have been modified since the backup was taken.
</para>
<para>
<xref linkend="app-pgvalidatebackup" /> can be used to check the
integrity of a backup against the backup manifest.
</para>
</listitem>
</varlistentry>
</variablelist>
</para>