mirror of
https://github.com/postgres/postgres.git
synced 2025-07-31 22:04:40 +03:00
Extend amcheck to check heap pages.
Mark Dilger, reviewed by Peter Geoghegan, Andres Freund, Álvaro Herrera, Michael Paquier, Amul Sul, and by me. Some last-minute cosmetic revisions by me. Discussion: http://postgr.es/m/12ED3DA8-25F0-4B68-937D-D907CFBF08E7@enterprisedb.com
This commit is contained in:
@ -9,12 +9,11 @@
|
||||
|
||||
<para>
|
||||
The <filename>amcheck</filename> module provides functions that allow you to
|
||||
verify the logical consistency of the structure of relations. If the
|
||||
structure appears to be valid, no error is raised.
|
||||
verify the logical consistency of the structure of relations.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The functions verify various <emphasis>invariants</emphasis> in the
|
||||
The B-Tree checking functions verify various <emphasis>invariants</emphasis> in the
|
||||
structure of the representation of particular relations. The
|
||||
correctness of the access method functions behind index scans and
|
||||
other important operations relies on these invariants always
|
||||
@ -24,7 +23,7 @@
|
||||
collated lexical order). If that particular invariant somehow fails
|
||||
to hold, we can expect binary searches on the affected page to
|
||||
incorrectly guide index scans, resulting in wrong answers to SQL
|
||||
queries.
|
||||
queries. If the structure appears to be valid, no error is raised.
|
||||
</para>
|
||||
<para>
|
||||
Verification is performed using the same procedures as those used by
|
||||
@ -35,7 +34,22 @@
|
||||
functions.
|
||||
</para>
|
||||
<para>
|
||||
<filename>amcheck</filename> functions may only be used by superusers.
|
||||
Unlike the B-Tree checking functions which report corruption by raising
|
||||
errors, the heap checking function <function>verify_heapam</function> checks
|
||||
a table and attempts to return a set of rows, one row per corruption
|
||||
detected. Despite this, if facilities that
|
||||
<function>verify_heapam</function> relies upon are themselves corrupted, the
|
||||
function may be unable to continue and may instead raise an error.
|
||||
</para>
|
||||
<para>
|
||||
Permission to execute <filename>amcheck</filename> functions may be granted
|
||||
to non-superusers, but before granting such permissions careful consideration
|
||||
should be given to data security and privacy concerns. Although the
|
||||
corruption reports generated by these functions do not focus on the contents
|
||||
of the corrupted data so much as on the structure of that data and the nature
|
||||
of the corruptions found, an attacker who gains permission to execute these
|
||||
functions, particularly if the attacker can also induce corruption, might be
|
||||
able to infer something of the data itself from such messages.
|
||||
</para>
|
||||
|
||||
<sect2>
|
||||
@ -187,12 +201,221 @@ SET client_min_messages = DEBUG1;
|
||||
</para>
|
||||
</tip>
|
||||
|
||||
<variablelist>
|
||||
<varlistentry>
|
||||
<term>
|
||||
<function>
|
||||
verify_heapam(relation regclass,
|
||||
on_error_stop boolean,
|
||||
check_toast boolean,
|
||||
skip cstring,
|
||||
startblock bigint,
|
||||
endblock bigint,
|
||||
blkno OUT bigint,
|
||||
offnum OUT integer,
|
||||
attnum OUT integer,
|
||||
msg OUT text)
|
||||
returns record
|
||||
</function>
|
||||
</term>
|
||||
<listitem>
|
||||
<para>
|
||||
Checks a table for structural corruption, where pages in the relation
|
||||
contain data that is invalidly formatted, and for logical corruption,
|
||||
where pages are structurally valid but inconsistent with the rest of the
|
||||
database cluster. Example usage:
|
||||
<screen>
|
||||
test=# select * from verify_heapam('mytable', check_toast := true);
|
||||
blkno | offnum | attnum | msg
|
||||
-------+--------+--------+--------------------------------------------------------------------------------------------------
|
||||
17 | 12 | | xmin 4294967295 precedes relation freeze threshold 17:1134217582
|
||||
960 | 4 | | data begins at offset 152 beyond the tuple length 58
|
||||
960 | 4 | | tuple data should begin at byte 24, but actually begins at byte 152 (3 attributes, no nulls)
|
||||
960 | 5 | | tuple data should begin at byte 24, but actually begins at byte 27 (3 attributes, no nulls)
|
||||
960 | 6 | | tuple data should begin at byte 24, but actually begins at byte 16 (3 attributes, no nulls)
|
||||
960 | 7 | | tuple data should begin at byte 24, but actually begins at byte 21 (3 attributes, no nulls)
|
||||
1147 | 2 | | number of attributes 2047 exceeds maximum expected for table 3
|
||||
1147 | 10 | | tuple data should begin at byte 280, but actually begins at byte 24 (2047 attributes, has nulls)
|
||||
1147 | 15 | | number of attributes 67 exceeds maximum expected for table 3
|
||||
1147 | 16 | 1 | attribute 1 with length 4294967295 ends at offset 416848000 beyond total tuple length 58
|
||||
1147 | 18 | 2 | final toast chunk number 0 differs from expected value 6
|
||||
1147 | 19 | 2 | toasted value for attribute 2 missing from toast table
|
||||
1147 | 21 | | tuple is marked as only locked, but also claims key columns were updated
|
||||
1147 | 22 | | multitransaction ID 1775655 is from before relation cutoff 2355572
|
||||
(14 rows)
|
||||
</screen>
|
||||
As this example shows, the Tuple ID (TID) of the corrupt tuple is given
|
||||
in the (<literal>blkno</literal>, <literal>offnum</literal>) columns, and
|
||||
for corruptions specific to a particular attribute in the tuple, the
|
||||
<literal>attnum</literal> field shows which one.
|
||||
</para>
|
||||
<para>
|
||||
Structural corruption can happen due to faulty storage hardware, or
|
||||
relation files being overwritten or modified by unrelated software.
|
||||
This kind of corruption can also be detected with
|
||||
<link linkend="app-initdb-data-checksums"><application>data page
|
||||
checksums</application></link>.
|
||||
</para>
|
||||
<para>
|
||||
Relation pages which are correctly formatted, internally consistent, and
|
||||
correct relative to their own internal checksums may still contain
|
||||
logical corruption. As such, this kind of corruption cannot be detected
|
||||
with <application>checksums</application>. Examples include toasted
|
||||
values in the main table which lack a corresponding entry in the toast
|
||||
table, and tuples in the main table with a Transaction ID that is older
|
||||
than the oldest valid Transaction ID in the database or cluster.
|
||||
</para>
|
||||
<para>
|
||||
Multiple causes of logical corruption have been observed in production
|
||||
systems, including bugs in the <productname>PostgreSQL</productname>
|
||||
server software, faulty and ill-conceived backup and restore tools, and
|
||||
user error.
|
||||
</para>
|
||||
<para>
|
||||
Corrupt relations are most concerning in live production environments,
|
||||
precisely the same environments where high risk activities are least
|
||||
welcome. For this reason, <function>verify_heapam</function> has been
|
||||
designed to diagnose corruption without undue risk. It cannot guard
|
||||
against all causes of backend crashes, as even executing the calling
|
||||
query could be unsafe on a badly corrupted system. Access to <link
|
||||
linkend="catalogs-overview">catalog tables</link> are performed and could
|
||||
be problematic if the catalogs themselves are corrupted.
|
||||
</para>
|
||||
<para>
|
||||
The design principle adhered to in <function>verify_heapam</function> is
|
||||
that, if the rest of the system and server hardware are correct, under
|
||||
default options, <function>verify_heapam</function> will not crash the
|
||||
server due merely to structural or logical corruption in the target
|
||||
table.
|
||||
</para>
|
||||
<para>
|
||||
The <literal>check_toast</literal> attempts to reconcile the target
|
||||
table against entries in its corresponding toast table. This option is
|
||||
disabled by default and is known to be slow.
|
||||
If the target relation's corresponding toast table or toast index is
|
||||
corrupt, reconciling the target table against toast values could
|
||||
conceivably crash the server, although in many cases this would
|
||||
just produce an error.
|
||||
</para>
|
||||
<para>
|
||||
The following optional arguments are recognized:
|
||||
</para>
|
||||
<variablelist>
|
||||
<varlistentry>
|
||||
<term>on_error_stop</term>
|
||||
<listitem>
|
||||
<para>
|
||||
If true, corruption checking stops at the end of the first block on
|
||||
which any corruptions are found.
|
||||
</para>
|
||||
<para>
|
||||
Defaults to false.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
<varlistentry>
|
||||
<term>check_toast</term>
|
||||
<listitem>
|
||||
<para>
|
||||
If true, toasted values are checked gainst the corresponding
|
||||
TOAST table.
|
||||
</para>
|
||||
<para>
|
||||
Defaults to false.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
<varlistentry>
|
||||
<term>skip</term>
|
||||
<listitem>
|
||||
<para>
|
||||
If not <literal>none</literal>, corruption checking skips blocks that
|
||||
are marked as all-visible or all-frozen, as given.
|
||||
Valid options are <literal>all-visible</literal>,
|
||||
<literal>all-frozen</literal> and <literal>none</literal>.
|
||||
</para>
|
||||
<para>
|
||||
Defaults to <literal>none</literal>.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
<varlistentry>
|
||||
<term>startblock</term>
|
||||
<listitem>
|
||||
<para>
|
||||
If specified, corruption checking begins at the specified block,
|
||||
skipping all previous blocks. It is an error to specify a
|
||||
<literal>startblock</literal> outside the range of blocks in the
|
||||
target table.
|
||||
</para>
|
||||
<para>
|
||||
By default, does not skip any blocks.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
<varlistentry>
|
||||
<term>endblock</term>
|
||||
<listitem>
|
||||
<para>
|
||||
If specified, corruption checking ends at the specified block,
|
||||
skipping all remaining blocks. It is an error to specify an
|
||||
<literal>endblock</literal> outside the range of blocks in the target
|
||||
table.
|
||||
</para>
|
||||
<para>
|
||||
By default, does not skip any blocks.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
</variablelist>
|
||||
<para>
|
||||
For each corruption detected, <function>verify_heapam</function> returns
|
||||
a row with the following columns:
|
||||
</para>
|
||||
<variablelist>
|
||||
<varlistentry>
|
||||
<term>blkno</term>
|
||||
<listitem>
|
||||
<para>
|
||||
The number of the block containing the corrupt page.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
<varlistentry>
|
||||
<term>offnum</term>
|
||||
<listitem>
|
||||
<para>
|
||||
The OffsetNumber of the corrupt tuple.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
<varlistentry>
|
||||
<term>attnum</term>
|
||||
<listitem>
|
||||
<para>
|
||||
The attribute number of the corrupt column in the tuple, if the
|
||||
corruption is specific to a column and not the tuple as a whole.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
<varlistentry>
|
||||
<term>msg</term>
|
||||
<listitem>
|
||||
<para>
|
||||
A human readable message describing the corruption in the page.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
</variablelist>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
</variablelist>
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Optional <parameter>heapallindexed</parameter> Verification</title>
|
||||
<para>
|
||||
When the <parameter>heapallindexed</parameter> argument to
|
||||
When the <parameter>heapallindexed</parameter> argument to B-Tree
|
||||
verification functions is <literal>true</literal>, an additional
|
||||
phase of verification is performed against the table associated with
|
||||
the target index relation. This consists of a <quote>dummy</quote>
|
||||
|
Reference in New Issue
Block a user