Extend amcheck to check heap pages.

Mark Dilger, reviewed by Peter Geoghegan, Andres Freund, Álvaro Herrera, Michael Paquier, Amul Sul, and by me. Some last-minute cosmetic revisions by me. Discussion: http://postgr.es/m/12ED3DA8-25F0-4B68-937D-D907CFBF08E7@enterprisedb.com
2025-07-31 22:04:40 +03:00 · 2020-10-22 08:44:18 -04:00
parent f8721bd752
commit 866e24d47d
12 changed files with 2299 additions and 9 deletions
--- a/doc/src/sgml/amcheck.sgml
+++ b/doc/src/sgml/amcheck.sgml
@@ -9,12 +9,11 @@

 <para>
  The <filename>amcheck</filename> module provides functions that allow you to
-  verify the logical consistency of the structure of relations.  If the
-  structure appears to be valid, no error is raised.
+  verify the logical consistency of the structure of relations.
 </para>

 <para>
-  The functions verify various <emphasis>invariants</emphasis> in the
+  The B-Tree checking functions verify various <emphasis>invariants</emphasis> in the
  structure of the representation of particular relations.  The
  correctness of the access method functions behind index scans and
  other important operations relies on these invariants always
@@ -24,7 +23,7 @@
  collated lexical order).  If that particular invariant somehow fails
  to hold, we can expect binary searches on the affected page to
  incorrectly guide index scans, resulting in wrong answers to SQL
-  queries.
+  queries.  If the structure appears to be valid, no error is raised.
 </para>
 <para>
  Verification is performed using the same procedures as those used by
@@ -35,7 +34,22 @@
  functions.
 </para>
 <para>
-  <filename>amcheck</filename> functions may only be used by superusers.
+  Unlike the B-Tree checking functions, which report corruption by raising
+  errors, the heap checking function <function>verify_heapam</function> checks
+  a table and attempts to return a set of rows, one row per corruption
+  detected.  Despite this, if facilities that
+  <function>verify_heapam</function> relies upon are themselves corrupted, the
+  function may be unable to continue and may instead raise an error.
+ </para>
+ <para>
+  Permission to execute <filename>amcheck</filename> functions may be granted
+  to non-superusers, but before granting such permissions careful consideration
+  should be given to data security and privacy concerns.  Although the
+  corruption reports generated by these functions do not focus on the contents
+  of the corrupted data so much as on the structure of that data and the nature
+  of the corruptions found, an attacker who gains permission to execute these
+  functions, particularly if the attacker can also induce corruption, might be
+  able to infer something of the data itself from such messages.
 </para>

 <sect2>
@@ -187,12 +201,221 @@ SET client_min_messages = DEBUG1;
   </para>
  </tip>

+  <variablelist>
+   <varlistentry>
+    <term>
+     <function>
+      verify_heapam(relation regclass,
+                    on_error_stop boolean,
+                    check_toast boolean,
+                    skip text,
+                    startblock bigint,
+                    endblock bigint,
+                    blkno OUT bigint,
+                    offnum OUT integer,
+                    attnum OUT integer,
+                    msg OUT text)
+      returns setof record
+     </function>
+    </term>
+    <listitem>
+     <para>
+      Checks a table for structural corruption, where pages in the relation
+      contain data that is invalidly formatted, and for logical corruption,
+      where pages are structurally valid but inconsistent with the rest of the
+      database cluster.  Example usage:
+<screen>
+test=# select * from verify_heapam('mytable', check_toast := true);
+ blkno | offnum | attnum |                                                msg
+-------+--------+--------+--------------------------------------------------------------------------------------------------
+    17 |     12 |        | xmin 4294967295 precedes relation freeze threshold 17:1134217582
+   960 |      4 |        | data begins at offset 152 beyond the tuple length 58
+   960 |      4 |        | tuple data should begin at byte 24, but actually begins at byte 152 (3 attributes, no nulls)
+   960 |      5 |        | tuple data should begin at byte 24, but actually begins at byte 27 (3 attributes, no nulls)
+   960 |      6 |        | tuple data should begin at byte 24, but actually begins at byte 16 (3 attributes, no nulls)
+   960 |      7 |        | tuple data should begin at byte 24, but actually begins at byte 21 (3 attributes, no nulls)
+  1147 |      2 |        | number of attributes 2047 exceeds maximum expected for table 3
+  1147 |     10 |        | tuple data should begin at byte 280, but actually begins at byte 24 (2047 attributes, has nulls)
+  1147 |     15 |        | number of attributes 67 exceeds maximum expected for table 3
+  1147 |     16 |      1 | attribute 1 with length 4294967295 ends at offset 416848000 beyond total tuple length 58
+  1147 |     18 |      2 | final toast chunk number 0 differs from expected value 6
+  1147 |     19 |      2 | toasted value for attribute 2 missing from toast table
+  1147 |     21 |        | tuple is marked as only locked, but also claims key columns were updated
+  1147 |     22 |        | multitransaction ID 1775655 is from before relation cutoff 2355572
+(14 rows)
+</screen>
+      As this example shows, the Tuple ID (TID) of the corrupt tuple is given
+      in the (<literal>blkno</literal>, <literal>offnum</literal>) columns, and
+      for corruptions specific to a particular attribute in the tuple, the
+      <literal>attnum</literal> field shows which one.
+     </para>
+     <para>
+      Structural corruption can happen due to faulty storage hardware, or
+      relation files being overwritten or modified by unrelated software.
+      This kind of corruption can also be detected with
+      <link linkend="app-initdb-data-checksums"><application>data page
+      checksums</application></link>.
+     </para>
+     <para>
+      Relation pages which are correctly formatted, internally consistent, and
+      correct relative to their own internal checksums may still contain
+      logical corruption.  As such, this kind of corruption cannot be detected
+      with <application>checksums</application>.  Examples include toasted
+      values in the main table which lack a corresponding entry in the toast
+      table, and tuples in the main table with a Transaction ID that is older
+      than the oldest valid Transaction ID in the database or cluster.
+     </para>
+     <para>
+      Multiple causes of logical corruption have been observed in production
+      systems, including bugs in the <productname>PostgreSQL</productname>
+      server software, faulty and ill-conceived backup and restore tools, and
+      user error.
+     </para>
+     <para>
+      Corrupt relations are most concerning in live production environments,
+      precisely the same environments where high risk activities are least
+      welcome.  For this reason, <function>verify_heapam</function> has been
+      designed to diagnose corruption without undue risk.  It cannot guard
+      against all causes of backend crashes, as even executing the calling
+      query could be unsafe on a badly corrupted system.  <link
+      linkend="catalogs-overview">Catalog tables</link> are accessed, which could
+      be problematic if the catalogs themselves are corrupted.
+     </para>
+     <para>
+      The design principle adhered to in <function>verify_heapam</function> is
+      that, if the rest of the system and server hardware are correct, under
+      default options, <function>verify_heapam</function> will not crash the
+      server due merely to structural or logical corruption in the target
+      table.
+     </para>
+     <para>
+      The <literal>check_toast</literal> option attempts to reconcile the target
+      table against entries in its corresponding toast table.  This option is
+      disabled by default and is known to be slow.
+      If the target relation's corresponding toast table or toast index is
+      corrupt, reconciling the target table against toast values could
+      conceivably crash the server, although in many cases this would
+      just produce an error.
+     </para>
+     <para>
+      The following optional arguments are recognized:
+     </para>
+     <variablelist>
+      <varlistentry>
+       <term>on_error_stop</term>
+       <listitem>
+        <para>
+         If true, corruption checking stops at the end of the first block on
+         which any corruptions are found.
+        </para>
+        <para>
+         Defaults to false.
+        </para>
+       </listitem>
+      </varlistentry>
+      <varlistentry>
+       <term>check_toast</term>
+       <listitem>
+        <para>
+         If true, toasted values are checked against the corresponding
+         TOAST table.
+        </para>
+        <para>
+         Defaults to false.
+        </para>
+       </listitem>
+      </varlistentry>
+      <varlistentry>
+       <term>skip</term>
+       <listitem>
+        <para>
+         If not <literal>none</literal>, corruption checking skips blocks that
+         are marked as all-visible or all-frozen, as given.
+         Valid options are <literal>all-visible</literal>,
+         <literal>all-frozen</literal> and <literal>none</literal>.
+        </para>
+        <para>
+         Defaults to <literal>none</literal>.
+        </para>
+       </listitem>
+      </varlistentry>
+      <varlistentry>
+       <term>startblock</term>
+       <listitem>
+        <para>
+         If specified, corruption checking begins at the specified block,
+         skipping all previous blocks.  It is an error to specify a
+         <literal>startblock</literal> outside the range of blocks in the
+         target table.
+        </para>
+        <para>
+         By default, checking begins at the first block.
+        </para>
+       </listitem>
+      </varlistentry>
+      <varlistentry>
+       <term>endblock</term>
+       <listitem>
+        <para>
+         If specified, corruption checking ends at the specified block,
+         skipping all remaining blocks.  It is an error to specify an
+         <literal>endblock</literal> outside the range of blocks in the target
+         table.
+        </para>
+        <para>
+         By default, checking ends at the last block.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
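+     <para>
+      As a minimal sketch of how these arguments combine, the following call
+      uses named notation to check only blocks 0 through 99 of a hypothetical
+      table <literal>mytable</literal>, skipping all-frozen blocks and
+      stopping after the first block on which corruption is found.  The table
+      name and block range are assumptions chosen for illustration; the call
+      raises an error if the table has fewer than 100 blocks.
+<programlisting>
+SELECT * FROM verify_heapam('mytable',
+                            on_error_stop := true,
+                            skip := 'all-frozen',
+                            startblock := 0,
+                            endblock := 99);
+</programlisting>
+     </para>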
+     <para>
+      For each corruption detected, <function>verify_heapam</function> returns
+      a row with the following columns:
+     </para>
+     <variablelist>
+      <varlistentry>
+       <term>blkno</term>
+       <listitem>
+        <para>
+         The number of the block containing the corrupt page.
+        </para>
+       </listitem>
+      </varlistentry>
+      <varlistentry>
+       <term>offnum</term>
+       <listitem>
+        <para>
+         The OffsetNumber of the corrupt tuple.
+        </para>
+       </listitem>
+      </varlistentry>
+      <varlistentry>
+       <term>attnum</term>
+       <listitem>
+        <para>
+         The attribute number of the corrupt column in the tuple, if the
+         corruption is specific to a column and not the tuple as a whole.
+        </para>
+       </listitem>
+      </varlistentry>
+      <varlistentry>
+       <term>msg</term>
+       <listitem>
+        <para>
+         A human-readable message describing the corruption in the page.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
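+     <para>
+      Because the results are ordinary rows, they can be filtered or
+      aggregated like any other query result.  As an illustrative sketch
+      (again assuming a table named <literal>mytable</literal>), this query
+      counts the corruption reports on each block:
+<programlisting>
+SELECT blkno, count(*) AS corruptions
+  FROM verify_heapam('mytable')
+ GROUP BY blkno
+ ORDER BY blkno;
+</programlisting>
+     </para>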
+    </listitem>
+   </varlistentry>
+  </variablelist>
 </sect2>

 <sect2>
  <title>Optional <parameter>heapallindexed</parameter> Verification</title>
 <para>
-  When the <parameter>heapallindexed</parameter> argument to
+  When the <parameter>heapallindexed</parameter> argument to B-Tree
  verification functions is <literal>true</literal>, an additional
  phase of verification is performed against the table associated with
  the target index relation.  This consists of a <quote>dummy</quote>