Cope with data-offset-less archive files during out-of-order restores.

pg_dump produces custom-format archive files that lack data offsets when it is unable to seek its output. Up to now that's been a hazard for pg_restore. But if pg_restore is able to seek in the archive file, there is no reason to throw up our hands when asked to restore data blocks out of order. Instead, whenever we are searching for a data block, record the locations of the blocks we passed over (that is, fill in the missing data-offset fields in our in-memory copy of the TOC data). Then, when we hit a case that requires going backwards, we can just seek back.

Also track the furthest point that we've searched to, and seek back to there when beginning a search for a new data block. This avoids possible O(N^2) time consumption, by ensuring that each data block is examined at most twice. (On Unix systems, that's at most twice per parallel-restore job; but since Windows uses threads here, the threads can share block location knowledge, reducing the amount of duplicated work.)

We can also improve the code a bit by using fseeko() to skip over data blocks during the search.

This is all of some use even in simple restores, but it's really significant for parallel pg_restore. In that case, we require seekability of the input already, and we will very probably need to do out-of-order restores.

Back-patch to v12, as this fixes a regression introduced by commit 548e50976. Before that, parallel restore avoided requesting out-of-order restores, so it would work on a data-offset-less archive. Now it will again.

Ideally this patch would include some test coverage, but there are other open bugs that need to be fixed before we can extend our coverage of parallel restore very much. Plan to revisit that later.

David Gilman and Tom Lane; reviewed by Justin Pryzby

Discussion: https://postgr.es/m/CALBH9DDuJ+scZc4MEvw5uO-=vRyR2=QF9+Yh=3hPEnKHWfS81A@mail.gmail.com
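
The search strategy described in this message can be illustrated with a small Python sketch. This is not the pg_restore code (which is C and uses fseeko()); the 8-byte block header, the `build_archive` helper, and the `Reader` class are hypothetical stand-ins chosen only to show the idea: remember the offset of every block passed over, remember the furthest point scanned, and seek instead of rescanning.

```python
import io
import struct

# Hypothetical block header: (block id, payload length), both 32-bit unsigned.
# The real custom archive format is different; this is only an illustration.
HEADER = struct.Struct("<II")


def build_archive(blocks):
    """Serialize (block_id, payload) pairs into a seekable in-memory archive."""
    buf = io.BytesIO()
    for block_id, payload in blocks:
        buf.write(HEADER.pack(block_id, len(payload)))
        buf.write(payload)
    buf.seek(0)
    return buf


class Reader:
    """Reads blocks in any order from an archive that carries no offset table."""

    def __init__(self, archive):
        self.archive = archive
        self.offsets = {}      # block id -> header offset, filled in while scanning
        self.last_scanned = 0  # furthest position any search has reached so far
        self.headers_read = 0  # counter used to show how much work was done

    def read_block(self, block_id):
        if block_id in self.offsets:
            # Offset already learned on an earlier scan: seek straight to it.
            self.archive.seek(self.offsets[block_id])
        else:
            # Unknown block: resume where the previous search stopped.
            self.archive.seek(self.last_scanned)

        while True:
            pos = self.archive.tell()
            raw = self.archive.read(HEADER.size)
            if len(raw) < HEADER.size:
                raise KeyError(f"block {block_id} not found in archive")
            found_id, length = HEADER.unpack(raw)
            self.headers_read += 1
            self.offsets[found_id] = pos  # record every block we pass over

            if found_id == block_id:
                payload = self.archive.read(length)
                self.last_scanned = max(self.last_scanned, self.archive.tell())
                return payload

            # Not the wanted block: skip its payload by seeking, not reading.
            self.archive.seek(length, io.SEEK_CUR)
            self.last_scanned = max(self.last_scanned, self.archive.tell())
```

In this sketch each block header is read at most twice: once when a forward scan passes over it, and once more when that block is actually requested. Requesting blocks out of order therefore never restarts the scan from the beginning of the file.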
2025-12-06 00:02:13 +03:00 · 2020-07-17 13:03:50 -04:00
parent a8d0732ac2
commit f009591d6e
2 changed files with 118 additions and 35 deletions
--- a/doc/src/sgml/ref/pg_restore.sgml
+++ b/doc/src/sgml/ref/pg_restore.sgml
@@ -246,12 +246,14 @@ PostgreSQL documentation
      <term><option>--jobs=<replaceable class="parameter">number-of-jobs</replaceable></option></term>
      <listitem>
       <para>
-        Run the most time-consuming parts
-        of <application>pg_restore</application> &mdash; those which load data,
-        create indexes, or create constraints &mdash; using multiple
-        concurrent jobs.  This option can dramatically reduce the time
+        Run the most time-consuming steps
+        of <application>pg_restore</application> &mdash; those that load data,
+        create indexes, or create constraints &mdash; concurrently, using up
+        to <replaceable class="parameter">number-of-jobs</replaceable>
+        concurrent sessions.  This option can dramatically reduce the time
        to restore a large database to a server running on a
-        multiprocessor machine.
+        multiprocessor machine.  This option is ignored when emitting a script
+        rather than connecting directly to a database server.
       </para>

       <para>
@@ -274,8 +276,7 @@ PostgreSQL documentation
        Only the custom and directory archive formats are supported
        with this option.
        The input must be a regular file or directory (not, for example, a
-        pipe).  This option is ignored when emitting a script rather
-        than connecting directly to a database server.  Also, multiple
+        pipe or standard input).  Also, multiple
        jobs cannot be used together with the
        option <option>--single-transaction</option>.
       </para>