mirror of https://github.com/postgres/postgres.git synced 2025-06-11 20:28:21 +03:00

Modify pg_basebackup to use a new COPY subprotocol for base backups.

In the new approach, all files across all tablespaces are sent in a
single COPY OUT operation. The CopyData messages are no longer raw
archive content; rather, each message is prefixed with a type byte
that describes its purpose, e.g. 'n' signifies the start of a new
archive and 'd' signifies archive or manifest data. This protocol
is significantly more extensible than the old approach, since we can
later create more message types, though not without concern for
backward compatibility.
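
As an illustration of the type-byte framing described above, here is a minimal Python sketch that decodes the payload of a single CopyData message. The field layouts are taken from the protocol.sgml change in this commit. The function names (read_cstring, decode_payload) and the returned dictionary keys are hypothetical, and treating strings as UTF-8 is an assumption; String and Int64 follow the usual protocol conventions (null-terminated, network byte order).

```python
import struct


def read_cstring(buf, pos):
    """Return (text, next_pos) for a null-terminated string starting at pos."""
    end = buf.index(b"\x00", pos)
    # Assumption: names and paths are decoded as UTF-8 for display purposes.
    return buf[pos:end].decode("utf-8"), end + 1


def decode_payload(payload):
    """Decode one CopyData payload into a (kind, fields) tuple."""
    if not payload:
        raise ValueError("empty CopyData payload")
    kind, body = payload[:1], payload[1:]
    if kind == b"n":
        # New archive: file name, then source path ("" for the main directory).
        name, pos = read_cstring(body, 0)
        path, _ = read_cstring(body, pos)
        return "new_archive", {"name": name, "path": path}
    if kind == b"m":
        # Start of the backup manifest; the message has no further fields.
        return "manifest", {}
    if kind == b"d":
        # Archive or manifest data: everything after the type byte.
        return "data", {"bytes": body}
    if kind == b"p":
        # Progress report: one Int64 in network byte order.
        (done,) = struct.unpack("!q", body)
        return "progress", {"bytes_done": done}
    raise ValueError(f"unknown message type {kind!r}")
```

Rejecting unknown type bytes, rather than skipping them, is one possible choice; a client that wants to tolerate message types added later might prefer to ignore them.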

The new protocol sends a few things to the client that the old one
did not. First, it sends the name of each archive explicitly, instead
of letting the client compute it. This is intended to make it easier
to write future patches that might send archives in a format other
than tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress
messages rather than allowing the client to assume that progress is
defined by the number of bytes received. This will help with future
features where the server compresses the data, or sends it someplace
directly rather than transmitting it to the client.
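
To make the two additions above concrete, the following Python sketch shows how a client might consume the stream: it uses the archive name sent by the server instead of computing one, and it takes progress from the explicit 'p' messages rather than from the number of bytes received. This is a sketch under stated assumptions, not the pg_basebackup implementation. The function name collect_backup and the "backup_manifest" dictionary key are hypothetical, and the message layouts are those documented in the protocol.sgml change in this commit.

```python
import struct


def collect_backup(payloads):
    """Group 'd' chunks under the archive or manifest that precedes them.

    Returns (outputs, bytes_done): a dict mapping output name to its
    accumulated bytes, and the value of the last progress report seen.
    """
    outputs = {}
    current = None   # name of the archive (or manifest) being received
    bytes_done = 0   # taken only from 'p' messages, never from data sizes
    for payload in payloads:
        kind, body = payload[:1], payload[1:]
        if kind == b"n":
            # The first null-terminated string is the server-chosen file name.
            current = body.split(b"\x00", 1)[0].decode("utf-8")
            outputs[current] = bytearray()
        elif kind == b"m":
            current = "backup_manifest"  # hypothetical key for this sketch
            outputs[current] = bytearray()
        elif kind == b"d":
            if current is None:
                raise ValueError("data received before any archive or manifest")
            outputs[current] += body
        elif kind == b"p":
            (bytes_done,) = struct.unpack("!q", body)
        else:
            raise ValueError(f"unknown message type {kind!r}")
    return outputs, bytes_done
```

A real client would write each chunk to disk as it arrives instead of buffering whole archives in memory; the buffering here only keeps the example short.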

The old protocol is still supported for compatibility with previous
releases. The new protocol is selected by means of a new
TARGET option to the BASE_BACKUP command. Currently, the
only supported target is 'client'. Support for additional
targets will be added in a later commit.

Patch by me. The patch set of which this is a part has had review
and/or testing from Jeevan Ladhe, Tushar Ahuja, Suraj Kharage,
Dipesh Pandit, and Mark Dilger.

Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
Robert Haas
2022-01-18 13:47:26 -05:00
parent 3414099c33
commit cc333f3233
5 changed files with 806 additions and 48 deletions

@ -2630,6 +2630,22 @@ The commands accepted in replication mode are:
</listitem>
</varlistentry>
<varlistentry>
<term><literal>TARGET</literal> <replaceable>'target'</replaceable></term>
<listitem>
<para>
Tells the server where to send the backup. If not specified,
the legacy base backup protocol will be used. Otherwise, the new
protocol will be used, as described below.
</para>
<para>
At present, the only supported value for this parameter is
<literal>client</literal>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><literal>PROGRESS [ <replaceable class="parameter">boolean</replaceable> ]</literal></term>
<listitem>
@ -2805,19 +2821,113 @@ The commands accepted in replication mode are:
<para>
After the second regular result set, one or more CopyOutResponse results
will be sent, one for the main data directory and one for each additional tablespace other
than <literal>pg_default</literal> and <literal>pg_global</literal>. The data in
the CopyOutResponse results will be a tar format (following the
<quote>ustar interchange format</quote> specified in the POSIX 1003.1-2008
standard) dump of the tablespace contents. Prior to
will be sent. If the <literal>TARGET</literal> option is not specified,
the legacy base backup protocol will be used. In this mode,
there will be one CopyOutResponse for the main directory, one for each
additional tablespace other than <literal>pg_default</literal> and
<literal>pg_global</literal>, and one for the backup manifest if
requested. The main data directory and any additional tablespaces will
be sent in tar format (following the <quote>ustar interchange
format</quote> specified in the POSIX 1003.1-2008 standard), and
the manifest will be sent as a plain file. Prior to
<literal>PostgreSQL</literal> 15, the server omitted the two trailing
blocks of zeroes specified in the standard, but this is no longer the
case.
After the tar data is complete, and if a backup manifest was requested,
another CopyOutResponse result is sent, containing the manifest data for the
current base backup. In any case, a final ordinary result set will be
sent, containing the WAL end position of the backup, in the same format as
the start position.
</para>
<para>
New applications should specify the <literal>TARGET</literal> option.
When that option is used, a single CopyOutResponse will be sent, and
the payload of each CopyData message will contain a message in one of
the following formats:
</para>
<para>
<variablelist>
<varlistentry>
<term>new archive (B)</term>
<listitem><para><variablelist>
<varlistentry>
<term>Byte1('n')</term>
<listitem><para>
Identifies the message as indicating the start of a new archive.
</para></listitem>
</varlistentry>
<varlistentry>
<term>String</term>
<listitem><para>
The file name for this archive.
</para></listitem>
</varlistentry>
<varlistentry>
<term>String</term>
<listitem><para>
For the main data directory, an empty string. For other
tablespaces, the full path to the directory from which this
archive was created.
</para></listitem>
</varlistentry>
</variablelist></para></listitem>
</varlistentry>
<varlistentry>
<term>manifest (B)</term>
<listitem><para><variablelist>
<varlistentry>
<term>Byte1('m')</term>
<listitem><para>
Identifies the message as indicating the start of the backup
manifest.
</para></listitem>
</varlistentry>
</variablelist></para></listitem>
</varlistentry>
<varlistentry>
<term>archive or manifest data (B)</term>
<listitem><para><variablelist>
<varlistentry>
<term>Byte1('d')</term>
<listitem><para>
Identifies the message as containing archive or manifest data.
</para></listitem>
</varlistentry>
<varlistentry>
<term>Byte<replaceable>n</replaceable></term>
<listitem><para>
Data bytes.
</para></listitem>
</varlistentry>
</variablelist></para></listitem>
</varlistentry>
<varlistentry>
<term>progress report (B)</term>
<listitem><para><variablelist>
<varlistentry>
<term>Byte1('p')</term>
<listitem><para>
Identifies the message as a progress report.
</para></listitem>
</varlistentry>
<varlistentry>
<term>Int64</term>
<listitem><para>
The number of bytes from the current tablespace for which
processing has been completed.
</para></listitem>
</varlistentry>
</variablelist></para></listitem>
</varlistentry>
</variablelist>
</para>
<para>
After the CopyOutResponse, or all such responses, have been sent, a
final ordinary result set will be sent, containing the WAL end position
of the backup, in the same format as the start position.
</para>
<para>