mirror of https://github.com/postgres/postgres.git synced 2025-07-27 12:41:57 +03:00

Perform apply of large transactions by parallel workers.

Currently, for large transactions, the publisher sends the data in
multiple streams (changes divided into chunks depending upon
logical_decoding_work_mem), and then on the subscriber-side, the apply
worker writes the changes into temporary files and once it receives the
commit, it reads from those files and applies the entire transaction. To
improve the performance of such transactions, we can instead allow them to
be applied via parallel workers.

In this approach, we assign a new parallel apply worker (if available) as
soon as the xact's first stream is received and the leader apply worker
will send changes to this new worker via shared memory. The parallel apply
worker will directly apply the change instead of writing it to temporary
files. However, if the leader apply worker times out while attempting to
send a message to the parallel apply worker, it will switch to
"partial serialize" mode. In this mode, the leader serializes all
remaining changes to a file and notifies the parallel apply worker to
read and apply them at the end of the transaction. We use a non-blocking
way to send the messages from the leader apply worker to the parallel
apply worker to avoid deadlocks. We keep this parallel apply worker
assigned till the transaction commit is received and also wait for the
worker to finish at commit. This preserves commit ordering and avoids
writing to and reading from files in most cases. We still need to spill if
there is no worker available.
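
The fallback described above can be illustrated with a toy model. This is a minimal Python sketch, not PostgreSQL code: the class `LeaderSketch`, its fields, and the retry count that stands in for the send timeout are all hypothetical, and a bounded in-process queue stands in for the shared memory queue.

```python
import queue


class LeaderSketch:
    """Toy model of a leader apply worker (hypothetical, for illustration only)."""

    def __init__(self, capacity, max_attempts=3):
        # Bounded queue standing in for the shared memory queue to the
        # parallel apply worker.
        self.shm_queue = queue.Queue(maxsize=capacity)
        # List standing in for the temporary file used after the fallback.
        self.spill_file = []
        self.partial_serialize = False
        # Assumption: a fixed number of attempts replaces a real timeout.
        self.max_attempts = max_attempts

    def send_change(self, change):
        # Once in "partial serialize" mode, all remaining changes are spilled.
        if self.partial_serialize:
            self.spill_file.append(change)
            return "serialized"
        # Non-blocking send: never wait on a full queue.
        for _ in range(self.max_attempts):
            try:
                self.shm_queue.put_nowait(change)
                return "sent"
            except queue.Full:
                continue
        # "Timed out": switch modes and spill this change and all later ones.
        self.partial_serialize = True
        self.spill_file.append(change)
        return "serialized"


leader = LeaderSketch(capacity=2)
results = [leader.send_change(n) for n in range(5)]
```

With no consumer draining the queue, the first two changes fit in the queue and the remaining three are serialized. The mode switch is one-way within a transaction in this sketch, which mirrors the statement that all remaining changes go to the file.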

This patch also extends the SUBSCRIPTION 'streaming' parameter so that the
user can control whether to apply the streaming transaction in a parallel
apply worker or spill the change to disk. The user can set the streaming
parameter to 'on', 'off', or 'parallel'. The parameter value 'parallel'
means the streaming transaction will be applied via a parallel apply
worker, if available. The parameter value 'on' means the streaming
transaction will be spilled to disk. The default value is 'off' (same as
current behaviour).
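
The three settings and the fallback when no worker is free can be summarized as a small decision function. This is a hedged Python sketch: the single-character codes are the ones this commit documents for `pg_subscription.substream`, while the function name `apply_strategy` and the returned labels are hypothetical.

```python
# Catalog codes as documented for pg_subscription.substream in this commit.
SUBSTREAM_CODES = {"off": "f", "on": "t", "parallel": "p"}


def apply_strategy(streaming, parallel_worker_available):
    """Return a label for how a large in-progress transaction is handled."""
    code = SUBSTREAM_CODES[streaming]
    if code == "f":
        # Streaming disabled: the transaction is sent as a whole after decoding.
        return "no-streaming"
    if code == "p" and parallel_worker_available:
        # Changes are applied directly by a parallel apply worker.
        return "parallel-apply"
    # 'on', or 'parallel' with no free worker: spill to temporary files.
    return "spill-to-disk"
```

For example, `apply_strategy("parallel", False)` yields the same label as `apply_strategy("on", True)`, matching the note that 'parallel' behaves like 'on' when no worker is available.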

In addition, the patch extends the logical replication STREAM_ABORT
message so that abort_lsn and abort_time can also be sent. These values
are used to update the replication origin in the parallel apply worker
when the streaming transaction is aborted. Because this message extension
is needed to support parallel streaming, parallel streaming is not
supported for publications on servers < PG16.
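
As an illustration of the extended message, the following Python sketch decodes a STREAM_ABORT payload. It assumes the layout from the logical replication protocol documentation: a one-byte message type 'A', an Int32 transaction XID, an Int32 subtransaction XID, and then, when the extended fields are present, an Int64 abort LSN and an Int64 abort timestamp in microseconds since 2000-01-01, all in network byte order. The function name and the `has_abort_info` flag are hypothetical.

```python
import struct
from datetime import datetime, timedelta, timezone

# PostgreSQL timestamps count microseconds from this epoch.
PG_EPOCH = datetime(2000, 1, 1, tzinfo=timezone.utc)


def parse_stream_abort(payload, has_abort_info):
    """Decode a STREAM_ABORT payload into a dict (illustrative only)."""
    # '>' selects big-endian with no padding: 1 + 4 + 4 = 9 bytes.
    msg_type, xid, subxid = struct.unpack_from(">cII", payload, 0)
    if msg_type != b"A":
        raise ValueError("not a STREAM_ABORT message")
    result = {"xid": xid, "subxid": subxid}
    if has_abort_info:
        # Extended fields: unsigned 64-bit LSN, signed 64-bit timestamp.
        abort_lsn, abort_time = struct.unpack_from(">Qq", payload, 9)
        # Render the LSN in the usual high/low hexadecimal form.
        result["abort_lsn"] = "%X/%X" % (abort_lsn >> 32, abort_lsn & 0xFFFFFFFF)
        result["abort_time"] = PG_EPOCH + timedelta(microseconds=abort_time)
    return result


sample = struct.pack(">cIIQq", b"A", 740, 741, 0x16B3748, 0)
decoded = parse_stream_abort(sample, has_abort_info=True)
```

The caller decides whether the extended fields are expected, since an older-format message carries only the two XIDs.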

Author: Hou Zhijie, Wang wei, Amit Kapila with design inputs from Sawada Masahiko
Reviewed-by: Sawada Masahiko, Peter Smith, Dilip Kumar, Shi yu, Kuroda Hayato, Shveta Mallik
Discussion: https://postgr.es/m/CAA4eK1+wyN6zpaHUkCLorEWNx75MG0xhMwcFhvjqm2KURZEAGw@mail.gmail.com
Amit Kapila
2023-01-09 07:00:39 +05:30
parent 5687e7810f
commit 216a784829
58 changed files with 4497 additions and 745 deletions

View File

@ -7913,11 +7913,16 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>substream</structfield> <type>bool</type>
<structfield>substream</structfield> <type>char</type>
</para>
<para>
If true, the subscription will allow streaming of in-progress
transactions
Controls how to handle the streaming of in-progress transactions:
<literal>f</literal> = disallow streaming of in-progress transactions,
<literal>t</literal> = spill the changes of in-progress transactions to
disk and apply at once after the transaction is committed on the
publisher and received by the subscriber,
<literal>p</literal> = apply changes directly using a parallel apply
worker if available (same as 't' if no worker is available)
</para></entry>
</row>

View File

@ -4968,7 +4968,8 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
<listitem>
<para>
Specifies maximum number of logical replication workers. This includes
both apply workers and table synchronization workers.
leader apply workers, parallel apply workers, and table synchronization
workers.
</para>
<para>
Logical replication workers are taken from the pool defined by
@ -5008,6 +5009,31 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
<varlistentry id="guc-max-parallel-apply-workers-per-subscription" xreflabel="max_parallel_apply_workers_per_subscription">
<term><varname>max_parallel_apply_workers_per_subscription</varname> (<type>integer</type>)
<indexterm>
<primary><varname>max_parallel_apply_workers_per_subscription</varname> configuration parameter</primary>
</indexterm>
</term>
<listitem>
<para>
Maximum number of parallel apply workers per subscription. This
parameter controls the amount of parallelism for streaming of
in-progress transactions with subscription parameter
<literal>streaming = parallel</literal>.
</para>
<para>
The parallel apply workers are taken from the pool defined by
<varname>max_logical_replication_workers</varname>.
</para>
<para>
The default value is 2. This parameter can only be set in the
<filename>postgresql.conf</filename> file or on the server command
line.
</para>
</listitem>
</varlistentry>
</variablelist>
</sect2>

View File

@ -1501,6 +1501,16 @@ CONTEXT: processing remote data for replication origin "pg_16395" during "INSER
might not violate any constraint. This can easily make the subscriber
inconsistent.
</para>
<para>
When the streaming mode is <literal>parallel</literal>, the finish LSN of
failed transactions may not be logged. In that case, it may be necessary to
change the streaming mode to <literal>on</literal> or <literal>off</literal> and
cause the same conflicts again so the finish LSN of the failed transaction will
be written to the server log. For the usage of finish LSN, please refer to <link
linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ...
SKIP</command></link>.
</para>
</sect1>
<sect1 id="logical-replication-restrictions">
@ -1809,8 +1819,9 @@ CONTEXT: processing remote data for replication origin "pg_16395" during "INSER
<para>
<link linkend="guc-max-logical-replication-workers"><varname>max_logical_replication_workers</varname></link>
must be set to at least the number of subscriptions (for apply workers), plus
some reserve for the table synchronization workers.
must be set to at least the number of subscriptions (for leader apply
workers), plus some reserve for the table synchronization workers and
parallel apply workers.
</para>
<para>
@ -1827,6 +1838,13 @@ CONTEXT: processing remote data for replication origin "pg_16395" during "INSER
subscription initialization or when new tables are added.
</para>
<para>
<link linkend="guc-max-parallel-apply-workers-per-subscription"><varname>max_parallel_apply_workers_per_subscription</varname></link>
controls the amount of parallelism for streaming of in-progress
transactions with subscription parameter
<literal>streaming = parallel</literal>.
</para>
<para>
Logical replication workers are also affected by
<link linkend="guc-wal-receiver-timeout"><varname>wal_receiver_timeout</varname></link>,

View File

@ -1858,6 +1858,11 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry><literal>advisory</literal></entry>
<entry>Waiting to acquire an advisory user lock.</entry>
</row>
<row>
<entry><literal>applytransaction</literal></entry>
<entry>Waiting to acquire a lock on a remote transaction being applied
by a logical replication subscriber.</entry>
</row>
<row>
<entry><literal>extend</literal></entry>
<entry>Waiting to extend a relation.</entry>

View File

@ -3103,7 +3103,7 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
<listitem>
<para>
Protocol version. Currently versions <literal>1</literal>, <literal>2</literal>,
and <literal>3</literal> are supported.
<literal>3</literal>, and <literal>4</literal> are supported.
</para>
<para>
Version <literal>2</literal> is supported only for server version 14
@ -3113,6 +3113,11 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
Version <literal>3</literal> is supported only for server version 15
and above, and it allows streaming of two-phase commits.
</para>
<para>
Version <literal>4</literal> is supported only for server version 16
and above, and it allows streams of large in-progress transactions to
be applied in parallel.
</para>
</listitem>
</varlistentry>
@ -6883,6 +6888,28 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Int64 (XLogRecPtr)</term>
<listitem>
<para>
The LSN of the abort. This field is available since protocol version
4.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Int64 (TimestampTz)</term>
<listitem>
<para>
Abort timestamp of the transaction. The value is in number
of microseconds since PostgreSQL epoch (2000-01-01). This field is
available since protocol version 4.
</para>
</listitem>
</varlistentry>
</variablelist>
</listitem>
</varlistentry>

View File

@ -228,13 +228,29 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
</varlistentry>
<varlistentry>
<term><literal>streaming</literal> (<type>boolean</type>)</term>
<term><literal>streaming</literal> (<type>enum</type>)</term>
<listitem>
<para>
Specifies whether to enable streaming of in-progress transactions
for this subscription. By default, all transactions
are fully decoded on the publisher and only then sent to the
subscriber as a whole.
for this subscription. The default value is <literal>off</literal>,
meaning all transactions are fully decoded on the publisher and only
then sent to the subscriber as a whole.
</para>
<para>
If set to <literal>on</literal>, the incoming changes are written to
temporary files and then applied only after the transaction is
committed on the publisher and received by the subscriber.
</para>
<para>
If set to <literal>parallel</literal>, incoming changes are directly
applied via one of the parallel apply workers, if available. If no
parallel apply worker is free to handle streaming transactions then
the changes are written to temporary files and applied after the
transaction is committed. Note that if an error happens in a
parallel apply worker, the finish LSN of the remote transaction
might not be reported in the server log.
</para>
</listitem>
</varlistentry>

View File

@ -1379,8 +1379,9 @@
<literal>virtualxid</literal>,
<literal>spectoken</literal>,
<literal>object</literal>,
<literal>userlock</literal>, or
<literal>advisory</literal>.
<literal>userlock</literal>,
<literal>advisory</literal>, or
<literal>applytransaction</literal>.
(See also <xref linkend="wait-event-lock-table"/>.)
</para></entry>
</row>
@ -1594,6 +1595,15 @@
so the <structfield>database</structfield> column is meaningful for an advisory lock.
</para>
<para>
Apply transaction locks are used in parallel mode to apply the transaction
in logical replication. The remote transaction id is displayed in the
<structfield>transactionid</structfield> column. The <structfield>objsubid</structfield>
displays the lock subtype which is 0 for the lock used to synchronize the
set of changes, and 1 for the lock used to wait for the transaction to
finish to ensure commit order.
</para>
<para>
<structname>pg_locks</structname> provides a global view of all locks
in the database cluster, not only those relevant to the current database.