mirror of https://github.com/postgres/postgres.git
synced 2025-04-22 23:02:54 +03:00
Make some more improvements to parallel query documentation.

Many places that mentioned only Gather should also mention Gather
Merge, or should be phrased in a more neutral way. Be more clear
about the fact that max_parallel_workers_per_gather affects the number
of workers the planner may want to use. Fix a typo. Explain how
Gather Merge works. Adjust wording around parallel scans to be a bit
more clear. Adjust wording around parallel-restricted operations for
the fact that uncorrelated subplans are no longer restricted.

Patch by me, reviewed by Erik Rijkers

Discussion: http://postgr.es/m/CA+TgmoZsTjgVGn=ei5ht-1qGFKy_m1VgB3d8+Rg304hz91N5ww@mail.gmail.com
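
Reader's note, not part of the commit: the Gather versus Gather Merge distinction that this patch documents can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PostgreSQL's C implementation. The function names `gather` and `gather_merge` and the sample `streams` are hypothetical, and simple concatenation stands in for "whatever order is convenient".

```python
import heapq
from itertools import chain


def gather(worker_outputs):
    """Stand-in for Gather: take tuples as they come, here by plain
    concatenation, so any per-worker sort order is not preserved overall."""
    return list(chain.from_iterable(worker_outputs))


def gather_merge(worker_outputs):
    """Stand-in for Gather Merge: each input is assumed to be already
    sorted, and the leader performs an order-preserving merge."""
    return list(heapq.merge(*worker_outputs))


# Hypothetical output of three cooperating processes, each sorted on its own.
streams = [[1, 4, 9], [2, 3, 10], [5, 6, 7]]

unordered = gather(streams)      # same tuples, global order lost
ordered = gather_merge(streams)  # same tuples, global order kept
```

Both functions return the same set of tuples; only the merge variant lets plan nodes above it rely on sorted input without another sort.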
parent e694010758
commit c1ef4e5cdb
@@ -2050,8 +2050,8 @@ include_dir 'conf.d'
 <listitem>
 <para>
 Sets the maximum number of workers that can be started by a single
-<literal>Gather</literal> node. Parallel workers are taken from the
-pool of processes established by
+<literal>Gather</literal> or <literal>Gather Merge</literal> node.
+Parallel workers are taken from the pool of processes established by
 <xref linkend="guc-max-worker-processes">, limited by
 <xref linkend="guc-max-parallel-workers">. Note that the requested
 number of workers may not actually be available at run time. If this

@@ -28,7 +28,8 @@
 <para>
 When the optimizer determines that parallel query is the fastest execution
 strategy for a particular query, it will create a query plan which includes
-a <firstterm>Gather node</firstterm>. Here is a simple example:
+a <firstterm>Gather</firstterm> or <firstterm>Gather Merge</firstterm>
+node. Here is a simple example:
 
 <screen>
 EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
@@ -43,15 +44,16 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 </para>
 
 <para>
-In all cases, the <literal>Gather</literal> node will have exactly one
+In all cases, the <literal>Gather</literal> or
+<literal>Gather Merge</literal> node will have exactly one
 child plan, which is the portion of the plan that will be executed in
-parallel. If the <literal>Gather</> node is at the very top of the plan
-tree, then the entire query will execute in parallel. If it is somewhere
-else in the plan tree, then only the portion of the plan below it will run
-in parallel. In the example above, the query accesses only one table, so
-there is only one plan node other than the <literal>Gather</> node itself;
-since that plan node is a child of the <literal>Gather</> node, it will
-run in parallel.
+parallel. If the <literal>Gather</> or <literal>Gather Merge</> node is
+at the very top of the plan tree, then the entire query will execute in
+parallel. If it is somewhere else in the plan tree, then only the portion
+of the plan below it will run in parallel. In the example above, the
+query accesses only one table, so there is only one plan node other than
+the <literal>Gather</> node itself; since that plan node is a child of the
+<literal>Gather</> node, it will run in parallel.
 </para>
 
 <para>
@@ -60,35 +62,47 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 during query execution, the process which is implementing the user's
 session will request a number of <link linkend="bgworker">background
 worker processes</link> equal to the number
-of workers chosen by the planner. The total number of background
-workers that can exist at any one time is limited by both
+of workers chosen by the planner. The number of background workers that
+the planner will consider using is limited to at most
+<xref linkend="guc-max-parallel-workers-per-gather">. The total number
+of background workers that can exist at any one time is limited by both
 <xref linkend="guc-max-worker-processes"> and
-<xref linkend="guc-max-parallel-workers">, so it is possible for a
+<xref linkend="guc-max-parallel-workers">. Therefore, it is possible for a
 parallel query to run with fewer workers than planned, or even with
 no workers at all. The optimal plan may depend on the number of workers
 that are available, so this can result in poor query performance. If this
-occurrence is frequent, considering increasing
+occurrence is frequent, consider increasing
 <varname>max_worker_processes</> and <varname>max_parallel_workers</>
 so that more workers can be run simultaneously or alternatively reducing
-<xref linkend="guc-max-parallel-workers-per-gather"> so that the planner
+<varname>max_parallel_workers_per_gather</varname> so that the planner
 requests fewer workers.
 </para>
 
 <para>
 Every background worker process which is successfully started for a given
-parallel query will execute the portion of the plan below
-the <literal>Gather</> node. The leader will also execute that portion
-of the plan, but it has an additional responsibility: it must also read
-all of the tuples generated by the workers. When the parallel portion of
-the plan generates only a small number of tuples, the leader will often
-behave very much like an additional worker, speeding up query execution.
-Conversely, when the parallel portion of the plan generates a large number
-of tuples, the leader may be almost entirely occupied with reading the
-tuples generated by the workers and performing any further processing
-steps which are required by plan nodes above the level of the
-<literal>Gather</literal> node. In such cases, the leader will do very
-little of the work of executing the parallel portion of the plan.
+parallel query will execute the parallel portion of the plan. The leader
+will also execute that portion of the plan, but it has an additional
+responsibility: it must also read all of the tuples generated by the
+workers. When the parallel portion of the plan generates only a small
+number of tuples, the leader will often behave very much like an additional
+worker, speeding up query execution. Conversely, when the parallel portion
+of the plan generates a large number of tuples, the leader may be almost
+entirely occupied with reading the tuples generated by the workers and
+performing any further processing steps which are required by plan nodes
+above the level of the <literal>Gather</literal> node or
+<literal>Gather Merge</literal> node. In such cases, the leader will
+do very little of the work of executing the parallel portion of the plan.
 </para>
+
+<para>
+When the node at the top of the parallel portion of the plan is
+<literal>Gather Merge</> rather than <literal>Gather</>, it indicates that
+each process executing the parallel portion of the plan is producing
+tuples in sorted order, and that the leader is performing an
+order-preserving merge. In contrast, <literal>Gather</> reads tuples
+from the workers in whatever order is convenient, destroying any sort
+order that may have existed.
+</para>
 </sect1>
 
 <sect1 id="when-can-parallel-query-be-used">
@@ -221,9 +235,9 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 send such a message, this can only occur when using a client that
 does not rely on libpq. If this is a frequent
 occurrence, it may be a good idea to set
-<xref linkend="guc-max-parallel-workers-per-gather"> in sessions
-where it is likely, so as to avoid generating query plans that may
-be suboptimal when run serially.
+<xref linkend="guc-max-parallel-workers-per-gather"> to zero in
+sessions where it is likely, so as to avoid generating query plans
+that may be suboptimal when run serially.
 </para>
 </listitem>
 
@@ -262,6 +276,8 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 so that each process which executes the plan will generate only a
 subset of the output rows in such a way that each required output row
 is guaranteed to be generated by exactly one of the cooperating processes.
+Generally, this means that the scan on the driving table of the query
+must be a parallel-aware scan.
 </para>
 
 <sect2 id="parallel-scans">
@@ -302,9 +318,8 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 </listitem>
 </itemizedlist>
 
-Only the scan types listed above may be used for a scan on the driving
-table within a parallel plan. Other scan types, such as parallel scans of
-non-btree indexes, may be supported in the future.
+Other scan types, such as scans of non-btree indexes, may support
+parallel scans in the future.
 </para>
 </sect2>
 
@@ -343,10 +358,10 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 the query performs an aggregation step, producing a partial result for
 each group of which that process is aware. This is reflected in the plan
 as a <literal>Partial Aggregate</> node. Second, the partial results are
-transferred to the leader via the <literal>Gather</> node. Finally, the
-leader re-aggregates the results across all workers in order to produce
-the final result. This is reflected in the plan as a
-<literal>Finalize Aggregate</> node.
+transferred to the leader via <literal>Gather</> or <literal>Gather
+Merge</>. Finally, the leader re-aggregates the results across all
+workers in order to produce the final result. This is reflected in the
+plan as a <literal>Finalize Aggregate</> node.
 </para>
 
 <para>
@@ -416,8 +431,8 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 operation is one which cannot be performed in a parallel worker, but which
 can be performed in the leader while parallel query is in use. Therefore,
 parallel restricted operations can never occur below a <literal>Gather</>
-node, but can occur elsewhere in a plan which contains a
-<literal>Gather</> node. A parallel unsafe operation is one which cannot
+or <literal>Gather Merge</> node, but can occur elsewhere in a plan which
+contains such a node. A parallel unsafe operation is one which cannot
 be performed while parallel query is in use, not even in the leader.
 When a query contains anything which is parallel unsafe, parallel query
 is completely disabled for that query.
@@ -449,7 +464,7 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 
 <listitem>
 <para>
-Access to an <literal>InitPlan</> or <literal>SubPlan</>.
+Access to an <literal>InitPlan</> or correlated <literal>SubPlan</>.
 </para>
 </listitem>
 </itemizedlist>
@@ -514,8 +529,8 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 parallel-restricted functions or aggregates involved in the query in
 order to obtain a superior plan. So, for example, if a <literal>WHERE</>
 clause applied to a particular table is parallel restricted, the query
-planner will not consider placing the scan of that table below a
-<literal>Gather</> node. In some cases, it would be
+planner will not consider performing a scan of that table in the parallel
+portion of a plan. In some cases, it would be
 possible (and perhaps even efficient) to include the scan of that table in
 the parallel portion of the query and defer the evaluation of the
 <literal>WHERE</> clause so that it happens above the <literal>Gather</>