mirror of https://github.com/postgres/postgres.git (synced 2025-04-22 23:02:54 +03:00)
Make some more improvements to parallel query documentation.
Many places that mentioned only Gather should also mention Gather Merge, or should be phrased in a more neutral way. Be more clear about the fact that max_parallel_workers_per_gather affects the number of workers the planner may want to use. Fix a typo. Explain how Gather Merge works. Adjust wording around parallel scans to be a bit more clear. Adjust wording around parallel-restricted operations for the fact that uncorrelated subplans are no longer restricted.

Patch by me, reviewed by Erik Rijkers

Discussion: http://postgr.es/m/CA+TgmoZsTjgVGn=ei5ht-1qGFKy_m1VgB3d8+Rg304hz91N5ww@mail.gmail.com
parent e694010758
commit c1ef4e5cdb
@@ -2050,8 +2050,8 @@ include_dir 'conf.d'
|
|||||||
<listitem>
|
<listitem>
|
||||||
<para>
|
<para>
|
||||||
Sets the maximum number of workers that can be started by a single
|
Sets the maximum number of workers that can be started by a single
|
||||||
<literal>Gather</literal> node. Parallel workers are taken from the
|
<literal>Gather</literal> or <literal>Gather Merge</literal> node.
|
||||||
pool of processes established by
|
Parallel workers are taken from the pool of processes established by
|
||||||
<xref linkend="guc-max-worker-processes">, limited by
|
<xref linkend="guc-max-worker-processes">, limited by
|
||||||
<xref linkend="guc-max-parallel-workers">. Note that the requested
|
<xref linkend="guc-max-parallel-workers">. Note that the requested
|
||||||
number of workers may not actually be available at run time. If this
|
number of workers may not actually be available at run time. If this
|
||||||
|
@@ -28,7 +28,8 @@
|
|||||||
<para>
|
<para>
|
||||||
When the optimizer determines that parallel query is the fastest execution
|
When the optimizer determines that parallel query is the fastest execution
|
||||||
strategy for a particular query, it will create a query plan which includes
|
strategy for a particular query, it will create a query plan which includes
|
||||||
a <firstterm>Gather node</firstterm>. Here is a simple example:
|
a <firstterm>Gather</firstterm> or <firstterm>Gather Merge</firstterm>
|
||||||
|
node. Here is a simple example:
|
||||||
|
|
||||||
<screen>
|
<screen>
|
||||||
EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
|
EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
|
||||||
@@ -43,15 +44,16 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
|
|||||||
</para>
|
</para>
|
||||||
|
|
||||||
<para>
|
<para>
|
||||||
In all cases, the <literal>Gather</literal> node will have exactly one
|
In all cases, the <literal>Gather</literal> or
|
||||||
|
<literal>Gather Merge</literal> node will have exactly one
|
||||||
child plan, which is the portion of the plan that will be executed in
|
child plan, which is the portion of the plan that will be executed in
|
||||||
parallel. If the <literal>Gather</> node is at the very top of the plan
|
parallel. If the <literal>Gather</> or <literal>Gather Merge</> node is
|
||||||
tree, then the entire query will execute in parallel. If it is somewhere
|
at the very top of the plan tree, then the entire query will execute in
|
||||||
else in the plan tree, then only the portion of the plan below it will run
|
parallel. If it is somewhere else in the plan tree, then only the portion
|
||||||
in parallel. In the example above, the query accesses only one table, so
|
of the plan below it will run in parallel. In the example above, the
|
||||||
there is only one plan node other than the <literal>Gather</> node itself;
|
query accesses only one table, so there is only one plan node other than
|
||||||
since that plan node is a child of the <literal>Gather</> node, it will
|
the <literal>Gather</> node itself; since that plan node is a child of the
|
||||||
run in parallel.
|
<literal>Gather</> node, it will run in parallel.
|
||||||
</para>
|
</para>
|
||||||
|
|
||||||
<para>
|
<para>
|
||||||
@@ -60,34 +62,46 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
|
|||||||
during query execution, the process which is implementing the user's
|
during query execution, the process which is implementing the user's
|
||||||
session will request a number of <link linkend="bgworker">background
|
session will request a number of <link linkend="bgworker">background
|
||||||
worker processes</link> equal to the number
|
worker processes</link> equal to the number
|
||||||
of workers chosen by the planner. The total number of background
|
of workers chosen by the planner. The number of background workers that
|
||||||
workers that can exist at any one time is limited by both
|
the planner will consider using is limited to at most
|
||||||
|
<xref linkend="guc-max-parallel-workers-per-gather">. The total number
|
||||||
|
of background workers that can exist at any one time is limited by both
|
||||||
<xref linkend="guc-max-worker-processes"> and
|
<xref linkend="guc-max-worker-processes"> and
|
||||||
<xref linkend="guc-max-parallel-workers">, so it is possible for a
|
<xref linkend="guc-max-parallel-workers">. Therefore, it is possible for a
|
||||||
parallel query to run with fewer workers than planned, or even with
|
parallel query to run with fewer workers than planned, or even with
|
||||||
no workers at all. The optimal plan may depend on the number of workers
|
no workers at all. The optimal plan may depend on the number of workers
|
||||||
that are available, so this can result in poor query performance. If this
|
that are available, so this can result in poor query performance. If this
|
||||||
occurrence is frequent, considering increasing
|
occurrence is frequent, consider increasing
|
||||||
<varname>max_worker_processes</> and <varname>max_parallel_workers</>
|
<varname>max_worker_processes</> and <varname>max_parallel_workers</>
|
||||||
so that more workers can be run simultaneously or alternatively reducing
|
so that more workers can be run simultaneously or alternatively reducing
|
||||||
<xref linkend="guc-max-parallel-workers-per-gather"> so that the planner
|
<varname>max_parallel_workers_per_gather</varname> so that the planner
|
||||||
requests fewer workers.
|
requests fewer workers.
|
||||||
</para>
|
</para>
|
||||||
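The limits described in the paragraph above can be sketched as a small model. This is only an illustration under simplifying assumptions, not the server's actual accounting: the function names and the two "in use" counters are hypothetical, and only the three setting names come from the documentation text.

```python
def planned_workers(planner_estimate, max_parallel_workers_per_gather):
    """Workers the planner will consider: capped by the per-Gather setting."""
    return max(0, min(planner_estimate, max_parallel_workers_per_gather))


def launched_workers(planned, max_worker_processes, max_parallel_workers,
                     background_workers_in_use, parallel_workers_in_use):
    """Workers actually obtained at run time under both pool limits.

    The two *_in_use arguments are hypothetical counters standing in for
    whatever else is running on the system at that moment.
    """
    free_slots = max_worker_processes - background_workers_in_use
    free_parallel = max_parallel_workers - parallel_workers_in_use
    return max(0, min(planned, free_slots, free_parallel))


# The planner would like 4 workers, but the per-Gather cap is 2.
planned = planned_workers(4, 2)
# Only one background worker slot is free, so one worker is launched.
launched = launched_workers(planned, 8, 8, 7, 3)
```

In this model a query can be planned for two workers yet run with one, or with none if either pool is exhausted, which is the situation the paragraph warns about.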
|
|
||||||
<para>
|
<para>
|
||||||
Every background worker process which is successfully started for a given
|
Every background worker process which is successfully started for a given
|
||||||
parallel query will execute the portion of the plan below
|
parallel query will execute the parallel portion of the plan. The leader
|
||||||
the <literal>Gather</> node. The leader will also execute that portion
|
will also execute that portion of the plan, but it has an additional
|
||||||
of the plan, but it has an additional responsibility: it must also read
|
responsibility: it must also read all of the tuples generated by the
|
||||||
all of the tuples generated by the workers. When the parallel portion of
|
workers. When the parallel portion of the plan generates only a small
|
||||||
the plan generates only a small number of tuples, the leader will often
|
number of tuples, the leader will often behave very much like an additional
|
||||||
behave very much like an additional worker, speeding up query execution.
|
worker, speeding up query execution. Conversely, when the parallel portion
|
||||||
Conversely, when the parallel portion of the plan generates a large number
|
of the plan generates a large number of tuples, the leader may be almost
|
||||||
of tuples, the leader may be almost entirely occupied with reading the
|
entirely occupied with reading the tuples generated by the workers and
|
||||||
tuples generated by the workers and performing any further processing
|
performing any further processing steps which are required by plan nodes
|
||||||
steps which are required by plan nodes above the level of the
|
above the level of the <literal>Gather</literal> node or
|
||||||
<literal>Gather</literal> node. In such cases, the leader will do very
|
<literal>Gather Merge</literal> node. In such cases, the leader will
|
||||||
little of the work of executing the parallel portion of the plan.
|
do very little of the work of executing the parallel portion of the plan.
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
When the node at the top of the parallel portion of the plan is
|
||||||
|
<literal>Gather Merge</> rather than <literal>Gather</>, it indicates that
|
||||||
|
each process executing the parallel portion of the plan is producing
|
||||||
|
tuples in sorted order, and that the leader is performing an
|
||||||
|
order-preserving merge. In contrast, <literal>Gather</> reads tuples
|
||||||
|
from the workers in whatever order is convenient, destroying any sort
|
||||||
|
order that may have existed.
|
||||||
</para>
|
</para>
|
||||||
</sect1>
|
</sect1>
|
||||||
|
|
||||||
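The difference between the two node types described in the added paragraph can be illustrated with a short sketch. It is a conceptual model only, assuming each worker's output is a list that is already sorted; `gather` and `gather_merge` are hypothetical helper names, and plain concatenation stands in for "whatever order is convenient".

```python
import heapq
from itertools import chain


def gather(worker_outputs):
    """Read tuples stream by stream; any overall sort order is lost."""
    return list(chain.from_iterable(worker_outputs))


def gather_merge(worker_outputs):
    """Order-preserving merge of streams that are each already sorted."""
    return list(heapq.merge(*worker_outputs))


# Each inner list models the sorted output of one cooperating process.
streams = [[1, 4, 9], [2, 3, 10], [5, 6, 7]]

unordered = gather(streams)
ordered = gather_merge(streams)
```

Both helpers return the same set of tuples; only `gather_merge` keeps them in sorted order, and it can do so without re-sorting because every input stream is sorted already.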
@@ -221,9 +235,9 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
|
|||||||
send such a message, this can only occur when using a client that
|
send such a message, this can only occur when using a client that
|
||||||
does not rely on libpq. If this is a frequent
|
does not rely on libpq. If this is a frequent
|
||||||
occurrence, it may be a good idea to set
|
occurrence, it may be a good idea to set
|
||||||
<xref linkend="guc-max-parallel-workers-per-gather"> in sessions
|
<xref linkend="guc-max-parallel-workers-per-gather"> to zero in
|
||||||
where it is likely, so as to avoid generating query plans that may
|
sessions where it is likely, so as to avoid generating query plans
|
||||||
be suboptimal when run serially.
|
that may be suboptimal when run serially.
|
||||||
</para>
|
</para>
|
||||||
</listitem>
|
</listitem>
|
||||||
|
|
||||||
@@ -262,6 +276,8 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
|
|||||||
so that each process which executes the plan will generate only a
|
so that each process which executes the plan will generate only a
|
||||||
subset of the output rows in such a way that each required output row
|
subset of the output rows in such a way that each required output row
|
||||||
is guaranteed to be generated by exactly one of the cooperating processes.
|
is guaranteed to be generated by exactly one of the cooperating processes.
|
||||||
|
Generally, this means that the scan on the driving table of the query
|
||||||
|
must be a parallel-aware scan.
|
||||||
</para>
|
</para>
|
||||||
|
|
||||||
<sect2 id="parallel-scans">
|
<sect2 id="parallel-scans">
|
||||||
@@ -302,9 +318,8 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
|
|||||||
</listitem>
|
</listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
|
|
||||||
Only the scan types listed above may be used for a scan on the driving
|
Other scan types, such as scans of non-btree indexes, may support
|
||||||
table within a parallel plan. Other scan types, such as parallel scans of
|
parallel scans in the future.
|
||||||
non-btree indexes, may be supported in the future.
|
|
||||||
</para>
|
</para>
|
||||||
</sect2>
|
</sect2>
|
||||||
|
|
||||||
@@ -343,10 +358,10 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
|
|||||||
the query performs an aggregation step, producing a partial result for
|
the query performs an aggregation step, producing a partial result for
|
||||||
each group of which that process is aware. This is reflected in the plan
|
each group of which that process is aware. This is reflected in the plan
|
||||||
as a <literal>Partial Aggregate</> node. Second, the partial results are
|
as a <literal>Partial Aggregate</> node. Second, the partial results are
|
||||||
transferred to the leader via the <literal>Gather</> node. Finally, the
|
transferred to the leader via <literal>Gather</> or <literal>Gather
|
||||||
leader re-aggregates the results across all workers in order to produce
|
Merge</>. Finally, the leader re-aggregates the results across all
|
||||||
the final result. This is reflected in the plan as a
|
workers in order to produce the final result. This is reflected in the
|
||||||
<literal>Finalize Aggregate</> node.
|
plan as a <literal>Finalize Aggregate</> node.
|
||||||
</para>
|
</para>
|
||||||
|
|
||||||
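The two-stage aggregation described above can be sketched as follows. This is a minimal illustration assuming a per-group row count as the aggregate; the function names mirror the plan node names but are hypothetical, and the sample rows are made up.

```python
from collections import Counter


def partial_aggregate(rows):
    """Per-process step: count rows for each group this process saw."""
    return Counter(group for group, _ in rows)


def finalize_aggregate(partials):
    """Leader step: combine the partial counts from all processes."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # adds counts group by group
    return dict(total)


# Each inner list models the (group, value) rows seen by one process.
worker_rows = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("a", 4)],
    [("b", 5)],
]
partials = [partial_aggregate(rows) for rows in worker_rows]
result = finalize_aggregate(partials)
```

The same group may appear in several partial results, which is why the leader has to re-aggregate rather than simply concatenate what it receives.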
<para>
|
<para>
|
||||||
@@ -416,8 +431,8 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
|
|||||||
operation is one which cannot be performed in a parallel worker, but which
|
operation is one which cannot be performed in a parallel worker, but which
|
||||||
can be performed in the leader while parallel query is in use. Therefore,
|
can be performed in the leader while parallel query is in use. Therefore,
|
||||||
parallel restricted operations can never occur below a <literal>Gather</>
|
parallel restricted operations can never occur below a <literal>Gather</>
|
||||||
node, but can occur elsewhere in a plan which contains a
|
or <literal>Gather Merge</> node, but can occur elsewhere in a plan which
|
||||||
<literal>Gather</> node. A parallel unsafe operation is one which cannot
|
contains such a node. A parallel unsafe operation is one which cannot
|
||||||
be performed while parallel query is in use, not even in the leader.
|
be performed while parallel query is in use, not even in the leader.
|
||||||
When a query contains anything which is parallel unsafe, parallel query
|
When a query contains anything which is parallel unsafe, parallel query
|
||||||
is completely disabled for that query.
|
is completely disabled for that query.
|
||||||
@@ -449,7 +464,7 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
|
|||||||
|
|
||||||
<listitem>
|
<listitem>
|
||||||
<para>
|
<para>
|
||||||
Access to an <literal>InitPlan</> or <literal>SubPlan</>.
|
Access to an <literal>InitPlan</> or correlated <literal>SubPlan</>.
|
||||||
</para>
|
</para>
|
||||||
</listitem>
|
</listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
@@ -514,8 +529,8 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
|
|||||||
parallel-restricted functions or aggregates involved in the query in
|
parallel-restricted functions or aggregates involved in the query in
|
||||||
order to obtain a superior plan. So, for example, if a <literal>WHERE</>
|
order to obtain a superior plan. So, for example, if a <literal>WHERE</>
|
||||||
clause applied to a particular table is parallel restricted, the query
|
clause applied to a particular table is parallel restricted, the query
|
||||||
planner will not consider placing the scan of that table below a
|
planner will not consider performing a scan of that table in the parallel
|
||||||
<literal>Gather</> node. In some cases, it would be
|
portion of a plan. In some cases, it would be
|
||||||
possible (and perhaps even efficient) to include the scan of that table in
|
possible (and perhaps even efficient) to include the scan of that table in
|
||||||
the parallel portion of the query and defer the evaluation of the
|
the parallel portion of the query and defer the evaluation of the
|
||||||
<literal>WHERE</> clause so that it happens above the <literal>Gather</>
|
<literal>WHERE</> clause so that it happens above the <literal>Gather</>
|
||||||
|