<!-- doc/src/sgml/parallel.sgml -->

<chapter id="parallel-query">
 <title>Parallel Query</title>

 <indexterm zone="parallel-query">
  <primary>parallel query</primary>
 </indexterm>

 <para>
  <productname>PostgreSQL</> can devise query plans which can leverage
  multiple CPUs in order to answer queries faster. This feature is known
  as parallel query. Many queries cannot benefit from parallel query, either
  due to limitations of the current implementation or because there is no
  imaginable query plan which is any faster than the serial query plan.
  However, for queries that can benefit, the speedup from parallel query
  is often very significant. Many queries can run more than twice as fast
  when using parallel query, and some queries can run four times faster or
  even more. Queries that touch a large amount of data but return only a
  few rows to the user will typically benefit most. This chapter explains
  some details of how parallel query works and in which situations it can be
  used so that users who wish to make use of it can understand what to expect.
 </para>

 <sect1 id="how-parallel-query-works">
  <title>How Parallel Query Works</title>

  <para>
   When the optimizer determines that parallel query is the fastest execution
   strategy for a particular query, it will create a query plan which includes
   a <firstterm>Gather node</firstterm>. Here is a simple example:

<screen>
EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Gather  (cost=1000.00..217018.43 rows=1 width=97)
   Workers Planned: 2
   ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..216018.33 rows=1 width=97)
         Filter: (filler ~~ '%x%'::text)
(4 rows)
</screen>
  </para>

  <para>
   In all cases, the <literal>Gather</literal> node will have exactly one
   child plan, which is the portion of the plan that will be executed in
   parallel. If the <literal>Gather</> node is at the very top of the plan
   tree, then the entire query will execute in parallel. If it is somewhere
   else in the plan tree, then only that portion of the query will run in
   parallel. In the example above, the query accesses only one table, so
   there is only one plan node other than the <literal>Gather</> node itself;
   since that plan node is a child of the <literal>Gather</> node, it will
   run in parallel.
  </para>

  <para>
   <link linkend="using-explain">Using EXPLAIN</>, you can see the number of
   workers chosen by the planner. When the <literal>Gather</> node is reached
   during query execution, the process which is implementing the user's
   session will request a number of <link linkend="bgworker">background
   worker processes</link> equal to the number
   of workers chosen by the planner. The total number of background
   workers that can exist at any one time is limited by
   <xref linkend="guc-max-worker-processes">, so it is possible for a
   parallel query to run with fewer workers than planned, or even with
   no workers at all. The optimal plan may depend on the number of workers
   that are available, so this can result in poor query performance. If this
   occurrence is frequent, consider increasing
   <varname>max_worker_processes</> so that more workers can be run
   simultaneously, or alternatively reducing
   <xref linkend="guc-max-parallel-workers-per-gather"> so that the planner
   requests fewer workers.
  </para>

  <para>
   Every background worker process which is successfully started for a given
   parallel query will execute the portion of the plan which is a descendant
   of the <literal>Gather</> node. The leader will also execute that portion
   of the plan, but it has an additional responsibility: it must also read
   all of the tuples generated by the workers. When the parallel portion of
   the plan generates only a small number of tuples, the leader will often
   behave very much like an additional worker, speeding up query execution.
   Conversely, when the parallel portion of the plan generates a large number
   of tuples, the leader may be almost entirely occupied with reading the
   tuples generated by the workers and performing any further processing
   steps which are required by plan nodes above the level of the
   <literal>Gather</literal> node. In such cases, the leader will do very
   little of the work of executing the parallel portion of the plan.
  </para>
 </sect1>

 <sect1 id="when-can-parallel-query-be-used">
  <title>When Can Parallel Query Be Used?</title>

  <para>
   There are several settings which can cause the query planner not to
   generate a parallel query plan under any circumstances. In order for
   any parallel query plans whatsoever to be generated, the following
   settings must be configured as indicated.
  </para>

  <itemizedlist>
    <listitem>
      <para>
        <xref linkend="guc-max-parallel-workers-per-gather"> must be set to a
        value which is greater than zero. This is a special case of the more
        general principle that no more workers should be used than the number
        configured via <varname>max_parallel_workers_per_gather</varname>.
      </para>
    </listitem>

    <listitem>
      <para>
        <xref linkend="guc-dynamic-shared-memory-type"> must be set to a
        value other than <literal>none</>. Parallel query requires dynamic
        shared memory in order to pass data between cooperating processes.
      </para>
    </listitem>
  </itemizedlist>
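
  <para>
   For example, both settings can be inspected, and the first one raised,
   from an interactive session:
<programlisting>
SHOW max_parallel_workers_per_gather;
SHOW dynamic_shared_memory_type;
SET max_parallel_workers_per_gather = 4;  -- allow up to four workers per Gather node
</programlisting>
  </para>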

  <para>
   In addition, the system must not be running in single-user mode. Since
   the entire database system is running as a single process in this
   situation, no background workers will be available.
  </para>

  <para>
   Even when it is in general possible for parallel query plans to be
   generated, the planner will not generate them for a given query
   if any of the following are true:
  </para>

  <itemizedlist>
    <listitem>
      <para>
        The query writes any data or locks any database rows. If a query
        contains a data-modifying operation either at the top level or within
        a CTE, no parallel plans for that query will be generated. This is a
        limitation of the current implementation which could be lifted in a
        future release.
      </para>
    </listitem>

    <listitem>
      <para>
        The query might be suspended during execution. In any situation in
        which the system thinks that partial or incremental execution might
        occur, no parallel plan is generated. For example, a cursor created
        using <link linkend="sql-declare">DECLARE CURSOR</link> will never use
        a parallel plan. Similarly, a PL/pgSQL loop of the form
        <literal>FOR x IN query LOOP .. END LOOP</literal> will never use a
        parallel plan, because the parallel query system is unable to verify
        that the code in the loop is safe to execute while parallel query is
        active.
      </para>
    </listitem>

    <listitem>
      <para>
        The query uses any function marked <literal>PARALLEL UNSAFE</literal>.
        Most system-defined functions are <literal>PARALLEL SAFE</literal>,
        but user-defined functions are marked <literal>PARALLEL
        UNSAFE</literal> by default. See the discussion of
        <xref linkend="parallel-safety">.
      </para>
    </listitem>

    <listitem>
      <para>
        The query is running inside of another query that is already parallel.
        For example, if a function called by a parallel query issues an SQL
        query itself, that query will never use a parallel plan. This is a
        limitation of the current implementation, but it may not be desirable
        to remove this limitation, since it could result in a single query
        using a very large number of processes.
      </para>
    </listitem>

    <listitem>
      <para>
        The transaction isolation level is serializable. This is
        a limitation of the current implementation.
      </para>
    </listitem>
  </itemizedlist>
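
  <para>
   To illustrate the first restriction, a query such as the following (the
   table names here are purely hypothetical) will never be planned in
   parallel, because the CTE modifies data:
<programlisting>
WITH moved_rows AS (
    DELETE FROM accounts_old RETURNING *
)
INSERT INTO accounts_archive SELECT * FROM moved_rows;
</programlisting>
  </para>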

  <para>
   Even when a parallel query plan is generated for a particular query, there
   are several circumstances under which it will be impossible to execute
   that plan in parallel at execution time. If this occurs, the leader
   will execute the portion of the plan below the <literal>Gather</>
   node entirely by itself, almost as if the <literal>Gather</> node were
   not present. This will happen if any of the following conditions are met:
  </para>

  <itemizedlist>
    <listitem>
      <para>
        No background workers can be obtained because of the limitation that
        the total number of background workers cannot exceed
        <xref linkend="guc-max-worker-processes">.
      </para>
    </listitem>

    <listitem>
      <para>
        The client sends an Execute message with a non-zero fetch count.
        See the discussion of the
        <link linkend="protocol-flow-ext-query">extended query protocol</link>.
        Since <link linkend="libpq">libpq</link> currently provides no way to
        send such a message, this can only occur when using a client that
        does not rely on libpq. If this is a frequent
        occurrence, it may be a good idea to set
        <xref linkend="guc-max-parallel-workers-per-gather"> to zero in
        sessions where it is likely, so as to avoid generating query plans
        that may be suboptimal when run serially.
      </para>
    </listitem>

    <listitem>
      <para>
        A prepared statement is executed using a <literal>CREATE TABLE .. AS
        EXECUTE ..</literal> statement. This construct converts what otherwise
        would have been a read-only operation into a read-write operation,
        making it ineligible for parallel query.
      </para>
    </listitem>

    <listitem>
      <para>
        The transaction isolation level is serializable. This situation
        does not normally arise, because parallel query plans are not
        generated when the transaction isolation level is serializable.
        However, it can happen if the transaction isolation level is changed to
        serializable after the plan is generated and before it is executed.
      </para>
    </listitem>
  </itemizedlist>
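
  <para>
   For example, even if the prepared statement below would ordinarily be
   eligible for a parallel plan, executing it through <literal>CREATE TABLE
   .. AS EXECUTE ..</literal> forces serial execution (the statement and
   table names here are only illustrative):
<programlisting>
PREPARE large_scan AS
    SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';

-- Executed serially, because the statement now writes a table:
CREATE TABLE matching_accounts AS EXECUTE large_scan;
</programlisting>
  </para>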
 </sect1>

 <sect1 id="parallel-plans">
  <title>Parallel Plans</title>

  <para>
   Because each worker executes the parallel portion of the plan to
   completion, it is not possible to simply take an ordinary query plan
   and run it using multiple workers. Each worker would produce a full
   copy of the output result set, so the query would not run any faster
   than normal but would produce incorrect results. Instead, the parallel
   portion of the plan must be what is known internally to the query
   optimizer as a <firstterm>partial plan</>; that is, it must be constructed
   so that each process which executes the plan will generate only a
   subset of the output rows in such a way that each required output row
   is guaranteed to be generated by exactly one of the cooperating processes.
  </para>

  <sect2 id="parallel-scans">
   <title>Parallel Scans</title>

   <para>
    Currently, the only type of scan which has been modified to work with
    parallel query is a sequential scan. Therefore, the driving table in
    a parallel plan will always be scanned using a
    <literal>Parallel Seq Scan</>. The relation's blocks will be divided
    among the cooperating processes. Blocks are handed out one at a
    time, so that access to the relation remains sequential. Each process
    will visit every tuple on the page assigned to it before requesting a new
    page.
   </para>
  </sect2>

  <sect2 id="parallel-joins">
   <title>Parallel Joins</title>

   <para>
    The driving table may be joined to one or more other tables using nested
    loops or hash joins. The inner side of the join may be any kind of
    non-parallel plan that is otherwise supported by the planner, provided that
    it is safe to run within a parallel worker. For example, it may be an
    index scan which looks up a value taken from the outer side of the join.
    Each worker will execute the inner side of the join in full, which for
    hash join means that an identical hash table is built in each worker
    process.
   </para>
  </sect2>

  <sect2 id="parallel-aggregation">
   <title>Parallel Aggregation</title>

   <para>
    <productname>PostgreSQL</> supports parallel aggregation by aggregating in
    two stages. First, each process participating in the parallel portion of
    the query performs an aggregation step, producing a partial result for
    each group of which that process is aware. This is reflected in the plan
    as a <literal>Partial Aggregate</> node. Second, the partial results are
    transferred to the leader via the <literal>Gather</> node. Finally, the
    leader re-aggregates the results across all workers in order to produce
    the final result. This is reflected in the plan as a
    <literal>Finalize Aggregate</> node.
   </para>
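
   <para>
    For example, a simple <literal>count(*)</> over the table from the
    earlier example might produce a plan of the following shape (the cost
    and row estimates shown are only illustrative and will vary with the
    data):
<screen>
EXPLAIN SELECT count(*) FROM pgbench_accounts;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=217018.55..217018.56 rows=1 width=8)
   ->  Gather  (cost=217018.33..217018.54 rows=2 width=8)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=216018.33..216018.34 rows=1 width=8)
               ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..208518.67 rows=3999867 width=0)
(5 rows)
</screen>
   </para>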

   <para>
    Because the <literal>Finalize Aggregate</> node runs on the leader
    process, queries which produce a relatively large number of groups in
    comparison to the number of input rows will appear less favorable to the
    query planner. For example, in the worst-case scenario the number of
    groups seen by the <literal>Finalize Aggregate</> node could be as many as
    the number of input rows which were seen by all worker processes in the
    <literal>Partial Aggregate</> stage. For such cases, there is clearly
    going to be no performance benefit to using parallel aggregation. The
    query planner takes this into account during the planning process and is
    unlikely to choose parallel aggregate in this scenario.
   </para>

   <para>
    Parallel aggregation is not supported in all situations. Each aggregate
    must be <link linkend="parallel-safety">safe</> for parallelism and must
    have a combine function. If the aggregate has a transition state of type
    <literal>internal</>, it must have serialization and deserialization
    functions. See <xref linkend="sql-createaggregate"> for more details.
    Parallel aggregation is not supported if any aggregate function call
    contains a <literal>DISTINCT</> or <literal>ORDER BY</> clause, and it is
    also not supported for ordered-set aggregates or when the query involves
    <literal>GROUPING SETS</>. It can only be used when all joins involved in
    the query are also part of the parallel portion of the plan.
   </para>
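
   <para>
    As a sketch of what such a definition might look like, the hypothetical
    aggregate below reuses the built-in <literal>int4pl</> addition function
    as both its transition function and its combine function, and is marked
    safe for parallelism:
<programlisting>
CREATE AGGREGATE my_sum (int4)
(
    SFUNC = int4pl,
    STYPE = int4,
    COMBINEFUNC = int4pl,
    PARALLEL = SAFE
);
</programlisting>
   </para>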

  </sect2>

  <sect2 id="parallel-plan-tips">
   <title>Parallel Plan Tips</title>

   <para>
    If a query that is expected to do so does not produce a parallel plan,
    you can try reducing <xref linkend="guc-parallel-setup-cost"> or
    <xref linkend="guc-parallel-tuple-cost">. Of course, this plan may turn
    out to be slower than the serial plan which the planner preferred, but
    this will not always be the case. If you don't get a parallel
    plan even with very small values of these settings (e.g., after setting
    them both to zero), there may be some reason why the query planner is
    unable to generate a parallel plan for your query. See
    <xref linkend="when-can-parallel-query-be-used"> and
    <xref linkend="parallel-safety"> for information on why this may be
    the case.
   </para>
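
   <para>
    For example, one way to test whether a parallel plan is possible at all
    is to zero out both costs for the current session and re-run
    <command>EXPLAIN</>:
<programlisting>
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
</programlisting>
   </para>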

   <para>
    When executing a parallel plan, you can use <literal>EXPLAIN (ANALYZE,
    VERBOSE)</literal> to display per-worker statistics for each plan node.
    This may be useful in determining whether the work is being evenly
    distributed between all plan nodes and more generally in understanding the
    performance characteristics of the plan.
   </para>
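
   <para>
    For instance, per-worker detail for the running example can be requested
    as follows; each parallel-aware node in the output then includes a
    <literal>Worker N</> line reporting that worker's timing and row counts:
<programlisting>
EXPLAIN (ANALYZE, VERBOSE)
    SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
</programlisting>
   </para>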

  </sect2>
 </sect1>

 <sect1 id="parallel-safety">
  <title>Parallel Safety</title>

  <para>
   The planner classifies operations involved in a query as either
   <firstterm>parallel safe</>, <firstterm>parallel restricted</>,
   or <firstterm>parallel unsafe</>. A parallel safe operation is one which
   does not conflict with the use of parallel query. A parallel restricted
   operation is one which cannot be performed in a parallel worker, but which
   can be performed in the leader while parallel query is in use. Therefore,
   parallel restricted operations can never occur below a <literal>Gather</>
   node, but can occur elsewhere in a plan which contains a
   <literal>Gather</> node. A parallel unsafe operation is one which cannot
   be performed while parallel query is in use, not even in the leader.
   When a query contains anything which is parallel unsafe, parallel query
   is completely disabled for that query.
  </para>

  <para>
   The following operations are always parallel restricted.
  </para>

  <itemizedlist>
    <listitem>
      <para>
        Scans of common table expressions (CTEs).
      </para>
    </listitem>

    <listitem>
      <para>
        Scans of temporary tables.
      </para>
    </listitem>

    <listitem>
      <para>
        Scans of foreign tables, unless the foreign data wrapper has
        an <literal>IsForeignScanParallelSafe</> API which indicates otherwise.
      </para>
    </listitem>

    <listitem>
      <para>
        Access to an <literal>InitPlan</> or <literal>SubPlan</>.
      </para>
    </listitem>
  </itemizedlist>

  <sect2 id="parallel-labeling">
   <title>Parallel Labeling for Functions and Aggregates</title>

   <para>
    The planner cannot automatically determine whether a user-defined
    function or aggregate is parallel safe, parallel restricted, or parallel
    unsafe, because this would require predicting every operation which the
    function could possibly perform. In general, this is equivalent to the
    Halting Problem and therefore impossible. Even for simple functions
    where it could conceivably be done, we do not try, since this would be
    expensive and error-prone. Instead, all user-defined functions are assumed
    to be parallel unsafe unless otherwise marked. When using
    <xref linkend="sql-createfunction"> or
    <xref linkend="sql-alterfunction">, markings can be set by specifying
    <literal>PARALLEL SAFE</>, <literal>PARALLEL RESTRICTED</>, or
    <literal>PARALLEL UNSAFE</> as appropriate. When using
    <xref linkend="sql-createaggregate">, the
    <literal>PARALLEL</> option can be specified with <literal>SAFE</>,
    <literal>RESTRICTED</>, or <literal>UNSAFE</> as the corresponding value.
   </para>
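
   <para>
    For example, a simple SQL-language function (the name here is purely
    illustrative) can be labeled when it is created, and the label can be
    changed later:
<programlisting>
CREATE FUNCTION add_one(integer) RETURNS integer
    AS 'SELECT $1 + 1'
    LANGUAGE SQL
    PARALLEL SAFE;

ALTER FUNCTION add_one(integer) PARALLEL RESTRICTED;
</programlisting>
   </para>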

   <para>
    Functions and aggregates must be marked <literal>PARALLEL UNSAFE</> if
    they write to the database, access sequences, change the transaction state
    even temporarily (e.g., a PL/pgSQL function which establishes an
    <literal>EXCEPTION</> block to catch errors), or make persistent changes to
    settings. Similarly, functions must be marked <literal>PARALLEL
    RESTRICTED</> if they access temporary tables, client connection state,
    cursors, prepared statements, or miscellaneous backend-local state which
    the system cannot synchronize across workers. For example,
    <literal>setseed</> and <literal>random</> are parallel restricted for
    this last reason.
   </para>

   <para>
    In general, if a function is labeled as being safe when it is restricted or
    unsafe, or if it is labeled as being restricted when it is in fact unsafe,
    it may throw errors or produce wrong answers when used in a parallel query.
    C-language functions could in theory exhibit totally undefined behavior if
    mislabeled, since there is no way for the system to protect itself against
    arbitrary C code, but in most likely cases the result will be no worse than
    for any other function. If in doubt, it is probably best to label functions
    as <literal>UNSAFE</>.
   </para>

   <para>
    If a function executed within a parallel worker acquires locks which are
    not held by the leader, for example by querying a table not referenced in
    the query, those locks will be released at worker exit, not at the end of
    the transaction. If you write a function which does this, and this behavior
    difference is important to you, mark such functions as
    <literal>PARALLEL RESTRICTED</literal>
    to ensure that they execute only in the leader.
   </para>

   <para>
    Note that the query planner does not consider deferring the evaluation of
    parallel-restricted functions or aggregates involved in the query in
    order to obtain a superior plan. So, for example, if a <literal>WHERE</>
    clause applied to a particular table is parallel restricted, the query
    planner will not consider placing the scan of that table below a
    <literal>Gather</> node. In some cases, it would be
    possible (and perhaps even efficient) to include the scan of that table in
    the parallel portion of the query and defer the evaluation of the
    <literal>WHERE</> clause so that it happens above the <literal>Gather</>
    node. However, the planner does not do this.
   </para>

  </sect2>

 </sect1>

</chapter>