mirror of https://github.com/postgres/postgres.git, synced 2025-11-03 09:13:20 +03:00

Make some more improvements to parallel query documentation.

Many places that mentioned only Gather should also mention Gather Merge, or should be phrased in a more neutral way. Be more clear about the fact that max_parallel_workers_per_gather affects the number of workers the planner may want to use. Fix a typo. Explain how Gather Merge works. Adjust wording around parallel scans to be a bit more clear. Adjust wording around parallel-restricted operations for the fact that uncorrelated subplans are no longer restricted.

Patch by me, reviewed by Erik Rijkers

Discussion: http://postgr.es/m/CA+TgmoZsTjgVGn=ei5ht-1qGFKy_m1VgB3d8+Rg304hz91N5ww@mail.gmail.com

@@ -2050,8 +2050,8 @@ include_dir 'conf.d'
        <listitem>
         <para>
          Sets the maximum number of workers that can be started by a single
-         <literal>Gather</literal> node.  Parallel workers are taken from the
-         pool of processes established by
+         <literal>Gather</literal> or <literal>Gather Merge</literal> node.
+         Parallel workers are taken from the pool of processes established by
          <xref linkend="guc-max-worker-processes">, limited by
          <xref linkend="guc-max-parallel-workers">.  Note that the requested
          number of workers may not actually be available at run time.  If this
@@ -28,7 +28,8 @@
    <para>
     When the optimizer determines that parallel query is the fastest execution
     strategy for a particular query, it will create a query plan which includes
-    a <firstterm>Gather node</firstterm>.  Here is a simple example:
+    a <firstterm>Gather</firstterm> or <firstterm>Gather Merge</firstterm>
+    node.  Here is a simple example:
 
 <screen>
 EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
@@ -43,15 +44,16 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
    </para>
 
    <para>
-    In all cases, the <literal>Gather</literal> node will have exactly one
+    In all cases, the <literal>Gather</literal> or
+    <literal>Gather Merge</literal> node will have exactly one
     child plan, which is the portion of the plan that will be executed in
-    parallel.  If the <literal>Gather</> node is at the very top of the plan
-    tree, then the entire query will execute in parallel.  If it is somewhere
-    else in the plan tree, then only the portion of the plan below it will run
-    in parallel.  In the example above, the query accesses only one table, so
-    there is only one plan node other than the <literal>Gather</> node itself;
-    since that plan node is a child of the <literal>Gather</> node, it will
-    run in parallel.
+    parallel.  If the <literal>Gather</> or <literal>Gather Merge</> node is
+    at the very top of the plan tree, then the entire query will execute in
+    parallel.  If it is somewhere else in the plan tree, then only the portion
+    of the plan below it will run in parallel.  In the example above, the
+    query accesses only one table, so there is only one plan node other than
+    the <literal>Gather</> node itself; since that plan node is a child of the
+    <literal>Gather</> node, it will run in parallel.
    </para>
 
    <para>
@@ -60,34 +62,46 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
     during query execution, the process which is implementing the user's
     session will request a number of <link linkend="bgworker">background
     worker processes</link> equal to the number
-    of workers chosen by the planner.  The total number of background
-    workers that can exist at any one time is limited by both
+    of workers chosen by the planner.  The number of background workers that
+    the planner will consider using is limited to at most
+    <xref linkend="guc-max-parallel-workers-per-gather">.  The total number
+    of background workers that can exist at any one time is limited by both
     <xref linkend="guc-max-worker-processes"> and
-    <xref linkend="guc-max-parallel-workers">, so it is possible for a
+    <xref linkend="guc-max-parallel-workers">.  Therefore, it is possible for a
     parallel query to run with fewer workers than planned, or even with
     no workers at all.  The optimal plan may depend on the number of workers
     that are available, so this can result in poor query performance.  If this
-    occurrence is frequent, considering increasing
+    occurrence is frequent, consider increasing
     <varname>max_worker_processes</> and <varname>max_parallel_workers</>
     so that more workers can be run simultaneously or alternatively reducing
-    <xref linkend="guc-max-parallel-workers-per-gather"> so that the planner
+    <varname>max_parallel_workers_per_gather</varname> so that the planner
     requests fewer workers.
    </para>
 
    <para>
     Every background worker process which is successfully started for a given
-    parallel query will execute the portion of the plan below
-    the <literal>Gather</> node.  The leader will also execute that portion
-    of the plan, but it has an additional responsibility: it must also read
-    all of the tuples generated by the workers.  When the parallel portion of
-    the plan generates only a small number of tuples, the leader will often
-    behave very much like an additional worker, speeding up query execution.
-    Conversely, when the parallel portion of the plan generates a large number
-    of tuples, the leader may be almost entirely occupied with reading the
-    tuples generated by the workers and performing any further processing
-    steps which are required by plan nodes above the level of the
-    <literal>Gather</literal> node.  In such cases, the leader will do very
-    little of the work of executing the parallel portion of the plan.
+    parallel query will execute the parallel portion of the plan.  The leader
+    will also execute that portion of the plan, but it has an additional
+    responsibility: it must also read all of the tuples generated by the
+    workers.  When the parallel portion of the plan generates only a small
+    number of tuples, the leader will often behave very much like an additional
+    worker, speeding up query execution.  Conversely, when the parallel portion
+    of the plan generates a large number of tuples, the leader may be almost
+    entirely occupied with reading the tuples generated by the workers and
+    performing any further processing steps which are required by plan nodes
+    above the level of the <literal>Gather</literal> node or
+    <literal>Gather Merge</literal> node.  In such cases, the leader will
+    do very little of the work of executing the parallel portion of the plan.
+   </para>
+
+   <para>
+    When the node at the top of the parallel portion of the plan is
+    <literal>Gather Merge</> rather than <literal>Gather</>, it indicates that
+    each process executing the parallel portion of the plan is producing
+    tuples in sorted order, and that the leader is performing an
+    order-preserving merge.  In contrast, <literal>Gather</> reads tuples
+    from the workers in whatever order is convenient, destroying any sort
+    order that may have existed.
    </para>
  </sect1>
 
@@ -221,9 +235,9 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
         send such a message, this can only occur when using a client that
         does not rely on libpq.  If this is a frequent
         occurrence, it may be a good idea to set
-        <xref linkend="guc-max-parallel-workers-per-gather"> in sessions
-        where it is likely, so as to avoid generating query plans that may
-        be suboptimal when run serially.
+        <xref linkend="guc-max-parallel-workers-per-gather"> to zero in
+        sessions where it is likely, so as to avoid generating query plans
+        that may be suboptimal when run serially.
       </para>
     </listitem>
 
@@ -262,6 +276,8 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
     so that each process which executes the plan will generate only a
     subset of the output rows in such a way that each required output row
     is guaranteed to be generated by exactly one of the cooperating processes.
+    Generally, this means that the scan on the driving table of the query
+    must be a parallel-aware scan.
   </para>
 
  <sect2 id="parallel-scans">
@@ -302,9 +318,8 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
     </listitem>
   </itemizedlist>
 
-    Only the scan types listed above may be used for a scan on the driving
-    table within a parallel plan.  Other scan types, such as parallel scans of
-    non-btree indexes, may be supported in the future.
+    Other scan types, such as scans of non-btree indexes, may support
+    parallel scans in the future.
   </para>
  </sect2>
 
@@ -343,10 +358,10 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
     the query performs an aggregation step, producing a partial result for
     each group of which that process is aware.  This is reflected in the plan
     as a <literal>Partial Aggregate</> node.  Second, the partial results are
-    transferred to the leader via the <literal>Gather</> node.  Finally, the
-    leader re-aggregates the results across all workers in order to produce
-    the final result.  This is reflected in the plan as a
-    <literal>Finalize Aggregate</> node.
+    transferred to the leader via <literal>Gather</> or <literal>Gather
+    Merge</>.  Finally, the leader re-aggregates the results across all
+    workers in order to produce the final result.  This is reflected in the
+    plan as a <literal>Finalize Aggregate</> node.
   </para>
 
   <para>
@@ -416,8 +431,8 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
     operation is one which cannot be performed in a parallel worker, but which
     can be performed in the leader while parallel query is in use.  Therefore,
     parallel restricted operations can never occur below a <literal>Gather</>
-    node, but can occur elsewhere in a plan which contains a
-    <literal>Gather</> node.  A parallel unsafe operation is one which cannot
+    or <literal>Gather Merge</> node, but can occur elsewhere in a plan which
+    contains such a node.  A parallel unsafe operation is one which cannot
     be performed while parallel query is in use, not even in the leader.
     When a query contains anything which is parallel unsafe, parallel query
     is completely disabled for that query.
@@ -449,7 +464,7 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 
     <listitem>
       <para>
-        Access to an <literal>InitPlan</> or <literal>SubPlan</>.
+        Access to an <literal>InitPlan</> or correlated <literal>SubPlan</>.
       </para>
     </listitem>
   </itemizedlist>
@@ -514,8 +529,8 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
     parallel-restricted functions or aggregates involved in the query in
     order to obtain a superior plan.  So, for example, if a <literal>WHERE</>
     clause applied to a particular table is parallel restricted, the query
-    planner will not consider placing the scan of that table below a
-    <literal>Gather</> node.  In some cases, it would be
+    planner will not consider performing a scan of that table in the parallel
+    portion of a plan.  In some cases, it would be
     possible (and perhaps even efficient) to include the scan of that table in
     the parallel portion of the query and defer the evaluation of the
     <literal>WHERE</> clause so that it happens above the <literal>Gather</>
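
The paragraph this commit adds on Gather Merge says that each process produces tuples in sorted order and the leader performs an order-preserving merge, whereas Gather reads tuples in whatever order is convenient. A minimal Python sketch of that difference follows. It is a conceptual model only, not PostgreSQL's executor code: the function names `gather_merge` and `gather`, the integer sort keys, and the use of plain concatenation to stand in for an arbitrary arrival order are all assumptions made for illustration.

```python
import heapq
from itertools import chain


def gather_merge(sorted_streams):
    """Model of an order-preserving merge (hypothetical helper).

    Each input stream is assumed to be already sorted, as the commit's
    new paragraph says each process produces tuples in sorted order.
    heapq.merge repeatedly yields the smallest head element, so the
    combined output is sorted as well.
    """
    return list(heapq.merge(*sorted_streams))


def gather(streams):
    """Model of a plain Gather (hypothetical helper).

    Tuples are taken in whatever order is convenient; concatenation is
    used here as one arbitrary arrival order, so any global sort order
    across the streams is lost.
    """
    return list(chain.from_iterable(streams))


# Three processes, each producing its own sorted run of keys.
worker_outputs = [[1, 4, 9], [2, 3, 10], [5, 6, 7]]

merged = gather_merge(worker_outputs)
unordered = gather(worker_outputs)

print(merged)
print(unordered)
```

Both functions return the same set of tuples; only `gather_merge` guarantees that the result is sorted, which is why a plan that needs sorted output above the parallel portion can use it without a further sort step.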