1
0
mirror of https://github.com/postgres/postgres.git synced 2026-01-05 23:38:41 +03:00

Support parallel btree index builds.

To make this work, tuplesort.c and logtape.c must also support
parallelism, so this patch adds that infrastructure and then applies
it to the particular case of parallel btree index builds.  Testing
to date shows that this can often be 2-3x faster than a serial
index build.

The model for deciding how many workers to use is fairly primitive
at present, but it's better than not having the feature.  We can
refine it as we get more experience.

Peter Geoghegan with some help from Rushabh Lathia.  While Heikki
Linnakangas is not an author of this patch, he wrote other patches
without which this feature would not have been possible, and
therefore the release notes should possibly credit him as an author
of this feature.  Reviewed by Claudio Freire, Heikki Linnakangas,
Thomas Munro, Tels, Amit Kapila, me.

Discussion: http://postgr.es/m/CAM3SWZQKM=Pzc=CAHzRixKjp2eO5Q0Jg1SoFQqeXFQ647JiwqQ@mail.gmail.com
Discussion: http://postgr.es/m/CAH2-Wz=AxWqDoVvGU7dq856S4r6sJAj6DBn7VMtigkB33N5eyg@mail.gmail.com
This commit is contained in:
Robert Haas
2018-02-02 13:25:55 -05:00
parent 9aef173163
commit 9da0cc3528
51 changed files with 2238 additions and 362 deletions

View File

@@ -2022,7 +2022,8 @@ include_dir 'conf.d'
<para>
When changing this value, consider also adjusting
<xref linkend="guc-max-parallel-workers"/> and
<xref linkend="guc-max-parallel-workers"/>,
<xref linkend="guc-max-parallel-workers-maintenance"/>, and
<xref linkend="guc-max-parallel-workers-per-gather"/>.
</para>
</listitem>
@@ -2070,6 +2071,44 @@ include_dir 'conf.d'
</listitem>
</varlistentry>
<varlistentry id="guc-max-parallel-workers-maintenance" xreflabel="max_parallel_maintenance_workers">
<term><varname>max_parallel_maintenance_workers</varname> (<type>integer</type>)
<indexterm>
<primary><varname>max_parallel_maintenance_workers</varname> configuration parameter</primary>
</indexterm>
</term>
<listitem>
<para>
Sets the maximum number of parallel workers that can be
started by a single utility command. Currently, the only
parallel utility command that supports the use of parallel
workers is <command>CREATE INDEX</command>, and only when
building a B-tree index. Parallel workers are taken from the
pool of processes established by <xref
linkend="guc-max-worker-processes"/>, limited by <xref
linkend="guc-max-parallel-workers"/>. Note that the requested
number of workers may not actually be available at runtime.
If this occurs, the utility operation will run with fewer
workers than expected. The default value is 2. Setting this
value to 0 disables the use of parallel workers by utility
commands.
</para>
<para>
Note that parallel utility commands should not consume
substantially more memory than equivalent non-parallel
operations. This strategy differs from that of parallel
query, where resource limits generally apply per worker
process. Parallel utility commands treat the resource limit
<varname>maintenance_work_mem</varname> as a limit to be applied to
the entire utility command, regardless of the number of
parallel worker processes. However, parallel utility
commands may still consume substantially more CPU resources
and I/O bandwidth.
</para>
</listitem>
</varlistentry>
<varlistentry id="guc-max-parallel-workers" xreflabel="max_parallel_workers">
<term><varname>max_parallel_workers</varname> (<type>integer</type>)
<indexterm>
@@ -2079,8 +2118,9 @@ include_dir 'conf.d'
<listitem>
<para>
Sets the maximum number of workers that the system can support for
parallel queries. The default value is 8. When increasing or
parallel operations. The default value is 8. When increasing or
decreasing this value, consider also adjusting
<xref linkend="guc-max-parallel-workers-maintenance"/> and
<xref linkend="guc-max-parallel-workers-per-gather"/>.
Also, note that a setting for this value which is higher than
<xref linkend="guc-max-worker-processes"/> will have no effect,

View File

@@ -1263,7 +1263,7 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry>Waiting in an extension.</entry>
</row>
<row>
<entry morerows="32"><literal>IPC</literal></entry>
<entry morerows="33"><literal>IPC</literal></entry>
<entry><literal>BgWorkerShutdown</literal></entry>
<entry>Waiting for background worker to shut down.</entry>
</row>
@@ -1371,6 +1371,10 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry><literal>ParallelBitmapScan</literal></entry>
<entry>Waiting for parallel bitmap scan to become initialized.</entry>
</row>
<row>
<entry><literal>ParallelCreateIndexScan</literal></entry>
<entry>Waiting for parallel <command>CREATE INDEX</command> workers to finish heap scan.</entry>
</row>
<row>
<entry><literal>ProcArrayGroupUpdate</literal></entry>
<entry>Waiting for group leader to clear transaction id at transaction end.</entry>
@@ -3900,13 +3904,15 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
</row>
<row>
<entry><literal>sort-start</literal></entry>
<entry><literal>(int, bool, int, int, bool)</literal></entry>
<entry><literal>(int, bool, int, int, bool, int)</literal></entry>
<entry>Probe that fires when a sort operation is started.
arg0 indicates heap, index or datum sort.
arg1 is true for unique-value enforcement.
arg2 is the number of key columns.
arg3 is the number of kilobytes of work memory allowed.
arg4 is true if random access to the sort result is required.</entry>
arg4 is true if random access to the sort result is required.
arg5 indicates serial when <literal>0</literal>, parallel worker when
<literal>1</literal>, or parallel leader when <literal>2</literal>.</entry>
</row>
<row>
<entry><literal>sort-done</literal></entry>

View File

@@ -599,6 +599,64 @@ Indexes:
which would drive the machine into swapping.
</para>
<para>
<productname>PostgreSQL</productname> can build indexes while
leveraging multiple CPUs in order to process the table rows faster.
This feature is known as <firstterm>parallel index
build</firstterm>. For index methods that support building indexes
in parallel (currently, only B-tree),
<varname>maintenance_work_mem</varname> specifies the maximum
amount of memory that can be used by each index build operation as
a whole, regardless of how many worker processes were started.
Generally, a cost model automatically determines how many worker
processes should be requested, if any.
</para>
<para>
Parallel index builds may benefit from increasing
<varname>maintenance_work_mem</varname> where an equivalent serial
index build will see little or no benefit. Note that
<varname>maintenance_work_mem</varname> may influence the number of
worker processes requested, since parallel workers must have at
least a <literal>32MB</literal> share of the total
<varname>maintenance_work_mem</varname> budget. There must also be
a remaining <literal>32MB</literal> share for the leader process.
Increasing <xref linkend="guc-max-parallel-workers-maintenance"/>
may allow more workers to be used, which will reduce the time
needed for index creation, so long as the index build is not
already I/O bound. Of course, there should also be sufficient
CPU capacity that would otherwise lie idle.
</para>
<para>
Setting a value for <literal>parallel_workers</literal> via <xref
linkend="sql-altertable"/> directly controls how many parallel
worker processes will be requested by a <command>CREATE
INDEX</command> against the table. This bypasses the cost model
completely, and prevents <varname>maintenance_work_mem</varname>
from affecting how many parallel workers are requested. Setting
<literal>parallel_workers</literal> to 0 via <command>ALTER
TABLE</command> will disable parallel index builds on the table in
all cases.
</para>
<tip>
<para>
You might want to reset <literal>parallel_workers</literal> after
setting it as part of tuning an index build. This avoids
inadvertent changes to query plans, since
<literal>parallel_workers</literal> affects
<emphasis>all</emphasis> parallel table scans.
</para>
</tip>
<para>
While <command>CREATE INDEX</command> with the
<literal>CONCURRENTLY</literal> option supports parallel builds
without special restrictions, only the first table scan is actually
performed in parallel.
</para>
<para>
Use <xref linkend="sql-dropindex"/>
to remove an index.

View File

@@ -1228,8 +1228,8 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
This sets the number of workers that should be used to assist a parallel
scan of this table. If not set, the system will determine a value based
on the relation size. The actual number of workers chosen by the planner
may be less, for example due to
the setting of <xref linkend="guc-max-worker-processes"/>.
or by utility statements that use parallel scans may be less, for example
due to the setting of <xref linkend="guc-max-worker-processes"/>.
</para>
</listitem>
</varlistentry>