mirror of https://github.com/postgres/postgres.git synced 2025-06-13 07:41:39 +03:00

Allow vacuum command to process indexes in parallel.

This feature allows VACUUM to use multiple CPUs to process indexes: index
vacuuming and index cleanup can now be performed by background workers.
It adds a PARALLEL option to the VACUUM command, with which the user
specifies the number of workers to use; that number is limited by the
number of indexes on the table.  Specifying zero workers disables
parallelism.  This option can't be used with the FULL option.

Each index is processed by at most one vacuum process.  Therefore parallel
vacuum can be used only when the table has at least two indexes.

The parallel degree is either specified by the user or determined based on
the number of indexes that the table has, and further limited by
max_parallel_maintenance_workers.  An index can participate in parallel
vacuum if and only if its size is greater than min_parallel_index_scan_size.
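
The rules above can be illustrated with a small sketch. This is not the
code from the commit: the function name, the Index type and the exact
ordering of the checks are assumptions made for illustration, and the
sketch does not model whether the leader process itself handles an index.
The defaults (2 workers, 512kB) are the documented defaults of the two
settings.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Index:
    """Hypothetical stand-in for an index on the table being vacuumed."""
    name: str
    size_bytes: int


def plan_parallel_vacuum_workers(
    indexes: Sequence[Index],
    requested: Optional[int],
    max_parallel_maintenance_workers: int = 2,
    min_parallel_index_scan_size: int = 512 * 1024,
) -> int:
    """Return a worker count following the rules in the commit message.

    requested is the PARALLEL value, or None when the option is omitted.
    """
    if requested is not None and requested < 0:
        raise ValueError("PARALLEL requires a non-negative integer")

    # PARALLEL 0 disables parallelism.
    if requested == 0:
        return 0

    # Each index is processed by at most one process, so a table with
    # fewer than two indexes gains nothing from parallel vacuum.
    if len(indexes) < 2:
        return 0

    # Only indexes larger than min_parallel_index_scan_size participate.
    eligible = [ix for ix in indexes
                if ix.size_bytes > min_parallel_index_scan_size]

    # The degree is user-specified or derived from the index count,
    # and never exceeds the number of participating indexes.
    if requested is None:
        degree = len(eligible)
    else:
        degree = min(requested, len(eligible))

    # Finally, cap by max_parallel_maintenance_workers.
    return min(degree, max_parallel_maintenance_workers)
```

With three indexes of 1MB, 2MB and 100kB and the default settings, the
sketch yields 2 workers when PARALLEL is omitted or set to 8, 1 worker
for PARALLEL 1, and 0 for PARALLEL 0.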

Author: Masahiko Sawada and Amit Kapila
Reviewed-by: Dilip Kumar, Amit Kapila, Robert Haas, Tomas Vondra,
Mahendra Singh and Sergei Kornilov
Tested-by: Mahendra Singh and Prabhat Sahu
Discussion:
https://postgr.es/m/CAD21AoDTPMgzSkV4E3SFo1CH_x50bf5PqZFQf4jmqjk-C03BWg@mail.gmail.com
https://postgr.es/m/CAA4eK1J-VoR9gzS5E75pcD-OH0mEyCdp8RihcwKrcuw7J-Q0+w@mail.gmail.com
Amit Kapila
2020-01-20 07:57:49 +05:30
parent 44f1fc8df5
commit 40d964ec99
13 changed files with 1452 additions and 136 deletions

@@ -2308,13 +2308,13 @@ include_dir 'conf.d'
<listitem>
<para>
Sets the maximum number of parallel workers that can be
started by a single utility command. Currently, the only
parallel utility command that supports the use of parallel
workers is <command>CREATE INDEX</command>, and only when
building a B-tree index. Parallel workers are taken from the
pool of processes established by <xref
linkend="guc-max-worker-processes"/>, limited by <xref
linkend="guc-max-parallel-workers"/>. Note that the requested
started by a single utility command. Currently, the parallel
utility commands that support the use of parallel workers are
<command>CREATE INDEX</command>, only when building a B-tree index,
and <command>VACUUM</command> without the <literal>FULL</literal>
option. Parallel workers are taken from the pool of processes
established by <xref linkend="guc-max-worker-processes"/>, limited
by <xref linkend="guc-max-parallel-workers"/>. Note that the requested
number of workers may not actually be available at run time.
If this occurs, the utility operation will run with fewer
workers than expected. The default value is 2. Setting this
@@ -4915,7 +4915,9 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
for a parallel scan to be considered. Note that a parallel index scan
typically won't touch the entire index; it is the number of pages
which the planner believes will actually be touched by the scan which
is relevant.
is relevant. This parameter is also used to decide whether a
particular index can participate in a parallel vacuum. See
<xref linkend="sql-vacuum"/>.
If this value is specified without units, it is taken as blocks,
that is <symbol>BLCKSZ</symbol> bytes, typically 8kB.
The default is 512 kilobytes (<literal>512kB</literal>).

@@ -34,6 +34,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
SKIP_LOCKED [ <replaceable class="parameter">boolean</replaceable> ]
INDEX_CLEANUP [ <replaceable class="parameter">boolean</replaceable> ]
TRUNCATE [ <replaceable class="parameter">boolean</replaceable> ]
PARALLEL <replaceable class="parameter">integer</replaceable>
<phrase>and <replaceable class="parameter">table_and_columns</replaceable> is:</phrase>
@@ -75,10 +76,14 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
with normal reading and writing of the table, as an exclusive lock
is not obtained. However, extra space is not returned to the operating
system (in most cases); it's just kept available for re-use within the
same table. <command>VACUUM FULL</command> rewrites the entire contents
of the table into a new disk file with no extra space, allowing unused
space to be returned to the operating system. This form is much slower and
requires an exclusive lock on each table while it is being processed.
same table. Plain <command>VACUUM</command> can also use multiple CPUs to
process indexes. This feature is known as <firstterm>parallel vacuum</firstterm>.
To disable this feature, use the <literal>PARALLEL</literal> option and
specify zero parallel workers. <command>VACUUM FULL</command> rewrites
the entire contents of the table into a new disk file with no extra space,
allowing unused space to be returned to the operating system. This form is
much slower and requires an exclusive lock on each table while it is being
processed.
</para>
<para>
@@ -223,6 +228,33 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
</listitem>
</varlistentry>
<varlistentry>
<term><literal>PARALLEL</literal></term>
<listitem>
<para>
Perform the index vacuum and index cleanup phases of <command>VACUUM</command>
in parallel using <replaceable class="parameter">integer</replaceable>
background workers (for the details of each vacuum phase, please
refer to <xref linkend="vacuum-phases"/>). If the
<literal>PARALLEL</literal> option is omitted, then
<command>VACUUM</command> decides the number of workers based on the number
of indexes on the relation that support parallel vacuum operation, which
is further limited by <xref linkend="guc-max-parallel-workers-maintenance"/>.
An index can participate in a parallel vacuum if and only if the size
of the index is more than <xref linkend="guc-min-parallel-index-scan-size"/>.
Please note that it is not guaranteed that the number of parallel workers
specified in <replaceable class="parameter">integer</replaceable> will
be used during execution. It is possible for a vacuum to run with fewer
workers than specified, or even with no workers at all. Only one worker
can be used per index, so parallel workers are launched only when there
are at least <literal>2</literal> indexes in the table. Workers for
vacuum are launched before the start of each phase and exit at the end of
the phase. These behaviors might change in a future release. This
option can't be used with the <literal>FULL</literal> option.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><replaceable class="parameter">boolean</replaceable></term>
<listitem>
@@ -237,6 +269,15 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
</listitem>
</varlistentry>
<varlistentry>
<term><replaceable class="parameter">integer</replaceable></term>
<listitem>
<para>
Specifies a non-negative integer value passed to the selected option.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><replaceable class="parameter">table_name</replaceable></term>
<listitem>
@@ -316,11 +357,19 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
more than a plain <command>VACUUM</command> would.
</para>
<para>
The <option>PARALLEL</option> option affects only vacuuming.
If it is specified together with the <option>ANALYZE</option> option,
it has no effect on <option>ANALYZE</option>.
</para>
<para>
<command>VACUUM</command> causes a substantial increase in I/O traffic,
which might cause poor performance for other active sessions. Therefore,
it is sometimes advisable to use the cost-based vacuum delay feature.
See <xref linkend="runtime-config-resource-vacuum-cost"/> for details.
it is sometimes advisable to use the cost-based vacuum delay feature. For
parallel vacuum, each worker sleeps in proportion to the work done by that
worker. See <xref linkend="runtime-config-resource-vacuum-cost"/> for
details.
</para>
<para>