mirror of https://github.com/postgres/postgres.git synced 2025-07-28 23:42:10 +03:00

Add a multi-worker capability to autovacuum. This allows multiple worker
processes to be running simultaneously.  Also, now autovacuum processes do not
count towards the max_connections limit; they are counted separately from
regular processes, and are limited by the new GUC variable
autovacuum_max_workers.

The launcher now has the logic to launch workers on each database every
autovacuum_naptime seconds, limited only by the maximum number of worker
slots available.

Also, the global worker I/O utilization is limited by the vacuum cost-based
delay feature.  Workers are "balanced" so that the total I/O consumption does
not exceed the established limit.  This part of the patch was contributed by
ITAGAKI Takahiro.

Per discussion.
Alvaro Herrera
2007-04-16 18:30:04 +00:00
parent 42dc4b66e6
commit e2a186b03c
12 changed files with 1174 additions and 162 deletions

@ -1,4 +1,4 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/config.sgml,v 1.119 2007/04/02 15:27:02 petere Exp $ -->
<!-- $PostgreSQL: pgsql/doc/src/sgml/config.sgml,v 1.120 2007/04/16 18:29:50 alvherre Exp $ -->
<chapter Id="runtime-config">
<title>Server Configuration</title>
@ -3166,7 +3166,7 @@ SELECT * FROM parent WHERE key = 2400;
<listitem>
<para>
Controls whether the server should run the
autovacuum daemon. This is off by default.
autovacuum launcher daemon. This is on by default.
<varname>stats_start_collector</> and <varname>stats_row_level</>
must also be turned on for autovacuum to work.
This parameter can only be set in the <filename>postgresql.conf</>
@ -3175,6 +3175,21 @@ SELECT * FROM parent WHERE key = 2400;
</listitem>
</varlistentry>
<varlistentry id="guc-autovacuum-max-workers" xreflabel="autovacuum_max_workers">
<term><varname>autovacuum_max_workers</varname> (<type>integer</type>)</term>
<indexterm>
<primary><varname>autovacuum_max_workers</> configuration parameter</primary>
</indexterm>
<listitem>
<para>
Specifies the maximum number of autovacuum processes (other than the
autovacuum launcher) which may be running at any one time. The default
is three (<literal>3</literal>). This parameter can only be set in
the <filename>postgresql.conf</> file or on the server command line.
</para>
</listitem>
</varlistentry>
<varlistentry id="guc-autovacuum-naptime" xreflabel="autovacuum_naptime">
<term><varname>autovacuum_naptime</varname> (<type>integer</type>)</term>
<indexterm>
@ -3182,9 +3197,9 @@ SELECT * FROM parent WHERE key = 2400;
</indexterm>
<listitem>
<para>
Specifies the delay between activity rounds for the autovacuum
daemon. In each round the daemon examines one database
and issues <command>VACUUM</> and <command>ANALYZE</> commands
Specifies the minimum delay between autovacuum runs on any given
database. In each round the daemon examines the
database and issues <command>VACUUM</> and <command>ANALYZE</> commands
as needed for tables in that database. The delay is measured
in seconds, and the default is one minute (<literal>1m</>).
This parameter can only be set in the <filename>postgresql.conf</>
@ -3318,7 +3333,10 @@ SELECT * FROM parent WHERE key = 2400;
Specifies the cost limit value that will be used in automatic
<command>VACUUM</> operations. If <literal>-1</> is specified (which is the
default), the regular
<xref linkend="guc-vacuum-cost-limit"> value will be used.
<xref linkend="guc-vacuum-cost-limit"> value will be used. Note that
the value is distributed proportionally among the running autovacuum
workers, if there is more than one, so that the sum of the limits of
each worker never exceeds the limit on this variable.
This parameter can only be set in the <filename>postgresql.conf</>
file or on the server command line.
This setting can be overridden for individual tables by entries in

@ -1,4 +1,4 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/maintenance.sgml,v 1.70 2007/02/01 19:10:24 momjian Exp $ -->
<!-- $PostgreSQL: pgsql/doc/src/sgml/maintenance.sgml,v 1.71 2007/04/16 18:29:50 alvherre Exp $ -->
<chapter id="maintenance">
<title>Routine Database Maintenance Tasks</title>
@ -466,26 +466,43 @@ HINT: Stop the postmaster and use a standalone backend to VACUUM in "mydb".
<secondary>general information</secondary>
</indexterm>
<para>
Beginning in <productname>PostgreSQL </productname> 8.1, there is a
separate optional server process called the <firstterm>autovacuum
daemon</firstterm>, whose purpose is to automate the execution of
Beginning in <productname>PostgreSQL</productname> 8.1, there is an
optional feature called <firstterm>autovacuum</firstterm>,
whose purpose is to automate the execution of
<command>VACUUM</command> and <command>ANALYZE </command> commands.
When enabled, the autovacuum daemon runs periodically and checks for
When enabled, autovacuum checks for
tables that have had a large number of inserted, updated or deleted
tuples. These checks use the row-level statistics collection facility;
therefore, the autovacuum daemon cannot be used unless <xref
therefore, autovacuum cannot be used unless <xref
linkend="guc-stats-start-collector"> and <xref
linkend="guc-stats-row-level"> are set to <literal>true</literal>. Also,
it's important to allow a slot for the autovacuum process when choosing
the value of <xref linkend="guc-superuser-reserved-connections">. In
the default configuration, autovacuuming is enabled and the related
linkend="guc-stats-row-level"> are set to <literal>true</literal>.
In the default configuration, autovacuuming is enabled and the related
configuration parameters are appropriately set.
</para>
<para>
The autovacuum daemon, when enabled, runs every <xref
linkend="guc-autovacuum-naptime"> seconds. On each run, it selects
one database to process and checks each table within that database.
Beginning in <productname>PostgreSQL</productname> 8.3, autovacuum has a
multi-process architecture: there is a daemon process, called the
<firstterm>autovacuum launcher</firstterm>, which is in charge of starting
an <firstterm>autovacuum worker</firstterm> process on each database every
<xref linkend="guc-autovacuum-naptime"> seconds.
</para>
<para>
There is a limit of <xref linkend="guc-autovacuum-max-workers"> worker
processes that may be running at any time, so if the <command>VACUUM</>
and <command>ANALYZE</> work to do takes too long to run, the deadline may
not be met for other databases. Also, if a particular database
takes a long time to process, more than one worker may be processing it
simultaneously. The workers are smart enough to avoid repeating work that
other workers have done, so this is normally not a problem. Note that the
number of running workers does not count towards the <xref
linkend="guc-max-connections"> nor the <xref
linkend="guc-superuser-reserved-connections"> limits.
</para>
<para>
On each run, the worker process checks each table within that database, and
<command>VACUUM</command> or <command>ANALYZE</command> commands are
issued as needed.
</para>
@ -581,6 +598,12 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
</para>
</caution>
<para>
When multiple workers are running, the cost limit is "balanced" among all
the running workers, so that the total impact on the system is the same,
regardless of the number of workers actually running.
</para>
</sect2>
</sect1>
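
The "balanced" cost limit described above can be illustrated with a minimal sketch. It is not the code from this commit: the function name balance_cost_limit and the per-worker weights are hypothetical, and the sketch only assumes what the documentation states, namely that the configured limit is split proportionally among the running workers and that the per-worker limits never sum to more than the configured value.

```c
/*
 * Illustrative sketch only; balance_cost_limit and the weights[] array
 * are hypothetical and do not appear in the commit.
 *
 * Split total_limit among nworkers in proportion to weights[].
 * Integer division rounds each share down, so the shares can never
 * sum to more than total_limit.
 */
static void
balance_cost_limit(int total_limit, const int *weights,
                   int *limits, int nworkers)
{
    long long   weight_sum = 0;
    int         i;

    for (i = 0; i < nworkers; i++)
        weight_sum += weights[i];

    for (i = 0; i < nworkers; i++)
    {
        if (weight_sum > 0)
            limits[i] = (int) ((long long) total_limit * weights[i] / weight_sum);
        else
            limits[i] = 0;      /* no weight at all: nothing to distribute */
    }
}
```

With a total limit of 200 and weights 1, 1 and 2, the three workers would receive 50, 50 and 100. Because every share is rounded down, a total that does not divide evenly leaves a small remainder unassigned rather than exceeding the limit.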