Add a multi-worker capability to autovacuum. This allows multiple worker processes to be running simultaneously. Also, autovacuum processes no longer count towards the max_connections limit; they are counted separately from regular processes, and are limited by the new GUC variable autovacuum_max_workers.

The launcher now has intelligence to launch workers on each database every autovacuum_naptime seconds, limited only by the maximum number of worker slots available.

Also, the global worker I/O utilization is limited by the vacuum cost-based delay feature. Workers are "balanced" so that the total I/O consumption does not exceed the established limit. This part of the patch was contributed by ITAGAKI Takahiro.

Per discussion.
2025-07-28 23:42:10 +03:00 · 2007-04-16 18:30:04 +00:00
parent 42dc4b66e6
commit e2a186b03c
12 changed files with 1174 additions and 162 deletions
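
The launcher behaviour described in the commit message can be pictured with a small model. This is only an illustrative sketch, not the code from this commit: the function `databases_to_launch` and its parameters are hypothetical names, and it assumes the launcher simply picks the databases whose last autovacuum run is at least `naptime` seconds old, oldest first, up to the number of free worker slots.

```python
def databases_to_launch(last_run, now, naptime, running, max_workers):
    """Pick databases that are due for a worker (hypothetical model).

    last_run    -- dict mapping database name to time of its last run (seconds)
    now         -- current time (seconds)
    naptime     -- stands in for autovacuum_naptime
    running     -- number of workers currently running
    max_workers -- stands in for autovacuum_max_workers
    """
    # Workers can only be started in free slots.
    free_slots = max(max_workers - running, 0)
    # A database is due once naptime seconds have passed since its last run;
    # the longest-waiting databases are served first (an assumption).
    due = [db for db, t in sorted(last_run.items(), key=lambda kv: kv[1])
           if now - t >= naptime]
    return due[:free_slots]


last_run = {"a": 0, "b": 30, "c": 10}
print(databases_to_launch(last_run, now=70, naptime=60, running=1, max_workers=3))
```

With one worker already running and three slots in total, at most two of the due databases get a worker; a database that was processed less than `naptime` seconds ago is skipped even if slots are free.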
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/config.sgml,v 1.119 2007/04/02 15:27:02 petere Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/config.sgml,v 1.120 2007/04/16 18:29:50 alvherre Exp $ -->

 <chapter Id="runtime-config">
  <title>Server Configuration</title>
@@ -3166,7 +3166,7 @@ SELECT * FROM parent WHERE key = 2400;
      <listitem>
       <para>
        Controls whether the server should run the
-        autovacuum daemon.  This is off by default.
+        autovacuum launcher daemon.  This is on by default.
        <varname>stats_start_collector</> and <varname>stats_row_level</>
        must also be turned on for autovacuum to work.
        This parameter can only be set in the <filename>postgresql.conf</>
@@ -3175,6 +3175,21 @@ SELECT * FROM parent WHERE key = 2400;
      </listitem>
     </varlistentry>

+     <varlistentry id="guc-autovacuum-max-workers" xreflabel="autovacuum_max_workers">
+      <term><varname>autovacuum_max_workers</varname> (<type>integer</type>)</term>
+      <indexterm>
+       <primary><varname>autovacuum_max_workers</> configuration parameter</primary>
+      </indexterm>
+      <listitem>
+       <para>
+        Specifies the maximum number of autovacuum processes (other than the
+        autovacuum launcher) which may be running at any one time.  The default
+        is three (<literal>3</literal>).  This parameter can only be set in
+        the <filename>postgresql.conf</> file or on the server command line.
+       </para>
+      </listitem>
+     </varlistentry>
+
     <varlistentry id="guc-autovacuum-naptime" xreflabel="autovacuum_naptime">
      <term><varname>autovacuum_naptime</varname> (<type>integer</type>)</term>
      <indexterm>
@@ -3182,9 +3197,9 @@ SELECT * FROM parent WHERE key = 2400;
      </indexterm>
      <listitem>
       <para>
-        Specifies the delay between activity rounds for the autovacuum
-        daemon.  In each round the daemon examines one database
-        and issues <command>VACUUM</> and <command>ANALYZE</> commands
+        Specifies the minimum delay between autovacuum runs on any given
+        database.  In each round the daemon examines the
+        database and issues <command>VACUUM</> and <command>ANALYZE</> commands
        as needed for tables in that database.  The delay is measured
        in seconds, and the default is one minute (<literal>1m</>).
        This parameter can only be set in the <filename>postgresql.conf</>
@@ -3318,7 +3333,10 @@ SELECT * FROM parent WHERE key = 2400;
        Specifies the cost limit value that will be used in automatic
        <command>VACUUM</> operations.  If <literal>-1</> is specified (which is the
        default), the regular
-        <xref linkend="guc-vacuum-cost-limit"> value will be used.
+        <xref linkend="guc-vacuum-cost-limit"> value will be used.  Note that
+        the value is distributed proportionally among the running autovacuum
+        workers, if there is more than one, so that the sum of the limits of
+        each worker never exceeds the limit on this variable.
        This parameter can only be set in the <filename>postgresql.conf</>
        file or on the server command line.
        This setting can be overridden for individual tables by entries in
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/maintenance.sgml,v 1.70 2007/02/01 19:10:24 momjian Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/maintenance.sgml,v 1.71 2007/04/16 18:29:50 alvherre Exp $ -->

 <chapter id="maintenance">
 <title>Routine Database Maintenance Tasks</title>
@@ -466,26 +466,43 @@ HINT:  Stop the postmaster and use a standalone backend to VACUUM in "mydb".
    <secondary>general information</secondary>
   </indexterm>
   <para>
-    Beginning in <productname>PostgreSQL </productname> 8.1, there is a
-    separate optional server process called the <firstterm>autovacuum
-    daemon</firstterm>, whose purpose is to automate the execution of
+    Beginning in <productname>PostgreSQL</productname> 8.1, there is an
+    optional feature called <firstterm>autovacuum</firstterm>,
+    whose purpose is to automate the execution of
    <command>VACUUM</command> and <command>ANALYZE </command> commands.
-    When enabled, the autovacuum daemon runs periodically and checks for
+    When enabled, autovacuum checks for
    tables that have had a large number of inserted, updated or deleted
    tuples.  These checks use the row-level statistics collection facility;
-    therefore, the autovacuum daemon cannot be used unless <xref
+    therefore, autovacuum cannot be used unless <xref
    linkend="guc-stats-start-collector"> and <xref
-    linkend="guc-stats-row-level"> are set to <literal>true</literal>.  Also,
-    it's important to allow a slot for the autovacuum process when choosing
-    the value of <xref linkend="guc-superuser-reserved-connections">.  In
-    the default configuration, autovacuuming is enabled and the related
+    linkend="guc-stats-row-level"> are set to <literal>true</literal>.
+    In the default configuration, autovacuuming is enabled and the related
    configuration parameters are appropriately set.
   </para>

   <para>
-    The autovacuum daemon, when enabled, runs every <xref
-    linkend="guc-autovacuum-naptime"> seconds.  On each run, it selects
-    one database to process and checks each table within that database.
+	Beginning in <productname>PostgreSQL</productname> 8.3, autovacuum has a
+	multi-process architecture: there is a daemon process, called the
+	<firstterm>autovacuum launcher</firstterm>, which is in charge of starting
+	an <firstterm>autovacuum worker</firstterm> process on each database every
+	<xref linkend="guc-autovacuum-naptime"> seconds.
+   </para>
+
+   <para>
+    There is a limit of <xref linkend="guc-autovacuum-max-workers"> worker
+    processes that may be running at any time, so if the <command>VACUUM</>
+    and <command>ANALYZE</> work takes too long to run, the deadline may
+    not be met for other databases.  Also, if a particular database
+    takes a long time to process, more than one worker may be processing it
+	simultaneously.  The workers are smart enough to avoid repeating work that
+	other workers have done, so this is normally not a problem.  Note that the
+	number of running workers does not count towards the <xref
+	linkend="guc-max-connections"> nor the <xref
+	linkend="guc-superuser-reserved-connections"> limits.
+   </para>
+
+   <para>
+    On each run, the worker process checks each table within that database, and
    <command>VACUUM</command> or <command>ANALYZE</command> commands are
    issued as needed.
   </para>
@@ -581,6 +598,12 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
    </para>
   </caution>

+   <para>
+    When multiple workers are running, the cost limit is "balanced" among all
+    the running workers, so that the total impact on the system is the same,
+    regardless of the number of workers actually running.
+   </para>
+
  </sect2>
 </sect1>
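
The cost-limit "balancing" documented above can be illustrated with a short sketch. It is a simplified model and not the algorithm implemented by this patch: the helper `balance_cost_limits` is a hypothetical name, and it assumes each worker has a requested limit (for example from a per-table override) and receives a share of the global limit proportional to that request, rounded down so the shares never add up to more than the global limit.

```python
def balance_cost_limits(total_limit, requested):
    """Split total_limit among workers in proportion to their requests.

    total_limit -- stands in for the effective autovacuum cost limit
    requested   -- list of per-worker requested limits (hypothetical input)
    """
    total_requested = sum(requested)
    if total_requested == 0:
        return [0] * len(requested)
    # Integer division rounds each share down, so the sum of the shares
    # cannot exceed total_limit.
    return [total_limit * r // total_requested for r in requested]


print(balance_cost_limits(200, [200]))
print(balance_cost_limits(200, [200, 100, 100]))
```

A single worker keeps the whole limit, while several workers divide it, which is why the total impact on the system stays roughly constant regardless of how many workers are running.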