Mirror of https://github.com/postgres/postgres.git, synced 2025-07-30 11:03:19 +03:00
Replace checkpoint_segments with min_wal_size and max_wal_size.
Instead of having a single knob (checkpoint_segments) that both triggers checkpoints, and determines how many segments to recycle, they are now separate concerns. There is still an internal variable called CheckpointSegments, which triggers checkpoints. But it no longer determines how many segments to recycle at a checkpoint. That is now auto-tuned by keeping a moving average of the distance between checkpoints (in bytes), and trying to keep that many segments in reserve. The advantage of this is that you can set max_wal_size very high, but the system won't actually consume that much space if there isn't any need for it. The min_wal_size sets a floor for that; you can effectively disable the auto-tuning behavior by setting min_wal_size equal to max_wal_size.

The max_wal_size setting is now the actual target size of WAL at which a new checkpoint is triggered, instead of the distance between checkpoints. Previously, you could calculate the actual WAL usage with the formula "(2 + checkpoint_completion_target) * checkpoint_segments + 1". With this patch, you set the desired WAL usage with max_wal_size, and the system calculates the appropriate CheckpointSegments with the reverse of that formula. That's a lot more intuitive for administrators to set.

Reviewed by Amit Kapila and Venkata Balaji N.
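The reverse calculation is simple enough to sketch. The following is a minimal, hypothetical C illustration of that inversion, not the patch's actual code (names here are illustrative); note that with the default max_wal_size of 128 MB and checkpoint_completion_target of 0.5 it yields 3, the same as the old default checkpoint_segments:

#include <stdio.h>

/*
 * Illustrative sketch (hypothetical names): invert the old sizing formula
 *
 *     wal_usage = (2 + checkpoint_completion_target) * checkpoint_segments + 1
 *
 * to derive the internal checkpoint-trigger distance from max_wal_size.
 * Both quantities are measured in 16 MB WAL segments here.
 */
static int
calculate_checkpoint_segments(int max_wal_size_segs, double completion_target)
{
    int segments = (int) (max_wal_size_segs / (2.0 + completion_target));

    return (segments < 1) ? 1 : segments;   /* never drop below one segment */
}

int
main(void)
{
    /* The default max_wal_size of 128 MB is 8 segments; the target is 0.5. */
    printf("CheckpointSegments = %d\n",
           calculate_checkpoint_segments(8, 0.5));   /* prints 3 */
    return 0;
}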
@@ -1325,7 +1325,7 @@ include_dir 'conf.d'
 40% of RAM to <varname>shared_buffers</varname> will work better than a
 smaller amount. Larger settings for <varname>shared_buffers</varname>
 usually require a corresponding increase in
-<varname>checkpoint_segments</varname>, in order to spread out the
+<varname>max_wal_size</varname>, in order to spread out the
 process of writing large quantities of new or changed data over a
 longer period of time.
 </para>
@@ -2394,18 +2394,20 @@ include_dir 'conf.d'
 <title>Checkpoints</title>

 <variablelist>
-<varlistentry id="guc-checkpoint-segments" xreflabel="checkpoint_segments">
-<term><varname>checkpoint_segments</varname> (<type>integer</type>)
+<varlistentry id="guc-max-wal-size" xreflabel="max_wal_size">
+<term><varname>max_wal_size</varname> (<type>integer</type>)</term>
 <indexterm>
-<primary><varname>checkpoint_segments</> configuration parameter</primary>
+<primary><varname>max_wal_size</> configuration parameter</primary>
 </indexterm>
-</term>
 <listitem>
 <para>
-Maximum number of log file segments between automatic WAL
-checkpoints (each segment is normally 16 megabytes). The default
-is three segments. Increasing this parameter can increase the
-amount of time needed for crash recovery.
+Maximum size to let the WAL grow to between automatic WAL
+checkpoints. This is a soft limit; WAL size can exceed
+<varname>max_wal_size</> under special circumstances, like
+under heavy load, a failing <varname>archive_command</>, or a high
+<varname>wal_keep_segments</> setting. The default is 128 MB.
+Increasing this parameter can increase the amount of time needed for
+crash recovery.
 This parameter can only be set in the <filename>postgresql.conf</>
 file or on the server command line.
 </para>
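To make the "soft limit" behavior concrete, here is a hedged C sketch, with hypothetical names, of what a size-based trigger amounts to; the actual server code differs:

#include <stdbool.h>
#include <stdint.h>

#define WAL_SEGMENT_SIZE ((uint64_t) 16 * 1024 * 1024)  /* usual build default */

/*
 * Hypothetical sketch of a size-based checkpoint trigger: once the WAL
 * written since the last checkpoint's starting point reaches the derived
 * CheckpointSegments budget, a checkpoint is requested. The limit is soft
 * because WAL keeps being written while the checkpoint itself runs.
 */
static bool
checkpoint_needed(uint64_t insert_pos, uint64_t last_checkpoint_pos,
                  int checkpoint_segments)
{
    return insert_pos - last_checkpoint_pos >=
        (uint64_t) checkpoint_segments * WAL_SEGMENT_SIZE;
}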
@@ -2458,7 +2460,7 @@
 Write a message to the server log if checkpoints caused by
 the filling of checkpoint segment files happen closer together
 than this many seconds (which suggests that
-<varname>checkpoint_segments</> ought to be raised). The default is
+<varname>max_wal_size</> ought to be raised). The default is
 30 seconds (<literal>30s</>). Zero disables the warning.
 No warnings will be generated if <varname>checkpoint_timeout</varname>
 is less than <varname>checkpoint_warning</varname>.
@@ -2468,6 +2470,24 @@
 </listitem>
 </varlistentry>

+<varlistentry id="guc-min-wal-size" xreflabel="min_wal_size">
+<term><varname>min_wal_size</varname> (<type>integer</type>)</term>
+<indexterm>
+<primary><varname>min_wal_size</> configuration parameter</primary>
+</indexterm>
+<listitem>
+<para>
+As long as WAL disk usage stays below this setting, old WAL files are
+always recycled for future use at a checkpoint, rather than removed.
+This can be used to ensure that enough WAL space is reserved to
+handle spikes in WAL usage, for example when running large batch
+jobs. The default is 80 MB.
+This parameter can only be set in the <filename>postgresql.conf</>
+file or on the server command line.
+</para>
+</listitem>
+</varlistentry>
+
 </variablelist>
 </sect2>

 <sect2 id="runtime-config-wal-archiving">
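The recycling bounds the two GUCs imply can be sketched as a simple clamp (a hypothetical illustration; the real decision lives in the checkpointer code):

/*
 * Hypothetical sketch: the number of old segments kept for reuse at a
 * checkpoint is the moving-average estimate of upcoming need, clamped so
 * that at least min_wal_size and at most max_wal_size worth of WAL is
 * reserved. All quantities are in 16 MB segments.
 */
static int
segments_to_recycle(int estimated_need, int min_wal_size_segs,
                    int max_wal_size_segs)
{
    int keep = estimated_need;

    if (keep < min_wal_size_segs)
        keep = min_wal_size_segs;   /* floor: always reserve this much */
    if (keep > max_wal_size_segs)
        keep = max_wal_size_segs;   /* ceiling: excess segments are removed */
    return keep;
}

Setting min_wal_size equal to max_wal_size pins the result to a constant, which is the "disable the auto-tuning" behavior described in the commit message.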
@@ -1328,19 +1328,19 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
 </para>
 </sect2>

-<sect2 id="populate-checkpoint-segments">
-<title>Increase <varname>checkpoint_segments</varname></title>
+<sect2 id="populate-max-wal-size">
+<title>Increase <varname>max_wal_size</varname></title>

 <para>
-Temporarily increasing the <xref
-linkend="guc-checkpoint-segments"> configuration variable can also
+Temporarily increasing the <xref linkend="guc-max-wal-size">
+configuration variable can also
 make large data loads faster. This is because loading a large
 amount of data into <productname>PostgreSQL</productname> will
 cause checkpoints to occur more often than the normal checkpoint
 frequency (specified by the <varname>checkpoint_timeout</varname>
 configuration variable). Whenever a checkpoint occurs, all dirty
 pages must be flushed to disk. By increasing
-<varname>checkpoint_segments</varname> temporarily during bulk
+<varname>max_wal_size</varname> temporarily during bulk
 data loads, the number of checkpoints that are required can be
 reduced.
 </para>
@@ -1445,7 +1445,7 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
 <para>
 Set appropriate (i.e., larger than normal) values for
 <varname>maintenance_work_mem</varname> and
-<varname>checkpoint_segments</varname>.
+<varname>max_wal_size</varname>.
 </para>
 </listitem>
 <listitem>
@@ -1512,7 +1512,7 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;

 So when loading a data-only dump, it is up to you to drop and recreate
 indexes and foreign keys if you wish to use those techniques.
-It's still useful to increase <varname>checkpoint_segments</varname>
+It's still useful to increase <varname>max_wal_size</varname>
 while loading the data, but don't bother increasing
 <varname>maintenance_work_mem</varname>; rather, you'd do that while
 manually recreating indexes and foreign keys afterwards.
@@ -1577,7 +1577,7 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;

 <listitem>
 <para>
-Increase <xref linkend="guc-checkpoint-segments"> and <xref
+Increase <xref linkend="guc-max-wal-size"> and <xref
 linkend="guc-checkpoint-timeout"> ; this reduces the frequency
 of checkpoints, but increases the storage requirements of
 <filename>/pg_xlog</>.
@@ -472,9 +472,10 @@
 <para>
 The server's checkpointer process automatically performs
 a checkpoint every so often. A checkpoint is begun every <xref
-linkend="guc-checkpoint-segments"> log segments, or every <xref
-linkend="guc-checkpoint-timeout"> seconds, whichever comes first.
-The default settings are 3 segments and 300 seconds (5 minutes), respectively.
+linkend="guc-checkpoint-timeout"> seconds, or if
+<xref linkend="guc-max-wal-size"> is about to be exceeded,
+whichever comes first.
+The default settings are 5 minutes and 128 MB, respectively.
 If no WAL has been written since the previous checkpoint, new checkpoints
 will be skipped even if <varname>checkpoint_timeout</> has passed.
 (If WAL archiving is being used and you want to put a lower limit on how
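The two triggers, plus the skip-when-idle rule, can be sketched in a few lines of hypothetical C (illustrative names only, not the checkpointer's actual code):

#include <stdbool.h>

/*
 * Hypothetical sketch of the checkpoint triggers described above:
 * a checkpoint starts when WAL growth is about to exceed max_wal_size,
 * or when checkpoint_timeout has elapsed -- but a timed checkpoint is
 * skipped if no WAL has been written since the previous one.
 */
static bool
should_start_checkpoint(double secs_since_last, double checkpoint_timeout,
                        bool wal_written_since_last, bool size_limit_near)
{
    if (size_limit_near)
        return true;                        /* size trigger */
    if (secs_since_last >= checkpoint_timeout)
        return wal_written_since_last;      /* time trigger, skip if idle */
    return false;
}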
@@ -486,8 +487,8 @@
 </para>

 <para>
-Reducing <varname>checkpoint_segments</varname> and/or
-<varname>checkpoint_timeout</varname> causes checkpoints to occur
+Reducing <varname>checkpoint_timeout</varname> and/or
+<varname>max_wal_size</varname> causes checkpoints to occur
 more often. This allows faster after-crash recovery, since less work
 will need to be redone. However, one must balance this against the
 increased cost of flushing dirty data pages more often. If
@@ -510,11 +511,11 @@
 parameter. If checkpoints happen closer together than
 <varname>checkpoint_warning</> seconds,
 a message will be output to the server log recommending increasing
-<varname>checkpoint_segments</varname>. Occasional appearance of such
+<varname>max_wal_size</varname>. Occasional appearance of such
 a message is not cause for alarm, but if it appears often then the
 checkpoint control parameters should be increased. Bulk operations such
 as large <command>COPY</> transfers might cause a number of such warnings
-to appear if you have not set <varname>checkpoint_segments</> high
+to appear if you have not set <varname>max_wal_size</> high
 enough.
 </para>

@@ -525,10 +526,10 @@
 <xref linkend="guc-checkpoint-completion-target">, which is
 given as a fraction of the checkpoint interval.
 The I/O rate is adjusted so that the checkpoint finishes when the
-given fraction of <varname>checkpoint_segments</varname> WAL segments
-have been consumed since checkpoint start, or the given fraction of
-<varname>checkpoint_timeout</varname> seconds have elapsed,
-whichever is sooner. With the default value of 0.5,
+given fraction of
+<varname>checkpoint_timeout</varname> seconds have elapsed, or before
+<varname>max_wal_size</varname> is exceeded, whichever is sooner.
+With the default value of 0.5,
 <productname>PostgreSQL</> can be expected to complete each checkpoint
 in about half the time before the next checkpoint starts. On a system
 that's very close to maximum I/O throughput during normal operation,
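A hedged sketch of that pacing test, under the assumption that progress is tracked as simple fractions (hypothetical names; the server's throttling logic is more involved):

#include <stdbool.h>

/*
 * Hypothetical sketch of checkpoint pacing: the checkpoint is "on
 * schedule" while its completed fraction of work, scaled by
 * checkpoint_completion_target, stays ahead of both the elapsed-time
 * fraction and the consumed-WAL fraction; while on schedule, the
 * checkpointer can sleep briefly between writes to spread out the I/O.
 */
static bool
checkpoint_on_schedule(double work_done,      /* 0.0 .. 1.0 */
                       double time_elapsed,   /* fraction of checkpoint_timeout */
                       double wal_consumed,   /* fraction of the WAL budget */
                       double completion_target)
{
    double progress = work_done * completion_target;

    return progress >= time_elapsed && progress >= wal_consumed;
}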
@@ -545,18 +546,35 @@
 </para>

 <para>
-There will always be at least one WAL segment file, and will normally
-not be more than (2 + <varname>checkpoint_completion_target</varname>) * <varname>checkpoint_segments</varname> + 1
-or <varname>checkpoint_segments</> + <xref linkend="guc-wal-keep-segments"> + 1
-files. Each segment file is normally 16 MB (though this size can be
-altered when building the server). You can use this to estimate space
-requirements for <acronym>WAL</acronym>.
-Ordinarily, when old log segment files are no longer needed, they
-are recycled (that is, renamed to become future segments in the numbered
-sequence). If, due to a short-term peak of log output rate, there
-are more than 3 * <varname>checkpoint_segments</varname> + 1
-segment files, the unneeded segment files will be deleted instead
-of recycled until the system gets back under this limit.
+The number of WAL segment files in <filename>pg_xlog</> directory depends on
+<varname>min_wal_size</>, <varname>max_wal_size</> and
+the amount of WAL generated in previous checkpoint cycles. When old log
+segment files are no longer needed, they are removed or recycled (that is,
+renamed to become future segments in the numbered sequence). If, due to a
+short-term peak of log output rate, <varname>max_wal_size</> is
+exceeded, the unneeded segment files will be removed until the system
+gets back under this limit. Below that limit, the system recycles enough
+WAL files to cover the estimated need until the next checkpoint, and
+removes the rest. The estimate is based on a moving average of the number
+of WAL files used in previous checkpoint cycles. The moving average
+is increased immediately if the actual usage exceeds the estimate, so it
+accommodates peak usage rather than average usage to some extent.
+<varname>min_wal_size</> puts a minimum on the amount of WAL files
+recycled for future usage; that much WAL is always recycled for future use,
+even if the system is idle and the WAL usage estimate suggests that little
+WAL is needed.
 </para>

+<para>
+Independently of <varname>max_wal_size</varname>,
+<xref linkend="guc-wal-keep-segments"> + 1 most recent WAL files are
+kept at all times. Also, if WAL archiving is used, old segments can not be
+removed or recycled until they are archived. If WAL archiving cannot keep up
+with the pace that WAL is generated, or if <varname>archive_command</varname>
+fails repeatedly, old WAL files will accumulate in <filename>pg_xlog</>
+until the situation is resolved. A slow or failed standby server that
+uses a replication slot will have the same effect (see
+<xref linkend="streaming-replication-slots">).
+</para>
+
 <para>
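The moving-average estimate described above can be sketched roughly as follows; the smoothing factor is an assumption for illustration, since the documented behavior only promises that the estimate jumps to a new peak immediately and otherwise adapts gradually:

/*
 * Hypothetical sketch of the WAL-usage estimate: a smoothed moving
 * average of bytes written between checkpoints, except that a new peak
 * takes effect immediately so recycling accommodates bursts.
 */
static double wal_distance_estimate = 0.0;   /* bytes between checkpoints */

static void
update_distance_estimate(double bytes_since_last_checkpoint)
{
    if (bytes_since_last_checkpoint >= wal_distance_estimate)
        wal_distance_estimate = bytes_since_last_checkpoint;  /* track peaks at once */
    else
        wal_distance_estimate = 0.90 * wal_distance_estimate +      /* assumed */
                                0.10 * bytes_since_last_checkpoint; /* factors */
}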
@@ -571,9 +589,8 @@
 master because restartpoints can only be performed at checkpoint records.
 A restartpoint is triggered when a checkpoint record is reached if at
 least <varname>checkpoint_timeout</> seconds have passed since the last
-restartpoint. In standby mode, a restartpoint is also triggered if at
-least <varname>checkpoint_segments</> log segments have been replayed
-since the last restartpoint.
+restartpoint, or if WAL size is about to exceed
+<varname>max_wal_size</>.
 </para>

 <para>