mirror of
https://github.com/postgres/postgres.git
synced 2025-07-30 11:03:19 +03:00
Update WAL configuration discussion to reflect post-7.1 tweaking.
Minor copy-editing.
@ -1,4 +1,4 @@
|
|||||||
<!-- $Header: /cvsroot/pgsql/doc/src/sgml/wal.sgml,v 1.11 2001/09/29 04:02:19 tgl Exp $ -->
|
<!-- $Header: /cvsroot/pgsql/doc/src/sgml/wal.sgml,v 1.12 2001/10/26 23:10:21 tgl Exp $ -->
|
||||||
|
|
||||||
<chapter id="wal">
|
<chapter id="wal">
|
||||||
<title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
|
<title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
|
||||||
@ -88,8 +88,11 @@
|
|||||||
transaction identifiers. Once UNDO is implemented,
|
transaction identifiers. Once UNDO is implemented,
|
||||||
<filename>pg_clog</filename> will no longer be required to be
|
<filename>pg_clog</filename> will no longer be required to be
|
||||||
permanent; it will be possible to remove
|
permanent; it will be possible to remove
|
||||||
<filename>pg_clog</filename> at shutdown, split it into segments
|
<filename>pg_clog</filename> at shutdown. (However, the urgency
|
||||||
and remove old segments.
|
of this concern has decreased greatly with the adoption of a segmented
|
||||||
|
storage method for <filename>pg_clog</filename> --- it is no longer
|
||||||
|
necessary to keep old <filename>pg_clog</filename> entries around
|
||||||
|
forever.)
|
||||||
</para>
|
</para>
|
||||||
|
|
||||||
<para>
|
<para>
|
||||||
@ -116,6 +119,18 @@
|
|||||||
copying the data files (operating system copy commands are not
|
copying the data files (operating system copy commands are not
|
||||||
suitable).
|
suitable).
|
||||||
</para>
|
</para>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
A difficulty standing in the way of realizing these benefits is that they
|
||||||
|
require saving <acronym>WAL</acronym> entries for considerable periods
|
||||||
|
of time (eg, as long as the longest possible transaction if transaction
|
||||||
|
UNDO is wanted). The present <acronym>WAL</acronym> format is
|
||||||
|
extremely bulky since it includes many disk page snapshots.
|
||||||
|
This is not a serious concern at present, since the entries only need
|
||||||
|
to be kept for one or two checkpoint intervals; but to achieve
|
||||||
|
these future benefits some sort of compressed <acronym>WAL</acronym>
|
||||||
|
format will be needed.
|
||||||
|
</para>
|
||||||
</sect2>
|
</sect2>
|
||||||
</sect1>
|
</sect1>
|
||||||
|
|
||||||
@ -133,8 +148,8 @@
|
|||||||
<para>
|
<para>
|
||||||
<acronym>WAL</acronym> logs are stored in the directory
|
<acronym>WAL</acronym> logs are stored in the directory
|
||||||
<Filename><replaceable>$PGDATA</replaceable>/pg_xlog</Filename>, as
|
<Filename><replaceable>$PGDATA</replaceable>/pg_xlog</Filename>, as
|
||||||
a set of segment files, each 16 MB in size. Each segment is
|
a set of segment files, each 16MB in size. Each segment is
|
||||||
divided into 8 kB pages. The log record headers are described in
|
divided into 8KB pages. The log record headers are described in
|
||||||
<filename>access/xlog.h</filename>; record content is dependent on
|
<filename>access/xlog.h</filename>; record content is dependent on
|
||||||
the type of event that is being logged. Segment files are given
|
the type of event that is being logged. Segment files are given
|
||||||
ever-increasing numbers as names, starting at
|
ever-increasing numbers as names, starting at
|
||||||
@ -147,8 +162,8 @@
|
|||||||
The <acronym>WAL</acronym> buffers and control structure are in
|
The <acronym>WAL</acronym> buffers and control structure are in
|
||||||
shared memory, and are handled by the backends; they are protected
|
shared memory, and are handled by the backends; they are protected
|
||||||
by lightweight locks. The demand on shared memory is dependent on the
|
by lightweight locks. The demand on shared memory is dependent on the
|
||||||
number of buffers; the default size of the <acronym>WAL</acronym>
|
number of buffers. The default size of the <acronym>WAL</acronym>
|
||||||
buffers is 64 kB.
|
buffers is 8 8KB buffers, or 64KB.
|
||||||
</para>
|
</para>
|
||||||
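The revised sentence in this hunk spells out the arithmetic behind the default: 8 buffers of 8KB each give 64KB of shared memory. A minimal sketch of that calculation follows, assuming each WAL buffer is exactly one 8KB page as the text states. The function name and constant are hypothetical and are not part of PostgreSQL.

```python
# Illustrative only: names here are hypothetical, not PostgreSQL identifiers.
# Assumption: one WAL buffer occupies one 8KB log page, as the text states.
WAL_PAGE_SIZE_KB = 8


def wal_buffer_memory_kb(wal_buffers: int = 8) -> int:
    """Return the shared memory taken by the WAL buffers, in KB."""
    if wal_buffers < 1:
        raise ValueError("wal_buffers must be at least 1")
    return wal_buffers * WAL_PAGE_SIZE_KB


if __name__ == "__main__":
    # Default of 8 buffers, then a doubled setting for comparison.
    for n in (8, 16):
        print(f"WAL_BUFFERS={n}: {wal_buffer_memory_kb(n)}KB")
```

Doubling WAL_BUFFERS doubles the figure, which is the "correspondingly increase shared memory usage" point made in a later hunk. Control structures in shared memory are not counted in this sketch.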
|
|
||||||
<para>
|
<para>
|
||||||
@ -166,8 +181,8 @@
|
|||||||
disk drives that falsely report a successful write to the kernel,
|
disk drives that falsely report a successful write to the kernel,
|
||||||
when, in fact, they have only cached the data and not yet stored it
|
when, in fact, they have only cached the data and not yet stored it
|
||||||
on the disk. A power failure in such a situation may still lead to
|
on the disk. A power failure in such a situation may still lead to
|
||||||
irrecoverable data corruption; administrators should try to ensure
|
irrecoverable data corruption. Administrators should try to ensure
|
||||||
that disks holding <productname>PostgreSQL</productname>'s data and
|
that disks holding <productname>PostgreSQL</productname>'s
|
||||||
log files do not make such false reports.
|
log files do not make such false reports.
|
||||||
</para>
|
</para>
|
||||||
|
|
||||||
@ -179,11 +194,12 @@
|
|||||||
checkpoint's position is saved in the file
|
checkpoint's position is saved in the file
|
||||||
<filename>pg_control</filename>. Therefore, when recovery is to be
|
<filename>pg_control</filename>. Therefore, when recovery is to be
|
||||||
done, the backend first reads <filename>pg_control</filename> and
|
done, the backend first reads <filename>pg_control</filename> and
|
||||||
then the checkpoint record; next it reads the redo record, whose
|
then the checkpoint record; then it performs the REDO operation by
|
||||||
position is saved in the checkpoint, and begins the REDO operation.
|
scanning forward from the log position indicated in the checkpoint
|
||||||
Because the entire content of the pages is saved in the log on the
|
record.
|
||||||
first page modification after a checkpoint, the pages will be first
|
Because the entire content of data pages is saved in the log on the
|
||||||
restored to a consistent state.
|
first page modification after a checkpoint, all pages changed since
|
||||||
|
the checkpoint will be restored to a consistent state.
|
||||||
</para>
|
</para>
|
||||||
|
|
||||||
<para>
|
<para>
|
||||||
@ -217,9 +233,9 @@
|
|||||||
buffers. This is undesirable because <function>LogInsert</function>
|
buffers. This is undesirable because <function>LogInsert</function>
|
||||||
is used on every database low level modification (for example,
|
is used on every database low level modification (for example,
|
||||||
tuple insertion) at a time when an exclusive lock is held on
|
tuple insertion) at a time when an exclusive lock is held on
|
||||||
affected data pages and the operation is supposed to be as fast as
|
affected data pages, so the operation needs to be as fast as
|
||||||
possible; what is worse, writing <acronym>WAL</acronym> buffers may
|
possible. What is worse, writing <acronym>WAL</acronym> buffers may
|
||||||
also cause the creation of a new log segment, which takes even more
|
also force the creation of a new log segment, which takes even more
|
||||||
time. Normally, <acronym>WAL</acronym> buffers should be written
|
time. Normally, <acronym>WAL</acronym> buffers should be written
|
||||||
and flushed by a <function>LogFlush</function> request, which is
|
and flushed by a <function>LogFlush</function> request, which is
|
||||||
made, for the most part, at transaction commit time to ensure that
|
made, for the most part, at transaction commit time to ensure that
|
||||||
@ -230,7 +246,7 @@
|
|||||||
one should increase the number of <acronym>WAL</acronym> buffers by
|
one should increase the number of <acronym>WAL</acronym> buffers by
|
||||||
modifying the <varname>WAL_BUFFERS</varname> parameter. The default
|
modifying the <varname>WAL_BUFFERS</varname> parameter. The default
|
||||||
number of <acronym>WAL</acronym> buffers is 8. Increasing this
|
number of <acronym>WAL</acronym> buffers is 8. Increasing this
|
||||||
value will have an impact on shared memory usage.
|
value will correspondingly increase shared memory usage.
|
||||||
</para>
|
</para>
|
||||||
|
|
||||||
<para>
|
<para>
|
||||||
@ -243,34 +259,28 @@
|
|||||||
log (known as the redo record) it should start the REDO operation,
|
log (known as the redo record) it should start the REDO operation,
|
||||||
since any changes made to data files before that record are already
|
since any changes made to data files before that record are already
|
||||||
on disk. After a checkpoint has been made, any log segments written
|
on disk. After a checkpoint has been made, any log segments written
|
||||||
before the undo records are removed, so checkpoints are used to free
|
before the undo records are no longer needed and can be recycled or
|
||||||
disk space in the <acronym>WAL</acronym> directory. (When
|
removed. (When <acronym>WAL</acronym>-based <acronym>BAR</acronym> is
|
||||||
<acronym>WAL</acronym>-based <acronym>BAR</acronym> is implemented,
|
implemented, the log segments would be archived before being recycled
|
||||||
the log segments can be archived instead of just being removed.)
|
or removed.)
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<para>
|
||||||
The checkpoint maker is also able to create a few log segments for
|
The checkpoint maker is also able to create a few log segments for
|
||||||
future use, so as to avoid the need for
|
future use, so as to avoid the need for
|
||||||
<function>LogInsert</function> or <function>LogFlush</function> to
|
<function>LogInsert</function> or <function>LogFlush</function> to
|
||||||
spend time in creating them.
|
spend time in creating them. (If that happens, the entire database
|
||||||
</para>
|
system will be delayed by the creation operation, so it's better if
|
||||||
|
the files can be created in the checkpoint maker, which is not on
|
||||||
<para>
|
anyone's critical path.)
|
||||||
The <acronym>WAL</acronym> log is held on the disk as a set of 16
|
By default a new 16MB segment file is created only if more than 75% of
|
||||||
MB files called <firstterm>segments</firstterm>. By default a new
|
the current segment has been used. This is inadequate if the system
|
||||||
segment is created only if more than 75% of the current segment is
|
generates more than 4MB of log output between checkpoints.
|
||||||
used. One can instruct the server to pre-create up to 64 log segments
|
One can instruct the server to pre-create up to 64 log segments
|
||||||
at checkpoint time by modifying the <varname>WAL_FILES</varname>
|
at checkpoint time by modifying the <varname>WAL_FILES</varname>
|
||||||
configuration parameter.
|
configuration parameter.
|
||||||
</para>
|
</para>
|
||||||
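The 4MB figure in the new text follows from the 75% rule: if no new segment is created until more than 75% of a 16MB segment is used, the only space guaranteed to remain at checkpoint time is the last 25%. The sketch below reproduces that arithmetic. It assumes the default behavior described above with no extra pre-created segments, and its function names are hypothetical.

```python
# Illustrative only: names here are hypothetical, not PostgreSQL identifiers.
SEGMENT_SIZE_MB = 16          # segment size stated in the text
PRECREATE_THRESHOLD = 0.75    # a new segment is created only past this usage


def guaranteed_headroom_mb() -> float:
    """Space guaranteed to remain in the current segment when no new one is created."""
    return SEGMENT_SIZE_MB * (1 - PRECREATE_THRESHOLD)


def default_is_adequate(log_mb_between_checkpoints: float) -> bool:
    """True if the log volume between checkpoints fits in the guaranteed headroom."""
    return log_mb_between_checkpoints <= guaranteed_headroom_mb()


if __name__ == "__main__":
    print(f"guaranteed headroom: {guaranteed_headroom_mb()}MB")
    for volume in (3, 10):
        print(f"{volume}MB between checkpoints, default adequate: {default_is_adequate(volume)}")
```

When the check fails, the text suggests raising WAL_FILES so that segments are pre-created at checkpoint time rather than inside LogInsert or LogFlush.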
|
|
||||||
<para>
|
|
||||||
For faster after-crash recovery, it would be better to create
|
|
||||||
checkpoints more often. However, one should balance this against
|
|
||||||
the cost of flushing dirty data pages; in addition, to ensure data
|
|
||||||
page consistency, the first modification of a data page after each
|
|
||||||
checkpoint results in logging the entire page content, thus
|
|
||||||
increasing output to log and the log's size.
|
|
||||||
</para>
|
|
||||||
|
|
||||||
<para>
|
<para>
|
||||||
The postmaster spawns a special backend process every so often
|
The postmaster spawns a special backend process every so often
|
||||||
to create the next checkpoint. A checkpoint is created every
|
to create the next checkpoint. A checkpoint is created every
|
||||||
@ -281,6 +291,35 @@
|
|||||||
<command>CHECKPOINT</command>.
|
<command>CHECKPOINT</command>.
|
||||||
</para>
|
</para>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
Reducing <varname>CHECKPOINT_SEGMENTS</varname> and/or
|
||||||
|
<varname>CHECKPOINT_TIMEOUT</varname> causes checkpoints to be
|
||||||
|
done more often. This allows faster after-crash recovery (since
|
||||||
|
less work will need to be redone). However, one must balance this against
|
||||||
|
the increased cost of flushing dirty data pages more often. In addition,
|
||||||
|
to ensure data page consistency, the first modification of a data page
|
||||||
|
after each checkpoint results in logging the entire page content.
|
||||||
|
Thus a smaller checkpoint interval increases the volume of output to
|
||||||
|
the log, partially negating the goal of using a smaller interval, and
|
||||||
|
in any case causing more disk I/O.
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
The number of 16MB segment files will always be at least
|
||||||
|
<varname>WAL_FILES</varname> + 1, and will normally not exceed
|
||||||
|
<varname>WAL_FILES</varname> + 2 * <varname>CHECKPOINT_SEGMENTS</varname>
|
||||||
|
+ 1. This may be used to estimate space requirements for WAL. Ordinarily,
|
||||||
|
when an old log segment file is no longer needed, it is recycled (renamed
|
||||||
|
to become the next sequential future segment). If, due to a short-term
|
||||||
|
peak of log output rate, there are more than <varname>WAL_FILES</varname> +
|
||||||
|
2 * <varname>CHECKPOINT_SEGMENTS</varname> + 1 segment files, then unneeded
|
||||||
|
segment files will be deleted instead of recycled until the system gets
|
||||||
|
back under this limit. (If this happens on a regular basis,
|
||||||
|
<varname>WAL_FILES</varname> should be increased to avoid it. Deleting log
|
||||||
|
segments that will only have to be created again later is expensive and
|
||||||
|
pointless.)
|
||||||
|
</para>
|
||||||
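The bounds given in the added paragraph can be turned into a rough disk space estimate. The sketch below computes the minimum and the normal maximum number of segment files from WAL_FILES and CHECKPOINT_SEGMENTS, then multiplies by the 16MB segment size. The function names are hypothetical, and the parameter values in the usage lines are illustrative inputs only, not defaults taken from this commit.

```python
# Illustrative only: names here are hypothetical, not PostgreSQL identifiers.
SEGMENT_SIZE_MB = 16  # segment size stated in the text


def wal_segment_bounds(wal_files: int, checkpoint_segments: int) -> tuple[int, int]:
    """Return (minimum, normal maximum) number of segment files."""
    if wal_files < 0 or checkpoint_segments < 1:
        raise ValueError("wal_files must be >= 0 and checkpoint_segments >= 1")
    minimum = wal_files + 1
    normal_maximum = wal_files + 2 * checkpoint_segments + 1
    return minimum, normal_maximum


def wal_space_mb(wal_files: int, checkpoint_segments: int) -> tuple[int, int]:
    """Return (minimum, normal maximum) disk space for WAL segments, in MB."""
    low, high = wal_segment_bounds(wal_files, checkpoint_segments)
    return low * SEGMENT_SIZE_MB, high * SEGMENT_SIZE_MB


if __name__ == "__main__":
    # Illustrative settings: WAL_FILES = 0, CHECKPOINT_SEGMENTS = 3.
    print(wal_segment_bounds(0, 3))
    print(wal_space_mb(0, 3))
```

The upper figure is the point past which unneeded segments are deleted instead of recycled, so a short-term peak in log output can exceed it temporarily.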
|
|
||||||
<para>
|
<para>
|
||||||
The <varname>COMMIT_DELAY</varname> parameter defines for how many
|
The <varname>COMMIT_DELAY</varname> parameter defines for how many
|
||||||
microseconds the backend will sleep after writing a commit
|
microseconds the backend will sleep after writing a commit
|
||||||
@ -294,6 +333,8 @@
|
|||||||
Note that on most platforms, the resolution of a sleep request is
|
Note that on most platforms, the resolution of a sleep request is
|
||||||
ten milliseconds, so that any nonzero <varname>COMMIT_DELAY</varname>
|
ten milliseconds, so that any nonzero <varname>COMMIT_DELAY</varname>
|
||||||
setting between 1 and 10000 microseconds will have the same effect.
|
setting between 1 and 10000 microseconds will have the same effect.
|
||||||
|
Good values for these parameters are not yet clear; experimentation
|
||||||
|
is encouraged.
|
||||||
</para>
|
</para>
|
||||||
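The note about sleep resolution means COMMIT_DELAY cannot be tuned in fine steps on most platforms. A minimal sketch of that effect follows, assuming a 10 millisecond tick and assuming that a sleep request is rounded up to the next whole tick. The rounding rule for values above 10000 microseconds is an assumption of this sketch, and the function name is hypothetical. The text itself only states that settings from 1 to 10000 microseconds behave alike.

```python
# Illustrative only: names here are hypothetical, not PostgreSQL identifiers.
import math

TICK_US = 10_000  # ten-millisecond sleep resolution stated in the text


def effective_commit_delay_us(commit_delay_us: int, tick_us: int = TICK_US) -> int:
    """Return the sleep actually observed for a requested COMMIT_DELAY.

    Assumption: the platform rounds a nonzero sleep request up to a
    whole number of ticks; zero means no sleep at all.
    """
    if commit_delay_us < 0:
        raise ValueError("commit_delay_us must not be negative")
    if commit_delay_us == 0:
        return 0
    return math.ceil(commit_delay_us / tick_us) * tick_us


if __name__ == "__main__":
    for requested in (0, 1, 5_000, 10_000):
        print(requested, "->", effective_commit_delay_us(requested))
```

Under these assumptions every setting from 1 to 10000 maps to the same 10 millisecond sleep, which matches the statement in the text.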
|
|
||||||
<para>
|
<para>
|
||||||
|