mirror of
https://github.com/postgres/postgres.git
synced 2025-06-13 07:41:39 +03:00
Optionally prefetch referenced data in recovery.

Introduce a new GUC recovery_prefetch, disabled by default. When enabled, look ahead in the WAL and try to initiate asynchronous reading of referenced data blocks that are not yet cached in our buffer pool. For now, this is done with posix_fadvise(), which has several caveats. Better mechanisms will follow in later work on the I/O subsystem.

The GUC maintenance_io_concurrency is used to limit the number of concurrent I/Os we allow ourselves to initiate, based on pessimistic heuristics used to infer that I/Os have begun and completed.

The GUC wal_decode_buffer_size is used to limit the maximum distance we are prepared to read ahead in the WAL to find uncached blocks.

Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> (parts)
Reviewed-by: Andres Freund <andres@anarazel.de> (parts)
Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com> (parts)
Tested-by: Tomas Vondra <tomas.vondra@2ndquadrant.com>
Tested-by: Jakub Wartak <Jakub.Wartak@tomtom.com>
Tested-by: Dmitry Dolgov <9erthalion6@gmail.com>
Tested-by: Sait Talha Nisanci <Sait.Nisanci@microsoft.com>
Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com
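
As an illustration of the mechanism the commit message describes, the following is a minimal Python sketch, not PostgreSQL's implementation: it issues posix_fadvise() WILLNEED hints for referenced blocks that are not in a pretend cache, and stops after a fixed number of hints. The function name `hint_uncached_blocks`, the `cached_blocks` set, and the `max_hints` cap (a crude stand-in for the limit imposed by maintenance_io_concurrency) are assumptions made for this example; the 8192-byte block size is PostgreSQL's default.

```python
import os
import tempfile

BLOCK_SIZE = 8192  # PostgreSQL's default block size


def hint_uncached_blocks(fd, referenced_blocks, cached_blocks, max_hints):
    """Issue WILLNEED hints for referenced blocks that are not cached.

    Returns the block numbers for which a hint was issued. All names here
    are hypothetical; this is a sketch, not PostgreSQL's code.
    """
    hinted = []
    for block in referenced_blocks:
        if len(hinted) >= max_hints:
            break  # crude stand-in for a concurrency limit
        if block in cached_blocks or block in hinted:
            continue  # already cached, or already hinted in this pass
        if hasattr(os, "posix_fadvise"):
            # Advisory only: the kernel may start reading this range early.
            os.posix_fadvise(fd, block * BLOCK_SIZE, BLOCK_SIZE,
                             os.POSIX_FADV_WILLNEED)
        hinted.append(block)
    return hinted


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        f.write(b"\0" * (4 * BLOCK_SIZE))  # a four-block scratch file
        f.flush()
        # Block 1 is treated as cached; at most two hints are allowed.
        print(hint_uncached_blocks(f.fileno(), [0, 1, 2, 3], {1}, max_hints=2))
```

Because the hint is only advisory, a caller like this cannot tell when a read has actually started or finished, which is why the commit message speaks of pessimistic heuristics for inferring that I/Os have begun and completed.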
@@ -3565,6 +3565,89 @@ include_dir 'conf.d'
</variablelist>
</sect2>

<sect2 id="runtime-config-wal-recovery">

<title>Recovery</title>

<indexterm>
<primary>configuration</primary>
<secondary>of recovery</secondary>
<tertiary>general settings</tertiary>
</indexterm>

<para>
This section describes the settings that apply to recovery in general,
affecting crash recovery, streaming replication and archive-based
replication.
</para>


<variablelist>
<varlistentry id="guc-recovery-prefetch" xreflabel="recovery_prefetch">
<term><varname>recovery_prefetch</varname> (<type>boolean</type>)
<indexterm>
<primary><varname>recovery_prefetch</varname> configuration parameter</primary>
</indexterm>
</term>
<listitem>
<para>
Whether to try to prefetch blocks that are referenced in the WAL that
are not yet in the buffer pool, during recovery. Prefetching blocks
that will soon be needed can reduce I/O wait times in some workloads.
See also the <xref linkend="guc-wal-decode-buffer-size"/> and
<xref linkend="guc-maintenance-io-concurrency"/> settings, which limit
prefetching activity.
This setting is disabled by default.
</para>
<para>
This feature currently depends on an effective
<function>posix_fadvise</function> function, which some
operating systems lack.
</para>
</listitem>
</varlistentry>

<varlistentry id="guc-recovery-prefetch-fpw" xreflabel="recovery_prefetch_fpw">
<term><varname>recovery_prefetch_fpw</varname> (<type>boolean</type>)
<indexterm>
<primary><varname>recovery_prefetch_fpw</varname> configuration parameter</primary>
</indexterm>
</term>
<listitem>
<para>
Whether to prefetch blocks that were logged with full page images,
during recovery. Often this doesn't help, since such blocks will not
be read the first time they are needed and might remain in the buffer
pool after that. However, on file systems with a block size larger
than
<productname>PostgreSQL</productname>'s, prefetching can avoid a
costly read-before-write when blocks are later written.
The default is off.
</para>
</listitem>
</varlistentry>

<varlistentry id="guc-wal-decode-buffer-size" xreflabel="wal_decode_buffer_size">
<term><varname>wal_decode_buffer_size</varname> (<type>integer</type>)
<indexterm>
<primary><varname>wal_decode_buffer_size</varname> configuration parameter</primary>
</indexterm>
</term>
<listitem>
<para>
A limit on how far ahead the server can look in the WAL, to find
blocks to prefetch. Setting it too high might be counterproductive,
if it means that data falls out of the
kernel cache before it is needed. If this value is specified without
units, it is taken as bytes.
The default is 512kB.
</para>
</listitem>
</varlistentry>

</variablelist>
</sect2>

<sect2 id="runtime-config-wal-archive-recovery">

<title>Archive Recovery</title>
@@ -337,6 +337,13 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
</entry>
</row>

<row>
<entry><structname>pg_stat_prefetch_recovery</structname><indexterm><primary>pg_stat_prefetch_recovery</primary></indexterm></entry>
<entry>Only one row, showing statistics about blocks prefetched during recovery.
See <xref linkend="pg-stat-prefetch-recovery-view"/> for details.
</entry>
</row>

<row>
<entry><structname>pg_stat_subscription</structname><indexterm><primary>pg_stat_subscription</primary></indexterm></entry>
<entry>At least one row per subscription, showing information about
@@ -2917,6 +2924,78 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
copy of the subscribed tables.
</para>

<table id="pg-stat-prefetch-recovery-view" xreflabel="pg_stat_prefetch_recovery">
<title><structname>pg_stat_prefetch_recovery</structname> View</title>
<tgroup cols="3">
<thead>
<row>
<entry>Column</entry>
<entry>Type</entry>
<entry>Description</entry>
</row>
</thead>

<tbody>
<row>
<entry><structfield>prefetch</structfield></entry>
<entry><type>bigint</type></entry>
<entry>Number of blocks prefetched because they were not in the buffer pool</entry>
</row>
<row>
<entry><structfield>skip_hit</structfield></entry>
<entry><type>bigint</type></entry>
<entry>Number of blocks not prefetched because they were already in the buffer pool</entry>
</row>
<row>
<entry><structfield>skip_new</structfield></entry>
<entry><type>bigint</type></entry>
<entry>Number of blocks not prefetched because they were new (usually relation extension)</entry>
</row>
<row>
<entry><structfield>skip_fpw</structfield></entry>
<entry><type>bigint</type></entry>
<entry>Number of blocks not prefetched because a full page image was included in the WAL and <xref linkend="guc-recovery-prefetch-fpw"/> was set to <literal>off</literal></entry>
</row>
<row>
<entry><structfield>skip_seq</structfield></entry>
<entry><type>bigint</type></entry>
<entry>Number of blocks not prefetched because of repeated access</entry>
</row>
<row>
<entry><structfield>distance</structfield></entry>
<entry><type>integer</type></entry>
<entry>How far ahead of recovery the prefetcher is currently reading, in bytes</entry>
</row>
<row>
<entry><structfield>queue_depth</structfield></entry>
<entry><type>integer</type></entry>
<entry>How many prefetches have been initiated but are not yet known to have completed</entry>
</row>
<row>
<entry><structfield>avg_distance</structfield></entry>
<entry><type>float4</type></entry>
<entry>How far ahead of recovery the prefetcher is on average, while recovery is not idle</entry>
</row>
<row>
<entry><structfield>avg_queue_depth</structfield></entry>
<entry><type>float4</type></entry>
<entry>Average number of prefetches in flight while recovery is not idle</entry>
</row>
</tbody>
</tgroup>
</table>

<para>
The <structname>pg_stat_prefetch_recovery</structname> view will contain only
one row. It is filled with nulls if recovery is not running or WAL
prefetching is not enabled. See <xref linkend="guc-recovery-prefetch"/>
for more information. The counters in this view are reset whenever the
<xref linkend="guc-recovery-prefetch"/>,
<xref linkend="guc-recovery-prefetch-fpw"/> or
<xref linkend="guc-maintenance-io-concurrency"/> setting is changed and
the server configuration is reloaded.
</para>

<table id="pg-stat-subscription" xreflabel="pg_stat_subscription">
<title><structname>pg_stat_subscription</structname> View</title>
<tgroup cols="1">
@@ -5049,8 +5128,11 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
all the counters shown in
the <structname>pg_stat_bgwriter</structname>
view, <literal>archiver</literal> to reset all the counters shown in
the <structname>pg_stat_archiver</structname> view or <literal>wal</literal>
to reset all the counters shown in the <structname>pg_stat_wal</structname> view.
the <structname>pg_stat_archiver</structname> view,
<literal>wal</literal> to reset all the counters shown in the
<structname>pg_stat_wal</structname> view or
<literal>prefetch_recovery</literal> to reset all the counters shown
in the <structname>pg_stat_prefetch_recovery</structname> view.
</para>
<para>
This function is restricted to superusers by default, but other users
@@ -803,6 +803,23 @@
counted as <literal>wal_write</literal> and <literal>wal_sync</literal>
in <structname>pg_stat_wal</structname>, respectively.
</para>

<para>
The <xref linkend="guc-recovery-prefetch"/> parameter can
be used to improve I/O performance during recovery by instructing
<productname>PostgreSQL</productname> to initiate reads
of disk blocks that will soon be needed but are not currently in
<productname>PostgreSQL</productname>'s buffer pool.
The <xref linkend="guc-maintenance-io-concurrency"/> and
<xref linkend="guc-wal-decode-buffer-size"/> settings limit prefetching
concurrency and distance, respectively. The
prefetching mechanism is most likely to be effective on systems
with <varname>full_page_writes</varname> set to
<varname>off</varname> (where that is safe), and where the working
set is larger than RAM. By default, prefetching in recovery is disabled.
</para>
</sect1>

<sect1 id="wal-internals">