<!-- doc/src/sgml/high-availability.sgml -->

<chapter id="high-availability">
 <title>High Availability, Load Balancing, and Replication</title>

 <indexterm><primary>high availability</></>
 <indexterm><primary>failover</></>
 <indexterm><primary>replication</></>
 <indexterm><primary>load balancing</></>
 <indexterm><primary>clustering</></>
 <indexterm><primary>data partitioning</></>

 <para>
  Database servers can work together to allow a second server to
  take over quickly if the primary server fails (high
  availability), or to allow several computers to serve the same
  data (load balancing). Ideally, database servers could work
  together seamlessly. Web servers serving static web pages can
  be combined quite easily by merely load-balancing web requests
  to multiple machines. In fact, read-only database servers can
  be combined relatively easily too. Unfortunately, most database
  servers have a read/write mix of requests, and read/write servers
  are much harder to combine. This is because though read-only
  data needs to be placed on each server only once, a write to any
  server has to be propagated to all servers so that future read
  requests to those servers return consistent results.
 </para>

 <para>
  This synchronization problem is the fundamental difficulty for
  servers working together. Because there is no single solution
  that eliminates the impact of the sync problem for all use cases,
  there are multiple solutions. Each solution addresses this
  problem in a different way, and minimizes its impact for a specific
  workload.
 </para>

 <para>
  Some solutions deal with synchronization by allowing only one
  server to modify the data. Servers that can modify data are
  called read/write, <firstterm>master</> or <firstterm>primary</> servers.
  Servers that track changes in the master are called <firstterm>standby</>
  or <firstterm>slave</> servers. A standby server that cannot be connected
  to until it is promoted to a master server is called a <firstterm>warm
  standby</> server, and one that can accept connections and serves read-only
  queries is called a <firstterm>hot standby</> server.
 </para>

 <para>
  Some solutions are synchronous,
  meaning that a data-modifying transaction is not considered
  committed until all servers have committed the transaction. This
  guarantees that a failover will not lose any data and that all
  load-balanced servers will return consistent results no matter
  which server is queried. In contrast, asynchronous solutions allow some
  delay between the time of a commit and its propagation to the other servers,
  opening the possibility that some transactions might be lost in
  the switch to a backup server, and that load-balanced servers
  might return slightly stale results. Asynchronous communication
  is used when synchronous would be too slow.
 </para>

 <para>
  Solutions can also be categorized by their granularity. Some solutions
  can deal only with an entire database server, while others allow control
  at the per-table or per-database level.
 </para>

 <para>
  Performance must be considered in any choice. There is usually a
  trade-off between functionality and
  performance. For example, a fully synchronous solution over a slow
  network might cut performance by more than half, while an asynchronous
  one might have a minimal performance impact.
 </para>

 <para>
  The remainder of this section outlines various failover, replication,
  and load balancing solutions.
 </para>

<sect1 id="different-replication-solutions">
|
|
<title>Comparison of Different Solutions</title>
|
|
|
|
<variablelist>
|
|
|
|
<varlistentry>
|
|
<term>Shared Disk Failover</term>
|
|
<listitem>
|
|
|
|
<para>
|
|
Shared disk failover avoids synchronization overhead by having only one
|
|
copy of the database. It uses a single disk array that is shared by
|
|
multiple servers. If the main database server fails, the standby server
|
|
is able to mount and start the database as though it were recovering from
|
|
a database crash. This allows rapid failover with no data loss.
|
|
</para>
|
|
|
|
<para>
|
|
Shared hardware functionality is common in network storage devices.
|
|
Using a network file system is also possible, though care must be
|
|
taken that the file system has full <acronym>POSIX</> behavior (see <xref
|
|
linkend="creating-cluster-nfs">). One significant limitation of this
|
|
method is that if the shared disk array fails or becomes corrupt, the
|
|
primary and standby servers are both nonfunctional. Another issue is
|
|
that the standby server should never access the shared storage while
|
|
the primary server is running.
|
|
</para>
|
|
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>File System (Block-Device) Replication</term>
|
|
<listitem>
|
|
|
|
<para>
|
|
A modified version of shared hardware functionality is file system
|
|
replication, where all changes to a file system are mirrored to a file
|
|
system residing on another computer. The only restriction is that
|
|
the mirroring must be done in a way that ensures the standby server
|
|
has a consistent copy of the file system — specifically, writes
|
|
to the standby must be done in the same order as those on the master.
|
|
<productname>DRBD</> is a popular file system replication solution
|
|
for Linux.
|
|
</para>
|
|
|
|
<!--
|
|
https://forge.continuent.org/pipermail/sequoia/2006-November/004070.html
|
|
|
|
Oracle RAC is a shared disk approach and just send cache invalidations
|
|
to other nodes but not actual data. As the disk is shared, data is
|
|
only committed once to disk and there is a distributed locking
|
|
protocol to make nodes agree on a serializable transactional order.
|
|
-->
|
|
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>Transaction Log Shipping</term>
|
|
<listitem>
|
|
|
|
<para>
|
|
Warm and hot standby servers can be kept current by reading a
|
|
stream of write-ahead log (<acronym>WAL</>)
|
|
records. If the main server fails, the standby contains
|
|
almost all of the data of the main server, and can be quickly
|
|
made the new master database server. This can be synchronous or
|
|
asynchronous and can only be done for the entire database server.
|
|
</para>
|
|
<para>
|
|
A standby server can be implemented using file-based log shipping
|
|
(<xref linkend="warm-standby">) or streaming replication (see
|
|
<xref linkend="streaming-replication">), or a combination of both. For
|
|
information on hot standby, see <xref linkend="hot-standby">.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>Trigger-Based Master-Standby Replication</term>
|
|
<listitem>
|
|
|
|
<para>
|
|
A master-standby replication setup sends all data modification
|
|
queries to the master server. The master server asynchronously
|
|
sends data changes to the standby server. The standby can answer
|
|
read-only queries while the master server is running. The
|
|
standby server is ideal for data warehouse queries.
|
|
</para>
|
|
|
|
<para>
|
|
<productname>Slony-I</> is an example of this type of replication, with per-table
|
|
granularity, and support for multiple standby servers. Because it
|
|
updates the standby server asynchronously (in batches), there is
|
|
possible data loss during fail over.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>Statement-Based Replication Middleware</term>
|
|
<listitem>
|
|
|
|
<para>
|
|
With statement-based replication middleware, a program intercepts
|
|
every SQL query and sends it to one or all servers. Each server
|
|
operates independently. Read-write queries must be sent to all servers,
|
|
so that every server receives any changes. But read-only queries can be
|
|
sent to just one server, allowing the read workload to be distributed
|
|
among them.
|
|
</para>
|
|
|
|
<para>
|
|
If queries are simply broadcast unmodified, functions like
|
|
<function>random()</>, <function>CURRENT_TIMESTAMP</>, and
|
|
sequences can have different values on different servers.
|
|
This is because each server operates independently, and because
|
|
SQL queries are broadcast (and not actual modified rows). If
|
|
this is unacceptable, either the middleware or the application
|
|
must query such values from a single server and then use those
|
|
values in write queries. Another option is to use this replication
|
|
option with a traditional master-standby setup, i.e., data modification
|
|
queries are sent only to the master and are propagated to the
|
|
standby servers via master-standby replication, not by the replication
|
|
middleware. Care must also be taken that all
|
|
transactions either commit or abort on all servers, perhaps
|
|
using two-phase commit (<xref linkend="sql-prepare-transaction">
|
|
and <xref linkend="sql-commit-prepared">).
|
|
<productname>Pgpool-II</> and <productname>Continuent Tungsten</>
|
|
are examples of this type of replication.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>Asynchronous Multimaster Replication</term>
|
|
<listitem>
|
|
|
|
<para>
|
|
For servers that are not regularly connected or have slow
|
|
communication links, like laptops or
|
|
remote servers, keeping data consistent among servers is a
|
|
challenge. Using asynchronous multimaster replication, each
|
|
server works independently, and periodically communicates with
|
|
the other servers to identify conflicting transactions. The
|
|
conflicts can be resolved by users or conflict resolution rules.
|
|
Bucardo is an example of this type of replication.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>Synchronous Multimaster Replication</term>
|
|
<listitem>
|
|
|
|
<para>
|
|
In synchronous multimaster replication, each server can accept
|
|
write requests, and modified data is transmitted from the
|
|
original server to every other server before each transaction
|
|
commits. Heavy write activity can cause excessive locking and
|
|
commit delays, leading to poor performance. Read requests can
|
|
be sent to any server. Some implementations use shared disk
|
|
to reduce the communication overhead. Synchronous multimaster
|
|
replication is best for mostly read workloads, though its big
|
|
advantage is that any server can accept write requests —
|
|
there is no need to partition workloads between master and
|
|
standby servers, and because the data changes are sent from one
|
|
server to another, there is no problem with non-deterministic
|
|
functions like <function>random()</>.
|
|
</para>
|
|
|
|
<para>
|
|
<productname>PostgreSQL</> does not offer this type of replication,
|
|
though <productname>PostgreSQL</> two-phase commit (<xref
|
|
linkend="sql-prepare-transaction"> and <xref
|
|
linkend="sql-commit-prepared">)
|
|
can be used to implement this in application code or middleware.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>Commercial Solutions</term>
|
|
<listitem>
|
|
|
|
<para>
|
|
Because <productname>PostgreSQL</> is open source and easily
|
|
extended, a number of companies have taken <productname>PostgreSQL</>
|
|
and created commercial closed-source solutions with unique
|
|
failover, replication, and load balancing capabilities.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
</variablelist>
|
|
|
|
  <para>
   <xref linkend="high-availability-matrix"> summarizes
   the capabilities of the various solutions listed above.
  </para>

  <table id="high-availability-matrix">
   <title>High Availability, Load Balancing, and Replication Feature Matrix</title>
   <tgroup cols="8">
    <thead>
     <row>
      <entry>Feature</entry>
      <entry>Shared Disk Failover</entry>
      <entry>File System Replication</entry>
      <entry>Transaction Log Shipping</entry>
      <entry>Trigger-Based Master-Standby Replication</entry>
      <entry>Statement-Based Replication Middleware</entry>
      <entry>Asynchronous Multimaster Replication</entry>
      <entry>Synchronous Multimaster Replication</entry>
     </row>
    </thead>

    <tbody>

     <row>
      <entry>Most Common Implementation</entry>
      <entry align="center">NAS</entry>
      <entry align="center">DRBD</entry>
      <entry align="center">Streaming Repl.</entry>
      <entry align="center">Slony</entry>
      <entry align="center">pgpool-II</entry>
      <entry align="center">Bucardo</entry>
      <entry align="center"></entry>
     </row>

     <row>
      <entry>Communication Method</entry>
      <entry align="center">shared disk</entry>
      <entry align="center">disk blocks</entry>
      <entry align="center">WAL</entry>
      <entry align="center">table rows</entry>
      <entry align="center">SQL</entry>
      <entry align="center">table rows</entry>
      <entry align="center">table rows and row locks</entry>
     </row>

     <row>
      <entry>No special hardware required</entry>
      <entry align="center"></entry>
      <entry align="center">•</entry>
      <entry align="center">•</entry>
      <entry align="center">•</entry>
      <entry align="center">•</entry>
      <entry align="center">•</entry>
      <entry align="center">•</entry>
     </row>

     <row>
      <entry>Allows multiple master servers</entry>
      <entry align="center"></entry>
      <entry align="center"></entry>
      <entry align="center"></entry>
      <entry align="center"></entry>
      <entry align="center">•</entry>
      <entry align="center">•</entry>
      <entry align="center">•</entry>
     </row>

     <row>
      <entry>No master server overhead</entry>
      <entry align="center">•</entry>
      <entry align="center"></entry>
      <entry align="center">•</entry>
      <entry align="center"></entry>
      <entry align="center">•</entry>
      <entry align="center"></entry>
      <entry align="center"></entry>
     </row>

     <row>
      <entry>No waiting for multiple servers</entry>
      <entry align="center">•</entry>
      <entry align="center"></entry>
      <entry align="center">with sync off</entry>
      <entry align="center">•</entry>
      <entry align="center"></entry>
      <entry align="center">•</entry>
      <entry align="center"></entry>
     </row>

     <row>
      <entry>Master failure will never lose data</entry>
      <entry align="center">•</entry>
      <entry align="center">•</entry>
      <entry align="center">with sync on</entry>
      <entry align="center"></entry>
      <entry align="center">•</entry>
      <entry align="center"></entry>
      <entry align="center">•</entry>
     </row>

     <row>
      <entry>Standbys accept read-only queries</entry>
      <entry align="center"></entry>
      <entry align="center"></entry>
      <entry align="center">with hot</entry>
      <entry align="center">•</entry>
      <entry align="center">•</entry>
      <entry align="center">•</entry>
      <entry align="center">•</entry>
     </row>

     <row>
      <entry>Per-table granularity</entry>
      <entry align="center"></entry>
      <entry align="center"></entry>
      <entry align="center"></entry>
      <entry align="center">•</entry>
      <entry align="center"></entry>
      <entry align="center">•</entry>
      <entry align="center">•</entry>
     </row>

     <row>
      <entry>No conflict resolution necessary</entry>
      <entry align="center">•</entry>
      <entry align="center">•</entry>
      <entry align="center">•</entry>
      <entry align="center">•</entry>
      <entry align="center">•</entry>
      <entry align="center"></entry>
      <entry align="center">•</entry>
     </row>

    </tbody>
   </tgroup>
  </table>

  <para>
   There are a few solutions that do not fit into the above categories:
  </para>

  <variablelist>

   <varlistentry>
    <term>Data Partitioning</term>
    <listitem>

     <para>
      Data partitioning splits tables into data sets. Each set can
      be modified by only one server. For example, data can be
      partitioned by offices, e.g., London and Paris, with a server
      in each office. If queries combining London and Paris data
      are necessary, an application can query both servers, or
      master/standby replication can be used to keep a read-only copy
      of the other office's data on each server.
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term>Multiple-Server Parallel Query Execution</term>
    <listitem>

     <para>
      Many of the above solutions allow multiple servers to handle multiple
      queries, but none allow a single query to use multiple servers to
      complete faster. This solution allows multiple servers to work
      concurrently on a single query. It is usually accomplished by
      splitting the data among servers and having each server execute its
      part of the query and return results to a central server where they
      are combined and returned to the user. This can be implemented using the
      <productname>PL/Proxy</productname> tool set.
     </para>

    </listitem>
   </varlistentry>

  </variablelist>

 </sect1>

<sect1 id="warm-standby">
|
|
<title>Log-Shipping Standby Servers</title>
|
|
|
|
|
|
<para>
|
|
Continuous archiving can be used to create a <firstterm>high
|
|
availability</> (HA) cluster configuration with one or more
|
|
<firstterm>standby servers</> ready to take over operations if the
|
|
primary server fails. This capability is widely referred to as
|
|
<firstterm>warm standby</> or <firstterm>log shipping</>.
|
|
</para>
|
|
|
|
<para>
|
|
The primary and standby server work together to provide this capability,
|
|
though the servers are only loosely coupled. The primary server operates
|
|
in continuous archiving mode, while each standby server operates in
|
|
continuous recovery mode, reading the WAL files from the primary. No
|
|
changes to the database tables are required to enable this capability,
|
|
so it offers low administration overhead compared to some other
|
|
replication solutions. This configuration also has relatively low
|
|
performance impact on the primary server.
|
|
</para>
|
|
|
|
<para>
|
|
Directly moving WAL records from one database server to another
|
|
is typically described as log shipping. <productname>PostgreSQL</>
|
|
implements file-based log shipping by transferring WAL records
|
|
one file (WAL segment) at a time. WAL files (16MB) can be
|
|
shipped easily and cheaply over any distance, whether it be to an
|
|
adjacent system, another system at the same site, or another system on
|
|
the far side of the globe. The bandwidth required for this technique
|
|
varies according to the transaction rate of the primary server.
|
|
Record-based log shipping is more granular and streams WAL changes
|
|
incrementally over a network connection (see <xref
|
|
linkend="streaming-replication">).
|
|
</para>
|
|
|
|
  <para>
   It should be noted that log shipping is asynchronous, i.e., the WAL
   records are shipped after transaction commit. As a result, there is a
   window for data loss should the primary server suffer a catastrophic
   failure; transactions not yet shipped will be lost. The size of the
   data loss window in file-based log shipping can be limited by use of the
   <varname>archive_timeout</varname> parameter, which can be set as low
   as a few seconds. However, such a low setting will
   substantially increase the bandwidth required for file shipping.
   Streaming replication (see <xref linkend="streaming-replication">)
   allows a much smaller window of data loss.
  </para>
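
  <para>
   For example, to bound the file-based data loss window to roughly one
   minute, a sketch of the relevant <filename>postgresql.conf</> setting
   on the primary might be (the value here is illustrative only):
<programlisting>
archive_timeout = 60    # force a WAL segment switch every 60 seconds
</programlisting>
  </para>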

  <para>
   Recovery performance is sufficiently good that the standby will
   typically be only moments away from full
   availability once it has been activated. As a result, this is called
   a warm standby configuration which offers high
   availability. Restoring a server from an archived base backup and
   rollforward will take considerably longer, so that technique only
   offers a solution for disaster recovery, not high availability.
   A standby server can also be used for read-only queries, in which case
   it is called a Hot Standby server. See <xref linkend="hot-standby"> for
   more information.
  </para>

  <indexterm zone="high-availability">
   <primary>warm standby</primary>
  </indexterm>

  <indexterm zone="high-availability">
   <primary>PITR standby</primary>
  </indexterm>

  <indexterm zone="high-availability">
   <primary>standby server</primary>
  </indexterm>

  <indexterm zone="high-availability">
   <primary>log shipping</primary>
  </indexterm>

  <indexterm zone="high-availability">
   <primary>witness server</primary>
  </indexterm>

  <indexterm zone="high-availability">
   <primary>STONITH</primary>
  </indexterm>
<sect2 id="standby-planning">
|
|
<title>Planning</title>
|
|
|
|
<para>
|
|
It is usually wise to create the primary and standby servers
|
|
so that they are as similar as possible, at least from the
|
|
perspective of the database server. In particular, the path names
|
|
associated with tablespaces will be passed across unmodified, so both
|
|
primary and standby servers must have the same mount paths for
|
|
tablespaces if that feature is used. Keep in mind that if
|
|
<xref linkend="sql-createtablespace">
|
|
is executed on the primary, any new mount point needed for it must
|
|
be created on the primary and all standby servers before the command
|
|
is executed. Hardware need not be exactly the same, but experience shows
|
|
that maintaining two identical systems is easier than maintaining two
|
|
dissimilar ones over the lifetime of the application and system.
|
|
In any case the hardware architecture must be the same — shipping
|
|
from, say, a 32-bit to a 64-bit system will not work.
|
|
</para>
|
|
|
|
<para>
|
|
In general, log shipping between servers running different major
|
|
<productname>PostgreSQL</> release
|
|
levels is not possible. It is the policy of the PostgreSQL Global
|
|
Development Group not to make changes to disk formats during minor release
|
|
upgrades, so it is likely that running different minor release levels
|
|
on primary and standby servers will work successfully. However, no
|
|
formal support for that is offered and you are advised to keep primary
|
|
and standby servers at the same release level as much as possible.
|
|
When updating to a new minor release, the safest policy is to update
|
|
the standby servers first — a new minor release is more likely
|
|
to be able to read WAL files from a previous minor release than vice
|
|
versa.
|
|
</para>
|
|
|
|
</sect2>
|
|
|
|
<sect2 id="standby-server-operation">
|
|
<title>Standby Server Operation</title>
|
|
|
|
<para>
|
|
In standby mode, the server continuously applies WAL received from the
|
|
master server. The standby server can read WAL from a WAL archive
|
|
(see <xref linkend="restore-command">) or directly from the master
|
|
over a TCP connection (streaming replication). The standby server will
|
|
also attempt to restore any WAL found in the standby cluster's
|
|
<filename>pg_xlog</> directory. That typically happens after a server
|
|
restart, when the standby replays again WAL that was streamed from the
|
|
master before the restart, but you can also manually copy files to
|
|
<filename>pg_xlog</> at any time to have them replayed.
|
|
</para>
|
|
|
|
<para>
|
|
At startup, the standby begins by restoring all WAL available in the
|
|
archive location, calling <varname>restore_command</>. Once it
|
|
reaches the end of WAL available there and <varname>restore_command</>
|
|
fails, it tries to restore any WAL available in the <filename>pg_xlog</> directory.
|
|
If that fails, and streaming replication has been configured, the
|
|
standby tries to connect to the primary server and start streaming WAL
|
|
from the last valid record found in archive or <filename>pg_xlog</>. If that fails
|
|
or streaming replication is not configured, or if the connection is
|
|
later disconnected, the standby goes back to step 1 and tries to
|
|
restore the file from the archive again. This loop of retries from the
|
|
archive, <filename>pg_xlog</>, and via streaming replication goes on until the server
|
|
is stopped or failover is triggered by a trigger file.
|
|
</para>
|
|
|
|

   <para>
    Standby mode is exited and the server switches to normal operation
    when <command>pg_ctl promote</> is run or a trigger file is found
    (<varname>trigger_file</>). Before failover,
    any WAL immediately available in the archive or in <filename>pg_xlog</> will be
    restored, but no attempt is made to connect to the master.
   </para>
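
   <para>
    For example, assuming a hypothetical data directory of
    <filename>/usr/local/pgsql/data</>, a planned promotion might be
    triggered on the standby with:
<programlisting>
pg_ctl promote -D /usr/local/pgsql/data
</programlisting>
   </para>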

  </sect2>
<sect2 id="preparing-master-for-standby">
|
|
<title>Preparing the Master for Standby Servers</title>
|
|
|
|
<para>
|
|
Set up continuous archiving on the primary to an archive directory
|
|
accessible from the standby, as described
|
|
in <xref linkend="continuous-archiving">. The archive location should be
|
|
accessible from the standby even when the master is down, i.e., it should
|
|
reside on the standby server itself or another trusted server, not on
|
|
the master server.
|
|
</para>
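
   <para>
    As a minimal sketch, assuming the archive directory
    <filename>/mnt/server/archivedir</> is reachable from the primary,
    the relevant <filename>postgresql.conf</> settings might look like
    (on releases before 9.6, use <literal>hot_standby</> instead of
    <literal>replica</>):
<programlisting>
wal_level = replica
archive_mode = on
archive_command = 'test ! -f /mnt/server/archivedir/%f &amp;&amp; cp %p /mnt/server/archivedir/%f'
</programlisting>
   </para>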

   <para>
    If you want to use streaming replication, set up authentication on the
    primary server to allow replication connections from the standby
    server(s); that is, create a role and provide a suitable entry or
    entries in <filename>pg_hba.conf</> with the database field set to
    <literal>replication</>. Also ensure <varname>max_wal_senders</> is set
    to a sufficiently large value in the configuration file of the primary
    server. If replication slots will be used,
    ensure that <varname>max_replication_slots</varname> is set sufficiently
    high as well.
   </para>
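
   <para>
    For example, a dedicated replication role (the name and password here
    are placeholders) can be created with:
<programlisting>
CREATE ROLE foo WITH REPLICATION LOGIN PASSWORD 'foopass';
</programlisting>
   </para>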

   <para>
    Take a base backup as described in <xref linkend="backup-base-backup">
    to bootstrap the standby server.
   </para>
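
   <para>
    For example, a base backup might be taken directly into the standby's
    (empty) data directory with something like the following, run on the
    standby host; the host, user, and data directory path here are the
    placeholder values used elsewhere in this chapter:
<programlisting>
pg_basebackup -h 192.168.1.50 -p 5432 -U foo -D /usr/local/pgsql/data -X stream -P
</programlisting>
   </para>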

  </sect2>

  <sect2 id="standby-server-setup">
   <title>Setting Up a Standby Server</title>

   <para>
    To set up the standby server, restore the base backup taken from the primary
    server (see <xref linkend="backup-pitr-recovery">). Create a recovery
    command file <filename>recovery.conf</> in the standby's cluster data
    directory, and turn on <varname>standby_mode</>. Set
    <varname>restore_command</> to a simple command to copy files from
    the WAL archive. If you plan to have multiple standby servers for high
    availability purposes, set <varname>recovery_target_timeline</> to
    <literal>latest</>, to make the standby server follow the timeline change
    that occurs at failover to another standby.
   </para>

   <note>
    <para>
     Do not use pg_standby or similar tools with the built-in standby mode
     described here. <varname>restore_command</> should return immediately
     if the file does not exist; the server will retry the command if
     necessary. See <xref linkend="log-shipping-alternative">
     for using tools like pg_standby.
    </para>
   </note>

   <para>
    If you want to use streaming replication, fill in
    <varname>primary_conninfo</> with a libpq connection string, including
    the host name (or IP address) and any additional details needed to
    connect to the primary server. If the primary needs a password for
    authentication, the password needs to be specified in
    <varname>primary_conninfo</> as well.
   </para>

   <para>
    If you're setting up the standby server for high availability purposes,
    set up WAL archiving, connections and authentication like the primary
    server, because the standby server will work as a primary server after
    failover.
   </para>

   <para>
    If you're using a WAL archive, its size can be minimized using the <xref
    linkend="archive-cleanup-command"> parameter to remove files that are no
    longer required by the standby server.
    The <application>pg_archivecleanup</> utility is designed specifically to
    be used with <varname>archive_cleanup_command</> in typical single-standby
    configurations; see <xref linkend="pgarchivecleanup">.
    Note, however, that if you're using the archive for backup purposes, you
    need to retain files needed to recover from at least the latest base
    backup, even if they're no longer needed by the standby.
   </para>

   <para>
    A simple example of a <filename>recovery.conf</> is:
<programlisting>
standby_mode = 'on'
primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
restore_command = 'cp /path/to/archive/%f %p'
archive_cleanup_command = 'pg_archivecleanup /path/to/archive %r'
</programlisting>
   </para>

   <para>
    You can have any number of standby servers, but if you use streaming
    replication, make sure you set <varname>max_wal_senders</> high enough in
    the primary to allow them to be connected simultaneously.
   </para>

  </sect2>
<sect2 id="streaming-replication">
|
|
<title>Streaming Replication</title>
|
|
|
|
<indexterm zone="high-availability">
|
|
<primary>Streaming Replication</primary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
Streaming replication allows a standby server to stay more up-to-date
|
|
than is possible with file-based log shipping. The standby connects
|
|
to the primary, which streams WAL records to the standby as they're
|
|
generated, without waiting for the WAL file to be filled.
|
|
</para>
|
|
|
|
<para>
|
|
Streaming replication is asynchronous by default
|
|
(see <xref linkend="synchronous-replication">), in which case there is
|
|
a small delay between committing a transaction in the primary and the
|
|
changes becoming visible in the standby. This delay is however much
|
|
smaller than with file-based log shipping, typically under one second
|
|
assuming the standby is powerful enough to keep up with the load. With
|
|
streaming replication, <varname>archive_timeout</> is not required to
|
|
reduce the data loss window.
|
|
</para>
|
|
|
|

   <para>
    If you use streaming replication without file-based continuous
    archiving, the server might recycle old WAL segments before the standby
    has received them. If this occurs, the standby will need to be
    reinitialized from a new base backup. You can avoid this by setting
    <varname>wal_keep_segments</> to a value large enough to ensure that
    WAL segments are not recycled too early, or by configuring a replication
    slot for the standby. If you set up a WAL archive that's accessible from
    the standby, these solutions are not required, since the standby can
    always use the archive to catch up provided it retains enough segments.
   </para>
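
   <para>
    For example, to keep roughly 1 GB of recent WAL (64 segments of 16 MB
    each) available for standbys, one might set, as a starting point only:
<programlisting>
wal_keep_segments = 64
</programlisting>
   </para>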

   <para>
    To use streaming replication, set up a file-based log-shipping standby
    server as described in <xref linkend="warm-standby">. The step that
    turns a file-based log-shipping standby into a streaming-replication
    standby is setting <varname>primary_conninfo</> in the
    <filename>recovery.conf</> file to point to the primary server. Set
    <xref linkend="guc-listen-addresses"> and authentication options
    (see <filename>pg_hba.conf</>) on the primary so that the standby server
    can connect to the <literal>replication</> pseudo-database on the primary
    server (see <xref linkend="streaming-replication-authentication">).
   </para>

   <para>
    On systems that support the keepalive socket option, setting
    <xref linkend="guc-tcp-keepalives-idle">,
    <xref linkend="guc-tcp-keepalives-interval"> and
    <xref linkend="guc-tcp-keepalives-count"> helps the primary promptly
    notice a broken connection.
   </para>

   <para>
    Set the maximum number of concurrent connections from the standby servers
    (see <xref linkend="guc-max-wal-senders"> for details).
   </para>

   <para>
    When the standby is started and <varname>primary_conninfo</> is set
    correctly, the standby will connect to the primary after replaying all
    WAL files available in the archive. If the connection is established
    successfully, you will see a walreceiver process in the standby, and
    a corresponding walsender process in the primary.
   </para>
<sect3 id="streaming-replication-authentication">
|
|
<title>Authentication</title>
|
|
<para>
|
|
It is very important that the access privileges for replication be set up
|
|
so that only trusted users can read the WAL stream, because it is
|
|
easy to extract privileged information from it. Standby servers must
|
|
authenticate to the primary as a superuser or an account that has the
|
|
<literal>REPLICATION</> privilege. It is recommended to create a
|
|
dedicated user account with <literal>REPLICATION</> and <literal>LOGIN</>
|
|
privileges for replication. While <literal>REPLICATION</> privilege gives
|
|
very high permissions, it does not allow the user to modify any data on
|
|
the primary system, which the <literal>SUPERUSER</> privilege does.
|
|
</para>
|
|
|
|
<para>
|
|
Client authentication for replication is controlled by a
|
|
<filename>pg_hba.conf</> record specifying <literal>replication</> in the
|
|
<replaceable>database</> field. For example, if the standby is running on
|
|
host IP <literal>192.168.1.100</> and the account name for replication
|
|
is <literal>foo</>, the administrator can add the following line to the
|
|
<filename>pg_hba.conf</> file on the primary:
|
|
|
|
<programlisting>
|
|
# Allow the user "foo" from host 192.168.1.100 to connect to the primary
|
|
# as a replication standby if the user's password is correctly supplied.
|
|
#
|
|
# TYPE DATABASE USER ADDRESS METHOD
|
|
host replication foo 192.168.1.100/32 md5
|
|
</programlisting>
|
|
</para>
|
|

    <para>
     The host name and port number of the primary, connection user name,
     and password are specified in the <filename>recovery.conf</> file.
     The password can also be set in the <filename>~/.pgpass</> file on the
     standby (specify <literal>replication</> in the <replaceable>database</>
     field).
     For example, if the primary is running on host IP <literal>192.168.1.50</>,
     port <literal>5432</literal>, the account name for replication is
     <literal>foo</>, and the password is <literal>foopass</>, the administrator
     can add the following line to the <filename>recovery.conf</> file on the
     standby:

<programlisting>
# The standby connects to the primary that is running on host 192.168.1.50
# and port 5432 as the user "foo" whose password is "foopass".
primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
</programlisting>
    </para>
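
    <para>
     Alternatively, the same password can be kept out of
     <filename>recovery.conf</> by placing a line like the following in the
     standby's <filename>~/.pgpass</> file; the fields are
     host:port:database:user:password, with <literal>replication</> in the
     database field:
<programlisting>
192.168.1.50:5432:replication:foo:foopass
</programlisting>
    </para>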

   </sect3>
<sect3 id="streaming-replication-monitoring">
|
|
<title>Monitoring</title>
|
|
<para>
|
|
An important health indicator of streaming replication is the amount
|
|
of WAL records generated in the primary, but not yet applied in the
|
|
standby. You can calculate this lag by comparing the current WAL write
|
|
location on the primary with the last WAL location received by the
|
|
standby. They can be retrieved using
|
|
<function>pg_current_xlog_location</> on the primary and the
|
|
<function>pg_last_xlog_receive_location</> on the standby,
|
|
respectively (see <xref linkend="functions-admin-backup-table"> and
|
|
<xref linkend="functions-recovery-info-table"> for details).
|
|
The last WAL receive location in the standby is also displayed in the
|
|
process status of the WAL receiver process, displayed using the
|
|
<command>ps</> command (see <xref linkend="monitoring-ps"> for details).
|
|
</para>
|
|
<para>
|
|
You can retrieve a list of WAL sender processes via the
|
|
<link linkend="pg-stat-replication-view">
|
|
<literal>pg_stat_replication</></link> view. Large differences between
|
|
<function>pg_current_xlog_location</> and <literal>sent_location</> field
|
|
might indicate that the master server is under heavy load, while
|
|
differences between <literal>sent_location</> and
|
|
<function>pg_last_xlog_receive_location</> on the standby might indicate
|
|
network delay, or that the standby is under heavy load.
|
|
</para>
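
    <para>
     As a simple sketch, the per-standby progress fields mentioned above
     can be inspected on the primary with a query such as:
<programlisting>
SELECT application_name, state, sent_location, write_location,
       flush_location, replay_location
FROM pg_stat_replication;
</programlisting>
    </para>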

   </sect3>
  </sect2>
<sect2 id="streaming-replication-slots">
|
|
<title>Replication Slots</title>
|
|
<indexterm>
|
|
<primary>replication slot</primary>
|
|
<secondary>streaming replication</secondary>
|
|
</indexterm>
|
|
<para>
|
|
Replication slots provide an automated way to ensure that the master does
|
|
not remove WAL segments until they have been received by all standbys,
|
|
and that the master does not remove rows which could cause a
|
|
<link linkend="hot-standby-conflict">recovery conflict</> even when the
|
|
standby is disconnected.
|
|
</para>
|
|
<para>
|
|
In lieu of using replication slots, it is possible to prevent the removal
|
|
of old WAL segments using <xref linkend="guc-wal-keep-segments">, or by
|
|
storing the segments in an archive using
|
|
<xref linkend="guc-archive-command">.
|
|
However, these methods often result in retaining more WAL segments than
|
|
required, whereas replication slots retain only the number of segments
|
|
known to be needed. An advantage of these methods is that they bound
|
|
the space requirement for <literal>pg_xlog</>; there is currently no way
|
|
to do this using replication slots.
|
|
</para>
|
|
<para>
|
|
Similarly, <xref linkend="guc-hot-standby-feedback">
|
|
and <xref linkend="guc-vacuum-defer-cleanup-age"> provide protection against
|
|
relevant rows being removed by vacuum, but the former provides no
|
|
protection during any time period when the standby is not connected,
|
|
and the latter often needs to be set to a high value to provide adequate
|
|
protection. Replication slots overcome these disadvantages.
|
|
</para>
|
|
<sect3 id="streaming-replication-slots-manipulation">
|
|
<title>Querying and manipulating replication slots</title>
|
|
<para>
|
|
Each replication slot has a name, which can contain lower-case letters,
|
|
numbers, and the underscore character.
|
|
</para>
|
|
<para>
|
|
Existing replication slots and their state can be seen in the
|
|
<link linkend="view-pg-replication-slots"><structname>pg_replication_slots</structname></link>
|
|
view.
|
|
</para>
|
|
<para>
|
|
Slots can be created and dropped either via the streaming replication
|
|
protocol (see <xref linkend="protocol-replication">) or via SQL
|
|
functions (see <xref linkend="functions-replication">).
|
|
</para>
|
|
</sect3>
|
|
<sect3 id="streaming-replication-slots-config">
|
|
<title>Configuration Example</title>
|
|
<para>
|
|
You can create a replication slot like this:
|
|
<programlisting>
|
|
postgres=# SELECT * FROM pg_create_physical_replication_slot('node_a_slot');
|
|
slot_name | xlog_position
|
|
-------------+---------------
|
|
node_a_slot |
|
|
|
|
postgres=# SELECT slot_name, slot_type, active FROM pg_replication_slots;
|
|
slot_name | slot_type | active
|
|
-------------+-----------+--------
|
|
node_a_slot | physical | f
|
|
(1 row)
|
|
</programlisting>
|
|
To configure the standby to use this slot, <varname>primary_slot_name</>
|
|
should be configured in the standby's <filename>recovery.conf</>.
|
|
Here is a simple example:
|
|
<programlisting>
|
|
standby_mode = 'on'
|
|
primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
|
|
primary_slot_name = 'node_a_slot'
|
|
</programlisting>
|
|
</para>
|
|
</sect3>
|
|
</sect2>
|
|
|
|
<sect2 id="cascading-replication">
|
|
<title>Cascading Replication</title>
|
|
|
|
<indexterm zone="high-availability">
|
|
<primary>Cascading Replication</primary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
The cascading replication feature allows a standby server to accept replication
|
|
connections and stream WAL records to other standbys, acting as a relay.
|
|
This can be used to reduce the number of direct connections to the master
|
|
and also to minimize inter-site bandwidth overheads.
|
|
</para>
|
|
|
|
<para>
|
|
A standby acting as both a receiver and a sender is known as a cascading
|
|
standby. Standbys that are more directly connected to the master are known
|
|
as upstream servers, while those standby servers further away are downstream
|
|
servers. Cascading replication does not place limits on the number or
|
|
arrangement of downstream servers, though each standby connects to only
|
|
one upstream server which eventually links to a single master/primary
|
|
server.
|
|
</para>
|
|
|
|
<para>
|
|
A cascading standby sends not only WAL records received from the
|
|
master but also those restored from the archive. So even if the replication
|
|
connection in some upstream connection is terminated, streaming replication
|
|
continues downstream for as long as new WAL records are available.
|
|
</para>
|
|
|
|
<para>
|
|
Cascading replication is currently asynchronous. Synchronous replication
|
|
(see <xref linkend="synchronous-replication">) settings have no effect on
|
|
cascading replication at present.
|
|
</para>
|
|
|
|
<para>
|
|
Hot Standby feedback propagates upstream, whatever the cascaded arrangement.
|
|
</para>
|
|
|
|
<para>
|
|
If an upstream standby server is promoted to become new master, downstream
|
|
servers will continue to stream from the new master if
|
|
<varname>recovery_target_timeline</> is set to <literal>'latest'</>.
|
|
</para>
|
|
|
|
<para>
|
|
To use cascading replication, set up the cascading standby so that it can
|
|
accept replication connections (that is, set
|
|
<xref linkend="guc-max-wal-senders"> and <xref linkend="guc-hot-standby">,
|
|
and configure
|
|
<link linkend="auth-pg-hba-conf">host-based authentication</link>).
|
|
You will also need to set <varname>primary_conninfo</> in the downstream
|
|
standby to point to the cascading standby.
|
|
</para>
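
   <para>
    For example, if a cascading standby runs on the hypothetical host
    <literal>192.168.1.51</>, the <filename>recovery.conf</> of a
    downstream standby might contain:
<programlisting>
standby_mode = 'on'
primary_conninfo = 'host=192.168.1.51 port=5432 user=foo password=foopass'
recovery_target_timeline = 'latest'
</programlisting>
   </para>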

  </sect2>
<sect2 id="synchronous-replication">
|
|
<title>Synchronous Replication</title>
|
|
|
|
<indexterm zone="high-availability">
|
|
<primary>Synchronous Replication</primary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
<productname>PostgreSQL</> streaming replication is asynchronous by
|
|
default. If the primary server
|
|
crashes then some transactions that were committed may not have been
|
|
replicated to the standby server, causing data loss. The amount
|
|
of data loss is proportional to the replication delay at the time of
|
|
failover.
|
|
</para>
|
|
|
|
<para>
|
|
Synchronous replication offers the ability to confirm that all changes
|
|
made by a transaction have been transferred to one or more synchronous
|
|
standby servers. This extends that standard level of durability
|
|
offered by a transaction commit. This level of protection is referred
|
|
to as 2-safe replication in computer science theory, and group-1-safe
|
|
(group-safe and 1-safe) when <varname>synchronous_commit</> is set to
|
|
<literal>remote_write</>.
|
|
</para>
|
|
|
|
<para>
|
|
When requesting synchronous replication, each commit of a
|
|
write transaction will wait until confirmation is
|
|
received that the commit has been written to the transaction log on disk
|
|
of both the primary and standby server. The only possibility that data
|
|
can be lost is if both the primary and the standby suffer crashes at the
|
|
same time. This can provide a much higher level of durability, though only
|
|
if the sysadmin is cautious about the placement and management of the two
|
|
servers. Waiting for confirmation increases the user's confidence that the
|
|
changes will not be lost in the event of server crashes but it also
|
|
necessarily increases the response time for the requesting transaction.
|
|
The minimum wait time is the round-trip time between primary to standby.
|
|
</para>
|
|
|
|

   <para>
    Read-only transactions and transaction rollbacks need not wait for
    replies from standby servers. Subtransaction commits do not wait for
    responses from standby servers, only top-level commits. Long-running
    actions such as data loading or index building do not wait
    until the very final commit message. All two-phase commit actions
    require commit waits, including both prepare and commit.
   </para>
<sect3 id="synchronous-replication-config">
|
|
<title>Basic Configuration</title>
|
|
|
|
<para>
|
|
Once streaming replication has been configured, configuring synchronous
|
|
replication requires only one additional configuration step:
|
|
<xref linkend="guc-synchronous-standby-names"> must be set to
|
|
a non-empty value. <varname>synchronous_commit</> must also be set to
|
|
<literal>on</>, but since this is the default value, typically no change is
|
|
required. (See <xref linkend="runtime-config-wal-settings"> and
|
|
<xref linkend="runtime-config-replication-master">.)
|
|
This configuration will cause each commit to wait for
|
|
confirmation that the standby has written the commit record to durable
|
|
storage.
|
|
<varname>synchronous_commit</> can be set by individual
|
|
users, so it can be configured in the configuration file, for particular
|
|
users or databases, or dynamically by applications, in order to control
|
|
the durability guarantee on a per-transaction basis.
|
|
</para>
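
    <para>
     A minimal sketch, assuming a single standby that reports
     <literal>application_name=standby1</> in its
     <varname>primary_conninfo</>, would be to set the following in the
     primary's <filename>postgresql.conf</>:
<programlisting>
synchronous_standby_names = 'standby1'
synchronous_commit = on         # the default
</programlisting>
    </para>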

    <para>
     After a commit record has been written to disk on the primary, the
     WAL record is then sent to the standby. The standby sends reply
     messages each time a new batch of WAL data is written to disk, unless
     <varname>wal_receiver_status_interval</> is set to zero on the standby.
     In the case that <varname>synchronous_commit</> is set to
     <literal>remote_apply</>, the standby sends reply messages when the commit
     record is replayed, making the transaction visible.
     If the standby is chosen as a synchronous standby, from a priority
     list of <varname>synchronous_standby_names</> on the primary, the reply
     messages from that standby will be considered along with those from other
     synchronous standbys to decide when to release transactions waiting for
     confirmation that the commit record has been received. These parameters
     allow the administrator to specify which standby servers should be
     synchronous standbys. Note that the configuration of synchronous
     replication is mainly on the master. Named standbys must be directly
     connected to the master; the master knows nothing about downstream
     standby servers using cascaded replication.
    </para>

    <para>
     Setting <varname>synchronous_commit</> to <literal>remote_write</> will
     cause each commit to wait for confirmation that the standby has received
     the commit record and written it out to its own operating system, but not
     for the data to be flushed to disk on the standby. This
     setting provides a weaker guarantee of durability than <literal>on</>
     does: the standby could lose the data in the event of an operating system
     crash, though not a <productname>PostgreSQL</> crash.
     However, it's a useful setting in practice
     because it can decrease the response time for the transaction.
     Data loss could only occur if both the primary and the standby crash and
     the database of the primary gets corrupted at the same time.
    </para>

    <para>
     Setting <varname>synchronous_commit</> to <literal>remote_apply</> will
     cause each commit to wait until the current synchronous standbys report
     that they have replayed the transaction, making it visible to user
     queries. In simple cases, this allows for load balancing with causal
     consistency.
    </para>

    <para>
     Users will stop waiting if a fast shutdown is requested. However, as
     when using asynchronous replication, the server will not fully
     shut down until all outstanding WAL records are transferred to the currently
     connected standby servers.
    </para>

   </sect3>
<sect3 id="synchronous-replication-multiple-standbys">
|
|
<title>Multiple Synchronous Standbys</title>
|
|
|
|
<para>
|
|
Synchronous replication supports one or more synchronous standby servers;
|
|
transactions will wait until all the standby servers which are considered
|
|
as synchronous confirm receipt of their data. The number of synchronous
|
|
standbys that transactions must wait for replies from is specified in
|
|
<varname>synchronous_standby_names</>. This parameter also specifies
|
|
a list of standby names, which determines the priority of each standby
|
|
for being chosen as a synchronous standby. The standbys whose names
|
|
appear earlier in the list are given higher priority and will be considered
|
|
as synchronous. Other standby servers appearing later in this list
|
|
represent potential synchronous standbys. If any of the current
|
|
synchronous standbys disconnects for whatever reason, it will be replaced
|
|
immediately with the next-highest-priority standby.
|
|
</para>
|
|
<para>
|
|
An example of <varname>synchronous_standby_names</> for multiple
|
|
synchronous standbys is:
|
|
<programlisting>
|
|
synchronous_standby_names = '2 (s1, s2, s3)'
|
|
</programlisting>
|
|
In this example, if four standby servers <literal>s1</>, <literal>s2</>,
|
|
<literal>s3</> and <literal>s4</> are running, the two standbys
|
|
<literal>s1</> and <literal>s2</> will be chosen as synchronous standbys
|
|
because their names appear early in the list of standby names.
|
|
<literal>s3</> is a potential synchronous standby and will take over
|
|
the role of synchronous standby when either of <literal>s1</> or
|
|
<literal>s2</> fails. <literal>s4</> is an asynchronous standby since
|
|
its name is not in the list.
|
|
</para>
|
|
</sect3>
|
|
|
|
<sect3 id="synchronous-replication-performance">
|
|
<title>Planning for Performance</title>
|
|
|
|
<para>
|
|
Synchronous replication usually requires carefully planned and placed
|
|
standby servers to ensure applications perform acceptably. Waiting
|
|
doesn't utilize system resources, but transaction locks continue to be
|
|
held until the transfer is confirmed. As a result, incautious use of
|
|
synchronous replication will reduce performance for database
|
|
applications because of increased response times and higher contention.
|
|
</para>
|
|
|
|
<para>
|
|
<productname>PostgreSQL</> allows the application developer
|
|
to specify the durability level required via replication. This can be
|
|
specified for the system overall, though it can also be specified for
|
|
specific users or connections, or even individual transactions.
|
|
</para>
|
|
|
|
<para>
|
|
For example, an application workload might consist of:
|
|
10% of changes are important customer details, while
|
|
90% of changes are less important data that the business can more
|
|
easily survive if it is lost, such as chat messages between users.
|
|
</para>
|
|
|
|
<para>
|
|
With synchronous replication options specified at the application level
|
|
(on the primary) we can offer synchronous replication for the most
|
|
important changes, without slowing down the bulk of the total workload.
|
|
Application level options are an important and practical tool for allowing
|
|
the benefits of synchronous replication for high performance applications.
|
|
</para>
|
|
|
|
<para>
|
|
You should consider that the network bandwidth must be higher than
|
|
the rate of generation of WAL data.
|
|
</para>
|
|
|
|
</sect3>
|
|
|
|
<sect3 id="synchronous-replication-ha">
|
|
<title>Planning for High Availability</title>
|
|
|
|
<para>
|
|
<varname>synchronous_standby_names</> specifies the number and
|
|
names of synchronous standbys that transaction commits made when
|
|
<varname>synchronous_commit</> is set to <literal>on</>,
|
|
<literal>remote_apply</> or <literal>remote_write</> will wait for
|
|
responses from. Such transaction commits may never be completed
|
|
if any one of synchronous standbys should crash.
|
|
</para>
|
|
|
|

    <para>
     The best solution for high availability is to ensure you keep as many
     synchronous standbys as requested. This can be achieved by naming multiple
     potential synchronous standbys using <varname>synchronous_standby_names</>.
     The standbys whose names appear earlier in the list will be used as
     synchronous standbys. Standbys listed after these will take over
     the role of synchronous standby if one of the current ones should fail.
    </para>

    <para>
     When a standby first attaches to the primary, it will not yet be properly
     synchronized. This is described as <literal>catchup</> mode. Once
     the lag between standby and primary reaches zero for the first time
     we move to real-time <literal>streaming</> state.
     The catch-up duration may be long immediately after the standby has
     been created. If the standby is shut down, then the catch-up period
     will increase according to the length of time the standby has been down.
     The standby is only able to become a synchronous standby
     once it has reached <literal>streaming</> state.
    </para>

    <para>
     If the primary restarts while commits are waiting for acknowledgement, those
     waiting transactions will be marked fully committed once the primary
     database recovers.
     There is no way to be certain that all standbys have received all
     outstanding WAL data at the time of the crash of the primary. Some
     transactions may not show as committed on the standby, even though
     they show as committed on the primary. The guarantee we offer is that
     the application will not receive explicit acknowledgement of the
     successful commit of a transaction until the WAL data is known to be
     safely received by all the synchronous standbys.
    </para>

    <para>
     If you really cannot keep as many synchronous standbys as requested
     then you should decrease the number of synchronous standbys that
     transaction commits must wait for responses from
     in <varname>synchronous_standby_names</> (or disable it) and
     reload the configuration file on the primary server.
    </para>

    <para>
     If the primary is isolated from remaining standby servers you should
     fail over to the best candidate of those other remaining standby servers.
    </para>

    <para>
     If you need to re-create a standby server while transactions are
     waiting, make sure that the commands <function>pg_start_backup()</> and
     <function>pg_stop_backup()</> are run in a session with
     <varname>synchronous_commit</> = <literal>off</>, otherwise those
     requests will wait forever for the standby to appear.
    </para>
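
    <para>
     A sketch of such a session might look like this (the backup label is
     arbitrary):
<programlisting>
SET synchronous_commit = off;
SELECT pg_start_backup('rebuild_standby');
-- copy the data directory, then:
SELECT pg_stop_backup();
</programlisting>
    </para>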

   </sect3>
  </sect2>
<sect2 id="continuous-archiving-in-standby">
|
|
<title>Continuous archiving in standby</title>
|
|
|
|
<indexterm>
|
|
<primary>continuous archiving</primary>
|
|
<secondary>in standby</secondary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
When continuous WAL archiving is used in a standby, there are two
|
|
different scenarios: the WAL archive can be shared between the primary
|
|
and the standby, or the standby can have its own WAL archive. When
|
|
the standby has its own WAL archive, set <varname>archive_mode</varname>
|
|
to <literal>always</literal>, and the standby will call the archive
|
|
command for every WAL segment it receives, whether it's by restoring
|
|
from the archive or by streaming replication. The shared archive can
|
|
be handled similarly, but the <varname>archive_command</varname> must
|
|
test if the file being archived exists already, and if the existing file
|
|
has identical contents. This requires more care in the
|
|
<varname>archive_command</varname>, as it must
|
|
be careful to not overwrite an existing file with different contents,
|
|
but return success if the exactly same file is archived twice. And
|
|
all that must be done free of race conditions, if two servers attempt
|
|
to archive the same file at the same time.
|
|
</para>
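
   <para>
    A simplified sketch of such a content-checking command, which still
    does not fully address concurrent archiving attempts and so would need
    hardening for production use, might be:
<programlisting>
archive_command = 'test ! -f /mnt/server/archivedir/%f &amp;&amp; cp %p /mnt/server/archivedir/%f || cmp -s %p /mnt/server/archivedir/%f'
</programlisting>
   </para>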

   <para>
    If <varname>archive_mode</varname> is set to <literal>on</>, the
    archiver is not enabled during recovery or standby mode. If the standby
    server is promoted, it will start archiving after the promotion, but
    will not archive any WAL or timeline history files that
    it did not generate itself. To get a complete
    series of WAL files in the archive, you must ensure that all WAL is
    archived before it reaches the standby. This is inherently true with
    file-based log shipping, as the standby can only restore files that
    are found in the archive, but not if streaming replication is enabled.
    When a server is not in recovery mode, there is no difference between
    <literal>on</literal> and <literal>always</literal> modes.
   </para>
  </sect2>
 </sect1>
<sect1 id="warm-standby-failover">
|
|
<title>Failover</title>
|
|
|
|
<para>
|
|
If the primary server fails then the standby server should begin
|
|
failover procedures.
|
|
</para>
|
|
|
|
  <para>
   If the standby server fails then no failover need take place. If the
   standby server can be restarted, even some time later, then the recovery
   process can also be restarted immediately, taking advantage of
   restartable recovery. If the standby server cannot be restarted, then a
   full new standby server instance should be created.
  </para>

  <para>
   If the primary server fails and the standby server becomes the
   new primary, and then the old primary restarts, you must have
   a mechanism for informing the old primary that it is no longer the primary. This is
   sometimes known as <acronym>STONITH</> (Shoot The Other Node In The Head), which is
   necessary to avoid situations where both systems think they are the
   primary, which will lead to confusion and ultimately data loss.
  </para>

  <para>
   Many failover systems use just two systems, the primary and the standby,
   connected by some kind of heartbeat mechanism to continually verify the
   connectivity between the two and the viability of the primary. It is
   also possible to use a third system (called a witness server) to prevent
   some cases of inappropriate failover, but the additional complexity
   might not be worthwhile unless it is set up with sufficient care and
   rigorous testing.
  </para>

  <para>
   <productname>PostgreSQL</productname> does not provide the system
   software required to identify a failure on the primary and notify
   the standby database server. Many such tools exist and are well
   integrated with the operating system facilities required for
   successful failover, such as IP address migration.
  </para>

  <para>
   Once failover to the standby occurs, there is only a
   single server in operation. This is known as a degenerate state.
   The former standby is now the primary, but the former primary is down
   and might stay down. To return to normal operation, a standby server
   must be recreated,
   either on the former primary system when it comes up, or on a third,
   possibly new, system. The <xref linkend="app-pgrewind"> utility can be
   used to speed up this process on large clusters.
   Once complete, the primary and standby can be
   considered to have switched roles. Some people choose to use a third
   server to provide backup for the new primary until the new standby
   server is recreated,
   though clearly this complicates the system configuration and
   operational processes.
  </para>

  <para>
   So, switching from primary to standby server can be fast but requires
   some time to re-prepare the failover cluster. Regular switching from
   primary to standby is useful, since it allows regular downtime on
   each system for maintenance. This also serves as a test of the
   failover mechanism to ensure that it will really work when you need it.
   Written administration procedures are advised.
  </para>

  <para>
   To trigger failover of a log-shipping standby server,
   run <command>pg_ctl promote</> or create a trigger
   file with the file name and path specified by the <varname>trigger_file</>
   setting in <filename>recovery.conf</>. If you're planning to use
   <command>pg_ctl promote</> to fail over, <varname>trigger_file</> is
   not required. If you're setting up reporting servers that are
   only used to offload read-only queries from the primary, not for high
   availability purposes, you don't need to promote them.
  </para>

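  <para>
   For example, either of the following triggers promotion (the paths
   shown are illustrative):
<programlisting>
$ pg_ctl promote -D /usr/local/pgsql/standby
</programlisting>
   or, with <literal>trigger_file = '/tmp/pgsql.trigger.5432'</> in
   <filename>recovery.conf</>:
<programlisting>
$ touch /tmp/pgsql.trigger.5432
</programlisting>
  </para>
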
 </sect1>

 <sect1 id="log-shipping-alternative">
  <title>Alternative Method for Log Shipping</title>

  <para>
   An alternative to the built-in standby mode described in the previous
   sections is to use a <varname>restore_command</> that polls the archive
   location. This was the only option available in versions 8.4 and below.
   In this setup, set <varname>standby_mode</> to <literal>off</>, because
   you are implementing the polling required for standby operation yourself.
   See the <xref linkend="pgstandby"> module for a reference
   implementation of this.
  </para>

  <para>
   Note that in this mode, the server will apply WAL one file at a
   time, so if you use the standby server for queries (see Hot Standby),
   there is a delay between an action in the master and when the
   action becomes visible in the standby, corresponding to the time it takes
   to fill up the WAL file. <varname>archive_timeout</> can be used to make
   that delay shorter. Also note that you can't combine streaming
   replication with this method.
  </para>

  <para>
   The operations that occur on both primary and standby servers are
   normal continuous archiving and recovery tasks. The only point of
   contact between the two database servers is the archive of WAL files
   that both share: primary writing to the archive, standby reading from
   the archive. Care must be taken to ensure that WAL archives from separate
   primary servers do not become mixed together or confused. The archive
   need not be large if it is only required for standby operation.
  </para>

  <para>
   The magic that makes the two loosely coupled servers work together is
   simply a <varname>restore_command</> used on the standby that,
   when asked for the next WAL file, waits for it to become available from
   the primary. The <varname>restore_command</> is specified in the
   <filename>recovery.conf</> file on the standby server. Normal recovery
   processing would request a file from the WAL archive, reporting failure
   if the file was unavailable. For standby processing it is normal for
   the next WAL file to be unavailable, so the standby must wait for
   it to appear. For files ending in
   <literal>.history</> there is no need to wait, and a non-zero return
   code must be returned. A waiting <varname>restore_command</> can be
   written as a custom script that loops after polling for the existence of
   the next WAL file. There must also be some way to trigger failover, which
   should interrupt the <varname>restore_command</>, break the loop and
   return a file-not-found error to the standby server. This ends recovery
   and the standby will then come up as a normal server.
  </para>

  <para>
   Pseudocode for a suitable <varname>restore_command</> is:
<programlisting>
triggered = false;
while (!NextWALFileReady() &amp;&amp; !triggered)
{
    sleep(100000L);    /* wait for ~0.1 sec */
    if (CheckForExternalTrigger())
        triggered = true;
}
if (!triggered)
    CopyWALFileForRecovery();
</programlisting>
  </para>

  <para>
   A working example of a waiting <varname>restore_command</> is provided
   in the <xref linkend="pgstandby"> module. It
   should be used as a reference on how to correctly implement the logic
   described above. It can also be extended as needed to support specific
   configurations and environments.
  </para>

  <para>
   The method for triggering failover is an important part of planning
   and design. One potential option is the <varname>restore_command</>
   itself. It is executed once for each WAL file, but the process
   running the <varname>restore_command</> is created and dies for
   each file, so there is no daemon or server process, and
   signals or a signal handler cannot be used. Therefore, the
   <varname>restore_command</> is not suitable to trigger failover.
   It is possible to use a simple timeout facility, especially if
   used in conjunction with a known <varname>archive_timeout</>
   setting on the primary. However, this is somewhat error prone
   since a network problem or busy primary server might be sufficient
   to initiate failover. A notification mechanism such as the explicit
   creation of a trigger file is ideal, if this can be arranged.
  </para>

<sect2 id="warm-standby-config">
|
|
<title>Implementation</title>
|
|
|
|
<para>
|
|
The short procedure for configuring a standby server using this alternative
|
|
method is as follows. For
|
|
full details of each step, refer to previous sections as noted.
|
|
<orderedlist>
|
|
<listitem>
|
|
<para>
|
|
Set up primary and standby systems as nearly identical as
|
|
possible, including two identical copies of
|
|
<productname>PostgreSQL</> at the same release level.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Set up continuous archiving from the primary to a WAL archive
|
|
directory on the standby server. Ensure that
|
|
<xref linkend="guc-archive-mode">,
|
|
<xref linkend="guc-archive-command"> and
|
|
<xref linkend="guc-archive-timeout">
|
|
are set appropriately on the primary
|
|
(see <xref linkend="backup-archiving-wal">).
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Make a base backup of the primary server (see <xref
|
|
linkend="backup-base-backup">), and load this data onto the standby.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Begin recovery on the standby server from the local WAL
|
|
archive, using a <filename>recovery.conf</> that specifies a
|
|
<varname>restore_command</> that waits as described
|
|
previously (see <xref linkend="backup-pitr-recovery">).
|
|
</para>
|
|
</listitem>
|
|
</orderedlist>
|
|
</para>
|
|
|
|
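   <para>
    A minimal <filename>recovery.conf</> for this method might look like
    the following (the archive path and trigger file are illustrative;
    see <xref linkend="pgstandby"> for the full set of options):
<programlisting>
standby_mode = 'off'
restore_command = 'pg_standby -t /tmp/pgsql.trigger.5442 /mnt/server/archivedir %f %p %r'
</programlisting>
   </para>
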
   <para>
    Recovery treats the WAL archive as read-only, so once a WAL file has
    been copied to the standby system it can be copied to tape at the same
    time as it is being read by the standby database server.
    Thus, running a standby server for high availability can be performed at
    the same time as files are stored for longer term disaster recovery
    purposes.
   </para>

   <para>
    For testing purposes, it is possible to run both primary and standby
    servers on the same system. This does not provide any worthwhile
    improvement in server robustness, nor would it be described as HA.
   </para>
  </sect2>

<sect2 id="warm-standby-record">
|
|
<title>Record-based Log Shipping</title>
|
|
|
|
<para>
|
|
It is also possible to implement record-based log shipping using this
|
|
alternative method, though this requires custom development, and changes
|
|
will still only become visible to hot standby queries after a full WAL
|
|
file has been shipped.
|
|
</para>
|
|
|
|
   <para>
    An external program can call the <function>pg_xlogfile_name_offset()</>
    function (see <xref linkend="functions-admin">)
    to find out the file name and the exact byte offset within it of
    the current end of WAL. It can then access the WAL file directly
    and copy the data from the last known end of WAL through the current end
    over to the standby servers. With this approach, the window for data
    loss is the polling cycle time of the copying program, which can be very
    small, and there is no wasted bandwidth from forcing partially-used
    segment files to be archived. Note that the standby servers'
    <varname>restore_command</> scripts can only deal with whole WAL files,
    so the incrementally copied data is not ordinarily made available to
    the standby servers. It is of use only when the primary dies &mdash;
    then the last partial WAL file is fed to the standby before allowing
    it to come up. The correct implementation of this process requires
    cooperation of the <varname>restore_command</> script with the data
    copying program.
   </para>

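   <para>
    For example, the copying program can obtain the current end of WAL by
    running the following on the primary:
<programlisting>
SELECT * FROM pg_xlogfile_name_offset(pg_current_xlog_location());
</programlisting>
   </para>
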
   <para>
    Starting with <productname>PostgreSQL</> version 9.0, you can use
    streaming replication (see <xref linkend="streaming-replication">) to
    achieve the same benefits with less effort.
   </para>
  </sect2>
 </sect1>

<sect1 id="hot-standby">
|
|
<title>Hot Standby</title>
|
|
|
|
<indexterm zone="high-availability">
|
|
<primary>Hot Standby</primary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
Hot Standby is the term used to describe the ability to connect to
|
|
the server and run read-only queries while the server is in archive
|
|
recovery or standby mode. This
|
|
is useful both for replication purposes and for restoring a backup
|
|
to a desired state with great precision.
|
|
The term Hot Standby also refers to the ability of the server to move
|
|
from recovery through to normal operation while users continue running
|
|
queries and/or keep their connections open.
|
|
</para>
|
|
|
|
<para>
|
|
Running queries in hot standby mode is similar to normal query operation,
|
|
though there are several usage and administrative differences
|
|
explained below.
|
|
</para>
|
|
|
|
<sect2 id="hot-standby-users">
|
|
<title>User's Overview</title>
|
|
|
|
<para>
|
|
When the <xref linkend="guc-hot-standby"> parameter is set to true on a
|
|
standby server, it will begin accepting connections once the recovery has
|
|
brought the system to a consistent state. All such connections are
|
|
strictly read-only; not even temporary tables may be written.
|
|
</para>
|
|
|
|
<para>
|
|
The data on the standby takes some time to arrive from the primary server
|
|
so there will be a measurable delay between primary and standby. Running the
|
|
same query nearly simultaneously on both primary and standby might therefore
|
|
return differing results. We say that data on the standby is
|
|
<firstterm>eventually consistent</firstterm> with the primary. Once the
|
|
commit record for a transaction is replayed on the standby, the changes
|
|
made by that transaction will be visible to any new snapshots taken on
|
|
the standby. Snapshots may be taken at the start of each query or at the
|
|
start of each transaction, depending on the current transaction isolation
|
|
level. For more details, see <xref linkend="transaction-iso">.
|
|
</para>
|
|
|
|
   <para>
    Transactions started during hot standby may issue the following commands:

    <itemizedlist>
     <listitem>
      <para>
       Query access - <command>SELECT</>, <command>COPY TO</>
      </para>
     </listitem>
     <listitem>
      <para>
       Cursor commands - <command>DECLARE</>, <command>FETCH</>, <command>CLOSE</>
      </para>
     </listitem>
     <listitem>
      <para>
       Parameters - <command>SHOW</>, <command>SET</>, <command>RESET</>
      </para>
     </listitem>
     <listitem>
      <para>
       Transaction management commands
       <itemizedlist>
        <listitem>
         <para>
          <command>BEGIN</>, <command>END</>, <command>ABORT</>, <command>START TRANSACTION</>
         </para>
        </listitem>
        <listitem>
         <para>
          <command>SAVEPOINT</>, <command>RELEASE</>, <command>ROLLBACK TO SAVEPOINT</>
         </para>
        </listitem>
        <listitem>
         <para>
          <command>EXCEPTION</> blocks and other internal subtransactions
         </para>
        </listitem>
       </itemizedlist>
      </para>
     </listitem>
     <listitem>
      <para>
       <command>LOCK TABLE</>, though only when explicitly in one of these modes:
       <literal>ACCESS SHARE</>, <literal>ROW SHARE</> or <literal>ROW EXCLUSIVE</>.
      </para>
     </listitem>
     <listitem>
      <para>
       Plans and resources - <command>PREPARE</>, <command>EXECUTE</>,
       <command>DEALLOCATE</>, <command>DISCARD</>
      </para>
     </listitem>
     <listitem>
      <para>
       Plugins and extensions - <command>LOAD</>
      </para>
     </listitem>
     <listitem>
      <para>
       <command>UNLISTEN</command>
      </para>
     </listitem>
    </itemizedlist>
   </para>

   <para>
    Transactions started during hot standby will never be assigned a
    transaction ID and cannot write to the system write-ahead log.
    Therefore, the following actions will produce error messages:

    <itemizedlist>
     <listitem>
      <para>
       Data Manipulation Language (DML) - <command>INSERT</>,
       <command>UPDATE</>, <command>DELETE</>, <command>COPY FROM</>,
       <command>TRUNCATE</>.
       Note that there are no allowed actions that result in a trigger
       being executed during recovery. This restriction applies even to
       temporary tables, because table rows cannot be read or written without
       assigning a transaction ID, which is currently not possible in a
       Hot Standby environment.
      </para>
     </listitem>
     <listitem>
      <para>
       Data Definition Language (DDL) - <command>CREATE</>,
       <command>DROP</>, <command>ALTER</>, <command>COMMENT</>.
       This restriction applies even to temporary tables, because carrying
       out these operations would require updating the system catalog tables.
      </para>
     </listitem>
     <listitem>
      <para>
       <command>SELECT ... FOR SHARE | UPDATE</>, because row locks cannot be
       taken without updating the underlying data files.
      </para>
     </listitem>
     <listitem>
      <para>
       Rules on <command>SELECT</> statements that generate DML commands.
      </para>
     </listitem>
     <listitem>
      <para>
       <command>LOCK</> that explicitly requests a mode higher than <literal>ROW EXCLUSIVE MODE</>.
      </para>
     </listitem>
     <listitem>
      <para>
       <command>LOCK</> in short default form, since it requests <literal>ACCESS EXCLUSIVE MODE</>.
      </para>
     </listitem>
     <listitem>
      <para>
       Transaction management commands that explicitly set non-read-only state:
       <itemizedlist>
        <listitem>
         <para>
          <command>BEGIN READ WRITE</>,
          <command>START TRANSACTION READ WRITE</>
         </para>
        </listitem>
        <listitem>
         <para>
          <command>SET TRANSACTION READ WRITE</>,
          <command>SET SESSION CHARACTERISTICS AS TRANSACTION READ WRITE</>
         </para>
        </listitem>
        <listitem>
         <para>
          <command>SET transaction_read_only = off</>
         </para>
        </listitem>
       </itemizedlist>
      </para>
     </listitem>
     <listitem>
      <para>
       Two-phase commit commands - <command>PREPARE TRANSACTION</>,
       <command>COMMIT PREPARED</>, <command>ROLLBACK PREPARED</>,
       because even read-only transactions need to write WAL in the
       prepare phase (the first phase of two-phase commit).
      </para>
     </listitem>
     <listitem>
      <para>
       Sequence updates - <function>nextval()</>, <function>setval()</>
      </para>
     </listitem>
     <listitem>
      <para>
       <command>LISTEN</>, <command>NOTIFY</>
      </para>
     </listitem>
    </itemizedlist>
   </para>

   <para>
    In normal operation, <quote>read-only</> transactions are allowed to
    use <command>LISTEN</command> and <command>NOTIFY</command>,
    so Hot Standby sessions operate under slightly tighter
    restrictions than ordinary read-only sessions. It is possible that some
    of these restrictions might be loosened in a future release.
   </para>

   <para>
    During hot standby, the parameter <varname>transaction_read_only</> is always
    true and may not be changed. But as long as no attempt is made to modify
    the database, connections during hot standby will act much like any other
    database connection. If failover or switchover occurs, the database will
    switch to normal processing mode. Sessions will remain connected while the
    server changes mode. Once hot standby finishes, it will be possible to
    initiate read-write transactions (even from a session begun during
    hot standby).
   </para>

   <para>
    Users will be able to tell whether their session is read-only by
    issuing <command>SHOW transaction_read_only</>. In addition, a set of
    functions (<xref linkend="functions-recovery-info-table">) allow users to
    access information about the standby server. These allow you to write
    programs that are aware of the current state of the database. These
    can be used to monitor the progress of recovery, or to allow you to
    write complex programs that restore the database to particular states.
   </para>

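   <para>
    For example, a session can confirm that it is running on a standby
    and check how far replay has progressed with:
<programlisting>
SHOW transaction_read_only;
SELECT pg_is_in_recovery(), pg_last_xlog_replay_location();
</programlisting>
   </para>
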
  </sect2>

  <sect2 id="hot-standby-conflict">
   <title>Handling Query Conflicts</title>

   <para>
    The primary and standby servers are in many ways loosely connected. Actions
    on the primary will have an effect on the standby. As a result, there is
    potential for negative interactions or conflicts between them. The easiest
    conflict to understand is performance: if a huge data load is taking place
    on the primary then this will generate a similar stream of WAL records on the
    standby, so standby queries may contend for system resources, such as I/O.
   </para>

   <para>
    There are also additional types of conflict that can occur with Hot Standby.
    These conflicts are <emphasis>hard conflicts</> in the sense that queries
    might need to be canceled and, in some cases, sessions disconnected to resolve them.
    The user is provided with several ways to handle these
    conflicts. Conflict cases include:

    <itemizedlist>
     <listitem>
      <para>
       Access Exclusive locks taken on the primary server, including both
       explicit <command>LOCK</> commands and various <acronym>DDL</>
       actions, conflict with table accesses in standby queries.
      </para>
     </listitem>
     <listitem>
      <para>
       Dropping a tablespace on the primary conflicts with standby queries
       using that tablespace for temporary work files.
      </para>
     </listitem>
     <listitem>
      <para>
       Dropping a database on the primary conflicts with sessions connected
       to that database on the standby.
      </para>
     </listitem>
     <listitem>
      <para>
       Application of a vacuum cleanup record from WAL conflicts with
       standby transactions whose snapshots can still <quote>see</> any of
       the rows to be removed.
      </para>
     </listitem>
     <listitem>
      <para>
       Application of a vacuum cleanup record from WAL conflicts with
       queries accessing the target page on the standby, whether or not
       the data to be removed is visible.
      </para>
     </listitem>
    </itemizedlist>
   </para>

   <para>
    On the primary server, these cases simply result in waiting; and the
    user might choose to cancel either of the conflicting actions. However,
    on the standby there is no choice: the WAL-logged action already occurred
    on the primary so the standby must not fail to apply it. Furthermore,
    allowing WAL application to wait indefinitely may be very undesirable,
    because the standby's state will become increasingly far behind the
    primary's. Therefore, a mechanism is provided to forcibly cancel standby
    queries that conflict with to-be-applied WAL records.
   </para>

   <para>
    An example of the problem situation is an administrator on the primary
    server running <command>DROP TABLE</> on a table that is currently being
    queried on the standby server. Clearly the standby query cannot continue
    if the <command>DROP TABLE</> is applied on the standby. If this situation
    occurred on the primary, the <command>DROP TABLE</> would wait until the
    other query had finished. But when <command>DROP TABLE</> is run on the
    primary, the primary doesn't have information about what queries are
    running on the standby, so it will not wait for any such standby
    queries. The WAL change records come through to the standby while the
    standby query is still running, causing a conflict. The standby server
    must either delay application of the WAL records (and everything after
    them, too) or else cancel the conflicting query so that the <command>DROP
    TABLE</> can be applied.
   </para>

   <para>
    When a conflicting query is short, it's typically desirable to allow it to
    complete by delaying WAL application for a little bit; but a long delay in
    WAL application is usually not desirable. So the cancel mechanism has
    parameters, <xref linkend="guc-max-standby-archive-delay"> and <xref
    linkend="guc-max-standby-streaming-delay">, that define the maximum
    allowed delay in WAL application. Conflicting queries will be canceled
    once it has taken longer than the relevant delay setting to apply any
    newly-received WAL data. There are two parameters so that different delay
    values can be specified for the case of reading WAL data from an archive
    (i.e., initial recovery from a base backup or <quote>catching up</> a
    standby server that has fallen far behind) versus reading WAL data via
    streaming replication.
   </para>

   <para>
    In a standby server that exists primarily for high availability, it's
    best to set the delay parameters relatively short, so that the server
    cannot fall far behind the primary due to delays caused by standby
    queries. However, if the standby server is meant for executing
    long-running queries, then a high or even infinite delay value may be
    preferable. Keep in mind however that a long-running query could
    cause other sessions on the standby server to not see recent changes
    on the primary, if it delays application of WAL records.
   </para>

   <para>
    Once the delay specified by <varname>max_standby_archive_delay</> or
    <varname>max_standby_streaming_delay</> has been exceeded, conflicting
    queries will be canceled. This usually results just in a cancellation
    error, although in the case of replaying a <command>DROP DATABASE</>
    the entire conflicting session will be terminated. Also, if the conflict
    is over a lock held by an idle transaction, the conflicting session is
    terminated (this behavior might change in the future).
   </para>

   <para>
    Canceled queries may be retried immediately (after beginning a new
    transaction, of course). Since query cancellation depends on
    the nature of the WAL records being replayed, a query that was
    canceled may well succeed if it is executed again.
   </para>

   <para>
    Keep in mind that the delay parameters are compared to the elapsed time
    since the WAL data was received by the standby server. Thus, the grace
    period allowed to any one query on the standby is never more than the
    delay parameter, and could be considerably less if the standby has already
    fallen behind as a result of waiting for previous queries to complete, or
    as a result of being unable to keep up with a heavy update load.
   </para>

   <para>
    The most common reason for conflict between standby queries and WAL replay
    is <quote>early cleanup</>. Normally, <productname>PostgreSQL</> allows
    cleanup of old row versions when there are no transactions that need to
    see them to ensure correct visibility of data according to MVCC rules.
    However, this rule can only be applied for transactions executing on the
    master. So it is possible that cleanup on the master will remove row
    versions that are still visible to a transaction on the standby.
   </para>

   <para>
    Experienced users should note that both row version cleanup and row version
    freezing will potentially conflict with standby queries. Running a manual
    <command>VACUUM FREEZE</> is likely to cause conflicts even on tables with
    no updated or deleted rows.
   </para>

   <para>
    Users should be clear that tables that are regularly and heavily updated
    on the primary server will quickly cause cancellation of longer running
    queries on the standby. In such cases the setting of a finite value for
    <varname>max_standby_archive_delay</> or
    <varname>max_standby_streaming_delay</> can be considered similar to
    setting <varname>statement_timeout</>.
   </para>

   <para>
    Remedial possibilities exist if the number of standby-query cancellations
    is found to be unacceptable. The first option is to set the parameter
    <varname>hot_standby_feedback</>, which prevents <command>VACUUM</> from
    removing recently-dead rows, so that cleanup conflicts do not occur.
    If you do this, you
    should note that this will delay cleanup of dead rows on the primary,
    which may result in undesirable table bloat. However, the cleanup
    situation will be no worse than if the standby queries were running
    directly on the primary server, and you are still getting the benefit of
    off-loading execution onto the standby.
    If standby servers connect and disconnect frequently, you
    might want to make adjustments to handle the period when
    <varname>hot_standby_feedback</> is not being provided.
    For example, consider increasing <varname>max_standby_archive_delay</>
    so that queries are not rapidly canceled by conflicts in WAL archive
    files during disconnected periods. You should also consider increasing
    <varname>max_standby_streaming_delay</> to avoid rapid cancellations
    by newly-arrived streaming WAL entries after reconnection.
   </para>

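   <para>
    A hypothetical standby configuration along these lines (the delay
    values are illustrative only, and the right choices depend entirely
    on the workload):
<programlisting>
# postgresql.conf on the standby
hot_standby_feedback = on
max_standby_archive_delay = 300s
max_standby_streaming_delay = 300s
</programlisting>
   </para>
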
   <para>
    Another option is to increase <xref linkend="guc-vacuum-defer-cleanup-age">
    on the primary server, so that dead rows will not be cleaned up as quickly
    as they normally would be. This will allow more time for queries to
    execute before they are canceled on the standby, without having to set
    a high <varname>max_standby_streaming_delay</>. However it is
    difficult to guarantee any specific execution-time window with this
    approach, since <varname>vacuum_defer_cleanup_age</> is measured in
    transactions executed on the primary server.
   </para>

   <para>
    The number of query cancels and the reason for them can be viewed using
    the <structname>pg_stat_database_conflicts</> system view on the standby
    server. The <structname>pg_stat_database</> system view also contains
    summary information.
   </para>

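   <para>
    For example:
<programlisting>
SELECT datname, confl_snapshot, confl_lock, confl_tablespace,
       confl_bufferpin, confl_deadlock
FROM pg_stat_database_conflicts;
</programlisting>
   </para>
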
  </sect2>

  <sect2 id="hot-standby-admin">
   <title>Administrator's Overview</title>

   <para>
    If <varname>hot_standby</> is turned <literal>on</> in
    <filename>postgresql.conf</> and there is a <filename>recovery.conf</>
    file present, the server will run in Hot Standby mode.
    However, it may take some time for Hot Standby connections to be allowed,
    because the server will not accept connections until it has completed
    sufficient recovery to provide a consistent state against which queries
    can run. During this period,
    clients that attempt to connect will be refused with an error message.
    To confirm the server has come up, either loop trying to connect from
    the application, or look for these messages in the server logs:

<programlisting>
LOG:  entering standby mode

... then some time later ...

LOG:  consistent recovery state reached
LOG:  database system is ready to accept read only connections
</programlisting>

    Consistency information is recorded once per checkpoint on the primary.
    It is not possible to enable hot standby when reading WAL
    written during a period when <varname>wal_level</> was not set to
    <literal>replica</> or <literal>logical</> on the primary. Reaching
    a consistent state can also be delayed in the presence of both of these
    conditions:

    <itemizedlist>
     <listitem>
      <para>
       A write transaction has more than 64 subtransactions
      </para>
     </listitem>
     <listitem>
      <para>
       Very long-lived write transactions exist
      </para>
     </listitem>
    </itemizedlist>

    If you are running file-based log shipping (<quote>warm standby</>), you
    might need to wait until the next WAL file arrives, which could be as
    long as the <varname>archive_timeout</> setting on the primary.
   </para>

   <para>
    The setting of some parameters on the standby will need reconfiguration
    if they have been changed on the primary. For these parameters,
    the value on the standby must
    be equal to or greater than the value on the primary. If these parameters
    are not set high enough then the standby will refuse to start.
    Higher values can then be supplied and the server
    restarted to begin recovery again. These parameters are:

    <itemizedlist>
     <listitem>
      <para>
       <varname>max_connections</>
      </para>
     </listitem>
     <listitem>
      <para>
       <varname>max_prepared_transactions</>
      </para>
     </listitem>
     <listitem>
      <para>
       <varname>max_locks_per_transaction</>
      </para>
     </listitem>
     <listitem>
      <para>
       <varname>max_worker_processes</>
      </para>
     </listitem>
    </itemizedlist>
   </para>

   <para>
    It is important that the administrator select appropriate settings for
    <xref linkend="guc-max-standby-archive-delay"> and <xref
    linkend="guc-max-standby-streaming-delay">. The best choices vary
    depending on business priorities. For example if the server is primarily
    tasked as a High Availability server, then you will want low delay
    settings, perhaps even zero, though that is a very aggressive setting. If
    the standby server is tasked as an additional server for decision support
    queries then it might be acceptable to set the maximum delay values to
    many hours, or even -1, which means wait forever for queries to complete.
   </para>

   <para>
    Transaction status <quote>hint bits</> written on the primary are not
    WAL-logged, so the standby will likely need to set the hints again
    itself. Thus, the standby server will still perform disk writes even
    though all users are read-only; no changes occur to the data values
    themselves. Users will still write large sort temporary files and
    re-generate relcache info files, so no part of the database
    is truly read-only during hot standby mode.
    Note also that writes to remote databases using the
    <application>dblink</application> module, and other operations outside
    the database using PL functions, will still be possible, even though the
    transaction is read-only locally.
   </para>

   <para>
    The following types of administration commands are not accepted
    during recovery mode:

    <itemizedlist>
     <listitem>
      <para>
       Data Definition Language (DDL) - e.g., <command>CREATE INDEX</>
      </para>
     </listitem>
     <listitem>
      <para>
       Privilege and Ownership - <command>GRANT</>, <command>REVOKE</>,
       <command>REASSIGN</>
      </para>
     </listitem>
     <listitem>
      <para>
       Maintenance commands - <command>ANALYZE</>, <command>VACUUM</>,
       <command>CLUSTER</>, <command>REINDEX</>
      </para>
     </listitem>
    </itemizedlist>
   </para>

   <para>
    Again, note that some of these commands are actually allowed during
    <quote>read only</> mode transactions on the primary.
   </para>

   <para>
    As a result, you cannot create additional indexes that exist solely
    on the standby, nor statistics that exist solely on the standby.
    If these administration commands are needed, they should be executed
    on the primary, and eventually those changes will propagate to the
    standby.
   </para>

   <para>
    <function>pg_cancel_backend()</>
    and <function>pg_terminate_backend()</> will work on user backends,
    but not the Startup process, which performs
    recovery. <structname>pg_stat_activity</structname> does not show an
    entry for the Startup process, nor do recovering transactions show
    as active. As a result, <structname>pg_prepared_xacts</structname>
    is always empty during recovery. If you wish to resolve in-doubt
    prepared transactions, view <literal>pg_prepared_xacts</> on the
    primary and issue commands to resolve transactions there.
   </para>

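   <para>
    For example, on the primary (the transaction identifier shown is a
    placeholder):
<programlisting>
SELECT gid, prepared, owner, database FROM pg_prepared_xacts;
-- then, for an in-doubt transaction found there:
COMMIT PREPARED 'gid_of_that_transaction';
</programlisting>
   </para>
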
   <para>
    <structname>pg_locks</structname> will show locks held by backends,
    as normal. <structname>pg_locks</structname> also shows
    a virtual transaction managed by the Startup process that owns all
    <literal>AccessExclusiveLocks</> held by transactions being replayed by
    recovery. Note that the Startup process does not acquire locks to
    make database changes, and thus locks other than <literal>AccessExclusiveLocks</>
    do not show in <structname>pg_locks</structname> for the Startup
    process; they are just presumed to exist.
   </para>

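   <para>
    For example, the locks being held on behalf of WAL replay can be
    examined with:
<programlisting>
SELECT locktype, database, relation, mode, virtualtransaction
FROM pg_locks
WHERE mode = 'AccessExclusiveLock';
</programlisting>
   </para>
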
   <para>
    The <productname>Nagios</> plugin <productname>check_pgsql</> will
    work, because the simple information it checks for exists.
    The <productname>check_postgres</> monitoring script will also work,
    though some reported values could give different or confusing results.
    For example, last vacuum time will not be maintained, since no
    vacuum occurs on the standby. Vacuums running on the primary
    do still send their changes to the standby.
   </para>

   <para>
    WAL file control commands will not work during recovery,
    e.g., <function>pg_start_backup</>, <function>pg_switch_xlog</> etc.
   </para>

   <para>
    Dynamically loadable modules work, including <structname>pg_stat_statements</>.
   </para>

   <para>
    Advisory locks work normally in recovery, including deadlock detection.
    Note that advisory locks are never WAL logged, so it is impossible for
    an advisory lock on either the primary or the standby to conflict with WAL
    replay. Nor is it possible to acquire an advisory lock on the primary
    and have it initiate a similar advisory lock on the standby. Advisory
    locks relate only to the server on which they are acquired.
   </para>

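   <para>
    For example, this works identically on primary and standby (the key
    value is arbitrary):
<programlisting>
SELECT pg_advisory_lock(1234);
-- ... do work serialized by the lock ...
SELECT pg_advisory_unlock(1234);
</programlisting>
   </para>
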
   <para>
    Trigger-based replication systems such as <productname>Slony</>,
    <productname>Londiste</> and <productname>Bucardo</> won't run on the
    standby at all, though they will run happily on the primary server as
    long as the changes are not sent to standby servers to be applied.
    WAL replay is not trigger-based so you cannot relay from the
    standby to any system that requires additional database writes or
    relies on the use of triggers.
   </para>

   <para>
    New OIDs cannot be assigned, though some <acronym>UUID</> generators may
    still work as long as they do not rely on writing new status to the
    database.
   </para>

   <para>
    Currently, temporary table creation is not allowed during read only
    transactions, so in some cases existing scripts will not run correctly.
    This restriction might be relaxed in a later release. This is
    both a SQL Standard compliance issue and a technical issue.
   </para>

   <para>
    <command>DROP TABLESPACE</> can only succeed if the tablespace is empty.
    Some standby users may be actively using the tablespace via their
    <varname>temp_tablespaces</> parameter. If there are temporary files in the
    tablespace, all active queries are canceled to ensure that temporary
    files are removed, so the tablespace can be removed and WAL replay
    can continue.
   </para>

   <para>
    Running <command>DROP DATABASE</> or <command>ALTER DATABASE ... SET
    TABLESPACE</> on the primary
    will generate a WAL entry that will cause all users connected to that
    database on the standby to be forcibly disconnected. This action occurs
    immediately, whatever the setting of
    <varname>max_standby_streaming_delay</>. Note that
    <command>ALTER DATABASE ... RENAME</> does not disconnect users, which
    in most cases will go unnoticed, though it might in some cases confuse a
    program that depends in some way upon the database name.
   </para>

   <para>
    In normal (non-recovery) mode, if you issue <command>DROP USER</> or
    <command>DROP ROLE</> for a role with login capability while that user
    is still connected then nothing happens to the connected user; they
    remain connected. The user cannot reconnect, however. This behavior
    applies in recovery also, so a
    <command>DROP USER</> on the primary does not disconnect that user on
    the standby.
   </para>

   <para>
    The statistics collector is active during recovery. All scans, reads,
    blocks, index usage, etc., will be recorded normally on the standby.
    Replayed actions will not duplicate their effects on the primary, so
    replaying an insert will not increment the Inserts column of
    <structname>pg_stat_user_tables</>.
    The stats file is deleted at the start of recovery, so stats from primary
    and standby will differ; this is considered a feature, not a bug.
   </para>

   <para>
    Autovacuum is not active during recovery. It will start normally at the
    end of recovery.
   </para>

   <para>
    The checkpointer process and the background writer process are active
    during recovery. The checkpointer process will perform restartpoints
    (similar to checkpoints on the primary) and the background writer process
    will perform normal block cleaning activities. This can include updates
    of the hint bit information stored on the standby server.
    The <command>CHECKPOINT</> command is accepted during recovery,
    though it performs a restartpoint rather than a new checkpoint.
   </para>
  </sect2>

<sect2 id="hot-standby-parameters">
|
|
<title>Hot Standby Parameter Reference</title>
|
|
|
|
<para>
|
|
Various parameters have been mentioned above in
|
|
<xref linkend="hot-standby-conflict"> and
|
|
<xref linkend="hot-standby-admin">.
|
|
</para>
|
|
|
|
<para>
|
|
On the primary, parameters <xref linkend="guc-wal-level"> and
|
|
<xref linkend="guc-vacuum-defer-cleanup-age"> can be used.
|
|
<xref linkend="guc-max-standby-archive-delay"> and
|
|
<xref linkend="guc-max-standby-streaming-delay"> have no effect if set on
|
|
the primary.
|
|
</para>
|
|
|
|
<para>
|
|
On the standby, parameters <xref linkend="guc-hot-standby">,
|
|
<xref linkend="guc-max-standby-archive-delay"> and
|
|
<xref linkend="guc-max-standby-streaming-delay"> can be used.
|
|
<xref linkend="guc-vacuum-defer-cleanup-age"> has no effect
|
|
as long as the server remains in standby mode, though it will
|
|
become relevant if the standby becomes primary.
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2 id="hot-standby-caveats">
|
|
<title>Caveats</title>
|
|
|
|
<para>
|
|
There are several limitations of Hot Standby.
|
|
These can and probably will be fixed in future releases:
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Operations on hash indexes are not presently WAL-logged, so
|
|
replay will not update these indexes.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Full knowledge of running transactions is required before snapshots
|
|
can be taken. Transactions that use large numbers of subtransactions
|
|
(currently greater than 64) will delay the start of read only
|
|
connections until the completion of the longest running write transaction.
|
|
If this situation occurs, explanatory messages will be sent to the server log.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Valid starting points for standby queries are generated at each
|
|
checkpoint on the master. If the standby is shut down while the master
|
|
is in a shutdown state, it might not be possible to re-enter Hot Standby
|
|
until the primary is started up, so that it generates further starting
|
|
points in the WAL logs. This situation isn't a problem in the most
|
|
common situations where it might happen. Generally, if the primary is
|
|
shut down and not available anymore, that's likely due to a serious
|
|
failure that requires the standby being converted to operate as
|
|
the new primary anyway. And in situations where the primary is
|
|
being intentionally taken down, coordinating to make sure the standby
|
|
becomes the new primary smoothly is also standard procedure.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
At the end of recovery, <literal>AccessExclusiveLocks</> held by prepared transactions
|
|
will require twice the normal number of lock table entries. If you plan
|
|
on running either a large number of concurrent prepared transactions
|
|
that normally take <literal>AccessExclusiveLocks</>, or you plan on having one
|
|
large transaction that takes many <literal>AccessExclusiveLocks</>, you are
|
|
advised to select a larger value of <varname>max_locks_per_transaction</>,
|
|
perhaps as much as twice the value of the parameter on
|
|
the primary server. You need not consider this at all if
|
|
your setting of <varname>max_prepared_transactions</> is 0.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
The Serializable transaction isolation level is not yet available in hot
|
|
standby. (See <xref linkend="xact-serializable"> and
|
|
<xref linkend="serializable-consistency"> for details.)
|
|
An attempt to set a transaction to the serializable isolation level in
|
|
hot standby mode will generate an error.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
</para>
|
|
</sect2>
|
|
|
|
</sect1>
|
|
|
|
</chapter>
|