mirror of
https://github.com/postgres/postgres.git
synced 2025-05-28 05:21:27 +03:00
472 lines
18 KiB
Plaintext
472 lines
18 KiB
Plaintext
<!--
|
|
$Header: /cvsroot/pgsql/doc/src/sgml/backup.sgml,v 2.30 2003/08/31 17:32:18 petere Exp $
|
|
-->
|
|
<chapter id="backup">
|
|
<title>Backup and Restore</title>
|
|
|
|
<indexterm zone="backup"><primary>backup</></>
|
|
|
|
<para>
|
|
As everything that contains valuable data, <productname>PostgreSQL</>
|
|
databases should be backed up regularly. While the procedure is
|
|
essentially simple, it is important to have a basic understanding of
|
|
the underlying techniques and assumptions.
|
|
</para>
|
|
|
|
<para>
|
|
There are two fundamentally different approaches to backing up
|
|
<productname>PostgreSQL</> data:
|
|
<itemizedlist>
|
|
<listitem><para><acronym>SQL</> dump</para></listitem>
|
|
<listitem><para>File system level backup</para></listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
|
|
<sect1 id="backup-dump">
|
|
<title><acronym>SQL</> Dump</title>
|
|
|
|
<para>
|
|
The idea behind the SQL-dump method is to generate a text file with SQL
|
|
commands that, when fed back to the server, will recreate the
|
|
database in the same state as it was at the time of the dump.
|
|
<productname>PostgreSQL</> provides the utility program
|
|
<application>pg_dump</> for this purpose. The basic usage of this
|
|
command is:
|
|
<synopsis>
|
|
pg_dump <replaceable class="parameter">dbname</replaceable> > <replaceable class="parameter">outfile</replaceable>
|
|
</synopsis>
|
|
As you see, <application>pg_dump</> writes its results to the
|
|
standard output. We will see below how this can be useful.
|
|
</para>
|
|
|
|
<para>
|
|
<application>pg_dump</> is a regular <productname>PostgreSQL</>
|
|
client application (albeit a particularly clever one). This means
|
|
that you can do this backup procedure from any remote host that has
|
|
access to the database. But remember that <application>pg_dump</>
|
|
does not operate with special permissions. In particular, you must
|
|
have read access to all tables that you want to back up, so in
|
|
practice you almost always have to be a database superuser.
|
|
</para>
|
|
|
|
<para>
|
|
To specify which database server <application>pg_dump</> should
|
|
contact, use the command line options <option>-h
|
|
<replaceable>host</></> and <option>-p <replaceable>port</></>. The
|
|
default host is the local host or whatever your
|
|
<envar>PGHOST</envar> environment variable specifies. Similarly,
|
|
the default port is indicated by the <envar>PGPORT</envar>
|
|
environment variable or, failing that, by the compiled-in default.
|
|
(Conveniently, the server will normally have the same compiled-in
|
|
default.)
|
|
</para>
|
|
|
|
<para>
|
|
As any other <productname>PostgreSQL</> client application,
|
|
<application>pg_dump</> will by default connect with the database
|
|
user name that is equal to the current operating system user name. To override
|
|
this, either specify the <option>-U</option> option or set the
|
|
environment variable <envar>PGUSER</envar>. Remember that
|
|
<application>pg_dump</> connections are subject to the normal
|
|
client authentication mechanisms (which are described in <xref
|
|
linkend="client-authentication">).
|
|
</para>
|
|
|
|
<para>
|
|
Dumps created by <application>pg_dump</> are internally consistent,
|
|
that is, updates to the database while <application>pg_dump</> is
|
|
running will not be in the dump. <application>pg_dump</> does not
|
|
block other operations on the database while it is working.
|
|
(Exceptions are those operations that need to operate with an
|
|
exclusive lock, such as <command>VACUUM FULL</command>.)
|
|
</para>
|
|
|
|
<important>
|
|
<para>
|
|
When your database schema relies on OIDs (for instance as foreign
|
|
keys) you must instruct <application>pg_dump</> to dump the OIDs
|
|
as well. To do this, use the <option>-o</option> command line
|
|
option. <quote>Large objects</> are not dumped by default, either.
|
|
See <application>pg_dump</>'s command reference page if you use
|
|
large objects.
|
|
</para>
|
|
</important>
|
|
|
|
<sect2 id="backup-dump-restore">
|
|
<title>Restoring the dump</title>
|
|
|
|
<para>
|
|
The text files created by <application>pg_dump</> are intended to
|
|
be read in by the <application>psql</application> program. The
|
|
general command form to restore a dump is
|
|
<synopsis>
|
|
psql <replaceable class="parameter">dbname</replaceable> < <replaceable class="parameter">infile</replaceable>
|
|
</synopsis>
|
|
where <replaceable class="parameter">infile</replaceable> is what
|
|
you used as <replaceable class="parameter">outfile</replaceable>
|
|
for the <command>pg_dump</> command. The database <replaceable
|
|
class="parameter">dbname</replaceable> will not be created by this
|
|
command, you must create it yourself from <literal>template0</> before executing
|
|
<application>psql</> (e.g., with <literal>createdb -T template0
|
|
<replaceable class="parameter">dbname</></literal>).
|
|
<application>psql</> supports similar options to <application>pg_dump</>
|
|
for controlling the database server location and the user name. See
|
|
its reference page for more information.
|
|
</para>
|
|
|
|
<para>
|
|
If the objects in the original database were owned by different
|
|
users, then the dump will instruct <application>psql</> to connect
|
|
as each affected user in turn and then create the relevant
|
|
objects. This way the original ownership is preserved. This also
|
|
means, however, that all these users must already exist, and
|
|
furthermore that you must be allowed to connect as each of them.
|
|
It might therefore be necessary to temporarily relax the client
|
|
authentication settings.
|
|
</para>
|
|
|
|
<para>
|
|
Once restored, it is wise to run <command>ANALYZE</> on each
|
|
database so the optimizer has useful statistics. You
|
|
can also run <command>vacuumdb -a -z</> to <command>ANALYZE</> all
|
|
databases.
|
|
</para>
|
|
|
|
<para>
|
|
The ability of <application>pg_dump</> and <application>psql</> to
|
|
write to or read from pipes makes it possible to dump a database
|
|
directly from one server to another; for example:
|
|
<programlisting>
|
|
pg_dump -h <replaceable>host1</> <replaceable>dbname</> | psql -h <replaceable>host2</> <replaceable>dbname</>
|
|
</programlisting>
|
|
</para>
|
|
|
|
<important>
|
|
<para>
|
|
The dumps produced by <application>pg_dump</> are relative to
|
|
<literal>template0</>. This means that any languages, procedures,
|
|
etc. added to <literal>template1</> will also be dumped by
|
|
<application>pg_dump</>. As a result, when restoring, if you are
|
|
using a customized <literal>template1</>, you must create the
|
|
empty database from <literal>template0</>, as in the example
|
|
above.
|
|
</para>
|
|
</important>
|
|
|
|
<tip>
|
|
<para>
|
|
Restore performance can be improved by increasing <literal>SORT_MEM</>
|
|
(see <xref linkend="runtime-config-resource-memory">).
|
|
</para>
|
|
</tip>
|
|
</sect2>
|
|
|
|
<sect2 id="backup-dump-all">
|
|
<title>Using <command>pg_dumpall</></title>
|
|
|
|
<para>
|
|
The above mechanism is cumbersome and inappropriate when backing
|
|
up an entire database cluster. For this reason the
|
|
<application>pg_dumpall</> program is provided.
|
|
<application>pg_dumpall</> backs up each database in a given
|
|
cluster, and also preserves cluster-wide data such as
|
|
users and groups. The call sequence for
|
|
<application>pg_dumpall</> is simply
|
|
<synopsis>
|
|
pg_dumpall > <replaceable>outfile</>
|
|
</synopsis>
|
|
The resulting dump can be restored with <application>psql</>:
|
|
<synopsis>
|
|
psql template1 < <replaceable class="parameter">infile</replaceable>
|
|
</synopsis>
|
|
(Actually, you can specify any existing database name to start from,
|
|
but if you are reloading in an empty cluster then <literal>template1</>
|
|
is the only available choice.) It is always necessary to have
|
|
database superuser access when restoring a <application>pg_dumpall</>
|
|
dump, as that is required to restore the user and group information.
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2 id="backup-dump-large">
|
|
<title>Large Databases</title>
|
|
|
|
<para>
|
|
Since <productname>PostgreSQL</productname> allows tables larger
|
|
than the maximum file size on your system, it can be problematic
|
|
to dump such a table to a file, since the resulting file will likely
|
|
be larger than the maximum size allowed by your system. As
|
|
<application>pg_dump</> can write to the standard output, you can
|
|
just use standard Unix tools to work around this possible problem.
|
|
</para>
|
|
|
|
<formalpara>
|
|
<title>Use compressed dumps.</title>
|
|
<para>
|
|
You can use your favorite compression program, for example
|
|
<application>gzip</application>.
|
|
|
|
<programlisting>
|
|
pg_dump <replaceable class="parameter">dbname</replaceable> | gzip > <replaceable class="parameter">filename</replaceable>.gz
|
|
</programlisting>
|
|
|
|
Reload with
|
|
|
|
<programlisting>
|
|
createdb <replaceable class="parameter">dbname</replaceable>
|
|
gunzip -c <replaceable class="parameter">filename</replaceable>.gz | psql <replaceable class="parameter">dbname</replaceable>
|
|
</programlisting>
|
|
|
|
or
|
|
|
|
<programlisting>
|
|
cat <replaceable class="parameter">filename</replaceable>.gz | gunzip | psql <replaceable class="parameter">dbname</replaceable>
|
|
</programlisting>
|
|
</para>
|
|
</formalpara>
|
|
|
|
<formalpara>
|
|
<title>Use <command>split</>.</title>
|
|
<para>
|
|
The <command>split</command> command
|
|
allows you to split the output into pieces that are
|
|
acceptable in size to the underlying file system. For example, to
|
|
make chunks of 1 megabyte:
|
|
|
|
<programlisting>
|
|
pg_dump <replaceable class="parameter">dbname</replaceable> | split -b 1m - <replaceable class="parameter">filename</replaceable>
|
|
</programlisting>
|
|
|
|
Reload with
|
|
|
|
<programlisting>
|
|
createdb <replaceable class="parameter">dbname</replaceable>
|
|
cat <replaceable class="parameter">filename</replaceable>* | psql <replaceable class="parameter">dbname</replaceable>
|
|
</programlisting>
|
|
</para>
|
|
</formalpara>
|
|
|
|
<formalpara>
|
|
<title>Use the custom dump format.</title>
|
|
<para>
|
|
If <productname>PostgreSQL</productname> was built on a system with the <application>zlib</> compression library
|
|
installed, the custom dump format will compress data as it writes it
|
|
to the output file. For large databases, this will produce similar dump
|
|
sizes to using <command>gzip</command>, but has the added advantage that the tables can be
|
|
restored selectively. The following command dumps a database using the
|
|
custom dump format:
|
|
|
|
<programlisting>
|
|
pg_dump -Fc <replaceable class="parameter">dbname</replaceable> > <replaceable class="parameter">filename</replaceable>
|
|
</programlisting>
|
|
|
|
See the <application>pg_dump</> and <application>pg_restore</> reference pages for details.
|
|
</para>
|
|
</formalpara>
|
|
|
|
</sect2>
|
|
|
|
<sect2 id="backup-dump-caveats">
|
|
<title>Caveats</title>
|
|
|
|
<para>
|
|
<application>pg_dump</> (and by implication
|
|
<application>pg_dumpall</>) has a few limitations which stem from
|
|
the difficulty of reconstructing certain information from the system
|
|
catalogs.
|
|
</para>
|
|
|
|
<para>
|
|
Specifically, the order in which <application>pg_dump</> writes
|
|
the objects is not very sophisticated. This can lead to problems
|
|
for example when functions are used as column default values. The
|
|
only answer is to manually reorder the dump. If you created
|
|
circular dependencies in your schema then you will have more work
|
|
to do.
|
|
</para>
|
|
|
|
<para>
|
|
For reasons of backward compatibility, <application>pg_dump</>
|
|
does not dump large objects by default.<indexterm><primary>large
|
|
object</primary><secondary>backup</secondary></indexterm> To dump
|
|
large objects you must use either the custom or the TAR output
|
|
format, and use the <option>-b</> option in
|
|
<application>pg_dump</>. See the reference pages for details. The
|
|
directory <filename>contrib/pg_dumplo</> of the
|
|
<productname>PostgreSQL</> source tree also contains a program
|
|
that can dump large objects.
|
|
</para>
|
|
|
|
<para>
|
|
Please familiarize yourself with the
|
|
<citerefentry><refentrytitle>pg_dump</></> reference page.
|
|
</para>
|
|
</sect2>
|
|
</sect1>
|
|
|
|
<sect1 id="backup-file">
|
|
<title>File system level backup</title>
|
|
|
|
<para>
|
|
An alternative backup strategy is to directly copy the files that
|
|
<productname>PostgreSQL</> uses to store the data in the database. In
|
|
<xref linkend="creating-cluster"> it is explained where these files
|
|
are located, but you have probably found them already if you are
|
|
interested in this method. You can use whatever method you prefer
|
|
for doing usual file system backups, for example
|
|
|
|
<programlisting>
|
|
tar -cf backup.tar /usr/local/pgsql/data
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
There are two restrictions, however, which make this method
|
|
impractical, or at least inferior to the <application>pg_dump</>
|
|
method:
|
|
|
|
<orderedlist>
|
|
<listitem>
|
|
<para>
|
|
The database server <emphasis>must</> be shut down in order to
|
|
get a usable backup. Half-way measures such as disallowing all
|
|
connections will not work as there is always some buffering
|
|
going on. For this reason it is also not advisable to trust file
|
|
systems that claim to support <quote>consistent
|
|
snapshots</quote>. Information about stopping the server can be
|
|
found in <xref linkend="postmaster-shutdown">.
|
|
</para>
|
|
|
|
<para>
|
|
Needless to say that you also need to shut down the server
|
|
before restoring the data.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>
|
|
If you have dug into the details of the file system layout of the data you
|
|
may be tempted to try to back up or restore only certain
|
|
individual tables or databases from their respective files or
|
|
directories. This will <emphasis>not</> work because the
|
|
information contained in these files contains only half the
|
|
truth. The other half is in the commit log files
|
|
<filename>pg_clog/*</filename>, which contain the commit status of
|
|
all transactions. A table file is only usable with this
|
|
information. Of course it is also impossible to restore only a
|
|
table and the associated <filename>pg_clog</filename> data
|
|
because that would render all other tables in the database
|
|
cluster useless.
|
|
</para>
|
|
</listitem>
|
|
</orderedlist>
|
|
</para>
|
|
|
|
<para>
|
|
Also note that the file system backup will not necessarily be
|
|
smaller than an SQL dump. On the contrary, it will most likely be
|
|
larger. (<application>pg_dump</application> does not need to dump
|
|
the contents of indexes for example, just the commands to recreate
|
|
them.)
|
|
</para>
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="migration">
|
|
<title>Migration between releases</title>
|
|
|
|
<indexterm zone="migration">
|
|
<primary>upgrading</primary>
|
|
</indexterm>
|
|
|
|
<indexterm zone="migration">
|
|
<primary>version</primary>
|
|
<secondary>compatibility</secondary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
As a general rule, the internal data storage format is subject to
|
|
change between releases of <productname>PostgreSQL</>. This does not
|
|
apply to different <quote>patch levels</quote>, these always have
|
|
compatible storage formats. For example, releases 7.0.1, 7.1.2, and
|
|
7.2 are not compatible, whereas 7.1.1 and 7.1.2 are. When you
|
|
update between compatible versions, then you can simply reuse the
|
|
data area in disk by the new executables. Otherwise you need to
|
|
<quote>back up</> your data and <quote>restore</> it on the new
|
|
server, using <application>pg_dump</>. (There are checks in place
|
|
that prevent you from doing the wrong thing, so no harm can be done
|
|
by confusing these things.) The precise installation procedure is
|
|
not subject of this section; these details are in <xref linkend="installation">.
|
|
</para>
|
|
|
|
<para>
|
|
The least downtime can be achieved by installing the new server in
|
|
a different directory and running both the old and the new servers
|
|
in parallel, on different ports. Then you can use something like
|
|
|
|
<programlisting>
|
|
pg_dumpall -p 5432 | psql -d template1 -p 6543
|
|
</programlisting>
|
|
|
|
to transfer your data. Or use an intermediate file if you want.
|
|
Then you can shut down the old server and start the new server at
|
|
the port the old one was running at. You should make sure that the
|
|
database is not updated after you run <application>pg_dumpall</>,
|
|
otherwise you will obviously lose that data. See <xref
|
|
linkend="client-authentication"> for information on how to prohibit
|
|
access. In practice you probably want to test your client
|
|
applications on the new setup before switching over.
|
|
</para>
|
|
|
|
<para>
|
|
If you cannot or do not want to run two servers in parallel you can
|
|
do the back up step before installing the new version, bring down
|
|
the server, move the old version out of the way, install the new
|
|
version, start the new server, restore the data. For example:
|
|
|
|
<programlisting>
|
|
pg_dumpall > backup
|
|
pg_ctl stop
|
|
mv /usr/local/pgsql /usr/local/pgsql.old
|
|
cd ~/postgresql-&version;
|
|
gmake install
|
|
initdb -D /usr/local/pgsql/data
|
|
postmaster -D /usr/local/pgsql/data
|
|
psql template1 < backup
|
|
</programlisting>
|
|
|
|
See <xref linkend="runtime"> about ways to start and stop the
|
|
server and other details. The installation instructions will advise
|
|
you of strategic places to perform these steps.
|
|
</para>
|
|
|
|
<note>
|
|
<para>
|
|
When you <quote>move the old installation out of the way</quote>
|
|
it is no longer perfectly usable. Some parts of the installation
|
|
contain information about where the other parts are located. This
|
|
is usually not a big problem but if you plan on using two
|
|
installations in parallel for a while you should assign them
|
|
different installation directories at build time.
|
|
</para>
|
|
</note>
|
|
</sect1>
|
|
</chapter>
|
|
|
|
<!-- Keep this comment at the end of the file
|
|
Local variables:
|
|
mode:sgml
|
|
sgml-omittag:nil
|
|
sgml-shorttag:t
|
|
sgml-minimize-attributes:nil
|
|
sgml-always-quote-attributes:t
|
|
sgml-indent-step:1
|
|
sgml-indent-tabs-mode:nil
|
|
sgml-indent-data:t
|
|
sgml-parent-document:nil
|
|
sgml-default-dtd-file:"./reference.ced"
|
|
sgml-exposed-tags:nil
|
|
sgml-local-catalogs:("/usr/share/sgml/catalog")
|
|
sgml-local-ecat-files:nil
|
|
End:
|
|
-->
|