Mirror of https://github.com/postgres/postgres.git (synced 2025-07-28 23:42:10 +03:00)
Minor improvements in backup and recovery:
- create a separate archive_mode GUC, on which archive_command is dependent
- %r option in recovery.conf sends last restartpoint to recovery command
- %r used in pg_standby, updated README
- minor other code cleanup in pg_standby
- doc on Warm Standby now mentions pg_standby and %r
- log_restartpoints recovery option emits LOG message at each restartpoint
- end of recovery now displays last transaction end time, as requested by Warren Little; also shown at each restartpoint
- restart archiver if needed to carry away WAL files at shutdown

Simon Riggs
@@ -2,16 +2,18 @@ pg_standby README 2006/12/08 Simon Riggs
 
 o What is pg_standby?
 
-pg_standby is a production-ready program that can be used to
-create a Warm Standby server. Other configuration is required
-as well, all of which is described in the main server manual.
+pg_standby allows the creation of a Warm Standby server.
+It is designed to be a production-ready program, as well as a
+customisable template should you require specific modifications.
+Other configuration is required as well, all of which is
+described in the main server manual.
 
 The program is designed to be a wait-for restore_command,
 required to turn a normal archive recovery into a Warm Standby.
 Within the restore_command of the recovery.conf you could
 configure pg_standby in the following way:
 
-restore_command = 'pg_standby archiveDir %f %p'
+restore_command = 'pg_standby archiveDir %f %p %r'
 
 which would be sufficient to define that files will be restored
 from archiveDir.
@@ -42,18 +44,27 @@ o How to use pg_standby?
 
 The basic usage should be like this:
 
-restore_command = 'pg_standby archiveDir %f %p'
+restore_command = 'pg_standby archiveDir %f %p %r'
 
 with the pg_standby command usage as
 
-pg_standby [OPTION]... [ARCHIVELOCATION] [NEXTWALFILE] [XLOGFILEPATH]
+pg_standby [OPTION]... ARCHIVELOCATION NEXTWALFILE XLOGFILEPATH [RESTARTWALFILE]
 
 When used within the restore_command the %f and %p macros
 will provide the actual file and path required for the restore/recovery.
 
+pg_standby assumes that ARCHIVELOCATION is directory accessible by the
+server-owning user.
+
+If RESTARTWALFILE is specified, typically by using the %r option, then all files
+prior to this file will be removed from ARCHIVELOCATION. This then minimises
+the number of files that need to be held, whilst at the same time maintaining
+restart capability. This capability additionally assumes that ARCHIVELOCATION
+directory is writable.
+
 o options
 
-pg_standby has number of options.
+pg_standby allows the following command line switches
 
 -c
 use copy/cp command to restore WAL files from archive
@@ -63,7 +74,10 @@ o options
 
 -k numfiles
 Cleanup files in the archive so that we maintain no more
-than this many files in the archive.
+than this many files in the archive. This parameter will
+be silently ignored if RESTARTWALFILE is specified, since
+that specification method is more accurate in determining
+the correct cut-off point in archive.
 
 You should be wary against setting this number too low,
 since this may mean you cannot restart the standby. This
@@ -75,8 +89,15 @@ o options
 It is wholly unrelated to the setting of checkpoint_segments
 on either primary or standby.
 
+Setting numfiles to be zero will disable deletion of files
+from ARCHIVELOCATION.
+
 If in doubt, use a large value or do not set a value at all.
 
+If you specify neither RESTARTWALFILE nor -k, then -k 0
+will be assumed, i.e. keep all files in archive.
+Default=0, Min=0
+
 -l
 use ln command to restore WAL files from archive
 WAL files will remain in archive
@@ -84,6 +105,8 @@ o options
 Link is more efficient, but the default is copy to
 allow you to maintain the WAL archive for recovery
 purposes as well as high-availability.
+The default setting is not necessarily recommended,
+consult the main database server manual for discussion.
 
 This option uses the Windows Vista command mklink
 to provide a file-to-file symbolic link. -l will
@@ -99,14 +122,14 @@ o options
 the failure back to the database server. This will be
 interpreted as and end of recovery and the Standby will come
 up fully as a result.
-Default=3
+Default=3, Min=0
 
 -s sleeptime
 the number of seconds to sleep between testing to see
 if the file to be restored is available in the archive yet.
 The default setting is not necessarily recommended,
 consult the main database server manual for discussion.
-Default=5
+Default=5, Min=1, Max=60
 
 -t triggerfile
 the presence of the triggerfile will cause recovery to end
@@ -119,9 +142,10 @@ o options
 -w maxwaittime
 the maximum number of seconds to wait for the next file,
 after which recovery will end and the Standby will come up.
+A setting of zero means wait forever.
 The default setting is not necessarily recommended,
 consult the main database server manual for discussion.
-Default=0
+Default=0, Min=0
 
 Note: --help is not supported since pg_standby is not intended
 for interactive use, except during dev/test
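
The Min/Max limits documented above (for example Min=1, Max=60 for -s) imply a range check when each option is parsed. The C code in this commit converts option values with atoi(); the following is only a minimal sketch of a stricter alternative, and parse_bounded_int() is a hypothetical helper name that does not appear in pg_standby.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of pg_standby): convert arg to an int and
 * accept it only if min <= value <= max. On success, stores the value in
 * *out and returns true; otherwise leaves *out untouched and returns false.
 */
static bool
parse_bounded_int(const char *arg, int min, int max, int *out)
{
    char *end;
    long  val;

    assert(arg != NULL && out != NULL);

    errno = 0;
    val = strtol(arg, &end, 10);

    /* Reject empty input, trailing junk, and values that overflow long. */
    if (end == arg || *end != '\0' || errno == ERANGE)
        return false;

    /* Enforce the documented range, e.g. 1..60 for -s. */
    if (val < min || val > max)
        return false;

    *out = (int) val;
    return true;
}
```

With this helper, a call such as parse_bounded_int(optarg, 1, 60, &sleeptime) would reject "0", "61" and "5x", which atoi() alone cannot distinguish from valid input.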
@@ -148,8 +172,7 @@ o examples
 Note that backslashes need to be doubled in the archive_command, but
 *not* in the restore_command, in 8.2, 8.1, 8.0 on Windows.
 
-restore_command = 'pg_standby -c -d -s 5 -w 0 -t C:\pgsql.trigger.5442
-..\archive %f %p 2>> standby.log'
+restore_command = 'pg_standby -c -d -s 5 -w 0 -t C:\pgsql.trigger.5442 ..\archive %f %p 2>> standby.log'
 
 which will
 - use a copy command to restore WAL files from archive
@@ -158,7 +181,26 @@ o examples
 - never timeout if file not found
 - stop waiting when a trigger file called C:\pgsql.trigger.5442 appears
 
+o supported versions
+
+pg_standby is designed to work with PostgreSQL 8.2 and later. It is
+currently compatible across minor changes between the way 8.3 and 8.2
+operate.
+
+PostgreSQL 8.3 provides the %r command line substitution, designed to
+let pg_standby know the last file it needs to keep. If the last
+parameter is omitted, no error is generated, allowing pg_standby to
+function correctly with PostgreSQL 8.2 also. With PostgreSQL 8.2,
+the -k option must be used if archive cleanup is required. This option
+remains available in 8.3.
+
 o reported test success
 
 SUSE Linux 10.2
 Windows XP Pro
+
+o additional design notes
+
+The use of a move command seems like it would be a good idea, but
+this would prevent recovery from being restartable. Also, the last WAL
+file is always requested twice from the archive.
@@ -47,17 +47,20 @@ int maxwaittime = 0; /* how long are we prepared to wait for? */
 int keepfiles = 0; /* number of WAL files to keep, 0 keep all */
 int maxretries = 3; /* number of retries on restore command */
 bool debug = false; /* are we debugging? */
-bool triggered = false;
-bool signaled = false;
+bool triggered = false; /* have we been triggered? */
+bool need_cleanup = false; /* do we need to remove files from archive? */
+
+static volatile sig_atomic_t signaled = false;
 
 char *archiveLocation; /* where to find the archive? */
 char *triggerPath; /* where to find the trigger file? */
-char *xlogFilePath; /* where we are going to restore to */
+char *xlogFilePath; /* where we are going to restore to */
 char *nextWALFileName; /* the file we need to get from archive */
+char *restartWALFileName; /* the file from which we can restart restore */
 char *priorWALFileName; /* the file we need to get from archive */
 char WALFilePath[MAXPGPATH];/* the file path including archive */
 char restoreCommand[MAXPGPATH]; /* run this to restore */
-char inclusiveCleanupFileName[MAXPGPATH]; /* the file we need to get from archive */
+char exclusiveCleanupFileName[MAXPGPATH]; /* the file we need to get from archive */
 
 #define RESTORE_COMMAND_COPY 0
 #define RESTORE_COMMAND_LINK 1
@@ -204,36 +207,15 @@ CustomizableNextWALFileReady()
 static void
 CustomizableCleanupPriorWALFiles(void)
 {
-    uint32 tli,
-           log,
-           seg;
-    int signed_log = 0;
-
-    if (keepfiles > 0)
-    {
-        sscanf(nextWALFileName, "%08X%08X%08X", &tli, &log, &seg);
-        signed_log = log - (keepfiles / MaxSegmentsPerLogFile);
-        if (keepfiles <= seg)
-            seg -= keepfiles;
-        else
-        {
-            seg = MaxSegmentsPerLogFile - (keepfiles % MaxSegmentsPerLogFile);
-            signed_log--;
-        }
-        log = (uint32) signed_log;
-    }
-
     /*
      * Work out name of prior file from current filename
      */
-    if (keepfiles > 0 && signed_log >= 0 && nextWALFileType == XLOG_DATA)
+    if (nextWALFileType == XLOG_DATA)
     {
         int rc;
         DIR *xldir;
         struct dirent *xlde;
 
-        XLogFileName(inclusiveCleanupFileName, tli, log, seg);
-
         /*
          * Assume its OK to keep failing. The failure situation may change over
          * time, so we'd rather keep going on the main processing than fail
@@ -252,11 +234,13 @@ CustomizableCleanupPriorWALFiles(void)
                  * complicated.
                  *
                  * We use the alphanumeric sorting property of the filenames to decide
-                 * which ones are earlier than the inclusiveCleanupFileName file.
+                 * which ones are earlier than the exclusiveCleanupFileName file.
+                 * Note that this means files are not removed in the order they were
+                 * originally written, in case this worries you.
                  */
                 if (strlen(xlde->d_name) == XLOG_DATA_FNAME_LEN &&
                     strspn(xlde->d_name, "0123456789ABCDEF") == XLOG_DATA_FNAME_LEN &&
-                    strcmp(xlde->d_name + 8, inclusiveCleanupFileName + 8) <= 0)
+                    strcmp(xlde->d_name + 8, exclusiveCleanupFileName + 8) < 0)
                 {
 #ifdef WIN32
                     snprintf(WALFilePath, MAXPGPATH, "%s\\%s", archiveLocation, xlde->d_name);
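
The three-part test in the hunk above (exact length, hex digits only, string comparison that skips the first 8 characters) can be tried in isolation. The sketch below is an illustration only: is_older_wal_file() and WAL_FNAME_LEN are hypothetical names, and the length of 24 is an assumption standing in for XLOG_DATA_FNAME_LEN, whose definition is not shown in this diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumed length: 8 hex digits each for timeline, log id and segment. */
#define WAL_FNAME_LEN 24

/*
 * Hypothetical helper: returns true when name looks like a WAL data file
 * and sorts strictly before cutoff, ignoring the 8-character timeline
 * prefix. The cutoff file itself is kept (exclusive comparison).
 */
static bool
is_older_wal_file(const char *name, const char *cutoff)
{
    assert(name != NULL && cutoff != NULL);
    assert(strlen(cutoff) == WAL_FNAME_LEN);

    return strlen(name) == WAL_FNAME_LEN &&
           strspn(name, "0123456789ABCDEF") == WAL_FNAME_LEN &&
           strcmp(name + 8, cutoff + 8) < 0;
}
```

Because the comparison starts at offset 8, a file from a different timeline with a lower log/segment number is still treated as older, and names such as "00000001.history" fail the length and hex checks and are never selected.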
@@ -265,22 +249,26 @@ CustomizableCleanupPriorWALFiles(void)
 #endif
 
                     if (debug)
-                        fprintf(stderr, "\npg_standby: removing \"%s\"\n", WALFilePath);
+                        fprintf(stderr, "\nremoving \"%s\"", WALFilePath);
 
                     rc = unlink(WALFilePath);
-                    if (rc !=0 )
-                        fprintf(stderr, "\npg_standby: ERROR failed to remove \"%s\": %s\n", WALFilePath, strerror(errno));
-
-
+                    if (rc != 0)
+                    {
+                        fprintf(stderr, "\npg_standby: ERROR failed to remove \"%s\": %s",
+                                WALFilePath, strerror(errno));
+                        break;
+                    }
                 }
             }
+            if (debug)
+                fprintf(stderr, "\n");
         }
         else
             fprintf(stderr, "pg_standby: archiveLocation \"%s\" open error\n", archiveLocation);
 
         closedir(xldir);
-        fflush(stderr);
     }
+    fflush(stderr);
 }
 
 /* =====================================================================
@@ -288,6 +276,61 @@ CustomizableCleanupPriorWALFiles(void)
  * =====================================================================
  */
 
+/*
+ * SetWALFileNameForCleanup()
+ *
+ * Set the earliest WAL filename that we want to keep on the archive
+ * and decide whether we need_cleanup
+ */
+static bool
+SetWALFileNameForCleanup(void)
+{
+    uint32 tli = 1,
+           log = 0,
+           seg = 0;
+    uint32 log_diff = 0,
+           seg_diff = 0;
+    bool cleanup = false;
+
+    if (restartWALFileName)
+    {
+        strcpy(exclusiveCleanupFileName, restartWALFileName);
+        return true;
+    }
+
+    if (keepfiles > 0)
+    {
+        sscanf(nextWALFileName, "%08X%08X%08X", &tli, &log, &seg);
+        if (tli > 0 && log >= 0 && seg > 0)
+        {
+            log_diff = keepfiles / MaxSegmentsPerLogFile;
+            seg_diff = keepfiles % MaxSegmentsPerLogFile;
+            if (seg_diff > seg)
+            {
+                log_diff++;
+                seg = MaxSegmentsPerLogFile - seg_diff;
+            }
+            else
+                seg -= seg_diff;
+
+            if (log >= log_diff)
+            {
+                log -= log_diff;
+                cleanup = true;
+            }
+            else
+            {
+                log = 0;
+                seg = 0;
+            }
+        }
+    }
+
+    XLogFileName(exclusiveCleanupFileName, tli, log, seg);
+
+    return cleanup;
+}
+
 /*
  * CheckForExternalTrigger()
  *
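
The -k arithmetic in SetWALFileNameForCleanup() steps back keepfiles segments from the requested file, borrowing from the log id when the segment number would go below zero. The following standalone sketch illustrates that idea; it is not the committed code. cutoff_for_keepfiles() and SEGS_PER_LOG are hypothetical names, the value 255 is an assumption about MaxSegmentsPerLogFile with 16MB segments, and the borrow step here subtracts only the part of seg_diff not already covered by seg, which is this sketch's own choice and differs from the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed value of MaxSegmentsPerLogFile for 16MB WAL segments. */
#define SEGS_PER_LOG 255u

/*
 * Hypothetical helper: write into out the name of the file that lies
 * keepfiles segments before next_wal. Returns false when keepfiles is 0,
 * the name cannot be parsed, or fewer than keepfiles segments precede it.
 */
static bool
cutoff_for_keepfiles(const char *next_wal, uint32_t keepfiles,
                     char *out, size_t outlen)
{
    unsigned int tli, log, seg;
    uint32_t     log_diff, seg_diff;

    assert(next_wal != NULL && out != NULL);

    if (keepfiles == 0)
        return false;
    if (sscanf(next_wal, "%8X%8X%8X", &tli, &log, &seg) != 3)
        return false;

    log_diff = keepfiles / SEGS_PER_LOG;
    seg_diff = keepfiles % SEGS_PER_LOG;

    if (seg_diff > seg)
    {
        /* Borrow one whole log file to cover the shortfall. */
        log_diff++;
        seg = SEGS_PER_LOG - (seg_diff - seg);
    }
    else
        seg -= seg_diff;

    if (log < log_diff)
        return false;       /* not enough history to step back that far */
    log -= log_diff;

    snprintf(out, outlen, "%08X%08X%08X", tli, log, seg);
    return true;
}
```

For example, stepping back 5 segments from log 1, segment 2 passes through (1,1), (1,0), (0,254), (0,253) and lands on (0,252), so the sketch yields a name ending in 00000000000000FC.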
@@ -353,7 +396,7 @@ RestoreWALFileForRecovery(void)
         {
             if (debug)
             {
-                fprintf(stderr, " success\n");
+                fprintf(stderr, " OK");
                 fflush(stderr);
             }
             return true;
@@ -370,31 +413,30 @@ RestoreWALFileForRecovery(void)
 }
 
 static void
-usage()
+usage(void)
 {
     fprintf(stderr, "\npg_standby allows Warm Standby servers to be configured\n");
     fprintf(stderr, "Usage:\n");
-    fprintf(stderr, " pg_standby [OPTION]... [ARCHIVELOCATION] [NEXTWALFILE] [XLOGFILEPATH]\n");
-    fprintf(stderr, " note space between [ARCHIVELOCATION] and [NEXTWALFILE]\n");
-    fprintf(stderr, "with main intended use via restore_command in the recovery.conf\n");
-    fprintf(stderr, " restore_command = 'pg_standby [OPTION]... [ARCHIVELOCATION] %%f %%p'\n");
-    fprintf(stderr, "e.g. restore_command = 'pg_standby -l /mnt/server/archiverdir %%f %%p'\n");
+    fprintf(stderr, " pg_standby [OPTION]... ARCHIVELOCATION NEXTWALFILE XLOGFILEPATH [RESTARTWALFILE]\n");
+    fprintf(stderr, " note space between ARCHIVELOCATION and NEXTWALFILE\n");
+    fprintf(stderr, "with main intended use as a restore_command in the recovery.conf\n");
+    fprintf(stderr, " restore_command = 'pg_standby [OPTION]... ARCHIVELOCATION %%f %%p %%r'\n");
+    fprintf(stderr, "e.g. restore_command = 'pg_standby -l /mnt/server/archiverdir %%f %%p %%r'\n");
     fprintf(stderr, "\nOptions:\n");
     fprintf(stderr, " -c copies file from archive (default)\n");
     fprintf(stderr, " -d generate lots of debugging output (testing only)\n");
-    fprintf(stderr, " -k [NUMFILESTOKEEP] keeps history of # files in archives; unlinks/removes files beyond that\n");
+    fprintf(stderr, " -k NUMFILESTOKEEP if RESTARTWALFILE not used, removes files prior to limit (0 keeps all)\n");
     fprintf(stderr, " -l links into archive (leaves file in archive)\n");
-    fprintf(stderr, " -t [TRIGGERFILE] defines a trigger file to initiate failover (no default)\n");
-    fprintf(stderr, " -r [MAXRETRIES] maximum number of times to retry, with progressive wait (default=3)\n");
-    fprintf(stderr, " -s [SLEEPTIME] number of seconds to wait between file checks (default=5)\n");
-    fprintf(stderr, " -w [MAXWAITTIME] max number of seconds to wait for a file (0 disables)(default=0)\n");
+    fprintf(stderr, " -r MAXRETRIES max number of times to retry, with progressive wait (default=3)\n");
+    fprintf(stderr, " -s SLEEPTIME seconds to wait between file checks (min=1, max=60, default=5)\n");
+    fprintf(stderr, " -t TRIGGERFILE defines a trigger file to initiate failover (no default)\n");
+    fprintf(stderr, " -w MAXWAITTIME max seconds to wait for a file (0=no limit)(default=0)\n");
     fflush(stderr);
 }
 
 static void
 sighandler(int sig)
 {
-    triggered = true;
     signaled = true;
 }
 
@@ -419,9 +461,9 @@ main(int argc, char **argv)
                 break;
             case 'k': /* keepfiles */
                 keepfiles = atoi(optarg);
-                if (keepfiles <= 0)
+                if (keepfiles < 0)
                 {
-                    fprintf(stderr, "usage: pg_standby -k keepfiles must be > 0\n");
+                    fprintf(stderr, "usage: pg_standby -k keepfiles must be >= 0\n");
                     usage();
                     exit(2);
                 }
@@ -433,7 +475,7 @@ main(int argc, char **argv)
                 maxretries = atoi(optarg);
                 if (maxretries < 0)
                 {
-                    fprintf(stderr, "usage: pg_standby -r maxretries must be > 0\n");
+                    fprintf(stderr, "usage: pg_standby -r maxretries must be >= 0\n");
                     usage();
                     exit(2);
                 }
@@ -519,23 +561,28 @@ main(int argc, char **argv)
         exit(2);
     }
 
+    if (optind < argc)
+    {
+        restartWALFileName = argv[optind];
+        optind++;
+    }
+
     CustomizableInitialize();
 
+    need_cleanup = SetWALFileNameForCleanup();
+
     if (debug)
     {
         fprintf(stderr, "\nTrigger file : %s", triggerPath ? triggerPath : "<not set>");
-        fprintf(stderr, "\nWaiting for WAL file : %s", WALFilePath);
-        fprintf(stderr, "\nWAL file path : %s", nextWALFileName);
+        fprintf(stderr, "\nWaiting for WAL file : %s", nextWALFileName);
+        fprintf(stderr, "\nWAL file path : %s", WALFilePath);
         fprintf(stderr, "\nRestoring to... : %s", xlogFilePath);
         fprintf(stderr, "\nSleep interval : %d second%s",
             sleeptime, (sleeptime > 1 ? "s" : " "));
         fprintf(stderr, "\nMax wait interval : %d %s",
             maxwaittime, (maxwaittime > 0 ? "seconds" : "forever"));
         fprintf(stderr, "\nCommand for restore : %s", restoreCommand);
-        if (keepfiles > 0)
-            fprintf(stderr, "\nNum archived files kept : last %d files", keepfiles);
-        else
-            fprintf(stderr, "\nNum archived files kept : all files");
+        fprintf(stderr, "\nKeep archive history : %s and later", exclusiveCleanupFileName);
         fflush(stderr);
     }
 
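
The hunk above makes the fourth positional argument optional: if anything remains after XLOGFILEPATH it becomes restartWALFileName, otherwise the pointer stays unset and -k (or keeping everything) applies. A minimal standalone sketch of that "three required, one optional" parsing is shown below. StandbyArgs and parse_positional() are hypothetical names, and `first` stands in for the value of optind after option parsing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the positional arguments. */
typedef struct
{
    const char *archive_location;
    const char *next_wal_file;
    const char *xlog_file_path;
    const char *restart_wal_file;   /* NULL when %r was not supplied */
} StandbyArgs;

/*
 * Hypothetical helper: read three mandatory and one optional positional
 * argument starting at argv[first]. Returns false if fewer than three
 * positional arguments are present.
 */
static bool
parse_positional(int argc, char **argv, int first, StandbyArgs *out)
{
    assert(argv != NULL && out != NULL);

    if (argc - first < 3)
        return false;

    out->archive_location = argv[first];
    out->next_wal_file = argv[first + 1];
    out->xlog_file_path = argv[first + 2];
    out->restart_wal_file = (argc - first > 3) ? argv[first + 3] : NULL;
    return true;
}
```

Treating the absent argument as NULL mirrors the README's statement that omitting the last parameter is not an error, which is what lets the same restore_command shape work on a server that does not expand %r.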
@@ -572,6 +619,7 @@ main(int argc, char **argv)
 
         if (signaled)
         {
+            triggered = true;
             if (debug)
             {
                 fprintf(stderr, "\nsignaled to exit\n");
@@ -614,7 +662,7 @@ main(int argc, char **argv)
      * immediately after the failed restore, or when
      * we restart recovery.
      */
-    if (RestoreWALFileForRecovery())
+    if (RestoreWALFileForRecovery() && need_cleanup)
         CustomizableCleanupPriorWALFiles();
 
     return 0;