though must not update the last transaction timestamp.
Plus comment and message cleanup for recent named restore point.
Fujii Masao, minor changes by me
the standby has written, flushed, and applied the WAL. At the moment, this
is for informational purposes only, the values are only shown in
pg_stat_replication system view, but in the future they will also be needed
for synchronous replication.
Extracted from Simon riggs' synchronous replication patch by Robert Haas, with
some tweaking by me.
Specifying this option makes the server not wait for the
xlog to be archived, or emit a warning that it can't,
instead leaving the responsibility with the client.
This is useful when the log is being streamed using
the streaming protocol in parallel with the backup,
without having log archiving enabled.
new recovery.conf parameter recovery_target_name allows PITR to
specify named points as recovery targets.
Jaime Casanova, reviewed by Euler Taveira de Oliveira, plus minor edits
If the standby was streaming when trigger file arrives, check also in the
archive for additional WAL files. This is a corner case since it is
unlikely that we would trigger a failover while the master is still
available and sending data to standby, while at the same time running in
archive mode and also while the streaming standby has fallen behind archive.
Someone would eventually be unlucky; we must plug all gaps however small.
Fujii Masao
Prior to 9.0, restartpoints never created, deleted, or recycled WAL
files, but now they can. This code makes log_checkpoints treat
checkpoints and restartpoints symmetrically. It also adjusts up
the documentation of the parameter to mention restartpoints.
Fujii Masao. Docs by me, as suggested by Itagaki Takahiro.
With this patch, pg_basebackup doesn't write a backup_label file in the
data directory, so it doesn't interfere with a pg_start/stop_backup() based
backup anymore. backup_label is still included in the backup, but it is
injected directly into the tar stream.
Heikki Linnakangas, reviewed by Fujii Masao and Magnus Hagander.
If wal_buffers is initially set to -1 (which is now the default), it's
replaced by 1/32nd of shared_buffers, with a minimum of 8 (the old default)
and a maximum of the XLOG segment size. The allowed range for manual
settings is still from 4 up to whatever will fit in shared memory.
Greg Smith, with implementation correction by me.
Move the actual functionality into a separate function that's
easier to call internally, and change the SQL-callable function
to be a wrapper calling this.
Also create a pg_abort_backup() function, only callable internally,
that does only the most vital parts of pg_stop_backup(), making it
safe(r) to call from error handlers.
The contents of an unlogged table are WAL-logged; thus, they are not
available on standby servers and are truncated whenever the database
system enters recovery. Indexes on unlogged tables are also unlogged.
Unlogged GiST indexes are not currently supported.
This privilege is required to do Streaming Replication, instead of
superuser, making it possible to set up a SR slave that doesn't
have write permissions on the master.
Superuser privileges do NOT override this check, so in order to
use the default superuser account for replication it must be
explicitly granted the REPLICATION permissions. This is backwards
incompatible change, in the interest of higher default security.
Purely cosmetic patch to make our coding standards more consistent ---
we were doing symbolic some places and octal other places. This patch
fixes all C-coded uses of mkdir, chmod, and umask. There might be some
other calls I missed. Inconsistency noted while researching tablespace
directory permissions issue.
an old transaction running in the master, and a lot of transactions have
started and finished since, and a WAL-record is written in the gap between
the creating the running-xacts snapshot and WAL-logging it, recovery will fail
with "too many KnownAssignedXids" error. This bug was reported by
Joachim Wieland on Nov 19th.
In the same scenario, when fewer transactions have started so that all the
xids fit in KnownAssignedXids despite the first bug, a more serious bug
arises. We incorrectly initialize the clog code with the oldest still running
transaction, and when we see the WAL record belonging to a transaction with
an XID larger than one that committed already before the checkpoint we're
recovering from, we zero the clog page containing the already committed
transaction, leading to data loss.
In hindsight, trying to track xids in the known-assigned-xids array before
seeing the running-xacts record was too complicated. To fix that, hold
XidGenLock while the running-xacts snapshot is taken and WAL-logged. That
ensures that no transaction can begin or end in that gap, so that in recvoery
we know that the snapshot contains all transactions running at that point in
WAL.
This eliminates some crufty, special-purpose code and, as a non-trivial
side benefit, allows recovery.conf parameters to be unquoted.
Dimitri Fontaine, with review and cleanup by Alvaro Herrera, Itagaki
Takahiro, and me.
location read from backup label file can be found: wasShutdown was set
incorrectly when a backup label file was found.
Jeff Davis, with a little tweaking by me.
rather than 0/0, so that we can safely use 0/0 as an invalid value. This is a
more future-proof fix for the corner-case bug in streaming replication that
was fixed yesterday. We had a similar corner-case bug with log/seg 0/0 back in
February as well. Avoiding 0/0 as a valid value should prevent bugs like that
in the future. Per Tom Lane's idea.
Back-patch to 9.0. Since this only affects bootstrapping, it makes no
difference to existing installations. We don't need to worry about the
bug in existing installations, because if you've managed to get past the
initial base backup already, you won't hit the bug in the future either.
streaming replication. We used log/seg 0/0 to indicate that no WAL segments
have been removed since startup, but 0/0 is a valid value for the very first
WAL segment after initdb. To make that disambiguous, store
(latest removed WAL segment + 1) in the global variable.
Per report from Matt Chesler, also reproduced by Greg Smith.
that WAL file containing the checkpoint redo-location can be found. This
avoids making the cluster irrecoverable if the redo location is in an earlie
WAL file than the checkpoint record.
Report, analysis and patch by Jeff Davis, with small changes by me.
new WAL arrives via streaming replication. This reduces the latency, and
also allows us to use a longer polling interval, which is good for energy
efficiency.
We still need to poll to check for the appearance of a trigger file, but
the interval is now 5 seconds (instead of 100ms), like when waiting for
a new WAL segment to appear in WAL archive.
Transaction aborts now record their LSN to avoid corner case
behaviour in SR/HS, hence change of name of variables and functions.
As pointed out by Fujii Masao. Cosmetic changes only.
idea from the start since the variable is only meant to track commit/abort
events. This patch reverts the logic around the variable to what it was in
8.4, except that the value is now kept in shared memory rather than a static
variable, so that it can be reported correctly by CreateRestartPoint (which is
executed in the bgwriter).
max_standby_streaming_delay, and revise the implementation to avoid assuming
that timestamps found in WAL records can meaningfully be compared to clock
time on the standby server. Instead, the delay limits are compared to the
elapsed time since we last obtained a new WAL segment from archive or since
we were last "caught up" to WAL data arriving via streaming replication.
This avoids problems with clock skew between primary and standby, as well
as other corner cases that the original coding would misbehave in, such
as the primary server having significant idle time between transactions.
Per my complaint some time ago and considerable ensuing discussion.
Do some desultory editing on the hot standby documentation, too.
master. Otherwise a subsequent crash could cause the master to lose WAL that
has already been applied on the slave, resulting in the slave being out of
sync and soon corrupt. Per recent discussion and an example from Robert Haas.
Fujii Masao
and retry. If the record is genuinely corrupt in the master database,
there's little hope of recovering, but it's better than simply retrying
to apply the corrupt WAL record in a tight loop without even trying to
retransmit it, which is what we used to do.
pg_last_xlog_replay_location(). Per Robert Haas's suggestion, after
Itagaki Takahiro pointed out an issue in the docs. Also, some wording
changes in the docs by me.
checkpoint_timeout to trigger restartpoints. We used to deliberately only
do time-based restartpoints, because if checkpoint_segments is small we
would spend time doing restartpoints more often than really necessary.
But now that restartpoints are done in bgwriter, they're not as
disruptive as they used to be. Secondly, because streaming replication
stores the streamed WAL files in pg_xlog, we want to clean it up more
often to avoid running out of disk space when checkpoint_timeout is large
and checkpoint_segments small.
Patch by Fujii Masao, with some minor changes by me.
the current one. Not doing this would leave the walwriter with a handle to a
deleted file if there was nothing for it to do for a long period of time,
preventing the file from being completely removed.
Reported by Tollef Fog Heen, and thanks to Heikki for some hand-holding with
the patch.