Previously, if primary_slot_name was set to an invalid slot name and
the configuration file was reloaded, both the postmaster and all other
backend processes reported a WARNING. With many processes running,
this could produce a flood of duplicate messages. The problem was that
the GUC check hook for primary_slot_name reported errors at WARNING
level via ereport().
This commit changes the check hook to use GUC_check_errdetail() and
GUC_check_errhint() for error reporting. As with other GUC parameters,
this causes non-postmaster processes to log the message at DEBUG3,
so by default, only the postmaster's message appears in the log file.
Backpatch to all supported versions.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Chao Li <lic@highgo.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/CAHGQGwFud-cvthCTfusBfKHBS6Jj6kdAPTdLWKvP2qjUX6L_wA@mail.gmail.com
Backpatch-through: 13
... which silently propagates a lot of headers into many places
via pgstat.h, as evidenced by the variety of headers that this patch
needs to add to seemingly random places. Add a minimum of typedefs to
conflict.h to be able to remove execnodes.h, and fix the fallout.
Backpatch to 18, where conflict.h first appeared.
Discussion: https://postgr.es/m/202509191927.uj2ijwmho7nv@alvherre.pgsql
Even after reaching the minimum recovery point, if there are long-lived
write transactions with 64 subtransactions on the primary, the recovery
snapshot may not yet be ready for hot standby, delaying read-only
connections on the standby. Previously, when read-only connections were
not accepted due to this condition, the following error message was logged:
FATAL: the database system is not yet accepting connections
DETAIL: Consistent recovery state has not been yet reached.
This DETAIL message was misleading because the following message was
already logged in this case:
LOG: consistent recovery state reached
This contradiction, i.e., indicating that the recovery state was consistent
while also stating it wasn’t, caused confusion.
This commit improves the error message to better reflect the actual state:
FATAL: the database system is not yet accepting connections
DETAIL: Recovery snapshot is not yet ready for hot standby.
HINT: To enable hot standby, close write transactions with more than 64 subtransactions on the primary server.
To implement this, the commit introduces a new postmaster signal,
PMSIGNAL_RECOVERY_CONSISTENT. When the startup process reaches
a consistent recovery state, it sends this signal to the postmaster,
allowing it to correctly recognize that state.
Since this is not a clear bug, the change is applied only to the master
branch and is not back-patched.
Author: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Yugo Nagata <nagata@sraoss.co.jp>
Discussion: https://postgr.es/m/02db8cd8e1f527a8b999b94a4bee3165@oss.nttdata.com
check_createrole_self_grant and check_synchronized_standby_slots
were allocating memory on a LOG elevel without checking if the
allocation succeeded or not, which would have led to a segfault
on allocation failure.
On top of that, a number of callsites were using the ERROR level,
relying on erroring out rather than returning false to allow the
GUC machinery handle it gracefully. Other callsites used WARNING
instead of LOG. While neither being not wrong, this changes all
check_ functions do it consistently with LOG.
init_custom_variable gets a promoted elevel to FATAL to keep
the guc_malloc error handling in line with the rest of the
error handling in that function which already call FATAL. If
we encounter an OOM in this callsite there is no graceful
handling to be had, better to error out hard.
Backpatch the fix to check_createrole_self_grant down to v16
and the fix to check_synchronized_standby_slots down to v17
where they were introduced.
Author: Daniel Gustafsson <daniel@yesql.se>
Reported-by: Nikita <pm91.arapov@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Bug: #18845
Discussion: https://postgr.es/m/18845-582c6e10247377ec@postgresql.org
Backpatch-through: 16
We want to support a "noreturn" decoration on more compilers besides
just GCC-compatible ones, but for that we need to move the decoration
in front of the function declaration instead of either behind it or
wherever, which is the current style afforded by GCC-style attributes.
Also rename the macro to "pg_noreturn" to be similar to the C11
standard "noreturn".
pg_noreturn is now supported on all compilers that support C11 (using
_Noreturn), as well as GCC-compatible ones (using __attribute__, as
before), as well as MSVC (using __declspec). (When PostgreSQL
requires C11, the latter two variants can be dropped.)
Now, all supported compilers effectively support pg_noreturn, so the
extra code for !HAVE_PG_ATTRIBUTE_NORETURN can be dropped.
This also fixes a possible problem if third-party code includes
stdnoreturn.h, because then the current definition of
#define pg_attribute_noreturn() __attribute__((noreturn))
would cause an error.
Note that the C standard does not support a noreturn attribute on
function pointer types. So we have to drop these here. There are
only two instances at this time, so it's not a big loss. In one case,
we can make up for it by adding the pg_noreturn to a wrapper function
and adding a pg_unreachable(), in the other case, the latter was
already done before.
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/pxr5b3z7jmkpenssra5zroxi7qzzp6eswuggokw64axmdixpnk@zbwxuq7gbbcw
The usual pattern for handling a signal is that the signal handler
sets a flag and calls SetLatch(MyLatch), and CHECK_FOR_INTERRUPTS() or
other code that is part of a wait loop calls another function to deal
with it. The naming of the functions involved was a bit inconsistent,
however. CHECK_FOR_INTERRUPTS() calls ProcessInterrupts() to do the
heavy-lifting, but the analogous functions in aux processes were
called HandleMainLoopInterrupts(), HandleStartupProcInterrupts(),
etc. Similarly, most subroutines of ProcessInterrupts() were called
Process*(), but some were called Handle*().
To make things less confusing, rename all the functions that are part
of the overall signal/interrupt handling system but are not executed
in a signal handler to e.g. ProcessSomething(), rather than
HandleSomething(). The "Process" prefix is now consistently used in
the non-signal-handler functions, and the "Handle" prefix in functions
that are part of signal handlers, except for some completely unrelated
functions that clearly have nothing to do with signal or interrupt
handling.
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://www.postgresql.org/message-id/8a384b26-1499-41f6-be33-64b801fb98b8@iki.fi
The original message did not mention where the checkpoint record LSN was
found, a control file or a backup_label file. A couple of LOG messages
are generated before this FATAL check is reached, providing more details
about the way recovery is set up. However, knowing this information in
this specific message is useful for debugging. This is also useful for
instances where log_min_messages is set to FATAL or more, where LOG
messages do not show up.
Author: Benoit Lobréau <benoit.lobreau@dalibo.com>
Reviewed-by: David Steele <david@pgbackrest.org>
Discussion: https://postgr.es/m/4ed10bc8-5513-4d8e-8643-8abcaa08336d@dalibo.com
This commit is a rework of 2421e9a51d, about which Andres Freund has
raised some concerns as it is valuable to have both track_io_timing and
track_wal_io_timing in some cases, as the WAL write and fsync paths can
be a major bottleneck for some workloads. Hence, it can be relevant to
not calculate the WAL timings in environments where pg_test_timing
performs poorly while capturing some IO data under track_io_timing for
the non-WAL IO paths. The opposite can be also true: it should be
possible to disable the non-WAL timings and enable the WAL timings (the
previous GUC setups allowed this possibility).
track_wal_io_timing is added back in this commit, controlling if WAL
timings should be calculated in pg_stat_io for the read, fsync and write
paths, as done previously with pg_stat_wal. pg_stat_wal previously
tracked only the sync and write parts (now removed), read stats is new
data tracked in pg_stat_io, all three are aggregated if
track_wal_io_timing is enabled. The read part matters during recovery
or if a XLogReader is used.
Extra note: more control over if the types of timings calculated in
pg_stat_io could be done with a GUC that lists pairs of (IOObject,IOOp).
Reported-by: Andres Freund <andres@anarazel.de>
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/3opf2wh2oljco6ldyqf7ukabw3jijnnhno6fjb4mlu6civ5h24@fcwmhsgmlmzu
The four following attributes are removed from pg_stat_wal:
* wal_write
* wal_sync
* wal_write_time
* wal_sync_time
a051e71e28 has added an equivalent of this information in pg_stat_io
with more granularity as this now spreads across the backend types, IO
context and IO objects. So, keeping the same information in pg_stat_wal
has little benefits.
Another benefit of this commit is the removal of PendingWalStats,
simplifying an upcoming patch to add per-backend WAL statistics, which
already support IO statistics and which have access to the write/sync
stats data of WAL.
The GUC track_wal_io_timing, that was used to enable or disable the
aggregation of the write and sync timings for WAL, is also removed.
pgstat_prepare_io_time() is simplified.
Bump catalog version.
Bump PGSTAT_FILE_FORMAT_ID, due to the update of PgStat_WalStats.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/Z7RkQ0EfYaqqjgz/@ip-10-97-1-34.eu-west-3.compute.internal
If the requested recovery timeline is not reachable, the logged
checkpoint and timeline should to be the values read from the
backup_label when it is defined. The message generated used the values
from the control file in this case, which is fine when recovering from
the control file without a backup_label, but not if there is a
backup_label.
Issue introduced in ee994272ca. v15 has introduced xlogrecovery.c and
more simplifications in this area (4a92a1c3d1, a27048cbcb), making
this change a bit simpler to think about, so backpatch only down to this
version.
Author: David Steele <david@pgbackrest.org>
Reviewed-by: Andrey M. Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Benoit Lobréau <benoit.lobreau@dalibo.com>
Discussion: https://postgr.es/m/c3d617d4-1696-4aa7-8a4d-5a7d19cc5618@pgbackrest.org
Backpatch-through: 15
This commit adds WAL IO stats to both pg_stat_io view and per-backend IO
statistics (pg_stat_get_backend_io()). This change is possible since
f92c854cf4, as WAL IO is not counted in blocks in some code paths
where its stats data is measured (like WAL read in xlogreader.c).
IOContext gains IOCONTEXT_INIT and IOObject IOOBJECT_WAL, with the
following combinations allowed:
- IOOBJECT_WAL/IOCONTEXT_NORMAL is used to track I/O operations done on
already-created WAL segments.
- IOOBJECT_WAL/IOCONTEXT_INIT is used for tracking I/O operations done
when initializing WAL segments.
The core changes are done in pg_stat_io.c, backend statistics inherit
them. Backend statistics and pg_stat_io are now available for the WAL
writer, the WAL receiver and the WAL summarizer processes.
I/O timing data is controlled by the GUC track_io_timing, like the
existing data of pg_stat_io for consistency. The timings related to
IOOBJECT_WAL show up if the GUC is enabled (disabled by default).
Bump pgstats file version, due to the additions in IOObject and
IOContext, impacting the amount of data written for the fixed-numbered
IO stats kind in the pgstats file.
Author: Nazir Bilal Yavuz
Reviewed-by: Bertrand Drouvot, Nitin Jadhav, Amit Kapila, Michael
Paquier, Melanie Plageman, Bharath Rupireddy
Discussion: https://postgr.es/m/CAN55FZ3AiQ+ZMxUuXnBpd0Rrh1YhwJ5FudkHg=JU0P+-W8T4Vg@mail.gmail.com
XLogPageRead() checks immediately for an invalid WAL record header on a
standby, to be able to handle the case of continuation records that need
to be read across two different sources. As written, the check was too
generic, applying to any target LSN. Based on an analysis by Kyotaro
Horiguchi, what really matters is to make sure that the page header is
checked when attempting to read a LSN at the boundary of a segment, to
handle the case of a continuation record that spawns across multiple
pages when dealing with multiple segments, as WAL receivers are spawned
they request WAL from the beginning of a segment. This fix has been
proposed by Kyotaro Horiguchi.
This could cause standbys to loop infinitely when dealing with a
continuation record during a timeline jump, in the case where the
contents of the record in the follow-up page are invalid.
Some regression tests are added to check such scenarios, able to
reproduce the original problem. In the test, the contents of a
continuation record are overwritten with junk zeros on its follow-up
page, and replayed on standbys. This is inspired by 039_end_of_wal.pl,
and is enough to show how standbys should react on promotion by not
being stuck. Without the fix, the test would fail with a timeout. The
test to reproduce the problem has been written by Alexander Kukushkin.
The original check has been introduced in 0668719801, for a similar
problem.
Author: Kyotaro Horiguchi, Alexander Kukushkin
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/CAFh8B=mozC+e1wGJq0H=0O65goZju+6ab5AU7DEWCSUA2OtwDg@mail.gmail.com
Backpatch-through: 13
They were all missing punctuation, one was missing initial capital.
Per our message style guidelines.
No backpatch, to avoid breaking existing translations.
Instead of talking about setting latches, which is a pretty low-level
mechanism, emphasize that they wake up other processes.
This is in preparation for replacing Latches with a new abstraction.
That's still work in progress, but this seems a little tidier anyway,
so let's get this refactoring out of the way already.
Discussion: https://www.postgresql.org/message-id/391abe21-413e-4d91-a650-b663af49500c%40iki.fi
3c5db1d6b implemented the pg_wal_replay_wait() stored procedure. Due to
the patch development history, the implementation resided in
src/backend/commands/waitlsn.c (src/include/commands/waitlsn.h for headers).
014f9f34d moved pg_wal_replay_wait() itself to
src/backend/access/transam/xlogfuncs.c near to the WAL-manipulation functions.
But most of the implementation stayed in place.
The code in src/backend/commands/waitlsn.c has nothing to do with commands,
but is related to WAL. So, this commit moves this code into
src/backend/access/transam/xlogwait.c (src/include/access/xlogwait.h for
headers).
Reported-by: Peter Eisentraut
Discussion: https://postgr.es/m/18c0fa64-0475-415e-a1bd-665d922c5201%40eisentraut.org
Reviewed-by: Pavel Borisov
This change makes the code slightly easier to reason about, because
there is actually no need to know if a specific caller of one of these
routines should fail hard on a PANIC, or just let it go through with a
DEBUG2.
The only caller of XLogFileReadAnyTLI() used DEBUG2, and XLogFileRead()
has never used its emode. This can be simplified since 1bb2558046
that has introduced XLogFileReadAnyTLI(), splitting both.
Author: Yugo Nagata
Discussion: https://postgr.es/m/20240906201043.a640f3b44e755d4db2b6943e@sraoss.co.jp
Similarly to 2065ddf5e3, this introduces a define for "pg_tblspc".
This makes the style more consistent with the existing PG_STAT_TMP_DIR,
for example.
There is a difference with the other cases with the introduction of
PG_TBLSPC_DIR_SLASH, required in two places for recovery and backups.
Author: Bertrand Drouvot
Reviewed-by: Ashutosh Bapat, Álvaro Herrera, Yugo Nagata, Michael
Paquier
Discussion: https://postgr.es/m/ZryVvjqS9SnV1GPP@ip-10-97-1-34.eu-west-3.compute.internal
pg_wal_replay_wait() is to be used on standby and specifies waiting for
the specific WAL location to be replayed. This option is useful when
the user makes some data changes on primary and needs a guarantee to see
these changes are on standby.
The queue of waiters is stored in the shared memory as an LSN-ordered pairing
heap, where the waiter with the nearest LSN stays on the top. During
the replay of WAL, waiters whose LSNs have already been replayed are deleted
from the shared memory pairing heap and woken up by setting their latches.
pg_wal_replay_wait() needs to wait without any snapshot held. Otherwise,
the snapshot could prevent the replay of WAL records, implying a kind of
self-deadlock. This is why it is only possible to implement
pg_wal_replay_wait() as a procedure working without an active snapshot,
not a function.
Catversion is bumped.
Discussion: https://postgr.es/m/eb12f9b03851bb2583adab5df9579b4b%40postgrespro.ru
Author: Kartyshov Ivan, Alexander Korotkov
Reviewed-by: Michael Paquier, Peter Eisentraut, Dilip Kumar, Amit Kapila
Reviewed-by: Alexander Lakhin, Bharath Rupireddy, Euler Taveira
Reviewed-by: Heikki Linnakangas, Kyotaro Horiguchi
All the errors triggered in the code paths patched here would cause the
backend to issue an internal_error errcode, which is a state that should
be used only for "can't happen" situations. However, these code paths
are reachable by the regression tests, and could be seen by users in
valid cases. Some regression tests expect internal errcodes as they
manipulate the backend state to cause corruption (like checksums), or
use elog() because it is more convenient (like injection points), these
have no need to change.
This reduces the number of internal failures triggered in a check-world
by more than half, while providing correct errcodes for these valid
cases.
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/Zic_GNgos5sMxKoa@paquier.xyz
After further review, we want to move in the direction of always
quoting GUC names in error messages, rather than the previous (PG16)
wildly mixed practice or the intermittent (mid-PG17) idea of doing
this depending on how possibly confusing the GUC name is.
This commit applies appropriate quotes to (almost?) all mentions of
GUC names in error messages. It partially supersedes a243569bf6 and
8d9978a717, which had moved things a bit in the opposite direction
but which then were abandoned in a partial state.
Author: Peter Smith <smithpb2250@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAHut%2BPv-kSN8SkxSdoHano_wPubqcg5789ejhCDZAcLFceBR-w%40mail.gmail.com
06c418e163 introduced pg_wal_replay_wait() procedure allowing to wait for
the particular LSN to be replayed on standby. The waiters were stored in
the flat array. Even though scanning small arrays is fast, that might be a
problem at scale (a lot of waiting processes).
This commit replaces the flat shared memory array with the pairing heap,
which holds the waiter with the least LSN at the top. This gives us O(log N)
complexity for both inserting and removing waiters.
Reported-by: Alvaro Herrera
Discussion: https://postgr.es/m/202404030658.hhj3vfxeyhft%40alvherre.pgsql
pg_wal_replay_wait() is to be used on standby and specifies waiting for
the specific WAL location to be replayed before starting the transaction.
This option is useful when the user makes some data changes on primary and
needs a guarantee to see these changes on standby.
The queue of waiters is stored in the shared memory array sorted by LSN.
During replay of WAL waiters whose LSNs are already replayed are deleted from
the shared memory array and woken up by setting of their latches.
pg_wal_replay_wait() needs to wait without any snapshot held. Otherwise,
the snapshot could prevent the replay of WAL records implying a kind of
self-deadlock. This is why it is only possible to implement
pg_wal_replay_wait() as a procedure working in a non-atomic context,
not a function.
Catversion is bumped.
Discussion: https://postgr.es/m/eb12f9b03851bb2583adab5df9579b4b%40postgrespro.ru
Author: Kartyshov Ivan, Alexander Korotkov
Reviewed-by: Michael Paquier, Peter Eisentraut, Dilip Kumar, Amit Kapila
Reviewed-by: Alexander Lakhin, Bharath Rupireddy, Euler Taveira
Two FATALs and one PANIC gain details about the LSNs they fail at:
- When restoring from a backup_label, the FATAL log generated when not
finding the checkpoint record now reports its LSN.
- When restoring from a backup_label, the FATAL log generated when not
finding the redo record referenced by a checkpoint record now shows both
the redo and checkpoint record LSNs.
- When not restoring from a backup_label, the PANIC error generated when
not finding the checkpoint record now reports its LSN.
This information is useful when debugging corruption issues, and these
LSNs may not show up in the logs depending on the level of logging
configured in the backend.
Author: David Steele
Discussion: https://postgr.es/m/0e90da89-77ca-4ccf-872c-9626d755e288@pgmasters.net
as determined by include-what-you-use (IWYU)
While IWYU also suggests to *add* a bunch of #include's (which is its
main purpose), this patch does not do that. In some cases, a more
specific #include replaces another less specific one.
Some manual adjustments of the automatic result:
- IWYU currently doesn't know about includes that provide global
variable declarations (like -Wmissing-variable-declarations), so
those includes are being kept manually.
- All includes for port(ability) headers are being kept for now, to
play it safe.
- No changes of catalog/pg_foo.h to catalog/pg_foo_d.h, to keep the
patch from exploding in size.
Note that this patch touches just *.c files, so nothing declared in
header files changes in hidden ways.
As a small example, in src/backend/access/transam/rmgr.c, some IWYU
pragma annotations are added to handle a special case there.
Discussion: https://www.postgresql.org/message-id/flat/af837490-6b2f-46df-ba05-37ea6a6653fc%40eisentraut.org
By enabling slot synchronization, all the failover logical replication
slots on the primary (assuming configurations are appropriate) are
automatically created on the physical standbys and are synced
periodically. The slot sync worker on the standby server pings the primary
server at regular intervals to get the necessary failover logical slots
information and create/update the slots locally. The slots that no longer
require synchronization are automatically dropped by the worker.
The nap time of the worker is tuned according to the activity on the
primary. The slot sync worker waits for some time before the next
synchronization, with the duration varying based on whether any slots were
updated during the last cycle.
A new parameter sync_replication_slots enables or disables this new
process.
On promotion, the slot sync worker is shut down by the startup process to
drop any temporary slots acquired by the slot sync worker and to prevent
the worker from trying to fetch the failover slots.
A functionality to allow logical walsenders to wait for the physical will
be done in a subsequent commit.
Author: Shveta Malik, Hou Zhijie based on design inputs by Masahiko Sawada and Amit Kapila
Reviewed-by: Masahiko Sawada, Bertrand Drouvot, Peter Smith, Dilip Kumar, Ajin Cherian, Nisha Moond, Kuroda Hayato, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
Three LOG messages are added in the recovery code paths, providing
information that can be useful to track corruption issues depending on
the state of the cluster, telling that:
- Recovery has started from a backup_label.
- Recovery is restarting from a backup start LSN, without a
backup_label.
- Recovery has completed from a backup.
Author: Andres Freund
Reviewed-by: David Steele, Laurenz Albe, Michael Paquier
Discussion: https://postgr.es/m/20231117041811.vz4vgkthwjnwp2pp@awork3.anarazel.de
To take an incremental backup, you use the new replication command
UPLOAD_MANIFEST to upload the manifest for the prior backup. This
prior backup could either be a full backup or another incremental
backup. You then use BASE_BACKUP with the INCREMENTAL option to take
the backup. pg_basebackup now has an --incremental=PATH_TO_MANIFEST
option to trigger this behavior.
An incremental backup is like a regular full backup except that
some relation files are replaced with files with names like
INCREMENTAL.${ORIGINAL_NAME}, and the backup_label file contains
additional lines identifying it as an incremental backup. The new
pg_combinebackup tool can be used to reconstruct a data directory
from a full backup and a series of incremental backups.
Patch by me. Reviewed by Matthias van de Meent, Dilip Kumar, Jakub
Wartak, Peter Eisentraut, and Álvaro Herrera. Thanks especially to
Jakub for incredibly helpful and extensive testing.
Discussion: http://postgr.es/m/CA+TgmoYOYZfMCyOXFyC-P+-mdrZqm5pP2N7S-r0z3_402h9rsA@mail.gmail.com
This GUC was intended as a debugging help in the 9.0 area when hot
standby and streaming replication were being developped, able to offer
more information at LOG level rather than DEBUGn. There are more tools
available these days that are able to offer rather equivalent
information, like pg_waldump introduced in 9.3. It is not obvious how
this facility is useful these days, so let's remove it.
Author: Bharath Rupireddy
Discussion: https://postgr.es/m/ZXEXEAUVFrvpquSd@paquier.xyz
When beginning recovery, a LOG is displayed by the startup process to
show which recovery mode will be used depending on the .signal file(s)
set in the data folder, like "standby mode", recovery up to a given
target type and value, or archive recovery.
A different patch is under discussion to simplify the startup code by
requiring the presence of recovery.signal and/or standby.signal when a
backup_label file is read. Delaying a bit this LOG ensures that the
correct recovery mode would be reported, and putting it at this position
does not make it lose its value.
While on it, this commit adds a few comments documenting a bit more the
initial recovery steps and their dependencies, and fixes an incorrect
comment format. This introduces no behavior changes.
Extracted from a larger patch by me.
Reviewed-by: David Steele, Bowen Shi
Discussion: https://postgr.es/m/ZArVOMifjzE7f8W7@paquier.xyz
When beginning recovery from a base backup by reading a backup_label
file, it may be possible that no checkpoint record is available
depending on the method used when the case backup was taken, which would
prevent recovery from beginning. In this case, the FATAL messages
issued, initially added by c900c15269, mentioned recovery.signal as
an option to do recovery but not standby.signal. Let's add it as an
available option, for clarity.
Per suggestion from Bowen Shi, extracted from a larger patch by me.
Author: Michael Paquier
Discussion: https://postgr.es/m/CAM_vCudkSjr7NsNKSdjwtfAm9dbzepY6beZ5DP177POKy8=2aw@mail.gmail.com
Since C99, there can be a trailing comma after the last value in an
enum definition. A lot of new code has been introducing this style on
the fly. Some new patches are now taking an inconsistent approach to
this. Some add the last comma on the fly if they add a new last
value, some are trying to preserve the existing style in each place,
some are even dropping the last comma if there was one. We could
nudge this all in a consistent direction if we just add the trailing
commas everywhere once.
I omitted a few places where there was a fixed "last" value that will
always stay last. I also skipped the header files of libpq and ecpg,
in case people want to use those with older compilers. There were
also a small number of cases where the enum type wasn't used anywhere
(but the enum values were), which ended up confusing pgindent a bit,
so I left those alone.
Discussion: https://www.postgresql.org/message-id/flat/386f8c45-c8ac-4681-8add-e3b0852c1620%40eisentraut.org
This shouldn't change behavior except in the unusual case where
there are file in the tablespace directory that have entirely
numeric names but are nevertheless not possible names for a
tablespace directory, either because their names have leading zeroes
that shouldn't be there, or the value is actually zero, or because
the value is too large to represent as an OID.
In those cases, the directory would previously have made it into
the list of tablespaceinfo objects and no longer will. Thus, base
backups will now ignore such directories, instead of treating them
as legitimate tablespace directories. Similarly, if entries for
such tablespaces occur in a tablespace_map file, they will now
be rejected as erroneous, instead of being honored.
This is infrastructure for future work that wants to be able to
know the tablespace of each relation that is part of a backup
*as an OID*. By strengthening the up-front validation, we don't
have to worry about weird cases later, and can more easily avoid
repeated string->integer conversions.
Patch by me, reviewed by David Steele.
Discussion: http://postgr.es/m/CA+TgmoZNVeBzoqDL8xvr-nkaepq815jtDR4nJzPew7=3iEuM1g@mail.gmail.com
This allows tools that read the WAL sequentially to identify (possible)
redo points when they're reached, rather than only being able to
detect them in retrospect when XLOG_CHECKPOINT_ONLINE is found, possibly
much later in the WAL stream. There are other possible applications as
well; see the discussion links below.
Any redo location that precedes the checkpoint location should now point
to an XLOG_CHECKPOINT_REDO record, so add a cross-check to verify this.
While adjusting the code in CreateCheckPoint() for this patch, I made it
call WALInsertLockAcquireExclusive a bit later than before, since there
appears to be no need for it to be held while checking whether the system
is idle, whether this is an end-of-recovery checkpoint, or what the current
timeline is.
Bump XLOG_PAGE_MAGIC.
Patch by me, based in part on earlier work from Dilip Kumar. Review by
Dilip Kumar, Amit Kapila, Andres Freund, and Michael Paquier.
Discussion: http://postgr.es/m/CA+TgmoYy-Vc6G9QKcAKNksCa29cv__czr+N9X_QCxEfQVpp_8w@mail.gmail.com
Discussion: http://postgr.es/m/20230614194717.jyuw3okxup4cvtbt%40awork3.anarazel.de
Discussion: http://postgr.es/m/CA+hUKG+b2ego8=YNW2Ohe9QmSiReh1-ogrv8V_WZpJTqP3O+2w@mail.gmail.com
Run pgindent, pgperltidy, and reformat-dat-files.
This set of diffs is a bit larger than typical. We've updated to
pg_bsd_indent 2.1.2, which properly indents variable declarations that
have multi-line initialization expressions (the continuation lines are
now indented one tab stop). We've also updated to perltidy version
20230309 and changed some of its settings, which reduces its desire to
add whitespace to lines to make assignments etc. line up. Going
forward, that should make for fewer random-seeming changes to existing
code.
Discussion: https://postgr.es/m/20230428092545.qfb3y5wcu4cm75ur@alvherre.pgsql
Physical walsenders can't send data until it's been flushed; logical
walsenders can't decode and send data until it's been applied. On the
standby, the WAL is flushed first, which will only wake up physical
walsenders; and then applied, which will only wake up logical
walsenders.
Previously, all walsenders were awakened when the WAL was flushed. That
was fine for logical walsenders on the primary; but on the standby the
flushed WAL would have been not applied yet, so logical walsenders were
awakened too early.
Per idea from Jeff Davis and Amit Kapila.
Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>
Reviewed-By: Jeff Davis <pgsql@j-davis.com>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAA4eK1+zO5LUeisabX10c81LU-fWMKO4M9Wyg1cdkbW7Hqh6vQ@mail.gmail.com
In standby mode, we don't actually report progress of recovery,
but up until now, startup_progress_timeout_handler() nevertheless
got called every log_startup_progress_interval seconds. That's
an unnecessary expense, so avoid it.
Report by Thomas Munro. Patch by Bharath Rupireddy, reviewed by
Simon Riggs, Thomas Munro, and me. Back-patch to v15, where
the problem was introduced.
Discussion: https://www.postgresql.org/message-id/CA%2BhUKGKCHSffAj8zZJKJvNX7ygnQFxVD6wm1d-2j3fVw%2BMafPQ%40mail.gmail.com
7fcbf6a and 2ff6555 changed the function signature of XLogPageRead()
but did not update the comment.
XLogReaderRoutine contains up to date information about the API, so no
need to repeat all that at XLogPageRead(), but fix the mentions of the
no longer existing function arguments.
When ending recovery based on recovery_target_xid matching with
recovery_target_inclusive = off, we printed an incorrect timestamp
(always 2000-01-01) in the "recovery stopping before ... transaction"
log message. This is a consequence of sloppy refactoring in
c945af80c: the code to fetch recordXtime out of the commit/abort
record used to be executed unconditionally, but it was changed
to get called only in the RECOVERY_TARGET_TIME case. We need only
flip the order of operations to restore the intended behavior.
Per report from Torsten Förtsch. Back-patch to all supported
branches.
Discussion: https://postgr.es/m/CAKkG4_kUevPqbmyOfLajx7opAQk6Cvwkvx0HRcFjSPfRPTXanA@mail.gmail.com
Pay down some ancient technical debt (dating to commit 022fd9966):
fix a couple of places in datetime parsing that were throwing
ereport's immediately instead of returning a DTERR code that could be
interpreted by DateTimeParseError. The reason for that was that there
was no mechanism for passing any auxiliary data (such as a zone name)
to DateTimeParseError, and these errors seemed to really need it.
Up to now it didn't matter that much just where the error got thrown,
but now we'd like to have a hard policy that datetime parse errors
get thrown from just the one place.
Hence, invent a "DateTimeErrorExtra" struct that can be used to
carry any extra values needed for specific DTERR codes. Perhaps
in the future somebody will be motivated to use this to improve
the specificity of other DateTimeParseError messages, but for now
just deal with the timezone-error cases.
This is on the way to making the datetime input functions report
parse errors softly; but it's really an independent change, so
commit separately.
Discussion: https://postgr.es/m/3bbbb0df-7382-bf87-9737-340ba096e034@postgrespro.ru