c186ba13 has fixed an issue related to the updates of the minimum
recovery LSN across multiple processes on standbys, but we never really
had a test case able to reliably check its logic.
This commit introduces a new test case to close the gap, and is designed
to check the consistency of data based on the minimum recovery point set
by either the startup process or the checkpointer for both an offline
cluster (by looking at the on-disk page headers) and an online cluster
(using pageinspect).
Note that with c186ba13 reverted, this test fails badly for both the
online and offline cases, as designed.
Author: Michael Paquier, Andrew Gierth
Reviewed-by: Andrew Gierth, Georgios Kokolatos, Arthur Zakirov
Discussion: https://postgr.es/m/20181108044525.GA17482@paquier.xyz
recovery.conf settings are now set in postgresql.conf (or other GUC
sources). Currently, all the affected settings are PGC_POSTMASTER;
this could be refined in the future case by case.
Recovery is now initiated by a file recovery.signal. Standby mode is
initiated by a file standby.signal. The standby_mode setting is
gone. If a recovery.conf file is found, an error is issued.
The trigger_file setting has been renamed to promote_trigger_file as
part of the move.
The documentation chapter "Recovery Configuration" has been integrated
into "Server Configuration".
pg_basebackup -R now appends settings to postgresql.auto.conf and
creates a standby.signal file.
Author: Fujii Masao <masao.fujii@gmail.com>
Author: Simon Riggs <simon@2ndquadrant.com>
Author: Abhijit Menon-Sen <ams@2ndquadrant.com>
Author: Sergei Kornilov <sk@zsrv.org>
Discussion: https://www.postgresql.org/message-id/flat/607741529606767@web3g.yandex.ru/
Reporting only the stderr is unhelpful when the problem is that the
server output we're getting doesn't match what was expected. So we
should report the query output too; and just for good measure, let's
print the query we used and the output we expected.
Back-patch to 9.5 where poll_query_until was introduced.
Discussion: https://postgr.es/m/17913.1539634756@sss.pgh.pa.us
Buildfarm member tern failed src/bin/pg_ctl/t/001_start_stop.pl when a
check_mode_recursive() call overlapped a server's startup-time deletion
of pg_stat/global.stat. Just warn. Also, include errno in the message.
Back-patch to v11, where check_mode_recursive() first appeared.
Currently there are two ways to trigger log rotation in logging collector
process: call pg_rotate_logfile() SQL-function or send SIGUSR1 signal directly
to logging collector process. However, it's nice to have more suitable way
for external tools to do that, which wouldn't require SQL connection or
knowledge of logging collector pid. This commit implements triggering log
rotation by "pg_ctl logrotate" command.
Discussion: https://postgr.es/m/20180416.115435.28153375.horiguchi.kyotaro%40lab.ntt.co.jp
Author: Kyotaro Horiguchi, Alexander Kuzmenkov, Alexander Korotkov
WAL senders sending logically-decoded data fail to properly report in
"streaming" state when starting up, hence as long as one extra record is
not replayed, such WAL senders would remain in a "catchup" state, which
is inconsistent with the physical cousin.
This can be easily reproduced by for example using pg_recvlogical and
restarting the upstream server. The TAP tests have been slightly
modified to detect the failure and strengthened so as future tests also
make sure that a node is in streaming state when waiting for its
catchup.
Backpatch down to 9.4 where this code has been introduced.
Reported-by: Sawada Masahiko
Author: Simon Riggs, Sawada Masahiko
Reviewed-by: Petr Jelinek, Michael Paquier, Vaishnavi Prabakaran
Discussion: https://postgr.es/m/CAD21AoB2ZbCCqOx=bgKMcLrAvs1V0ZMqzs7wBTuDySezTGtMZA@mail.gmail.com
In TAP test functions, that is, those that produce test results, locally
increment $Test::Builder::Level. This has the effect that test failures
are reported at the callers location rather than somewhere in the test
support libraries.
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
search.cpan.org has been EOL'd, with metacpan.org being the official
replacement to which URLs now redirect. Update links to match the new
URL. Also update links to CPAN to use https as it will redirect from
http.
Author: Daniel Gustafsson
Discussion: https://postgr.es/m/B74C0219-6BA9-46E1-A524-5B9E8CD3BDB3@yesql.se
This complies with the perlcritic policy
Subroutines::RequireFinalReturn, which is a severity 4 policy. Since we
only currently check at severity level 5, the policy is raised to that
level until we move to level 4 or lower, so that any new infringements
will be caught.
A small cosmetic piece of tidying of the pgperlcritic script is
included.
Mike Blackwell
Discussion: https://postgr.es/m/CAESHdJpfFm_9wQnQ3koY3c91FoRQsO-fh02za9R3OEMndOn84A@mail.gmail.com
The vertical tightness settings collapse vertical whitespace between
opening and closing brackets (parentheses, square brakets and braces).
This can make data structures in particular harder to read, and is not
very consistent with our style in non-Perl code. This patch restricts
that setting to parentheses only, and reformats all the perl code
accordingly. Not applying this to parentheses has some unfortunate
effects, so the consensus is to keep the setting for parentheses and not
for the others.
The diff for this patch does highlight some places where structures
should have trailing commas. They can be added manually, as there is no
automatic tool to do so.
Discussion: https://postgr.es/m/a2f2b87c-56be-c070-bfc0-36288b4b41c1@2ndQuadrant.com
Commit 6271fceb8 changed PostgresNode.pm to force log_min_messages = debug1
in all TAP tests, without any discussion and without a concrete need for
it. This makes some of the TAP tests noticeably slower (although much of
that may be due to poorly-written regexes), and for certain it's bloating
the buildfarm logs. Revert the change.
Discussion: https://postgr.es/m/32459.1525657786@sss.pgh.pa.us
Allow the cluster to be optionally init'd with read access for the
group.
This means a relatively non-privileged user can perform a backup of the
cluster without requiring write privileges, which enhances security.
The mode of PGDATA is used to determine whether group permissions are
enabled for directory and file creates. This method was chosen as it's
simple and works well for the various utilities that write into PGDATA.
Changing the mode of PGDATA manually will not automatically change the
mode of all the files contained therein. If the user would like to
enable group access on an existing cluster then changing the mode of all
the existing files will be required. Note that pg_upgrade will
automatically change the mode of all migrated files if the new cluster
is init'd with the -g option.
Tests are included for the backend and all the utilities which operate
on the PG data directory to ensure that the correct mode is set based on
the data directory permissions.
Author: David Steele <david@pgmasters.net>
Reviewed-By: Michael Paquier, with discussion amongst many others.
Discussion: https://postgr.es/m/ad346fe6-b23e-59f1-ecb7-0e08390ad629%40pgmasters.net
Consolidate directory and file create permissions for tools which work
with the PG data directory by adding a new module (common/file_perm.c)
that contains variables (pg_file_create_mode, pg_dir_create_mode) and
constants to initialize them (0600 for files and 0700 for directories).
Convert mkdir() calls in the backend to MakePGDirectory() if the
original call used default permissions (always the case for regular PG
directories).
Add tests to make sure permissions in PGDATA are set correctly by the
tools which modify the PG data directory.
Authors: David Steele <david@pgmasters.net>,
Adam Brightwell <adam.brightwell@crunchydata.com>
Reviewed-By: Michael Paquier, with discussion amongst many others.
Discussion: https://postgr.es/m/ad346fe6-b23e-59f1-ecb7-0e08390ad629%40pgmasters.net
The new unlogged_reinit recovery tests create a new tablespace using
TestLib.pm's tempdir. However, on msys that function returns a virtual
path that isn't understood by Postgres. Here we add a new function to
TestLib.pm to turn such a path into a real path on the underlying file
system, and use it in the new test to create the tablespace. The new
function is essentially a NOOP everywhere but msys.
This was nearly the same code. Extend wait_for_catchup to allow waiting
for pg_current_wal_lsn() and use that in the subscription tests. Also
change one use in the pg_rewind tests to use this.
Also remove some broken code in wait_for_catchup and
wait_for_slot_catchup. The error message in case the waiting failed
wanted to show the current LSN, but the way it was written never
worked. So since nobody ever cared, just remove it.
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
Add a function to TestLib that allows us to check pg_config.h and then
decide the expected test outcome based on that.
Author: Michael Paquier <michael.paquier@gmail.com>
Various Perl scripts we use to generate files were in the habit of
printing things like "generated by $0" into their output files.
That looks like a fine idea at first glance, but it results in
non-reproducible output, because in VPATH builds $0 won't be just
the name of the script file, but a full path for it. We'd prefer
that you get identical results whether using VPATH or not, so this
is a bad thing.
Some of these places also printed their input file name(s), causing
an additional hazard of the same type.
Hence, establish a policy that thou shalt not print $0, nor input file
pathnames, into output files (they're still allowed in error messages,
though). Instead just write the script name verbatim. While we are at
it, we can make these annotations more useful by giving the script's
full relative path name within the PG source tree, eg instead of
Gen_fmgrtab.pl let's print src/backend/utils/Gen_fmgrtab.pl.
Not all of the changes made here actually affect any files shipped
in finished tarballs today, but it seems best to apply the policy
everyplace so that nobody copies unsafe code into places where it
could matter.
Christoph Berg and Tom Lane
Discussion: https://postgr.es/m/20171215102223.GB31812@msg.df7cb.de
Don't write to stdin of a psql process that could have already exited
with an authentication failure. Buildfarm members crake and mandrill
have failed once by doing so. Ignore SIGPIPE in all TAP tests.
Back-patch to v10, where these tests were introduced.
Reviewed by Michael Paquier.
Discussion: https://postgr.es/m/20171209210203.GC3362632@rfd.leadboat.com
This is only used in the pg_rewind tests, so only set it there. It's
better if other tests run closer to a default configuration.
Author: Michael Paquier <michael.paquier@gmail.com>
When copying from an active database tree, it's possible for files to be
deleted after we see them in a readdir() scan but before we can open them.
(Once we've got a file open, we don't expect any further errors from it
getting unlinked, though.) Tweak RecursiveCopy so it can cope with this
case, so as to avoid irreproducible test failures.
Back-patch to 9.6 where this code was added. In v10 and HEAD, also
remove unused "use RecursiveCopy" in one recovery test script.
Michael Paquier and Tom Lane
Discussion: https://postgr.es/m/24621.1504924323@sss.pgh.pa.us
* Remove no-such-user test case, output isn't stable, and we really
don't need to be testing such cases here anyway.
* Fix the process exit code test logic to match PostgresNode::psql
(but I didn't bother with looking at the "core" flag).
* Give up on inf/nan tests.
Per buildfarm.
This moves the data directories from using temporary directories with
randomness in the directory name to a static name, to make it easier to
debug. The data directory will be retained if tests fail or the test
code dies/exits with failure, and is automatically removed on the next
make check.
If the environment variable PG_TEST_NOCLEAN is defined, the data
directories will be retained regardless of test or exit status.
Author: Daniel Gustafsson <daniel@yesql.se>
When output of IPC::Run::run () is redirected to scalar references, in
certain circumstances the Msys perl does not correctly detect that the
end of file has been seen, making the test hang indefinitely. One such
circumstance is when the command is 'pg_ctl start', and such a change
was made in commit f13ea95f9e. The workaround, which only applies on
MSys, is to redirect the output to temporary files and then read them in
when the process has finished.
Patch by me, reviewed and tweaked by Tom Lane.
This module becomes much more useful if we allow it to be used as base
class for external projects. To achieve this, change the exported
get_new_node function into a class method instead, and use the standard
Perl idiom of accepting the class as first argument. This method works
as expected for subclasses. The standalone function is kept for
backwards compatibility, though it could be removed in pg11.
Author: Chap Flackman, based on an earlier patch from Craig Ringer
Discussion: https://postgr.es/m/CAMsr+YF8kO+4+K-_U4PtN==2FndJ+5Bn6A19XHhMiBykEwv0wA@mail.gmail.com
By default, Perl's split() function drops trailing empty fields,
which is not what we want here. Oversight in commit fb093e4cb.
We'd managed to miss it thus far thanks to the very limited usage
of this function.
Discussion: https://postgr.es/m/14837.1499029831@sss.pgh.pa.us
Add an optional "expected" argument to override the default assumption
that we're waiting for the query to return "t". This allows replacing
a handwritten polling loop in recovery/t/007_sync_rep.pl with use of
poll_query_until(); AFAICS that's the only remaining ad-hoc polling
loop in our TAP tests.
Change poll_query_until() to probe ten times per second not once per
second. Like some similar changes I've been making recently, the
one-second interval seems to be rooted in ancient traditions rather
than the actual likely wait duration on modern machines. I'd consider
reducing it further if there were a convenient way to spawn just one
psql for the whole loop rather than one per probe attempt.
Discussion: https://postgr.es/m/12486.1498938782@sss.pgh.pa.us
Several callers of PostgresNode::poll_query_until() neglected to check
for failure; I do not think that's optional. Also, rewrite one place
that had reinvented poll_query_until() for no very good reason.
By default, wal_retrieve_retry_interval is five seconds, which is far
more than is needed in any of our TAP tests, leaving the test cases
just twiddling their thumbs for significant stretches. Moreover,
because it's so large, we get basically no testing of the retry-before-
master-is-ready code path. Hence, make PostgresNode::init set up
wal_retrieve_retry_interval = '500ms' as part of its customization of
test clusters' postgresql.conf. This shaves quite a few seconds off
the runtime of the recovery TAP tests.
Back-patch into 9.6. We have wal_retrieve_retry_interval in 9.5,
but the test infrastructure isn't there.
Discussion: https://postgr.es/m/31624.1498500416@sss.pgh.pa.us
Per discussion, "location" is a rather vague term that could refer to
multiple concepts. "LSN" is an unambiguous term for WAL locations and
should be preferred. Some function names, view column names, and function
output argument names used "lsn" already, but others used "location",
as well as yet other terms such as "wal_position". Since we've already
renamed a lot of things in this area from "xlog" to "wal" for v10,
we may as well incur a bit more compatibility pain and make these names
all consistent.
David Rowley, minor additional docs hacking by me
Discussion: https://postgr.es/m/CAKJS1f8O0njDKe8ePFQ-LK5-EjwThsDws6ohJ-+c6nWK+oUxtg@mail.gmail.com
archive_command and restore_command need to refer to Windows paths, not
Msys virtual file system paths, as postgres is completely unaware of the
latter, so prefix them with the Windows path to the virtual file system
root. Clean psql and pg_recvlogical output of carriage returns.
PostgresNode blithely ignored the exit status of pg_ctl, and in general
made no effort to be sure that the server was running when it should be.
This caused it to miss server crashes, which is a serious shortcoming
in a test scaffold. Make it complain if pg_ctl fails, and modify the
start and stop logic to complain if the server doesn't start, or doesn't
stop, when expected.
Also, have it turn off the "restart_after_crash" configuration parameter
in created clusters, as bitter experience has shown that leaving that on
can mask crashes too.
We might at some point need variant functions that allow for, eg,
server start failure to be expected. But no existing test case appears
to want that, and it surely shouldn't be the default behavior.
Note that this *will* break the buildfarm, as it will expose known
bugs that the previous testing failed to. I'm committing it despite
that, to verify that we get the expected failures in the buildfarm
not just in manual testing.
Back-patch into 9.6 where PostgresNode was introduced. (The 9.6
branch is not expected to show any failures.)
Discussion: https://postgr.es/m/21432.1492886428@sss.pgh.pa.us
Although the documentation for append_conf said clearly that it didn't
add a newline, many test authors seem to have forgotten that ... or maybe
they just consulted the example at the top of the POD documentation,
which clearly shows adding a config entry without bothering to add a
trailing newline. The worst part of that is that it works, as long as
you don't do it more than once, since the backend isn't picky about
whether config files end with newlines. So there's not a strong forcing
function reminding test authors not to do it like that. Upshot is that
this is a terribly fragile way to go about things, and there's at least
one existing test case that is demonstrably broken and not testing what
it thinks it is.
Let's just make append_conf append a newline, instead; that is clearly
way safer than the old definition.
I also cleaned up a few call sites that were unnecessarily ugly.
(I left things alone in places where it's plausible that additional
config lines would need to be added someday.)
Back-patch the change in append_conf itself to 9.6 where it was added,
as having a definitional inconsistency between branches would obviously
be pretty hazardous for back-patching TAP tests. The other changes are
just cosmetic and don't need to be back-patched.
Discussion: https://postgr.es/m/19751.1492892376@sss.pgh.pa.us