This commit adds the version information of a node initialized by
Cluster.pm, that may vary depending on the install_path given by the
test. The code was written so as the node information, that includes
the version number, was dumped before the version number was set.
This is particularly useful for the pg_upgrade TAP tests, that may mix
several versions for cross-version runs. The TAP infrastructure also
allows mixing nodes with different versions, so this information can be
useful for out-of-core tests.
Backpatch down to v15, where Cluster.pm and the pg_upgrade TAP tests
have been introduced.
Author: Potapov Alexander <a.potapov@postgrespro.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/e59bb-692c0a80-5-6f987180@170377126
Backpatch-through: 15
The documentation for log_check() had the parameters in the wrong
order. Also while there, rename %parameters to %params to better
documentation for similar functions which use %params. Backpatch
down to v14 where this was introduced.
Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/9F503B5-32F2-45D7-A0AE-952879AD65F1@yesql.se
Backpatch-through: 14
Previously the portlock logic, added in 9b4eafcaf4, didn't actually work
properly when the tests were run via meson. 9b4eafcaf4 used the
MESON_BUILD_ROOT environment variable to determine the directory for the port
lock directory, but that's never set for running the tests. That meant that
each test used its own portlock dir, unless the PG_TEST_PORT_DIR environment
variable was set.
Fix the problem by setting top_builddir for the environment. That's also used
for the autoconf/make build.
Backpatch back to 16, where meson support was added.
Reported-by: Zharkov Roman <r.zharkov@postgrespro.ru>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Backpatch-through: 16
This is a backport of ba08edb065. Originally it was only applied to master,
but I (Andres) am planning to fix a few bugs in BackgroundPsql, which would be
somewhat harder with the behavioural differences across branches. It's also
generally good for test infrastructure to behave similarly across branches, to
avoid pain during backpatching.
Discussion: https://postgr.es/m/ilcctzb5ju2gulvnadjmhgatnkxsdpac652byb2u3d3wqziyvx@fbuqcglker46
Michael's original commit message:
This commit extends the constructor routine of BackgroundPsql.pm with a
new "wait" parameter. If set to 0, the routine returns without waiting
for psql to start, ready to consume input.
background_psql() in Cluster.pm gains the same "wait" parameter. The
default behavior is still to wait for psql to start. It becomes now
possible to not wait, giving to TAP scripts the possibility to perform
actions between a BackgroundPsql startup and its wait_connect() call.
Author: Jacob Champion
Discussion: https://postgr.es/m/CAOYmi+=60deN20WDyCoHCiecgivJxr=98s7s7-C8SkXwrCfHXg@mail.gmail.com
These facilities were originally in the recovery TAP test
039_end_of_wal.pl. A follow-up bug fix with a TAP test doing similar
WAL manipulations requires them, and all these had better not be
duplicated due to their complexity. The routine names are tweaked to
use "wal" more consistently, similarly to the existing "advance_wal".
In v14 and v13, the new routines are moved to PostgresNode.pm.
039_end_of_wal.pl is updated to use the refactored routines, without
changing its coverage.
Reviewed-by: Alexander Kukushkin
Discussion: https://postgr.es/m/CAFh8B=mozC+e1wGJq0H=0O65goZju+6ab5AU7DEWCSUA2OtwDg@mail.gmail.com
Backpatch-through: 13
If we choose ports in the range typically used for ephemeral ports there
is a danger of encountering a port conflict due to a race condition
between the time we choose the port in a range below that typically used
to allocate ephemeral ports, but higher than the range typically used by
well known services.
Author: Jelte Fenema-Nio, with some editing by me.
Discussion: https://postgr.es/m/d6ee8761-39d1-0033-1afb-d5a57ee056f2@gmail.com
Backpatch to all live branches (12 and up)
Backport the new BackgroundPsql modules and the constructor functions,
background_psql() and interactive_psql, to all supported
branches. That makes it easier to backpatch tests that use it.
BackgroundPsql was introduced in version 16. On version 16, this
commit backports just the new timeout argument from master (commit
334f512f45). On older branches, the whole facility. This includes the
change to `use warnings FATAL => 'all'`, which we haven't otherwise
backported, but it seems good to keep the file identical across
branches.
Discussion: https://www.postgresql.org/message-id/b7c64f20-ea01-4f15-9088-0cd6832af149@iki.fi
Before the previous commit, the test could hang until
LOG_SNAPSHOT_INTERVAL_MS (15s), until checkpoint_timeout (300s), or
indefinitely. An indefinite hang was awfully improbable. It entailed
the test reaching checkpoint_timeout before the
DecodingContextFindStartpoint() of a CREATE SUBSCRIPTION, yet after the
preceding WAL record. Back-patch to v16, which introduced the test.
Bertrand Drouvot, reported by Noah Misch.
Discussion: https://postgr.es/m/20240211010227.a2.nmisch@google.com
Commit 664d75753 pulled 010_tab_completion.pl's infrastructure for
invoking an interactive psql session out into a generally-useful test
function, but it didn't move enough stuff. We need to set up various
environment variables that readline will look at, both to ensure
stability of test results and to prevent test actions from cluttering
the calling user's ~/.psql_history. Expecting calling scripts to
remember to do that is too failure-prone: the other existing caller
001_password.pl did not do it. Hence, remove those initialization
steps from 010_tab_completion.pl and put them into interactive_psql().
Since interactive_psql was already making a local ENV hash, this has
no effect on calling scripts.
Discussion: https://postgr.es/m/794610.1703182896@sss.pgh.pa.us
The same routine to check if a specific pattern can be found in the
server logs was copied over four different test scripts. This refactors
the whole to use a single routine located in PostgreSQL::Test::Cluster,
named log_contains, to grab the contents of the server logs and check
for a specific pattern.
On HEAD, the code previously used assumed that slurp_file() could not
handle an undefined offset, setting it to zero, but slurp_file() does
do an extra fseek() before retrieving the log contents only if an offset
is defined. In two places, the test was retrieving the full log
contents with slurp_file() after calling substr() to apply an offset,
ignoring that slurp_file() would be able to handle that.
Backpatch all the way down to ease the introduction of new tests that
could rely on the new routine.
Author: Vignesh C
Reviewed-by: Andrew Dunstan, Dagfinn Ilmari Mannsåker, Michael Paquier
Discussion: https://postgr.es/m/CALDaNm0YSiLpjCmajwLfidQrFOrLNKPQir7s__PeVvh9U3uoTQ@mail.gmail.com
Backpatch-through: 11
This commit refactors a bit the code in charge of checking for log
patterns when connections fail or succeed, by moving the log pattern
checks into their own routine, for clarity. This has come up as
something to improve while discussing the refactoring of find_in_log().
Backpatch down to 14 where these routines are used, to ease the
introduction of new tests that could rely on them.
Author: Vignesh C, Michael Paquier
Discussion: https://postgr.es/m/CALDaNm0YSiLpjCmajwLfidQrFOrLNKPQir7s__PeVvh9U3uoTQ@mail.gmail.com
Backpatch-through: 14
Run pgindent, pgperltidy, and reformat-dat-files.
This set of diffs is a bit larger than typical. We've updated to
pg_bsd_indent 2.1.2, which properly indents variable declarations that
have multi-line initialization expressions (the continuation lines are
now indented one tab stop). We've also updated to perltidy version
20230309 and changed some of its settings, which reduces its desire to
add whitespace to lines to make assignments etc. line up. Going
forward, that should make for fewer random-seeming changes to existing
code.
Discussion: https://postgr.es/m/20230428092545.qfb3y5wcu4cm75ur@alvherre.pgsql
This breaks out the background and interactive psql functionality into a
new class, PostgreSQL::Test::BackgroundPsql. Sessions are still initiated
via PostgreSQL::Test::Cluster, but once started they can be manipulated by
the new helper functions which intend to make querying easier. A sample
session for a command which can be expected to finish at a later time can
be seen below.
my $session = $node->background_psql('postgres');
$bsession->query_until(qr/start/, q(
\echo start
CREATE INDEX CONCURRENTLY idx ON t(a);
));
$bsession->quit;
Patch by Andres Freund with some additional hacking by me.
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/20230130194350.zj5v467x4jgqt3d6@awork3.anarazel.de
This adds support for load balancing connections with libpq using a
connection parameter: load_balance_hosts=<string>. When setting the
param to random, hosts and addresses will be connected to in random
order. This then results in load balancing across these addresses and
hosts when multiple clients or frequent connection setups are used.
The randomization employed performs two levels of shuffling:
1. The given hosts are randomly shuffled, before resolving them
one-by-one.
2. Once a host its addresses get resolved, the returned addresses
are shuffled, before trying to connect to them one-by-one.
Author: Jelte Fennema <postgres@jeltef.nl>
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-by: Michael Banck <mbanck@gmx.net>
Reviewed-by: Andrey Borodin <amborodin86@gmail.com>
Discussion: https://postgr.es/m/PR3PR83MB04768E2FF04818EEB2179949F7A69@PR3PR83MB0476.EURPRD83.prod.outlook.
Currently there is a race condition where if concurrent TAP tests both
test that they can open a port they will assume that it is free and use
it, causing one of them to fail. To prevent this we record a reservation
using an exclusive lock, and any TAP test that discovers a reservation
checks to see if the reserving process is still alive, and looks for
another free port if it is.
Ports are reserved in a directory set by the environment setting
PG_TEST_PORT_DIR, or if that doesn't exist a subdirectory of the top
build directory as set by meson or Makefile.global, or its own
tmp_check directory.
The prove_check recipe in Makefile.global.in is extended to export
top_builddir to the TAP tests. This was already exported by the
prove_installcheck recipes.
Per complaint from Andres Freund
This will be backpatched in due course after some testing.
Discussion: https://postgr.es/m/20221002164931.d57hlutrcz4d2zi7@awork3.anarazel.de
Currently this only allows for one argument, which must be present, and
always returns a single string. With this change the following now all
work:
$all_config = $node->config_data;
%config_map = ($node->config_data);
$incdir = $node->config_data('--include-dir');
($incdir, $sharedir) = $node->config_data(
qw(--include-dir --share-dir));
Backpatch to release 15 where this was introduced.
Discussion: https://postgr.es/m/73eea68e-3b6f-5f63-6024-25ed26b52016@dunslane.net
Reviewed by Tom Lane, Alvaro Herrera, Michael Paquier.
Test files should now ignore has_wal_read_bug() so long as
wait_for_catchup() is their only known way of reaching the bug. That's
at least five files today, a number expected to grow over time. This
commit removes skip logic from three. By doing so, systems having the
bug regain the ability to catch other kinds of defects via those three
tests. The other two, 002_databases.pl and 031_recovery_conflict.pl,
have been unprotected. Back-patch to v15, where done_testing() first
became our standard.
Discussion: https://postgr.es/m/20221030031639.GA3082137@rfd.leadboat.com
Set up a signal handler for INT/TERM so that we run our END block if we
get them. In END, if the exit status indicates a problem, call
_update_pid(-1) to improve chances of the stop working in case start()
hasn't returned yet.
Also, change END's teardown_node() so that it passes fail_ok=>1, so that
if a node fails to stop, we still stop the other nodes in the same test.
Per complaint from Andres Freund.
This doesn't seem important enough to backpatch, at least for now.
Discussion: https://postgr.es/m/20220930040734.mbted42oiynhn2t6@awork3.anarazel.de
The oldest vendor-shipped Perl in the buildfarm is 5.14.2, which is
the last version that Debian Wheezy shipped. That OS is EOL, but we
keep it running because there is no other convenient way to test certain
non-mainstream 32-bit platforms. There is no bugfix in the 5.14.2 release
that is required, and yet it's also not the latest minor release --
that would be 5.14.4. To clarify the situation, we have thus arranged the
buildfarm to test 5.14.0. That allows configure scripts and documentation
to state 5.14 without fine print.
The MSVC build didn't check the version, since our previous minimum 5.8.3
was considered too old to check for on Windows. We will need a check for
Windows sometime during the v16 cycle, but that could be rendered moot
by the impending Meson conversion, so it seems safe to just document
the requirement for now.
Reviewed by Tom Lane
Discussion: https://www.postgresql.org/message-id/20220902181553.ev4pgzhubhdkguuv@awork3.anarazel.de
The TAP tests for logical replication in src/test/subscription are using
the following code in many places to make sure that the subscription is
synchronized with the publisher:
$node_publisher->wait_for_catchup('tap_sub');
$node_subscriber->poll_query_until('postgres',
qq[SELECT count(1) = 0
FROM pg_subscription_rel
WHERE srsubstate NOT IN ('r', 's')]);
The new function wait_for_subscription_sync() can be used to replace the
above code. This eliminates duplicated code and makes it easier to write
future tests.
Author: Masahiko Sawada
Reviewed by: Amit Kapila, Shi yu
Discussion: https://postgr.es/m/CAD21AoC-fvAkaKHa4t1urupwL8xbAcWRePeETvshvy80f6WV1A@mail.gmail.com
_pg_version (version number based on PostgreSQL::Version) is a field
private to Cluster.pm but there was no helper routine to retrieve it
from a Cluster's node. The same is done for install_path, for example,
and the version object becomes handy when writing tests that need
version-specific handling.
Reviewed-by: Andrew Dunstan, Daniel Gustafsson
Discussion: https://postgr.es/m/YoWfoJTc987tsxpV@paquier.xyz
Exclusive-mode backups have been deprecated since 9.6 (when
non-exclusive backups were introduced) due to the issues
they can cause should the system crash while one is running and
generally because non-exclusive provides a much better interface.
Further, exclusive backup mode wasn't really being tested (nor was most
of the related code- like being able to log in just to stop an exclusive
backup and the bits of the state machine related to that) and having to
possibly deal with an exclusive backup and the backup_label file
existing during pg_basebackup, pg_rewind, etc, added other complexities
that we are better off without.
This patch removes the exclusive backup mode, the various special cases
for dealing with it, and greatly simplifies the online backup code and
documentation.
Authors: David Steele, Nathan Bossart
Reviewed-by: Chapman Flack
Discussion: https://postgr.es/m/ac7339ca-3718-3c93-929f-99e725d1172c@pgmasters.nethttps://postgr.es/m/CAHg+QDfiM+WU61tF6=nPZocMZvHDzCK47Kneyb0ZRULYzV5sKQ@mail.gmail.com
Commit fb16d2c658 used the old but not quite old enough parent module,
which dates to perl version 5.10.1 as a core module. We still have a
dinosaur or two running older versions of perl, so rather than require
an upgrade in those we simply do in place what parent.pm's import()
would have done for us.
Reviewed by Tom Lane
Discussion: https://postgr.es/m/474104.1648685981@sss.pgh.pa.us
We do this via a subclass for any branch older than the minimum known
to be compatible with the main package (currently release 12).
This should be useful for constructing cross-version tests.
In theory this could be extended back any number of versions, with
varying degrees of compatibility.
Reviewed by Michael Paquier and Dagfinn Ilmari Mannsåker
Discussion: https://postgr.es/m/a3efd19a-d5c9-fdf2-6094-4cde056a2708@dunslane.net
Curently, some TAP test that directly call the underlying function
PostgreSQL::Test::Utils::run_log() care about the return value, but
none of those that call it via PostgreSQL::Test::Cluster::run_log() care.
However, I'd like to add a test that will care, so adjust this function
to return whatever it gets back from the underlying function, just as
we do for a number of other functions in this module.
Discussion: http://postgr.es/m/CA+Tgmobj6u-nWF-j=FemygUhobhryLxf9h-wJN7W-2rSsseHNA@mail.gmail.com
The previous method for doing that was to write zeroes into a
predetermined set of page locations. However, there's a roughly
1-in-64K chance that the existing checksum will match by chance,
and yesterday several buildfarm animals started to reproducibly
see that, resulting in test failures because no checksum mismatch
was reported.
Since the checksum includes the page LSN, test success depends on
the length of the installation's WAL history, which is affected by
(at least) the initial catalog contents, the set of locales installed
on the system, and the length of the pathname of the test directory.
Sooner or later we were going to hit a chance match, and today is
that day.
Harden these tests by specifically inverting the checksum field and
leaving all else alone, thereby guaranteeing that the checksum is
incorrect.
In passing, fix places that were using seek() to set up for syswrite(),
a combination that the Perl docs very explicitly warn against. We've
probably escaped problems because no regular buffered I/O is done on
these filehandles; but if it ever breaks, we wouldn't deserve or get
much sympathy.
Although we've only seen problems in HEAD, now that we recognize the
environmental dependencies it seems like it might be just a matter
of time until someone manages to hit this in back-branch testing.
Hence, back-patch to v11 where we started doing this kind of test.
Discussion: https://postgr.es/m/3192026.1648185780@sss.pgh.pa.us
Slow hosts may avoid load-induced, spurious failures by setting
environment variable PG_TEST_TIMEOUT_DEFAULT to some number of seconds
greater than 180. Developers may see faster failures by setting that
environment variable to some lesser number of seconds. In tests, write
$PostgreSQL::Test::Utils::timeout_default wherever the convention has
been to write 180. This change raises the default for some briefer
timeouts. Back-patch to v10 (all supported versions).
Discussion: https://postgr.es/m/20220218052842.GA3627003@rfd.leadboat.com
Following migration of Windows buildfarm members running TAP tests to
use of ucrt64 perl for those tests, special processing for msys perl is
no longer necessary and so is removed.
Backpatch to release 10
Discussion: https://postgr.es/m/c65a8781-77ac-ea95-d185-6db291e1baeb@dunslane.net
Commits 6c4a8903b et al. had a couple of deficiencies:
* The logic I added to Cluster::start to see if a PID file is present
could be fooled by a stale PID file left over from a previous
postmaster. To fix, if we're not sure whether we expect to find a
running postmaster or not, validate the PID using "kill 0".
* 017_shm.pl has a loop in which it just issues repeated Cluster::start
calls; this will fail if some invocation fails but leaves self->_pid
set. Per buildfarm results, the above fix is not enough to make this
safe: we might have "validated" a PID for a postmaster that exits
immediately after we look. Hence, match each failed start call with
a stop call that will get us back to the self->_pid == undef state.
Add a fail_ok option to Cluster::stop to make this work.
Discussion: https://postgr.es/m/CA+hUKGKV6fOHvfiPt8=dOKzvswjAyLoFoJF1iQXMNpi7+hD1JQ@mail.gmail.com
"pg_ctl start" might start a new postmaster and then return failure
anyway, for example if PGCTLTIMEOUT is exceeded. If there is a
postmaster there, it's still incumbent on us to shut it down at
script end, so check for the PID file even though we are about
to fail.
This has been broken all along, so back-patch to all supported branches.
Discussion: https://postgr.es/m/647439.1642622744@sss.pgh.pa.us
By default, wait_for_catchup() waits for the replication connection
to reach the primary's write LSN. That's fine, but in an apparent
attempt to save one query round-trip, it was coded so that we
executed pg_current_wal_lsn() again during each probe query.
Thus, we presented the standby with a moving target to be reached.
(While the test script itself couldn't be causing the write LSN
to advance while it's blocked in wait_for_catchup(), it's plenty
plausible that background activity such as autovacuum is emitting
more WAL.) That could make the test take longer than necessary,
and potentially it could mask bugs by allowing the standby to process
more WAL than a strict interpretation of the test scenario allows.
So, change wait_for_catchup() to do it "by the book", explicitly
collecting the write LSN to wait for at the outset.
Also, various call sites were instructing wait_for_catchup() to
wait for the standby to reach the primary's insert LSN rather than
its write LSN. This also seems like a bad idea. While in most
test scenarios those are the same, if they are different then the
inserted-but-not-yet-written WAL is not presently available to the
standby. The test isn't doing anything to make it become so, so
again we have the potential for unwanted test delay, perhaps even
a test timeout. (Again, background activity would be needed to
make this more than a hypothetical problem.) Hence, change the
callers where necessary so that the wait target is always the
primary's write LSN.
While at it, simplify callers by making use of wait_for_catchup's
default arguments wherever possible (the preceding change makes
this possible in more places than it was before). And rewrite
wait_for_catchup's documentation a bit.
Patch by me; thanks to Julien Rouhaud for review.
Discussion: https://postgr.es/m/2368336.1641843098@sss.pgh.pa.us
Prevent logical replication workers from performing insert, update,
delete, truncate, or copy commands on tables unless the subscription
owner has permission to do so.
Prevent subscription owners from circumventing row-level security by
forbidding replication into tables with row-level security policies
which the subscription owner is subject to, without regard to whether
the policy would ordinarily allow the INSERT, UPDATE, DELETE or
TRUNCATE which is being replicated. This seems sufficient for now, as
superusers, roles with bypassrls, and target table owners should still
be able to replicate despite RLS policies. We can revisit the
question of applying row-level security policies on a per-row basis if
this restriction proves too severe in practice.
Author: Mark Dilger
Reviewed-by: Jeff Davis, Andrew Dunstan, Ronan Dunklau
Discussion: https://postgr.es/m/9DFC88D3-1300-4DE8-ACBC-4CEF84399A53%40enterprisedb.com