mirror of https://github.com/postgres/postgres.git synced 2025-07-03 20:02:46 +03:00

66 Commits

Author SHA1 Message Date
8c7a8e19bb Probe only 127.0.0.1 when looking for ports on Unix.
Commit c0985099, later adjusted by commit 4ab02e81, probed 0.0.0.0
in addition to 127.0.0.1, for the benefit of Windows build farm
animals.  It isn't really useful on Unix systems, and turned out to
be a bit inconvenient to users of some corporate firewall software.
Switch back to probing just 127.0.0.1 on non-Windows systems.

Back-patch to 9.6, like the earlier changes.

Discussion: https://postgr.es/m/CA%2BhUKG%2B21EPwfgs4m%2BtqyRtbVqkOUvP8QQ8sWk9%2Bh55Aub1H3A%40mail.gmail.com
2019-05-08 22:04:03 +12:00
ab359624b4 Fix SHOW ALL command for non-superusers with replication connection
Since Postgres 10, SHOW commands can be triggered with replication
connections in a WAL sender context, however it missed that a
transaction context is needed for syscache lookups.  This commit makes
sure that the syscache lookups can happen correctly by setting a
transaction context when running SHOW commands in a WAL sender.

Superuser-only parameters can be displayed using SHOW commands not only
to superusers, but also to members of system role pg_read_all_settings,
which requires a syscache lookup to check if the connected role is a
member of this system role or not, or the instance crashes.  Superusers
do not need to check the syscache so it worked correctly in this case.

New tests are added to cover this issue.

Reported-by: Alexander Kukushkin
Author: Michael Paquier
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/15734-2daa8761eeed8e20@postgresql.org
Backpatch-through: 10
2019-04-15 12:35:02 +09:00
4543ef36f0 Test both 0.0.0.0 and 127.0.0.x addresses to find a usable port.
Commit c098509927 changed
PostgresNode::get_new_node() to probe 0.0.0.0 instead of 127.0.0.1, but
the new test was less effective for Windows native Perl.  This increased
the failure rate of buildfarm members bowerbird and jacana.  Instead,
test 0.0.0.0 and concrete addresses.  This restores the old level of
defense, but the algorithm is still subject to its longstanding time of
check to time of use race condition.  Back-patch to 9.6, like the
previous change.

Discussion: https://postgr.es/m/GrdLgAdUK9FdyZg8VIcTDKVOkys122ZINEb3CjjoySfGj2KyPiMKTh1zqtRp0TAD7FJ27G-OBB3eplxIB5GhcQH5o8zzGZfp0MuJaXJxVxk=@yesql.se
2019-04-14 20:03:48 -07:00
2bc0474792 MSYS: Skip src/test/recovery/t/017_shm.pl.
Commit 947a35014f relied on a feature
available in v11 and later, so back-patching it to v10 and v9.6 was
invalid.  In those branches, revert it and skip the test on msys.

Discussion: https://postgr.es/m/GrdLgAdUK9FdyZg8VIcTDKVOkys122ZINEb3CjjoySfGj2KyPiMKTh1zqtRp0TAD7FJ27G-OBB3eplxIB5GhcQH5o8zzGZfp0MuJaXJxVxk=@yesql.se
2019-04-14 00:36:47 -07:00
61c0962d90 When Perl "kill(9, ...)" fails, try "pg_ctl kill".
Per buildfarm member jacana, the former fails under msys Perl 5.8.8.
Back-patch to 9.6, like the code in question.

Discussion: https://postgr.es/m/GrdLgAdUK9FdyZg8VIcTDKVOkys122ZINEb3CjjoySfGj2KyPiMKTh1zqtRp0TAD7FJ27G-OBB3eplxIB5GhcQH5o8zzGZfp0MuJaXJxVxk=@yesql.se
2019-04-13 11:09:30 -07:00
6d81e3c652 Consistently test for in-use shared memory.
postmaster startup scrutinizes any shared memory segment recorded in
postmaster.pid, exiting if that segment matches the current data
directory and has an attached process.  When the postmaster.pid file was
missing, a starting postmaster used weaker checks.  Change to use the
same checks in both scenarios.  This increases the chance of a startup
failure, in lieu of data corruption, if the DBA does "kill -9 `head -n1
postmaster.pid` && rm postmaster.pid && pg_ctl -w start".  A postmaster
will no longer stop if shmat() of an old segment fails with EACCES.  A
postmaster will no longer recycle segments pertaining to other data
directories.  That's good for production, but it's bad for integration
tests that crash a postmaster and immediately delete its data directory.
Such a test now leaks a segment indefinitely.  No "make check-world"
test does that.  win32_shmem.c already avoided all these problems.  In
9.6 and later, enhance PostgresNode to facilitate testing.  Back-patch
to 9.4 (all supported versions).

Reviewed (in earlier versions) by Daniel Gustafsson and Kyotaro HORIGUCHI.

Discussion: https://postgr.es/m/20190408064141.GA2016666@rfd.leadboat.com
2019-04-12 22:36:42 -07:00
7d18a55c90 Revert "Consistently test for in-use shared memory."
This reverts commits 2f932f71d9,
16ee6eaf80 and
6f0e190056.  The buildfarm has revealed
several bugs.  Back-patch like the original commits.

Discussion: https://postgr.es/m/20190404145319.GA1720877@rfd.leadboat.com
2019-04-05 00:00:55 -07:00
7c414cdc39 Consistently test for in-use shared memory.
postmaster startup scrutinizes any shared memory segment recorded in
postmaster.pid, exiting if that segment matches the current data
directory and has an attached process.  When the postmaster.pid file was
missing, a starting postmaster used weaker checks.  Change to use the
same checks in both scenarios.  This increases the chance of a startup
failure, in lieu of data corruption, if the DBA does "kill -9 `head -n1
postmaster.pid` && rm postmaster.pid && pg_ctl -w start".  A postmaster
will no longer recycle segments pertaining to other data directories.
That's good for production, but it's bad for integration tests that
crash a postmaster and immediately delete its data directory.  Such a
test now leaks a segment indefinitely.  No "make check-world" test does
that.  win32_shmem.c already avoided all these problems.  In 9.6 and
later, enhance PostgresNode to facilitate testing.  Back-patch to 9.4
(all supported versions).

Reviewed by Daniel Gustafsson and Kyotaro HORIGUCHI.

Discussion: https://postgr.es/m/20130911033341.GD225735@tornado.leadboat.com
2019-04-03 17:03:50 -07:00
0a576cd2a9 Make PostgresNode.pm's poll_query_until() more chatty about failures.
Reporting only the stderr is unhelpful when the problem is that the
server output we're getting doesn't match what was expected.  So we
should report the query output too; and just for good measure, let's
print the query we used and the output we expected.

Back-patch to 9.5 where poll_query_until was introduced.

Discussion: https://postgr.es/m/17913.1539634756@sss.pgh.pa.us
2018-10-16 12:27:33 -04:00
21d304dfed Final pgindent + perltidy run for v10.
2017-08-14 17:29:33 -04:00
54dacc7466 Make PostgresNode easily subclassable
This module becomes much more useful if we allow it to be used as base
class for external projects.  To achieve this, change the exported
get_new_node function into a class method instead, and use the standard
Perl idiom of accepting the class as first argument.  This method works
as expected for subclasses.  The standalone function is kept for
backwards compatibility, though it could be removed in pg11.

Author: Chap Flackman, based on an earlier patch from Craig Ringer
Discussion: https://postgr.es/m/CAMsr+YF8kO+4+K-_U4PtN==2FndJ+5Bn6A19XHhMiBykEwv0wA@mail.gmail.com
2017-07-25 18:51:47 -04:00
cde11fa3c0 Improve legibility of numeric literal
2017-07-17 15:35:46 -04:00
6c6970a280 Use usleep instead of select for timeouts in PostgresNode.pm
select() for pure timeouts is not portable, and in particular doesn't
work on Windows.

Discussion: https://postgr.es/m/186943e0-3405-978d-b19d-9d3335427c86@2ndQuadrant.com
2017-07-17 15:22:37 -04:00
efdb4f29ba Fix bug in PostgresNode::query_hash's split() call.
By default, Perl's split() function drops trailing empty fields,
which is not what we want here.  Oversight in commit fb093e4cb.
We'd managed to miss it thus far thanks to the very limited usage
of this function.

Discussion: https://postgr.es/m/14837.1499029831@sss.pgh.pa.us
2017-07-02 17:22:09 -04:00
de3de0afd7 Improve TAP test function PostgresNode::poll_query_until().
Add an optional "expected" argument to override the default assumption
that we're waiting for the query to return "t".  This allows replacing
a handwritten polling loop in recovery/t/007_sync_rep.pl with use of
poll_query_until(); AFAICS that's the only remaining ad-hoc polling
loop in our TAP tests.

Change poll_query_until() to probe ten times per second not once per
second.  Like some similar changes I've been making recently, the
one-second interval seems to be rooted in ancient traditions rather
than the actual likely wait duration on modern machines.  I'd consider
reducing it further if there were a convenient way to spawn just one
psql for the whole loop rather than one per probe attempt.

Discussion: https://postgr.es/m/12486.1498938782@sss.pgh.pa.us
2017-07-02 14:03:41 -04:00
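
The polling behaviour described in de3de0afd7 can be sketched outside Perl as follows. This is only an illustration, not the PostgresNode.pm code: the `run_query` callable, the `attempts` default, and the function signature are assumptions made for the example. Only the "expected" override, the default of "t", and the ten-probes-per-second interval come from the commit message.

```python
import time
from typing import Callable


def poll_query_until(run_query: Callable[[], str],
                     expected: str = "t",
                     attempts: int = 1800,
                     interval: float = 0.1) -> bool:
    """Re-run a query until its output equals `expected` or attempts run out.

    `run_query` is a hypothetical callable that returns the query output
    as text; `attempts` is an assumed limit, not taken from the source.
    """
    for _ in range(attempts):
        # Compare trimmed output, since client output may end in a newline.
        if run_query().strip() == expected:
            return True
        # Probe ten times per second by default rather than once per second.
        time.sleep(interval)
    return False
```

Callers are expected to check the return value, which is the point made by the earlier commit b0f069d931.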
b0f069d931 Clean up misuse and nonuse of poll_query_until().
Several callers of PostgresNode::poll_query_until() neglected to check
for failure; I do not think that's optional.  Also, rewrite one place
that had reinvented poll_query_until() for no very good reason.
2017-07-01 14:25:09 -04:00
2710ccd782 Reduce wal_retrieve_retry_interval in applicable TAP tests.
By default, wal_retrieve_retry_interval is five seconds, which is far
more than is needed in any of our TAP tests, leaving the test cases
just twiddling their thumbs for significant stretches.  Moreover,
because it's so large, we get basically no testing of the retry-before-
master-is-ready code path.  Hence, make PostgresNode::init set up
wal_retrieve_retry_interval = '500ms' as part of its customization of
test clusters' postgresql.conf.  This shaves quite a few seconds off
the runtime of the recovery TAP tests.

Back-patch into 9.6.  We have wal_retrieve_retry_interval in 9.5,
but the test infrastructure isn't there.

Discussion: https://postgr.es/m/31624.1498500416@sss.pgh.pa.us
2017-06-26 19:01:26 -04:00
ce55481032 Post-PG 10 beta1 pgperltidy run
2017-05-17 19:01:23 -04:00
c1a7f64b4a Replace "transaction log" with "write-ahead log"
This makes documentation and error messages match the renaming of "xlog"
to "wal" in APIs and file naming.
2017-05-12 11:52:43 -04:00
d10c626de4 Rename WAL-related functions and views to use "lsn" not "location".
Per discussion, "location" is a rather vague term that could refer to
multiple concepts.  "LSN" is an unambiguous term for WAL locations and
should be preferred.  Some function names, view column names, and function
output argument names used "lsn" already, but others used "location",
as well as yet other terms such as "wal_position".  Since we've already
renamed a lot of things in this area from "xlog" to "wal" for v10,
we may as well incur a bit more compatibility pain and make these names
all consistent.

David Rowley, minor additional docs hacking by me

Discussion: https://postgr.es/m/CAKJS1f8O0njDKe8ePFQ-LK5-EjwThsDws6ohJ-+c6nWK+oUxtg@mail.gmail.com
2017-05-11 11:49:59 -04:00
33f3bbc6d3 Fix TAP infrastructure to support Mingw better
archive_command and restore_command need to refer to Windows paths, not
Msys virtual file system paths, as postgres is completely unaware of the
latter, so prefix them with the Windows path to the virtual file system
root. Clean psql and pg_recvlogical output of carriage returns.
2017-04-23 09:21:38 -04:00
7d68f2281a Make PostgresNode.pm check server status more carefully.
PostgresNode blithely ignored the exit status of pg_ctl, and in general
made no effort to be sure that the server was running when it should be.
This caused it to miss server crashes, which is a serious shortcoming
in a test scaffold.  Make it complain if pg_ctl fails, and modify the
start and stop logic to complain if the server doesn't start, or doesn't
stop, when expected.

Also, have it turn off the "restart_after_crash" configuration parameter
in created clusters, as bitter experience has shown that leaving that on
can mask crashes too.

We might at some point need variant functions that allow for, eg,
server start failure to be expected.  But no existing test case appears
to want that, and it surely shouldn't be the default behavior.

Note that this *will* break the buildfarm, as it will expose known
bugs that the previous testing failed to.  I'm committing it despite
that, to verify that we get the expected failures in the buildfarm
not just in manual testing.

Back-patch into 9.6 where PostgresNode was introduced.  (The 9.6
branch is not expected to show any failures.)

Discussion: https://postgr.es/m/21432.1492886428@sss.pgh.pa.us
2017-04-22 18:18:25 -04:00
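
The core idea of 7d68f2281a, refusing to ignore a control command's exit status, might look like this in a generic test harness. It is a minimal sketch: `run_or_complain` is a hypothetical helper, and the example deliberately does not invoke pg_ctl itself.

```python
import subprocess
import sys
from typing import Sequence


def run_or_complain(argv: Sequence[str]) -> None:
    """Run a command and raise if it exits with a nonzero status."""
    result = subprocess.run(list(argv))
    if result.returncode != 0:
        # Failing loudly keeps a crashed or unstartable server from going unnoticed.
        raise RuntimeError(
            f"command {list(argv)!r} failed with exit status {result.returncode}"
        )
```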
8a19c1a373 Make PostgresNode::append_conf append a newline automatically.
Although the documentation for append_conf said clearly that it didn't
add a newline, many test authors seem to have forgotten that ... or maybe
they just consulted the example at the top of the POD documentation,
which clearly shows adding a config entry without bothering to add a
trailing newline.  The worst part of that is that it works, as long as
you don't do it more than once, since the backend isn't picky about
whether config files end with newlines.  So there's not a strong forcing
function reminding test authors not to do it like that.  Upshot is that
this is a terribly fragile way to go about things, and there's at least
one existing test case that is demonstrably broken and not testing what
it thinks it is.

Let's just make append_conf append a newline, instead; that is clearly
way safer than the old definition.

I also cleaned up a few call sites that were unnecessarily ugly.
(I left things alone in places where it's plausible that additional
config lines would need to be added someday.)

Back-patch the change in append_conf itself to 9.6 where it was added,
as having a definitional inconsistency between branches would obviously
be pretty hazardous for back-patching TAP tests.  The other changes are
just cosmetic and don't need to be back-patched.

Discussion: https://postgr.es/m/19751.1492892376@sss.pgh.pa.us
2017-04-22 16:58:15 -04:00
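
The hazard described in 8a19c1a373 is easy to reproduce with any append-to-file helper. The sketch below is an illustration under assumptions, not the Perl implementation: `append_conf` here takes a file path directly, and `demo` is a hypothetical driver. Without the added newline, the two entries in `demo` would run together as `fsync = offmax_connections = 10`.

```python
import os
import tempfile


def append_conf(path: str, text: str) -> None:
    """Append `text` to a config file, always followed by a newline."""
    with open(path, "a", encoding="utf-8") as conf:
        conf.write(text + "\n")


def demo() -> str:
    """Append two entries without trailing newlines and return the file contents."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "postgresql.conf")
        append_conf(path, "fsync = off")
        append_conf(path, "max_connections = 10")
        with open(path, encoding="utf-8") as conf:
            return conf.read()
```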
3371e4d9b1 Change default of log_directory to 'log'
The previous default 'pg_log' might have indicated by its "pg_" prefix
that it is an internal system directory.  The new default is more in
line with the typical naming of directories with user-facing log files.
Together with the renaming of pg_clog and pg_xlog, this should clear up
that difference.

Author: Andreas Karlsson <andreas@proxel.se>
2017-03-27 10:34:33 -04:00
facde2a98f Clean up Perl code according to perlcritic
Fix all perlcritic warnings of severity level 5, except in
src/backend/utils/Gen_dummy_probes.pl, which is automatically generated.

Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
2017-03-27 08:18:22 -04:00
eb2a6131be Add a pg_recvlogical wrapper to PostgresNode
Allows testing of logical decoding using the SQL interface and/or pg_recvlogical.
Most logical decoding tests are in contrib/test_decoding. This module
is for work that doesn't fit well there, like where server restarts
are required.

Craig Ringer
2017-03-21 14:04:49 +00:00
be37c2120a Enable replication connections by default in pg_hba.conf
initdb now initializes a pg_hba.conf that allows replication connections
from the local host, same as it does for regular connections.  The
connecting user still needs to have the REPLICATION attribute or be a
superuser.

The intent is to allow pg_basebackup from the local host to succeed
without requiring additional configuration.

Michael Paquier <michael.paquier@gmail.com> and me
2017-03-09 08:39:44 -05:00
231f48796b Fix timeouts in PostgresNode::psql
Newer Perl or IPC::Run versions default to appending the filename to string
exceptions, e.g. the exception

    psql timed out

is thrown as

    psql timed out at /usr/share/perl5/vendor_perl/IPC/Run.pm line 2961.

To handle this, match exceptions with !~ rather than ne.

From: Craig Ringer <craig@2ndquadrant.com>
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
2017-03-01 14:18:51 -05:00
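
The fix in 231f48796b replaces an exact string comparison with a pattern match. The same distinction can be sketched as follows; `is_timeout` and `TIMEOUT_MESSAGE` are names invented for this example, and anchoring the match at the start of the text is an assumption about what a reasonable pattern would be.

```python
import re

TIMEOUT_MESSAGE = "psql timed out"


def is_timeout(exception_text: str) -> bool:
    """True if the exception text starts with the timeout message.

    An equality test would reject the message once a library appends
    " at FILE line N." to it; a prefix match accepts both forms.
    """
    return re.match(re.escape(TIMEOUT_MESSAGE), exception_text) is not None
```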
806091c96f Remove all references to "xlog" from SQL-callable functions in pg_proc.
Commit f82ec32ac3 renamed the pg_xlog
directory to pg_wal.  To make things consistent, and because "xlog" is
terrible terminology for either "transaction log" or "write-ahead log"
rename all SQL-callable functions that contain "xlog" in the name to
instead contain "wal".  (Note that this may pose an upgrade hazard for
some users.)

Similarly, rename the xlog_position argument of the functions that
create slots to be called wal_position.

Discussion: https://www.postgresql.org/message-id/CA+Tgmob=YmA=H3DbW1YuOXnFVgBheRmyDkWcD9M8f=5bGWYEoQ@mail.gmail.com
2017-02-09 15:10:09 -05:00
665d1fad99 Logical replication
- Add PUBLICATION catalogs and DDL
- Add SUBSCRIPTION catalog and DDL
- Define logical replication protocol and output plugin
- Add logical replication workers

From: Petr Jelinek <petr@2ndquadrant.com>
Reviewed-by: Steve Singer <steve@ssinger.info>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Erik Rijkers <er@xs4all.nl>
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
2017-01-20 09:04:49 -05:00
f6d6d2920d Change default values for backup and replication parameters
This changes the default values of the following parameters:

wal_level = replica
max_wal_senders = 10
max_replication_slots = 10

in order to make it possible to make a backup and set up simple
replication on the default settings, without requiring a system restart.

Discussion: https://postgr.es/m/CABUevEy4PR_EAvZEzsbF5s+V0eEvw7shJ2t-AUwbHOjT+yRb3A@mail.gmail.com

Reviewed by Peter Eisentraut. Benchmark help from Tomas Vondra.
2017-01-14 17:14:56 +01:00
05cd12ed5b pg_ctl: Change default to wait for all actions
The different actions in pg_ctl had different defaults for -w and -W,
mostly for historical reasons.  Most users will want the -w behavior, so
make that the default.

Remove the -w option in most example and test code, to avoid confusion
and reduce verbosity.  pg_upgrade is not touched, so it can continue to
work with older installations.

Reviewed-by: Beena Emerson <memissemerson@gmail.com>
Reviewed-by: Ryan Murphy <ryanfmurphy@gmail.com>
2017-01-14 09:15:08 -05:00
750c59d7ec Fix mistake in comment
The node->restart() function doesn't take a mode argument.
2017-01-12 10:24:10 -05:00
2e44f379bc Fix format for TAP test docs
Small number of fixes to perl docs for TAP tests.
Plus two comments that use "xlog" rather than WAL

Michael Paquier
2017-01-05 10:07:59 +00:00
fb093e4cb3 Allow PostgresNode.pm tests to wait for catchup
Add methods to the core test framework PostgresNode.pm to allow us to
test that standby nodes have caught up with the master, as well as
basic LSN handling.  Used in tests recovery/t/001_stream_rep.pl and
recovery/t/004_timeline_switch.pl

Craig Ringer, reviewed by Aleksander Alekseev and Simon Riggs
2017-01-04 16:50:23 +00:00
9a4d51077c Make wal streaming the default mode for pg_basebackup
Since streaming is now supported for all output formats, make this the
default as this is what most people want.

To get the old behavior, the parameter -X none can be specified to turn
it off.

This also removes the parameter -x for fetch, now requiring -X fetch to
be specified to use that.

Reviewed by Vladimir Rusinov, Michael Paquier and Simon Riggs
2017-01-04 10:40:38 +01:00
e5a9bcb529 Use pg_ctl promote -w in TAP tests
Switch TAP tests to use the new wait mode of pg_ctl promote.  This
allows avoiding extra logic with poll_query_until() to be sure that a
promoted standby is ready for read-write queries.

From: Michael Paquier <michael.paquier@gmail.com>
2016-10-19 09:18:50 -04:00
5d58c07a44 initdb pg_basebackup: Rename --noxxx options to --no-xxx
--noclean and --nosync were the only options spelled without a hyphen,
so change this for consistency with other options.  The options in
pg_basebackup have not been in a release, so we just rename them.  For
initdb, we retain the old variants.

Vik Fearing and me
2016-10-19 08:48:48 -04:00
61f9e7ba3c Update obsolete comments and perldoc.
Loose ends from commit 2a0f89cd71.

Daniel Gustafsson
2016-10-05 13:09:52 -04:00
a4327296df Set log_line_prefix and application name in test drivers
Before pg_regress runs psql, set the application name to the test name.
Similarly, set the application name to the test file name in the TAP
tests.  Also, set a default log_line_prefix that shows the application
name, as well as the PID and a time stamp.

That way, the server log output can be correlated to the test input
files, making debugging a bit easier.
2016-09-30 21:32:33 -04:00
728a3e73e9 Switch pg_basebackup commands in PostgresNode.pm to use --nosync
On slow machines, this greatly reduces the I/O pressure induced by the
tests.

From: Michael Paquier <michael.paquier@gmail.com>
2016-09-29 12:00:00 -04:00
8b845520fb Add tests for various connection string issues
Add tests for consistent support of connection strings in frontend
programs as well as proper handling of unusual characters in database
and user names.  These tests were developed for the issues of
CVE-2016-5424.

To allow testing of names with spaces, change the pg_regress
command-line options --create-role and --dbname to split their arguments
by comma only, not space or comma as before.  Only commas were actually
used in existing uses.

Noah Misch, Michael Paquier, Peter Eisentraut
2016-09-22 12:00:00 -04:00
b5bce6c1ec Final pgindent + perltidy run for 9.6.
2016-08-15 13:42:51 -04:00
2a0f89cd71 Give recovery tests more time to finish
These tests are currently only running in buildfarm member hamster,
which is purposefully very slow.  This suite has failed a couple of
times recently because of timeouts, so increase the allowed number of
iterations to avoid spurious failures.

Author: Michaël Paquier
2016-07-25 01:34:35 -04:00
30b2731bd2 Fix TAP tests and MSVC scripts for pathnames with spaces.
Change assorted places in our Perl code that did things like
	system("prog $path/file");
to do it more like
	system('prog', "$path/file");
which is safe against spaces and other special characters in the path
variable.  The latter was already the prevailing style, but a few bits
of code hadn't gotten this memo.  Back-patch to 9.4 as relevant.

Michael Paquier, Kyotaro Horiguchi

Discussion: <20160704.160213.111134711.horiguchi.kyotaro@lab.ntt.co.jp>
2016-07-09 16:47:38 -04:00
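
The list-form call favoured by 30b2731bd2 has a direct analogue in other languages: pass the program and each argument as separate items so that no shell re-splits the path. The following is a sketch only; `run_with_path` is a hypothetical helper, and the Python interpreter stands in for "prog".

```python
import subprocess
import sys


def run_with_path(path: str) -> str:
    """Pass `path` to a child process as one argument, with no shell in between."""
    # The child simply prints the single argument it received.
    child = "import sys; print(sys.argv[1])"
    result = subprocess.run(
        [sys.executable, "-c", child, path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")
```

Because the path travels as a single list element, spaces and other special characters in it reach the child unchanged.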
3be0a62ffe Finish pgindent run for 9.6: Perl files.
2016-06-12 04:19:56 -04:00
08af921906 Fix order of shutdown cleanup operations in PostgresNode.pm.
Previously, database clusters created by a TAP test were shut down by
DESTROY methods attached to the PostgresNode objects representing them.
The trouble with that is that if the objects survive into the final global
destruction phase (which they do), Perl executes the DESTROY methods in an
unspecified order.  Thus, the order of shutdown of multiple clusters was
indeterminate, which might lead to not-very-reproducible errors getting
logged (eg from a slave whose master might or might not get killed first).
Worse, the File::Temp objects representing the temporary PGDATA directories
might get destroyed before the PostgresNode objects, resulting in attempts
to delete PGDATA directories that still have live servers in them.  On
Windows, this would lead to directory deletion failures; on Unix, it
usually had no effects worse than erratic "could not open temporary
statistics file "pg_stat/global.tmp": No such file or directory" log
messages.

While none of this would affect the reported result of the TAP test, which
is already determined, it could be very confusing when one is trying to
understand from the logs what went wrong with a failed test.

To fix, do the postmaster shutdowns in an END block rather than at object
destruction time.  The END block will execute at a well-defined (and
reasonable) time during script termination, and it will stop the
postmasters in order of PostgresNode object creation.  (Perhaps we should
change that to be reverse order of creation, but the main point here is
that we now have control which we did not before.)  Use "pg_ctl stop", not
an asynchronous kill(SIGQUIT), so that we wait for the postmasters to shut
down before proceeding with directory deletion.

Deletion of temporary directories still happens in an unspecified order
during global destruction, but I can see no reason to care about that
once the postmasters are stopped.
2016-04-26 12:43:03 -04:00
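
The approach in 08af921906, a single exit hook that stops clusters in creation order instead of per-object destructors run in unspecified order, can be sketched like this. Everything here is illustrative: the `Node` class, the `stopped` log, and the use of Python's `atexit` in place of a Perl END block are assumptions made for the example.

```python
import atexit
from typing import List

_nodes: List["Node"] = []   # every node, in creation order
stopped: List[str] = []     # shutdown order, recorded for illustration


class Node:
    """Hypothetical stand-in for a test cluster handle."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.running = True
        _nodes.append(self)

    def stop(self) -> None:
        # A real harness would run a blocking "pg_ctl stop" here, so that
        # directory cleanup only starts once the server is gone.
        if self.running:
            self.running = False
            stopped.append(self.name)


def stop_all_nodes() -> None:
    """Stop every registered node in order of creation."""
    for node in _nodes:
        node.stop()


# One hook at a well-defined point of script termination, rather than
# relying on object finalizers whose relative order is unspecified.
atexit.register(stop_all_nodes)
```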
40e89e2ab8 Try harder to detect a port conflict in PostgresNode.pm.
Commit fab84c7787 tried to get away without doing an actual bind(),
but buildfarm results show that that doesn't get the job done.  So we must
really bind to the target port --- and at least on my Linux box, we need a
listen() as well, or conflicts won't be detected.  We rely on SO_REUSEADDR
to prevent problems from starting a postmaster on the socket immediately
after we've bound to it in the test code.  (There may be platforms where
that doesn't work too well.  But fortunately, we only really care whether
this works on Windows, and there the default behavior should be OK.)
2016-04-25 12:28:49 -04:00
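
A bind-and-listen probe of the kind 40e89e2ab8 describes might be sketched as below. This is not the PostgresNode.pm code: `port_is_free` is a hypothetical name, and the behaviour of SO_REUSEADDR differs between platforms (as the commit message itself cautions), so the sketch assumes Linux-like semantics.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if we can both bind to and listen on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # SO_REUSEADDR lets a server bind the port right after this probe closes.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            sock.bind((host, port))
            # bind() alone may succeed despite a conflict; listen() exposes it.
            sock.listen(1)
        except OSError:
            return False
    return True
```

As with the original, a time-of-check to time-of-use race remains: another process can take the port between the probe and the server start.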
fab84c7787 Improve PostgresNode.pm's logic for detecting already-in-use ports.
Buildfarm members bowerbird and jacana have shown intermittent "could not
bind IPv4 socket" failures in the BinInstallCheck stage since mid-December,
shortly after commits 1caef31d9e and 9821492ee4 changed the
logic for selecting which port to use in temporary installations.  One
plausible explanation is that we are randomly selecting ports that are
already in use for some non-Postgres purpose.  Although the code tried
to defend against already-in-use ports, it used pg_isready to probe
the port which is quite unhelpful: if some non-Postgres server responds
at the given address, pg_isready will generally say "no response",
leading to exactly the wrong conclusion about whether the port is free.

Instead, let's use a simple TCP connect() call to see if anything answers
without making assumptions about what it is.  Note that this means there's
no direct check for a conflicting Unix socket, but that should be okay
because there should be no other Unix sockets in use in the temporary
socket directory created for a test run.

This is only a partial solution for the TCP case, since if the port number
is in use for an outgoing connection rather than a listening socket, we'll
fail to detect that.  We could try to bind() to the proposed port as a
means of detecting that case, but that would introduce its own failure
modes, since the system might consider the address to remain reserved for
some period of time after we drop the bound socket.  Close study of the
errors returned by bowerbird and jacana suggests that what we're seeing
there may be conflicts with listening not outgoing sockets, so let's try
this and see if it improves matters.  It's certainly better than what's
there now, in any case.

Michael Paquier, adjusted by me to work on non-Windows as well as Windows
2016-04-24 15:31:45 -04:00
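
The connect()-based check introduced by fab84c7787 asks only whether anything answers on the port, without assuming what the listener is. A minimal sketch follows; `something_answers` and its `timeout` parameter are assumptions for the example.

```python
import socket


def something_answers(port: int, host: str = "127.0.0.1",
                      timeout: float = 1.0) -> bool:
    """Return True if a TCP connect() to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused or timed out: nothing is listening there.
        return False
```

As the commit message notes, this cannot see a port that is in use only for an outgoing connection, which is the gap the later bind()-based probe was meant to close.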
196b72fb9a Add regression tests for multiple synchronous standbys.
Authors: Suraj Kharage, Michael Paquier, Masahiko Sawada, refactored by me
Reviewed-By: Kyotaro Horiguchi
2016-04-08 16:48:53 +09:00