Turns out that the socket option for this is named TCP_KEEPALIVE_THRESHOLD,
at least according to the tcp(7P) man page for Solaris 11. (But since that
text refers to "SunOS", it's likely pretty ancient.) It appears that the
symbol TCP_KEEPALIVE does get defined on that platform, but it doesn't
seem to represent a valid protocol-level socket option. This leads to
bleats in the postmaster log, and no tcp_keepalives_idle functionality.
Per bug #14720 from Andrey Lizenko, as well as an earlier report from
Dhiraj Chawla that nobody had followed up on. The issue's been there
since we added the TCP_KEEPALIVE code path in commit 5acd417c8, so
back-patch to all supported branches.
Discussion: https://postgr.es/m/20170627163757.25161.528@wrigleys.postgresql.org
If authentication over an SSL connection fails, with sslmode=prefer,
libpq will reconnect without SSL and retry. However, we did not clear
the variables related to GSS, SSPI, and SASL authentication state, when
reconnecting. Because of that, the second authentication attempt would
always fail with a "duplicate GSS/SASL authentication request" error.
pg_SSPI_startup did not check for duplicate authentication requests like
the corresponding GSS and SASL functions, so with SSPI, you would leak
some memory instead.
Another way this could manifest itself, on version 10, is if you list
multiple hostnames in the "host" parameter. If the first server requests
Kerberos or SCRAM authentication, but it fails, the attempts to connect to
the other servers will also fail with "duplicate authentication request"
errors.
To fix, move the clearing of authentication state from closePGconn to
pgDropConnection, so that it is cleared also when re-connecting.
Patch by Michael Paquier, with some kibitzing by me.
Backpatch down to 9.3. 9.2 has the same bug, but the code around closing
the connection is somewhat different, so that this patch doesn't apply.
To fix this in 9.2, I think we would need to back-port commit 210eb9b743
first, and then apply this patch. However, given that we only bumped into
this in our own testing, we haven't heard any reports from users about
this, and that 9.2 will be end-of-lifed in a couple of months anyway, it
doesn't seem worth the risk and trouble.
Discussion: https://www.postgresql.org/message-id/CAB7nPqRuOUm0MyJaUy9L3eXYJU3AKCZ-0-03=-aDTZJGV4GyWw@mail.gmail.com
Commit 65c3bf19fd moved handling of the,
already then, deprecated requiressl parameter into conninfo_storeval().
The default PGREQUIRESSL environment variable was however lost in the
change resulting in a potentially silent accept of a non-SSL connection
even when set. Its documentation remained. Restore its implementation.
Also amend the documentation to mark PGREQUIRESSL as deprecated for
those not following the link to requiressl. Back-patch to 9.3, where
commit 65c3bf1 first appeared.
Behavior has been more complex when the user provides both deprecated
and non-deprecated settings. Before commit 65c3bf1, libpq operated
according to the first of these found:
requiressl=1
PGREQUIRESSL=1
sslmode=*
PGSSLMODE=*
(Note requiressl=0 didn't override sslmode=*; it would only suppress
PGREQUIRESSL=1 or a previous requiressl=1. PGREQUIRESSL=0 had no effect
whatsoever.) Starting with commit 65c3bf1, libpq ignored PGREQUIRESSL,
and order of precedence changed to this:
last of requiressl=* or sslmode=*
PGSSLMODE=*
Starting now, adopt the following order of precedence:
last of requiressl=* or sslmode=*
PGSSLMODE=*
PGREQUIRESSL=1
This retains the 65c3bf1 behavior for connection strings that contain
both requiressl=* and sslmode=*. It retains the 65c3bf1 change that
either connection string option overrides both environment variables.
For the first time, PGSSLMODE has precedence over PGREQUIRESSL; this
avoids reducing security of "PGREQUIRESSL=1 PGSSLMODE=verify-full"
configurations originating under v9.3 and later.
Daniel Gustafsson
Security: CVE-2017-7485
On Windows, libc will mask \r\n line endings for us, since we read the
password file in text mode. But that doesn't happen on Unix. People
who share password files across both systems might have \r\n line endings
in a file they use on Unix, so as a convenience, ignore trailing \r.
Per gripe from Josh Berkus.
In passing, put the existing check for empty line somewhere where it's
actually useful, ie after stripping the newline not before.
Vik Fearing, adjusted a bit by me
Discussion: <0de37763-5843-b2cc-855e-5d0e5df25807@agliodbs.com>
Due to simplistic quoting and confusion of database names with conninfo
strings, roles with the CREATEDB or CREATEROLE option could escalate to
superuser privileges when a superuser next ran certain maintenance
commands. The new coding rule for PQconnectdbParams() calls, documented
at conninfo_array_parse(), is to pass expand_dbname=true and wrap
literal database names in a trivial connection string. Escape
zero-length values in appendConnStrVal(). Back-patch to 9.1 (all
supported versions).
Nathan Bossart, Michael Paquier, and Noah Misch. Reviewed by Peter
Eisentraut. Reported by Nathan Bossart.
Security: CVE-2016-5424
In commit 210eb9b743 I centralized libpq's logic for closing down
the backend communication socket, and made the new pqDropConnection
routine always reset the I/O buffers to empty. Many of the call sites
previously had not had such code, and while that amounted to an oversight
in some cases, there was one place where it was intentional and necessary
*not* to flush the input buffer: pqReadData should never cause that to
happen, since we probably still want to process whatever data we read.
This is the true cause of the problem Robert was attempting to fix in
c3e7c24a1d, namely that libpq no longer reported the backend's final
ERROR message before reporting "server closed the connection unexpectedly".
But that only accidentally fixed it, by invoking parseInput before the
input buffer got flushed; and very likely there are timing scenarios
where we'd still lose the message before processing it.
To fix, pass a flag to pqDropConnection to tell it whether to flush the
input buffer or not. On review I think flushing is actually correct for
every other call site.
Back-patch to 9.3 where the problem was introduced. In HEAD, also improve
the comments added by c3e7c24a1d.
After finding an "=" character, the pointer was advanced twice when it
should only advance once. This is harmless as long as the value after "="
has at least one character; but if it doesn't, we'd miss the terminator
character and include too much in the value.
In principle this could lead to reading off the end of memory. It does not
seem worth treating as a security issue though, because it would happen on
client side, and besides client logic that's taking conninfo strings from
untrusted sources has much worse security problems than this.
Report and patch received off-list from Thomas Fanghaenel.
Back-patch to 9.2 where the faulty code was introduced.
Coverity complained that the "else" added to fillPGconn() was unreachable,
which it was. Remove the dead code. In passing, rearrange the tests so as
not to bother trying to fetch values for options that can't be assigned.
Pre-9.3 did not have that issue, but it did have a "return" that should be
"goto oom_error" to ensure that a suitable error message gets filled in.
If the "dbname" attribute in PQconnectDBParams contained a connection string
or URI (and expand_dbname = TRUE), the database name from the connection
string could not be overridden by a subsequent "dbname" keyword in the
array. That was not intentional; all other options can be overridden.
Furthermore, any subsequent "dbname" caused the connection string from the
first dbname value to be processed again, overriding any values for the same
options that were given between the connection string and the second dbname
option.
In the passing, clarify in the docs that only the first dbname option in the
array is parsed as a connection string.
Alex Shulgin. Backpatch to all supported versions.
An out-of-memory in most of these would lead to strange behavior, like
connecting to a different database than intended, but some would lead to
an outright segfault.
Alex Shulgin and me. Backpatch to all supported versions.
If you call PQreset() repeatedly, and the connection cannot be
re-established, the error messages from the failed connection attempts
kept accumulating in the error string.
Fixes bug #11455 reported by Caleb Epstein. Backpatch to all supported
versions.
This was not changed in HEAD, but will be done later as part of a
pgindent run. Future pgindent runs will also do this.
Report by Tom Lane
Backpatch through all supported branches, but not HEAD
Introduced in 585bca39: msgid is not used in the Windows code path.
Also adjust comments a tad (mostly to keep pgindent from messing it up).
David Rowley
Bind attempts to an LDAP server should time out after two seconds,
allowing additional lines in the service control file to be parsed
(which provide a fall back to a secondary LDAP server or default options).
The existing code failed to enforce that timeout during TCP connect,
resulting in a hang far longer than two seconds if the LDAP server
does not respond.
Laurenz Albe
Previously, in some places, socket creation errors were checked for
negative values, which is not true for Windows because sockets are
unsigned. This masked socket creation errors on Windows.
Backpatch through 9.0. 8.4 doesn't have the infrastructure to fix this.
"8" was correct back when "disable" was the longest allowed value, but
since "verify-full" was added, it should be "12". Given the lack of
complaints, I wouldn't be surprised if nobody is actually using these
values ... but still, if they're in the API, they should be right.
Noticed while pursuing a different problem. It's been wrong for quite
a long time, so back-patch to all supported branches.
In the platform that doesn't support Unix-domain socket, when
neither host nor hostaddr are specified, the default host
'localhost' is used to connect to the server and PQhost() must
return that, but it didn't. This patch fixes PQhost() so that
it returns the default host in that case.
Also this patch fixes PQhost() so that it doesn't return
Unix-domain socket directory path in the platform that doesn't
support Unix-domain socket.
Back-patch to all supported versions.
On Unix-ish platforms, EWOULDBLOCK may be the same as EAGAIN, which is
*not* a success return, at least not on Linux. We need to treat it as a
failure to avoid giving a misleading error message. Per the Single Unix
Spec, only EINPROGRESS and EINTR returns indicate that the connection
attempt is in progress.
On Windows, on the other hand, EWOULDBLOCK (WSAEWOULDBLOCK) is the expected
case. We must accept EINPROGRESS as well because Cygwin will return that,
and it doesn't seem worth distinguishing Cygwin from native Windows here.
It's not very clear whether EINTR can occur on Windows, but let's leave
that part of the logic alone in the absence of concrete trouble reports.
Also, remove the test for errno == 0, effectively reverting commit
da9501bddb, which AFAICS was just a thinko;
or at best it might have been a workaround for a platform-specific bug,
which we can hope is gone now thirteen years later. In any case, since
libpq makes no effort to reset errno to zero before calling connect(),
it seems unlikely that that test has ever reliably done anything useful.
Andres Freund and Tom Lane
This allows a caller to get back the exact conninfo array that was
used to create a connection, including parameters read from the
environment.
In doing this, restructure how options are copied from the conninfo
to the actual connection.
Zoltan Boszormenyi and Magnus Hagander
The length of a socket path name is constrained by the size of struct
sockaddr_un, and there's not a lot we can do about it since that is a
kernel API. However, it would be a good thing if we produced an
intelligible error message when the user specifies a socket path that's too
long --- and getaddrinfo's standard API is too impoverished to do this in
the natural way. So insert explicit tests at the places where we construct
a socket path name. Now you'll get an error that makes sense and even
tells you what the limit is, rather than something generic like
"Non-recoverable failure in name resolution".
Per trouble report from Jeremy Drake and a fix idea from Andrew Dunstan.
Create an internal function pqDropConnection that does the physical socket
close and cleans up closely-associated state. This removes a bunch of ad
hoc, not always consistent closure code. The ulterior motive is to have a
single place to wait for a spawned child backend to exit, but this seems
like good cleanup even if that never happens.
I went back and forth on whether to include "conn->status = CONNECTION_BAD"
in pqDropConnection's actions, but for the moment decided not to. Only a
minority of the call sites actually want that, and in any case it's
arguable that conn->status is slightly higher-level state, and thus not
part of this function's purview.
After taking awhile to digest the row-processor feature that was added to
libpq in commit 92785dac2e, we've concluded
it is over-complicated and too hard to use. Leave the core infrastructure
changes in place (that is, there's still a row processor function inside
libpq), but remove the exposed API pieces, and instead provide a "single
row" mode switch that causes PQgetResult to return one row at a time in
separate PGresult objects.
This approach incurs more overhead than proper use of a row processor
callback would, since construction of a PGresult per row adds extra cycles.
However, it is far easier to use and harder to break. The single-row mode
still affords applications the primary benefit that the row processor API
was meant to provide, namely not having to accumulate large result sets in
memory before processing them. Preliminary testing suggests that we can
probably buy back most of the extra cycles by micro-optimizing construction
of the extra results, but that task will be left for another day.
Marko Kreen
Drop special handling of host component with slashes to mean
Unix-domain socket. Specify it as separate parameter or using
percent-encoding now.
Allow omitting username, password, and port even if the corresponding
designators are present in URI.
Handle percent-encoding in query parameter keywords.
Alex Shulgin
some documentation improvements by myself
postgres:// URIs are an attempt to "stop the bleeding" in this general
area that has been said to occur due to external projects adopting their
own syntaxes. The syntaxes supported by this patch:
postgres://[user[:pwd]@][unix-socket][:port[/dbname]][?param1=value1&...]
postgres://[user[:pwd]@][net-location][:port][/dbname][?param1=value1&...]
should be enough to cover most interesting cases without having to
resort to "param=value" pairs, but those are provided for the cases that
need them regardless.
libpq documentation has been shuffled around a bit, to avoid stuffing
all the format details into the PQconnectdbParams description, which was
already a bit overwhelming. The list of keywords has moved to its own
subsection, and the details on the URI format live in another subsection.
This includes a simple test program, as requested in discussion, to
ensure that interesting corner cases continue to work appropriately in
the future.
Author: Alexander Shulgin
Some tweaking by Álvaro Herrera, Greg Smith, Daniel Farina, Peter Eisentraut
Reviewed by Robert Haas, Alexey Klyukin (offlist), Heikki Linnakangas,
Marko Kreen, and others
Oh, it also supports postgresql:// but that's probably just an accident.
Traditionally libpq has collected an entire query result before passing
it back to the application. That provides a simple and transactional API,
but it's pretty inefficient for large result sets. This patch allows the
application to process each row on-the-fly instead of accumulating the
rows into the PGresult. Error recovery becomes a bit more complex, but
often that tradeoff is well worth making.
Kyotaro Horiguchi, reviewed by Marko Kreen and Tom Lane
When using connection info arrays with a conninfo string in the dbname
slot, some memory would be leaked if an error occurred while
processing the following array slots.
found by Coverity
This has been broken just about forever (or more specifically, commit
7f4981f4af) and nobody noticed until
Richard Huxton reported it recently. Analysis and fix by Ross
Reedstrom, although I didn't use his patch. This doesn't seem
important enough to back-patch and is mildly backward incompatible, so
I'm just doing this in master.
The keywords and values arguments of these functions are more properly
declared "const char * const *" than just "const char **".
Lionel Elie Mamane, reviewed by Craig Ringer
These days, such a response is far more likely to signify a server-side
problem, such as fork failure. Reporting "server does not support SSL"
(in sslmode=require) could be quite misleading. But the results could
be even worse in sslmode=prefer: if the problem was transient and the
next connection attempt succeeds, we'll have silently fallen back to
protocol version 2.0, possibly disabling features the user needs.
Hence, it seems best to just eliminate the assumption that backing off
to non-SSL/2.0 protocol is the way to recover from an "E" response, and
instead treat the server error the same as we would in non-SSL cases.
I tested this change against a pre-7.0 server, and found that there
was a second logic bug in the "prefer" path: the test to decide whether
to make a fallback connection attempt assumed that we must have opened
conn->ssl, which in fact does not happen given an "E" response. After
fixing that, the code does indeed connect successfully to pre-7.0,
as long as you didn't set sslmode=require. (If you did, you get
"Unsupported frontend protocol", which isn't completely off base
given the server certainly doesn't support SSL.)
Since there seems no reason to believe that pre-7.0 servers exist anymore
in the wild, back-patch to all supported branches.
There are assorted situations wherein PQconnectPoll() will abandon a
connection attempt and try again with different parameters (eg, SSL versus
not SSL). However, the code forgot to discard any pending data in libpq's
I/O buffers when doing this. In at least one case (server returns E
message during SSL negotiation), there is unread input data which bollixes
the next connection attempt. I have not checked to see whether this is
possible in the other cases where we close the socket and retry, but it
seems like a matter of good defensive programming to add explicit
buffer-flushing code to all of them.
This is one of several issues exposed by Daniel Farina's report of
misbehavior after a server-side fork failure.
This has been wrong since forever, so back-patch to all supported branches.
This unifies a bunch of ugly #ifdef's in one place. Per discussion,
we only need this where HAVE_UNIX_SOCKETS, so no need to cover Windows.
Marko Kreen, some adjustment by Tom Lane
It turns out the reason we hadn't found out about the portability issues
with our credential-control-message code is that almost no modern platforms
use that code at all; the ones that used to need it now offer getpeereid(),
which we choose first. The last holdout was NetBSD, and they added
getpeereid() as of 5.0. So far as I can tell, the only live platform on
which that code was being exercised was Debian/kFreeBSD, ie, FreeBSD kernel
with Linux userland --- since glibc doesn't provide getpeereid(), we fell
back to the control message code. However, the FreeBSD kernel provides a
LOCAL_PEERCRED socket parameter that's functionally equivalent to Linux's
SO_PEERCRED. That is both much simpler to use than control messages, and
superior because it doesn't require receiving a message from the other end
at just the right time.
Therefore, add code to use LOCAL_PEERCRED when necessary, and rip out all
the credential-control-message code in the backend. (libpq still has such
code so that it can still talk to pre-9.1 servers ... but eventually we can
get rid of it there too.) Clean up related autoconf probes, too.
This means that libpq's requirepeer parameter now works on exactly the same
platforms where the backend supports peer authentication, so adjust the
documentation accordingly.
Even though this only affects the insertion of a parenthesized word,
it's unwise to assume that parentheses can pass through untranslated.
And in any case, the new version is clearer in the code and for
translators.
The code to assemble ldap_get_values_len's output into a single string
wrote the terminating null one byte past where it should. Fix that,
and make some other cosmetic adjustments to make the code a trifle more
readable and more in line with usual Postgres coding style.
Also, free the "result" string when done with it, to avoid a permanent
memory leak.
Bug report and patch by Albe Laurenz, cosmetic adjustments by me.