Whenever this function is used with the FORMAT_MESSAGE_FROM_SYSTEM flag,
it's good practice to include FORMAT_MESSAGE_IGNORE_INSERTS as well.
Otherwise, if the message contains any %n insertion markers, the function
will try to fetch argument strings to substitute --- which we are not
passing, possibly leading to a crash. This is exactly analogous to the
rule about not giving printf() a format string you're not in control of.
Noted and patched by Christian Ullrich.
Back-patch to all supported branches.
pgwin32_recv() has treated a non-error return of zero bytes from WSARecv()
as being a reason to block ever since the current implementation was
introduced in commit a4c40f140d23cefb. However, so far as one can tell
from Microsoft's documentation, that is just wrong: what it means is
graceful connection closure (in stream protocols) or receipt of a
zero-length message (in message protocols), and neither case should result
in blocking here. The only reason the code worked at all was that control
then fell into the retry loop, which did *not* treat zero bytes specially,
so we'd get out after only wasting some cycles. But as of 9.5 we do not
normally reach the retry loop and so the bug is exposed, as reported by
Shay Rojansky and diagnosed by Andres Freund.
Remove the unnecessary test on the byte count, and rearrange the code
in the retry loop so that it looks identical to the initial sequence.
Back-patch of commit 90e61df8130dc7051a108ada1219fb0680cb3eb6. The
original plan was to apply this only to 9.5 and up, but after discussion
and buildfarm testing, it seems better to back-patch. The noblock code
path has been at risk of this problem since it was introduced (in 9.0);
if it did happen in pre-9.5 branches, the symptom would be that a walsender
would wait indefinitely rather than noticing a loss of connection. While
we lack proof that the case has been seen in the field, it seems possible
that it's happened without being reported.
Previously, in some places, socket creation errors were checked for
negative values, which is not true for Windows because sockets are
unsigned. This masked socket creation errors on Windows.
Backpatch through 9.0. 8.4 doesn't have the infrastructure to fix this.
Make sure WaitLatchOrSocket regards FD_CLOSE as a read-ready condition.
We might want to tweak this further, but it was surely wrong as-is.
Make pgwin32_waitforsinglesocket detach its private event object from the
passed socket before returning. I suspect that failure to do so leads
to race conditions when other code (such as WaitLatchOrSocket) attaches
a different event object to the same socket. Moreover, the existing
coding meant that repeated calls to pgwin32_waitforsinglesocket would
perform ResetEvent on an event actively connected to a socket, which
is rumored to be an unsafe practice; the WSAEventSelect documentation
appears to recommend against this, though it does not say not to do it
in so many words.
Also, uniformly use the coding pattern "WSAEventSelect(s, NULL, 0)" to
detach events from sockets, rather than passing the event in the second
parameter. The WSAEventSelect documentation says that the second parameter
is ignored if the third is 0, so theoretically this should make no
difference. However, elsewhere on the same reference page the use of NULL
in this context is recommended, and I have found suggestions on the net
that some versions of Windows have bugs with a non-NULL second parameter
in this usage.
Some other mostly-cosmetic cleanup, such as using the right one of
WSAGetLastError and GetLastError for reporting errors from these functions.
than replication_timeout (a new GUC) milliseconds. The TCP timeout is often
too long, you want the master to notice a dead connection much sooner.
People complained about that in 9.0 too, but with synchronous replication
it's even more important to notice dead connections promptly.
Fujii Masao and Heikki Linnakangas
and use this in pq_getbyte_if_available.
It's only a limited implementation which swithes the whole emulation layer
no non-blocking mode, but that's enough as long as non-blocking is only
used during a short period of time, and only one socket is accessed during
this time.
to be cases when at least Windows 2000 can do this even though select
just indicated that the socket is readable.
Per report and analysis from Cyril VELTER.
input in the stats collector. Our select() emulation is apparently buggy
for UDP sockets :-(. This should resolve problems with stats collection
(and hence autovacuum) failing under more than minimal load. Diagnosis
and patch by Magnus Hagander.
Patch probably needs to be back-ported to 8.1 and 8.0, but first let's
see if it makes the buildfarm happy...
FormatMessage() (This should have been in 8.2.0, patched to 8.2.X and
HEAD):
I think this problem to be complex....
http://archives.postgresql.org/pgsql-hackers/2006-11/msg00042.php
FormatMessage of windows cannot consider the encoding of the database.
However, I should try the solution now. It is necessary to clear the
problem.
Multi character-code exists together in message and log. It doesn't
consider
the data base encoding that the user intended....
The user in multi-byte country can try this.
http://inet.winpg.jp/~saito/pg_bug/MessageCheck.c
That is, it is likely to become it in this manner.(Japanese)
http://inet.winpg.jp/~saito/pg_bug/FormatMessage998.png
Hiroshi Saito
1) pgwin32_waitforsinglesocket(): WaitForMultipleObjectsEx now called with
finite timeout (100ms) in case of FP_WRITE and UDP socket. If timeout occurs
then pgwin32_waitforsinglesocket() tries to write empty packet goes to
WaitForMultipleObjectsEx again.
2) pgwin32_send(): add loop around WSASend and pgwin32_waitforsinglesocket().
The reason is: for overlapped socket, 'ok' result from
pgwin32_waitforsinglesocket() isn't guarantee that socket is still free,
it can become busy again and following WSASend call will fail with
WSAEWOULDBLOCK error.
See http://archives.postgresql.org/pgsql-hackers/2006-10/msg00561.php
to the main thread. This allows removal of WaitForSingleObjectEx() calls
from the main thread, thereby allowing us to re-enable Qingqing Zhou's
CHECK_FOR_INTERRUPTS performance improvement. Qingqing, Magnus, et al.
Also performed an initial run through of upgrading our Copyright date to
extend to 2005 ... first run here was very simple ... change everything
where: grep 1996-2004 && the word 'Copyright' ... scanned through the
generated list with 'less' first, and after, to make sure that I only
picked up the right entries ...
* Links with -leay32 and -lssleay32 instead of crypto and ssl. On win32,
"crypto and ssl" is only used for static linking.
* Initializes SSL in the backend and not just in the postmaster. We
cannot pass the SSL context from the postmaster through the parameter
file, because it contains function pointers.
* Split one error check in be-secure.c. Previously we could not tell
which of three calls actually failed. The previous code also returned
incorrect error messages if SSL_accept() failed - that function needs to
use SSL_get_error() on the return value, can't just use the error queue.
* Since the win32 implementation uses non-blocking sockets "behind the
scenes" in order to deliver signals correctly, implements a version of
SSL_accept() that can handle this. Also, add a wait function in case
SSL_read or SSL_write() needs more data.
Magnus Hagander
It works on the principle of turning sockets into non-blocking, and then
emulate blocking behaviour on top of that, while allowing signals to
run. Signals are now implemented using an event instead of APCs, thus
getting rid of the issue of APCs not being compatible with "old style"
sockets functions.
It also moves the win32 specific code away from pqsignal.h/c into
port/win32, and also removes the "thread style workaround" of the APC
issue previously in place.
In order to make things work, a few things are also changed in pgstat.c:
1) There is now a separate pipe to the collector and the bufferer. This
is required because the pipe will otherwise only be signalled in one of
the processes when the postmaster goes down. The MS winsock code for
select() must have some kind of workaround for this behaviour, but I
have found no stable way of doing that. You really are not supposed to
use the same socket from more than one process (unless you use
WSADuplicateSocket(), in which case the docs specifically say that only
one will be flagged).
2) The check for "postmaster death" is moved into a separate select()
call after the main loop. The previous behaviour select():ed on the
postmaster pipe, while later explicitly saying "we do NOT check for
postmaster exit inside the loop".
The issue was that the code relies on the same select() call seeing both
the postmaster pipe *and* the pgstat pipe go away. This does not always
happen, and it appears that useing WSAEventSelect() makes it even more
common that it does not.
Since it's only called when the process exits, I don't think using a
separate select() call will have any significant impact on how the stats
collector works.
Magnus Hagander