The original definition had the problem that timeouts exceeding about 2100
seconds couldn't be specified on 32-bit machines. Milliseconds seem like
sufficient resolution, and finer grain than that would be fantasy anyway
on many platforms.
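The limit can be checked with a little arithmetic. This is a sketch only: it assumes the original timeout was a count of microseconds held in a 32-bit signed integer, which the ~2100 second figure suggests, and the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest whole number of seconds a 32-bit signed timeout can express,
 * given how many timeout units make up one second.
 * Hypothetical helper, for illustration only.
 */
static long demo_max_timeout_seconds(int32_t units_per_second)
{
    assert(units_per_second > 0);
    return (long) (INT32_MAX / units_per_second);
}
```

With 1000000 units per second (microseconds) this gives 2147 seconds, roughly 35 minutes. With 1000 units per second (milliseconds) it gives 2147483 seconds, a little under 25 days.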
Back-patch to 9.1 so that this aspect of the latch API won't change between
9.1 and later releases.
Peter Geoghegan
Improve the documentation around weak-memory-ordering risks, and make a
general editing pass over the comments in the latch code. Make the
Windows latch code more like the Unix latch code where feasible; in
particular provide the same Assert checks in both implementations.
Fix poorly-placed WaitLatch call in syncrep.c.
This patch resolves, for the moment, concerns around weak-memory-ordering
bugs in latch-related code: we have documented the restrictions and checked
that existing calls meet them. In 9.2 I hope that we will install suitable
memory barrier instructions in SetLatch/ResetLatch, so that their callers
don't need to be quite so careful.
detect postmaster death. Postmaster keeps the write-end of the pipe open,
so when it dies, children get EOF in the read-end. That can conveniently
be waited for in select(), which allows eliminating some of the polling
loops that check for postmaster death. This patch doesn't yet change all
the loops to use the new mechanism; expect a follow-on patch to do that.
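A minimal sketch of the EOF-on-pipe idea, not the code in this patch: the function name is hypothetical, and it polls with a zero timeout where a real wait loop would block in select().

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Returns 1 while some process still holds the write end of the pipe
 * open, 0 once the read end reports EOF. Hypothetical helper.
 */
static int demo_writer_is_alive(int read_fd)
{
    fd_set          rfds;
    struct timeval  tv = {0, 0};    /* poll without blocking */
    char            c;

    assert(read_fd >= 0);
    FD_ZERO(&rfds);
    FD_SET(read_fd, &rfds);

    /* Nothing readable (or select failed): no EOF seen, assume alive. */
    if (select(read_fd + 1, &rfds, NULL, NULL, &tv) <= 0)
        return 1;

    /* Readable with zero bytes available means every writer has gone. */
    return read(read_fd, &c, 1) != 0;
}
```

The same descriptor can sit in the fd_set of a blocking select() alongside other descriptors, which is what lets a wait loop notice the writer's death without polling.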
This changes the interface to WaitLatch, so that it takes as argument a
bitmask of events that it waits for. Possible events are latch set, timeout,
postmaster death, and socket becoming readable or writeable.
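As a sketch of what such a bitmask interface looks like (the DEMO_ names and bit values below are hypothetical and are not the definitions in the latch header):

```c
#include <assert.h>

/* Hypothetical event flags, one bit per event kind. */
#define DEMO_WL_LATCH_SET         (1 << 0)
#define DEMO_WL_SOCKET_READABLE   (1 << 1)
#define DEMO_WL_SOCKET_WRITEABLE  (1 << 2)
#define DEMO_WL_TIMEOUT           (1 << 3)
#define DEMO_WL_POSTMASTER_DEATH  (1 << 4)
#define DEMO_WL_ALL               ((1 << 5) - 1)

/*
 * Given the events the caller asked to wait for and the events that
 * actually occurred, return the subset to report back to the caller.
 */
static int demo_report_events(int wake_events, int occurred)
{
    assert((wake_events & ~DEMO_WL_ALL) == 0);
    assert((occurred & ~DEMO_WL_ALL) == 0);
    return wake_events & occurred;
}
```

A caller ORs together the events it cares about and tests individual bits in the result, so one call can report several simultaneous wakeup reasons.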
The pipe method behaves slightly differently from the kill() method
previously used in PostmasterIsAlive() in the case that postmaster has died,
but its parent has not yet read its exit code with waitpid(). The pipe
returns EOF as soon as the process dies, but kill() continues to return
true until waitpid() has been called (in other words, while the process is
a zombie). Because of that, change PostmasterIsAlive() to use the pipe too; otherwise
WaitLatch() would return immediately with WL_POSTMASTER_DEATH, while
PostmasterIsAlive() would claim it's still alive. That could easily lead to
busy-waiting while postmaster is in zombie state.
Peter Geoghegan with further changes by me, reviewed by Fujii Masao and
Florian Pflug.
Most of these cast DWORD to int or unsigned int for printf type handling.
This is safe even on 64-bit architectures because a DWORD is always 32 bits.
In one case a variable is initialised to keep the compiler happy.
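A sketch of the kind of cast involved. Since DWORD exists only on Windows, a hypothetical 32-bit typedef stands in for it here, and the function name and message text are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the Windows DWORD type, which is always 32 bits wide. */
typedef uint32_t demo_dword;

/*
 * Format an error code. The explicit cast makes the argument type match
 * the %u conversion regardless of how the 32-bit type is declared.
 */
static int demo_format_error(char *buf, size_t len, demo_dword code)
{
    assert(buf != NULL);
    return snprintf(buf, len, "error code %u", (unsigned int) code);
}
```

No value is lost by the cast as long as unsigned int is at least 32 bits wide, which holds on the platforms in question.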
than replication_timeout (a new GUC) milliseconds. The TCP timeout is often
too long; you want the master to notice a dead connection much sooner.
People complained about that in 9.0 too, but with synchronous replication
it's even more important to notice dead connections promptly.
Fujii Masao and Heikki Linnakangas
dynamic pool of event handles, we can permanently assign one for each
shared latch. Thanks to that, we no longer need a separate shared memory
block for latches, and we don't need to know in advance how many shared
latches there are, so you no longer need to remember to update
NumSharedLatches when you introduce a new latch to the system.
wait until it is set. Latches can be used to reliably wait until a signal
arrives, which is hard otherwise because signals don't interrupt select()
on some platforms, and even when they do, there are race conditions.
On Unix, latches use the so-called self-pipe trick under the covers to
implement the sleep until the latch is set, without race conditions. On
Windows, Windows events are used.
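For readers unfamiliar with the self-pipe trick, here is a minimal stand-alone sketch. It is not the latch implementation: all names are hypothetical, SIGUSR1 is an arbitrary choice of wakeup signal, and error handling is abbreviated.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

static int demo_selfpipe[2];

/* Signal handler: make the pipe readable so a select() will wake up. */
static void demo_wakeup_handler(int signo)
{
    int save_errno = errno;

    (void) signo;
    if (write(demo_selfpipe[1], "x", 1) < 0)
    {
        /* Pipe full: a wakeup is already pending, nothing more to do. */
    }
    errno = save_errno;
}

/* Create the non-blocking pipe and install the handler. */
static int demo_selfpipe_init(void)
{
    struct sigaction sa;

    if (pipe(demo_selfpipe) < 0)
        return -1;
    if (fcntl(demo_selfpipe[0], F_SETFL, O_NONBLOCK) < 0 ||
        fcntl(demo_selfpipe[1], F_SETFL, O_NONBLOCK) < 0)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = demo_wakeup_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/*
 * Sleep until the signal has arrived or timeout_ms elapses.
 * Returns 1 if woken by the signal, 0 on timeout, -1 on error.
 */
static int demo_wait_for_wakeup(int timeout_ms)
{
    fd_set          rfds;
    struct timeval  tv;
    char            buf[16];
    int             woken = 0;

    assert(timeout_ms >= 0);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    FD_ZERO(&rfds);
    FD_SET(demo_selfpipe[0], &rfds);

    if (select(demo_selfpipe[0] + 1, &rfds, NULL, NULL, &tv) < 0 &&
        errno != EINTR)
        return -1;

    /* Drain the pipe; any byte found means the handler ran. */
    while (read(demo_selfpipe[0], buf, sizeof(buf)) > 0)
        woken = 1;
    return woken;
}
```

Because the handler leaves a byte in the pipe, a signal that arrives just before select() is entered still makes the descriptor readable, so the wakeup is not lost. That closes the race that a bare "check flag, then sleep" sequence would have.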
Use the new latch abstraction to sleep in walsender, so that as soon as
a transaction finishes, walsender is woken up to immediately send the WAL
to the standby. This reduces the latency between master and standby, which
is good.
Preliminary work by Fujii Masao. The latch implementation is by me, with
helpful comments from many people.