pg_basebackup's child process did not pay any attention to the pipe
from its parent while waiting for input from the source server.
If no server data was arriving, it would only wake up and check the
pipe every standby_message_timeout or so. This creates a problem
since the parent process might determine and send the desired stop
position only after the server has reached end-of-WAL and stopped
sending data. In the src/test/recovery regression tests, the timing
is repeatably such that it takes nearly 10 seconds for the child
process to realize that it should shut down. It's not clear how
often that would happen in real-world cases, but it sure seems like
a bug --- and if the user turns off standby_message_timeout or sets
it very large, the delay could be a lot worse.
To fix, expand the StreamCtl API to allow the pipe input FD to be
passed down to the low-level wait routine, and watch both the server
socket and the pipe when sleeping.
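In outline, the low-level wait has to do something like the following
(a simplified sketch, not the actual receivelog.c code; the function
and parameter names here are illustrative):

    #include <sys/select.h>

    /*
     * Sketch: sleep until the server socket *or* the pipe from the
     * parent becomes readable, so a stop position sent after end-of-WAL
     * is noticed immediately rather than at the next
     * standby_message_timeout wakeup.
     */
    static int
    wait_for_input(int server_fd, int pipe_fd, struct timeval *timeout)
    {
        fd_set      input_mask;
        int         maxfd = server_fd;

        FD_ZERO(&input_mask);
        FD_SET(server_fd, &input_mask);
        if (pipe_fd >= 0)
        {
            FD_SET(pipe_fd, &input_mask);
            if (pipe_fd > maxfd)
                maxfd = pipe_fd;
        }

        return select(maxfd + 1, &input_mask, NULL, NULL, timeout);
    }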
(Note: AFAICS this issue doesn't affect the Windows port, since
it doesn't rely on a pipe to transfer the stop position to the
child thread.)
Discussion: https://postgr.es/m/6456.1493263884@sss.pgh.pa.us
Previously the logical replication launcher stored the timestamp of
the last worker startup in the local variable "last_start_time", in
order to check whether wal_retrieve_retry_interval had elapsed since
the last startup of a worker. If it had elapsed, the launcher read
pg_subscription and started a new worker if necessary. This limits
worker startups to once per wal_retrieve_retry_interval.
The bug was that the variable "last_start_time" was declared and
initialized to 0 inside the launcher's main loop, so even when it was
set to the last startup timestamp later in the loop, it was reset to 0
on the next iteration. Therefore the launcher could never correctly
check whether wal_retrieve_retry_interval had elapsed since the last
worker startup.
This patch moves the variable "last_start_time" outside the main loop
so that it will not be reset.
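In outline, the scoping change looks like this (a simplified sketch;
the surrounding worker-start logic is elided):

    /* Before: the rate-limit state was re-created on every iteration */
    for (;;)
    {
        TimestampTz last_start_time = 0;    /* oops: always 0 here */

        /* ... start workers, set last_start_time ... */
    }

    /* After: declared outside the loop, so the timestamp persists */
    TimestampTz last_start_time = 0;

    for (;;)
    {
        /* ... start workers, set last_start_time ... */
    }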
Reviewed-by: Petr Jelinek
Discussion: http://postgr.es/m/CAHGQGwGJrPO++XM4mFENAwpy1eGXKsGdguYv43GUgLgU-x8nTQ@mail.gmail.com
Commit fa31b6f4e supposed that we didn't have to worry about that
anymore, but it seems that RHEL5 is like that, and that's still
a supported platform. Put back the prior coding under an #ifdef,
adding an explicit fcntl() to retain the desired CLOEXEC property.
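The shape of the fix is presumably along these lines (a sketch only;
the commit's subject line isn't quoted here, so the assumption that
this concerns an epoll_create1() fallback is mine):

    #ifdef EPOLL_CLOEXEC
        fd = epoll_create1(EPOLL_CLOEXEC);
    #else
        /* prior coding, for platforms such as RHEL5 that lack
         * epoll_create1(); the size hint is ignored by modern kernels */
        fd = epoll_create(1024);
        if (fd >= 0 && fcntl(fd, F_SETFD, FD_CLOEXEC) == -1)
        {
            close(fd);
            fd = -1;
        }
    #endif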
Discussion: https://postgr.es/m/12307.1493325329@sss.pgh.pa.us
The logical decoding machinery already preserved all the required
catalog tuples, which is sufficient in the course of normal logical
decoding, but did not guarantee that non-catalog tuples were preserved
during computation of the initial snapshot when creating a slot over
the replication protocol.
This could cause a corrupted initial snapshot to be exported. The
time window for issues is usually not terribly large, but on a busy
server it's perfectly possible to hit it. Ongoing decoding is not
affected by this bug.
To avoid increased overhead for the SQL API, only retain additional
tuples when a logical slot is being created over the replication
protocol. To do so this commit changes the signature of
CreateInitDecodingContext(), but it seems unlikely that it's being
used in an extension, so that's probably ok.
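Concretely, that means threading a flag through slot creation; a
sketch of the changed declaration (the parameter name
need_full_snapshot matches the sources of that era, but treat the
exact parameter list as illustrative):

    extern LogicalDecodingContext *
    CreateInitDecodingContext(char *plugin,
                              List *output_plugin_options,
                              bool need_full_snapshot, /* new: also retain
                                                        * non-catalog tuples;
                                                        * true only for slot
                                                        * creation over the
                                                        * replication
                                                        * protocol */
                              XLogPageReadCB read_page,
                              LogicalOutputPluginWriterPrepareWrite prepare_write,
                              LogicalOutputPluginWriterWrite do_write);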
As a drive-by fix, correct the handling of
ReplicationSlotsComputeRequiredXmin's already_locked argument, which
should only apply to ProcArrayLock, not ReplicationSlotControlLock.
Reported-By: Erik Rijkers
Analyzed-By: Petr Jelinek
Author: Petr Jelinek, heavily editorialized by Andres Freund
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/9a897b86-46e1-9915-ee4c-da02e4ff6a95@2ndquadrant.com
Backport: 9.4, where logical decoding was introduced.
Although the postmaster doesn't currently create a self-pipe or any
latches, there's discussion of it doing so in future. It's also
conceivable that a shared_preload_libraries extension would try to
create such a thing in the postmaster process today. In that case
the self-pipe FDs would be inherited by forked child processes.
latch.c was entirely unprepared for such a case and could suffer an
assertion failure or, worse, try to use the inherited pipe if somebody
called WaitLatch without having called InitializeLatchSupport in that
process. Make it keep track of whether InitializeLatchSupport has been
called in the *current* process, and do the right thing if state has
been inherited from a parent.
Apply FD_CLOEXEC to file descriptors created in latch.c (the self-pipe,
as well as epoll event sets). This ensures that child processes spawned
by backends, the archiver, etc. cannot accidentally or intentionally mess
with these FDs. It also ensures that we end up with the right state
for the self-pipe in EXEC_BACKEND processes, which otherwise wouldn't
know to close the postmaster's self-pipe FDs.
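A simplified sketch of the two defenses together (the real code also
handles the EXEC_BACKEND case and sets the pipe ends non-blocking):

    static int  selfpipe_readfd = -1;
    static int  selfpipe_writefd = -1;
    static int  selfpipe_owner_pid = 0; /* PID of the creating process */

    void
    InitializeLatchSupport(void)
    {
        int     pipefd[2];

        if (selfpipe_owner_pid != 0 && selfpipe_owner_pid != MyProcPid)
        {
            /* State inherited from parent: drop our copies of its pipe */
            close(selfpipe_readfd);
            close(selfpipe_writefd);
        }

        if (pipe(pipefd) < 0)
            elog(FATAL, "pipe() failed: %m");

        /* Don't let programs we exec() inherit the self-pipe */
        if (fcntl(pipefd[0], F_SETFD, FD_CLOEXEC) == -1 ||
            fcntl(pipefd[1], F_SETFD, FD_CLOEXEC) == -1)
            elog(FATAL, "fcntl(F_SETFD) failed on self-pipe: %m");

        selfpipe_readfd = pipefd[0];
        selfpipe_writefd = pipefd[1];
        selfpipe_owner_pid = MyProcPid;
    }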
Back-patch to 9.6, mainly to keep latch.c looking similar in all branches
it exists in.
Discussion: https://postgr.es/m/8322.1493240739@sss.pgh.pa.us
The bug fixed by 0874d4f3e183757ba15a4b3f3bf563e0393dd9c2
caused us to question and rework the handling of
subtransactions in 2PC during and at the end of recovery.
This patch adds checks and tests to ensure no further bugs remain.
It effectively removes the temporary measure put in place
by 546c13e11b29a5408b9d6a6e3cca301380b47f7f.
Author: Simon Riggs
Reviewed-by: Tom Lane, Michael Paquier
Discussion: http://postgr.es/m/CANP8+j+vvXmruL_i2buvdhMeVv5TQu0Hm2+C5N+kdVwHJuor8w@mail.gmail.com
Previously, maybe_start_bgworker() would launch at most one bgworker
process per call, on the grounds that the postmaster might otherwise
neglect its other duties for too long. However, that seems overly
conservative, especially since bad effects only become obvious when
many hundreds of bgworkers need to be launched at once. On the other
side of the coin is that the existing logic could result in substantial
delay of bgworker launches, because ServerLoop isn't guaranteed to
iterate immediately after a signal arrives. (My attempt to fix that
by using pselect(2) encountered too many portability question marks,
and in any case could not help on platforms without pselect().)
One could also question the wisdom of using an O(N^2) processing
method if the system is intended to support so many bgworkers.
As a compromise, allow that function to launch up to 100 bgworkers
per call (and in consequence, rename it to maybe_start_bgworkers).
This will allow any normal parallel-query request for workers
to be satisfied immediately during sigusr1_handler, avoiding the
question of whether ServerLoop will be able to launch more promptly.
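In outline, with names as in postmaster.c (a simplified sketch; the
skip conditions and crash-time bookkeeping are elided):

    #define MAX_BGWORKERS_TO_LAUNCH 100

    static void
    maybe_start_bgworkers(void)
    {
        int         num_launched = 0;
        slist_mutable_iter iter;

        slist_foreach_modify(iter, &BackgroundWorkerList)
        {
            RegisteredBgWorker *rw;

            rw = slist_container(RegisteredBgWorker, rw_lnode, iter.cur);

            /* ... skip entries already running or not yet due ... */

            if (!do_start_bgworker(rw))
                break;          /* fork failed; retry after a pause */

            if (++num_launched >= MAX_BGWORKERS_TO_LAUNCH)
            {
                /* Don't hog the postmaster; come back for the rest */
                StartWorkerNeeded = true;
                return;
            }
        }
    }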
There is talk of rewriting the postmaster to use a WaitEventSet to
avoid the signal-response-delay problem, but I'd argue that this change
should be kept even after that happens (if it ever does).
Backpatch to 9.6 where parallel query was added. The issue exists
before that, but previous uses of bgworkers typically aren't as
sensitive to how quickly they get launched.
Discussion: https://postgr.es/m/4707.1493221358@sss.pgh.pa.us
Our general rule for pg_get_X(oid) functions is to simply return NULL
when passed an invalid or inappropriate OID. Teach pg_get_partkeydef to
do this as well, making it easier to use the function when querying
catalogs, such as pg_class, that contain both partitioned and
non-partitioned relations.
As a concrete example, this makes pg_dump's life a little easier.
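The resulting shape is roughly as follows (a sketch; the worker
function's exact parameter list, including the trailing missing-ok
flag, is an assumption here):

    Datum
    pg_get_partkeydef(PG_FUNCTION_ARGS)
    {
        Oid         relid = PG_GETARG_OID(0);
        char       *res;

        /* worker now returns NULL rather than throwing an error */
        res = pg_get_partkeydef_worker(relid, PRETTYFLAG_INDENT,
                                       false, true);

        if (res == NULL)
            PG_RETURN_NULL();   /* invalid OID, or not partitioned */

        PG_RETURN_TEXT_P(string_to_text(res));
    }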
Author: Amit Langote
The publisher relation could be chosen incorrectly if more than one
relation with the same name existed in different schemas.
Author: Euler Taveira <euler@timbira.com.br>
The code was originally written under the assumption that the launcher
is the only process starting workers. However, that hasn't been true
since commit 7c4f52409, which failed to modify the worker management
code adequately.
This patch adds an in_use field to the LogicalRepWorker struct to
indicate whether the worker slot is being used and uses proper locking
everywhere this flag is set or read.
However, if the parent process dies while the new worker is starting,
and the new worker fails to attach to shared memory, this flag would
never get cleared. We solve this rare corner case by adding a sort of
garbage collector for in_use slots. This uses another field in the
LogicalRepWorker struct, named launch_time, containing the time when
the worker was started. If a request to start a new worker does not
find a free slot, we check for workers that were supposed to start but
took too long to actually do so, and reuse their slots.
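In outline, the struct gains (a sketch; the other fields are elided):

    typedef struct LogicalRepWorker
    {
        /* Is this slot allocated, even if the worker isn't attached? */
        bool        in_use;

        /*
         * Time the launcher started the worker; lets a later launch
         * request garbage-collect slots whose worker never managed to
         * attach to shared memory.
         */
        TimestampTz launch_time;

        /* ... pid, subscription OID, relation sync state, etc ... */
    } LogicalRepWorker;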
In passing also fix possible race conditions when stopping a worker that
hasn't finished starting yet.
Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: Fujii Masao <masao.fujii@gmail.com>
There is no need to forbid ALTER TABLE ONLY on a partitioned table
when no partitions exist yet. This can be handy for users who are
building up their partitioned table independently and will create the
actual partitions later.
In addition, this is how pg_dump likes to operate in certain instances.
Author: Amit Langote, with some error message word-smithing by me
Otherwise one would have to wait up to DEFAULT_NAPTIME_PER_CYCLE until
the subscription worker is considered for starting.
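The fix is to signal the launcher when the enabling transaction
commits; schematically (assuming the wakeup goes through
ApplyLauncherWakeupAtCommit(), the launcher.c function that arranges
this; the surrounding variable name is illustrative):

    /* in CREATE/ALTER SUBSCRIPTION ... ENABLE processing (sketch) */
    if (subscription_enabled)
        ApplyLauncherWakeupAtCommit();  /* just sets a flag; the launcher
                                         * is signaled at commit */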
There is a small race condition: if a subscription is enabled right
after being disabled, the launcher might not yet have registered the
stop of the previous worker when it receives the wakeup signal for the
re-enabling. The start will then not happen right away, but only after
the full cycle time.
Author: Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>
In quorum-based synchronous replication, all the standbys listed in
synchronous_standby_names have equal chances of being chosen as
synchronous standbys, so they should have the same priority. However,
quorum standbys whose names appeared earlier in the list were
previously given higher priority values, even though the difference in
those values didn't affect the selection of synchronous standbys.
Users could see those meaningless priority values in
pg_stat_replication, which was confusing.
This commit gives all the quorum synchronous standbys the same
highest priority, i.e., 1, in order to remove such confusion.
Author: Fujii Masao
Reviewed-by: Masahiko Sawada, Kyotaro Horiguchi
Discussion: http://postgr.es/m/CAHGQGwEKOw=SmPLxJzkBsH6wwDBgOnVz46QjHbtsiZ-d-2RGUg@mail.gmail.com
Objects in an extension are shippable to a foreign server if the
extension is part of the foreign server definition's shippable
extensions list. But this was not properly considered in some cases
when checking whether a join condition can be pushed to a foreign
server and the join condition uses an object from a shippable
extension, so the join would never be pushed down in those cases.
To fix, the list of extensions needs to be made available in the
fpinfo of the relation being considered for pushdown before any
expressions are assessed for shippability. Fix foreign_join_ok() to do
that for a join relation.
The code to save FDW options in fpinfo is scattered at multiple places.
Bring all of that together into functions apply_server_options(),
apply_table_options(), and merge_fdw_options().
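Schematically, foreign_join_ok() now fills in the joinrel's fpinfo
before testing any clauses (a simplified sketch using the commit's
function names; the surrounding checks are elided):

    ListCell   *lc;

    /* Make the server options, including the shippable-extensions
     * list, available on the joinrel's fpinfo first ... */
    fpinfo->server = fpinfo_o->server;
    merge_fdw_options(fpinfo, fpinfo_o, fpinfo_i);

    /* ... and only then assess the join clauses for shippability */
    foreach(lc, extra->restrictlist)
    {
        RestrictInfo *rinfo = lfirst(lc);

        if (!is_foreign_expr(root, joinrel, rinfo->clause))
            return false;       /* can't push the join down */
    }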
David Rowley and Ashutosh Bapat, per report from David Rowley
This reverts commit 81069a9efc5a374dd39874a161f456f0fb3afba4.
Buildfarm results suggest that some platforms have versions of pselect(2)
that are not merely non-atomic, but flat out non-functional. Revert the
use-pselect patch to confirm this diagnosis (and exclude the no-SA_RESTART
patch as the source of trouble). If it's so, we should probably look into
blacklisting specific platforms that have broken pselect.
Discussion: https://postgr.es/m/9696.1493072081@sss.pgh.pa.us
Traditionally we've unblocked signals, called select(2), and then blocked
signals again. The code expects that the select() will be cancelled with
EINTR if an interrupt occurs; but there's a race condition, which is that
an already-pending signal will be delivered as soon as we unblock, and then
when we reach select() there will be nothing preventing it from waiting.
This can result in a long delay before we perform any action that
ServerLoop was supposed to have taken in response to the signal. As with
the somewhat-similar symptoms fixed by commit 893902085, the main practical
problem is slow launching of parallel workers. The window for trouble is
usually pretty short, corresponding to one iteration of ServerLoop; but
it's not negligible.
To fix, use pselect(2) in place of select(2) where available, as that's
designed to solve exactly this problem. Where not available, we continue
to use the old way, and are no worse off than before.
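The difference, schematically (variable names as in ServerLoop;
pselect() takes a struct timespec, so the timeout variable here is
illustrative):

    /* Old, racy pattern: a signal delivered between the unblock and
     * the select() leaves nothing to cancel the wait.
     *
     *     PG_SETMASK(&UnBlockSig);
     *     selres = select(nSockets, &rmask, NULL, NULL, &timeout);
     *     PG_SETMASK(&BlockSig);
     */

    /* New pattern: pselect() installs the given mask and waits
     * atomically, so a pending signal is delivered inside the wait
     * and cancels it with EINTR. */
    sigset_t    empty_mask;

    sigemptyset(&empty_mask);
    selres = pselect(nSockets, &rmask, NULL, NULL,
                     &timeout_ts, &empty_mask);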
pselect(2) has been required by POSIX since about 2001, so most modern
platforms should have it. A bigger portability issue is that some
implementations are said to be non-atomic, ie pselect() isn't really
any different from unblock/select/reblock. Still, we're no worse off
than before on such a platform.
There is talk of rewriting the postmaster to use a WaitEventSet and
not do signal response work in signal handlers, at which point this
could be reverted, since we'd be using a self-pipe to solve the race
condition. But that's not happening before v11 at the earliest.
Back-patch to 9.6. The problem exists much further back, but the
worst symptom arises only in connection with parallel query, so it
does not seem worth taking any portability risks in older branches.
Discussion: https://postgr.es/m/9205.1492833041@sss.pgh.pa.us
The postmaster keeps signals blocked everywhere except while waiting
for something to happen in ServerLoop(). The code expects that the
select(2) will be cancelled with EINTR if an interrupt occurs; without
that, followup actions that should be performed by ServerLoop() itself
will be delayed. However, some platforms interpret the SA_RESTART
signal flag as meaning that they should restart rather than cancel
the select(2). Worse yet, some of them restart it with the original
timeout delay, meaning that a steady stream of signal interrupts can
prevent ServerLoop() from iterating at all if there are no incoming
connection requests.
Observable symptoms of this, on an affected platform such as HPUX 10,
include extremely slow parallel query startup (possibly as much as
30 seconds) and failure to update timestamps on the postmaster's sockets
and lockfiles when no new connections arrive for a long time.
We can fix this by running the postmaster's signal handlers without
SA_RESTART. That would be quite a scary change if the range of code
where signals are accepted weren't so tiny, but as it is, it seems
safe enough. (Note that postmaster children do, and must, reset all
the handlers before unblocking signals; so this change should not
affect any child process.)
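The mechanical change is to install the postmaster's handlers via
sigaction() without SA_RESTART; a sketch of such a variant of
pqsignal():

    /* Like pqsignal(), but deliberately omits SA_RESTART, so that
     * signal arrival makes the postmaster's select() return EINTR
     * promptly instead of restarting the wait. */
    pqsigfunc
    pqsignal_no_restart(int signo, pqsigfunc func)
    {
        struct sigaction act,
                    oact;

        act.sa_handler = func;
        sigemptyset(&act.sa_mask);
        act.sa_flags = 0;           /* note: no SA_RESTART */
        if (sigaction(signo, &act, &oact) < 0)
            return SIG_ERR;
        return oact.sa_handler;
    }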
There is talk of rewriting the postmaster to use a WaitEventSet and
not do signal response work in signal handlers, at which point it might
be appropriate to revert this patch. But that's not happening before
v11 at the earliest.
Back-patch to 9.6. The problem exists much further back, but the
worst symptom arises only in connection with parallel query, so it
does not seem worth taking any portability risks in older branches.
Discussion: https://postgr.es/m/9205.1492833041@sss.pgh.pa.us
This corner case didn't behave nicely at all: the postmaster would
(partially) update its state as though the process had started
successfully, and be quite confused thereafter. Fix it to act
like the worker had crashed, instead.
In passing, refactor so that do_start_bgworker contains all the
state-change logic for bgworker launch, rather than just some of it.
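With everything in one place, the fork-failure arm can mirror the
crash path (a simplified sketch; cleanup of the backend-list entry and
child-slot bookkeeping is elided):

    switch ((worker_pid = fork_process()))
    {
        case -1:
            ereport(LOG,
                    (errmsg("could not fork worker process: %m")));
            /* Pretend the worker crashed at launch, so the normal
             * restart/notification machinery takes over from here. */
            rw->rw_crashed_at = GetCurrentTimestamp();
            return false;

        case 0:
            /* in postmaster child: set up and run the worker */
            break;

        default:
            /* in postmaster: record the successful launch */
            rw->rw_pid = worker_pid;
            return true;
    }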
Back-patch as far as 9.4. 9.3 contains similar logic, but it's just
enough different that I don't feel comfortable applying the patch
without more study; and the use of bgworkers in 9.3 was so small
that it doesn't seem worth the extra work.
transam/parallel.c is still entirely unprepared for the possibility
of bgworker startup failure, but that seems like material for a
separate patch.
Discussion: https://postgr.es/m/4905.1492813727@sss.pgh.pa.us
Fix machine-dependent sorting of column numbers; a portable comparator
sketch follows this list of fixes. (Odd behavior would only
materialize for column numbers above 255, but that's certainly legal.)
Fix poor choice of SQLSTATE for some errors, and improve error message
wording. (Notably, "is not a scalar type" is a totally misleading way
to explain "does not have a default btree opclass".)
Avoid taking AccessExclusiveLock on the associated relation during DROP
STATISTICS. That's neither necessary nor desirable, and it could easily
have put us into situations where DROP fails (compare commit 68ea2b7f9).
Adjust/improve comments.
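For the column-number sorting fix, a portable comparator looks like
this (a sketch; the variable names in the usage comment are
illustrative):

    /* Compare int16 attnums numerically; the subtraction can't
     * overflow when int is wider than int16. */
    static int
    compare_int16(const void *a, const void *b)
    {
        int     av = *(const int16 *) a;
        int     bv = *(const int16 *) b;

        return av - bv;
    }

    /* usage: qsort(attnums, numcols, sizeof(int16), compare_int16); */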
David Rowley and Tom Lane
Discussion: https://postgr.es/m/CAKJS1f-GmCfPvBbAEaM5xoVOaYdVgVN1gicALSoYQ77z-+vLbw@mail.gmail.com
poll.h is mandated by Single Unix Spec v2, the usual baseline for
postgres on unix. None of the unixoid buildfarm animals has
sys/poll.h but not poll.h. Therefore there's not much point in testing
for sys/poll.h's existence and including it optionally.
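So the includes simplify to something like this (a sketch; the
HAVE_POLL_H guard presumably stays, since Windows has no poll.h at
all):

    /* Before: probed both locations */
    #ifdef HAVE_POLL_H
    #include <poll.h>
    #endif
    #ifdef HAVE_SYS_POLL_H
    #include <sys/poll.h>
    #endif

    /* After: only the standard header */
    #ifdef HAVE_POLL_H
    #include <poll.h>
    #endif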
Author: Andres Freund, per suggestion from Tom Lane
Discussion: https://postgr.es/m/20505.1492723662@sss.pgh.pa.us