This fixes a crash if the local table has a function index and the
function makes non-immutable calls.
Reported-by: Scott Milliken <scott@deltaex.com>
Author: Masahiko Sawada <sawada.mshk@gmail.com>
It is relatively easy to get a replication slot to look as still active
while one process is in the process of getting rid of it; when some
other process tries to "acquire" the slot, it would fail with an error
message of "replication slot XYZ is active for PID N".
The error message in itself is fine, except that when the intention is
to drop the slot, it is unhelpful: the useful behavior would be to wait
until the slot is no longer acquired, so that the drop can proceed. To
implement this, we use a condition variable so that slot acquisition can
be told to wait on that condition variable if the slot is already
acquired, and we make any change in active_pid broadcast a signal on the
condition variable. Thus, as soon as the slot is released, the drop
will proceed properly.
Reported by: Tom Lane
Discussion: https://postgr.es/m/11904.1499039688@sss.pgh.pa.us
Authors: Petr Jelínek, Álvaro Herrera
The regression tests contain numerous cases where we do some activity on a
master server and then wait till the slave has ack'd flushing its copy of
that transaction. Because WAL flush on the slave is asynchronous to the
logicalrep worker process, the worker cannot send such a feedback message
during the LogicalRepApplyLoop iteration where it processes the last data
from the master. In the previous coding, the feedback message would come
out only when the loop's WaitLatchOrSocket call returned WL_TIMEOUT. That
requires one full second of delay (NAPTIME_PER_CYCLE); and to add insult
to injury, it could take more than that if the WaitLatchOrSocket was
interrupted a few times by latch-setting events.
In reality we can expect the slave's walwriter process to have flushed the
WAL data after, more or less, WalWriterDelay (typically 200ms). Hence,
if there are unacked transactions pending, make the wait delay only that
long rather than the full NAPTIME_PER_CYCLE. Also, move one of the
send_feedback() calls into the loop main line, so that we'll check for the
need to send feedback even if we were woken by a latch event and not either
socket data or timeout.
It's not clear how much this matters for production purposes, but
it's definitely helpful for testing.
Discussion: https://postgr.es/m/30864.1498861103@sss.pgh.pa.us
When waiting for a logical replication worker process to start or stop,
we have to busy-wait until we see it add or remove itself from the
LogicalRepWorker slot in shared memory. Those loops were using a
one-second delay between checks, but on any reasonably modern machine, it
doesn't take more than a couple of msec for a worker to spawn or shut down.
Reduce the loop delays to 10ms to avoid wasting quite so much time in the
related regression tests.
In principle, a better solution would be to fix things so that the waiting
process can be awakened via its latch at the right time. But that seems
considerably more invasive, which is undesirable for a post-beta fix.
Worker start/stop performance likely isn't of huge interest anyway for
production purposes, so we might not ever get around to it.
In passing, rearrange the second wait loop in logicalrep_worker_stop()
so that the lock is held at the top of the loop, thus saving one lock
acquisition/release per call, and making it look more like the other loop.
Discussion: https://postgr.es/m/30864.1498861103@sss.pgh.pa.us
When a sync worker is waiting for the associated apply worker to notice
that it's in SYNCWAIT state, wait_for_worker_state_change() would just
patiently wait for that to happen. This generally required waiting for
the 1-second timeout in LogicalRepApplyLoop to elapse. Kicking the worker
via its latch makes things significantly snappier.
While at it, fix race conditions that could potentially result in crashes:
we can *not* call logicalrep_worker_wakeup_ptr() once we've released the
LogicalRepWorkerLock, because worker->proc might've been reset to NULL
after we do that (indeed, there's no really solid reason to believe that
the LogicalRepWorker slot even belongs to the same worker anymore).
In logicalrep_worker_wakeup(), we can just move the wakeup inside the
lock scope. In process_syncing_tables_for_apply(), a bit more code
rearrangement is needed.
Also improve some nearby comments.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.
By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis. However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent. That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.
This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.
Commit e3860ffa4d wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code. The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there. BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs. So the
net result is that in about half the cases, such comments are placed
one tab stop left of before. This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.
Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
The new indent version includes numerous fixes thanks to Piotr Stefaniak.
The main changes visible in this commit are:
* Nicer formatting of function-pointer declarations.
* No longer unexpectedly removes spaces in expressions using casts,
sizeof, or offsetof.
* No longer wants to add a space in "struct structname *varname", as
well as some similar cases for const- or volatile-qualified pointers.
* Declarations using PG_USED_FOR_ASSERTS_ONLY are formatted more nicely.
* Fixes bug where comments following declarations were sometimes placed
with no space separating them from the code.
* Fixes some odd decisions for comments following case labels.
* Fixes some cases where comments following code were indented to less
than the expected column 33.
On the less good side, it now tends to put more whitespace around typedef
names that are not listed in typedefs.list. This might encourage us to
put more effort into typedef name collection; it's not really a bug in
indent itself.
There are more changes coming after this round, having to do with comment
indentation and alignment of lines appearing within parentheses. I wanted
to limit the size of the diffs to something that could be reviewed without
one's eyes completely glazing over, so it seemed better to split up the
changes as much as practical.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
When, during logical decoding, a transaction gets too big, it's
contents get spilled to disk. Not just the top-transaction gets
spilled, but *also* all of its subtransactions, even if they're not
that large themselves. Unfortunately we didn't clean up
such small spilled subtransactions from disk.
Fix that, by keeping better track of whether a transaction has been
spilled to disk.
Author: Andres Freund
Reported-By: Dmitriy Sarafannikov, Fabrízio de Royes Mello
Discussion:
https://postgr.es/m/1457621358.355011041@f382.i.mail.ruhttps://postgr.es/m/CAFcNs+qNMhNYii4nxpO6gqsndiyxNDYV0S=JNq0v_sEE+9PHXg@mail.gmail.com
Backpatch: 9.4-, where logical decoding was introduced
Previously we required every exported transaction to have an xid
assigned. That was used to check that the exporting transaction is
still running, which in turn is needed to guarantee that that
necessary rows haven't been removed in between exporting and importing
the snapshot.
The exported xid caused unnecessary problems with logical decoding,
because slot creation has to wait for all concurrent xid to finish,
which in turn serializes concurrent slot creation. It also
prohibited snapshots to be exported on hot-standby replicas.
Instead export the virtual transactionid, which avoids the unnecessary
serialization and the inability to export snapshots on standbys. This
changes the file name of the exported snapshot, but since we never
documented what that one means, that seems ok.
Author: Petr Jelinek, slightly editorialized by me
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/f598b4b8-8cd7-0d54-0939-adda763d8c34@2ndquadrant.com
When a table is removed from a subscription before the tablesync worker
could start, this would previously result in an error when reading
pg_subscription_rel. Now we just ignore this.
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Previously the exit handling was only able to exit from within the
main loop, and not from within the backend code it calls. Fix that by
using the standard die() SIGTERM handler, and adding the necessary
CHECK_FOR_INTERRUPTS() call.
This requires adding yet another process-type-specific branch to
ProcessInterrupts(), which hints that we probably should generalize
that handling. But that's work for another day.
Author: Petr Jelinek
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/fe072153-babd-3b5d-8052-73527a6eb657@2ndquadrant.com
A logical replication worker should not insert new rows into
pg_subscription_rel, only update existing rows, so that there are no
races if a concurrent refresh removes rows. Adjust the API to be able
to choose that behavior.
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Reported-by: tushar <tushar.ahuja@enterprisedb.com>
The logical replication apply worker uses the subscription name as
application name, except for table sync. This was incorrectly set to
use the replication slot name, which might be different, in one case.
Also add a comment why the other case is different.
The larger part of this patch replaces usages of MyProc->procLatch
with MyLatch. The latter works even early during backend startup,
where MyProc->procLatch doesn't yet. While the affected code
shouldn't run in cases where it's not initialized, it might get copied
into places where it might. Using MyLatch is simpler and a bit faster
to boot, so there's little point to stick with the previous coding.
While doing so I noticed some weaknesses around newly introduced uses
of latches that could lead to missed events, and an omitted
CHECK_FOR_INTERRUPTS() call in worker_spi.
As all the actual bugs are in v10 code, there doesn't seem to be
sufficient reason to backpatch this.
Author: Andres Freund
Discussion:
https://postgr.es/m/20170606195321.sjmenrfgl2nu6j63@alap3.anarazel.dehttps://postgr.es/m/20170606210405.sim3yl6vpudhmufo@alap3.anarazel.de
Backpatch: -
Make apply busy wait check the catalog instead of shmem state to ensure
that next transaction will see the expected table synchronization state.
Also make the handover always go through same set of steps to make the
overall process easier to understand and debug.
Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Tested-by: Mark Kirkwood <mark.kirkwood@catalyst.net.nz>
Tested-by: Erik Rijkers <er@xs4all.nl>
We didn't accept any invalidation messages until the whole sync process
had finished (because it flattens all the remote transactions in the
single one). So the sync worker didn't learn about subscription
changes/drop until it has finished. This could lead to "orphaned" sync
workers.
Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: Masahiko Sawada <sawada.mshk@gmail.com>
This avoids "orphaned" sync workers.
This was caused by a thinko in wait_for_sync_status_change.
Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: Masahiko Sawada <sawada.mshk@gmail.com>
The logical replication worker processes now use the normal die()
handler for SIGTERM and CHECK_FOR_INTERRUPTS() instead of custom code.
One problem before was that the apply worker would not exit promptly
when a subscription was dropped, which could lead to deadlocks.
Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: Masahiko Sawada <sawada.mshk@gmail.com>
Move the walrcv_disconnect() calls into the before_shmem_exit handler.
This makes sure the call is always made even during exit by signal, it
saves some duplicate code, and it makes the logic more similar to
walreceiver.c.
Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Logical replication supports replicating between tables with different
column order. But this failed for the initial table sync because of a
logic error in how the column list for the internal COPY command was
composed. Fix that and also add a test.
Also fix a minor omission in the column name mapping cache. When
creating the mapping list, it would not skip locally dropped columns.
So if a remote column had the same name as a locally dropped
column (...pg.dropped...), then the expected error would not occur.
Reduce some redundant messages to DEBUG1. Be clearer about the
distinction between apply workers and table synchronization workers.
Add subscription and table name where possible.
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
When creating a subscription with slot_name = NONE, we failed to check
that also create_slot = false and enabled = false were set. This
created an invalid subscription and could later lead to a crash if a
NULL slot name was accessed. Add more checks around that for
robustness.
Reported-by: tushar <tushar.ahuja@enterprisedb.com>
We used to only check for a supported relkind on the subscriber during
replication, which is needed to ensure that the setup is valid and we
don't crash. But it's also useful to tell the user immediately when
CREATE or ALTER SUBSCRIPTION is executed that the relation being added
to the subscription is not of a supported relkind.
Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: tushar <tushar.ahuja@enterprisedb.com>
Reformat various places in which pgindent will make a mess, and
fix a few small violations of coding style that I happened to notice
while perusing the diffs from a pgindent dry run.
There is one actual bug fix here: the need-to-enlarge-the-buffer code
path in icu_convert_case was obviously broken. Perhaps it's unreachable
in our usage? Or maybe this is just sadly undertested.
Before 955a684e04 logical decoding snapshot maintenance needed to
cope with transactions it might not have seen in their entirety. For
such transactions we'd to assume they modified the catalog (could have
happened before we were watching), and thus a new snapshot had to be
built, and distributed to concurrently running transactions.
That's problematic because building a new snapshot isn't that cheap ,
especially as the the array of committed transactions needs to be
sorted. When creating a slot on a server with a lot of transactions,
this could make logical slot creation infeasibly expensive.
After 955a684e04 there's no need to deal with transaction that
aren't guaranteed to be fully observable. That allows to avoid
building snapshots for transactions that haven't modified catalog,
even before reaching consistency.
While this isn't necessarily a bugfix, slot creation being impossible
in some production workloads, is severe enough to warrant
backpatching.
Author: Andres Freund, based on a quite different patch from Petr Jelinek
Analyzed-By: Petr Jelinek
Reviewed-By: Petr Jelinek
Discussion: https://postgr.es/m/f37e975c-908f-858e-707f-058d3b1eb214@2ndquadrant.com
Backpatch: 9.4-, where logical decoding has been introduced
The snapshot assembly during the creation of logical slots relied
waiting for transactions in xl_running_xacts to end, by checking for
their commit/abort records. Unfortunately, despite locking, it is
possible to see an xl_running_xact record listing transactions as
ready, that have already WAL-logged an commit/abort record, as the
locking just prevents the ProcArray to be adjusted, and the commit
record has to be logged first.
That lead to either delayed or hanging snapshot creation, because
snapbuild.c would wait "forever" to see commit/abort records for some
transactions. That hang resolved only if a xl_running_xacts record
without any running transactions happened to be logged, far from
certain on a busy server.
It's impractical to prevent that via more heavyweight locking, the
likelihood of deadlocks and significantly increased contention would
be too big.
Instead change the initial snapshot creation to be solely based on
tracking the oldest running transaction via
xl_running_xacts->oldestRunningXid - that actually ends up
significantly simplifying the code. That has two disadvantages:
1) Because we cannot fully "trust" the contents of xl_running_xacts,
we cannot use it to build the initial snapshot. Instead we have to
wait twice for all running transactions to finish.
2) Previously a slot, unless the race occurred, could be created when
the all transaction perceived as running based on commit/abort
records, now we have to wait for the next xl_running_xacts record.
To address that, trigger logging new xl_running_xacts record from
within snapbuild.c exactly when necessary.
Unfortunately snabuild.c's SnapBuild is stored on disk, one of the
stupider ideas of a certain Mr Freund, so we can't change it in a
minor release. As this is going to be backpatched, we have to hack
around a bit to keep on-disk compatibility. A later commit will
rejigger that on master.
Author: Andres Freund, based on a quite different patch from Petr Jelinek
Analyzed-By: Petr Jelinek
Reviewed-By: Petr Jelinek
Discussion: https://postgr.es/m/f37e975c-908f-858e-707f-058d3b1eb214@2ndquadrant.com
Backpatch: 9.4-, where logical decoding has been introduced
Lag tracking is called for each commit, but we introduce
a pacing delay to ensure we don't swamp the lag tracker.
Author: Petr Jelinek, with minor pacing delay code from me
Previously, the memory used by the logical replication apply worker for
processing messages would never be freed, so that could end up using a
lot of memory. To improve that, change the existing ApplyContext memory
context to ApplyMessageContext and reset that after every
message (similar to MessageContext used elsewhere). For consistency of
naming, rename the ApplyCacheContext to ApplyContext.
Author: Stas Kelvich <s.kelvich@postgrespro.ru>
This new arrangement ensures that statistics are reported right after
commit of transactions. The previous arrangement didn't get this quite
right and could lead to assertion failures.
Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: Erik Rijkers <er@xs4all.nl>
After the logical replication launcher was told to wake up at
commit (for example, by a CREATE SUBSCRIPTION command), the flag to wake
up was not reset, so it would be woken up at every following commit as
well. So fix that by resetting the flag.
Also, we don't need to wake up anything if the transaction was rolled
back. Just reset the flag in that case.
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Reported-by: Fujii Masao <masao.fujii@gmail.com>
Thinko in commit de4389712: this warning message references the wrong
"LogicalRepWorker *" variable. This would often result in a core dump,
but if it didn't, the message would show the wrong subscription OID.
In passing, adjust the message text to format a subscription OID
similarly to how that's done elsewhere in the function; and fix
grammatical issues in some nearby messages.
Per Coverity testing.
Before restarting a tablesync worker for the same relation, wait
wal_retrieve_retry_interval (currently 5s by default). This avoids
restarting failing workers in a tight loop.
We keep the last start times in a hash table last_start_times that is
separate from the table_states list, because that list is cleared out on
syscache invalidation, which happens whenever a table finishes syncing.
The hash table is kept until all tables have finished syncing.
A future project might be to unify these two and keep everything in one
data structure, but for now this is a less invasive change to accomplish
the original purpose.
For the test suite, set wal_retrieve_retry_interval to its minimum
value, to not increase the test suite run time.
Reviewed-by: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: Masahiko Sawada <sawada.mshk@gmail.com>
Logical decoding stores historical snapshots on disk, so that logical
decoding can restart without having to reconstruct a snapshot from
scratch (for which the resources are not guaranteed to be present
anymore). These serialized snapshots were also used when creating a
new slot via the walsender interface, which can export a "full"
snapshot (i.e. one that can read all tables, not just catalog ones).
The problem is that the serialized snapshots are only useful for
catalogs and not for normal user tables. Thus the use of such a
serialized snapshot could result in an inconsistent snapshot being
exported, which could lead to queries returning wrong data. This
would only happen if logical slots are created while another logical
slot already exists.
Author: Petr Jelinek
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/f37e975c-908f-858e-707f-058d3b1eb214@2ndquadrant.com
Backport: 9.4, where logical decoding was introduced.
Previously the logical replication launcher stored the last timestamp
when it started the worker, in the local variable "last_start_time",
in order to check whether wal_retrive_retry_interval elapsed since
the last startup of worker. If it has elapsed, the launcher sees
pg_subscription and starts new worker if necessary. This is for
limitting the startup of worker to once a wal_retrieve_retry_interval.
The bug was that the variable "last_start_time" was defined and
always initialized with 0 at the beginning of the launcher's main loop.
So even if it's set to the last timestamp in later phase of the loop,
it's always reset to 0. Therefore the launcher could not check
correctly whether wal_retrieve_retry_interval elapsed since
the last startup.
This patch moves the variable "last_start_time" outside the main loop
so that it will not be reset.
Reviewed-by: Petr Jelinek
Discussion: http://postgr.es/m/CAHGQGwGJrPO++XM4mFENAwpy1eGXKsGdguYv43GUgLgU-x8nTQ@mail.gmail.com
The logical decoding machinery already preserved all the required
catalog tuples, which is sufficient in the course of normal logical
decoding, but did not guarantee that non-catalog tuples were preserved
during computation of the initial snapshot when creating a slot over
the replication protocol.
This could cause a corrupted initial snapshot being exported. The
time window for issues is usually not terribly large, but on a busy
server it's perfectly possible to it hit it. Ongoing decoding is not
affected by this bug.
To avoid increased overhead for the SQL API, only retain additional
tuples when a logical slot is being created over the replication
protocol. To do so this commit changes the signature of
CreateInitDecodingContext(), but it seems unlikely that it's being
used in an extension, so that's probably ok.
In a drive-by fix, fix handling of
ReplicationSlotsComputeRequiredXmin's already_locked argument, which
should only apply to ProcArrayLock, not ReplicationSlotControlLock.
Reported-By: Erik Rijkers
Analyzed-By: Petr Jelinek
Author: Petr Jelinek, heavily editorialized by Andres Freund
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/9a897b86-46e1-9915-ee4c-da02e4ff6a95@2ndquadrant.com
Backport: 9.4, where logical decoding was introduced.
Publisher relation can be incorrectly chosen, if there are more than
one relation in different schemas with the same name.
Author: Euler Taveira <euler@timbira.com.br>
The code was originally written with assumption that launcher is the
only process starting the worker. However that hasn't been true since
commit 7c4f52409 which failed to modify the worker management code
adequately.
This patch adds an in_use field to the LogicalRepWorker struct to
indicate whether the worker slot is being used and uses proper locking
everywhere this flag is set or read.
However if the parent process dies while the new worker is starting and
the new worker fails to attach to shared memory, this flag would never
get cleared. We solve this rare corner case by adding a sort of garbage
collector for in_use slots. This uses another field in the
LogicalRepWorker struct named launch_time that contains the time when
the worker was started. If any request to start a new worker does not
find free slot, we'll check for workers that were supposed to start but
took too long to actually do so, and reuse their slot.
In passing also fix possible race conditions when stopping a worker that
hasn't finished starting yet.
Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: Fujii Masao <masao.fujii@gmail.com>