The new indent version includes numerous fixes thanks to Piotr Stefaniak.
The main changes visible in this commit are:
* Nicer formatting of function-pointer declarations.
* No longer unexpectedly removes spaces in expressions using casts,
sizeof, or offsetof.
* No longer wants to add a space in "struct structname *varname", as
well as some similar cases for const- or volatile-qualified pointers.
* Declarations using PG_USED_FOR_ASSERTS_ONLY are formatted more nicely.
* Fixes bug where comments following declarations were sometimes placed
with no space separating them from the code.
* Fixes some odd decisions for comments following case labels.
* Fixes some cases where comments following code were indented to less
than the expected column 33.
On the less good side, it now tends to put more whitespace around typedef
names that are not listed in typedefs.list. This might encourage us to
put more effort into typedef name collection; it's not really a bug in
indent itself.
There are more changes coming after this round, having to do with comment
indentation and alignment of lines appearing within parentheses. I wanted
to limit the size of the diffs to something that could be reviewed without
one's eyes completely glazing over, so it seemed better to split up the
changes as much as practical.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
When, during logical decoding, a transaction gets too big, it's
contents get spilled to disk. Not just the top-transaction gets
spilled, but *also* all of its subtransactions, even if they're not
that large themselves. Unfortunately we didn't clean up
such small spilled subtransactions from disk.
Fix that, by keeping better track of whether a transaction has been
spilled to disk.
Author: Andres Freund
Reported-By: Dmitriy Sarafannikov, Fabrízio de Royes Mello
Discussion:
https://postgr.es/m/1457621358.355011041@f382.i.mail.ruhttps://postgr.es/m/CAFcNs+qNMhNYii4nxpO6gqsndiyxNDYV0S=JNq0v_sEE+9PHXg@mail.gmail.com
Backpatch: 9.4-, where logical decoding was introduced
We had three occurrences of essentially the same coding pattern
wherein we tried to retrieve a query result from a libpq connection
without blocking. In the case where PQconsumeInput failed (typically
indicating a lost connection), all three loops simply gave up and
returned, forgetting to clear any previously-collected PGresult
object. Since those are malloc'd not palloc'd, the oversight results
in a process-lifespan memory leak.
One instance, in libpqwalreceiver, is of little significance because
the walreceiver process would just quit anyway if its connection fails.
But we might as well fix it.
The other two instances, in postgres_fdw, are somewhat more worrisome
because at least in principle the scenario could be repeated, allowing
the amount of memory leaked to build up to something worth worrying
about. Moreover, in these cases the loops contain CHECK_FOR_INTERRUPTS
calls, as well as other calls that could potentially elog(ERROR),
providing another way to exit without having cleared the PGresult.
Here we need to add PG_TRY logic similar to what exists in quite a
few other places in postgres_fdw.
Coverity noted the libpqwalreceiver bug; I found the other two cases
by checking all calls of PQconsumeInput.
Back-patch to all supported versions as appropriate (9.2 lacks
postgres_fdw, so this is really quite unexciting for that branch).
Discussion: https://postgr.es/m/22620.1497486981@sss.pgh.pa.us
Previously we required every exported transaction to have an xid
assigned. That was used to check that the exporting transaction is
still running, which in turn is needed to guarantee that that
necessary rows haven't been removed in between exporting and importing
the snapshot.
The exported xid caused unnecessary problems with logical decoding,
because slot creation has to wait for all concurrent xid to finish,
which in turn serializes concurrent slot creation. It also
prohibited snapshots to be exported on hot-standby replicas.
Instead export the virtual transactionid, which avoids the unnecessary
serialization and the inability to export snapshots on standbys. This
changes the file name of the exported snapshot, but since we never
documented what that one means, that seems ok.
Author: Petr Jelinek, slightly editorialized by me
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/f598b4b8-8cd7-0d54-0939-adda763d8c34@2ndquadrant.com
When a table is removed from a subscription before the tablesync worker
could start, this would previously result in an error when reading
pg_subscription_rel. Now we just ignore this.
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Previously the exit handling was only able to exit from within the
main loop, and not from within the backend code it calls. Fix that by
using the standard die() SIGTERM handler, and adding the necessary
CHECK_FOR_INTERRUPTS() call.
This requires adding yet another process-type-specific branch to
ProcessInterrupts(), which hints that we probably should generalize
that handling. But that's work for another day.
Author: Petr Jelinek
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/fe072153-babd-3b5d-8052-73527a6eb657@2ndquadrant.com
Since 7c4f52409a (merged in v10), a shutdown master is reported as
FATAL: unexpected result after CommandComplete: server closed the connection unexpectedly
by walsender. It used to be
LOG: replication terminated by primary server
FATAL: could not send end-of-streaming message to primary: no COPY in progress
while the old message clearly is not perfect, it's definitely better
than what's reported now.
The change comes from the attempt to handle finished COPYs without
erroring out, needed for the new logical replication, which wasn't
needed before.
There's probably better ways to handle this, but for now just
explicitly check for a closed connection.
Author: Petr Jelinek
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/f7c7dd08-855c-e4ed-41f4-d064a6c0665a@2ndquadrant.com
Backpatch: -
A logical replication worker should not insert new rows into
pg_subscription_rel, only update existing rows, so that there are no
races if a concurrent refresh removes rows. Adjust the API to be able
to choose that behavior.
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Reported-by: tushar <tushar.ahuja@enterprisedb.com>
The logical replication apply worker uses the subscription name as
application name, except for table sync. This was incorrectly set to
use the replication slot name, which might be different, in one case.
Also add a comment why the other case is different.
The larger part of this patch replaces usages of MyProc->procLatch
with MyLatch. The latter works even early during backend startup,
where MyProc->procLatch doesn't yet. While the affected code
shouldn't run in cases where it's not initialized, it might get copied
into places where it might. Using MyLatch is simpler and a bit faster
to boot, so there's little point to stick with the previous coding.
While doing so I noticed some weaknesses around newly introduced uses
of latches that could lead to missed events, and an omitted
CHECK_FOR_INTERRUPTS() call in worker_spi.
As all the actual bugs are in v10 code, there doesn't seem to be
sufficient reason to backpatch this.
Author: Andres Freund
Discussion:
https://postgr.es/m/20170606195321.sjmenrfgl2nu6j63@alap3.anarazel.dehttps://postgr.es/m/20170606210405.sim3yl6vpudhmufo@alap3.anarazel.de
Backpatch: -
Make apply busy wait check the catalog instead of shmem state to ensure
that next transaction will see the expected table synchronization state.
Also make the handover always go through same set of steps to make the
overall process easier to understand and debug.
Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Tested-by: Mark Kirkwood <mark.kirkwood@catalyst.net.nz>
Tested-by: Erik Rijkers <er@xs4all.nl>
This allows to cancel commands run over replication connections. While
it might have some use before v10, it has become important now that
normal SQL commands are allowed in database connected walsender
connections.
Author: Petr Jelinek
Reviewed-By: Andres Freund, Michael Paquier
Discussion: https://postgr.es/m/7966f454-7cd7-2b0c-8b70-cdca9d5a8c97@2ndquadrant.com
Because walsender and normal backends share the same main loop it's
problematic to have two different flag variables, set in signal
handlers, indicating a pending configuration reload. Only certain
walsender commands reach code paths checking for the
variable (START_[LOGICAL_]REPLICATION, CREATE_REPLICATION_SLOT
... LOGICAL, notably not base backups).
This is a bug present since the introduction of walsender, but has
gotten worse in releases since then which allow walsender to do more.
A later patch, not slated for v10, will similarly unify SIGHUP
handling in other types of processes as well.
Author: Petr Jelinek, Andres Freund
Reviewed-By: Michael Paquier
Discussion: https://postgr.es/m/20170423235941.qosiuoyqprq4nu7v@alap3.anarazel.de
Backpatch: 9.2-, bug is present since 9.0
When the checkpointer writes the shutdown checkpoint, it checks
afterwards whether any WAL has been written since it started and
throws a PANIC if so. At that point, only walsenders are still
active, so one might think this could not happen, but walsenders can
also generate WAL, for instance in BASE_BACKUP and logical decoding
related commands (e.g. via hint bits). So they can trigger this panic
if such a command is run while the shutdown checkpoint is being
written.
To fix this, divide the walsender shutdown into two phases. First,
checkpointer, itself triggered by postmaster, sends a
PROCSIG_WALSND_INIT_STOPPING signal to all walsenders. If the backend
is idle or runs an SQL query this causes the backend to shutdown, if
logical replication is in progress all existing WAL records are
processed followed by a shutdown. Otherwise this causes the walsender
to switch to the "stopping" state. In this state, the walsender will
reject any further replication commands. The checkpointer begins the
shutdown checkpoint once all walsenders are confirmed as
stopping. When the shutdown checkpoint finishes, the postmaster sends
us SIGUSR2. This instructs walsender to send any outstanding WAL,
including the shutdown checkpoint record, wait for it to be replicated
to the standby, and then exit.
Author: Andres Freund, based on an earlier patch by Michael Paquier
Reported-By: Fujii Masao, Andres Freund
Reviewed-By: Michael Paquier
Discussion: https://postgr.es/m/20170602002912.tqlwn4gymzlxpvs2@alap3.anarazel.de
Backpatch: 9.4, where logical decoding was introduced
The non-participation in procsignal was a problem for both changes in
master, e.g. parallelism not working for normal statements run in
walsender backends, and older branches, e.g. recovery conflicts and
catchup interrupts not working for logical decoding walsenders.
This commit thus replaces the previous WalSndXLogSendHandler with
procsignal_sigusr1_handler. In branches since db0f6cad48 that can
lead to additional SetLatch calls, but that only rarely seems to make
a difference.
Author: Andres Freund
Reviewed-By: Michael Paquier
Discussion: https://postgr.es/m/20170421014030.fdzvvvbrz4nckrow@alap3.anarazel.de
Backpatch: 9.4, earlier commits don't seem to benefit sufficiently
This reverts commit 086221cf6b, which
was made to master only.
The approach implemented in the above commit has some issues. While
those could easily be fixed incrementally, doing so would make
backpatching considerably harder, so instead first revert this patch.
Discussion: https://postgr.es/m/20170602002912.tqlwn4gymzlxpvs2@alap3.anarazel.de
We didn't accept any invalidation messages until the whole sync process
had finished (because it flattens all the remote transactions in the
single one). So the sync worker didn't learn about subscription
changes/drop until it has finished. This could lead to "orphaned" sync
workers.
Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: Masahiko Sawada <sawada.mshk@gmail.com>
This avoids "orphaned" sync workers.
This was caused by a thinko in wait_for_sync_status_change.
Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: Masahiko Sawada <sawada.mshk@gmail.com>
When trying to access a replication slot that is supposed to already
exist, we don't need to check the naming rules again. If the slot
does not exist, we will then get a "does not exist" error message, which
is generally more useful from the perspective of an end user.
The logical replication worker processes now use the normal die()
handler for SIGTERM and CHECK_FOR_INTERRUPTS() instead of custom code.
One problem before was that the apply worker would not exit promptly
when a subscription was dropped, which could lead to deadlocks.
Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: Masahiko Sawada <sawada.mshk@gmail.com>
Move the walrcv_disconnect() calls into the before_shmem_exit handler.
This makes sure the call is always made even during exit by signal, it
saves some duplicate code, and it makes the logic more similar to
walreceiver.c.
Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Logical replication supports replicating between tables with different
column order. But this failed for the initial table sync because of a
logic error in how the column list for the internal COPY command was
composed. Fix that and also add a test.
Also fix a minor omission in the column name mapping cache. When
creating the mapping list, it would not skip locally dropped columns.
So if a remote column had the same name as a locally dropped
column (...pg.dropped...), then the expected error would not occur.
Reduce some redundant messages to DEBUG1. Be clearer about the
distinction between apply workers and table synchronization workers.
Add subscription and table name where possible.
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Using flex's -i switch to achieve case-insensitivity is not a very safe
practice, because the scanner's behavior may then depend on the locale
that flex was invoked in. In the particular example at hand, that's
not academic: the possible matches for "FIRST" will be different in a
Turkish locale than elsewhere. Do it the hard way instead, as our
other scanners do.
Also, drop use of -b -CF -p, because this scanner is only used when
parsing the contents of a GUC variable. That's not done often, and
the amount of text to be parsed can be expected to be trivial, so
prioritizing scanner speed over code size seems like quite the wrong
tradeoff. Using flex's default optimization options reduces the
size of syncrep_gram.o by more than 50%.
The case-insensitivity problem is new in HEAD (cf commit 3901fd70c).
The poor choice of optimization flags exists also in 9.6, but it doesn't
seem important enough to back-patch.
Discussion: https://postgr.es/m/24403.1495225931@sss.pgh.pa.us
When creating a subscription with slot_name = NONE, we failed to check
that also create_slot = false and enabled = false were set. This
created an invalid subscription and could later lead to a crash if a
NULL slot name was accessed. Add more checks around that for
robustness.
Reported-by: tushar <tushar.ahuja@enterprisedb.com>
We used to only check for a supported relkind on the subscriber during
replication, which is needed to ensure that the setup is valid and we
don't crash. But it's also useful to tell the user immediately when
CREATE or ALTER SUBSCRIPTION is executed that the relation being added
to the subscription is not of a supported relkind.
Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: tushar <tushar.ahuja@enterprisedb.com>
Reformat various places in which pgindent will make a mess, and
fix a few small violations of coding style that I happened to notice
while perusing the diffs from a pgindent dry run.
There is one actual bug fix here: the need-to-enlarge-the-buffer code
path in icu_convert_case was obviously broken. Perhaps it's unreachable
in our usage? Or maybe this is just sadly undertested.
Before 955a684e04 logical decoding snapshot maintenance needed to
cope with transactions it might not have seen in their entirety. For
such transactions we'd to assume they modified the catalog (could have
happened before we were watching), and thus a new snapshot had to be
built, and distributed to concurrently running transactions.
That's problematic because building a new snapshot isn't that cheap ,
especially as the the array of committed transactions needs to be
sorted. When creating a slot on a server with a lot of transactions,
this could make logical slot creation infeasibly expensive.
After 955a684e04 there's no need to deal with transaction that
aren't guaranteed to be fully observable. That allows to avoid
building snapshots for transactions that haven't modified catalog,
even before reaching consistency.
While this isn't necessarily a bugfix, slot creation being impossible
in some production workloads, is severe enough to warrant
backpatching.
Author: Andres Freund, based on a quite different patch from Petr Jelinek
Analyzed-By: Petr Jelinek
Reviewed-By: Petr Jelinek
Discussion: https://postgr.es/m/f37e975c-908f-858e-707f-058d3b1eb214@2ndquadrant.com
Backpatch: 9.4-, where logical decoding has been introduced
The snapshot assembly during the creation of logical slots relied
waiting for transactions in xl_running_xacts to end, by checking for
their commit/abort records. Unfortunately, despite locking, it is
possible to see an xl_running_xact record listing transactions as
ready, that have already WAL-logged an commit/abort record, as the
locking just prevents the ProcArray to be adjusted, and the commit
record has to be logged first.
That lead to either delayed or hanging snapshot creation, because
snapbuild.c would wait "forever" to see commit/abort records for some
transactions. That hang resolved only if a xl_running_xacts record
without any running transactions happened to be logged, far from
certain on a busy server.
It's impractical to prevent that via more heavyweight locking, the
likelihood of deadlocks and significantly increased contention would
be too big.
Instead change the initial snapshot creation to be solely based on
tracking the oldest running transaction via
xl_running_xacts->oldestRunningXid - that actually ends up
significantly simplifying the code. That has two disadvantages:
1) Because we cannot fully "trust" the contents of xl_running_xacts,
we cannot use it to build the initial snapshot. Instead we have to
wait twice for all running transactions to finish.
2) Previously a slot, unless the race occurred, could be created when
the all transaction perceived as running based on commit/abort
records, now we have to wait for the next xl_running_xacts record.
To address that, trigger logging new xl_running_xacts record from
within snapbuild.c exactly when necessary.
Unfortunately snabuild.c's SnapBuild is stored on disk, one of the
stupider ideas of a certain Mr Freund, so we can't change it in a
minor release. As this is going to be backpatched, we have to hack
around a bit to keep on-disk compatibility. A later commit will
rejigger that on master.
Author: Andres Freund, based on a quite different patch from Petr Jelinek
Analyzed-By: Petr Jelinek
Reviewed-By: Petr Jelinek
Discussion: https://postgr.es/m/f37e975c-908f-858e-707f-058d3b1eb214@2ndquadrant.com
Backpatch: 9.4-, where logical decoding has been introduced
Lag tracking is called for each commit, but we introduce
a pacing delay to ensure we don't swamp the lag tracker.
Author: Petr Jelinek, with minor pacing delay code from me
Previously, the memory used by the logical replication apply worker for
processing messages would never be freed, so that could end up using a
lot of memory. To improve that, change the existing ApplyContext memory
context to ApplyMessageContext and reset that after every
message (similar to MessageContext used elsewhere). For consistency of
naming, rename the ApplyCacheContext to ApplyContext.
Author: Stas Kelvich <s.kelvich@postgrespro.ru>
This new arrangement ensures that statistics are reported right after
commit of transactions. The previous arrangement didn't get this quite
right and could lead to assertion failures.
Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: Erik Rijkers <er@xs4all.nl>
When the checkpointer writes the shutdown checkpoint, it checks
afterwards whether any WAL has been written since it started and throws
a PANIC if so. At that point, only walsenders are still active, so one
might think this could not happen, but walsenders can also generate WAL,
for instance in BASE_BACKUP and certain variants of
CREATE_REPLICATION_SLOT. So they can trigger this panic if such a
command is run while the shutdown checkpoint is being written.
To fix this, divide the walsender shutdown into two phases. First, the
postmaster sends a SIGUSR2 signal to all walsenders. The walsenders
then put themselves into the "stopping" state. In this state, they
reject any new commands. (For simplicity, we reject all new commands,
so that in the future we do not have to track meticulously which
commands might generate WAL.) The checkpointer waits for all walsenders
to reach this state before proceeding with the shutdown checkpoint.
After the shutdown checkpoint is done, the postmaster sends
SIGINT (previously unused) to the walsenders. This triggers the
existing shutdown behavior of sending out the shutdown checkpoint record
and then terminating.
Author: Michael Paquier <michael.paquier@gmail.com>
Reported-by: Fujii Masao <masao.fujii@gmail.com>
After the logical replication launcher was told to wake up at
commit (for example, by a CREATE SUBSCRIPTION command), the flag to wake
up was not reset, so it would be woken up at every following commit as
well. So fix that by resetting the flag.
Also, we don't need to wake up anything if the transaction was rolled
back. Just reset the flag in that case.
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Reported-by: Fujii Masao <masao.fujii@gmail.com>
Thinko in commit de4389712: this warning message references the wrong
"LogicalRepWorker *" variable. This would often result in a core dump,
but if it didn't, the message would show the wrong subscription OID.
In passing, adjust the message text to format a subscription OID
similarly to how that's done elsewhere in the function; and fix
grammatical issues in some nearby messages.
Per Coverity testing.
Before restarting a tablesync worker for the same relation, wait
wal_retrieve_retry_interval (currently 5s by default). This avoids
restarting failing workers in a tight loop.
We keep the last start times in a hash table last_start_times that is
separate from the table_states list, because that list is cleared out on
syscache invalidation, which happens whenever a table finishes syncing.
The hash table is kept until all tables have finished syncing.
A future project might be to unify these two and keep everything in one
data structure, but for now this is a less invasive change to accomplish
the original purpose.
For the test suite, set wal_retrieve_retry_interval to its minimum
value, to not increase the test suite run time.
Reviewed-by: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: Masahiko Sawada <sawada.mshk@gmail.com>
Earlier commits (56e19d938d and 2bef06d516) make it cheaper to
create a logical slot if not exporting the initial snapshot. If
NOEXPORT_SNAPSHOT is specified, we can skip the overhead, not just
when creating a slot via sql (which can't export snapshots). As
NOEXPORT_SNAPSHOT has only recently been introduced, this shouldn't be
backpatched.
Logical decoding stores historical snapshots on disk, so that logical
decoding can restart without having to reconstruct a snapshot from
scratch (for which the resources are not guaranteed to be present
anymore). These serialized snapshots were also used when creating a
new slot via the walsender interface, which can export a "full"
snapshot (i.e. one that can read all tables, not just catalog ones).
The problem is that the serialized snapshots are only useful for
catalogs and not for normal user tables. Thus the use of such a
serialized snapshot could result in an inconsistent snapshot being
exported, which could lead to queries returning wrong data. This
would only happen if logical slots are created while another logical
slot already exists.
Author: Petr Jelinek
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/f37e975c-908f-858e-707f-058d3b1eb214@2ndquadrant.com
Backport: 9.4, where logical decoding was introduced.
Previously the logical replication launcher stored the last timestamp
when it started the worker, in the local variable "last_start_time",
in order to check whether wal_retrive_retry_interval elapsed since
the last startup of worker. If it has elapsed, the launcher sees
pg_subscription and starts new worker if necessary. This is for
limitting the startup of worker to once a wal_retrieve_retry_interval.
The bug was that the variable "last_start_time" was defined and
always initialized with 0 at the beginning of the launcher's main loop.
So even if it's set to the last timestamp in later phase of the loop,
it's always reset to 0. Therefore the launcher could not check
correctly whether wal_retrieve_retry_interval elapsed since
the last startup.
This patch moves the variable "last_start_time" outside the main loop
so that it will not be reset.
Reviewed-by: Petr Jelinek
Discussion: http://postgr.es/m/CAHGQGwGJrPO++XM4mFENAwpy1eGXKsGdguYv43GUgLgU-x8nTQ@mail.gmail.com
The logical decoding machinery already preserved all the required
catalog tuples, which is sufficient in the course of normal logical
decoding, but did not guarantee that non-catalog tuples were preserved
during computation of the initial snapshot when creating a slot over
the replication protocol.
This could cause a corrupted initial snapshot being exported. The
time window for issues is usually not terribly large, but on a busy
server it's perfectly possible to it hit it. Ongoing decoding is not
affected by this bug.
To avoid increased overhead for the SQL API, only retain additional
tuples when a logical slot is being created over the replication
protocol. To do so this commit changes the signature of
CreateInitDecodingContext(), but it seems unlikely that it's being
used in an extension, so that's probably ok.
In a drive-by fix, fix handling of
ReplicationSlotsComputeRequiredXmin's already_locked argument, which
should only apply to ProcArrayLock, not ReplicationSlotControlLock.
Reported-By: Erik Rijkers
Analyzed-By: Petr Jelinek
Author: Petr Jelinek, heavily editorialized by Andres Freund
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/9a897b86-46e1-9915-ee4c-da02e4ff6a95@2ndquadrant.com
Backport: 9.4, where logical decoding was introduced.