The code was originally written with assumption that launcher is the
only process starting the worker. However that hasn't been true since
commit 7c4f52409 which failed to modify the worker management code
adequately.
This patch adds an in_use field to the LogicalRepWorker struct to
indicate whether the worker slot is being used and uses proper locking
everywhere this flag is set or read.
However if the parent process dies while the new worker is starting and
the new worker fails to attach to shared memory, this flag would never
get cleared. We solve this rare corner case by adding a sort of garbage
collector for in_use slots. This uses another field in the
LogicalRepWorker struct named launch_time that contains the time when
the worker was started. If any request to start a new worker does not
find free slot, we'll check for workers that were supposed to start but
took too long to actually do so, and reuse their slot.
In passing also fix possible race conditions when stopping a worker that
hasn't finished starting yet.
Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: Fujii Masao <masao.fujii@gmail.com>
In quorum-based synchronous replication, all the standbys listed in
synchronous_standby_names equally have chances to be chosen
as synchronous standbys. So they should have the same priority.
However, previously, quorum standbys whose names appear earlier
in the list were given higher priority values though the difference of
those priority values didn't affect the selection of synchronous standbys.
Users could see those "meaningless" priority values in pg_stat_replication
and this was confusing.
This commit gives all the quorum synchronous standbys the same
highest priority, i.e., 1, in order to remove such confusion.
Author: Fujii Masao
Reviewed-by: Masahiko Sawada, Kyotaro Horiguchi
Discussion: http://postgr.es/m/CAHGQGwEKOw=SmPLxJzkBsH6wwDBgOnVz46QjHbtsiZ-d-2RGUg@mail.gmail.com
This seems to be largely cosmetic, avoiding valgrind bleats and the
like. The uninitialized padding influences the CRC of the on-disk
entry, but because it's also used when verifying the CRC, that doesn't
cause spurious failures. Backpatch nonetheless.
It's a bit unfortunate that contrib/test_decoding/sql/replorigin.sql
doesn't exercise the checkpoint path, but checkpoints are fairly
expensive on weaker machines, and we'd have to stop/start for that to
be meaningful.
Author: Andres Freund
Discussion: https://postgr.es/m/20170422183123.w2jgiuxtts7qrqaq@alap3.anarazel.de
Backpatch: 9.5, where replication origins were introduced
As reported by buildfarm animal skink / valgrind, some of the
variables weren't always initialized. To avoid further mishaps use
memset to ensure the entire entry is initialized.
Author: Petr Jelinek
Reported-By: Andres Freund
Discussion: https://postgr.es/m/20170422183123.w2jgiuxtts7qrqaq@alap3.anarazel.de
Backpatch: none, code new in master
Bug was masked by error in running 004_timeline_switch.pl that was
fixed recently in 7d68f2281a.
Detective work by Alvaro Herrera and Tom Lane
Author: Thomas Munro
Commit 7c4f524 allowed walsender to execute normal SQL commands
to support table sync feature in logical replication. Previously
while log_statement caused such SQL commands to be logged,
log_replication_commands caused them to be logged, too.
That is, such SQL commands were logged twice unexpectedly
when those settings were both enabled.
This commit forces log_replication_commands to log only replication
commands, to prevent normal SQL commands from being logged twice.
Author: Masahiko Sawada
Reviewed-by: Kyotaro Horiguchi
Reported-by: Fujii Masao
Discussion: http://postgr.es/m/CAHGQGwFDWh_Qr-q_GEMpD+qH=vYPMdVqw=ZOSY3kX_Pna9R9SA@mail.gmail.com
It's not safe to raise an error while holding spinlock. But previously
logical replication worker for table sync called the function which
reads the system catalog and may raise an error while it's holding
spinlock. Which could lead to the trouble where spinlock will never
be released and the server gets stuck infinitely.
Author: Petr Jelinek
Reviewed-by: Kyotaro Horiguchi and Fujii Masao
Reported-by: Fujii Masao
Discussion: http://postgr.es/m/CAHGQGwFDWh_Qr-q_GEMpD+qH=vYPMdVqw=ZOSY3kX_Pna9R9SA@mail.gmail.com
* Be sure to reset the launcher's pid (LogicalRepCtx->launcher_pid) to 0
even when the launcher emits an error.
* Declare ApplyLauncherWakeup() as a static function because it's called
only in launcher.c.
* Previously IsBackendPId() was used to check whether the launcher's pid
was valid. IsBackendPid() was necessary because there was the bug where
the launcher's pid was not reset to 0. But now it's fixed, so IsBackendPid()
is not necessary and this patch removes it.
Author: Masahiko Sawada
Reviewed-by: Kyotaro Horiguchi
Reported-by: Fujii Masao
Discussion: http://postgr.es/m/CAHGQGwFDWh_Qr-q_GEMpD+qH=vYPMdVqw=ZOSY3kX_Pna9R9SA@mail.gmail.com
CopyFrom() needs a range table for formatting certain errors for
constraint violations.
This changes the mechanism of how the range table is passed to the
CopyFrom() executor state. We used to generate the range table and one
entry for the relation manually inside DoCopy(). Now we use
addRangeTableEntryForRelation() to setup the range table and relation
entry for the ParseState, which is then passed down by BeginCopyFrom().
Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: Euler Taveira <euler@timbira.com.br>
Coverity complained because bgw.bgw_extra wasn't being filled in by
ApplyLauncherRegister(). The most future-proof fix is to memset the
whole BackgroundWorker struct to zeroes. While at it, let's apply the
same coding rule to other places that set up BackgroundWorker structs;
four out of five had the same or related issues.
When sending a tuple attribute, the previous coding erroneously sent the
length byte before encoding conversion, which would lead to protocol
failures on the receiving side if the length did not match the following
string.
To fix that, use pq_sendcountedtext() for sending tuple attributes,
which takes care of all of that internally. To match the API of
pq_sendcountedtext(), send even text values without a trailing zero byte
and have the receiving end put it in place instead. This matches how
the standard FE/BE protocol behaves.
Reported-by: Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>
All error messages use the American English spelling of recognize,
apply to the single one not doing so to be consistent.
Author: Daniel Gustafsson <daniel@yesql.se>
We need to set the origin remote position to end_lsn, not commit_lsn, as
commit_lsn is the start of commit record, and we use the origin remote
position as start position when restarting replication stream. If we'd
use commit_lsn, we could request data that we already received from the
remote server after a crash of a downstream server.
Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Since change of slot name is a supported operation, handle it more
gracefully, instead of in the this-should-not-happen way.
Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
On EXEC_BACKEND builds, this can fail if ASLR is in use.
Backpatch to 9.5. On master, completely remove the bgw_main field
completely, since there is no situation in which it is safe for an
EXEC_BACKEND build. On 9.6 and 9.5, leave the field intact to avoid
breaking things for third-party code that doesn't care about working
under EXEC_BACKEND. Prior to 9.5, there are no in-core bgworker
entrypoints.
Petr Jelinek, reviewed by me.
Discussion: http://postgr.es/m/09d8ad33-4287-a09b-a77f-77f8761adb5e@2ndquadrant.com
Three nologin roles with non-overlapping privs are created by default
* pg_read_all_settings - read all GUCs.
* pg_read_all_stats - pg_stat_*, pg_database_size(), pg_tablespace_size()
* pg_stat_scan_tables - may lock/scan tables
Top level role - pg_monitor includes all of the above by default, plus others
Author: Dave Page
Reviewed-by: Stephen Frost, Robert Haas, Peter Eisentraut, Simon Riggs
There are no functional changes here; this simply encapsulates knowledge
of the ItemPointerData struct so that a future patch can change things
without more breakage.
All direct users of ip_blkid and ip_posid are changed to use existing
macros ItemPointerGetBlockNumber and ItemPointerGetOffsetNumber
respectively. For callers where that's inappropriate (because they
Assert that the itempointer is is valid-looking), add
ItemPointerGetBlockNumberNoCheck and ItemPointerGetOffsetNumberNoCheck,
which lack the assertion but are otherwise identical.
Author: Pavan Deolasee
Discussion: https://postgr.es/m/CABOikdNnFon4cJiL=h1mZH3bgUeU+sWHuU4Yr8AB=j3A2p1GiA@mail.gmail.com
Automatically drop all logical replication slots associated with a
database when the database is dropped. Previously we threw an ERROR
if a slot existed. Now we throw ERROR only if a slot is active in
the database being dropped.
Craig Ringer
Previously, auxiliary processes and background workers not connected
to a database (such as the logical replication launcher) weren't
shown. Include them, so that we can see the associated wait state
information. Add a new column to identify the processes type, so that
people can filter them out easily using SQL if they wish.
Before this patch was written, there was discussion about whether we
should expose this information in a separate view, so as to avoid
contaminating pg_stat_activity with things people might not want to
see. But putting everything in pg_stat_activity was a more popular
choice, so that's what the patch does.
Kuntal Ghosh, reviewed by Amit Langote and Michael Paquier. Some
revisions and bug fixes by me.
Discussion: http://postgr.es/m/CA+TgmoYES5nhkEGw9nZXU8_FhA8XEm8NTm3-SO+3ML1B81Hkww@mail.gmail.com
Fix an incorrect assert condition (noted by Coverity), and spell the new
name of the function correctly. Typos introduced in commit 7c4f52409.
Michael Paquier
If the upstream walsender is using a physical replication slot, store the
catalog_xmin in the slot's catalog_xmin field. If the upstream doesn't use a
slot and has only a PGPROC entry behaviour doesn't change, as we store the
combined xmin and catalog_xmin in the PGPROC entry.
Author: Craig Ringer
Always return tupleslot and tupledesc from libpqrcv_exec. This avoids
requiring callers to handle that separately.
Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: Michael Banck <michael.banck@credativ.de>
Adds write_lag, flush_lag and replay_lag cols to pg_stat_replication.
Implements a lag tracker module that reports the lag times based upon
measurements of the time taken for recent WAL to be written, flushed and
replayed and for the sender to hear about it. These times
represent the commit lag that was (or would have been) introduced by each
synchronous commit level, if the remote server was configured as a
synchronous standby. For an asynchronous standby, the replay_lag column
approximates the delay before recent transactions became visible to queries.
If the standby server has entirely caught up with the sending server and
there is no more WAL activity, the most recently measured lag times will
continue to be displayed for a short time and then show NULL.
Physical replication lag tracking is automatic. Logical replication tracking
is possible but is the responsibility of the logical decoding plugin.
Tracking is a private module operating within each walsender individually,
with values reported to shared memory. Module not used outside of walsender.
Design and code is good enough now to commit - kudos to the author.
In many ways a difficult topic, with important and subtle behaviour so this
shoudl be expected to generate discussion and multiple open items: Test now!
Author: Thomas Munro, following designs by Fujii Masao and Simon Riggs
Review: Simon Riggs, Ian Barwick and Craig Ringer
Add functionality for a new subscription to copy the initial data in the
tables and then sync with the ongoing apply process.
For the copying, add a new internal COPY option to have the COPY source
data provided by a callback function. The initial data copy works on
the subscriber by receiving COPY data from the publisher and then
providing it locally into a COPY that writes to the destination table.
A WAL receiver can now execute full SQL commands. This is used here to
obtain information about tables and publications.
Several new options were added to CREATE and ALTER SUBSCRIPTION to
control whether and when initial table syncing happens.
Change pg_dump option --no-create-subscription-slots to
--no-subscription-connect and use the new CREATE SUBSCRIPTION
... NOCONNECT option for that.
Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Tested-by: Erik Rijkers <er@xs4all.nl>
Uses page-based mechanism to ensure we’re using the correct timeline.
Tests are included to exercise the functionality using a cold disk-level copy
of the master that's started up as a replica with slots intact, but the
intended use of the functionality is with later features.
Craig Ringer, reviewed by Simon Riggs and Andres Freund
Previous commits, notably 53be0b1add and
6f3bd98ebf, made it possible to see from
pg_stat_activity when a backend was stuck waiting for another backend,
but it's also fairly common for a backend to be stuck waiting for an
I/O. Add wait events for those operations, too.
Rushabh Lathia, with further hacking by me. Reviewed and tested by
Michael Paquier, Amit Kapila, Rajkumar Raghuwanshi, and Rahila Syed.
Discussion: http://postgr.es/m/CAGPqQf0LsYHXREPAZqYGVkDqHSyjf=KsD=k0GTVPAuzyThh-VQ@mail.gmail.com
The original coding in commit 1e8a85009 didn't use PQconnectPoll per
spec, and while the rewrite in e434ad39a is closer, it still doesn't
guarantee to wait until the socket is read-ready or write-ready (as
appropriate) before calling PQconnectPoll. It's not clear whether
that omission is causing the continuing failures on buildfarm member
bowerbird; but given the lack of other explanations meeting the
available facts, let's tighten that up and see what happens.
An independent issue in the same loop was that it had a race condition
whereby it could clear the process's latch without having serviced an
interrupt request, causing failure to respond to a cancel while waiting
for connection (the very problem 1e8a85009 was meant to fix).
Discussion: https://postgr.es/m/7295.1489596949@sss.pgh.pa.us
We used to export snapshots unconditionally in CREATE_REPLICATION_SLOT
in the replication protocol, but several upcoming patches want more
control over what happens.
Suppress snapshot export in pg_recvlogical, which neither needs nor can
use the exported snapshot. Since snapshot exporting can fail this
improves reliability.
This also paves the way for allowing the creation of replication slots
on standbys, which cannot export snapshots because they cannot allocate
new XIDs.
Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
This makes almost all core code follow the policy introduced in the
previous commit. Specific decisions:
- Text search support functions with char* and length arguments, such as
prsstart and lexize, may receive unaligned strings. I doubt
maintainers of non-core text search code will notice.
- Use plain VARDATA() on values detoasted or synthesized earlier in the
same function. Use VARDATA_ANY() on varlenas sourced outside the
function, even if they happen to always have four-byte headers. As an
exception, retain the universal practice of using VARDATA() on return
values of SendFunctionCall().
- Retain PG_GETARG_BYTEA_P() in pageinspect. (Page images are too large
for a one-byte header, so this misses no optimization.) Sites that do
not call get_page_from_raw() typically need the four-byte alignment.
- For now, do not change btree_gist. Its use of four-byte headers in
memory is partly entangled with storage of 4-byte headers inside
GBT_VARKEY, on disk.
- For now, do not change gtrgm_consistent() or gtrgm_distance(). They
incorporate the varlena header into a cache, and there are multiple
credible implementation strategies to consider.
There's no really good reason why the autovacuum launcher and logical
replication launcher should announce themselves at startup and shutdown
by default. Users don't care that those processes exist, and it's
inconsistent that those background processes announce themselves while
others don't. So, reduce those messages from LOG to DEBUG1 level.
I was sorely tempted to reduce the "starting logical replication worker
for subscription ..." message to DEBUG1 as well, but forebore for now.
Those processes might possibly be of direct interest to users, at least
until logical replication is a lot better shaken out than it is today.
Discussion: https://postgr.es/m/19479.1489121003@sss.pgh.pa.us
Any logical rep workers must have their subscription entries in
pg_subscription. To ensure this, we need to prevent the launcher
from starting new worker corresponding to the subscription that
DROP SUBSCRIPTION command is removing. To implement this,
previously LogicalRepLauncherLock was introduced and held until
the end of transaction running DROP SUBSCRIPTION. But using
LWLock for that purpose was not valid.
Instead, this commit changes DROP SUBSCRIPTION so that it takes
AccessExclusiveLock on pg_subscription, in order to ensure that
the launcher cannot see any subscriptions being removed. Also this
commit gets rid of LogicalRepLauncherLock.
Patch by me, reviewed by Petr Jelinek
Discussion: https://www.postgresql.org/message-id/CAHGQGwHPi8ky-yANFfe0sgmhKtsYcQLTnKx07bW9S7-Rn1746w@mail.gmail.com
Per libpq documentation, the initial state must be
PGRES_POLLING_WRITING. Failing to do that appears to cause some issues
on some Windows systems.
From: Petr Jelinek <petr.jelinek@2ndquadrant.com>