mirror of
https://github.com/postgres/postgres.git
synced 2025-04-22 23:02:54 +03:00
Emit cascaded standby message on shutdown only when appropriate.
Adds additional test for active walsenders and closes a race condition for when we failover when a new walsender was connecting. Reported and fixed bu Fujii Masao. Review by Heikki Linnakangas
This commit is contained in:
parent
39039e6d7a
commit
dde70cc313
@ -2328,10 +2328,11 @@ reaper(SIGNAL_ARGS)
|
||||
* XXX should avoid the need for disconnection. When we do,
|
||||
* am_cascading_walsender should be replaced with RecoveryInProgress()
|
||||
*/
|
||||
if (max_wal_senders > 0)
|
||||
if (max_wal_senders > 0 && CountChildren(BACKEND_TYPE_WALSND) > 0)
|
||||
{
|
||||
ereport(LOG,
|
||||
(errmsg("terminating all walsender processes to force cascaded standby(s) to update timeline and reconnect")));
|
||||
(errmsg("terminating all walsender processes to force cascaded "
|
||||
"standby(s) to update timeline and reconnect")));
|
||||
SignalSomeChildren(SIGUSR2, BACKEND_TYPE_WALSND);
|
||||
}
|
||||
|
||||
|
@ -368,6 +368,35 @@ StartReplication(StartReplicationCmd *cmd)
|
||||
MarkPostmasterChildWalSender();
|
||||
SendPostmasterSignal(PMSIGNAL_ADVANCE_STATE_MACHINE);
|
||||
|
||||
/*
|
||||
* When promoting a cascading standby, postmaster sends SIGUSR2 to
|
||||
* any cascading walsenders to kill them. But there is a corner-case where
|
||||
* such walsender fails to receive SIGUSR2 and survives a standby promotion
|
||||
* unexpectedly. This happens when postmaster sends SIGUSR2 before
|
||||
* the walsender marks itself as a WAL sender, because postmaster sends
|
||||
* SIGUSR2 to only the processes marked as a WAL sender.
|
||||
*
|
||||
* To avoid this corner-case, if recovery is NOT in progress even though
|
||||
* the walsender is cascading one, we do the same thing as SIGUSR2 signal
|
||||
* handler does, i.e., set walsender_ready_to_stop to true. Which causes
|
||||
* the walsender to end later.
|
||||
*
|
||||
* When terminating cascading walsenders, usually postmaster writes
|
||||
* the log message announcing the terminations. But there is a race condition
|
||||
* here. If there is no walsender except this process before reaching here,
|
||||
* postmaster thinks that there is no walsender and suppresses that
|
||||
* log message. To handle this case, we always emit that log message here.
|
||||
* This might cause duplicate log messages, but which is less likely to happen,
|
||||
* so it's not worth writing some code to suppress them.
|
||||
*/
|
||||
if (am_cascading_walsender && !RecoveryInProgress())
|
||||
{
|
||||
ereport(LOG,
|
||||
(errmsg("terminating walsender process to force cascaded standby "
|
||||
"to update timeline and reconnect")));
|
||||
walsender_ready_to_stop = true;
|
||||
}
|
||||
|
||||
/*
|
||||
* We assume here that we're logging enough information in the WAL for
|
||||
* log-shipping, since this is checked in PostmasterMain().
|
||||
|
Loading…
x
Reference in New Issue
Block a user