1
0
mirror of https://github.com/postgres/postgres.git synced 2025-07-26 01:22:12 +03:00

Fix various concurrency issues in logical replication worker launching

The code was originally written with assumption that launcher is the
only process starting the worker.  However that hasn't been true since
commit 7c4f52409 which failed to modify the worker management code
adequately.

This patch adds an in_use field to the LogicalRepWorker struct to
indicate whether the worker slot is being used and uses proper locking
everywhere this flag is set or read.

However if the parent process dies while the new worker is starting and
the new worker fails to attach to shared memory, this flag would never
get cleared.  We solve this rare corner case by adding a sort of garbage
collector for in_use slots.  This uses another field in the
LogicalRepWorker struct named launch_time that contains the time when
the worker was started.  If any request to start a new worker does not
find free slot, we'll check for workers that were supposed to start but
took too long to actually do so, and reuse their slot.

In passing also fix possible race conditions when stopping a worker that
hasn't finished starting yet.

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: Fujii Masao <masao.fujii@gmail.com>
This commit is contained in:
Peter Eisentraut
2017-04-26 10:43:04 -04:00
parent 309191f66a
commit de43897122
2 changed files with 156 additions and 36 deletions

View File

@ -21,6 +21,15 @@
typedef struct LogicalRepWorker
{
/* Time at which this worker was launched. */
TimestampTz launch_time;
/* Indicates if this slot is used or free. */
bool in_use;
/* Increased everytime the slot is taken by new worker. */
uint16 generation;
/* Pointer to proc array. NULL if not running. */
PGPROC *proc;