Commit 9da0cc35284, which introduced parallel CREATE INDEX, failed to
propagate relmapper.c backend local cache state to parallel worker
processes. This could result in parallel index builds against mapped
catalog relations where the leader process (participating as a worker)
scans the new, pristine relfilenode, while worker processes scan the
obsolescent relfilenode. When this happened, the final index structure
was typically not consistent with the owning table's structure, since it
could contain entries formed from both heap relfilenodes. Only rebuilds
on mapped catalog relations that occur as
part of a VACUUM FULL or CLUSTER could become corrupt in practice, since
their mapped relation relfilenode swap is what allows the inconsistency
to arise.

On master, fix the problem by propagating the required relmapper.c
backend state as part of standard parallel initialization (Cf. commit
29d58fd3). On v11, simply disallow builds against mapped catalog
relations by deeming them parallel unsafe.
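
As a rough illustration of the master-branch approach (a sketch only, not
the committed code; the key name and call sites are assumptions modeled on
how parallel.c already handles other backend-local state such as GUCs and
combo CIDs):

    /* Leader side, during InitializeParallelDSM(): serialize the current
     * relation map into the DSM segment's table-of-contents. */
    Size        relmapper_len = EstimateRelationMapSpace();
    char       *relmapperspace = shm_toc_allocate(pcxt->toc, relmapper_len);

    SerializeRelationMap(relmapper_len, relmapperspace);
    shm_toc_insert(pcxt->toc, PARALLEL_KEY_RELMAPPER_STATE, relmapperspace);

    /* Worker side, during ParallelWorkerMain(): restore the leader's map
     * before any mapped catalog relation can be opened. */
    relmapperspace = shm_toc_lookup(toc, PARALLEL_KEY_RELMAPPER_STATE, false);
    RestoreRelationMap(relmapperspace);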

Author: Peter Geoghegan
Reported-By: "death lock"
Reviewed-By: Tom Lane, Amit Kapila
Bug: #15309
Discussion: https://postgr.es/m/153329671686.1405.18298309097348420351@wrigleys.postgresql.org
Backpatch: 11-, where parallel CREATE INDEX was introduced.

Since 9.6, heavyweight locking is not an abstract and unhandled
concern of the parallel machinery, but rather something to which
we have a specific approach.

We'd already recognized that we can't pass function pointers across process
boundaries for functions in loadable modules, since a shared library could
get loaded at different addresses in different processes. But actually the
practice doesn't work for functions in the core backend either, if we're
using EXEC_BACKEND. This is the cause of recent failures on buildfarm
member culicidae. Switch to passing a string function name in all cases.
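
Roughly, the name-based scheme looks like the sketch below: only the name
crosses the process boundary, and each process resolves it to an address
locally. The table contents and error text here are illustrative
assumptions, not the committed list:

    /* Sketch, in the style of src/backend/access/parallel.c. */
    typedef void (*parallel_worker_main_type) (dsm_segment *seg, shm_toc *toc);

    static const struct
    {
        const char *fn_name;
        parallel_worker_main_type fn_addr;
    }           InternalParallelWorkers[] =
    {
        {"ParallelQueryMain", ParallelQueryMain},
    };

    static parallel_worker_main_type
    LookupParallelWorkerFunction(const char *libraryname, const char *funcname)
    {
        /* Core entrypoints are resolved via the static table above. */
        if (libraryname == NULL)
        {
            for (int i = 0; i < lengthof(InternalParallelWorkers); i++)
            {
                if (strcmp(InternalParallelWorkers[i].fn_name, funcname) == 0)
                    return InternalParallelWorkers[i].fn_addr;
            }
            elog(ERROR, "internal function \"%s\" not found", funcname);
        }

        /* Extension entrypoints go through the dynamic loader, which maps
         * the library into the current process before looking up the name. */
        return (parallel_worker_main_type)
            load_external_function(libraryname, funcname, true, NULL);
    }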

Something like this needs to be back-patched into 9.6, but let's see
if the buildfarm likes it first.

Petr Jelinek, with a bunch of basically-cosmetic adjustments by me

Discussion: https://postgr.es/m/548f9c1d-eafa-e3fa-9da8-f0cc2f654e60@2ndquadrant.com

Commit b0b0d84b3d663a148022e900ebfc164284a95f55 purported to make it
possible to relaunch workers using the same parallel context, but it had
an unpleasant race condition: we might reinitialize after the workers
have sent their last control message but before they have detached the
DSM, leading to crashes. Repair by introducing a new ParallelContext
operation, ReinitializeParallelDSM.

Adjust execParallel.c to use this new support, so that we can rescan a
Gather node by relaunching workers but without needing to recreate the
DSM.
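
A minimal sketch of the resulting rescan sequence, assuming a pcxt that
was created and initialized in the usual way (names from
access/parallel.h; exact signatures have varied across releases):

    /* First execution: the DSM was created and populated once already. */
    LaunchParallelWorkers(pcxt);
    WaitForParallelWorkersToFinish(pcxt);

    /* Rescan: wait for the old workers to detach, reset the shared state
     * and queues, then relaunch against the same DSM segment. */
    ReinitializeParallelDSM(pcxt);
    LaunchParallelWorkers(pcxt);
    WaitForParallelWorkersToFinish(pcxt);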

Amit Kapila, with some adjustments by me. Extracted from the latest
parallel sequential scan patch.

This may allow some callers to avoid the overhead involved in tearing
down a parallel context and then setting up a new one, which means
releasing the DSM and then allocating and populating a new one. I
suspect we'll want to revise the Gather node to make use of this new
capability, but even if not it may be useful elsewhere and requires
very little additional code.

This does four basic things. First, it provides convenience routines
to coordinate the startup and shutdown of parallel workers. Second,
it synchronizes various pieces of state (e.g. GUCs, combo CID
mappings, transaction snapshot) from the parallel group leader to the
worker processes. Third, it prohibits various operations that would
result in unsafe changes to that state while parallelism is active.
Finally, it propagates events that would result in an ErrorResponse,
NoticeResponse, or NotifyResponse message being sent to the client
from the parallel workers back to the master, from which they can then
be sent on to the client.
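
In outline, a caller drives this machinery as sketched below; the names
come from access/parallel.h, though the exact CreateParallelContext
signature has changed over time, so treat this as indicative only:

    ParallelContext *pcxt;

    EnterParallelMode();    /* third point: unsafe state changes now error */

    pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
    /* ... estimate and allocate any caller-private shared state ... */
    InitializeParallelDSM(pcxt);    /* second point: serialize GUCs, combo
                                     * CIDs, transaction snapshot, etc. */
    LaunchParallelWorkers(pcxt);    /* first point: startup coordination */

    /* Fourth point: while waiting, errors, notices, and notifies raised in
     * workers are read off shared queues and re-thrown or forwarded. */
    WaitForParallelWorkersToFinish(pcxt);

    DestroyParallelContext(pcxt);
    ExitParallelMode();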

Robert Haas, Amit Kapila, Noah Misch, Rushabh Lathia, Jeevan Chalke.
Suggestions and review from Andres Freund, Heikki Linnakangas, Noah
Misch, Simon Riggs, Euler Taveira, and Jim Nasby.