mirror of https://github.com/postgres/postgres.git synced 2025-07-27 12:41:57 +03:00

Provide options for postmaster to kill child processes with SIGABRT.

The postmaster normally sends SIGQUIT to force-terminate its
child processes after a child crash or immediate-stop request.
If that doesn't result in child exit within a few seconds,
we follow it up with SIGKILL.  This patch provides GUC flags
that allow either of these signals to be replaced with SIGABRT.
On typically-configured Unix systems, that will result in a
core dump being produced for each such child.  This can be
useful for debugging problems, although it's not something you'd
want to have on in production due to the risk of disk space
bloat from lots of core files.
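
As a rough illustration of the escalation described above (not the
postmaster's actual C code), the signal choice can be sketched in Python.
The function names and the grace_seconds parameter are hypothetical; the
two boolean arguments are named after the new GUC flags, and the
five-second default follows the documentation text in this patch.

```python
import signal
import subprocess


def first_signal(send_abort_for_crash: bool) -> signal.Signals:
    """Signal sent first to force-terminate a child (hypothetical helper)."""
    return signal.SIGABRT if send_abort_for_crash else signal.SIGQUIT


def followup_signal(send_abort_for_kill: bool) -> signal.Signals:
    """Signal sent if the child has not exited in time (hypothetical helper)."""
    return signal.SIGABRT if send_abort_for_kill else signal.SIGKILL


def force_terminate(child: subprocess.Popen,
                    send_abort_for_crash: bool = False,
                    send_abort_for_kill: bool = False,
                    grace_seconds: float = 5.0) -> int:
    """Send the first signal, wait, then escalate; returns the exit status."""
    child.send_signal(first_signal(send_abort_for_crash))
    try:
        return child.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        # The child ignored or blocked the first signal: escalate.
        child.send_signal(followup_signal(send_abort_for_kill))
        return child.wait()
```

With both flags off this reproduces the default SIGQUIT-then-SIGKILL
sequence; turning either flag on swaps SIGABRT into the corresponding
step, which is what typically yields a core dump.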

The old postmaster -T switch, which sent SIGSTOP in place of
SIGQUIT, is changed to be the same as send_abort_for_crash.
As far as I can tell from the code comments, the intent of
that switch was just to block things for long enough to force
core dumps manually, which seems like an unnecessary extra step.
(Maybe at the time, there was no way to get most kernels to
produce core files with per-PID names, requiring manual core
file renaming after each one.  But now it's surely the hard way.)

I also took the opportunity to remove the old postmaster -n
(skip shmem reinit) switch, which hasn't actually done anything
in decades, though the documentation still claimed it did.

Discussion: https://postgr.es/m/2251016.1668797294@sss.pgh.pa.us
Tom Lane
2022-11-21 11:59:29 -05:00
parent e2933a6e11
commit 51b5834cd5
6 changed files with 134 additions and 129 deletions

@@ -11500,6 +11500,62 @@ LOG: CleanUpLock: deleting: lock(0xb7acd844) id(24688,24696,0,0,0,1)
</listitem>
</varlistentry>
<varlistentry id="guc-send-abort-for-crash" xreflabel="send_abort_for_crash">
<term><varname>send_abort_for_crash</varname> (<type>boolean</type>)
<indexterm>
<primary><varname>send_abort_for_crash</varname> configuration parameter</primary>
</indexterm>
</term>
<listitem>
<para>
By default, after a backend crash the postmaster will stop remaining
child processes by sending them <systemitem>SIGQUIT</systemitem>
signals, which permits them to exit more-or-less gracefully. When
this option is set to <literal>on</literal>,
<systemitem>SIGABRT</systemitem> is sent instead. That normally
results in production of a core dump file for each such child
process.
This can be handy for investigating the states of other processes
after a crash. It can also consume lots of disk space in the event
of repeated crashes, so do not enable this on systems you are not
monitoring carefully.
Beware that no support exists for cleaning up the core file(s)
automatically.
This parameter can only be set in
the <filename>postgresql.conf</filename> file or on the server
command line.
</para>
</listitem>
</varlistentry>
<varlistentry id="guc-send-abort-for-kill" xreflabel="send_abort_for_kill">
<term><varname>send_abort_for_kill</varname> (<type>boolean</type>)
<indexterm>
<primary><varname>send_abort_for_kill</varname> configuration parameter</primary>
</indexterm>
</term>
<listitem>
<para>
By default, after attempting to stop a child process with
<systemitem>SIGQUIT</systemitem>, the postmaster will wait five
seconds and then send <systemitem>SIGKILL</systemitem> to force
immediate termination. When this option is set
to <literal>on</literal>, <systemitem>SIGABRT</systemitem> is sent
instead of <systemitem>SIGKILL</systemitem>. That normally results
in production of a core dump file for each such child process.
This can be handy for investigating the states
of <quote>stuck</quote> child processes. It can also consume lots
of disk space in the event of repeated crashes, so do not enable
this on systems you are not monitoring carefully.
Beware that no support exists for cleaning up the core file(s)
automatically.
This parameter can only be set in
the <filename>postgresql.conf</filename> file or on the server
command line.
</para>
</listitem>
</varlistentry>
</variablelist>
</sect1>
<sect1 id="runtime-config-short">

@@ -409,24 +409,6 @@ PostgreSQL documentation
</listitem>
</varlistentry>
<varlistentry>
<term><option>-n</option></term>
<listitem>
<para>
This option is for debugging problems that cause a server
process to die abnormally. The ordinary strategy in this
situation is to notify all other server processes that they
must terminate and then reinitialize the shared memory and
semaphores. This is because an errant server process could
have corrupted some shared state before terminating. This
option specifies that <command>postgres</command> will
not reinitialize shared data structures. A knowledgeable
system programmer can then use a debugger to examine shared
memory and semaphore state.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-O</option></term>
<listitem>
@@ -466,14 +448,9 @@ PostgreSQL documentation
This option is for debugging problems that cause a server
process to die abnormally. The ordinary strategy in this
situation is to notify all other server processes that they
-must terminate and then reinitialize the shared memory and
-semaphores. This is because an errant server process could
-have corrupted some shared state before terminating. This
-option specifies that <command>postgres</command> will
-stop all other server processes by sending the signal
-<literal>SIGSTOP</literal>, but will not cause them to
-terminate. This permits system programmers to collect core
-dumps from all server processes by hand.
+must terminate, by sending them <systemitem>SIGQUIT</systemitem>
+signals. With this option, <systemitem>SIGABRT</systemitem>
+will be sent instead, resulting in production of core dump files.
</para>
</listitem>
</varlistentry>