1
0
mirror of https://github.com/postgres/postgres.git synced 2025-11-22 12:22:45 +03:00
Commit Graph

1934 Commits

Author SHA1 Message Date
Andres Freund
7e957cbb50 postmaster: Introduce variadic btmask_all_except()
Upcoming patches would otherwise need btmask_all_except3().

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/w3z6w3g4aovivs735nk4pzjhmegntecesm3kktpebchegm5o53@aonnq2kn27xi
2025-01-10 11:43:00 -05:00
Andres Freund
40d4031abd postmaster: Improve logging of signals sent by postmaster
Previously many, in some cases important, signals we never logged. In other
cases the signal name was only included numerically.

As part of this, change the debug log level the signal is logged at to DEBUG3,
previously some where DEBUG2, some DEBUG4.

Also move from direct use of kill() to signal the av launcher to
signal_child(). There doesn't seem to be a reason for directly using kill().

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/kgng5nrvnlv335evmsuvpnh354rw7qyazl73kdysev2cr2v5zu@m3cfzxicm5kp
2025-01-10 11:43:00 -05:00
Andres Freund
7148cbbdc6 postmaster: Update pmState via a wrapper function
This makes logging of state changes easier - state transitions are now logged
at DEBUG1. Without that logging it was surprisingly hard to understand the
current state of the system while debugging.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/kgng5nrvnlv335evmsuvpnh354rw7qyazl73kdysev2cr2v5zu@m3cfzxicm5kp
2025-01-10 11:42:56 -05:00
Nathan Bossart
c758119e5b Allow changing autovacuum_max_workers without restarting.
This commit introduces a new parameter named
autovacuum_worker_slots that controls how many autovacuum worker
slots to reserve during server startup.  Modifying this new
parameter's value does require a server restart, but it should
typically be set to the upper bound of what you might realistically
need to set autovacuum_max_workers.  With that new parameter in
place, autovacuum_max_workers can now be changed with a SIGHUP
(e.g., pg_ctl reload).

If autovacuum_max_workers is set higher than
autovacuum_worker_slots, a WARNING is emitted, and the server will
only start up to autovacuum_worker_slots workers at a given time.
If autovacuum_max_workers is set to a value less than the number of
currently-running autovacuum workers, the existing workers will
continue running, but no new workers will be started until the
number of running autovacuum workers drops below
autovacuum_max_workers.

Reviewed-by: Sami Imseih, Justin Pryzby, Robert Haas, Andres Freund, Yogesh Sharma
Discussion: https://postgr.es/m/20240410212344.GA1824549%40nathanxps13
2025-01-06 15:01:22 -06:00
David Rowley
11012c5037 Fix an assortment of spelling mistakes and typos
Author: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/5812a0b9-b0cf-4151-9a14-d9f00e4f2858@gmail.com
2025-01-02 12:42:01 +13:00
Bruce Momjian
50e6eb731d Update copyright for 2025
Backpatch-through: 13
2025-01-01 11:21:55 -05:00
Tom Lane
34486b6092 Exclude parallel workers from connection privilege/limit checks.
Cause parallel workers to not check datallowconn, rolcanlogin, and
ACL_CONNECT privileges.  The leader already checked these things
(except for rolcanlogin which might have been checked for a different
role).  Re-checking can accomplish little except to induce unexpected
failures in applications that might not even be aware that their query
has been parallelized.  We already had the principle that parallel
workers rely on their leader to pass a valid set of authorization
information, so this change just extends that a bit further.

Also, modify the ReservedConnections, datconnlimit and rolconnlimit
logic so that these limits are only enforced against regular backends,
and only regular backends are counted while checking if the limits
were already reached.  Previously, background processes that had an
assigned database or role were subject to these limits (with rather
random exclusions for autovac workers and walsenders), and the set of
existing processes that counted against each limit was quite haphazard
as well.  The point of these limits, AFAICS, is to ensure the
availability of PGPROC slots for regular backends.  Since all other
types of processes have their own separate pools of PGPROC slots, it
makes no sense either to enforce these limits against them or to count
them while enforcing the limit.

While edge-case failures of these sorts have been possible for a
long time, the problem got a good deal worse with commit 5a2fed911
(CVE-2024-10978), which caused parallel workers to make some of these
checks using the leader's current role where before we had used its
AuthenticatedUserId, thus allowing parallel queries to fail after
SET ROLE.  The previous behavior was fairly accidental and I have
no desire to return to it.

This patch includes reverting 73c9f91a1, which was an emergency hack
to suppress these same checks in some cases.  It wasn't complete,
as shown by a recent bug report from Laurenz Albe.  We can also revert
fd4d93d26 and 492217301, which hacked around the same problems in one
regression test.

In passing, remove the special case for autovac workers in
CheckMyDatabase; it seems cleaner to have AutoVacWorkerMain pass
the INIT_PG_OVERRIDE_ALLOW_CONNS flag, now that that does what's
needed.

Like 5a2fed911, back-patch to supported branches (which sadly no
longer includes v12).

Discussion: https://postgr.es/m/1808397.1735156190@sss.pgh.pa.us
2024-12-28 16:08:50 -05:00
Heikki Linnakangas
952365cded Remove unnecessary GetTransactionSnapshot() calls
In get_database_list() and get_subscription_list(), the
GetTransactionSnapshot() call is not required because the catalog
table scans use the catalog snapshot, which is held until the end of
the scan. See table_beginscan_catalog(), which calls
RegisterSnapshot(GetCatalogSnapshot(relid)).

In InitPostgres, it's a little less obvious that it's not required,
but still true I believe. All the catalog lookups in InitPostgres()
also use the catalog snapshot, and the looked up values are copied
while still holding the snapshot.

Furthermore, as the removed FIXME comments said, calling
GetTransactionSnapshot() didn't really prevent MyProc->xmin from being
reset anyway.

Discussion: https://www.postgresql.org/message-id/7c56f180-b9e1-481e-8c1d-efa63de3ecbb@iki.fi
2024-12-23 12:42:39 +02:00
Tom Lane
c91963da13 Set the stack_base_ptr in main(), not in random other places.
Previously we did this in PostmasterMain() and InitPostmasterChild(),
which meant that stack depth checking was disabled in non-postmaster
server processes, for instance in single-user mode.  That seems like
a fairly bad idea, since there's no a-priori restriction on the
complexity of queries we will run in single-user mode.  Moreover, this
led to not having quite the same stack depth limit in all processes,
which likely has no real-world effect but it offends my inner neatnik.
Setting the depth in main() guarantees that check_stack_depth() is
armed and has a consistent interpretation of stack depth in all forms
of server processes.

While at it, move the code associated with checking the stack depth
out of tcop/postgres.c (which was never a great home for it) into
a new file src/backend/utils/misc/stack_depth.c.

Discussion: https://postgr.es/m/2081982.1734393311@sss.pgh.pa.us
2024-12-17 12:08:42 -05:00
Thomas Munro
7bc9a8bdd2 Fix warnings about declaration of environ on MinGW.
POSIX says that the global variable environ shouldn't be declared in a
header, and that you have to declare it yourself.  MinGW declares it in
<stdlib.h> with some macrology that messes up our declarations.  Visual
Studio doesn't warn (there are clues that it may also declare it, but if
so, apparently compatibly).  Suppress our declarations, on MinGW only.

This clears the last warnings on CI's optional MinGW task, and hopefully
on build farm animal fairywren too.

Like 1319997d, no back-patch for now as it's not known to be breaking
anything, and my humble goal is just to keep the MinGW build clean going
forward.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (earlier version)
Discussion: https://postgr.es/m/CA%2BhUKGJLMh%2B6W5E4M_jSFb43gnrA_-Q6-%2BBf3HkBXyGfRFcBsQ%40mail.gmail.com
2024-12-15 00:41:27 +13:00
Noah Misch
8b9cbf4922 Fix elog(FATAL) before PostmasterMain() or just after fork().
Since commit 97550c0711, these failed with
"PANIC:  proc_exit() called in child process" due to uninitialized or
stale MyProcPid.  That was reachable if close() failed in
ClosePostmasterPorts() or setlocale(category, "C") failed, both
unlikely.  Back-patch to v13 (all supported versions).

Discussion: https://postgr.es/m/20241208034614.45.nmisch@google.com
2024-12-10 13:51:59 -08:00
Nathan Bossart
76fd342496 Provide a better error message for misplaced dispatch options.
Before this patch, misplacing a special must-be-first option for
dispatching to a subprogram (e.g., postgres -D . --single) would
fail with an error like

	FATAL:  --single requires a value

This patch adjusts this error to more accurately complain that the
special option wasn't listed first.  The aforementioned error
message now looks like

	FATAL:  --single must be first argument

The dispatch option parsing code has been refactored for use
wherever ParseLongOption() is called.  Beyond the obvious advantage
of avoiding code duplication, this should prevent similar problems
when new dispatch options are added.  Note that we assume that none
of the dispatch option names match another valid command-line
argument, such as the name of a configuration parameter.

Ideally, we'd remove this must-be-first requirement for these
options, but after some investigation, we decided that wasn't worth
the added complexity and behavior changes.

Author: Nathan Bossart, Greg Sabino Mullane
Reviewed-by: Greg Sabino Mullane, Peter Eisentraut, Álvaro Herrera, Tom Lane
Discussion: https://postgr.es/m/CAKAnmmJkZtZAiSryho%3DgYpbvC7H-HNjEDAh16F3SoC9LPu8rqQ%40mail.gmail.com
2024-12-04 15:04:15 -06:00
Andres Freund
6a5bcf7f7d postmaster: Reduce verbosity of environment dump debug message
Emitting each variable separately is unnecessarily verbose / hard to skim
over. Emit the whole thing in one ereport() to address that.

Also remove program name and function reference from the message. The former
doesn't seem particularly helpful and the latter is provided by the elog.c
infrastructure these days.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/leouteo5ozcrux3fepuhtbp6c56tbfd4naxeokidbx7m75cabz@hhw6g4urlowt
2024-11-27 11:17:23 -05:00
Heikki Linnakangas
5b00786857 Pass MyPMChildSlot as an explicit argument to child process
All the other global variables passed from postmaster to child have
the same value in all the processes, while MyPMChildSlot is more like
a parameter to each child process.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/a102f15f-eac4-4ff2-af02-f9ff209ec66f@iki.fi
2024-11-14 16:12:32 +02:00
Heikki Linnakangas
a78af04270 Assign a child slot to every postmaster child process
Previously, only backends, autovacuum workers, and background workers
had an entry in the PMChildFlags array. With this commit, all
postmaster child processes, including all the aux processes, have an
entry. Dead-end backends still don't get an entry, though, and other
processes that don't touch shared memory will never mark their
PMChildFlags entry as active.

We now maintain separate freelists for different kinds of child
processes. That ensures that there are always slots available for
autovacuum and background workers. Previously, pre-authentication
backends could prevent autovacuum or background workers from starting
up, by using up all the slots.

The code to manage the slots in the postmaster process is in a new
pmchild.c source file. Because postmaster.c is just so large.
Assigning pmsignal slot numbers is now pmchild.c's responsibility.
This replaces the PMChildInUse array in pmsignal.c.

Some of the comments in postmaster.c still talked about the "stats
process", but that was removed in commit 5891c7a8ed. Fix those while
we're at it.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/a102f15f-eac4-4ff2-af02-f9ff209ec66f@iki.fi
2024-11-14 16:12:28 +02:00
Heikki Linnakangas
bb861414fe Kill dead-end children when there's nothing else left
Previously, the postmaster would never try to kill dead-end child
processes, even if there were no other processes left. A dead-end
backend will eventually exit, when authentication_timeout expires, but
if a dead-end backend is the only thing that's preventing the server
from shutting down, it seems better to kill it immediately. It's
particularly important, if there was a bug in the early startup code
that prevented a dead-end child from timing out and exiting normally.

Includes a test for that case where a dead-end backend previously
prevented the server from shutting down.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/a102f15f-eac4-4ff2-af02-f9ff209ec66f@iki.fi
2024-11-14 16:12:04 +02:00
Heikki Linnakangas
18d67a8d7d Replace postmaster.c's own backend type codes with BackendType
Introduce a separate BackendType for dead-end children, so that we
don't need a separate dead_end flag.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/a102f15f-eac4-4ff2-af02-f9ff209ec66f@iki.fi
2024-11-14 16:06:16 +02:00
Heikki Linnakangas
368d8270c8 Rename two functions that wake up other processes
Instead of talking about setting latches, which is a pretty low-level
mechanism, emphasize that they wake up other processes.

This is in preparation for replacing Latches with a new abstraction.
That's still work in progress, but this seems a little tidier anyway,
so let's get this refactoring out of the way already.

Discussion: https://www.postgresql.org/message-id/391abe21-413e-4d91-a650-b663af49500c%40iki.fi
2024-11-01 13:47:24 +02:00
Heikki Linnakangas
a9c546a5a3 Use ProcNumbers instead of direct Latch pointers to address other procs
This is in preparation for replacing Latches with a new abstraction.
That's still work in progress, but this seems a little tidier anyway,
so let's get this refactoring out of the way already.

Discussion: https://www.postgresql.org/message-id/391abe21-413e-4d91-a650-b663af49500c%40iki.fi
2024-11-01 13:47:20 +02:00
Daniel Gustafsson
fb7e27abfb Remove duplicate words in comments
A few comments contained duplicate "the" in sentences, fix by removing
one occurrence.

Author: Vignesh C <vignesh21@gmail.com>
Discussion: https://postgr.es/m/CALDaNm2aEEiPwGJmPdzBxROVvs8n75yCjKz4K1f1B2TdWpzxTA@mail.gmail.com
2024-10-31 11:38:03 +01:00
Peter Eisentraut
e18512c000 Remove unused #include's from backend .c files
as determined by IWYU

These are mostly issues that are new since commit dbbca2cf29.

Discussion: https://www.postgresql.org/message-id/flat/0df1d5b1-8ca8-4f84-93be-121081bde049%40eisentraut.org
2024-10-27 08:26:50 +01:00
Heikki Linnakangas
f9ecb57a50 Clean up WaitLatch calls that passed latch without WL_LATCH_SET
The 'latch' argument is ignored if WL_LATCH_SET is not given. Clarify
these calls by not pointlessly passing MyLatch.

Discussion: https://www.postgresql.org/message-id/391abe21-413e-4d91-a650-b663af49500c@iki.fi
2024-10-05 15:31:06 +03:00
Peter Eisentraut
10b721821d Use macro to define the number of enum values
Refactoring in the interest of code consistency, a follow-up to 2e068db56e.

The argument against inserting a special enum value at the end of the enum
definition is that a switch statement might generate a compiler warning unless
it has a default clause.

Aleksander Alekseev, reviewed by Michael Paquier, Dean Rasheed, Peter Eisentraut

Discussion: https://postgr.es/m/CAJ7c6TMsiaV5urU_Pq6zJ2tXPDwk69-NKVh4AMN5XrRiM7N%2BGA%40mail.gmail.com
2024-10-01 09:30:24 -04:00
Fujii Masao
559efce1d6 Add num_done counter to the pg_stat_checkpointer view.
Checkpoints can be skipped when the server is idle. The existing num_timed and
num_requested counters in pg_stat_checkpointer track both completed and
skipped checkpoints, but there was no way to count only the completed ones.

This commit introduces the num_done counter, which tracks only completed
checkpoints, making it easier to see how many were actually performed.

Bump catalog version.

Author: Anton A. Melnikov
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/9ea77f40-818d-4841-9dee-158ac8f6e690@oss.nttdata.com
2024-09-30 11:56:05 +09:00
Tomas Vondra
c4d5cb71d2 Increase the number of fast-path lock slots
Replace the fixed-size array of fast-path locks with arrays, sized on
startup based on max_locks_per_transaction. This allows using fast-path
locking for workloads that need more locks.

The fast-path locking introduced in 9.2 allowed each backend to acquire
a small number (16) of weak relation locks cheaply. If a backend needs
to hold more locks, it has to insert them into the shared lock table.
This is considerably more expensive, and may be subject to contention
(especially on many-core systems).

The limit of 16 fast-path locks was always rather low, because we have
to lock all relations - not just tables, but also indexes, views, etc.
For planning we need to lock all relations that might be used in the
plan, not just those that actually get used in the final plan. So even
with rather simple queries and schemas, we often need significantly more
than 16 locks.

As partitioning gets used more widely, and the number of partitions
increases, this limit is trivial to hit. Complex queries may easily use
hundreds or even thousands of locks. For workloads doing a lot of I/O
this is not noticeable, but for workloads accessing only data in RAM,
the access to the shared lock table may be a serious issue.

This commit removes the hard-coded limit of the number of fast-path
locks. Instead, the size of the fast-path arrays is calculated at
startup, and can be set much higher than the original 16-lock limit.
The overall fast-path locking protocol remains unchanged.

The variable-sized fast-path arrays can no longer be part of PGPROC, but
are allocated as a separate chunk of shared memory and then references
from the PGPROC entries.

The fast-path slots are organized as a 16-way set associative cache. You
can imagine it as a hash table of 16-slot "groups". Each relation is
mapped to exactly one group using hash(relid), and the group is then
processed using linear search, just like the original fast-path cache.
With only 16 entries this is cheap, with good locality.

Treating this as a simple hash table with open addressing would not be
efficient, especially once the hash table gets almost full. The usual
remedy is to grow the table, but we can't do that here easily. The
access would also be more random, with worse locality.

The fast-path arrays are sized using the max_locks_per_transaction GUC.
We try to have enough capacity for the number of locks specified in the
GUC, using the traditional 2^n formula, with an upper limit of 1024 lock
groups (i.e. 16k locks). The default value of max_locks_per_transaction
is 64, which means those instances will have 64 fast-path slots.

The main purpose of the max_locks_per_transaction GUC is to size the
shared lock table. It is often set to the "average" number of locks
needed by backends, with some backends using significantly more locks.
This should not be a major issue, however. Some backens may have to
insert locks into the shared lock table, but there can't be too many of
them, limiting the contention.

The only solution is to increase the GUC, even if the shared lock table
already has sufficient capacity. That is not free, especially in terms
of memory usage (the shared lock table entries are fairly large). It
should only happen on machines with plenty of memory, though.

In the future we may consider a separate GUC for the number of fast-path
slots, but let's try without one first.

Reviewed-by: Robert Haas, Jakub Wartak
Discussion: https://postgr.es/m/510b887e-c0ce-4a0c-a17a-2c6abb8d9a5c@enterprisedb.com
2024-09-21 20:09:35 +02:00
Michael Paquier
b4db64270e Apply more quoting to GUC names in messages
This is a continuation of 17974ec259.  More quotes are applied to
GUC names in error messages and hints, taking care of what seems to be
all the remaining holes currently in the tree for the GUCs.

Author: Peter Smith
Discussion: https://postgr.es/m/CAHut+Pv-kSN8SkxSdoHano_wPubqcg5789ejhCDZAcLFceBR-w@mail.gmail.com
2024-09-04 13:50:44 +09:00
Michael Paquier
4236825197 Fix typos and grammar in code comments and docs
Author: Alexander Lakhin
Discussion: https://postgr.es/m/f7e514cf-2446-21f1-a5d2-8c089a6e2168@gmail.com
2024-09-03 14:49:04 +09:00
Peter Eisentraut
edee0c621d Message style improvements 2024-08-29 14:43:34 +02:00
Heikki Linnakangas
56d23855c8 Fix garbled process name on backend crash
The log message on backend crash used wrong variable, which could be
uninitialized. Introduced in commit 28a520c0b7.

Reported-by: Alexander Lakhin
Discussion: https://www.postgresql.org/message-id/451b0797-83b8-cdbc-727f-8d7a7b0e3bca@gmail.com
2024-08-19 09:48:25 +03:00
Peter Eisentraut
93660d1c27 Use errmsg_internal for debug messages
Some newer code was applying this inconsistently.
2024-08-13 10:01:49 +02:00
Heikki Linnakangas
3354f85284 Consolidate postmaster code to launch background processes
Much of the code in process_pm_child_exit() to launch replacement
processes when one exits or when progressing to next postmaster state
was unnecessary, because the ServerLoop will launch any missing
background processes anyway. Remove the redundant code and let
ServerLoop handle it.

In ServerLoop, move the code to launch all the processes to a new
subroutine, to group it all together.

Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://www.postgresql.org/message-id/8f2118b9-79e3-4af7-b2c9-bd5818193ca4@iki.fi
2024-08-12 10:04:26 +03:00
John Naylor
bbf668d66f Lower minimum maintenance_work_mem to 64kB
Since the introduction of TID store, vacuum uses far less memory in
the common case than in versions 16 and earlier. Invoking multiple
rounds of index vacuuming in turn requires a much larger table. It'd
be a good idea anyway to cover this case in regression testing, and a
lower limit is less painful for slow buildfarm animals. The reason to
do it now is to re-enable coverage of the bugfix in commit 83c39a1f7f.

For consistency, give autovacuum_work_mem the same treatment.

Suggested by Andres Freund
Tested by Melanie Plageman
Backpatch to v17, where TID store was introduced

Discussion: https://postgr.es/m/20240516205458.ohvlzis5b5tvejru@awork3.anarazel.de
Discussion: https://postgr.es/m/20240722164745.fvaoh6g6zprisqgp%40awork3.anarazel.de
2024-08-10 14:52:56 +07:00
Heikki Linnakangas
a79ed10e6c Fix comment on processes being kept over a restart
All child processes except the syslogger are killed on a restart. The
archiver might be already running though, if it was started during
recovery.

The split in the comments between "other special children" and the
first group of "background tasks" seemed really arbitrary, so I just
merged them all into one group.

Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://www.postgresql.org/message-id/8f2118b9-79e3-4af7-b2c9-bd5818193ca4@iki.fi
2024-08-10 00:06:19 +03:00
Heikki Linnakangas
28a520c0b7 Refactor code to handle death of a backend or bgworker in postmaster
Currently, when a child process exits, the postmaster first scans
through BackgroundWorkerList, to see if it the child process was a
background worker. If not found, then it scans through BackendList to
see if it was a regular backend. That leads to some duplication
between the bgworker and regular backend cleanup code, as both have an
entry in the BackendList that needs to be cleaned up in the same way.
Refactor that so that we scan just the BackendList to find the child
process, and if it was a background worker, do the additional
bgworker-specific cleanup in addition to the normal Backend cleanup.

Change HandleChildCrash so that it doesn't try to handle the cleanup
of the process that already exited, only the signaling of all the
other processes. When called for any of the aux processes, the caller
had already cleared the *PID global variable, so the code in
HandleChildCrash() to do that was unused.

On Windows, if a child process exits with ERROR_WAIT_NO_CHILDREN, it's
now logged with that exit code, instead of 0. Also, if a bgworker
exits with ERROR_WAIT_NO_CHILDREN, it's now treated as crashed and is
restarted. Previously it was treated as a normal exit.

If a child process is not found in the BackendList, the log message
now calls it "untracked child process" rather than "server process".
Arguably that should be a PANIC, because we do track all the child
processes in the list, so failing to find a child process is highly
unexpected. But if we want to change that, let's discuss and do that
as a separate commit.

Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://www.postgresql.org/message-id/835232c0-a5f7-4f20-b95b-5b56ba57d741@iki.fi
2024-08-10 00:04:43 +03:00
Heikki Linnakangas
b43100fa71 Make BackgroundWorkerList doubly-linked
This allows ForgetBackgroundWorker() and ReportBackgroundWorkerExit()
to take a RegisteredBgWorker pointer as argument, rather than a list
iterator. That feels a little more natural. But more importantly, this
paves the way for more refactoring in the next commit.

Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://www.postgresql.org/message-id/835232c0-a5f7-4f20-b95b-5b56ba57d741@iki.fi
2024-08-09 22:44:20 +03:00
Heikki Linnakangas
63bef4df97 Minor refactoring of assign_backendlist_entry()
Make assign_backendlist_entry() responsible just for allocating the
Backend struct. Linking it to the RegisteredBgWorker is the caller's
responsibility now. Seems more clear that way.

Discussion: https://www.postgresql.org/message-id/835232c0-a5f7-4f20-b95b-5b56ba57d741@iki.fi
2024-08-01 23:23:55 +03:00
Thomas Munro
71d6c4b966 Remove useless member of BackendParameters.
Oversight in e2562667, which stopped using SpinlockSemaArray but forgot
to remove it from the array.

Reported-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/310f4005-91d7-42b2-ac70-92624260dd28%40iki.fi
2024-07-30 23:15:09 +12:00
Thomas Munro
e25626677f Remove --disable-spinlocks.
A later change will require atomic support, so it wouldn't make sense
for a hypothetical new system not to be able to implement spinlocks.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (concept, not the patch)
Reviewed-by: Andres Freund <andres@anarazel.de> (concept, not the patch)
Discussion: https://postgr.es/m/3351991.1697728588%40sss.pgh.pa.us
2024-07-30 22:58:37 +12:00
Heikki Linnakangas
6a1d8cef46 Detach syslogger from shared memory
Commit aafc05de1b removed the calls to detach from shared memory from
syslogger startup. That was not intentional, so put them back.

Author: Rui Zhao
Reviewed-by: Aleksander Alekseev
Backpatch-through: 17
Discussion: https://www.postgresql.org/message-id/11505016-8cf3-4691-b996-7faed99b7877.xiyuan.zr@alibaba-inc.com
2024-07-29 22:21:34 +03:00
Heikki Linnakangas
9d9b9d46f3 Move cancel key generation to after forking the backend
Move responsibility of generating the cancel key to the backend
process. The cancel key is now generated after forking, and the
backend advertises it in the ProcSignal array. When a cancel request
arrives, the backend handling it scans the ProcSignal array to find
the target pid and cancel key. This is similar to how this previously
worked in the EXEC_BACKEND case with the ShmemBackendArray, just
reusing the ProcSignal array.

One notable change is that we no longer generate cancellation keys for
non-backend processes. We generated them before just to prevent a
malicious user from canceling them; the keys for non-backend processes
were never actually given to anyone. There is now an explicit flag
indicating whether a process has a valid key or not.

I wrote this originally in preparation for supporting longer cancel
keys, but it's a nice cleanup on its own.

Reviewed-by: Jelte Fennema-Nio
Discussion: https://www.postgresql.org/message-id/508d0505-8b7a-4864-a681-e7e5edfe32aa@iki.fi
2024-07-29 15:37:48 +03:00
Robert Haas
8a53539bd6 Wait for WAL summarization to catch up before creating .partial file.
When a standby is promoted, CleanupAfterArchiveRecovery() may decide
to rename the final WAL file from the old timeline by adding ".partial"
to the name. If WAL summarization is enabled and this file is renamed
before its partial contents are summarized, WAL summarization breaks:
the summarizer gets stuck at that point in the WAL stream and just
errors out.

To fix that, first make the startup process wait for WAL summarization
to catch up before renaming the file. Generally, this should be quick,
and if it's not, the user can shut off summarize_wal and try again.
To make this fix work, also teach the WAL summarizer that after a
promotion has occurred, no more WAL can appear on the previous
timeline: previously, the WAL summarizer wouldn't switch to the new
timeline until we actually started writing WAL there, but that meant
that when the startup process was waiting for the WAL summarizer, it
was waiting for an action that the summarizer wasn't yet prepared to
take.

In the process of fixing these bugs, I realized that the logic to wait
for WAL summarization to catch up was spread out in a way that made
it difficult to reuse properly, so this code refactors things to make
it easier.

Finally, add a test case that would have caught this bug and the
previously-fixed bug that WAL summarization sometimes needs to back up
when the timeline changes.

Discussion: https://postgr.es/m/CA+TgmoZGEsZodXC4f=XZNkAeyuDmWTSkpkjCEOcF19Am0mt_OA@mail.gmail.com
2024-07-26 15:00:48 -04:00
Robert Haas
c883453cb2 Fix indentation. 2024-07-26 12:00:04 -04:00
Robert Haas
cf8a489836 Allow WAL summarization to back up when timeline changes.
The old code believed that it was not possible to switch timelines
without first replaying all of the WAL from the old timeline, but
that turns out to be false, as demonstrated by an example from Fujii
Masao. As a result, it assumed that summarization would always
continue from the LSN where summarization previously ended. But in
fact, when a timeline switch occurs without replaying all the WAL
from the previous timeline, we can need to back up to an earlier
LSN. Adjust accordingly.

Discussion: https://postgr.es/m/CA+TgmoZGEsZodXC4f=XZNkAeyuDmWTSkpkjCEOcF19Am0mt_OA@mail.gmail.com
2024-07-26 09:50:31 -04:00
Heikki Linnakangas
b9e5249c29 Fix using injection points at backend startup in EXEC_BACKEND mode
Commit 86db52a506 changed the locking of injection points to use only
atomic ops and spinlocks, to make it possible to define injection
points in processes that don't have a PGPROC entry (yet). However, it
didn't work in EXEC_BACKEND mode, because the pointer to shared memory
area was not initialized until the process "attaches" to all the
shared memory structs. To fix, pass the pointer to the child process
along with other global variables that need to be set up early.

Backpatch-through: 17
2024-07-26 15:11:50 +03:00
Peter Eisentraut
d3cc5ffe81 Move extern declarations for EXEC_BACKEND to header files
This fixes warnings from -Wmissing-variable-declarations (not yet part
of the standard warning options) under EXEC_BACKEND.  The
NON_EXEC_STATIC variables need a suitable declaration in a header file
under EXEC_BACKEND.

Also fix the inconsistent application of the volatile qualifier for
PMSignalState, which was revealed by this change.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/e0a62134-83da-4ba4-8cdb-ceb0111c95ce@eisentraut.org
2024-07-23 15:07:10 +02:00
Robert Haas
c0348fd0e3 Add missing call to ConditionVariableCancelSleep().
After calling ConditionVariableSleep() or ConditionVariableTimedSleep()
one or more times, code is supposed to call ConditionVariableCancelSleep()
to remove itself from the waitlist. This code neglected to do so.
As far as I know, that had no observable consequences, but let's make
the code correct.

Discussion: http://postgr.es/m/CA+TgmoYW8eR+KN6zhVH0sin7QH6AvENqw_bkN-bB4yLYKAnsew@mail.gmail.com
2024-07-22 10:02:31 -04:00
Robert Haas
402b586d0a Do not summarize WAL if generated with wal_level=minimal.
To do this, we must include the wal_level in the first WAL record
covered by each summary file; so add wal_level to struct Checkpoint
and the payload of XLOG_CHECKPOINT_REDO and XLOG_END_OF_RECOVERY.

This, in turn, requires bumping XLOG_PAGE_MAGIC and, since the
Checkpoint is also stored in the control file, also
PG_CONTROL_VERSION. It's not great to do that so late in the release
cycle, but the alternative seems to ship v17 without robust
protections against this scenario, which could result in corrupted
incremental backups.

A side effect of this patch is that, when a server with
wal_level=replica is started with summarize_wal=on for the first time,
summarization will no longer begin with the oldest WAL that still
exists in pg_wal, but rather from the first checkpoint after that.
This change should be harmless, because a WAL summary for a partial
checkpoint cycle can never make an incremental backup possible when
it would otherwise not have been.

Report by Fujii Masao. Patch by me. Review and/or testing by Jakub
Wartak and Fujii Masao.

Discussion: http://postgr.es/m/6e30082e-041b-4e31-9633-95a66de76f5d@oss.nttdata.com
2024-07-18 12:09:48 -04:00
Michael Paquier
d2b74882ca Add tap test for pg_signal_autovacuum role
This commit provides testig coverage for ccd38024bc, checking that a
role granted pg_signal_autovacuum_worker is able to stop a vacuum
worker.

An injection point with a wait is placed at the beginning of autovacuum
worker startup to make sure that a worker is still alive when sending
and processing the signal sent.

Author: Anthony Leung, Michael Paquier, Kirill Reshke
Reviewed-by: Andrey Borodin, Nathan Bossart
Discussion: https://postgr.es/m/CALdSSPiQPuuQpOkF7x0g2QkA5eE-3xXt7hiJFvShV1bHKDvf8w@mail.gmail.com
2024-07-16 10:05:46 +09:00
Heikki Linnakangas
f3412a61f3 Avoid 0-length memcpy to NULL with EXEC_BACKEND
memcpy(NULL, src, 0) is forbidden by POSIX, even though every
production version of libc allows it. Let's be tidy.

Per report from Thomas Munro, running UBSan with EXEC_BACKEND.
Backpatch to v17, where this code was added.

Discussion: https://www.postgresql.org/message-id/CA%2BhUKG%2Be-dV7YWBzfBZXsgovgRuX5VmvmOT%2Bv0aXiZJ-EKbXcw@mail.gmail.com
2024-07-03 15:58:14 +03:00
Heikki Linnakangas
eb21f5bc67 Remove redundant SetProcessingMode(InitProcessing) calls
After several refactoring iterations, auxiliary processes are no
longer initialized from the bootstrapper. Using the InitProcessing
mode for initializing auxiliary processes is more appropriate. Since
the global variable Mode is initialized to InitProcessing, we can just
remove the redundant calls of SetProcessingMode(InitProcessing).

Author: Xing Guo <higuoxing@gmail.com>
Discussion: https://www.postgresql.org/message-id/CACpMh%2BDBHVT4xPGimzvex%3DwMdMLQEu9PYhT%2BkwwD2x2nu9dU_Q%40mail.gmail.com
2024-07-02 20:14:40 +03:00