Patch introduces a hash map reloid -> PgStat_TableStatus which improves
performance in case of large number of tables/partitions.
Author: Aleksander Alekseev
Reviewed-by: Andres Freund, Anastasia Lubennikova, Tels, me
https://commitfest.postgresql.org/13/1058/
Previously, auxiliary processes and background workers not connected
to a database (such as the logical replication launcher) weren't
shown. Include them, so that we can see the associated wait state
information. Add a new column to identify the processes type, so that
people can filter them out easily using SQL if they wish.
Before this patch was written, there was discussion about whether we
should expose this information in a separate view, so as to avoid
contaminating pg_stat_activity with things people might not want to
see. But putting everything in pg_stat_activity was a more popular
choice, so that's what the patch does.
Kuntal Ghosh, reviewed by Amit Langote and Michael Paquier. Some
revisions and bug fixes by me.
Discussion: http://postgr.es/m/CA+TgmoYES5nhkEGw9nZXU8_FhA8XEm8NTm3-SO+3ML1B81Hkww@mail.gmail.com
Add functionality for a new subscription to copy the initial data in the
tables and then sync with the ongoing apply process.
For the copying, add a new internal COPY option to have the COPY source
data provided by a callback function. The initial data copy works on
the subscriber by receiving COPY data from the publisher and then
providing it locally into a COPY that writes to the destination table.
A WAL receiver can now execute full SQL commands. This is used here to
obtain information about tables and publications.
Several new options were added to CREATE and ALTER SUBSCRIPTION to
control whether and when initial table syncing happens.
Change pg_dump option --no-create-subscription-slots to
--no-subscription-connect and use the new CREATE SUBSCRIPTION
... NOCONNECT option for that.
Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Tested-by: Erik Rijkers <er@xs4all.nl>
The non-concurrent code path for REFRESH MATERIALIZED VIEW failed to
report its updates to the stats collector. This is bad since it means
auto-analyze doesn't know there's any work to be done. Adjust it to
report the refresh as a table truncate followed by insertion of an
appropriate number of rows.
Since a matview could contain more than INT_MAX rows, change the
signature of pgstat_count_heap_insert() to accept an int64 rowcount.
(The accumulator it's adding into is already int64, but existing
callers could not insert more than a small number of rows at once,
so the argument had been declared just "int n".)
This is surely a bug fix, but changing pgstat_count_heap_insert()'s API
seems too risky for the back branches. Given the lack of previous
complaints, I'm not sure it's a big enough problem to justify a kluge
solution that would avoid that. So, no back-patch, at least for now.
Jim Mlodgenski, adjusted a bit by me
Discussion: https://postgr.es/m/CAB_5SRchSz7-WmdO5szdiknG8Oj_GGqJytrk1KRd11yhcMs1KQ@mail.gmail.com
Previous commits, notably 53be0b1add and
6f3bd98ebf, made it possible to see from
pg_stat_activity when a backend was stuck waiting for another backend,
but it's also fairly common for a backend to be stuck waiting for an
I/O. Add wait events for those operations, too.
Rushabh Lathia, with further hacking by me. Reviewed and tested by
Michael Paquier, Amit Kapila, Rajkumar Raghuwanshi, and Rahila Syed.
Discussion: http://postgr.es/m/CAGPqQf0LsYHXREPAZqYGVkDqHSyjf=KsD=k0GTVPAuzyThh-VQ@mail.gmail.com
There's no really good reason why the autovacuum launcher and logical
replication launcher should announce themselves at startup and shutdown
by default. Users don't care that those processes exist, and it's
inconsistent that those background processes announce themselves while
others don't. So, reduce those messages from LOG to DEBUG1 level.
I was sorely tempted to reduce the "starting logical replication worker
for subscription ..." message to DEBUG1 as well, but forebore for now.
Those processes might possibly be of direct interest to users, at least
until logical replication is a lot better shaken out than it is today.
Discussion: https://postgr.es/m/19479.1489121003@sss.pgh.pa.us
The index is scanned by a single process, but then all cooperating
processes can iterate jointly over the resulting set of heap blocks.
In the future, we might also want to support using a parallel bitmap
index scan to set up for a parallel bitmap heap scan, but that's a
job for another day.
Dilip Kumar, with some corrections and cosmetic changes by me. The
larger patch set of which this is a part has been reviewed and tested
by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, Thomas Munro, and me.
Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
This makes the connection attempt from CREATE SUBSCRIPTION and from
WalReceiver interruptable by the user in case the libpq connection is
hanging. The previous coding required immediate shutdown (SIGQUIT) of
PostgreSQL in that situation.
From: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Tested-by: Thom Brown <thom@linux.com>
The syslogger will write out the current stderr and csvlog names, if
it's running and there are any, to a new file in the data directory
called "current_logfiles". We take care to remove this file when it
might no longer be valid (but not at shutdown). The function
pg_current_logfile() can be used to read the entries in the file.
Gilles Darold, reviewed and modified by Karl O. Pinc, Michael
Paquier, and me. Further review by Álvaro Herrera and Christoph Berg.
Tom Lane observed buildfarm failures caused by the select_parallel
regression test trying to launch new parallel queries before the
worker slots used by the previous ones were freed. Try to fix this by
having the postmaster free the worker slots before it sends the
SIGUSR1 notifications to the registering process. This doesn't
completely eliminate the possibility that the user backend might
(correctly) observe the worker as dead before the slot is free, but I
believe it should make the window significantly narrower.
Patch by me, per complaint from Tom Lane. Reviewed by Amit Kapila.
Discussion: http://postgr.es/m/30673.1487310734@sss.pgh.pa.us
c.h #includes a number of core libc header files, such as <stdio.h>.
There's no point in re-including these after having read postgres.h,
postgres_fe.h, or c.h; so remove code that did so.
While at it, also fix some places that were ignoring our standard pattern
of "include postgres[_fe].h, then system header files, then other Postgres
header files". While there's not any great magic in doing it that way
rather than system headers last, it's silly to have just a few files
deviating from the general pattern. (But I didn't attempt to enforce this
globally, only in files I was touching anyway.)
I'd be the first to say that this is mostly compulsive neatnik-ism,
but over time it might save enough compile cycles to be useful.
This isn't exposed to the optimizer or the executor yet; we'll add
support for those things in a separate patch. But this puts the
basic mechanism in place: several processes can attach to a parallel
btree index scan, and each one will get a subset of the tuples that
would have been produced by a non-parallel scan. Each index page
becomes the responsibility of a single worker, which then returns
all of the TIDs on that page.
Rahila Syed, Amit Kapila, Robert Haas, reviewed and tested by
Anastasia Lubennikova, Tushar Ahuja, and Haribabu Kommi.
When (1) autovacuum = off and (2) there's at least one database with
an XID age greater than autovacuum_freeze_max_age and (3) all tables
in that database that need vacuuming are already being processed by a
worker and (4) the autovacuum launcher is started, a kind of infinite
loop occurs. The launcher starts a worker and immediately exits. The
worker, finding no worker to do, immediately starts the launcher,
supposedly so that the next database can be processed. But because
datfrozenxid for that database hasn't been advanced yet, the new
worker gets put right back into the same database as the old one,
where it once again starts the launcher and exits. High-speed ping
pong ensues.
There are several possible ways to break the cycle; this seems like
the safest one.
Amit Khandekar (code) and Robert Haas (comments), reviewed by
Álvaro Herrera.
Discussion: http://postgr.es/m/CAJ3gD9eWejf72HKquKSzax0r+epS=nAbQKNnykkMA0E8c+rMDg@mail.gmail.com
Leave OpenSSL's default passphrase collection callback in place during
the first call of secure_initialize() in server startup. Although that
doesn't work terribly well in daemon contexts, some people feel we should
not break it for anyone who was successfully using it before. We still
block passphrase demands during SIGHUP, meaning that you can't adjust SSL
configuration on-the-fly if you used a passphrase, but this is no worse
than what it was before commit de41869b6. And we block passphrase demands
during EXEC_BACKEND reloads; that behavior wasn't useful either, but at
least now it's documented.
Tweak some related log messages for more readability, and avoid issuing
essentially duplicate messages about reload failure caused by a passphrase.
Discussion: https://postgr.es/m/29982.1483412575@sss.pgh.pa.us
It is no longer necessary to restart the server to enable, disable,
or reconfigure SSL. Instead, we just create a new SSL_CTX struct
(by re-reading all relevant files) whenever we get SIGHUP. Testing
shows that this is fast enough that it shouldn't be a problem.
In conjunction with that, downgrade the logic that complains about
pg_hba.conf "hostssl" lines when SSL isn't active: now that's just
a warning condition not an error.
An issue that still needs to be addressed is what shall we do with
passphrase-protected server keys? As this stands, the server would
demand the passphrase again on every SIGHUP, which is certainly
impractical. But the case was only barely supported before, so that
does not seem a sufficient reason to hold up committing this patch.
Andreas Karlsson, reviewed by Michael Banck and Michael Paquier
Discussion: https://postgr.es/m/556A6E8A.9030400@proxel.se
Some background activity (like checkpoints, archive timeout, standby
snapshots) is not supposed to happen on an idle system. Unfortunately
so far it was not easy to determine when a system is idle, which
defeated some of the attempts to avoid redundant activity on an idle
system.
To make that easier, allow to make individual WAL insertions as not
being "important". By checking whether any important activity happened
since the last time an activity was performed, it now is easy to check
whether some action needs to be repeated.
Use the new facility for checkpoints, archive timeout and standby
snapshots.
The lack of a facility causes some issues in older releases, but in my
opinion the consequences (superflous checkpoints / archived segments)
aren't grave enough to warrant backpatching.
Author: Michael Paquier, editorialized by Andres Freund
Reviewed-By: Andres Freund, David Steele, Amit Kapila, Kyotaro HORIGUCHI
Bug: #13685
Discussion:
https://www.postgresql.org/message-id/20151016203031.3019.72930@wrigleys.postgresql.orghttps://www.postgresql.org/message-id/CAB7nPqQcPqxEM3S735Bd2RzApNqSNJVietAC=6kfkYv_45dKwA@mail.gmail.com
Backpatch: -
array_base and array_stride were added so that we could identify the
offset of an LWLock within a tranche, but this facility is only very
marginally used apart from the main tranche. So, give every lock in
the main tranche its own tranche ID and get rid of array_base,
array_stride, and all that's attached. For debugging facilities
(Trace_lwlocks and LWLOCK_STATS) print the pointer address of the
LWLock using %p instead of the offset. This is arguably more useful,
and certainly a lot cheaper. Drop the offset-within-tranche from
the information reported to dtrace and from one can't-happen message
inside lwlock.c.
The main user-visible impact of this change is that pg_stat_activity
will now report all waits for LWLocks as "LWLock" rather than
reporting some as "LWLockTranche" and others as "LWLockNamed".
The main motivation for this change is that the need to specify an
array_base and an array_stride is awkward for parallel query. There
is only a very limited supply of tranche IDs so we can't just keep
allocating new ones, and if we try to use the same tranche IDs every
time then we run into trouble when multiple parallel contexts are
use simultaneously. So if we didn't get rid of this mechanism we'd
have to make it even more complicated. By simplifying it in this
way, we instead reduce the size of the generated code for lwlock.c
by about 5%.
Discussion: http://postgr.es/m/CA+TgmoYsFn6NUW1x0AZtupJGUAs1UDY4dJtCN47_Q6D0sP80PA@mail.gmail.com
Previously, the "sem" field of PGPROC varied in size depending on which
kernel semaphore API we were using. That was okay as long as there was
only one likely choice per platform, but in the wake of commit ecb0d20a9,
that assumption seems rather shaky. It doesn't seem out of the question
anymore that an extension compiled against one API choice might be loaded
into a postmaster built with another choice. Moreover, this prevents any
possibility of selecting the semaphore API at postmaster startup, which
might be something we want to do in future.
Hence, change PGPROC.sem to be PGSemaphore (i.e. a pointer) for all Unix
semaphore APIs, and turn the pointed-to data into an opaque struct whose
contents are only known within the responsible modules.
For the SysV and unnamed-POSIX APIs, the pointed-to data has to be
allocated elsewhere in shared memory, which takes a little bit of
rejiggering of the InitShmemAllocation code sequence. (I invented a
ShmemAllocUnlocked() function to make that a little cleaner than it used
to be. That function is not meant for any uses other than the ones it
has now, but it beats having InitShmemAllocation() know explicitly about
allocation of space for semaphores and spinlocks.) This change means an
extra indirection to access the semaphore data, but since we only touch
that when blocking or awakening a process, there shouldn't be any
meaningful performance penalty. Moreover, at least for the unnamed-POSIX
case on Linux, the sem_t type is quite a bit wider than a pointer, so this
reduces sizeof(PGPROC) which seems like a good thing.
For the named-POSIX API, there's effectively no change: the PGPROC.sem
field was and still is a pointer to something returned by sem_open() in
the postmaster's memory space. Document and check the pre-existing
limitation that this case can't work in EXEC_BACKEND mode.
It did not seem worth unifying the Windows semaphore ABI with the Unix
cases, since there's no likelihood of needing ABI compatibility much less
runtime switching across those cases. However, we can simplify the Windows
code a bit if we define PGSemaphore as being directly a HANDLE, rather than
pointer to HANDLE, so let's do that while we're here. (This also ends up
being no change in what's physically stored in PGPROC.sem. We're just
moving the HANDLE fetch from callees to callers.)
It would take a bunch of additional code shuffling to get to the point of
actually choosing a semaphore API at postmaster start, but the effects
of that would now be localized in the port/XXX_sema.c files, so it seems
like fit material for a separate patch. The need for it is unproven as
yet, anyhow, whereas the ABI risk to extensions seems real enough.
Discussion: https://postgr.es/m/4029.1481413370@sss.pgh.pa.us
Also use the new facility for generating RADIUS authenticator requests,
and salt in chkpass extension.
Reword the error messages to be nicer. Fix bogus error code used in the
message in BackendStartup.
pg_backend_random() is used for MD5 salt generation, but it can fail, and
no checks were done on its status code.
Fix memory leak, if generating a random number for a cancel key failed.
Both issues were spotted by Coverity. Fix by Michael Paquier.
In commit fe0a0b59, the datatype used for MyCancelKey and other variables
that store cancel keys were changed from long to uint32, but I missed this
one. That broke query cancellation on platforms where long is wider than 32
bits.
Report by Andres Freund, fix by Michael Paquier.
This adds a new routine, pg_strong_random() for generating random bytes,
for use in both frontend and backend. At the moment, it's only used in
the backend, but the upcoming SCRAM authentication patches need strong
random numbers in libpq as well.
pg_strong_random() is based on, and replaces, the existing implementation
in pgcrypto. It can acquire strong random numbers from a number of sources,
depending on what's available:
- OpenSSL RAND_bytes(), if built with OpenSSL
- On Windows, the native cryptographic functions are used
- /dev/urandom
Unlike the current pgcrypto function, the source is chosen by configure.
That makes it easier to test different implementations, and ensures that
we don't accidentally fall back to a less secure implementation, if the
primary source fails. All of those methods are quite reliable, it would be
pretty surprising for them to fail, so we'd rather find out by failing
hard.
If no strong random source is available, we fall back to using erand48(),
seeded from current timestamp, like PostmasterRandom() was. That isn't
cryptographically secure, but allows us to still work on platforms that
don't have any of the above stronger sources. Because it's not very secure,
the built-in implementation is only used if explicitly requested with
--disable-strong-random.
This replaces the more complicated Fortuna algorithm we used to have in
pgcrypto, which is unfortunate, but all modern platforms have /dev/urandom,
so it doesn't seem worth the maintenance effort to keep that. pgcrypto
functions that require strong random numbers will be disabled with
--disable-strong-random.
Original patch by Magnus Hagander, tons of further work by Michael Paquier
and me.
Discussion: https://www.postgresql.org/message-id/CAB7nPqRy3krN8quR9XujMVVHYtXJ0_60nqgVc6oUk8ygyVkZsA@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CAB7nPqRWkNYRRPJA7-cF+LfroYV10pvjdz6GNvxk-Eee9FypKA@mail.gmail.com
deleteWhatDependsOn() had grown an uncomfortably large number of
assumptions about what it's used for. There are actually only two minor
differences between what it does and what a regular performDeletion() call
can do, so let's invent additional bits in performDeletion's existing flags
argument that specify those behaviors, and get rid of deleteWhatDependsOn()
as such. (We'd probably have done it this way from the start, except that
performDeletion didn't originally have a flags argument, IIRC.)
Also, add a SKIP_EXTENSIONS flag bit that prevents ever recursing to an
extension, and use that when dropping temporary objects at session end.
This provides a more general solution to the problem addressed in a hacky
way in commit 08dd23cec: if an extension script creates temp objects and
forgets to remove them again, the whole extension went away when its
contained temp objects were deleted. The previous solution only covered
temp relations, but this solves it for all object types.
These changes require minor additions in dependency.c to pass the flags
to subroutines that previously didn't get them, but it's still a net
savings of code, and it seems cleaner than before.
Having done this, revert the special-case code added in 08dd23cec that
prevented addition of pg_depend records for temp table extension
membership, because that caused its own oddities: dropping an extension
that had created such a table didn't automatically remove the table,
leading to a failure if the table had another dependency on the extension
(such as use of an extension data type), or to a duplicate-name failure if
you then tried to recreate the extension. But we keep the part that
prevents the pg_temp_nnn schema from becoming an extension member; we never
want that to happen. Add a regression test case covering these behaviors.
Although this fixes some arguable bugs, we've heard few field complaints,
and any such problems are easily worked around by explicitly dropping temp
objects at the end of extension scripts (which seems like good practice
anyway). So I won't risk a back-patch.
Discussion: https://postgr.es/m/e51f4311-f483-4dd0-1ccc-abec3c405110@BlueTreble.com
Increase the default value of the existing max_worker_processes GUC
from 8 to 16, and add a new max_parallel_workers GUC with a maximum
of 8. This way, even if the maximum amount of parallel query is
happening, there is still room for background workers that do other
things, as originally envisioned when max_worker_processes was added.
Julien Rouhaud, reviewed by Amit Kapila and by revised by me.
Replace use of poll()/select() by WaitLatchOrSocket(), which is more
portable and flexible.
Also change walreceiver to use its procLatch instead of a custom latch.
From: Petr Jelinek <petr@2ndquadrant.com>
Commit a734fd5d1 exposed some race conditions that existed previously
in the autovac code, but were basically harmless because autovac would
not try to delete orphaned relations immediately. Specifically, the test
for orphaned-ness was made on a pg_class tuple that might be dead by now,
allowing autovac to try to remove a table that the owning backend had just
finished deleting. This resulted in a hard crash due to inadequate caution
about accessing the table's catalog entries without any lock. We must take
a relation lock and then recheck whether the table is still present and
still looks deletable before we do anything.
Also, it seemed to me that deleting multiple tables per transaction, and
trying to continue after errors, represented unjustifiable complexity.
We do not expect this code path to be taken often in the field, nor even
during testing, which means that prioritizing performance over correctness
is a bad tradeoff. Rip all that out in favor of just starting a new
transaction after each successful temp table deletion. If we're unlucky
enough to get an error, which shouldn't happen anyway now that we're being
more cautious, let the autovacuum worker fail as it normally would.
In passing, improve the order of operations in the initial scan loop.
Now that we don't care about whether a temp table is a wraparound hazard,
there's no need to perform extract_autovac_opts, get_pgstat_tabentry_relid,
or relation_needs_vacanalyze for temp tables.
Also, if GetTempNamespaceBackendId returns InvalidBackendId (indicating
it doesn't recognize the schema as temp), treat that as meaning it's NOT
an orphaned temp table, not that it IS one, which is what happened before
because BackendIdGetProc necessarily failed. The case really shouldn't
come up for a table that has RELPERSISTENCE_TEMP, but the consequences
if it did seem undesirable. (This might represent a back-patchable bug
fix; not sure if it's worth the trouble.)
Discussion: https://postgr.es/m/21299.1480272347@sss.pgh.pa.us
A new thing also called a "barrier" is proposed, but whether we decide
to take that patch or not, this file seems to have outlived its
usefulness.
Thomas Munro
Condition variables provide a flexible way to sleep until a
cooperating process causes an arbitrary condition to become true. In
simple cases, this can be accomplished with a WaitLatch/ResetLatch
loop; the cooperating process can call SetLatch after performing work
that might cause the condition to be satisfied, and the waiting
process can recheck the condition each time. However, if the process
performing the work doesn't have an easy way to identify which
processes might be waiting, this doesn't work, because it can't
identify which latches to set. Condition variables solve that problem
by internally maintaining a list of waiters; a process that may have
caused some waiter's condition to be satisfied must "signal" or
"broadcast" on the condition variable.
Robert Haas and Thomas Munro
Previously, we only dropped an orphan temp table when it became old
enough to threaten wraparound; instead, doing it immediately. The
only value of waiting is that someone might be able to examine the
contents of the orphan temp table for forensic purposes, but it's
pretty difficult to actually do that and few users will wish to do so.
On the flip side, not performing the drop immediately generates log
spam and bloats pg_class.
In addition, per a report from Grigory Smolkin, if a temporary schema
contains a very large number of temporary tables, a backend attempting
to clear the temporary schema might fail due to lock table exhaustion.
It's helpful for autovacuum to clean up after such cases, and we don't
want it to wait for wraparound to threaten before doing so. To
prevent autovacuum from failing in the same manner as a backend trying
to drop an entire temp schema, remove orphan temp tables in batches of
50, committing after each batch, so that we don't accumulate an
unbounded number of locks. If a drop fails, retry other orphan tables
that need to be dropped up to 10 times before giving up. With this
system, if a backend does fail to clean a temporary schema due to
lock table exhaustion, autovacuum should hopefully put things right
the next time it processes the database.
Discussion: CAB7nPqSbYT6dRwsXVgiKmBdL_ARemfDZMPA+RPeC_ge0GK70hA@mail.gmail.com
Michael Paquier, with a bunch of comment changes by me.
This bug exists as far back as 9.0, when Hot Standby was introduced,
so back-patch to all supported branches.
Report and patch by Takayuki Tsunakawa, reviewed by Michael Paquier
and Kuntal Ghosh.
This reverts commit 9e083fd468. That was a
few bricks shy of a load:
* Query cancel stopped working
* Buildfarm member pademelon stopped working, because the box doesn't have
/dev/urandom nor /dev/random.
This clearly needs some more discussion, and a quite different patch, so
revert for now.
This adds a new routine, pg_strong_random() for generating random bytes,
for use in both frontend and backend. At the moment, it's only used in
the backend, but the upcoming SCRAM authentication patches need strong
random numbers in libpq as well.
pg_strong_random() is based on, and replaces, the existing implementation
in pgcrypto. It can acquire strong random numbers from a number of sources,
depending on what's available:
- OpenSSL RAND_bytes(), if built with OpenSSL
- On Windows, the native cryptographic functions are used
- /dev/urandom
- /dev/random
Original patch by Magnus Hagander, with further work by Michael Paquier
and me.
Discussion: <CAB7nPqRy3krN8quR9XujMVVHYtXJ0_60nqgVc6oUk8ygyVkZsA@mail.gmail.com>
pgstat_twophase_postcommit is supposed to duplicate the math in
AtEOXact_PgStat, but it had missed out the bit about clearing
t_delta_live_tuples/t_delta_dead_tuples for a TRUNCATE.
It's harder than you might think to replicate the issue here, because
those counters would only be nonzero when a previous transaction in
the same backend had added/deleted tuples in the truncated table,
and those counts hadn't been sent to the stats collector yet.
Evident oversight in commit d42358efb. I've not added a regression
test for this; we tried to add one in d42358efb, and had to revert it
because it was too timing-sensitive for the buildfarm.
Back-patch to 9.5 where d42358efb came in.
Stas Kelvich
Discussion: <EB57BF68-C06D-4737-BDDC-4BA778F4E62B@postgrespro.ru>
Fix detaching of the mmap'd segment to have its own on_shmem_exit callback,
rather than piggybacking on the one for detaching from the SysV segment.
That was confusing, and given the distance between the two attach calls,
it was trouble waiting to happen.
Make the detaching calls idempotent by clearing AnonymousShmem to show
we've already unmapped. I spent quite a bit of time yesterday trying
to find a path that would allow the munmap()'s to be done twice, and
while I did not succeed, it seems silly that there's even a question.
Make the #ifdef logic less confusing by separating "do we want to use
anonymous shmem" from EXEC_BACKEND. Even though there's no current
scenario where those conditions are different, it is not helpful for
different places in the same file to be testing EXEC_BACKEND for what
are fundamentally different reasons.
Don't do on_exit_reset() in StartBackgroundWorker(). At best that's
useless (InitPostmasterChild would have done it already) and at worst
it could zap some callback that's unrelated to shared memory.
Improve comments, and simplify the huge_pages enablement logic slightly.
Back-patch to 9.4 where hugepage support was introduced.
Arguably this should go into 9.3 as well, but the code looks
significantly different there, and I doubt it's worth the
trouble of adapting the patch given I can't show a live bug.
Windows apparently has a constant named WAIT_TIMEOUT, and some of these
other names are pretty generic, too. Insert "PG_" at the front of each
name in order to disambiguate.
Michael Paquier
WaitLatch, WaitLatchOrSocket, and WaitEventSetWait now taken an
additional wait_event_info parameter; legal values are defined in
pgstat.h. This makes it possible to uniquely identify every point in
the core code where we are waiting for a latch; extensions can pass
WAIT_EXTENSION.
Because latches were the major wait primitive not previously covered
by this patch, it is now possible to see information in
pg_stat_activity on a large number of important wait events not
previously addressed, such as ClientRead, ClientWrite, and SyncRep.
Unfortunately, many of the wait events added by this patch will fail
to appear in pg_stat_activity because they're only used in background
processes which don't currently appear in pg_stat_activity. We should
fix this either by creating a separate view for such information, or
else by deciding to include them in pg_stat_activity after all.
Michael Paquier and Robert Haas, reviewed by Alexander Korotkov and
Thomas Munro.
In standard Unix builds, postmaster child processes do ClosePostmasterPorts
immediately after InitPostmasterChild, that is almost immediately after
being spawned. This is important because we don't want children holding
open the postmaster's end of the postmaster death watch pipe.
However, in EXEC_BACKEND builds, SubPostmasterMain was postponing this
responsibility significantly, in order to make it slightly more convenient
to pass the right flag value to ClosePostmasterPorts. This is bad,
particularly seeing that process_shared_preload_libraries() might invoke
nearly-arbitrary code. Rearrange so that we do it as soon as we've
fetched the socket FDs via read_backend_variables().
Also move the comment explaining about randomize_va_space to before the
call of PGSharedMemoryReAttach, which is where it's relevant. The old
placement was appropriate when the reattach happened inside
CreateSharedMemoryAndSemaphores, but that was a long time ago.
Back-patch to 9.3; the patch doesn't apply cleanly before that, and
it doesn't seem worth a lot of effort given that we've had no actual
field complaints traceable to this.
Discussion: <4157.1475178360@sss.pgh.pa.us>
<sys/select.h> is required by POSIX.1-2001 to get the prototype of
select(2), but nearly no systems enforce that because older standards
let you get away with including some other headers. Recent OpenBSD
hacking has removed that frail touch of friendliness, however, which
broke some compiles; fix all the way back to 9.1 by adding the required
standard. Only vacuumdb.c was reported to fail, but it seems easier to
fix the whole lot in a fell swoop.
Per bug #14334 by Sean Farrell.