1
0
mirror of https://github.com/postgres/postgres.git synced 2025-11-10 17:42:29 +03:00
Commit Graph

713 Commits

Author SHA1 Message Date
Andres Freund
fe0972ee5e Add further debug info to help debug 019_replslot_limit.pl failures.
See also afdeff1052. Failures after that commit provided a few more hints,
but not yet enough to understand what's going on.

In 019_replslot_limit.pl shut down nodes with fast instead of immediate mode
if we observe the failure mode. That should tell us whether the failures we're
observing are just a timing issue under high load. PGCTLTIMEOUT should prevent
buildfarm animals from hanging endlessly.

Also adds a bit more logging to replication slot drop and ShutdownPostgres().

Discussion: https://postgr.es/m/20220225192941.hqnvefgdzaro6gzg@alap3.anarazel.de
2022-02-25 17:04:39 -08:00
Peter Eisentraut
5c868c92ca Fix possible null pointer reference
Per Coverity.  Introduced in 37851a8b83.
2022-02-21 09:42:46 +01:00
Andres Freund
fbabdf8f9a Fix meaning-changing typo introduced in fa0e03c15a. 2022-02-20 13:51:36 -08:00
Tom Lane
de447bb8e6 Suppress warning about stack_base_ptr with late-model GCC.
GCC 12 complains that set_stack_base is storing the address of
a local variable in a long-lived pointer.  This is an entirely
reasonable warning (indeed, it just helped us find a bug);
but that behavior is intentional here.  We can work around it
by using __builtin_frame_address(0) instead of a specific local
variable; that produces an address a dozen or so bytes different,
in my testing, but we don't care about such a small difference.
Maybe someday a compiler lacking that function will start to issue
a similar warning, but we'll worry about that when it happens.

Patch by me, per a suggestion from Andres Freund.  Back-patch to
v12, which is as far back as the patch will go without some pain.
(Recently-established project policy would permit a back-patch as
far as 9.2, but I'm disinclined to expend the work until GCC 12
is much more widespread.)

Discussion: https://postgr.es/m/3773792.1645141467@sss.pgh.pa.us
2022-02-17 22:46:01 -05:00
Peter Geoghegan
8f388f6f55 Increase hash_mem_multiplier default to 2.0.
Double the default setting for hash_mem_multiplier, from 1.0 to 2.0.
This setting makes hash-based executor nodes use twice the usual
work_mem limit.

The PostgreSQL 15 release notes should have a compatibility note about
this change.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-Wzndc_ROk6CY-bC6p9O53q974Y0Ey4WX8jcPbuTZYM4Q3A@mail.gmail.com
2022-02-16 18:41:52 -08:00
Andres Freund
2f6501fa3c Move replication slot release to before_shmem_exit().
Previously, replication slots were released in ProcKill() on error, resulting
in reporting replication slot drop of ephemeral slots after the stats
subsystem was already shut down.

To fix this problem, move replication slot release to a before_shmem_exit()
hook that is called before the stats collector shuts down. There wasn't really
a good reason for the slot handling to be in ProcKill() anyway.

Patch by Masahiko Sawada, with very minor polishing by me.

I, Andres, wrote a test for dropping slots during process exit, but there may
be some OS dependent issues around the number of times FATAL error messages
are displayed due to a still debated libpq issue. So that test will be
committed separately / later.

Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-By: Andres Freund <andres@anarazel.de>
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAD21AoDAeEpAbZEyYJsPZJUmSPaRicVSBObaL7sPaofnKz+9zg@mail.gmail.com
2022-02-14 17:08:17 -08:00
Peter Eisentraut
37851a8b83 Database-level collation version tracking
This adds to database objects the same version tracking that collation
objects have.  There is a new pg_database column datcollversion that
stores the version, a new function
pg_database_collation_actual_version() to get the version from the
operating system, and a new subcommand ALTER DATABASE ... REFRESH
COLLATION VERSION.

This was not originally added together with pg_collation.collversion,
since originally version tracking was only supported for ICU, and ICU
on a database-level is not currently supported.  But we now have
version tracking for glibc (since PG13), FreeBSD (since PG14), and
Windows (since PG13), so this is useful to have now.

Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/f0ff3190-29a3-5b39-a179-fa32eee57db6%40enterprisedb.com
2022-02-14 08:27:26 +01:00
Robert Haas
aa64f23b02 Remove MaxBackends variable in favor of GetMaxBackends() function.
Previously, it was really easy to write code that accessed MaxBackends
before we'd actually initialized it, especially when coding up an
extension. To make this less error-prune, introduce a new function
GetMaxBackends() which should be used to obtain the correct value.
This will ERROR if called too early. Demote the global variable to
a file-level static, so that nobody can peak at it directly.

Nathan Bossart. Idea by Andres Freund. Review by Greg Sabino Mullane,
by Michael Paquier (who had doubts about the approach), and by me.

Discussion: http://postgr.es/m/20210802224204.bckcikl45uezv5e4@alap3.anarazel.de
2022-02-08 15:53:19 -05:00
Robert Haas
5ef1eefd76 Allow archiving via loadable modules.
Running a shell command for each file to be archived has a lot of
overhead and may not offer as much error checking as you want, or the
exact semantics that you want. So, offer the option to call a loadable
module for each file to be archived, rather than running a shell command.

Also, add a 'basic_archive' contrib module as an example implementation
that archives to a local directory.

Nathan Bossart, with a little bit of kibitzing by me.

Discussion: http://postgr.es/m/20220202224433.GA1036711@nathanxps13
2022-02-03 14:05:02 -05:00
Alvaro Herrera
b3d7d6e462 Remove xloginsert.h from xlog.h
xlog.h is directly and indirectly #included in a lot of places.  With
this change, xloginsert.h is no longer unnecessarily included in the
large number of them that don't need it.

Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://postgr.es/m/CALj2ACVe-W+WM5P44N7eG9C2_FmaeM8Dq5aCnD3fHt0Ba=WR6w@mail.gmail.com
2022-01-30 12:25:24 -03:00
Michael Paquier
5ecd0183fb Fix comments about bgworker registration before MaxBackends initialization
Since 6bc8ef0b, InitializeMaxBackends() has used max_worker_processes
instead of adapting MaxBackends to the number of background workers
registered by modules loaded in shared_preload_libraries (at this time,
bgworkers were only static, but gained dynamic capabilities as a matter
of supporting parallel queries meaning that a control cap was
necessary).

Some comments referred to the past registration logic, making them
confusing and incorrect, so fix these.

Some of the out-of-core modules that could be loaded in this path
sometimes like to manipulate dynamically some of the resource-related
GUCs for their own needs, this commit adds a note about that.

Author: Nathan Bossart
Discussion: https://postgr.es/m/20220127181815.GA551692@nathanxps13
2022-01-29 10:47:36 +09:00
Peter Eisentraut
54637508f8 Change collate and ctype fields to type text
This changes the data type of the catalog fields datcollate, datctype,
collcollate, and collctype from name to text.  There wasn't ever a
really good reason for them to be of type name; presumably this was
just carried over from when they were fixed-size fields in pg_control,
first into the corresponding pg_database fields, and then to
pg_collation.  The values are not identifiers or object names, and we
don't ever look them up that way.

Changing to type text saves space in the typical case, since locale
names are typically only a few bytes long.  But it is also possible
that an ICU locale name with several customization options appended
could be longer than 63 bytes, so this also enables that case, which
was previously probably broken.

Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/5e756dd6-0e91-d778-96fd-b1bcb06c161a@2ndquadrant.com
2022-01-27 08:54:25 +01:00
Bruce Momjian
27b77ecf9f Update copyright for 2022
Backpatch-through: 10
2022-01-07 19:04:57 -05:00
Robert Haas
fa0e03c15a Remove InitXLOGAccess().
It's not great that RecoveryInProgress() calls InitXLOGAccess(),
because a status inquiry function typically shouldn't have the side
effect of performing initializations. We could fix that by calling
InitXLOGAccess() from some other place, but instead, let's remove it
altogether.

One thing InitXLogAccess() did is initialize wal_segment_size, but it
doesn't need to do that. In the postmaster, PostmasterMain() calls
LocalProcessControlFile(), and all child processes will inherit that
value -- except in EXEC_BACKEND bulds, but then each backend runs
SubPostmasterMain() which also calls LocalProcessControlFile().

The other thing InitXLOGAccess() did is update RedoRecPtr and
doPageWrites, but that's not critical, because all code that uses
them will just retry if it turns out that they've changed. The
only difference is that most code will now see an initial value that
is definitely invalid instead of one that might have just been way
out of date, but that will only happen once per backend lifetime,
so it shouldn't be a big deal.

Patch by me, reviewed by Nathan Bossart, Michael Paquier, Andres
Freund, Heikki Linnakangas, and Álvaro Herrera.

Discussion: http://postgr.es/m/CA+TgmoY7b65qRjzHN_tWUk8B4sJqk1vj1d31uepVzmgPnZKeLg@mail.gmail.com
2021-12-13 09:58:36 -05:00
Andres Freund
3b34645678 Initialize backend status reporting during bootstrap.
This allows a later commit to reduce the number of branches in performance
sensitive functions during normal running, compared to a very minor saving
during bootstrapping.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAAKRu_Yeg+vh6SHNEo1+=O7e-BPX35cU0XQM=YwQRnkFyv_y+w@mail.gmail.com
2021-11-19 08:43:12 -08:00
Robert Haas
e51c46991f Move InitXLogInsert() call from InitXLOGAccess() to BaseInit().
At present, there is an undocumented coding rule that you must call
RecoveryInProgress(), or do something else that results in a call
to InitXLogInsert(), before trying to write WAL. Otherwise, the
WAL construction buffers won't be initialized, resulting in
failures.

Since it's not good to rely on a status inquiry function like
RecoveryInProgress() having the side effect of initializing
critical data structures, instead do the initialization eariler,
when the backend first starts up.

Patch by me. Reviewed by Nathan Bossart and Michael Paquier.

Discussion: http://postgr.es/m/CA+TgmoY7b65qRjzHN_tWUk8B4sJqk1vj1d31uepVzmgPnZKeLg@mail.gmail.com
2021-11-16 09:43:17 -05:00
Andres Freund
1d5135f004 Fix typo.
Reported-By: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/YRIlNQhLNfx555Nx@paquier.xyz
2021-08-13 05:44:03 -07:00
Andres Freund
675c945394 Move temporary file cleanup to before_shmem_exit().
As reported by a few OSX buildfarm animals there exist at least one path where
temporary files exist during AtProcExit_Files() processing. As temporary file
cleanup causes pgstat reporting, the assertions added in ee3f8d3d3a caused
failures.

This is not an OSX specific issue, we were just lucky that timing on OSX
reliably triggered the problem.  The known way to cause this is a FATAL error
during perform_base_backup() with a MANIFEST used - adding an elog(FATAL)
after InitializeBackupManifest() reliably reproduces the problem in isolation.

The problem is that the temporary file created in InitializeBackupManifest()
is not cleaned up via resource owner cleanup as WalSndResourceCleanup()
currently is only used for non-FATAL errors. That then allows to reach
AtProcExit_Files() with existing temporary files, causing the assertion
failure.

To fix this problem, move temporary file cleanup to a before_shmem_exit() hook
and add assertions ensuring that no temporary files are created before / after
temporary file management has been initialized / shut down. The cleanest way
to do so seems to be to split fd.c initialization into two, one for plain file
access and one for temporary file access.

Right now there's no need to perform further fd.c cleanup during process exit,
so I just renamed AtProcExit_Files() to BeforeShmemExit_Files(). Alternatively
we could perform another pass through the files to check that no temporary
files exist, but the added assertions seem to provide enough protection
against that.

It might turn out that the assertions added in ee3f8d3d3a will cause too much
noise - in that case we'll have to downgrade them to a WARNING, at least
temporarily.

This commit is not necessarily the best approach to address this issue, but it
should resolve the buildfarm failures. We can revise later.

Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20210807190131.2bm24acbebl4wl6i@alap3.anarazel.de
2021-08-07 19:20:47 -07:00
Andres Freund
a1bb3d5dbe Schedule ShutdownXLOG() in single user mode using before_shmem_exit().
Previously on_shmem_exit() was used. The upcoming shared memory stats patch
uses DSM segments to store stats, which can not be used after the
dsm_backend_shutdown() call in shmem_exit().  There does not seem to be any
reason to do ShutdownXLOG() via on_shmem_exit(), so change it.

Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20210405092914.mmxqe7j56lsjfsej@alap3.anarazel.de
Discussion: https://postgr.es/m/20210803023612.iziacxk5syn2r4ut@alap3.anarazel.de
2021-08-06 19:10:32 -07:00
Andres Freund
ee3f8d3d3a pgstat: Bring up pgstat in BaseInit() to fix uninitialized use of pgstat by AV.
Previously pgstat_initialize() was called in InitPostgres() and
AuxiliaryProcessMain(). As it turns out there was at least one case where we
reported stats before pgstat_initialize() was called, see
AutoVacWorkerMain()'s intentionally early call to pgstat_report_autovac().

This turns out to not be a problem with the current pgstat implementation as
pgstat_initialize() only registers a shutdown callback. But in the shared
memory based stats implementation we are working towards pgstat_initialize()
has to do more work.

After b406478b87 BaseInit() is a central place where initialization shared by
normal backends and auxiliary backends can be put. Obviously BaseInit() is
called before InitPostgres() registers ShutdownPostgres. Previously
ShutdownPostgres was the first before_shmem_exit callback, now that's commonly
pgstats. That should be fine.

Previously pgstat_initialize() was not called in bootstrap mode, but there
does not appear to be a need for that. It's now done unconditionally.

To detect future issues like this, assertions are added to a few places
verifying that the pgstat subsystem is initialized and not yet shut down.

Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20210405092914.mmxqe7j56lsjfsej@alap3.anarazel.de
Discussion: https://postgr.es/m/20210802164124.ufo5buo4apl6yuvs@alap3.anarazel.de
2021-08-06 19:05:59 -07:00
Andres Freund
b406478b87 process startup: Always call Init[Auxiliary]Process() before BaseInit().
For EXEC_BACKEND InitProcess()/InitAuxiliaryProcess() needs to have been
called well before we call BaseInit(), as SubPostmasterMain() needs LWLocks to
work. Having the order of initialization differ between platforms makes it
unnecessarily hard to understand the system and to add initialization points
for new subsystems without a lot of duplication.

To be able to change the order, BaseInit() cannot trigger
CreateSharedMemoryAndSemaphores() anymore - obviously that needs to have
happened before we can call InitProcess(). It seems cleaner to create shared
memory explicitly in single user/bootstrap mode anyway.

After this change the separation of bufmgr initialization into
InitBufferPoolAccess() / InitBufferPoolBackend() is not meaningful anymore so
the latter is removed.

Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20210802164124.ufo5buo4apl6yuvs@alap3.anarazel.de
2021-08-05 15:36:59 -07:00
Andres Freund
07bf378509 process startup: Centralize pgwin32_signal_initialize() calls.
For one, the existing location lead to somewhat awkward code in main(). For
another, the new location is easier to understand anyway.

Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/20210802164124.ufo5buo4apl6yuvs@alap3.anarazel.de
2021-08-05 12:36:06 -07:00
Fujii Masao
43620e3286 Add function to log the memory contexts of specified backend process.
Commit 3e98c0bafb added pg_backend_memory_contexts view to display
the memory contexts of the backend process. However its target process
is limited to the backend that is accessing to the view. So this is
not so convenient when investigating the local memory bloat of other
backend process. To improve this situation, this commit adds
pg_log_backend_memory_contexts() function that requests to log
the memory contexts of the specified backend process.

This information can be also collected by calling
MemoryContextStats(TopMemoryContext) via a debugger. But
this technique cannot be used in some environments because no debugger
is available there. So, pg_log_backend_memory_contexts() allows us to
see the memory contexts of specified backend more easily.

Only superusers are allowed to request to log the memory contexts
because allowing any users to issue this request at an unbounded rate
would cause lots of log messages and which can lead to denial of service.

On receipt of the request, at the next CHECK_FOR_INTERRUPTS(),
the target backend logs its memory contexts at LOG_SERVER_ONLY level,
so that these memory contexts will appear in the server log but not
be sent to the client. It logs one message per memory context.
Because if it buffers all memory contexts into StringInfo to log them
as one message, which may require the buffer to be enlarged very much
and lead to OOM error since there can be a large number of memory
contexts in a backend.

When a backend process is consuming huge memory, logging all its
memory contexts might overrun available disk space. To prevent this,
now this patch limits the number of child contexts to log per parent
to 100. As with MemoryContextStats(), it supposes that practical cases
where the log gets long will typically be huge numbers of siblings
under the same parent context; while the additional debugging value
from seeing details about individual siblings beyond 100 will not be large.

There was another proposed patch to add the function to return
the memory contexts of specified backend as the result sets,
instead of logging them, in the discussion. However that patch is
not included in this commit because it had several issues to address.

Thanks to Tatsuhito Kasahara, Andres Freund, Tom Lane, Tomas Vondra,
Michael Paquier, Kyotaro Horiguchi and Zhihong Yu for the discussion.

Bump catalog version.

Author: Atsushi Torikoshi
Reviewed-by: Kyotaro Horiguchi, Zhihong Yu, Fujii Masao
Discussion: https://postgr.es/m/0271f440ac77f2a4180e0e56ebd944d1@oss.nttdata.com
2021-04-06 13:44:15 +09:00
Andres Freund
e1025044cd Split backend status and progress related functionality out of pgstat.c.
Backend status (supporting pg_stat_activity) and command
progress (supporting pg_stat_progress*) related code is largely
independent from the rest of pgstat.[ch] (supporting views like
pg_stat_all_tables that accumulate data over time). See also
a333476b92.

This commit doesn't rename the function names to make the distinction
from the rest of pgstat_ clearer - that'd be more invasive and not
clearly beneficial. If we were to decide to do such a rename at some
point, it's better done separately from moving the code as well.

Robert's review was of an earlier version.

Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/20210316195440.twxmlov24rr2nxrg@alap3.anarazel.de
2021-04-03 11:42:52 -07:00
Thomas Munro
c30f54ad73 Detect POLLHUP/POLLRDHUP while running queries.
Provide a new GUC check_client_connection_interval that can be used to
check whether the client connection has gone away, while running very
long queries.  It is disabled by default.

For now this uses a non-standard Linux extension (also adopted by at
least one other OS).  POLLRDHUP is not defined by POSIX, and other OSes
don't have a reliable way to know if a connection was closed without
actually trying to read or write.

In future we might consider trying to send a no-op/heartbeat message
instead, but that could require protocol changes.

Author: Sergey Cherkashin <s.cherkashin@postgrespro.ru>
Author: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Tatsuo Ishii <ishii@sraoss.co.jp>
Reviewed-by: Konstantin Knizhnik <k.knizhnik@postgrespro.ru>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Maksim Milyutin <milyutinma@gmail.com>
Reviewed-by: Tsunakawa, Takayuki/綱川 貴之 <tsunakawa.takay@fujitsu.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (much earlier version)
Discussion: https://postgr.es/m/77def86b27e41f0efcba411460e929ae%40postgrespro.ru
2021-04-03 09:02:41 +13:00
Peter Geoghegan
9f3665fbfc Don't consider newly inserted tuples in nbtree VACUUM.
Remove the entire idea of "stale stats" within nbtree VACUUM (stop
caring about stats involving the number of inserted tuples).  Also
remove the vacuum_cleanup_index_scale_factor GUC/param on the master
branch (though just disable them on postgres 13).

The vacuum_cleanup_index_scale_factor/stats interface made the nbtree AM
partially responsible for deciding when pg_class.reltuples stats needed
to be updated.  This seems contrary to the spirit of the index AM API,
though -- it is not actually necessary for an index AM's bulk delete and
cleanup callbacks to provide accurate stats when it happens to be
inconvenient.  The core code owns that.  (Index AMs have the authority
to perform or not perform certain kinds of deferred cleanup based on
their own considerations, such as page deletion and recycling, but that
has little to do with pg_class.reltuples/num_index_tuples.)

This issue was fairly harmless until the introduction of the
autovacuum_vacuum_insert_threshold feature by commit b07642db, which had
an undesirable interaction with the vacuum_cleanup_index_scale_factor
mechanism: it made insert-driven autovacuums perform full index scans,
even though there is no real benefit to doing so.  This has been tied to
a regression with an append-only insert benchmark [1].

Also have remaining cases that perform a full scan of an index during a
cleanup-only nbtree VACUUM indicate that the final tuple count is only
an estimate.  This prevents vacuumlazy.c from setting the index's
pg_class.reltuples in those cases (it will now only update pg_class when
vacuumlazy.c had TIDs for nbtree to bulk delete).  This arguably fixes
an oversight in deduplication-related bugfix commit 48e12913.

[1] https://smalldatum.blogspot.com/2021/01/insert-benchmark-postgres-is-still.html

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAD21AoA4WHthN5uU6+WScZ7+J_RcEjmcuH94qcoUPuB42ShXzg@mail.gmail.com
Backpatch: 13-, where autovacuum_vacuum_insert_threshold was added.
2021-03-10 16:27:01 -08:00
Michael Paquier
f9264d1524 Remove support for SSL compression
PostgreSQL disabled compression as of e3bdb2d and the documentation
recommends against using it since.  Additionally, SSL compression has
been disabled in OpenSSL since version 1.1.0, and was disabled in many
distributions long before that.  The most recent TLS version, TLSv1.3,
disallows compression at the protocol level.

This commit removes the feature itself, removing support for the libpq
parameter sslcompression (parameter still listed for compatibility
reasons with existing connection strings, just ignored), and removes
the equivalent field in pg_stat_ssl and de facto PgBackendSSLStatus.

Note that, on top of removing the ability to activate compression by
configuration, compression is actively disabled in both frontend and
backend to avoid overrides from local configurations.

A TAP test is added for deprecated SSL parameters to check after
backwards compatibility.

Bump catalog version.

Author: Daniel Gustafsson
Reviewed-by: Peter Eisentraut, Magnus Hagander, Michael Paquier
Discussion:  https://postgr.es/m/7E384D48-11C5-441B-9EC3-F7DB1F8518F6@yesql.se
2021-03-09 11:16:47 +09:00
Thomas Munro
a042ba2ba7 Introduce symbolic names for FeBeWaitSet positions.
Previously we used 0 and 1 to refer to the socket and latch in far flung
parts of the tree, without any explanation.  Also use PGINVALID_SOCKET
rather than -1 in a couple of places that didn't already do that.

Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com
2021-03-01 16:10:16 +13:00
Thomas Munro
6a2a70a020 Use signalfd(2) for epoll latches.
Cut down on system calls and other overheads by reading from a signalfd
instead of using a signal handler and self-pipe.  Affects Linux sytems,
and possibly others including illumos that implement the Linux epoll and
signalfd interfaces.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJjxPDpzBE0a3hyUywBvaZuC89yx3jK9RFZgfv_KHU7gg@mail.gmail.com
2021-03-01 14:12:02 +13:00
Peter Eisentraut
0e392fcc0d Use errmsg_internal for debug messages
An inconsistent set of debug-level messages was not using
errmsg_internal(), thus uselessly exposing the messages to translation
work.  Fix those.
2021-02-17 11:33:25 +01:00
Peter Geoghegan
e19594c5c0 Reduce the default value of vacuum_cost_page_miss.
When commit f425b605 introduced cost based vacuum delays back in 2004,
the defaults reflected then-current trends in hardware, as well as
certain historical limitations in PostgreSQL.  There have been enormous
improvements in both areas since that time.  The cost limit GUC defaults
finally became much more representative of current trends following
commit cbccac37, which decreased autovacuum_vacuum_cost_delay's default
by 10x for PostgreSQL 12 (it went from 20ms to only 2ms).

The relative costs have shifted too.  This should also be accounted for
by the defaults.  More specifically, the relative importance of avoiding
dirtying pages within VACUUM has greatly increased, primarily due to
main memory capacity scaling and trends in flash storage.  Within
Postgres itself, improvements like sequential access during index
vacuuming (at least in nbtree and GiST indexes) have also been
contributing factors.

To reflect all this, decrease the default of vacuum_cost_page_miss to 2.
Since the default of vacuum_cost_page_dirty remains 20, dirtying a page
is now considered 10x "costlier" than a page miss by default.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzmLPFnkWT8xMjmcsm7YS3+_Qi3iRWAb2+_Bc8UhVyHfuA@mail.gmail.com
2021-01-27 15:11:13 -08:00
Tom Lane
9877374bef Add idle_session_timeout.
This GUC variable works much like idle_in_transaction_session_timeout,
in that it kills sessions that have waited too long for a new client
query.  But it applies when we're not in a transaction, rather than
when we are.

Li Japin, reviewed by David Johnston and Hayato Kuroda, some
fixes by me

Discussion: https://postgr.es/m/763A0689-F189-459E-946F-F0EC4458980B@hotmail.com
2021-01-06 18:28:52 -05:00
Bruce Momjian
ca3b37487b Update copyright for 2021
Backpatch-through: 9.5
2021-01-02 13:06:25 -05:00
Tom Lane
622ae4621e Fix assorted issues in backend's GSSAPI encryption support.
Unrecoverable errors detected by GSSAPI encryption can't just be
reported with elog(ERROR) or elog(FATAL), because attempting to
send the error report to the client is likely to lead to infinite
recursion or loss of protocol sync.  Instead make this code do what
the SSL encryption code has long done, which is to just report any
such failure to the server log (with elevel COMMERROR), then pretend
we've lost the connection by returning errno = ECONNRESET.

Along the way, fix confusion about whether message translation is done
by pg_GSS_error() or its callers (the latter should do it), and make
the backend version of that function work more like the frontend
version.

Avoid allocating the port->gss struct until it's needed; we surely
don't need to allocate it in the postmaster.

Improve logging of "connection authorized" messages with GSS enabled.
(As part of this, I back-patched the code changes from dc11f31a1.)

Make BackendStatusShmemSize() account for the GSS-related space that
will be allocated by CreateSharedBackendStatus().  This omission
could possibly cause out-of-shared-memory problems with very high
max_connections settings.

Remove arbitrary, pointless restriction that only GSS authentication
can be used on a GSS-encrypted connection.

Improve documentation; notably, document the fact that libpq now
prefers GSS encryption over SSL encryption if both are possible.

Per report from Mikael Gustavsson.  Back-patch to v12 where
this code was introduced.

Discussion: https://postgr.es/m/e5b0b6ed05764324a2f3fe7acfc766d5@smhi.se
2020-12-28 17:44:17 -05:00
Stephen Frost
dc11f31a1a Add GSS information to connection authorized log message
GSS information (if used) such as if the connection was authorized using
GSS or if it was encrypted using GSS, and perhaps most importantly, what
the GSS principal used for the authentication was, is extremely useful
but wasn't being included in the connection authorized log message.

Therefore, add to the connection authorized log message that
information, in a similar manner to how we log SSL information when SSL
is used for a connection.

Author: Vignesh C
Reviewed-by: Bharath Rupireddy
Discussion: https://www.postgresql.org/message-id/CALDaNm2N1385_Ltoo%3DS7VGT-ESu_bRQa-sC1wg6ikrM2L2Z49w%40mail.gmail.com
2020-12-02 14:41:53 -05:00
Peter Eisentraut
bdc4edbea6 Move catalog index declarations
Move the system catalog index declarations from catalog/indexing.h to
the respective parent tables' catalog/pg_*.h files.  The original
reason for having it split was that the old genbki system produced the
output in the order of the catalog files it read, so all the indexing
stuff needed to come separately.  But this is no longer the case, and
keeping it together makes more sense.

Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/c7cc82d6-f976-75d6-2e3e-b03d2cab26bb@2ndquadrant.com
2020-11-07 12:26:24 +01:00
Tom Lane
44fc6e259b Centralize setup of SIGQUIT handling for postmaster child processes.
We decided that the policy established in commit 7634bd4f6 for
the bgwriter, checkpointer, walwriter, and walreceiver processes,
namely that they should accept SIGQUIT at all times, really ought
to apply uniformly to all postmaster children.  Therefore, get
rid of the duplicative and inconsistent per-process code for
establishing that signal handler and removing SIGQUIT from BlockSig.
Instead, make InitPostmasterChild do it.

The handler set up by InitPostmasterChild is SignalHandlerForCrashExit,
which just summarily does _exit(2).  In interactive backends, we
almost immediately replace that with quickdie, since we would prefer
to try to tell the client that we're dying.  However, this patch is
changing the behavior of autovacuum (both launcher and workers), as
well as walsenders.  Those processes formerly also used quickdie,
but AFAICS that was just mindless copy-and-paste: they don't have
any interactive client that's likely to benefit from being told this.

The stats collector continues to be an outlier, in that it thinks
SIGQUIT means normal exit.  That should probably be changed for
consistency, but there's another patch set where that's being
dealt with, so I didn't do so here.

Discussion: https://postgr.es/m/644875.1599933441@sss.pgh.pa.us
2020-09-16 16:04:36 -04:00
Tom Lane
0038f94387 Fix postmaster's behavior during smart shutdown.
Up to now, upon receipt of a SIGTERM ("smart shutdown" command), the
postmaster has immediately killed all "optional" background processes,
and subsequently refused to launch new ones while it's waiting for
foreground client processes to exit.  No doubt this seemed like an OK
policy at some point; but it's a pretty bad one now, because it makes
for a seriously degraded environment for the remaining clients:

* Parallel queries are killed, and new ones fail to launch. (And our
parallel-query infrastructure utterly fails to deal with the case
in a reasonable way --- it just hangs waiting for workers that are
not going to arrive.  There is more work needed in that area IMO.)

* Autovacuum ceases to function.  We can tolerate that for awhile,
but if bulk-update queries continue to run in the surviving client
sessions, there's eventually going to be a mess.  In the worst case
the system could reach a forced shutdown to prevent XID wraparound.

* The bgwriter and walwriter are also stopped immediately, likely
resulting in performance degradation.

Hence, let's rearrange things so that the only immediate change in
behavior is refusing to let in new normal connections.  Once the last
normal connection is gone, shut everything down as though we'd received
a "fast" shutdown.  To implement this, remove the PM_WAIT_BACKUP and
PM_WAIT_READONLY states, instead staying in PM_RUN or PM_HOT_STANDBY
while normal connections remain.  A subsidiary state variable tracks
whether or not we're letting in new connections in those states.

This also allows having just one copy of the logic for killing child
processes in smart and fast shutdown modes.  I moved that logic into
PostmasterStateMachine() by inventing a new state PM_STOP_BACKENDS.

Back-patch to 9.6 where parallel query was added.  In principle
this'd be a good idea in 9.5 as well, but the risk/reward ratio
is not as good there, since lack of autovacuum is not a problem
during typical uses of smart shutdown.

Per report from Bharath Rupireddy.

Patch by me, reviewed by Thomas Munro

Discussion: https://postgr.es/m/CALj2ACXAZ5vKxT9P7P89D87i3MDO9bfS+_bjMHgnWJs8uwUOOw@mail.gmail.com
2020-08-14 13:26:57 -04:00
Andres Freund
dc7420c2c9 snapshot scalability: Don't compute global horizons while building snapshots.
To make GetSnapshotData() more scalable, it cannot not look at at each proc's
xmin: While snapshot contents do not need to change whenever a read-only
transaction commits or a snapshot is released, a proc's xmin is modified in
those cases. The frequency of xmin modifications leads to, particularly on
higher core count systems, many cache misses inside GetSnapshotData(), despite
the data underlying a snapshot not changing. That is the most
significant source of GetSnapshotData() scaling poorly on larger systems.

Without accessing xmins, GetSnapshotData() cannot calculate accurate horizons /
thresholds as it has so far. But we don't really have to: The horizons don't
actually change that much between GetSnapshotData() calls. Nor are the horizons
actually used every time a snapshot is built.

The trick this commit introduces is to delay computation of accurate horizons
until there use and using horizon boundaries to determine whether accurate
horizons need to be computed.

The use of RecentGlobal[Data]Xmin to decide whether a row version could be
removed has been replaces with new GlobalVisTest* functions.  These use two
thresholds to determine whether a row can be pruned:
1) definitely_needed, indicating that rows deleted by XIDs >= definitely_needed
   are definitely still visible.
2) maybe_needed, indicating that rows deleted by XIDs < maybe_needed can
   definitely be removed
GetSnapshotData() updates definitely_needed to be the xmin of the computed
snapshot.

When testing whether a row can be removed (with GlobalVisTestIsRemovableXid())
and the tested XID falls in between the two (i.e. XID >= maybe_needed && XID <
definitely_needed) the boundaries can be recomputed to be more accurate. As it
is not cheap to compute accurate boundaries, we limit the number of times that
happens in short succession.  As the boundaries used by
GlobalVisTestIsRemovableXid() are never reset (with maybe_needed updated by
GetSnapshotData()), it is likely that further test can benefit from an earlier
computation of accurate horizons.

To avoid regressing performance when old_snapshot_threshold is set (as that
requires an accurate horizon to be computed), heap_page_prune_opt() doesn't
unconditionally call TransactionIdLimitedForOldSnapshots() anymore. Both the
computation of the limited horizon, and the triggering of errors (with
SetOldSnapshotThresholdTimestamp()) is now only done when necessary to remove
tuples.

This commit just removes the accesses to PGXACT->xmin from
GetSnapshotData(), but other members of PGXACT residing in the same
cache line are accessed. Therefore this in itself does not result in a
significant improvement. Subsequent commits will take advantage of the
fact that GetSnapshotData() now does not need to access xmins anymore.

Note: This contains a workaround in heap_page_prune_opt() to keep the
snapshot_too_old tests working. While that workaround is ugly, the tests
currently are not meaningful, and it seems best to address them separately.

Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Reviewed-By: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
2020-08-12 16:03:49 -07:00
Thomas Munro
3347c982ba Use a long lived WaitEventSet for WaitLatch().
Create LatchWaitSet at backend startup time, and use it to implement
WaitLatch().  This avoids repeated epoll/kqueue setup and teardown
system calls.

Reorder SubPostmasterMain() slightly so that we restore the postmaster
pipe and Windows signal emulation before we reach InitPostmasterChild(),
to make this work in EXEC_BACKEND builds.

Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com
2020-07-30 17:40:00 +12:00
Peter Geoghegan
d6c08e29e7 Add hash_mem_multiplier GUC.
Add a GUC that acts as a multiplier on work_mem.  It gets applied when
sizing executor node hash tables that were previously size constrained
using work_mem alone.

The new GUC can be used to preferentially give hash-based nodes more
memory than the generic work_mem limit.  It is intended to enable admin
tuning of the executor's memory usage.  Overall system throughput and
system responsiveness can be improved by giving hash-based executor
nodes more memory (especially over sort-based alternatives, which are
often much less sensitive to being memory constrained).

The default value for hash_mem_multiplier is 1.0, which is also the
minimum valid value.  This means that hash-based nodes continue to apply
work_mem in the traditional way by default.

hash_mem_multiplier is generally useful.  However, it is being added now
due to concerns about hash aggregate performance stability for users
that upgrade to Postgres 13 (which added disk-based hash aggregation in
commit 1f39bce0).  While the old hash aggregate behavior risked
out-of-memory errors, it is nevertheless likely that many users actually
benefited.  Hash agg's previous indifference to work_mem during query
execution was not just faster; it also accidentally made aggregation
resilient to grouping estimate problems (at least in cases where this
didn't create destabilizing memory pressure).

hash_mem_multiplier can provide a certain kind of continuity with the
behavior of Postgres 12 hash aggregates in cases where the planner
incorrectly estimates that all groups (plus related allocations) will
fit in work_mem/hash_mem.  This seems necessary because hash-based
aggregation is usually much slower when only a small fraction of all
groups can fit.  Even when it isn't possible to totally avoid hash
aggregates that spill, giving hash aggregation more memory will reliably
improve performance (the same cannot be said for external sort
operations, which appear to be almost unaffected by memory availability
provided it's at least possible to get a single merge pass).

The PostgreSQL 13 release notes should advise users that increasing
hash_mem_multiplier can help with performance regressions associated
with hash aggregation.  That can be taken care of by a later commit.

Author: Peter Geoghegan
Reviewed-By: Álvaro Herrera, Jeff Davis
Discussion: https://postgr.es/m/20200625203629.7m6yvut7eqblgmfo@alap3.anarazel.de
Discussion: https://postgr.es/m/CAH2-WzmD%2Bi1pG6rc1%2BCjc4V6EaFJ_qSuKCCHVnH%3DoruqD-zqow%40mail.gmail.com
Backpatch: 13-, where disk-based hash aggregation was introduced.
2020-07-29 14:14:58 -07:00
Andres Freund
e07633646a code: replace 'master' with 'leader' where appropriate.
Leader already is the more widely used terminology, but a few places
didn't get the message.

Author: Andres Freund
Reviewed-By: David Steele
Discussion: https://postgr.es/m/20200615182235.x7lch5n6kcjq4aue@alap3.anarazel.de
2020-07-08 12:58:32 -07:00
Peter Geoghegan
947456a823 Initialize work_mem using current guc.c default.
Do the same for the maintenance_work_mem global variable.

Oversight in commit 848ae330a4, which increased the previous defaults
for work_mem and maintenance_work_mem by 4X.
2020-07-02 16:34:54 -07:00
Tom Lane
5cbfce562f Initial pgindent and pgperltidy run for v13.
Includes some manual cleanup of places that pgindent messed up,
most of which weren't per project style anyway.

Notably, it seems some people didn't absorb the style rules of
commit c9d297751, because there were a bunch of new occurrences
of function calls with a newline just after the left paren, all
with faulty expectations about how the rest of the call would get
indented.
2020-05-14 13:06:50 -04:00
Andres Freund
fc3f4453a2 Recompute stack base in forked postmaster children.
This is for the benefit of running postgres under the rr
debugger. When using rr signal handlers running while a syscall is
active use an alternative stack. As e.g. bgworkers are started from
within signal handlers, the forked backend then has a different stack
base than postmaster. Previously that subsequently lead to those
processes triggering spurious "stack depth limit exceeded" errors.

Discussion: https://postgr.es/m/20200327182217.ubrrl32lyfhxfwk5@alap3.anarazel.de
2020-04-05 18:23:30 -07:00
Peter Eisentraut
8e8a0becb3 Unify several ways to tracking backend type
Add a new global variable MyBackendType that uses the same BackendType
enum that was previously only used by the stats collector.  That way
several duplicate ways of checking what type a particular process is
can be simplified.  Since it's no longer just for stats, move to
miscinit.c and rename existing functions to match the expanded
purpose.

Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Kuntal Ghosh <kuntalghosh.2007@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/c65e5196-4f04-4ead-9353-6088c19615a3@2ndquadrant.com
2020-03-13 14:01:10 +01:00
Peter Eisentraut
bf68b79e50 Refactor ps_status.c API
The init_ps_display() arguments were mostly lies by now, so to match
typical usage, just use one argument and let the caller assemble it
from multiple sources if necessary.  The only user of the additional
arguments is BackendInitialize(), which was already doing string
assembly on the caller side anyway.

Remove the second argument of set_ps_display() ("force") and just
handle that in init_ps_display() internally.

BackendInitialize() also used to set the initial status as
"authentication", but that was very far from where authentication
actually happened.  So now it's set to "initializing" and then
"authentication" just before the actual call to
ClientAuthentication().

Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Kuntal Ghosh <kuntalghosh.2007@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/c65e5196-4f04-4ead-9353-6088c19615a3@2ndquadrant.com
2020-03-11 16:38:31 +01:00
Peter Eisentraut
1c91838181 Clean up order in miscinit.c a bit
The code around InitPostmasterChild() from commit 31c453165b somehow
ended up in the middle of a block of code related to "User ID state".
Move it into its own block instead.
2020-03-11 13:51:55 +01:00
Tom Lane
481c8e9232 Assume that we have utime() and <utime.h>.
These are required by POSIX since SUSv2, and no live platforms fail
to provide them.  On Windows, utime() exists and we bring our own
<utime.h>, so we're good there too.  So remove the configure probes
and ad-hoc substitute code.  We don't need to check for utimes()
anymore either, since that was only used as a substitute.

In passing, make the Windows build include <sys/utime.h> only where
we need it, not everywhere.

This is part of a series of commits to get rid of no-longer-relevant
configure checks and dead src/port/ code.  I'm committing them separately
to make it easier to back out individual changes if they prove less
portable than I expect.

Discussion: https://postgr.es/m/15379.1582221614@sss.pgh.pa.us
2020-02-21 14:30:47 -05:00
Thomas Munro
701a51fd4e Use pg_pwrite() in more places.
This removes some lseek() system calls.

Author: Thomas Munro
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/CA%2BhUKGJ%2BoHhnvqjn3%3DHro7xu-YDR8FPr0FL6LF35kHRX%3D_bUzg%40mail.gmail.com
2020-02-11 17:50:22 +13:00