1
0
mirror of https://github.com/postgres/postgres.git synced 2025-05-28 05:21:27 +03:00

60839 Commits

Author SHA1 Message Date
Nathan Bossart
0164a0f9ee Add vacuum_truncate configuration parameter.
This new parameter works just like the storage parameter of the
same name: if set to true (which is the default), autovacuum and
VACUUM attempt to truncate any empty pages at the end of the table.
It is primarily intended to help users avoid locking issues on hot
standbys.  The setting can be overridden with the storage parameter
or VACUUM's TRUNCATE option.

Since there's presently no way to determine whether a Boolean
storage parameter is explicitly set or has just picked up the
default value, this commit also introduces an isset_offset member
to relopt_parse_elt.

Suggested-by: Will Storey <will@summercat.com>
Author: Nathan Bossart <nathandbossart@gmail.com>
Co-authored-by: Gurjeet Singh <gurjeet@singh.im>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Discussion: https://postgr.es/m/Z2DE4lDX4tHqNGZt%40dev.null
2025-03-20 10:16:50 -05:00
Peter Eisentraut
618c64ffd3 Revert workarounds for -Wmissing-braces false positives on old GCC
We have collected several instances of a workaround for GCC bug 53119,
which caused false-positive compiler warnings.  This bug has long been
fixed, but was still seen on the buildfarm, most recently on lapwing
with gcc (Debian 4.7.2-5).  (The GCC bug tracker mentions that a fix
was backported to 4.7.4 and 4.8.3.)

That compiler no longer runs warning-free since commit 6fdd5d95634, so
we don't need to keep these workarounds.  And furthermore, the
consensus appears to be that we don't want to keep supporting that era
of platform anymore at all.

This reverts the following commits:

d937904cce6a3d82e4f9c2127de7b59105a134b3
506428d091760650971433f6bc083531c307b368
b449afb582bb9015bfbb85abc10ce122aef9ec70
6392f2a0968c20ecde4d27b6652703ad931fce92
bad0763a4d7be3005eae35d460c73ac4bc7ebaad
5e0c761d0a13c7b4f7c5de618ac38560d74d74d0

and makes a few similar fixes to newer code.

Discussion: https://www.postgresql.org/message-id/flat/e170d61f-01ab-4cf9-ab68-91cd1fac62c5%40eisentraut.org
Discussion: https://www.postgresql.org/message-id/flat/CA%2BTgmoYEAm-KKZibAP3hSqbTFTjUd47XtVcf3xSFDpyecXX9uQ%40mail.gmail.com
2025-03-20 11:25:58 +01:00
Peter Eisentraut
b7076c1e7f Fix extension control path tests
Change expected extension to be installed from amcheck to plpgsql since
not all build farm animals has the contrib module installed.

Author: Matheus Alcantara <mths.dev@pm.me>
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/E7C7BFFB-8857-48D4-A71F-88B359FADCFD@justatheory.com
2025-03-20 10:53:59 +01:00
Peter Eisentraut
47929324c5 Fix typo in comment 2025-03-20 10:44:12 +01:00
Amit Kapila
e5aeed4b80 pg_createsubscriber: Add -R publications option.
This patch introduces a new '-R'/'--remove' option in the
'pg_createsubscriber' utility to specify the object types to be removed
from the subscriber. Currently, we add support to specify 'publications'
as an object type. In the future, other object types like failover-slots
could be added.

This feature allows optionally to remove publications on the subscriber
that were replicated from the primary server (before running this tool)
during physical replication. Users may want to retain these publications
in case they want some pre-existing subscribers to point to the newly
created subscriber.

Author: Shubham Khanna <khannashubham1197@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAHv8RjL4OvoYafofTb_U_JD5HuyoNowBoGpMfnEbhDSENA74Kg@mail.gmail.com
2025-03-20 12:21:54 +05:30
Andres Freund
5941946d09 meson: Flush stdout in testwrap
Otherwise the progress won't reliably be displayed during a test.

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/kx6xu7suexal5vwsxpy7ybgkcznx6hgywbuhkr6qabcwxjqax2@i4pcpk75jvaa
Backpatch-through: 16
2025-03-19 09:04:09 -04:00
Peter Eisentraut
190dc27998 Update a code comment
The comment explained that ALTER TABLE ADD CONSTRAINT USING INDEX is
only supported with a btree index.  (This is not being changed.)  The
reason is to keep upgrades robust, as explained there.  The other part
of the comment, that btree is the only unique index kind anyway, is
somewhat less true as we're trying to enable unique indexes other than
btree, and it's irrelevant to this check.  There is a check for
indisunique earlier already.  So just remove this part of the comment.

Author: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-03-19 10:39:06 +01:00
Peter Eisentraut
4f7f7b0375 extension_control_path
The new GUC extension_control_path specifies a path to look for
extension control files.  The default value is $system, which looks in
the compiled-in location, as before.

The path search uses the same code and works in the same way as
dynamic_library_path.

Some use cases of this are: (1) testing extensions during package
builds, (2) installing extensions outside security-restricted
containers like Python.app (on macOS), (3) adding extensions to
PostgreSQL running in a Kubernetes environment using operators such as
CloudNativePG without having to rebuild the base image for each new
extension.

There is also a tweak in Makefile.global so that it is possible to
install extensions using PGXS into an different directory than the
default, using 'make install prefix=/else/where'.  This previously
only worked when specifying the subdirectories, like 'make install
datadir=/else/where/share pkglibdir=/else/where/lib', for purely
implementation reasons.  (Of course, without the path feature,
installing elsewhere was rarely useful.)

Author: Peter Eisentraut <peter@eisentraut.org>
Co-authored-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: David E. Wheeler <david@justatheory.com>
Reviewed-by: Gabriele Bartolini <gabriele.bartolini@enterprisedb.com>
Reviewed-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
Reviewed-by: Niccolò Fei <niccolo.fei@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/E7C7BFFB-8857-48D4-A71F-88B359FADCFD@justatheory.com
2025-03-19 07:03:20 +01:00
Michael Paquier
2cce0fe440 psql: Allow queries terminated by semicolons while in pipeline mode
Currently, the only way to pipe queries in an ongoing pipeline (in a
\startpipeline block) is to leverage the meta-commands able to create
extended queries such as \bind, \parse or \bind_named.

While this is good enough for testing the backend with pipelines, it has
been mentioned that it can also be very useful to allow queries
terminated by semicolons to be appended to a pipeline.  For example, it
would be possible to migrate existing psql scripts to use pipelines by
just adding a set of \startpipeline and \endpipeline meta-commands,
making such scripts more efficient.

Doing such a change is proving to be simple in psql: queries terminated
by semicolons can be executed through PQsendQueryParams() without any
parameters set when the pipeline mode is active, instead of
PQsendQuery(), the default, like pgbench.  \watch is still forbidden
while in a pipeline, as it expects its results to be processed
synchronously.

The large portion of this commit consists in providing more test
coverage, with mixes of extended queries appended in a pipeline by \bind
and friends, and queries terminated by semicolons.

This improvement has been suggested by Daniel Vérité.

Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Discussion: https://postgr.es/m/d67b9c19-d009-4a50-8020-1a0ea92366a1@manitou-mail.org
2025-03-19 13:34:59 +09:00
Thomas Munro
0b53c08677 Fix compiler warning for commit 434dbf69.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
2025-03-19 17:26:16 +13:00
Thomas Munro
1cf4c56480 oauth: Simplify copy of PGoauthBearerRequest
Follow-up to 03366b61d. Since there are no more const members in the
PGoauthBearerRequest struct, the previous memcpy() can be replaced with
simple assignment.

Author: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://postgr.es/m/p4bd7mn6dxr2zdak74abocyltpfdxif4pxqzixqpxpetjwt34h%40qc6jgfmoddvq
2025-03-19 16:59:25 +13:00
Thomas Munro
873c0fd678 oauth: Improve validator docs on interruptibility
Andres pointed out that EINTR handling is inadequate for real-world use
cases. Direct module writers to our wait APIs instead.

Author: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://postgr.es/m/p4bd7mn6dxr2zdak74abocyltpfdxif4pxqzixqpxpetjwt34h%40qc6jgfmoddvq
2025-03-19 16:58:06 +13:00
Thomas Munro
d7e40845f9 oauth: Disallow synchronous DNS in libcurl
There is concern that a blocking DNS lookup in libpq could stall a
backend process (say, via FDW). Since there's currently no strong
evidence that synchronous DNS is a popular option, disallow it entirely
rather than warning at configure time. We can revisit if anyone
complains.

Per query from Andres Freund.

Author: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://postgr.es/m/p4bd7mn6dxr2zdak74abocyltpfdxif4pxqzixqpxpetjwt34h%40qc6jgfmoddvq
2025-03-19 16:56:19 +13:00
Thomas Munro
434dbf6907 oauth: Fix postcondition for set_timer on macOS
On macOS, readding an EVFILT_TIMER to a kqueue does not appear to clear
out previously queued timer events, so checks for timer expiration do
not work correctly during token retrieval. Switching to IPv4-only
communication exposes the problem, because libcurl is no longer clearing
out other timeouts related to Happy Eyeballs dual-stack handling.

Fully remove and re-register the kqueue timer events during each call to
set_timer(), to clear out any stale expirations.

Author: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://postgr.es/m/CAOYmi%2Bn4EDOOUL27_OqYT2-F2rS6S%2B3mK-ppWb2Ec92UEoUbYA%40mail.gmail.com
2025-03-19 16:45:01 +13:00
Thomas Munro
8d9d5843b5 oauth: Use IPv4-only issuer in oauth_validator tests
The test authorization server implemented in oauth_server.py does not
listen on IPv6. Most of the time, libcurl happily falls back to IPv4
after failing its initial connection, but on NetBSD, something is
consistently showing up on the unreserved IPv6 port and causing a test
failure.

Rather than deal with dual-stack details across all test platforms,
change the issuer to enforce the use of IPv4 only. (This elicits more
punishing timeout behavior from libcurl, so it's a useful change from
the testing perspective as well.)

Author: Jacob Champion <jacob.champion@enterprisedb.com>
Reported-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/CAOYmi%2Bn4EDOOUL27_OqYT2-F2rS6S%2B3mK-ppWb2Ec92UEoUbYA%40mail.gmail.com
2025-03-19 16:45:01 +13:00
Amit Langote
28317de723 Ensure first ModifyTable rel initialized if all are pruned
Commit cbc127917e introduced tracking of unpruned relids to avoid
processing pruned relations, and changed ExecInitModifyTable() to
initialize only unpruned result relations. As a result, MERGE
statements that prune all target partitions can now lead to crashes
or incorrect behavior during execution.

The crash occurs because some executor code paths rely on
ModifyTableState.resultRelInfo[0] being present and initialized,
even when no result relations remain after pruning. For example,
ExecMerge() and ExecMergeNotMatched() use the first resultRelInfo
to determine the appropriate action. Similarly,
ExecInitPartitionInfo() assumes that at least one result relation
exists.

To preserve these assumptions, ExecInitModifyTable() now includes the
first result relation in the initialized result relation list if all
result relations for that ModifyTable were pruned. To enable that,
ExecDoInitialPruning() ensures the first relation is locked if it was
pruned and locking is necessary.

To support this exception to the pruning logic, PlannedStmt now
includes a list of RT indexes identifying the first result relation
of each ModifyTable node in the plan. This allows
ExecDoInitialPruning() to check whether each such relation was
pruned and, if so, lock it if necessary.

Bug: #18830
Reported-by: Robins Tharakan <tharakan@gmail.com>
Diagnozed-by: Tender Wang <tndrwang@gmail.com>
Diagnozed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Co-authored-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/18830-1f31ea1dc930d444%40postgresql.org
2025-03-19 12:14:24 +09:00
Thomas Munro
06fb5612c9 Increase io_combine_limit range to 1MB.
The default of 128kB is unchanged, but the upper limit is changed from
32 blocks to 128 blocks, unless the operating system's IOV_MAX is too
low.  Some other RDBMSes seem to cap their multi-block buffer pool I/O
around this number, and it seems useful to allow experimentation.

The concrete change is to our definition of PG_IOV_MAX, which provides
the maximum for io_combine_limit and io_max_combine_limit.  It also
affects a couple of other places that work with arrays of struct iovec
or smaller objects on the stack, so we still don't want to use the
system IOV_MAX directly without a clamp: it is not under our control and
likely to be 1024.  128 seems acceptable for our current usage.

For Windows, we can't use real scatter/gather yet, so we continue to
define our own IOV_MAX value of 16 and emulate preadv()/pwritev() with
loops.  Someone would need to research the trade-offs of raising that
number.

NB if trying to see this working: you might temporarily need to hack
BAS_BULKREAD to be bigger, since otherwise the obvious way of "a very
big SELECT" is limited by that for now.

Suggested-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CA%2BhUKG%2B2T9p-%2BzM6Eeou-RAJjTML6eit1qn26f9twznX59qtCA%40mail.gmail.com
2025-03-19 15:40:35 +13:00
Thomas Munro
10f6646847 Introduce io_max_combine_limit.
The existing io_combine_limit can be changed by users.  The new
io_max_combine_limit is fixed at server startup time, and functions as a
silent clamp on the user setting.  That in itself is probably quite
useful, but the primary motivation is:

aio_init.c allocates shared memory for all asynchronous IOs including
some per-block data, and we didn't want to waste memory you'd never used
by assuming they could be up to PG_IOV_MAX.  This commit already halves
the size of 'AioHandleIov' and 'AioHandleData'.  A follow-up commit can
now expand PG_IOV_MAX without affecting that.

Since our GUC system doesn't support dependencies or cross-checks
between GUCs, the user-settable one now assigns a "raw" value to
io_combine_limit_guc, and the lower of io_combine_limit_guc and
io_max_combine_limit is maintained in io_combine_limit.

Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version)
Discussion: https://postgr.es/m/CA%2BhUKG%2B2T9p-%2BzM6Eeou-RAJjTML6eit1qn26f9twznX59qtCA%40mail.gmail.com
2025-03-19 15:23:54 +13:00
Michael Paquier
17d8bba6da Fix copy-paste error related to the autovacuum launcher in pgstat_io.c
Autovacuum launchers perform no WAL IO reads, but pgstat_tracks_io_op()
was tracking them as an allowed combination for the "init" and "normal"
contexts.

This caused the "read", "read_bytes" and "read_time" attributes of
pg_stat_io to show zeros for the autovacuum launcher rather than NULL.
NULL means that a combination of IO object, IO context and IO operation
has no meaning for a backend type.  Zero is the same as telling that a
combination is relevant, and that WAL reads are possible in an
autovacuum launcher, but it is not relevant.

Copy-pasto introduced in a051e71e28a1.

Author: Ranier Vilela <ranier.vf@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CAEudQAopEMAPiUqE7BvDV+x2fUPmKmb9RrsaoDR+hhQzLKg4PQ@mail.gmail.com
2025-03-19 08:52:10 +09:00
Masahiko Sawada
f4290f20dd Fix assertion failure in parallel vacuum with minimal maintenance_work_mem setting.
bbf668d66fbf lowered the minimum value of maintenance_work_mem to
64kB. However, in parallel vacuum cases, since the initial underlying
DSA size is 256kB, it attempts to perform a cycle of index vacuuming
and table vacuuming with an empty TID store, resulting in an assertion
failure.

This commit ensures that at least one page is processed before index
vacuuming and table vacuuming begins.

Backpatch to 17, where the minimum maintenance_work_mem value was
lowered.

Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAD21AoCEAmbkkXSKbj4dB+5pJDRL4ZHxrCiLBgES_g_g8mVi1Q@mail.gmail.com
Backpatch-through: 17
2025-03-18 16:37:02 -07:00
Michael Paquier
6d3ea48ff1 Optimize check for pending backend IO stats
This commit changes the backend stats code so as we rely on a single
boolean rather than a repeated check based on pg_memory_is_all_zeros()
in the code, making it cheaper should PgStat_PendingIO get bigger in
size.

The frequency of backend stats reports is not a bottleneck, but there is
no reason to not make that cheaper, and the logic is simple as the only
entry points updating backend IO stats are pgstat_count_backend_io_op()
and pgstat_count_backend_io_op_time().

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/Z8WYf1jyy4MwOveQ@ip-10-97-1-34.eu-west-3.compute.internal
2025-03-19 08:03:06 +09:00
Nathan Bossart
7fb418f020 Add commit 796bdda484 to .git-blame-ignore-revs. 2025-03-18 17:00:23 -05:00
Nathan Bossart
c9d502eb68 Update guidance for running vacuumdb after pg_upgrade.
Now that pg_upgrade can carry over most optimizer statistics, we
should recommend using vacuumdb's new --missing-stats-only option
to only analyze relations that are missing statistics.

Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/Z5O1bpcwDrMgyrYy%40nathan
2025-03-18 16:32:56 -05:00
Nathan Bossart
edba754f05 vacuumdb: Add option for analyzing only relations missing stats.
This commit adds a new --missing-stats-only option that can be used
with --analyze-only or --analyze-in-stages.  When this option is
specified, vacuumdb will analyze a relation if it lacks any
statistics for a column, expression index, or extended statistics
object.  This new option is primarily intended for use after
pg_upgrade (since it can now retain most optimizer statistics), but
it might be useful in other situations, too.

Author: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/Z5O1bpcwDrMgyrYy%40nathan
2025-03-18 16:32:56 -05:00
Nathan Bossart
9c03c8d187 vacuumdb: Teach vacuum_one_database() to reuse query results.
Presently, each call to vacuum_one_database() queries the catalogs
to retrieve the list of tables to process.  A follow-up commit will
add a "missing stats only" feature to --analyze-in-stages, which
requires saving the catalog query results (since tables without
statistics will have them after the first stage).  This commit adds
a new parameter to vacuum_one_database() that specifies either a
previously-retrieved list or a place to return the catalog query
results.  Note that nothing uses this new parameter yet.

Author: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/Z5O1bpcwDrMgyrYy%40nathan
2025-03-18 16:32:55 -05:00
Tom Lane
a6524105d2 Doc: manually break lines in wide UUID examples.
Buildfarm member crake has been complaining "WARNING: The contents of
fo:inline line 1 exceed the available area in the inline-progression
direction by 20500 millipoints. (See position 23808:106)" since
ba57dcfdc went in.  The other doc-building animals are not showing
this warning, and I don't see it on my RHEL8 workstation either, but
I was able to reproduce it on a Fedora 41 box.  So apparently this
is due to a recent-ish change in DocBook's line-breaking heuristics,
which caused it to cope less well with the UUIDs in these examples.
Put in some zero-width spaces to encourage the PDF toolchain to
break these lines in a better place.  (Only one of these examples
actually needs this today, but I marked up all three to ensure that
they get wrapped in a consistent way.)
2025-03-18 15:35:13 -04:00
Andres Freund
499faf9063 smgr: Make SMgrRelation initialization safer against errors
In case the smgr_open callback failed, the ->pincount field would not be
initialized and the relation would not be put onto the unpinned_relns list.

This buglet was introduced in 21d9c3ee4ef7, in 17.

Discussion: https://postgr.es/m/3vae7l5ozvqtxmd7rr7zaeq3qkuipz365u3rtim5t5wdkr6f4g@vkgf2fogjirl
Backpatch-through: 17
2025-03-18 14:04:44 -04:00
Álvaro Herrera
62d712ecfd
Introduce squashing of constant lists in query jumbling
pg_stat_statements produces multiple entries for queries like
    SELECT something FROM table WHERE col IN (1, 2, 3, ...)

depending on the number of parameters, because every element of
ArrayExpr is individually jumbled.  Most of the time that's undesirable,
especially if the list becomes too large.

Fix this by introducing a new GUC query_id_squash_values which modifies
the node jumbling code to only consider the first and last element of a
list of constants, rather than each list element individually.  This
affects both the query_id generated by query jumbling, as well as
pg_stat_statements query normalization so that it suppresses printing of
the individual elements of such a list.

The default value is off, meaning the previous behavior is maintained.

Author: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Sergey Dudoladov (mysterious, off-list)
Reviewed-by: David Geier <geidav.pg@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Sutou Kouhei <kou@clear-code.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Marcos Pegoraro <marcos@f10.com.br>
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Tested-by: Yasuo Honda <yasuo.honda@gmail.com>
Tested-by: Sergei Kornilov <sk@zsrv.org>
Tested-by: Maciek Sakrejda <m.sakrejda@gmail.com>
Tested-by: Chengxi Sun <sunchengxi@highgo.com>
Tested-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Discussion: https://postgr.es/m/CA+q6zcWtUbT_Sxj0V6HY6EZ89uv5wuG5aefpe_9n0Jr3VwntFg@mail.gmail.com
2025-03-18 18:56:11 +01:00
Andres Freund
247ce06b88 aio: Add io_method=worker
The previous commit introduced the infrastructure to start io_workers. This
commit actually makes the workers execute IOs.

IO workers consume IOs from a shared memory submission queue, run traditional
synchronous system calls, and perform the shared completion handling
immediately.  Client code submits most requests by pushing IOs into the
submission queue, and waits (if necessary) using condition variables.  Some
IOs cannot be performed in another process due to lack of infrastructure for
reopening the file, and must processed synchronously by the client code when
submitted.

For now the default io_method is changed to "worker". We should re-evaluate
that around beta1, we might want to be careful and set the default to "sync"
for 18.

Reviewed-by: Noah Misch <noah@leadboat.com>
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Co-authored-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
Discussion: https://postgr.es/m/20210223100344.llw5an2aklengrmn@alap3.anarazel.de
Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m
2025-03-18 11:54:01 -04:00
Andres Freund
55b454d0e1 aio: Infrastructure for io_method=worker
This commit contains the basic, system-wide, infrastructure for
io_method=worker. It does not yet actually execute IO, this commit just
provides the infrastructure for running IO workers, kept separate for easier
review.

The number of IO workers can be adjusted with a PGC_SIGHUP GUC. Eventually
we'd like to make the number of workers dynamically scale up/down based on the
current "IO load".

To allow the number of IO workers to be increased without a restart, we need
to reserve PGPROC entries for the workers unconditionally. This has been
judged to be worth the cost. If it turns out to be problematic, we can
introduce a PGC_POSTMASTER GUC to control the maximum number.

As io workers might be needed during shutdown, e.g. for AIO during the
shutdown checkpoint, a new PMState phase is added. IO workers are shut down
after the shutdown checkpoint has been performed and walsender/archiver have
shut down, but before the checkpointer itself shuts down. See also
87a6690cc69.

Updates PGSTAT_FILE_FORMAT_ID due to the addition of a new BackendType.

Reviewed-by: Noah Misch <noah@leadboat.com>
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Co-authored-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
Discussion: https://postgr.es/m/20210223100344.llw5an2aklengrmn@alap3.anarazel.de
Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m
2025-03-18 11:54:01 -04:00
Jeff Davis
549ea06e42 Fix headerscheck warning.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/93731.1742310701@sss.pgh.pa.us
2025-03-18 08:37:07 -07:00
Tom Lane
4078da6c47 Silence compiler warning.
Assorted buildfarm members are complaining about "'process_list' may
be used uninitialized in this function" since f76892c9f, presumably
because they don't trust that the switch case labels are exhaustive.
We can silence that by initializing the variable to NULL.  Should
a switch fall-through actually happen, we'll get SIGSEGV at the
first use, which is as good as an Assert.
2025-03-18 10:54:10 -04:00
Daniel Gustafsson
daa02c6bd9 Add X25519 to the default set of curves
Since many clients default to the X25519 curve in the TLS handshake,
the fact that the server by defualt doesn't support it cause an extra
roundtrip for each TLS connection.  By adding multiple curves, which
is supported since 3d1ef3a15c3eb68da, we can reduce the risk of extra
roundtrips.

Author: Daniel Gustafsson <daniel@yesql.se>
Co-authored-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reported-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://postgr.es/m/20240616234612.6cslu7nqexquvwj7@awork3.anarazel.de
2025-03-18 15:26:27 +01:00
Robert Haas
4fd02bf7cf Add some new hooks so extensions can add details to EXPLAIN.
Specifically, add a per-node hook that is called after the per-node
information has been displayed but before we display children, and a
per-query hook that is called after existing query-level information
is printed. This assumes that extension-added information should
always go at the end rather than the beginning or the middle, but
that seems like an acceptable limitation for simplicity. It also
assumes that extensions will only want to add information, not remove
or reformat existing details; those also seem like acceptable
restrictions, at least for now.

If multiple EXPLAIN extensions are used, the order in which any
additional details are printed is likely to depend on the order in
which the modules are loaded. That seems OK, since the user may
have opinions about the order in which output should appear, and the
extension author can't really know whether their stuff is more or
less important to a particular user than some other extension.

Discussion: http://postgr.es/m/CA+TgmoYSzg58hPuBmei46o8D3SKX+SZoO4K_aGQGwiRzvRApLg@mail.gmail.com
Reviewed-by: Srinath Reddy <srinath2133@gmail.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
2025-03-18 09:28:01 -04:00
Álvaro Herrera
f76892c9ff
Simplify reindexdb coding
get_parallel_object_list() was trying to serve two masters, and it was
doing a bad job at both.  In particular, it treated the given user_list
as an output argument, but only sometimes.  This was confusing, and the
two paths through it didn't really have all that much in common, so the
complexity wasn't buying us much.  Split it in two:
get_parallel_tables_list() handles the straightforward cases for
schemas, databases and tables, takes one list as argument and returns
another list.

A new function get_parallel_tabidx_list() handles the case for indexes.
This takes a list as argument and outputs two lists, just like
get_parallel_object_list used to do, but now the API is clearer (IMO
anyway).  Another difference is that accompanying the list of indexes
now we have a list of tables as an OID list rather than a
fully-qualified table name list.  This makes some comparisons easier,
and we don't really need the names of the tables, just their OIDs.
(This requires atooid, which requires <stdlib.h>).

Author: Ranier Vilela <ranier.vf@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/CAEudQArfqr0-s0VVPSEh=0kgOgBJvFNdGW=xSL5rBcr0WDMQYQ@mail.gmail.com
2025-03-18 14:21:26 +01:00
Melanie Plageman
cc6be07ebd Increase default maintenance_io_concurrency to 16
Since its introduction in fc34b0d9de27a, the default
maintenance_io_concurrency has been larger than the default
effective_io_concurrency. maintenance_io_concurrency primarily
controlled prefetching done on behalf of the whole system, for
operations like recovery. Therefore it makes sense for it to have a
value equal to or greater than effective_io_concurrency, which controls
I/O concurrency for reading a relation in a bitmap heap scan.

ff79b5b2ab increased effective_io_concurrency to 16, so we'll increase
maintenance_io_concurrency as well. For now, though, we'll keep the
defaults of effective_io_concurrency and maintenance_io_concurrency
equal to one another (16).

On fast, high IOPs systems, significantly higher values of
maintenance_io_concurrency are observably beneficial [1]. However, such
values would flood low IOPs systems and increase overall system I/O
latency.

It is worth mentioning that since 9256822608f and c3e775e608f,
maintenance_io_concurrency also controls the I/O concurrency of each
vacuum worker. Since many autovacuum workers may be simultaneously
issuing I/Os, we want to keep maintenance_io_concurrency appropriately
conservative.

[1] https://postgr.es/m/c5d52837-6256-0556-ac8c-d6d3d558820a%40enterprisedb.com

Suggested-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Discussion: https://postgr.es/m/CAKZiRmxdHQaU%2B2Zpe6d%3Dx%3D0vigJ1sfWwwVYLJAf%3Dud_wQ_VcUw%40mail.gmail.com
2025-03-18 09:08:10 -04:00
Robert Haas
796bdda484 Fix indentation again.
Because somehow I manage to keep forgetting this.
2025-03-18 09:02:36 -04:00
Robert Haas
c65bc2e1d1 Make it possible for loadable modules to add EXPLAIN options.
Modules can use RegisterExtensionExplainOption to register new
EXPLAIN options, and GetExplainExtensionId, GetExplainExtensionState,
and SetExplainExtensionState to store related state inside the
ExplainState object.

Since this substantially increases the amount of code that needs
to handle ExplainState-related tasks, move a few bits of existing
code to a new file explain_state.c and add the rest of this
infrastructure there.

See the comments at the top of explain_state.c for further
explanation of how this mechanism works.

This does not yet provide a way for such such options to do anything
useful. The intention is that we'll add hooks for that purpose in a
separate commit.

Discussion: http://postgr.es/m/CA+TgmoYSzg58hPuBmei46o8D3SKX+SZoO4K_aGQGwiRzvRApLg@mail.gmail.com
Reviewed-by: Srinath Reddy <srinath2133@gmail.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
2025-03-18 08:41:12 -04:00
Peter Eisentraut
9d6db8bec1 Allow non-btree unique indexes for matviews
We were rejecting non-btree indexes in some cases owing to the
inability to determine the equality operators for other index AMs;
that problem no longer exists, because we can look up the equality
operator using COMPARE_EQ.

Stop rejecting these indexes, but instead rely on all unique indexes
having equality operators.  Unique indexes must have equality
operators.

Author: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-03-18 11:29:15 +01:00
Peter Eisentraut
f278e1fe30 Allow non-btree unique indexes for partition keys
We were rejecting non-btree indexes in some cases owing to the
inability to determine the equality operators for other index AMs;
that problem no longer exists, because we can look up the equality
operator using COMPARE_EQ.  The problem of not knowing the strategy
number for equality in other index AMs is already resolved.

Stop rejecting the indexes upfront, and instead reject any for which
the equality operator lookup fails.

Author: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-03-18 11:25:36 +01:00
Peter Eisentraut
7317e64126 Add some opfamily support functions to lsyscache.c
Add get_opfamily_method() and get_opfamily_member_for_cmptype() in
lsyscache.c.  No callers yet, but we'll add some soon.  This is part
of generalizing some parts of the code away from having btree
hardcoded and use CompareType instead.

Author: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-03-18 11:17:43 +01:00
Amit Kapila
122a9af5de Fix typo.
Author: vignesh C <vignesh21@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/CALDaNm1KqJ0VFfDJRPbfYi9Shz6LHFEE-Ckn+eqsePfKhebv9w@mail.gmail.com
2025-03-18 14:18:09 +05:30
Amit Kapila
01e27aab05 Use correct variable name in publicationcmds.c.
subid was used at few places for publicationid in publicationcmds.c/.h.

Author: vignesh C <vignesh21@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/CALDaNm1KqJ0VFfDJRPbfYi9Shz6LHFEE-Ckn+eqsePfKhebv9w@mail.gmail.com
2025-03-18 14:06:51 +05:30
Masahiko Sawada
c462b054ba Fix the test 005_char_signedness.
pg_upgrade test 005_char_signedness was leaving files like
delete_old_cluster.sh in the source directory for VPATH and meson
builds. The fix is to change the directory to tmp_check before running
the test.

Reported-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: http://postgr.es/m/CA+TgmoYg5e4oznn0XGoJ3+mceG1qe_JJt34rF2JLwvGS5T1hgQ@mail.gmail.com
2025-03-17 21:34:10 -07:00
Michael Paquier
17caf66445 psql: Add \sendpipeline to send query buffers while in a pipeline
In the initial pipeline support for psql added in 41625ab8ea3d, \g was
used as the way to push extended query into an ongoing pipeline.  \gx
was blocked.

These two meta-commands have format-related options that can be applied
when fetching a query result (expanded, etc.).  As the results of a
pipeline are fetched asynchronously, not at the moment of the
meta-command execution but at the moment of a \getresults or a
\endpipeline, authorizing \g while blocking \gx leads to a confusing
implementation, making one think that psql should be smart enough to
remember the output format options defined from the time when \g or \gx
were executed.  Doing so would lead to more code complications when
retrieving a batch of results.  There is an extra argument other than
simplicity here: the output format options defined at the point of a
\getresults or a \endpipeline execution should be what affect the output
format for a batch of results.

To avoid any confusion, we have settled to the introduction of a new
meta-command called \sendpipeline, replacing \g when within a pipeline.
An advantage of this design is that it is possible to add new options
specific to pipelines when sending a query buffer, independent of \g
and \gx, should it prove to be necessary.

Most of the changes of this commit happen in the regression tests, where
\g is replaced by \sendpipeline.  More tests are added to check that \g
is not allowed.

Per discussion between the author, Daniel Vérité and me.

Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Discussion: https://postgr.es/m/ad4b9f1a-f7fe-4ab8-8546-90754726d0be@manitou-mail.org
2025-03-18 09:41:21 +09:00
Andres Freund
da7226993f aio: Add core asynchronous I/O infrastructure
The main motivations to use AIO in PostgreSQL are:

a) Reduce the time spent waiting for IO by issuing IO sufficiently early.

   In a few places we have approximated this using posix_fadvise() based
   prefetching, but that is fairly limited (no completion feedback, double the
   syscalls, only works with buffered IO, only works on some OSs).

b) Allow to use Direct-I/O (DIO).

   DIO can offload most of the work for IO to hardware and thus increase
   throughput / decrease CPU utilization, as well as reduce latency.  While we
   have gained the ability to configure DIO in d4e71df6, it is not yet usable
   for real world workloads, as every IO is executed synchronously.

For portability, the new AIO infrastructure allows to implement AIO using
different methods. The choice of the AIO method is controlled by the new
io_method GUC. As of this commit, the only implemented method is "sync",
i.e. AIO is not actually executed asynchronously. The "sync" method exists to
allow to bypass most of the new code initially.

Subsequent commits will introduce additional IO methods, including a
cross-platform method implemented using worker processes and a linux specific
method using io_uring.

To allow different parts of postgres to use AIO, the core AIO infrastructure
does not need to know what kind of files it is operating on. The necessary
behavioral differences for different files are abstracted as "AIO
Targets". One example target would be smgr. For boring portability reasons,
all targets currently need to be added to an array in aio_target.c.  This
commit does not implement any AIO targets, just the infrastructure for
them. The smgr target will be added in a later commit.

Completion (and other events) of IOs for one type of file (i.e. one AIO
target) need to be reacted to differently, based on the IO operation and the
callsite. This is made possible by callbacks that can be registered on
IOs. E.g. an smgr read into a local buffer does not need to update the
corresponding BufferDesc (as there is none), but a read into shared buffers
does.  This commit does not contain any callbacks, they will be added in
subsequent commits.

For now the AIO infrastructure only understands READV and WRITEV operations,
but it is expected that more operations will be added. E.g. fsync/fdatasync,
flush_range and network operations like send/recv.

As of this commit, nothing uses the AIO infrastructure. Later commits will add
an smgr target, md.c and bufmgr.c callbacks and then finally use AIO for
read_stream.c IO, which, in one fell swoop, will convert all read stream users
to AIO.

The goal is to use AIO in many more places. There are patches to use AIO for
checkpointer and bgwriter that are reasonably close to being ready. There also
are prototypes to use it for WAL, relation extension, backend writes and many
more. Those prototypes were important to ensure the design of the AIO
subsystem is not too limiting (e.g. WAL writes need to happen in critical
sections, which influenced a lot of the design).

A future commit will add an AIO README explaining the AIO architecture and how
to use the AIO subsystem. The README is added later, as it references details
only added in later commits.

Many many more people than the folks named below have contributed with
feedback, work on semi-independent patches etc. E.g. various folks have
contributed patches to use the read stream infrastructure (added by Thomas in
b5a9b18cd0b) in more places. Similarly, a *lot* of folks have contributed to
the CI infrastructure, which I had started to work on to make adding AIO
feasible.

Some of the work by contributors has gone into the "v1" prototype of AIO,
which heavily influenced the current design of the AIO subsystem. None of the
code from that directly survives, but without the prototype, the current
version of the AIO infrastructure would not exist.

Similarly, the reviewers below have not necessarily looked at the current
design or the whole infrastructure, but have provided very valuable input. I
am to blame for problems, not they.

Author: Andres Freund <andres@anarazel.de>
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Co-authored-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Co-authored-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Antonin Houska <ah@cybertec.at>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
Discussion: https://postgr.es/m/20210223100344.llw5an2aklengrmn@alap3.anarazel.de
Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m
2025-03-17 18:51:33 -04:00
Andres Freund
02844012b3 aio: Basic subsystem initialization
This commit just does the minimal wiring up of the AIO subsystem, added in the
next commit, to the rest of the system. The next commit contains more details
about motivation and architecture.

This commit is kept separate to make it easier to review, separating the
changes across the tree, from the implementation of the new subsystem.

We discussed squashing this commit with the main commit before merging AIO,
but there has been a mild preference for keeping it separate.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
2025-03-17 18:51:33 -04:00
Nathan Bossart
65db3963ae Add commit 203c1b4cc4 to .git-blame-ignore-revs. 2025-03-17 15:58:02 -05:00
Robert Haas
203c1b4cc4 Fix indentation.
Commit 99aeb84703177308c1541e2d11c09fdc59acb724 wasn't fully
reindented prior to commit.
2025-03-17 16:06:17 -04:00
Nathan Bossart
7e05df430b pg_upgrade: Remove some dead code.
Since commit e469f0aaf3, tablespace_suffix can't be empty.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/Z9hc3mkYFKR56Xof%40nathan
2025-03-17 13:18:14 -05:00