1
0
mirror of https://github.com/postgres/postgres.git synced 2026-01-27 21:43:08 +03:00
Commit Graph

12064 Commits

Author SHA1 Message Date
Peter Eisentraut
a359d37019 Simplify and generalize PrepareSortSupportFromIndexRel()
PrepareSortSupportFromIndexRel() was accepting btree strategy numbers
purely for the purpose of comparing it later against btree strategies
to determine if the sort direction was forward or reverse.  Change
that.  Instead, pass a bool directly, to indicate the same without an
unfortunate assumption that a strategy number refers specifically to a
btree strategy.  (This is similar in spirit to commits 0d2aa4d493 and
c594f1ad2ba.)

(This could arguably be simplfied further by having the callers fill
in ssup_reverse directly.  But this way, it preserves consistency by
having all PrepareSortSupport*() variants be responsible for filling
in ssup_reverse.)

Moreover, remove the hardcoded check against BTREE_AM_OID, and check
against amcanorder instead, which is the actual requirement.

Co-authored-by: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-03-14 10:34:08 +01:00
Peter Eisentraut
05cbd6cb22 Swap order of extern/static and pg_nodiscard
When pg_nodiscard was first added, the C standard draft had it as a
function specifier, and so the code comment about placement was
written with that in mind.  The final C23 standard has it as an
attribute and the placement rules are a bit different for that.
Specifically, it needs to be before extern or static.  (Or at least
both current clang and gcc require that.)  So just swap these.  (To be
clear: The current implementation with gcc attributes doesn't care.
This change is just for maximum forward compatibility for non-gcc
compilers.)  This also keeps the order consistent with the previously
introduced pg_noreturn.  Also update the code comment to reflect the
mentioned developments since its introduction.

Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/pxr5b3z7jmkpenssra5zroxi7qzzp6eswuggokw64axmdixpnk@zbwxuq7gbbcw
2025-03-14 07:18:07 +01:00
Thomas Munro
01261fb078 Improve buffer manager API for backend pin limits.
Previously the support functions assumed that the caller needed one pin
to make progress, and could optionally use some more, allowing enough
for every connection to do the same.  Add a couple more functions for
callers that want to know:

* what the maximum possible number could be, irrespective of currently
  held pins, for space planning purposes

* how many additional pins they could acquire right now, without the
  special case allowing one pin, for callers that already hold pins and
  could already make progress even if no extra pins are available

The pin limit logic began in commit 31966b15.  This refactoring is
better suited to read_stream.c, which will be adjusted to respect the
remaining limit as it changes over time in a follow-up commit.  It also
computes MaxProportionalPins up front, to avoid performing divisions
whenever a caller needs to check the balance.

Reviewed-by: Andres Freund <andres@anarazel.de> (earlier versions)
Discussion: https://postgr.es/m/CA%2BhUKGK_%3D4CVmMHvsHjOVrK6t4F%3DLBpFzsrr3R%2BaJYN8kcTfWg%40mail.gmail.com
2025-03-14 17:13:09 +13:00
Nathan Bossart
0697b23906 Add reverse(bytea).
This commit introduces a function for reversing the order of the
bytes in binary strings.

Bumps catversion.

Author: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://postgr.es/m/CAJ7c6TMe0QVRuNssUArbMi0bJJK32%2BzNA3at5m3osrBQ25MHuw%40mail.gmail.com
2025-03-13 11:20:53 -05:00
Peter Eisentraut
3691edfab9 pg_noreturn to replace pg_attribute_noreturn()
We want to support a "noreturn" decoration on more compilers besides
just GCC-compatible ones, but for that we need to move the decoration
in front of the function declaration instead of either behind it or
wherever, which is the current style afforded by GCC-style attributes.
Also rename the macro to "pg_noreturn" to be similar to the C11
standard "noreturn".

pg_noreturn is now supported on all compilers that support C11 (using
_Noreturn), as well as GCC-compatible ones (using __attribute__, as
before), as well as MSVC (using __declspec).  (When PostgreSQL
requires C11, the latter two variants can be dropped.)

Now, all supported compilers effectively support pg_noreturn, so the
extra code for !HAVE_PG_ATTRIBUTE_NORETURN can be dropped.

This also fixes a possible problem if third-party code includes
stdnoreturn.h, because then the current definition of

    #define pg_attribute_noreturn() __attribute__((noreturn))

would cause an error.

Note that the C standard does not support a noreturn attribute on
function pointer types.  So we have to drop these here.  There are
only two instances at this time, so it's not a big loss.  In one case,
we can make up for it by adding the pg_noreturn to a wrapper function
and adding a pg_unreachable(), in the other case, the latter was
already done before.

Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/pxr5b3z7jmkpenssra5zroxi7qzzp6eswuggokw64axmdixpnk@zbwxuq7gbbcw
2025-03-13 12:37:26 +01:00
Amit Kapila
3abe9dc188 Avoid invalidating all RelationSyncCache entries on publication rename.
On Publication rename, we need to only invalidate the RelationSyncCache
entries corresponding to relations that are part of the publication being
renamed.

As part of this patch, we introduce a new invalidation message to
invalidate the cache maintained by the logical decoding output plugin. We
can't use existing relcache invalidation for this purpose, as that would
unnecessarily cause relcache invalidations in other backends.

This will improve performance by building fewer relation cache entries
during logical replication.

Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/OSCPR01MB14966C09AA201EFFA706576A7F5C92@OSCPR01MB14966.jpnprd01.prod.outlook.com
2025-03-13 09:16:33 +05:30
Masahiko Sawada
4ecdd4110d pg_rewind: Add dbname to primary_conninfo when using --write-recovery-conf.
This commit enhances pg_rewind's --write-recovery-conf option to
include the dbname in the generated primary_conninfo value when
specified in the --source-server option. With this modification, the
rewound server can connect to the primary server without manual
configuration file modifications when sync_replication_slots is
enabled.

Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Discussion: https://postgr.es/m/CAD21AoAkW=Ht0k9dVoBTCcqLiiZ2MXhVr+d=j2T_EZMerGrLWQ@mail.gmail.com
2025-03-12 16:56:04 -07:00
Heikki Linnakangas
ac4494646d Rename alloc/free functions in reorderbuffer.c
There used to be bespoken pools for these structs to reduce the
palloc/pfree overhead, but that was ripped out a long time ago and
replaced with the generic, cheaper generational memory allocator
(commit a4ccc1cef5). The Get/Return terminology made sense with the
pools, as you "got" an object from the pool and "returned" it later,
but now it just looks weird. Rename to Alloc/Free.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/c9e43d2d-8e83-444f-b111-430377368989@iki.fi
2025-03-12 22:03:39 +02:00
Melanie Plageman
ff79b5b2ab Increase default effective_io_concurrency to 16
The default effective_io_concurrency has been 1 since it was introduced
in b7b8f0b609. Referencing the associated discussion [1], it
seems 1 was chosen as a conservative value that seemed unlikely to cause
regressions.

Experimentation on high latency cloud storage as well as fast, local
nvme storage (see Discussion link) shows that even slightly higher
values improve query timings substantially. 1 actually performs worse
than 0 [2]. With effective_io_concurrency 1, we are not prefetching
enough to avoid I/O stalls, but we are issuing extra syscalls.

The new default is 16, which should be more appropriate for common
hardware while still avoiding flooding low IOPs devices with I/O
requests.

[1] https://www.postgresql.org/message-id/flat/FDDBA24E-FF4D-4654-BA75-692B3BA71B97%40enterprisedb.com
[2] https://www.postgresql.org/message-id/CAAKRu_Zv08Cic%3DqdCfzrQabpEXGrd9Z9UOW5svEVkCM6%3DFXA9g%40mail.gmail.com

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAAKRu_Z%2BJa-mwXebOoOERMMUMvJeRhzTjad4dSThxG0JLXESxw%40mail.gmail.com
2025-03-12 15:57:44 -04:00
Melanie Plageman
18cd15e706 Add connection establishment duration logging
Add log_connections option 'setup_durations' which logs durations of
several key parts of connection establishment and backend setup.

For an incoming connection, starting from when the postmaster gets a
socket from accept() and ending when the forked child backend is first
ready for query, there are multiple steps that could each take longer
than expected due to external factors. This logging provides visibility
into authentication and fork duration as well as the end-to-end
connection establishment and backend initialization time.

To make this portable, the timings captured in the postmaster (socket
creation time, fork initiation time) are passed through the
BackendStartupData.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Guillaume Lelarge <guillaume.lelarge@dalibo.com>
Discussion: https://postgr.es/m/flat/CAAKRu_b_smAHK0ZjrnL5GRxnAVWujEXQWpLXYzGbmpcZd3nLYw%40mail.gmail.com
2025-03-12 11:35:27 -04:00
Melanie Plageman
9219093cab Modularize log_connections output
Convert the boolean log_connections GUC into a list GUC comprised of the
connection aspects to log.

This gives users more control over the volume and kind of connection
logging.

The current log_connections options are 'receipt', 'authentication', and
'authorization'. The empty string disables all connection logging. 'all'
enables all available connection logging.

For backwards compatibility, the most common values for the
log_connections boolean are still supported (on, off, 1, 0, true, false,
yes, no). Note that previously supported substrings of on, off, true,
false, yes, and no are no longer supported.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/flat/CAAKRu_b_smAHK0ZjrnL5GRxnAVWujEXQWpLXYzGbmpcZd3nLYw%40mail.gmail.com
2025-03-12 11:35:21 -04:00
Masahiko Sawada
bd65cb3cd4 pg_logicalinspect: Fix possible crash when passing a directory path.
Previously, pg_logicalinspect functions were too trusting of their
input and blindly passed it to SnapBuildRestoreSnapshot(). If the
input pointed to a directory, the server could a PANIC error while
attempting to fsync_fname() with isdir=false on a directory.

This commit adds validation checks for input filenames and passes the
LSN extracted from the filename to SnapBuildRestoreSnapshot() instead
of the filename itself. It also adds regression tests for various
input patterns and permission checks.

Bug: #18828
Reported-by: Robins Tharakan <tharakan@gmail.com>
Co-authored-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Co-authored-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/18828-0f4701c635064211@postgresql.org
2025-03-11 09:56:40 -07:00
Tom Lane
8b1b342544 Improve EXPLAIN's display of window functions.
Up to now we just punted on showing the window definitions used
in a plan, with window function calls represented as "OVER (?)".
To improve that, show the window definition implemented by each
WindowAgg plan node, and reference their window names in OVER.
For nameless window clauses generated by "OVER (...)", assign
unique names w1, w2, etc.

In passing, re-order the properties shown for a WindowAgg node
so that the Run Condition (if any) appears after the Window
property and before the Filter (if any).  This seems more
sensible since the Run Condition is associated with the Window
and acts before the Filter.

Thanks to David G. Johnston and Álvaro Herrera for design
suggestions.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/144530.1741469955@sss.pgh.pa.us
2025-03-11 11:19:54 -04:00
Peter Geoghegan
426ea61117 nbtree: Make BTMaxItemSize into object-like macro.
Make nbtree's "1/3 of a page limit" BTMaxItemSize function-like macro
(which accepts a "page" argument) into an object-like macro that can be
used from code that doesn't have convenient access to an nbtree page.

Preparation for an upcoming patch that adds skip scan to nbtree.
Parallel index scans that use skip scan will serialize datums (not just
SAOP array subscripts) when scheduling primitive scans.  BTMaxItemSize
will be used by btestimateparallelscan to determine how much DSM to
request.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-Wz=H_RG5weNGeUG_TkK87tRBnH9mGCQj6WpM4V4FNWKv2g@mail.gmail.com
2025-03-11 10:35:56 -04:00
Peter Geoghegan
0fbceae841 Show index search count in EXPLAIN ANALYZE, take 2.
Expose the count of index searches/index descents in EXPLAIN ANALYZE's
output for index scan/index-only scan/bitmap index scan nodes.  This
information is particularly useful with scans that use ScalarArrayOp
quals, where the number of index searches can be unpredictable due to
implementation details that interact with physical index characteristics
(at least with nbtree SAOP scans, since Postgres 17 commit 5bf748b8).
The information shown also provides useful context when EXPLAIN ANALYZE
runs a plan with an index scan node that successfully applied the skip
scan optimization (set to be added to nbtree by an upcoming patch).

The instrumentation works by teaching all index AMs to increment a new
nsearches counter whenever a new index search begins.  The counter is
incremented at exactly the same point that index AMs already increment
the pg_stat_*_indexes.idx_scan counter (we're counting the same event,
but at the scan level rather than the relation level).  Parallel queries
have workers copy their local counter struct into shared memory when an
index scan node ends -- even when it isn't a parallel aware scan node.
An earlier version of this patch that only worked with parallel aware
scans became commit 5ead85fb (though that was quickly reverted by commit
d00107cd following "debug_parallel_query=regress" buildfarm failures).

Our approach doesn't match the approach used when tracking other index
scan related costs (e.g., "Rows Removed by Filter:").  It is comparable
to the approach used in similar cases involving costs that are only
readily accessible inside an access method, not from the executor proper
(e.g., "Heap Blocks:" output for a Bitmap Heap Scan, which was recently
enhanced to show per-worker costs by commit 5a1e6df3, using essentially
the same scheme as the one used here).  It is necessary for index AMs to
have direct responsibility for maintaining the new counter, since the
counter might need to be incremented multiple times per amgettuple call
(or per amgetbitmap call).  But it is also necessary for the executor
proper to manage the shared memory now used to transfer each worker's
counter struct to the leader.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Tomas Vondra <tomas@vondra.me>
Reviewed-By: Masahiro Ikeda <ikedamsh@oss.nttdata.com>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-WzkRqvaqR2CTNqTZP0z6FuL4-3ED6eQB0yx38XBNj1v-4Q@mail.gmail.com
Discussion: https://postgr.es/m/CAH2-Wz=PKR6rB7qbx+Vnd7eqeB5VTcrW=iJvAsTsKbdG+kW_UA@mail.gmail.com
2025-03-11 09:20:50 -04:00
Daniel Gustafsson
d35d32d711 Add special case fast-paths for strict functions
Many STRICT function calls will have one or two arguments, in which
case we can speed up checking for NULL input by avoiding setting up
a loop over the arguments. This adds EEOP_FUNCEXPR_STRICT_1 and the
corresponding EEOP_FUNCEXPR_STRICT_2 for functions with one and two
arguments respectively.

Author: Andres Freund <andres@anarazel.de>
Co-authored-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Discussion: https://postgr.es/m/415721CE-7D2E-4B74-B5D9-1950083BA03E@yesql.se
Discussion: https://postgr.es/m/20191023163849.sosqbfs5yenocez3@alap3.anarazel.de
2025-03-11 12:02:42 +01:00
Daniel Gustafsson
8dd7c7cd0a Replace EEOP_DONE with special steps for return/no return
Knowing when the side-effects of an expression is the intended result
of the execution, rather than the returnvalue, is important for being
able generate more efficient JITed code. This replaces EEOP_DONE with
two new steps: EEOP_DONE_RETURN and EEOP_DONE_NO_RETURN.  Expressions
which return a value should use the former step; expressions used for
their side-effects which don't return value should use the latter.

Author: Andres Freund <andres@anarazel.de>
Co-authored-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Discussion: https://postgr.es/m/415721CE-7D2E-4B74-B5D9-1950083BA03E@yesql.se
Discussion: https://postgr.es/m/20191023163849.sosqbfs5yenocez3@alap3.anarazel.de
2025-03-11 12:02:38 +01:00
Michael Paquier
76def4cdd7 Add WAL data to backend statistics
This commit adds per-backend WAL statistics, providing the same
information as pg_stat_wal, except that it is now possible to know how
much WAL activity is happening in each backend rather than an overall
aggregate of all the activity.  Like pg_stat_wal, the implementation
relies on pgWalUsage, tracking the difference of activity between two
reports to pgstats.

This data can be retrieved with a new system function called
pg_stat_get_backend_wal(), that returns one tuple based on the PID
provided in input.  Like pg_stat_get_backend_io(), this is useful when
joined with pg_stat_activity to get a live picture of the WAL generated
for each running backend, showing how the activity is [un]balanced.

pgstat_flush_backend() gains a new flag value, able to control the flush
of the WAL stats.

This commit relies mostly on the infrastructure provided by
9aea73fc61, that has introduced backend statistics.

Bump catalog version.  A bump of PGSTAT_FILE_FORMAT_ID is not required,
as backend stats do not persist on disk.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/Z3zqc4o09dM/Ezyz@ip-10-97-1-34.eu-west-3.compute.internal
2025-03-11 09:04:11 +09:00
Alexander Korotkov
6bb6a62f3c Use extended stats for precise estimation of bucket size in hash join
Recognizing the real-life complexity where columns in the table often have
functional dependencies, PostgreSQL's estimation of the number of distinct
values over a set of columns can be underestimated (or much rarely,
overestimated) when dealing with multi-clause JOIN.  In the case of hash
join, it can end up with a small number of predicted hash  buckets and, as
a result, picking non-optimal merge join.

To improve the situation, we introduce one additional stage of bucket size
estimation - having two or more join clauses estimator lookup for extended
statistics and use it for multicolumn estimation.  Clauses are grouped into
lists, each containing expressions referencing the same relation.  The result
of the multicolumn estimation made over such a list is combined with others
according to the caller's logic.  Clauses that are not estimated are returned
to the caller for further estimation.

Discussion: https://postgr.es/m/52257607-57f6-850d-399a-ec33a654457b%40postgrespro.ru
Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Andy Fan <zhihui.fan1213@gmail.com>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
2025-03-10 13:42:01 +02:00
David Rowley
e033696596 Fix incorrect #endif comment
Noticed while reading code in this area.
2025-03-10 13:36:04 +13:00
Peter Geoghegan
67fc4c9fd7 Make parallel nbtree index scans use an LWLock.
Teach parallel nbtree index scans to use an LWLock (not a spinlock) to
protect the scan's shared descriptor state.

Preparation for an upcoming patch that will add skip scan optimizations
to nbtree.  That patch will create the need to occasionally allocate
memory while the scan descriptor is locked, while copying datums that
were serialized by another backend.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wz=PKR6rB7qbx+Vnd7eqeB5VTcrW=iJvAsTsKbdG+kW_UA@mail.gmail.com
2025-03-08 11:10:14 -05:00
Peter Eisentraut
af4002b381 Rename amcancrosscompare
After more discussion about commit ce62f2f2a0, rename the index AM
property amcancrosscompare to two separate properties
amconsistentequality and amconsistentordering.  Also improve the
documentation and update some comments that were previously missed.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/E1tngY6-0000UL-2n%40gemulon.postgresql.org
2025-03-07 11:46:33 +01:00
Dean Rasheed
6da469bada Allow casting between bytea and integer types.
This allows smallint, integer, and bigint values to be cast to and
from bytea. The bytea value is the two's complement representation of
the integer, with the most significant byte first. For example:

  1234::bytea -> \x000004d2
  (-1234)::bytea -> \xfffffb2e

Author: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-by: Joel Jacobson <joel@compiler.org>
Reviewed-by: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/CAJ7c6TPtOp6%2BkFX5QX3fH1SVr7v65uHr-7yEJ%3DGMGQi5uhGtcA%40mail.gmail.com
2025-03-07 09:31:18 +00:00
Heikki Linnakangas
39de4f157d Fix compiler warnings about typedef redefinitions
Clang with -Wtypedef-redefinition produced warnings:

    src/include/storage/latch.h:122:3: error: redefinition of typedef 'Latch' is a C11 feature [-Werror,-Wtypedef-redefinition]

Per buildfarm
2025-03-06 03:10:22 +02:00
Heikki Linnakangas
393e0d2314 Split WaitEventSet functions to separate source file
latch.c now only contains the Latch related functions, which build on
the WaitEventSet abstraction. Most of the platform-dependent stuff is
now in waiteventset.c.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/8a507fb6-df28-49d3-81a5-ede180d7f0fb@iki.fi
2025-03-06 01:26:16 +02:00
Heikki Linnakangas
a98e4dee63 Remove unused ShutdownLatchSupport() function
The only caller was removed in commit 80a8f95b3b. I don't foresee
needing it any time soon, and I'm working on some big changes in this
area, so let's remove it out of the way.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/8a507fb6-df28-49d3-81a5-ede180d7f0fb@iki.fi
2025-03-05 23:52:04 +02:00
Peter Geoghegan
d00107cd63 Revert "Show index search count in EXPLAIN ANALYZE."
This reverts commit 5ead85fbc8.

This commit shows test failures with debug_parallel_query=regress.  The
underlying issue needs to be debugged, so revert for now.
2025-03-05 10:27:31 -05:00
Andrew Dunstan
4603903d29 Allow json{b}_strip_nulls to remove null array elements
An additional paramater ("strip_in_arrays") is added to these functions.
It defaults to false. If true, then null array elements are removed as
well as null valued object fields. JSON that just consists of a single
null is not affected.

Author: Florents Tselai <florents.tselai@gmail.com>

Discussion: https://postgr.es/m/4BCECCD5-4F40-4313-9E98-9E16BEB0B01D@gmail.com
2025-03-05 10:04:02 -05:00
Peter Geoghegan
5ead85fbc8 Show index search count in EXPLAIN ANALYZE.
Expose the count of index searches/index descents in EXPLAIN ANALYZE's
output for index scan nodes.  This information is particularly useful
with scans that use ScalarArrayOp quals, where the number of index scans
isn't predictable in advance (at least not with optimizations like the
one added to nbtree by Postgres 17 commit 5bf748b8).  It will also be
useful when EXPLAIN ANALYZE shows details of an nbtree index scan that
uses skip scan optimizations set to be introduced by an upcoming patch.

The instrumentation works by teaching index AMs to increment a new
nsearches counter whenever a new index search begins.  The counter is
incremented at exactly the same point that index AMs must already
increment the index's pg_stat_*_indexes.idx_scan counter (we're counting
the same event, but at the scan level rather than the relation level).
The new counter is stored in the scan descriptor (IndexScanDescData),
which explain.c reaches by going through the scan node's PlanState.

This approach doesn't match the approach used when tracking other index
scan specific costs (e.g., "Rows Removed by Filter:").  It is similar to
the approach used in other cases where we must track costs that are only
readily accessible inside an access method, and not from the executor
(e.g., "Heap Blocks:" output for a Bitmap Heap Scan).  It is inherently
necessary to maintain a counter that can be incremented multiple times
during a single amgettuple call (or amgetbitmap call), and directly
exposing PlanState.instrument to index access methods seems unappealing.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Tomas Vondra <tomas@vondra.me>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Masahiro Ikeda <ikedamsh@oss.nttdata.com>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wz=PKR6rB7qbx+Vnd7eqeB5VTcrW=iJvAsTsKbdG+kW_UA@mail.gmail.com
Discussion: https://postgr.es/m/CAH2-WzkRqvaqR2CTNqTZP0z6FuL4-3ED6eQB0yx38XBNj1v-4Q@mail.gmail.com
2025-03-05 09:36:48 -05:00
Heikki Linnakangas
635f580120 Rename some signal and interrupt handling functions for consistency
The usual pattern for handling a signal is that the signal handler
sets a flag and calls SetLatch(MyLatch), and CHECK_FOR_INTERRUPTS() or
other code that is part of a wait loop calls another function to deal
with it. The naming of the functions involved was a bit inconsistent,
however. CHECK_FOR_INTERRUPTS() calls ProcessInterrupts() to do the
heavy-lifting, but the analogous functions in aux processes were
called HandleMainLoopInterrupts(), HandleStartupProcInterrupts(),
etc. Similarly, most subroutines of ProcessInterrupts() were called
Process*(), but some were called Handle*().

To make things less confusing, rename all the functions that are part
of the overall signal/interrupt handling system but are not executed
in a signal handler to e.g. ProcessSomething(), rather than
HandleSomething(). The "Process" prefix is now consistently used in
the non-signal-handler functions, and the "Handle" prefix in functions
that are part of signal handlers, except for some completely unrelated
functions that clearly have nothing to do with signal or interrupt
handling.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://www.postgresql.org/message-id/8a384b26-1499-41f6-be33-64b801fb98b8@iki.fi
2025-03-05 16:22:26 +02:00
Álvaro Herrera
f4e53e10b6 Add ALTER TABLE ... ALTER CONSTRAINT ... SET [NO] INHERIT
This allows to redefine an existing non-inheritable constraint to be
inheritable, which allows to straighten up situations with NO INHERIT
constraints so that thay can become normal constraints without having to
re-verify existing data.  For existing inheritance children this may
require creating additional constraints, if they don't exist already.

It also allows to do the opposite, if only for symmetry.

Author: Suraj Kharage <suraj.kharage@enterprisedb.com>
Reviewed-by: jian he <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CAF1DzPVfOW6Kk=7SSh7LbneQDJWh=PbJrEC_Wkzc24tHOyQWGg@mail.gmail.com
2025-03-05 13:50:22 +01:00
Tomas Vondra
c878de1db4 Make FP_LOCK_SLOTS_PER_BACKEND look like a function
The FP_LOCK_SLOTS_PER_BACKEND macro looks like a constant, but it
depends on the max_locks_per_transaction GUC, and thus can change. This
is non-obvious and confusing, so make it look more like a function by
renaming it to FastPathLockSlotsPerBackend().

While at it, use the macro when initializing fast-path shared memory,
instead of using the formula.

Reported-by: Andres Freund
Discussion: https://postgr.es/m/ffiwtzc6vedo6wb4gbwelon5nefqg675t5c7an2ta7pcz646cg%40qwmkdb3l4ett
2025-03-04 18:33:12 +01:00
Michael Paquier
c76db55c90 Split pgstat_bestart() into three different routines
pgstat_bestart(), used post-authentication to set up a backend entry
in the PgBackendStatus array, so as its data becomes visible in
pg_stat_activity and related catalogs, has its logic divided into three
routines with this commit, called in order at different steps of the
backend initialization:
* pgstat_bestart_initial() sets up the backend entry with a minimal
amount of information, reporting it with a new BackendState called
STATE_STARTING while waiting for backend initialization and client
authentication to complete.  The main benefit that this offers is
observability, so as it is possible to monitor the backend activity
during authentication.  This step happens earlier than in the logic
prior to this commit.  pgstat_beinit() happens earlier as well, before
authentication.
* pgstat_bestart_security() reports the SSL/GSS status of the
connection, once authentication completes.  Auxiliary processes, for
example, do not need to call this step, hence it is optional.  This
step is called after performing authentication, same as previously.
* pgstat_bestart_final() reports the user and database IDs, takes the
entry out of STATE_STARTING, and reports its application_name.  This is
called as the last step of the three, once authentication completes.

An injection point is added, with a test checking that the "starting"
phase of a backend entry is visible in pg_stat_activity.  Some follow-up
patches are planned to take advantage of this refactoring with more
information provided in backend entries during authentication (LDAP
hanging was a problem for the author, initially).

Author: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAOYmi+=60deN20WDyCoHCiecgivJxr=98s7s7-C8SkXwrCfHXg@mail.gmail.com
2025-03-04 14:09:44 +09:00
Tom Lane
4528768d98 Remove now-dead code in StoreAttrDefault().
StoreAttrDefault() is no longer responsible for filling
attmissingval, so remove the code for that.

Get rid of RawColumnDefault.missingMode, too, as we no longer
need that to pass information around.

While here, clean up some sloppy coding in StoreAttrDefault(),
such as failure to use XXXGetDatum macros.  These aren't bugs
but they're not good code either.

Reported-by: jian he <jian.universality@gmail.com>
Author: jian he <jian.universality@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CACJufxHFssPvkP1we7WMhPD_1kwgbG52o=kQgL+TnVoX5LOyCQ@mail.gmail.com
2025-03-03 13:09:20 -05:00
Tom Lane
95f650674d Fix broken handling of domains in atthasmissing logic.
If a domain type has a default, adding a column of that type (without
any explicit DEFAULT clause) failed to install the domain's default
value in existing rows, instead leaving the new column null.  This
is unexpected, and it used to work correctly before v11.  The cause
is confusion in the atthasmissing mechanism about which default value
to install: we'd only consider installing an explicitly-specified
default, and then we'd decide that no table rewrite is needed.

To fix, take the responsibility for filling attmissingval out of
StoreAttrDefault, and instead put it into ATExecAddColumn's existing
logic that derives the correct value to fill the new column with.
Also, centralize the logic that determines the need for
default-related table rewriting there, instead of spreading it over
four or five places.

In the back branches, we'll leave the attmissingval-filling code
in StoreAttrDefault even though it's now dead, for fear that some
extension may be depending on that functionality to exist there.
A separate HEAD-only patch will clean up the now-useless code.

Reported-by: jian he <jian.universality@gmail.com>
Author: jian he <jian.universality@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CACJufxHFssPvkP1we7WMhPD_1kwgbG52o=kQgL+TnVoX5LOyCQ@mail.gmail.com
Backpatch-through: 13
2025-03-03 12:43:44 -05:00
Melanie Plageman
99f8f3fbbc Add relallfrozen to pg_class
Add relallfrozen, an estimate of the number of pages marked all-frozen
in the visibility map.

pg_class already has relallvisible, an estimate of the number of pages
in the relation marked all-visible in the visibility map. This is used
primarily for planning.

relallfrozen, together with relallvisible, is useful for estimating the
outstanding number of all-visible but not all-frozen pages in the
relation for the purposes of scheduling manual VACUUMs and tuning vacuum
freeze parameters.

A future commit will use relallfrozen to trigger more frequent vacuums
on insert-focused workloads with significant volume of frozen data.

Bump catalog version

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_aj-P7YyBz_cPNwztz6ohP%2BvWis%3Diz3YcomkB3NpYA--w%40mail.gmail.com
2025-03-03 11:18:05 -05:00
Tomas Vondra
8492feb98f Allow parallel CREATE INDEX for GIN indexes
Allow using parallel workers to build a GIN index, similarly to BTREE
and BRIN. For large tables this may result in significant speedup when
the build is CPU-bound.

The work is divided so that each worker builds index entries on a subset
of the table, determined by the regular parallel scan used to read the
data. Each worker uses a local tuplesort to sort and merge the entries
for the same key. The TID lists do not overlap (for a given key), which
means the merge sort simply concatenates the two lists. The merged
entries are written into a shared tuplesort for the leader.

The leader needs to merge the sorted entries again, before writing them
into the index. But this way a significant part of the work happens in
the workers, and the leader is left with merging fewer large entries,
which is more efficient.

Most of the parallelism infrastructure is a simplified copy of the code
used by BTREE indexes, omitting the parts irrelevant for GIN indexes
(e.g. uniqueness checks).

Original patch by me, with reviews and substantial improvements by
Matthias van de Meent, certainly enough to make him a co-author.

Author: Tomas Vondra, Matthias van de Meent
Reviewed-by: Matthias van de Meent, Andy Fan, Kirill Reshke
Discussion: https://postgr.es/m/6ab4003f-a8b8-4d75-a67f-f25ad98582dc%40enterprisedb.com
2025-03-03 16:53:06 +01:00
Masahiko Sawada
8a1012b35d Re-export NextCopyFromRawFields() to copy.h.
Commit 7717f63006 removed NextCopyFromRawFields() from copy.h. While
it was hoped that NextCopyFrom() could serve as an alternative,
certain use cases still require NextCopyFromRawFields(). For instance,
extensions like file_text_array_fdw, which process source data with an
unknown number of columns, rely on this function.

Per buildfarm member crake.

Reported-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Sutou Kouhei <kou@clear-code.com>
Discussion: https://postgr.es/m/5c7e1ac8-5083-4c08-af19-cb9ade2f16ce@dunslane.net
2025-02-28 15:11:41 -08:00
Masahiko Sawada
7717f63006 Refactor COPY FROM to use format callback functions.
This commit introduces a new CopyFromRoutine struct, which is a set of
callback routines to read tuples in a specific format. It also makes
COPY FROM with the existing formats (text, CSV, and binary) utilize
these format callbacks.

This change is a preliminary step towards making the COPY FROM command
extensible in terms of input formats.

Similar to 2e4127b6d2, this refactoring contributes to a performance
improvement by reducing the number of "if" branches that need to be
checked on a per-row basis when sending field representations in text
or CSV mode. The performance benchmark results showed ~5% performance
gain in text or CSV mode.

Author: Sutou Kouhei <kou@clear-code.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/20231204.153548.2126325458835528809.kou@clear-code.com
2025-02-28 10:29:36 -08:00
Robert Haas
77cb08be51 Avoid including explain.h in explain_format.h and explain_dr.h
As per a suggestion from Tom Lane, we do this by declaring "struct
ExplainState" here and refer to that rather than "ExplainState".

Also per Tom, CreateExplainSerializeDestReceiver was still defined
in explain.h in addition to explain_dr.h. Remove leftover prototype.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: http://postgr.es/m/CA+TgmoYtaad3i21V0jqua-fbr+CR0ix6uBvEX8_s6BG96abd=g@mail.gmail.com
2025-02-28 13:17:29 -05:00
Michael Paquier
c2a50ac678 Invent pgstat_fetch_stat_backend_by_pid()
This code is extracted from pg_stat_get_backend_io() in pgstatfuncs.c,
so as it can be shared with other areas that need backend pgstats
entries while having the benefits of the various sanity checks
refactored here.  As per its name, this retrieves backend statistics
based on a PID, with the option of retrieving a BackendType if given in
input.

Currently, this is used for the backend-level IO statistics.  The next
move would be to reuse that for the backend-level WAL statistics.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/Z3zqc4o09dM/Ezyz@ip-10-97-1-34.eu-west-3.compute.internal
2025-02-28 11:20:31 +09:00
Masahiko Sawada
2e4127b6d2 Refactor COPY TO to use format callback functions.
This commit introduces a new CopyToRoutine struct, which is a set of
callback routines to copy tuples in a specific format. It also makes
the existing formats (text, CSV, and binary) utilize these format
callbacks.

This change is a preliminary step towards making the COPY TO command
extensible in terms of output formats.

Additionally, this refactoring contributes to a performance
improvement by reducing the number of "if" branches that need to be
checked on a per-row basis when sending field representations in text
or CSV mode. The performance benchmark results showed ~5% performance
gain in text or CSV mode.

Author: Sutou Kouhei <kou@clear-code.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/20231204.153548.2126325458835528809.kou@clear-code.com
2025-02-27 15:03:52 -08:00
Robert Haas
555960a0fb Create explain_dr.c and move DestReceiver-related code there.
explain.c has grown rather large, and the code that deals with the
DestReceiver that supports the SERIALIZE option is pretty easily severable
from the rest of explain.c; hence, move it to a separate file.

Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: http://postgr.es/m/CA+TgmoYutMw1Jgo8BWUmB3TqnOhsEAJiYO=rOQufF4gPLWmkLQ@mail.gmail.com
2025-02-27 13:14:16 -05:00
Robert Haas
9173e8b604 Create explain_format.c and move relevant code there.
explain.c has grown rather large, so move various functions that
are principally concerned with output generation to a new source
file, explain_format.c, instead of lumping them in with everything
else that is part of explain.c

Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: http://postgr.es/m/CA+TgmoYutMw1Jgo8BWUmB3TqnOhsEAJiYO=rOQufF4gPLWmkLQ@mail.gmail.com
2025-02-27 12:37:10 -05:00
Peter Eisentraut
ce62f2f2a0 Generalize hash and ordering support in amapi
Stop comparing access method OID values against HASH_AM_OID and
BTREE_AM_OID, and instead check the IndexAmRoutine for an index to see
if it advertises its ability to perform the necessary ordering,
hashing, or cross-type comparing functionality.  A field amcanorder
already existed, this uses it more widely.  Fields amcanhash and
amcancrosscompare are added for the other purposes.

Author: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-02-27 17:03:31 +01:00
Tom Lane
c8c74ad7e1 Get rid of O(N^2) script-parsing overhead in pgbench.
pgbench wants to record the starting line number of each command
in its scripts.  It was computing that by scanning from the script
start and counting newlines, so that O(N^2) work had to be done
for an N-command script.  In a script with 50K lines, this adds
up to about 10 seconds on my machine.

To add insult to injury, the results were subtly wrong, because
expr_scanner_offset() scanned to find the NUL that flex inserts
at the end of the current token --- and before the first yylex
call, no such NUL has been inserted.  So we ended by computing the
script's last line number not its first one.  This was visible only
in case of \gset at the start of a script, which perhaps accounts
for the lack of complaints.

To fix, steal an idea from plpgsql and track the current lexer
ending position and line count as we advance through the script.
(It's a bit simpler than plpgsql since we can't need to back up.)
Also adjust a couple of other places that were invoking scans
from script start when they didn't really need to.  I made a new
psqlscan function psql_scan_get_location() that replaces both
expr_scanner_offset() and expr_scanner_get_lineno(), since in
practice expr_scanner_get_lineno() was only being invoked to find
the line number of the current lexer end position.

Reported-by: Daniel Vérité <daniel@manitou-mail.org>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/84a8a89e-adb8-47a9-9d34-c13f7150ee45@manitou-mail.org
2025-02-27 10:53:38 -05:00
Jeff Davis
15df9d7b51 Remove stray diff introduced by a5cbdeb98a.
Reported-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/Z77IkjmmfbFfNh3f@paquier.xyz
2025-02-26 13:37:14 -08:00
Michael Paquier
0e42d31b0b Adding new PgStat_WalCounters structure in pgstat.h
This new structure contains the counters and the data related to the WAL
activity statistics gathered from WalUsage, separated into its own
structure so as it can be shared across more than one Stats structure in
pg_stat.h.

This refactoring will be used by an upcoming patch introducing
backend-level WAL statistics.

Bump PGSTAT_FILE_FORMAT_ID.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/Z3zqc4o09dM/Ezyz@ip-10-97-1-34.eu-west-3.compute.internal
2025-02-26 16:48:54 +09:00
Michael Paquier
d7cbeaf261 Remove pgstat_flush_wal()
All the processes that generate WAL should call pgstat_report_wal() to
report all their statistics related to WAL, and this is already what
happens in the tree.  Keeping pgstat_report_wal() is confusing while the
other routine is encouraged.

This routine is not required since fc415edf8c, where it was lastly
used in pgstat_report_stat() before an equivalent callback existed.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/Z71oPkJJICrRB5Ws@paquier.xyz
2025-02-26 15:37:28 +09:00
Michael Paquier
6c349d83b6 Re-add GUC track_wal_io_timing
This commit is a rework of 2421e9a51d, about which Andres Freund has
raised some concerns as it is valuable to have both track_io_timing and
track_wal_io_timing in some cases, as the WAL write and fsync paths can
be a major bottleneck for some workloads.  Hence, it can be relevant to
not calculate the WAL timings in environments where pg_test_timing
performs poorly while capturing some IO data under track_io_timing for
the non-WAL IO paths.  The opposite can be also true: it should be
possible to disable the non-WAL timings and enable the WAL timings (the
previous GUC setups allowed this possibility).

track_wal_io_timing is added back in this commit, controlling if WAL
timings should be calculated in pg_stat_io for the read, fsync and write
paths, as done previously with pg_stat_wal.  pg_stat_wal previously
tracked only the sync and write parts (now removed), read stats is new
data tracked in pg_stat_io, all three are aggregated if
track_wal_io_timing is enabled.  The read part matters during recovery
or if a XLogReader is used.

Extra note: more control over if the types of timings calculated in
pg_stat_io could be done with a GUC that lists pairs of (IOObject,IOOp).

Reported-by: Andres Freund <andres@anarazel.de>
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/3opf2wh2oljco6ldyqf7ukabw3jijnnhno6fjb4mlu6civ5h24@fcwmhsgmlmzu
2025-02-26 09:49:59 +09:00