1
0
mirror of https://github.com/postgres/postgres.git synced 2025-11-24 00:23:06 +03:00
Commit Graph

2369 Commits

Author SHA1 Message Date
Thomas Munro
b6904afae4 ci: Use a RAM disk for NetBSD and OpenBSD.
Put the RAM disk setup for all three *BSD CI tasks into a common script,
replacing the old FreeBSD-specific one from commit 0265e5c1.  This makes
them run 3 times and a bit over 2 times faster, respectively.

NetBSD and FreeBSD now share the same one-liner to mount tmpfs.  OpenBSD
needs a GCP-image specific recipe that knows where to steal an unused
disk partition needed to reserve swap space for an mfs RAM disk, because
its tmpfs is deprecated and currently broken.  The configured size is
enough for our current tests but could potentially need future
expansion.  Thanks to Bilal for the disklabel incantation.

Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGJJ-XrPhN%2BQA4ZUfYAAXcwOSDty9t0vE9Z8__AdacKnQg%40mail.gmail.com
2025-03-04 11:29:21 +13:00
Tomas Vondra
8492feb98f Allow parallel CREATE INDEX for GIN indexes
Allow using parallel workers to build a GIN index, similarly to BTREE
and BRIN. For large tables this may result in significant speedup when
the build is CPU-bound.

The work is divided so that each worker builds index entries on a subset
of the table, determined by the regular parallel scan used to read the
data. Each worker uses a local tuplesort to sort and merge the entries
for the same key. The TID lists do not overlap (for a given key), which
means the merge sort simply concatenates the two lists. The merged
entries are written into a shared tuplesort for the leader.

The leader needs to merge the sorted entries again, before writing them
into the index. But this way a significant part of the work happens in
the workers, and the leader is left with merging fewer large entries,
which is more efficient.

Most of the parallelism infrastructure is a simplified copy of the code
used by BTREE indexes, omitting the parts irrelevant for GIN indexes
(e.g. uniqueness checks).

Original patch by me, with reviews and substantial improvements by
Matthias van de Meent, certainly enough to make him a co-author.

Author: Tomas Vondra, Matthias van de Meent
Reviewed-by: Matthias van de Meent, Andy Fan, Kirill Reshke
Discussion: https://postgr.es/m/6ab4003f-a8b8-4d75-a67f-f25ad98582dc%40enterprisedb.com
2025-03-03 16:53:06 +01:00
Masahiko Sawada
7717f63006 Refactor COPY FROM to use format callback functions.
This commit introduces a new CopyFromRoutine struct, which is a set of
callback routines to read tuples in a specific format. It also makes
COPY FROM with the existing formats (text, CSV, and binary) utilize
these format callbacks.

This change is a preliminary step towards making the COPY FROM command
extensible in terms of input formats.

Similar to 2e4127b6d2, this refactoring contributes to a performance
improvement by reducing the number of "if" branches that need to be
checked on a per-row basis when sending field representations in text
or CSV mode. The performance benchmark results showed ~5% performance
gain in text or CSV mode.

Author: Sutou Kouhei <kou@clear-code.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/20231204.153548.2126325458835528809.kou@clear-code.com
2025-02-28 10:29:36 -08:00
Masahiko Sawada
2e4127b6d2 Refactor COPY TO to use format callback functions.
This commit introduces a new CopyToRoutine struct, which is a set of
callback routines to copy tuples in a specific format. It also makes
the existing formats (text, CSV, and binary) utilize these format
callbacks.

This change is a preliminary step towards making the COPY TO command
extensible in terms of output formats.

Additionally, this refactoring contributes to a performance
improvement by reducing the number of "if" branches that need to be
checked on a per-row basis when sending field representations in text
or CSV mode. The performance benchmark results showed ~5% performance
gain in text or CSV mode.

Author: Sutou Kouhei <kou@clear-code.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/20231204.153548.2126325458835528809.kou@clear-code.com
2025-02-27 15:03:52 -08:00
Michael Paquier
0e42d31b0b Adding new PgStat_WalCounters structure in pgstat.h
This new structure contains the counters and the data related to the WAL
activity statistics gathered from WalUsage, separated into its own
structure so as it can be shared across more than one Stats structure in
pg_stat.h.

This refactoring will be used by an upcoming patch introducing
backend-level WAL statistics.

Bump PGSTAT_FILE_FORMAT_ID.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/Z3zqc4o09dM/Ezyz@ip-10-97-1-34.eu-west-3.compute.internal
2025-02-26 16:48:54 +09:00
Andres Freund
ecbff4378b Change _mdfd_segpath() to return paths by value
This basically mirrors the changes done in the predecessor commit. While there
isn't currently a need to get these paths in critical sections, it seems a
shame to unnecessarily allocate memory in these paths now that relpath()
doesn't allocate anymore.

Discussion: https://postgr.es/m/xeri5mla4b5syjd5a25nok5iez2kr3bm26j2qn4u7okzof2bmf@kwdh2vf7npra
2025-02-25 09:02:07 -05:00
Andres Freund
37c87e63f9 Change relpath() et al to return path by value
For AIO, and also some other recent patches, we need the ability to call
relpath() in a critical section. Until now that was not feasible, as it
allocated memory.

The fact that relpath() allocated memory also made it awkward to use in log
messages because we had to take care to free the memory afterwards. Which we
e.g. didn't do for when zeroing out an invalid buffer.

We discussed other solutions, e.g. filling a pre-allocated buffer that's
passed to relpath(), but they all came with plenty downsides or were larger
projects. The easiest fix seems to be to make relpath() return the path by
value.

To be able to return the path by value we need to determine the maximum length
of a relation path. This patch adds a long #define that computes the exact
maximum, which is verified to be correct in a regression test.

As this change the signature of relpath(), extensions using it will need to
adapt their code. We discussed leaving a backward-compat shim in place, but
decided it's not worth it given the use of relpath() doesn't seem widespread.

Discussion: https://postgr.es/m/xeri5mla4b5syjd5a25nok5iez2kr3bm26j2qn4u7okzof2bmf@kwdh2vf7npra
2025-02-25 09:02:07 -05:00
Michael Paquier
2421e9a51d Remove read/sync fields from pg_stat_wal and GUC track_wal_io_timing
The four following attributes are removed from pg_stat_wal:
* wal_write
* wal_sync
* wal_write_time
* wal_sync_time

a051e71e28 has added an equivalent of this information in pg_stat_io
with more granularity as this now spreads across the backend types, IO
context and IO objects.  So, keeping the same information in pg_stat_wal
has little benefits.

Another benefit of this commit is the removal of PendingWalStats,
simplifying an upcoming patch to add per-backend WAL statistics, which
already support IO statistics and which have access to the write/sync
stats data of WAL.

The GUC track_wal_io_timing, that was used to enable or disable the
aggregation of the write and sync timings for WAL, is also removed.
pgstat_prepare_io_time() is simplified.

Bump catalog version.
Bump PGSTAT_FILE_FORMAT_ID, due to the update of PgStat_WalStats.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/Z7RkQ0EfYaqqjgz/@ip-10-97-1-34.eu-west-3.compute.internal
2025-02-24 09:51:56 +09:00
Andrew Dunstan
8e4d72573c Ignore blank lines in pgindent exclude files
Currently a blank line matches everything, which is almost never what
someone would want. If they really want that they can use a wildcard
regex to do it.

Author: Zsolt Parragi <zsolt.parragi@percona.com>

Discussion: https://postgr.es/m/CAN4CZFNka+2q3=-Dithr4w65RJfwPaV92T62spEzLn+T4MgcMg@mail.gmail.com
2025-02-20 11:36:07 -05:00
Daniel Gustafsson
b3f0be788a Add support for OAUTHBEARER SASL mechanism
This commit implements OAUTHBEARER, RFC 7628, and OAuth 2.0 Device
Authorization Grants, RFC 8628.  In order to use this there is a
new pg_hba auth method called oauth.  When speaking to a OAuth-
enabled server, it looks a bit like this:

  $ psql 'host=example.org oauth_issuer=... oauth_client_id=...'
  Visit https://oauth.example.org/login and enter the code: FPQ2-M4BG

Device authorization is currently the only supported flow so the
OAuth issuer must support that in order for users to authenticate.
Third-party clients may however extend this and provide their own
flows.  The built-in device authorization flow is currently not
supported on Windows.

In order for validation to happen server side a new framework for
plugging in OAuth validation modules is added.  As validation is
implementation specific, with no default specified in the standard,
PostgreSQL does not ship with one built-in.  Each pg_hba entry can
specify a specific validator or be left blank for the validator
installed as default.

This adds a requirement on libcurl for the client side support,
which is optional to build, but the server side has no additional
build requirements.  In order to run the tests, Python is required
as this adds a https server written in Python.  Tests are gated
behind PG_TEST_EXTRA as they open ports.

This patch has been a multi-year project with many contributors
involved with reviews and in-depth discussions:  Michael Paquier,
Heikki Linnakangas, Zhihong Yu, Mahendrakar Srinivasarao, Andrey
Chudnovsky and Stephen Frost to name a few.  While Jacob Champion
is the main author there have been some levels of hacking by others.
Daniel Gustafsson contributed the validation module and various bits
and pieces; Thomas Munro wrote the client side support for kqueue.

Author: Jacob Champion <jacob.champion@enterprisedb.com>
Co-authored-by: Daniel Gustafsson <daniel@yesql.se>
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Antonin Houska <ah@cybertec.at>
Reviewed-by: Kashif Zeeshan <kashi.zeeshan@gmail.com>
Discussion: https://postgr.es/m/d1b467a78e0e36ed85a09adf979d04cf124a9d4b.camel@vmware.com
2025-02-20 16:25:17 +01:00
Jeff Davis
1fd1bd8710 Transfer statistics during pg_upgrade.
Add support to pg_dump for dumping stats, and use that during
pg_upgrade so that statistics are transferred during upgrade. In most
cases this removes the need for a costly re-analyze after upgrade.

Some statistics are not transferred, such as extended statistics or
statistics with a custom stakind.

Now pg_dump accepts the options --schema-only, --no-schema,
--data-only, --no-data, --statistics-only, and --no-statistics; which
allow all combinations of schema, data, and/or stats. The options are
named this way to preserve compatibility with the previous
--schema-only and --data-only options.

Statistics are in SECTION_DATA, unless the object itself is in
SECTION_POST_DATA.

The stats are represented as calls to pg_restore_relation_stats() and
pg_restore_attribute_stats().

Author: Corey Huinker, Jeff Davis
Reviewed-by: Jian He
Discussion: https://postgr.es/m/CADkLM=fzX7QX6r78fShWDjNN3Vcr4PVAnvXxQ4DiGy6V=0bCUA@mail.gmail.com
Discussion: https://postgr.es/m/CADkLM%3DcB0rF3p_FuWRTMSV0983ihTRpsH%2BOCpNyiqE7Wk0vUWA%40mail.gmail.com
2025-02-20 01:29:06 -08:00
Álvaro Herrera
80d7f99049 Add ATAlterConstraint struct for ALTER .. CONSTRAINT
Replace the use of Constraint with a new ATAlterConstraint struct, which
allows us to pass additional information.  No functionality is added by
this commit.  This is necessary for future work that allows altering
constraints in other ways.

I (Álvaro) took the liberty of restructuring the code for ALTER
CONSTRAINT beyond what Amul did.  The original coding before Amul's
patch was unnecessarily baroque, and this change makes things simpler
by removing one level of subroutine.  Also, partly remove the assumption
that only partitioned tables are relevant (by passing sensible 'recurse'
arguments) and no longer ignore whether ONLY was specified.  I say
'partly' because the current coding only walks down via the 'conparentid'
relationship, which is only used for partitioned tables; but future
patches could handle ONLY or not for other types of constraint changes
for legacy inheritance trees too.

Author: Amul Sul <sulamul@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/CAAJ_b94bfgPV-8Mw_HwSBeheVwaK9=5s+7+KbBj_NpwXQFgDGg@mail.gmail.com
2025-02-19 13:06:13 +01:00
Amit Kapila
ac0e33136a Invalidate inactive replication slots.
This commit introduces idle_replication_slot_timeout GUC that allows
inactive slots to be invalidated at the time of checkpoint. Because
checkpoints happen checkpoint_timeout intervals, there can be some lag
between when the idle_replication_slot_timeout was exceeded and when the
slot invalidation is triggered at the next checkpoint. To avoid such lags,
users can force a checkpoint to promptly invalidate inactive slots.

Note that the idle timeout invalidation mechanism is not applicable for
slots that do not reserve WAL or for slots on the standby server that are
synced from the primary server (i.e., standby slots having 'synced' field
'true'). Synced slots are always considered to be inactive because they
don't perform logical decoding to produce changes.

The slots can become inactive for a long period if a subscriber is down
due to a system error or inaccessible because of network issues. If such a
situation persists, it might be more practical to recreate the subscriber
rather than attempt to recover the node and wait for it to catch up which
could be time-consuming.

Then, external tools could create replication slots (e.g., for migrations
or upgrades) that may fail to remove them if an error occurs, leaving
behind unused slots that take up space and resources. Manually cleaning
them up can be tedious and error-prone, and without intervention, these
lingering slots can cause unnecessary WAL retention and system bloat.

As the duration of idle_replication_slot_timeout is in minutes, any test
using that would be time-consuming. We are planning to commit a follow up
patch for tests by using the injection point framework.

Author: Nisha Moond <nisha.moond412@gmail.com>
Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com
Discussion: https://postgr.es/m/OS0PR01MB5716C131A7D80DAE8CB9E88794FC2@OS0PR01MB5716.jpnprd01.prod.outlook.com
2025-02-19 09:29:50 +05:30
Daniel Gustafsson
95ef3d9029 Add PGErrorVerbosity to typedefs.list
PGErrorVerbosity was missing which resulted in incorrect whitespace
alignment going back all the way to e3860ffa4d.  No backpatch for
this though since we don't pgindent backbranches.

Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAGECzQTVi8n-HW4Q27je-b9ckQk7zf6bS_it42gNvQu+DX0NCQ@mail.gmail.com
2025-02-18 13:23:13 +01:00
Alexander Korotkov
fc069a3a63 Implement Self-Join Elimination
The Self-Join Elimination (SJE) feature removes an inner join of a plain
table to itself in the query tree if it is proven that the join can be
replaced with a scan without impacting the query result.  Self-join and
inner relation get replaced with the outer in query, equivalence classes,
and planner info structures.  Also, the inner restrictlist moves to the
outer one with the removal of duplicated clauses.  Thus, this optimization
reduces the length of the range table list (this especially makes sense for
partitioned relations), reduces the number of restriction clauses and,
in turn, selectivity estimations, and potentially improves total planner
prediction for the query.

This feature is dedicated to avoiding redundancy, which can appear after
pull-up transformations or the creation of an EquivalenceClass-derived clause
like the below.

  SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3);
  SELECT * FROM t1 WHERE EXISTS (SELECT t3.x FROM t1 t3 WHERE t3.x = t1.x);
  SELECT * FROM t1,t2, t1 t3 WHERE t1.x = t2.x AND t2.x = t3.x;

In the future, we could also reduce redundancy caused by subquery pull-up
after unnecessary outer join removal in cases like the one below.

  SELECT * FROM t1 WHERE x IN
    (SELECT t3.x FROM t1 t3 LEFT JOIN t2 ON t2.x = t1.x);

Also, it can drastically help to join partitioned tables, removing entries
even before their expansion.

The SJE proof is based on innerrel_is_unique() machinery.

We can remove a self-join when for each outer row:

 1. At most, one inner row matches the join clause;
 2. Each matched inner row must be (physically) the same as the outer one;
 3. Inner and outer rows have the same row mark.

In this patch, we use the next approach to identify a self-join:

 1. Collect all merge-joinable join quals which look like a.x = b.x;
 2. Add to the list above the baseretrictinfo of the inner table;
 3. Check innerrel_is_unique() for the qual list.  If it returns false, skip
    this pair of joining tables;
 4. Check uniqueness, proved by the baserestrictinfo clauses. To prove the
    possibility of self-join elimination, the inner and outer clauses must
    match exactly.

The relation replacement procedure is not trivial and is partly combined
with the one used to remove useless left joins.  Tests covering this feature
were added to join.sql.  Some of the existing regression tests changed due
to self-join removal logic.

Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Author: Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Co-authored-by: Alena Rybakina <lena.ribackina@yandex.ru>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Simon Riggs <simon@2ndquadrant.com>
Reviewed-by: Jonathan S. Katz <jkatz@postgresql.org>
Reviewed-by: David Rowley <david.rowley@2ndquadrant.com>
Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com>
Reviewed-by: Konstantin Knizhnik <k.knizhnik@postgrespro.ru>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Hywel Carver <hywel@skillerwhale.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Ronan Dunklau <ronan.dunklau@aiven.io>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Reviewed-by: Greg Stark <stark@mit.edu>
Reviewed-by: Jaime Casanova <jcasanov@systemguards.com.ec>
Reviewed-by: Michał Kłeczek <michal@kleczek.org>
Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
2025-02-17 12:44:12 +02:00
Andres Freund
c45963c5d5 ci: Collect core files on NetBSD and OpenBSD
Support for NetBSD and OpenBSD operating systems have been added to CI in the
prior commit. Now add support for collect core files and generating backtraces
using for all core files.

Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CAN55FZ32ySyYa06k9MFd+VY5vHhUyBpvgmJUZae5PihjzaurVg@mail.gmail.com
2025-02-12 09:40:20 -05:00
Andres Freund
e291573534 ci: Test NetBSD and OpenBSD
NetBSD and OpenBSD Postgres CI images are now generated [1], but aren't yet
utilized for Postgres' CI. This commit adds CI support for them.

For now the tasks will be manually triggered, to save on CI credits.

[1] https://github.com/anarazel/pg-vm-images

Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CAN55FZ32ySyYa06k9MFd+VY5vHhUyBpvgmJUZae5PihjzaurVg@mail.gmail.com
2025-02-12 09:40:07 -05:00
Tom Lane
c366d2bdba Allow extension functions to participate in in-place updates.
Commit 1dc5ebc90 allowed PL/pgSQL to perform in-place updates
of expanded-object variables that are being updated with
assignments like "x := f(x, ...)".  However this was allowed
only for a hard-wired list of functions f(), since we need to
be sure that f() will not modify the variable if it fails.
It was always envisioned that we should make that extensible,
but at the time we didn't have a good way to do so.  Since
then we've invented the idea of "support functions" to allow
attaching specialized optimization knowledge to functions,
and that is a perfect mechanism for doing this.

Hence, adjust PL/pgSQL to use a support function request instead
of hard-wired logic to decide if in-place update is safe.
Preserve the previous optimizations by creating support functions
for the three functions that were previously hard-wired.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Discussion: https://postgr.es/m/CACxu=vJaKFNsYxooSnW1wEgsAO5u_v1XYBacfVJ14wgJV_PYeg@mail.gmail.com
2025-02-11 12:49:34 -05:00
Tom Lane
6c7251db0c Implement new optimization rule for updates of expanded variables.
If a read/write expanded variable is declared locally to the
assignment statement that is updating it, and it is referenced
exactly once in the assignment RHS, then we can optimize the
operation as a direct update of the expanded value, whether
or not the function(s) operating on it can be trusted not to
modify the value before throwing an error.  This works because
if an error does get thrown, we no longer care what value the
variable has.

In cases where that doesn't work, fall back to the previous
rule that checks for safety of the top-level function.

In any case, postpone determination of whether these optimizations
are feasible until we are executing a Param referencing the target
variable and that variable holds a R/W expanded object.  While the
previous incarnation of exec_check_rw_parameter was pretty cheap,
this is a bit less so, and our plan to invoke support functions
will make it even less so.  So avoiding the check for variables
where it couldn't be useful should be a win.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Discussion: https://postgr.es/m/CACxu=vJaKFNsYxooSnW1wEgsAO5u_v1XYBacfVJ14wgJV_PYeg@mail.gmail.com
2025-02-11 12:34:59 -05:00
Andres Freund
ac00ff1c96 Add test of various escape functions
As highlighted by the prior commit, writing correct escape functions is less
trivial than one might hope.

This test module tries to verify that different escaping functions behave
reasonably. It e.g. tests:

- Invalidly encoded input to an escape function leads to invalidly encoded
  output

- Trailing incomplete multi-byte characters are handled sensibly

- Escaped strings are parsed as single statement by psql's parser (which
  derives from the backend parser)

There are further tests that would be good to add. But even in the current
state it was rather useful for writing the fix in the prior commit.

Reviewed-by: Noah Misch <noah@leadboat.com>
Backpatch-through: 13
Security: CVE-2025-1094
2025-02-10 10:03:37 -05:00
Peter Eisentraut
c09e5a6a01 Convert strategies to and from compare types
For each Index AM, provide a mapping between operator strategies and
the system-wide generic concept of a comparison type.  For example,
for btree, BTLessStrategyNumber maps to and from COMPARE_LT.  Numerous
places in the planner and executor think directly in terms of btree
strategy numbers (and a few in terms of hash strategy numbers.)  These
should be converted over subsequent commits to think in terms of
CompareType instead.  (This commit doesn't make any use of this API
yet.)

Author: Mark Dilger <mark.dilger@enterprisedb.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-02-02 10:26:04 +01:00
Peter Eisentraut
2452e71ff2 Add script to keep .editorconfig in sync with .gitattributes
Our repo already contained an .editorconfig file, but it was not kept
up to date with .gitattributes.  This adds a script that keeps these
files in sync.  A big advantage of the editorconfig file is that it
many editors/IDEs get automatically configured to trim trailing
newlines and add a final newline on save, while .gitattributes only
complains about these problems instead of automatically fixing them.

This also adds rules to .gitattributes for Python files as well as for
C files in pg_bsd_indent directory (which have a different tab_width
than most C files due to being vendored in).

Author: Jelte Fennema-Nio <github-tech@jeltef.nl>
Discussion: https://www.postgresql.org/message-id/flat/CAGECzQQGzbroAXi+Yicp3HvcCo4=g84kaOgjuvQ5MW9F0ubOGg@mail.gmail.com
2025-02-01 10:09:45 +01:00
Amit Kapila
e65dbc9927 Change publication's publish_generated_columns option type to enum.
The current boolean publish_generated_columns option only supports a
binary choice, which is insufficient for future enhancements where
generated columns can be of different types (e.g., stored or virtual). The
supported values for the publish_generated_columns option are 'none' and
'stored'.

Author: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/d718d219-dd47-4a33-bb97-56e8fc4da994@eisentraut.org
Discussion: https://postgr.es/m/B80D17B2-2C8E-4C7D-87F2-E5B4BE3C069E@gmail.com
2025-01-23 15:28:37 +05:30
Michael Paquier
be31ac2519 Run perltidy
A follow-up patch will adjust the TAP tests to follow a more-structured
format for option lists in commands, that perltidy is able to cope
better with.  Putting the tree first in a clean state makes the next
change a bit easier.  v20230309 has been used.

Author: Dagfinn Ilmari Mannsåker
Discussion: https://postgr.es/m/87jzc46d8u.fsf@wibble.ilmari.org
2025-01-22 10:15:32 +09:00
Peter Eisentraut
6e4df237fb Add PageData C type
This adds the C type PageData and makes the existing type Page a
pointer to it.  This follows the usual PostgreSQL C type naming scheme
of Foo/FooData pairs.  (Prior to commit ddbba3aac8, PageData existed
as an unrelated type.)  The type definitions are compatible, so this
doesn't change anything except some of the naming.

Discussion: https://www.postgresql.org/message-id/flat/692ee0da-49da-4d32-8dca-da224cc2800e@eisentraut.org
2025-01-20 11:06:49 +01:00
Jeff Davis
286a365b9c Support Unicode full case mapping and conversion.
Generate tables from Unicode SpecialCasing.txt to support more
sophisticated case mapping behavior:

 * support case mappings to multiple codepoints, such as "ß"
   uppercasing to "SS"
 * support conditional case mappings, such as the "final sigma"
 * support titlecase variants, such as "dž" uppercasing to "DŽ" but
   titlecasing to "Dž"

Discussion: https://postgr.es/m/ddfd67928818f138f51635712529bc5e1d25e4e7.camel@j-davis.com
Discussion: https://postgr.es/m/27bb0e52-801d-4f73-a0a4-02cfdd4a9ada@eisentraut.org
Reviewed-by: Peter Eisentraut, Daniel Verite
2025-01-17 15:56:20 -08:00
Melanie Plageman
f7a8fc10cc Add and use BitmapHeapScanDescData struct
Move the several members of HeapScanDescData which are specific to
Bitmap Heap Scans into a new struct, BitmapHeapScanDescData, which
inherits from HeapScanDescData.

This reduces the size of the HeapScanDescData for other types of scans
and will allow us to add additional bitmap heap scan-specific members in
the future without fear of bloating the HeapScanDescData.

Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/c736f6aa-8b35-4e20-9621-62c7c82e2168%40vondra.me
2025-01-16 18:42:39 -05:00
Tom Lane
d7674c9fab Seek zone abbreviations in the IANA data before timezone_abbreviations.
If a time zone abbreviation used in datetime input is defined in
the currently active timezone, use that definition in preference
to looking in the timezone_abbreviations list.  That allows us to
correctly handle abbreviations that have different meanings in
different timezones.  Also, it eliminates an inconsistency between
datetime input and datetime output: the non-ISO datestyles for
timestamptz have always printed abbreviations taken from the IANA
data, not from timezone_abbreviations.  Before this fix, it was
possible to demonstrate cases where casting a timestamp to text
and back fails or changes the value significantly because of that
inconsistency.

While this change removes the ability to override the IANA data about
an abbreviation known in the current zone, it's not clear that there's
any real use-case for doing so.  But it is clear that this makes life
a lot easier for dealing with abbreviations that have conflicts across
different time zones.

Also update the pg_timezone_abbrevs view to report abbreviations
that are recognized via the IANA data, and *not* report any
timezone_abbreviations entries that are thereby overridden.
Under the hood, there are now two SRFs, one that pulls the IANA
data and one that pulls timezone_abbreviations entries.  They're
combined by logic in the view.  This approach was useful for
debugging (since the functions can be called on their own).
While I don't intend to document the functions explicitly,
they might be useful to call directly.

Also improve DecodeTimezoneAbbrev's caching logic so that it can
cache zone abbreviations found in the IANA data.  Without that,
this patch would have caused a noticeable degradation of the
runtime of timestamptz_in.

Per report from Aleksander Alekseev and additional investigation.

Discussion: https://postgr.es/m/CAJ7c6TOATjJqvhnYsui0=CO5XFMF4dvTGH+skzB--jNhqSQu5g@mail.gmail.com
2025-01-16 14:11:19 -05:00
Dean Rasheed
80feb727c8 Add OLD/NEW support to RETURNING in DML queries.
This allows the RETURNING list of INSERT/UPDATE/DELETE/MERGE queries
to explicitly return old and new values by using the special aliases
"old" and "new", which are automatically added to the query (if not
already defined) while parsing its RETURNING list, allowing things
like:

  RETURNING old.colname, new.colname, ...

  RETURNING old.*, new.*

Additionally, a new syntax is supported, allowing the names "old" and
"new" to be changed to user-supplied alias names, e.g.:

  RETURNING WITH (OLD AS o, NEW AS n) o.colname, n.colname, ...

This is useful when the names "old" and "new" are already defined,
such as inside trigger functions, allowing backwards compatibility to
be maintained -- the interpretation of any existing queries that
happen to already refer to relations called "old" or "new", or use
those as aliases for other relations, is not changed.

For an INSERT, old values will generally be NULL, and for a DELETE,
new values will generally be NULL, but that may change for an INSERT
with an ON CONFLICT ... DO UPDATE clause, or if a query rewrite rule
changes the command type. Therefore, we put no restrictions on the use
of old and new in any DML queries.

Dean Rasheed, reviewed by Jian He and Jeff Davis.

Discussion: https://postgr.es/m/CAEZATCWx0J0-v=Qjc6gXzR=KtsdvAE7Ow=D=mu50AgOe+pvisQ@mail.gmail.com
2025-01-16 14:57:35 +00:00
Peter Eisentraut
44512e7c95 Add a bit of documentation related to IWYU
Add some basic information about IWYU to src/tools/pginclude/README.

Discussion: https://www.postgresql.org/message-id/flat/9395d484-eff4-47c2-b276-8e228526c8ae@eisentraut.org
2025-01-15 18:57:53 +01:00
Peter Eisentraut
6fdd5d9563 Drop warning-free support for Flex 2.5.35
This removes all the various workarounds for avoiding compiler
warnings with Flex 2.5.35.  Several recent patches have added
additional warnings that would either need to be fixed along the lines
of the existing workarounds, or we decide to no longer care about
this, which we do here.

Flex 2.5.35 is extremely outdated, and you can't even download it
anymore from any of the Flex project sites, so it's nearly impossible
to support.

After this, using Flex 2.5.35 will still work, but the generated code
will produce numerous compiler warnings.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/1a204ccd-7ae6-478c-a431-407b5c48ccc6@eisentraut.org
2025-01-15 15:35:08 +01:00
Peter Eisentraut
6339f6468e Rename RowCompareType to CompareType
RowCompareType served as a way to describe the fundamental meaning of
an operator, notionally independent of an operator class (although so
far this was only really supported for btrees).  Its original purpose
was for use inside RowCompareExpr, and it has also found some small
use outside, such as for get_op_btree_interpretation().

We want to expand this now, as a more general way to describe operator
semantics for other index access methods, including gist (to improve
GistTranslateStratnum()) and others not written yet.  To avoid future
confusion, we rename the type to CompareType and the symbols from
ROWCOMPARE_XXX to COMPARE_XXX to reflect their more general purpose.

Reviewed-by: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-01-15 08:44:01 +01:00
Heikki Linnakangas
af8cd1639a Fix catcache invalidation of a list entry that's being built
If a new catalog tuple is inserted that belongs to a catcache list
entry, and cache invalidation happens while the list entry is being
built, the list entry might miss the newly inserted tuple.

To fix, change the way we detect concurrent invalidations while a
catcache entry is being built. Keep a stack of entries that are being
built, and apply cache invalidation to those entries in addition to
the real catcache entries. This is similar to the in-progress list in
relcache.c.

Back-patch to all supported versions.

Reviewed-by: Noah Misch
Discussion: https://www.postgresql.org/message-id/2234dc98-06fe-42ed-b5db-ac17384dc880@iki.fi
2025-01-14 14:28:49 +02:00
Michael Paquier
2c14037bb5 Refactor some code related to backend statistics
This commit changes the way pending backend statistics are tracked by
moving them into a new structure called PgStat_BackendPending, removing
PgStat_BackendPendingIO.  PgStat_BackendPending currently only includes
PgStat_PendingIO for the pending I/O stats.

pgstat_flush_backend() is extended with a "flags" argument to control
which parts of the stats of a backend should be flushed.

With this refactoring, it becomes easier to plug into backend statistics
more data.  A patch to add information related to WAL in this stats kind
is under discussion.

Author: Bertrand Drouvot
Discussion: https://postgr.es/m/Z3zqc4o09dM/Ezyz@ip-10-97-1-34.eu-west-3.compute.internal
2025-01-10 09:00:48 +09:00
Andrew Dunstan
b20fe54c9c Set exit status for pgindent if pg_bsd_indent fails
Also document the exit codes in the script.

The new exit code is 3, and is not overridden by the exit code set in
--check mode.

Author: Ashutosh Bapat

Discussion: https://postgr.es/m/CAExHW5sPRSiFeLdP-u1Fa5ba7YS2f0gvLjmKOobopKadJwQ_GQ@mail.gmail.com
2025-01-08 10:56:12 -05:00
Jeff Davis
834c9e807c Add missing typedefs.list entry for AggStatePerGroupData.
Discussion: https://postgr.es/m/7530bd8783b1a78d53a3c70383e38d8da0a5ffe5.camel%40j-davis.com
2025-01-07 14:33:21 -08:00
Bruce Momjian
50e6eb731d Update copyright for 2025
Backpatch-through: 13
2025-01-01 11:21:55 -05:00
Peter Eisentraut
5af699066f Remove pgrminclude and associated scripts
Per git log, the last time someone tried to do something with
pgrminclude was around 2011.  And it's always had a tendency of
causing trouble when it was active.  Also, pgcominclude is redundant
with headerscheck.

Discussion: https://www.postgresql.org/message-id/flat/2d4dc7b2-cb2e-49b1-b8ca-ba5f7024f05b%40eisentraut.org
2024-12-24 14:02:42 +01:00
David Rowley
5983a4cffc Introduce CompactAttribute array in TupleDesc, take 2
The new compact_attrs array stores a few select fields from
FormData_pg_attribute in a more compact way, using only 16 bytes per
column instead of the 104 bytes that FormData_pg_attribute uses.  Using
CompactAttribute allows performance-critical operations such as tuple
deformation to be performed without looking at the FormData_pg_attribute
element in TupleDesc which means fewer cacheline accesses.

For some workloads, tuple deformation can be the most CPU intensive part
of processing the query.  Some testing with 16 columns on a table
where the first column is variable length showed around a 10% increase in
transactions per second for an OLAP type query performing aggregation on
the 16th column.  However, in certain cases, the increases were much
higher, up to ~25% on one AMD Zen4 machine.

This also makes pg_attribute.attcacheoff redundant.  A follow-on commit
will remove it, thus shrinking the FormData_pg_attribute struct by 4
bytes.

Author: David Rowley
Reviewed-by: Andres Freund, Victor Yegorov
Discussion: https://postgr.es/m/CAApHDvrBztXP3yx=NKNmo3xwFAFhEdyPnvrDg3=M0RhDs+4vYw@mail.gmail.com
2024-12-20 22:31:26 +13:00
Tom Lane
2762792952 Convert SetOp to read its inputs as outerPlan and innerPlan.
The original design for set operations involved appending the two
input relations into one and adding a flag column that allows
distinguishing which side each row came from.  Then the SetOp node
pries them apart again based on the flag.  This is bizarre.  The
only apparent reason to do it is that when sorting, we'd only need
one Sort node not two.  But since sorting is at least O(N log N),
sorting all the data is actually worse than sorting each side
separately --- plus, we have no chance of taking advantage of
presorted input.  On top of that, adding the flag column frequently
requires an additional projection step that adds cycles, and then
the Append node isn't free either.  Let's get rid of all of that
and make the SetOp node have two separate children, using the
existing outerPlan/innerPlan infrastructure.

This initial patch re-implements nodeSetop.c and does a bare minimum
of work on the planner side to generate correctly-shaped plans.
In particular, I've tried not to change the cost estimates here,
so that the visible changes in the regression test results will only
involve removal of useless projection steps and not any changes in
whether to use sorted vs hashed mode.

For SORTED mode, we combine successive identical tuples from each
input into groups, and then merge-join the groups.  The tuple
comparisons now use SortSupport instead of simple equality, but
the group-formation part should involve roughly the same number of
tuple comparisons as before.  The cross-comparisons between left and
right groups probably add to that, but I'm not sure to quantify how
many more comparisons we might need.

For HASHED mode, nodeSetop's logic is almost the same as before,
just refactored into two separate loops instead of one loop that
has an assumption that it will see all the left-hand inputs first.

In both modes, I added early-exit logic to not bother reading the
right-hand relation if the left-hand input is empty, since neither
INTERSECT nor EXCEPT modes can produce any output if the left input
is empty.  This could have been done before in the hashed mode, but
not in sorted mode.  Sorted mode can also stop as soon as it exhausts
the left input; any remaining right-hand tuples cannot have matches.

Also, this patch adds some infrastructure for detecting whether
child plan nodes all output the same type of tuple table slot.
If they do, the hash table logic can use slightly more efficient
code based on assuming that that's the input slot type it will see.
We'll make use of that infrastructure in other plan node types later.

Patch by me; thanks to Richard Guo and David Rowley for review.

Discussion: https://postgr.es/m/1850138.1731549611@sss.pgh.pa.us
2024-12-19 16:23:45 -05:00
Michael Paquier
9aea73fc61 Add backend-level statistics to pgstats
This adds a new variable-numbered statistics kind in pgstats, where the
object ID key of the stats entries is based on the proc number of the
backends.  This acts as an upper-bound for the number of stats entries
that can exist at once.  The entries are created when a backend starts
after authentication succeeds, and are removed when the backend exits,
making the stats entry exist for as long as their backend is up and
running.  These are not written to the pgstats file at shutdown (note
that write_to_file is disabled, as a safety measure).

Currently, these stats include only information about the I/O generated
by a backend, using the same layer as pg_stat_io, except that it is now
possible to know how much activity is happening in each backend rather
than an overall aggregate of all the activity.  A function called
pg_stat_get_backend_io() is added to access this data depending on the
PID of a backend.  The existing structure could be expanded in the
future to add more information about other statistics related to
backends, depending on requirements or ideas.

Auxiliary processes are not included in this set of statistics.  These
are less interesting to have than normal backends as they have dedicated
entries in pg_stat_io, and stats kinds of their own.

This commit includes also pg_stat_reset_backend_stats(), function able
to reset all the stats associated to a single backend.

Bump catalog version and PGSTAT_FILE_FORMAT_ID.

Author: Bertrand Drouvot
Reviewed-by: Álvaro Herrera, Kyotaro Horiguchi, Michael Paquier, Nazir
Bilal Yavuz
Discussion: https://postgr.es/m/ZtXR+CtkEVVE/LHF@ip-10-97-1-34.eu-west-3.compute.internal
2024-12-19 13:19:22 +09:00
Melanie Plageman
7f9d4187e7 Add common interface for TBMIterators
Add and use TBMPrivateIterator, which replaces the current TBMIterator
for serial use cases, and repurpose TBMIterator to be a unified
interface for both the serial ("private") and parallel ("shared") TID
Bitmap iterator interfaces. This encapsulation simplifies call sites for
callers supporting both parallel and serial TID Bitmap access.
TBMIterator is not yet used in this commit.

Author: Melanie Plageman
Reviewed-by: Tomas Vondra, Heikki Linnakangas
Discussion: https://postgr.es/m/063e4eb4-32d9-439e-a0b1-75565a9835a8%40iki.fi
2024-12-18 18:19:28 -05:00
Nathan Bossart
76fd342496 Provide a better error message for misplaced dispatch options.
Before this patch, misplacing a special must-be-first option for
dispatching to a subprogram (e.g., postgres -D . --single) would
fail with an error like

	FATAL:  --single requires a value

This patch adjusts this error to more accurately complain that the
special option wasn't listed first.  The aforementioned error
message now looks like

	FATAL:  --single must be first argument

The dispatch option parsing code has been refactored for use
wherever ParseLongOption() is called.  Beyond the obvious advantage
of avoiding code duplication, this should prevent similar problems
when new dispatch options are added.  Note that we assume that none
of the dispatch option names match another valid command-line
argument, such as the name of a configuration parameter.

Ideally, we'd remove this must-be-first requirement for these
options, but after some investigation, we decided that wasn't worth
the added complexity and behavior changes.

Author: Nathan Bossart, Greg Sabino Mullane
Reviewed-by: Greg Sabino Mullane, Peter Eisentraut, Álvaro Herrera, Tom Lane
Discussion: https://postgr.es/m/CAKAnmmJkZtZAiSryho%3DgYpbvC7H-HNjEDAh16F3SoC9LPu8rqQ%40mail.gmail.com
2024-12-04 15:04:15 -06:00
David Rowley
4171c44c9b Revert "Introduce CompactAttribute array in TupleDesc"
This reverts commit d28dff3f6c.

Quite a large number of buildfarm members didn't like this commit and
it's not yet clear why.  Reverting this before too many animals turn
red.

Discussion: https://postgr.es/m/CAApHDvr9i6T5=iAwQCxFDgMsthr_obVxgwBaEJkC8KUH6yM3Hw@mail.gmail.com
2024-12-03 17:12:38 +13:00
David Rowley
d28dff3f6c Introduce CompactAttribute array in TupleDesc
The new compact_attrs array stores a few select fields from
FormData_pg_attribute in a more compact way, using only 16 bytes per
column instead of the 104 bytes that FormData_pg_attribute uses.  Using
CompactAttribute allows performance-critical operations such as tuple
deformation to be performed without looking at the FormData_pg_attribute
element in TupleDesc which means fewer cacheline accesses.  With this
change, NAMEDATALEN could be increased with a much smaller negative impact
on performance.

For some workloads, tuple deformation can be the most CPU intensive part
of processing the query.  Some testing with 16 columns on a table
where the first column is variable length showed around a 10% increase in
transactions per second for an OLAP type query performing aggregation on
the 16th column.  However, in certain cases, the increases were much
higher, up to ~25% on one AMD Zen4 machine.

This also makes pg_attribute.attcacheoff redundant.  A follow-on commit
will remove it, thus shrinking the FormData_pg_attribute struct by 4
bytes.

Author: David Rowley
Discussion: https://postgr.es/m/CAApHDvrBztXP3yx=NKNmo3xwFAFhEdyPnvrDg3=M0RhDs+4vYw@mail.gmail.com
Reviewed-by: Andres Freund, Victor Yegorov
2024-12-03 16:50:59 +13:00
Tom Lane
e032e4c7dd Avoid mislabeling of lateral references, redux.
As I'd feared, commit 5c9d8636d was still a few bricks shy of a load.
We can't just leave pulled-up lateral-reference Vars with no new
nullingrels: we have to carefully compute what subset of the
to-be-replaced Var's nullingrels apply to them, else we still get
"wrong varnullingrels" errors.  This is a bit tedious, but it looks
like we can use the nullingrel data this patch computes for other
purposes, enabling better optimization.  We don't want to inject
unnecessary plan changes into stable branches though, so leave that
idea for a later HEAD-only patch.

Patch by me, but thanks to Richard Guo for devising a test case that
broke 5c9d8636d, and for preliminary investigation about how to fix
it.  As before, back-patch to v16.

Discussion: https://postgr.es/m/E1tGn4j-0003zi-MP@gemulon.postgresql.org
2024-11-30 12:42:19 -05:00
Andres Freund
262283d5ee ci: Fix cached MacPorts installation management
1.  The error reporting of "port setrequested list-of-packages..."
changed, hiding errors we were relying on to know if all packages in our
list were already installed.  Use a loop instead.

2.  The cached MacPorts installation was shared between PostgreSQL
major branches, though each branch wanted different packages.  Add the
list of packages to cache key, so that different branches, when tested
in one github account/repo such as postgres/postgres, stop fighting with
each other, adding and removing packages.

Back-patch to 15 where CI began.

Author: Thomas Munro <thomas.munro@gmail.com>
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Suggested-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/au2uqfuy2nf43nwy2txmc5t2emhwij7kzupygto3d2ffgtrdgr%40ckvrlwyflnh2
2024-11-27 11:51:54 -05:00
Peter Eisentraut
41272784b9 Fix typo
from commit 9044fc1d45
2024-11-27 11:20:46 +01:00
Alexander Korotkov
ae4569161a Teach bitmap path generation about transforming OR-clauses to SAOP's
When optimizer generates bitmap paths, it considers breaking OR-clause
arguments one-by-one.  But now, a group of similar OR-clauses can be
transformed into SAOP during index matching.  So, bitmap paths should
keep up.

This commit teaches bitmap paths generation machinery to group similar
OR-clauses into dedicated RestrictInfos.  Those RestrictInfos are considered
both to match index as a whole (as SAOP), or to match as a set of individual
OR-clause argument one-by-one (the old way).

Therefore, bitmap path generation will takes advantage of OR-clauses to SAOP's
transformation.  The old way of handling them is also considered.  So, there
shouldn't be planning regression.

Discussion: https://postgr.es/m/CAPpHfdu5iQOjF93vGbjidsQkhHvY2NSm29duENYH_cbhC6x%2BMg%40mail.gmail.com
Author: Alexander Korotkov, Andrey Lepikhov
Reviewed-by: Alena Rybakina, Andrei Lepikhov, Jian he, Robert Haas
Reviewed-by: Peter Geoghegan
2024-11-24 01:41:45 +02:00
Álvaro Herrera
90bcc7c2db Avoid deleting critical WAL segments during pg_rewind
Previously, in unlucky cases, it was possible for pg_rewind to remove
certain WAL segments from the rewound demoted primary.  In particular
this happens if those files have been marked for archival (i.e., their
.ready files were created) but not yet archived; the newly promoted node
no longer has such files because of them having been recycled, but they
are likely critical for recovery in the demoted node.  If pg_rewind
removes them, recovery is not possible anymore.

Fix this by maintaining a hash table of files in this situation in the
scan that looks for a checkpoint, which the decide_file_actions phase
can consult so that it knows to preserve them.

Backpatch to 14.  The problem also exists in 13, but that branch was not
blessed with commit eb00f1d4bf, so this patch is difficult to apply
there.  Users of older releases will just have to continue to be extra
careful when rewinding.

Co-authored-by: Полина Бунгина (Polina Bungina) <bungina@gmail.com>
Co-authored-by: Alexander Kukushkin <cyberdemn@gmail.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Discussion: https://postgr.es/m/CAAtGL4AhzmBRsEsaDdz7065T+k+BscNadfTqP1NcPmsqwA5HBw@mail.gmail.com
2024-11-15 12:53:12 +01:00