1
0
mirror of https://github.com/postgres/postgres.git synced 2026-01-27 21:43:08 +03:00
Commit Graph

2911 Commits

Author SHA1 Message Date
Michael Paquier
f9a09aa295 Add wal_fpi_bytes to pg_stat_wal and pg_stat_get_backend_wal()
This new counter, called "wal_fpi_bytes", tracks the total amount in
bytes of full page images (FPIs) generated in WAL.  This data becomes
available globally via pg_stat_wal, and for backend statistics via
pg_stat_get_backend_wal().

Previously, this information could only be retrieved with pg_waldump or
pg_walinspect, which may not be available depending on the environment,
and are expensive to execute.  It offers hints about how much FPIs
impact the WAL generated, which could be a large percentage for some
workloads, as well as the effects of wal_compression or page holes.

Bump catalog version.
Bump PGSTAT_FILE_FORMAT_ID, due to the addition of wal_fpi_bytes in
PgStat_WalCounters.

Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAOzEurQtZEAfg6P0kU3Wa-f9BWQOi0RzJEMPN56wNTOmJLmfaQ@mail.gmail.com
2025-10-28 16:21:51 +09:00
Amit Kapila
f0b3573c3a Introduce "REFRESH SEQUENCES" for subscriptions.
This patch adds support for a new SQL command:
ALTER SUBSCRIPTION ... REFRESH SEQUENCES
This command updates the sequence entries present in the
pg_subscription_rel catalog table with the INIT state to trigger
resynchronization.

In addition to the new command, the following subscription commands have
been enhanced to automatically refresh sequence mappings:
ALTER SUBSCRIPTION ... REFRESH PUBLICATION
ALTER SUBSCRIPTION ... ADD PUBLICATION
ALTER SUBSCRIPTION ... DROP PUBLICATION
ALTER SUBSCRIPTION ... SET PUBLICATION

These commands will perform the following actions:
Add newly published sequences that are not yet part of the subscription.
Remove sequences that are no longer included in the publication.

This ensures that sequence replication remains aligned with the current
state of the publication on the publisher side.

Note that the actual synchronization of sequence data/values will be
handled in a subsequent patch that introduces a dedicated sequence sync
worker.

Author: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com>
Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com
2025-10-23 08:30:27 +00:00
Michael Paquier
2519fa8362 Bump catalog version for new function error_on_null()
Oversight in 2b75c38b70.  No comments.

Discussion: https://postgr.es/m/aPgu7kwiT4iGo6Ya@paquier.xyz
2025-10-22 10:08:47 +09:00
Michael Paquier
2b75c38b70 Add error_on_null(), checking if the input is the null value
This polymorphic function produces an error if the input value is
detected as being the null value; otherwise it returns the input value
unchanged.

This function can for example become handy in SQL function bodies, to
enforce that exactly one row was returned.

Author: Joel Jacobson <joel@compiler.org>
Reviewed-by: Vik Fearing <vik@postgresfriends.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/ece8c6d1-2ab1-45d5-ba12-8dec96fc8886@app.fastmail.com
Discussion: https://postgr.es/m/de94808d-ed58-4536-9e28-e79b09a534c7@app.fastmail.com
2025-10-22 09:55:17 +09:00
Amit Kapila
41c674d2e3 Refactor logical worker synchronization code into a separate file.
To support the upcoming addition of a sequence synchronization worker,
this patch extracts common synchronization logic shared by table sync
workers and the new sequence sync worker into a dedicated file. This
modularization improves code reuse, maintainability, and clarity in the
logical workers framework.

Author: vignesh C <vignesh21@gmail.com>
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com
2025-10-16 05:10:50 +00:00
Amit Kapila
96b3784973 Add "ALL SEQUENCES" support to publications.
This patch adds support for the ALL SEQUENCES clause in publications,
enabling synchronization/replication of all sequences that is useful for
upgrades.

Publications can now include all sequences via FOR ALL SEQUENCES.
psql enhancements:
\d shows publications for a given sequence.
\dRp indicates if a publication includes all sequences.

ALL SEQUENCES can be combined with ALL TABLES, but not with other options
like TABLE or TABLES IN SCHEMA. We can extend support for more granular
clauses in future.

The view pg_publication_sequences provides information about the mapping
between publications and sequences.

This patch enables publishing of sequences; subscriber-side support will
be added in upcoming patches.

Author: vignesh C <vignesh21@gmail.com>
Author: Tomas Vondra <tomas@vondra.me>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com
2025-10-09 03:48:54 +00:00
Masahiko Sawada
d3b6183dd9 Add mem_exceeded_count column to pg_stat_replication_slots.
This commit introduces a new column mem_exceeded_count to the
pg_stat_replication_slots view. This counter tracks how often the
memory used by logical decoding exceeds the logical_decoding_work_mem
limit. The new statistic helps users determine whether exceeding the
logical_decoding_work_mem limit is a rare occurrences or a frequent
issue, information that wasn't available through existing statistics.

Bumps catversion.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/978D21E8-9D3B-40EA-A4B1-F87BABE7868C@yesql.se
2025-10-08 10:05:04 -07:00
Richard Guo
185e304263 Allow negative aggtransspace to indicate unbounded state size
This patch reuses the existing aggtransspace in pg_aggregate to
signal that an aggregate's transition state can grow unboundedly.  If
aggtransspace is set to a negative value, it now indicates that the
transition state may consume unpredictable or large amounts of memory,
such as in aggregates like array_agg or string_agg that accumulate
input rows.

This information can be used by the planner to avoid applying
memory-sensitive optimizations (e.g., eager aggregation) when there is
a risk of excessive memory usage during partial aggregation.

Bump catalog version.

Per idea from Robert Haas, though applied differently than originally
suggested.

Discussion: https://postgr.es/m/CA+TgmoYbkvYwLa+1vOP7RDY7kO2=A7rppoPusoRXe44VDOGBPg@mail.gmail.com
2025-10-08 17:01:48 +09:00
Michael Paquier
b71bae41a0 Add stats_reset to pg_stat_user_functions
It is possible to call pg_stat_reset_single_function_counters() for a
single function, but the reset time was missing the system view showing
its statistics.  Like all the fields of pg_stat_user_functions, the GUC
track_functions needs to be enabled to show the statistics about
function executions.

Bump catalog version.
Bump PGSTAT_FILE_FORMAT_ID, as a result of the new field added to
PgStat_StatFuncEntry.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aONjnsaJSx-nEdfU@paquier.xyz
2025-10-08 12:43:40 +09:00
Amit Kapila
b93172ca59 Expose sequence page LSN via pg_get_sequence_data.
This patch enhances the pg_get_sequence_data function to include the
page-level LSN (Log Sequence Number) of the sequence. This additional
metadata will be used by upcoming patches to support synchronization
of sequences during logical replication.

By exposing the LSN, we enable more accurate tracking of sequence
changes, which is essential for maintaining consistency across
replicated nodes.

Author: vignesh C <vignesh21@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com
2025-10-06 08:30:16 +00:00
Michael Paquier
a5b543258a Add stats_reset to pg_stat_all_{tables,indexes} and related views
It is possible to call pg_stat_reset_single_table_counters() on a
relation (index or table) but the reset time was missing from the system
views showing their statistics.

This commit adds the reset time as an attribute of pg_stat_all_tables,
pg_stat_all_indexes, and other relations related to them.

Bump catalog version.
Bump PGSTAT_FILE_FORMAT_ID, as a result of the new field added to
PgStat_StatTabEntry.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aN8l182jKxEq1h9f@paquier.xyz
2025-10-06 15:31:21 +09:00
Tom Lane
ef38a4d975 Add GROUP BY ALL.
GROUP BY ALL is a form of GROUP BY that adds any TargetExpr that does
not contain an aggregate or window function into the groupClause of
the query, making it exactly equivalent to specifying those same
expressions in an explicit GROUP BY list.

This feature is useful for certain kinds of data exploration.  It's
already present in some other DBMSes, and the SQL committee recently
accepted it into the standard, so we can be reasonably confident in
the syntax being stable.  We do have to invent part of the semantics,
as the standard doesn't allow for expressions in GROUP BY, so they
haven't specified what to do with window functions.  We assume that
those should be treated like aggregates, i.e., left out of the
constructed GROUP BY list.

In passing, wordsmith some existing documentation about GROUP BY,
and update some neglected synopsis entries in select_into.sgml.

Author: David Christensen <david@pgguru.net>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAHM0NXjz0kDwtzoe-fnHAqPB1qA8_VJN0XAmCgUZ+iPnvP5LbA@mail.gmail.com
2025-09-29 16:55:17 -04:00
Tom Lane
261f89a976 Track the maximum possible frequency of non-MCE array elements.
The lossy-counting algorithm that ANALYZE uses to identify most-common
array elements has a notion of cutoff frequency: elements with
frequency greater than that are guaranteed to be collected, elements
with smaller frequencies are not.  In cases where we find fewer MCEs
than the stats target would permit us to store, the cutoff frequency
provides valuable additional information, to wit that there are no
non-MCEs with frequency greater than that.  What the selectivity
estimation functions actually use the "minfreq" entry for is as a
ceiling on the possible frequency of non-MCEs, so using the cutoff
rather than the lowest stored MCE frequency provides a tighter bound
and more accurate estimates.

Therefore, instead of redundantly storing the minimum observed MCE
frequency, store the cutoff frequency when there are fewer tracked
values than we want.  (When there are more, then of course we cannot
assert that no non-stored elements are above the cutoff frequency,
since we're throwing away some that are; so we still use the
minimum stored frequency in that case.)

Notably, this works even when none of the values are common enough
to be called MCEs.  In such cases we previously stored nothing in
the STATISTIC_KIND_MCELEM pg_statistic slot, which resulted in the
selectivity functions falling back to default estimates.  So in that
case we want to construct a STATISTIC_KIND_MCELEM entry that contains
no "values" but does have "numbers", to wit the three extra numbers
that the MCELEM entry type defines.  A small obstacle is that
update_attstats() has traditionally stored a null, not an empty array,
when passed zero "values" for a slot.  That gives rise to an MCELEM
entry that get_attstatsslot() will spit up on.  The least risky
solution seems to be to adjust update_attstats() so that it will emit
a non-null (but possibly empty) array when the passed stavalues array
pointer isn't NULL, rather than conditioning that on numvalues > 0.
In other existing cases I don't believe that that changes anything.
For consistency, handle the stanumbers array the same way.

In passing, improve the comments in routines that use
STATISTIC_KIND_MCELEM data.  Particularly, explain why we use
minfreq / 2 not minfreq as the estimate for non-MCE values.

Thanks to Matt Long for the suggestion that we could apply this
idea even when there are more than zero MCEs.

Reported-by: Mark Frost <FROSTMAR@uk.ibm.com>
Reported-by: Matt Long <matt@mattlong.org>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/PH3PPF1C905D6E6F24A5C1A1A1D8345B593E16FA@PH3PPF1C905D6E6.namprd15.prod.outlook.com
2025-09-20 14:48:16 -04:00
Amit Kapila
5b148706c5 Add optional pid parameter to pg_replication_origin_session_setup().
Commit 216a784829 introduced parallel apply workers, allowing multiple
processes to share a replication origin. To support this,
replorigin_session_setup() was extended to accept a pid argument
identifying the process using the origin.

This commit exposes that capability through the SQL interface function
pg_replication_origin_session_setup() by adding an optional pid parameter.
This enables multiple processes to coordinate replication using the same
origin when using SQL-level replication functions.

This change allows the non-builtin logical replication solutions to
implement parallel apply for large transactions.

Additionally, an existing internal error was made user-facing, as it can
now be triggered via the exposed SQL API.

Author: Doruk Yilmaz <doruk@mixrank.com>
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Discussion: https://postgr.es/m/CAMPB6wfe4zLjJL8jiZV5kjjpwBM2=rTRme0UCL7Ra4L8MTVdOg@mail.gmail.com
Discussion: https://postgr.es/m/CAE2gYzyTSNvHY1+iWUwykaLETSuAZsCWyryokjP6rG46ZvRgQA@mail.gmail.com
2025-09-19 05:38:40 +00:00
Tom Lane
83a5641945 Provide more-specific error details/hints for function lookup failures.
Up to now we've contented ourselves with a one-size-fits-all error
hint when we fail to find any match to a function or procedure call.
That was mostly okay in the beginning, but it was never great, and
since the introduction of named arguments it's really not adequate.
We at least ought to distinguish "function name doesn't exist" from
"function name exists, but not with those argument names".  And the
rules for named-argument matching are arcane enough that some more
detail seems warranted if we match the argument names but the call
still doesn't work.

This patch creates a framework for dealing with these problems:
FuncnameGetCandidates and related code will now pass back a bitmask of
flags showing how far the match succeeded.  This allows a considerable
amount of granularity in the reports.  The set-bits-in-a-bitmask
approach means that when there are multiple candidate functions, the
report will reflect the match(es) that got the furthest, which seems
correct.  Also, we can avoid mentioning "maybe add casts" unless
failure to match argument types is actually the issue.

Extend the same return-a-bitmask approach to OpernameGetCandidates.
The issues around argument names don't apply to operator syntax,
but it still seems worth distinguishing between "there is no
operator of that name" and "we couldn't match the argument types".

While at it, adjust these messages and related ones to more strictly
separate "detail" from "hint", following our message style guidelines'
distinction between those.

Reported-by: Dominique Devienne <ddevienne@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/1756041.1754616558@sss.pgh.pa.us
2025-09-16 12:17:02 -04:00
Dean Rasheed
faf071b553 Add date and timestamp variants of random(min, max).
This adds 3 new variants of the random() function:

    random(min date, max date) returns date
    random(min timestamp, max timestamp) returns timestamp
    random(min timestamptz, max timestamptz) returns timestamptz

Each returns a random value x in the range min <= x <= max.

Author: Damien Clochard <damien@dalibo.info>
Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Vik Fearing <vik@postgresfriends.org>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/f524d8cab5914613d9e624d9ce177d3d@dalibo.info
2025-09-09 10:39:30 +01:00
Amit Kapila
a850be2fe6 Add max_retention_duration option to subscriptions.
This commit introduces a new subscription parameter,
max_retention_duration, aimed at mitigating excessive accumulation of dead
tuples when retain_dead_tuples is enabled and the apply worker lags behind
the publisher.

When the time spent advancing a non-removable transaction ID exceeds the
max_retention_duration threshold, the apply worker will stop retaining
conflict detection information. In such cases, the conflict slot's xmin
will be set to InvalidTransactionId, provided that all apply workers
associated with the subscription (with retain_dead_tuples enabled) confirm
the retention duration has been exceeded.

To ensure retention status persists across server restarts, a new column
subretentionactive has been added to the pg_subscription catalog. This
prevents unnecessary reactivation of retention logic after a restart.

The conflict detection slot will not be automatically re-initialized
unless a new subscription is created with retain_dead_tuples = true, or
the user manually re-enables retain_dead_tuples.

A future patch will introduce support for automatic slot re-initialization
once at least one apply worker confirms that the retention duration is
within the configured max_retention_duration.

Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/OS0PR01MB5716BE80DAEB0EE2A6A5D1F5949D2@OS0PR01MB5716.jpnprd01.prod.outlook.com
2025-09-02 03:20:18 +00:00
Álvaro Herrera
325fc0ab14 Avoid including commands/dbcommands.h in so many places
This has been done historically because of get_database_name (which
since commit cb98e6fb8f belongs in lsyscache.c/h, so let's move it
there) and get_database_oid (which is in the right place, but whose
declaration should appear in pg_database.h rather than dbcommands.h).
Clean this up.

Also, xlogreader.h and stringinfo.h are no longer needed by dbcommands.h
since commit f1fd515b39, so remove them.

Author: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/202508191031.5ipojyuaswzt@alvherre.pgsql
2025-08-28 12:39:04 +02:00
Peter Eisentraut
16d434d53d Add src/include/catalog/README
This just includes a link to the bki documentation, to help people get
started.

Before commit 372728b0d4, there was a README at
src/backend/catalog/README, but then this was moved to the SGML
documentation.  So this effectively puts back a link to what was
moved.  But src/include/catalog/ is probably a better location,
because that's where all the interesting files are.

Co-authored-by: Florents Tselai <florents.tselai@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CA+v5N400GJFJ9RyXAX7hFKbtF7vVQGvWdFWEfcSQmvVhi9xfrA@mail.gmail.com
2025-08-19 08:41:42 +02:00
Tom Lane
ee54046601 Grab the low-hanging fruit from forcing USE_FLOAT8_BYVAL to true.
Remove conditionally-compiled code for the other case.

Replace uses of FLOAT8PASSBYVAL with constant "true", mainly because
it was quite confusing in cases where the type we were dealing with
wasn't float8.

I left the associated pg_control and Pg_magic_struct fields in place.
Perhaps we should get rid of them, but it would save little, so it
doesn't seem worth thinking hard about the compatibility implications.
I just labeled them "vestigial" in places where that seemed helpful.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/1749799.1752797397@sss.pgh.pa.us
2025-08-13 17:18:22 -04:00
Tom Lane
2a600a93c7 Make type Datum be 8 bytes wide everywhere.
This patch makes sizeof(Datum) be 8 on all platforms including
32-bit ones.  The objective is to allow USE_FLOAT8_BYVAL to be true
everywhere, and in consequence to remove a lot of code that is
specific to pass-by-reference handling of float8, int8, etc.  The
code for abbreviated sort keys can be simplified similarly.  In this
way we can reduce the maintenance effort involved in supporting 32-bit
platforms, without going so far as to actually desupport them.  Since
Datum is strictly an in-memory concept, this has no impact on on-disk
storage, though an initdb or pg_upgrade will be needed to fix affected
catalog entries.

We have required platforms to support [u]int64 for ages, so this
breaks no supported platform.  We can expect that this change will
make 32-bit builds a bit slower and more memory-hungry, although being
able to use pass-by-value handling of 8-byte types may buy back some
of that.  But we stopped optimizing for 32-bit cases a long time ago,
and this seems like just another step on that path.

This initial patch simply forces the correct type definition and
USE_FLOAT8_BYVAL setting, and cleans up a couple of minor compiler
complaints that ensued.  This is sufficient for testing purposes.
In the wake of a bunch of Datum-conversion cleanups by Peter
Eisentraut, this now compiles cleanly with gcc on a 32-bit platform.
(I'd only tested the previous version with clang, which it turns out
is less picky than gcc about width-changing coercions.)

There is a good deal of now-dead code that I'll remove in separate
follow-up patches.

A catversion bump is required because this affects initial catalog
contents (on 32-bit machines) in two ways: pg_type.typbyval changes
for some built-in types, and Const nodes in stored views/rules will
now have 8 bytes not 4 for pass-by-value types.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/1749799.1752797397@sss.pgh.pa.us
2025-08-13 17:18:22 -04:00
Masahiko Sawada
deb674454c Add backup_type column to pg_stat_progress_basebackup.
This commit introduces a new column backup_type that indicates the
type of backup being performed: either 'full' or 'incremental'.

Bump catalog version.

Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Yugo Nagata <nagata@sraoss.co.jp>
Discussion: https://postgr.es/m/CAOzEurQuzbHwTj1ehk1a+eeQDidJPyrE5s6mYumkjwjZnurhkQ@mail.gmail.com
2025-08-05 10:50:45 -07:00
Amit Kapila
fd5a1a0c3e Detect and report update_deleted conflicts.
This enhancement builds upon the infrastructure introduced in commit
228c370868, which enables the preservation of deleted tuples and their
origin information on the subscriber. This capability is crucial for
handling concurrent transactions replicated from remote nodes.

The update introduces support for detecting update_deleted conflicts
during the application of update operations on the subscriber. When an
update operation fails to locate the target row-typically because it has
been concurrently deleted-we perform an additional table scan. This scan
uses the SnapshotAny mechanism and we do this additional scan only when
the retain_dead_tuples option is enabled for the relevant subscription.

The goal of this scan is to locate the most recently deleted tuple-matching
the old column values from the remote update-that has not yet been removed
by VACUUM and is still visible according to our slot (i.e., its deletion
is not older than conflict-detection-slot's xmin). If such a tuple is
found, the system reports an update_deleted conflict, including the origin
and transaction details responsible for the deletion.

This provides a groundwork for more robust and accurate conflict
resolution process, preventing unexpected behavior by correctly
identifying cases where a remote update clashes with a deletion from
another origin.

Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/OS0PR01MB5716BE80DAEB0EE2A6A5D1F5949D2@OS0PR01MB5716.jpnprd01.prod.outlook.com
2025-08-04 04:02:47 +00:00
Amit Kapila
2ab2d6f970 Fix a deadlock during ALTER SUBSCRIPTION ... DROP PUBLICATION.
A deadlock can occur when the DDL command and the apply worker acquire
catalog locks in different orders while dropping replication origins.

The issue is rare in PG16 and higher branches because, in most cases, the
tablesync worker performs the origin drop in those branches, and its
locking sequence does not conflict with DDL operations.

This patch ensures consistent lock acquisition to prevent such deadlocks.

As per buildfarm.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Ajin Cherian <itsajin@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 14, where it was introduced
Discussion: https://postgr.es/m/bab95e12-6cc5-4ebb-80a8-3e41956aa297@gmail.com
2025-08-01 07:58:48 +00:00
Amit Kapila
228c370868 Preserve conflict-relevant data during logical replication.
Logical replication requires reliable conflict detection to maintain data
consistency across nodes. To achieve this, we must prevent premature
removal of tuples deleted by other origins and their associated commit_ts
data by VACUUM, which could otherwise lead to incorrect conflict reporting
and resolution.

This patch introduces a mechanism to retain deleted tuples on the
subscriber during the application of concurrent transactions from remote
nodes. Retaining these tuples allows us to correctly ignore concurrent
updates to the same tuple. Without this, an UPDATE might be misinterpreted
as an INSERT during resolutions due to the absence of the original tuple.

Additionally, we ensure that origin metadata is not prematurely removed by
vacuum freeze, which is essential for detecting update_origin_differs and
delete_origin_differs conflicts.

To support this, a new replication slot named pg_conflict_detection is
created and maintained by the launcher on the subscriber. Each apply
worker tracks its own non-removable transaction ID, which the launcher
aggregates to determine the appropriate xmin for the slot, thereby
retaining necessary tuples.

Conflict information retention (deleted tuples and commit_ts) can be
enabled per subscription via the retain_conflict_info option. This is
disabled by default to avoid unnecessary overhead for configurations that
do not require conflict resolution or logging.

During upgrades, if any subscription on the old cluster has
retain_conflict_info enabled, a conflict detection slot will be created to
protect relevant tuples from deletion when the new cluster starts.

This is a foundational work to correctly detect update_deleted conflict
which will be done in a follow-up patch.

Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/OS0PR01MB5716BE80DAEB0EE2A6A5D1F5949D2@OS0PR01MB5716.jpnprd01.prod.outlook.com
2025-07-23 02:56:00 +00:00
Nathan Bossart
167ed8082f Introduce pg_dsm_registry_allocations view.
This commit adds a new system view that provides information about
entries in the dynamic shared memory (DSM) registry.  Specifically,
it returns the name, type, and size of each entry.  Note that since
we cannot discover the size of dynamic shared memory areas (DSAs)
and hash tables backed by DSAs (dshashes) without first attaching
to them, the size column is left as NULL for those.

Bumps catversion.

Author: Florents Tselai <florents.tselai@gmail.com>
Reviewed-by: Sungwoo Chang <swchangdev@gmail.com>
Discussion: https://postgr.es/m/4D445D3E-81C5-4135-95BB-D414204A0AB4%40gmail.com
2025-07-09 09:17:56 -05:00
Amit Kapila
b5cd0ecd4d Fix typo in pg_publication.h.
Author: shveta malik <shveta.malik@gmail.com>
Discussion: https://postgr.es/m/CAJpy0uAyFN9o7vU_ZkZFv5-6ysXDNKNx_fC0gwLLKg=8==E3ow@mail.gmail.com
2025-07-01 15:17:03 +05:30
Nathan Bossart
bd09f024a1 Add new OID alias type regdatabase.
This provides a convenient way to look up a database's OID.  For
example, the query

    SELECT * FROM pg_shdepend
    WHERE dbid = (SELECT oid FROM pg_database
                  WHERE datname = current_database());

can now be simplified to

    SELECT * FROM pg_shdepend
    WHERE dbid = current_database()::regdatabase;

Like the regrole type, regdatabase has cluster-wide scope, so we
disallow regdatabase constants from appearing in stored
expressions.

Bumps catversion.

Author: Ian Lawrence Barwick <barwick@gmail.com>
Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/aBpjJhyHpM2LYcG0%40nathan
2025-06-30 15:38:54 -05:00
Joe Conway
9c5b9a280c Do pre-release housekeeping on catalog data.
Run renumber_oids.pl to move high-numbered OIDs down, as per pre-beta
tasks specified by RELEASE_CHANGES.  For reference, the command was

./renumber_oids.pl --first-mapped-oid 8000 --target-oid 6300

This should have been done prior to beta1, but it was forgotten. This
will ensure we get the correct numbering for beta2 onward.
2025-06-29 21:43:39 -04:00
Joe Conway
0ebd242555 Run pgperltidy
This is required before the creation of a new branch.  pgindent is
clean, as well as is reformat-dat-files.

perltidy version is v20230309, as documented in pgindent's README.
2025-06-29 21:14:21 -04:00
Peter Eisentraut
0cd69b3d7e Restrict virtual columns to use built-in functions and types
Just like selecting from a view is exploitable (CVE-2024-7348),
selecting from a table with virtual generated columns is exploitable.
Users who are concerned about this can avoid selecting from views, but
telling them to avoid selecting from tables is less practical.

To address this, this changes it so that generation expressions for
virtual generated columns are restricted to using built-in functions
and types, and the columns are restricted to having a built-in type.
We assume that built-in functions and types cannot be exploited for
this purpose.

In the future, this could be expanded by some new mechanism to declare
other functions and types as safe or trusted for this purpose, but
that is to be designed.

(An alternative approach might have been to expand the
restrict_nonsystem_relation_kind GUC to handle this, like the fix for
CVE-2024-7348.  But that is kind of an ugly approach.  That fix had to
fit in the constraints of fixing an ancient vulnerability in all
branches.  Since virtual generated columns are new, we're free from
the constraints of the past, and we can and should use cleaner
options.)

Reported-by: Feike Steenbergen <feikesteenbergen@gmail.com>
Reviewed-by: jian he <jian.universality@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAK_s-G2Q7de8Q0qOYUR%3D_CTB5FzzVBm5iZjOp%2BmeVWpMpmfO0w%40mail.gmail.com
2025-06-25 09:56:49 +02:00
Álvaro Herrera
0f65f3eec4 Fix squashing algorithm for query texts
The algorithm to squash lists of constants added by commit 62d712ecfd
was a bit too simplistic; we wanted to avoid adding unnecessary
complexity, but cases like direct function calls of typecasting
functions (and others) were missed, and bogus SQL syntax was being shown
in pg_stat_statements normalized query text field.  To fix normalization
for those cases, we need the parser to transmit information about were
each list of constant values starts and ends, so add that to a couple of
nodes.  Also add a few more test cases to make sure we're doing the
right thing.

The patch initially submitted by Sami added a new private struct in
gram.y to carry the start/end information for A_Expr, but I (Álvaro)
decided that a better fix was to remove the parser indirection via the
in_expr production, and instead create separate components in the a_expr
rule.  I'm surprised that this works and doesn't require more changes,
but I assume (without checking) that the grammar used to be more complex
and got simplified at some point.

Bump catversion.

Author: Sami Imseih <samimseih@gmail.com>
Author: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAA5RZ0tRXoPG2y6bMgBCWNDt0Tn=unRerbzYM=oW0syi1=C1OA@mail.gmail.com
2025-06-12 14:21:21 +02:00
Peter Eisentraut
32edf732e8 Rename gist stratnum support function
Commit 7406ab623f added a gist support function that we internally
refer to by the symbol GIST_STRATNUM_PROC.  This translated from
"well-known" strategy numbers to opfamily-specific strategy numbers.
However, we later (commit 630f9a43ce) changed this to fit into
index-AM-level compare type mapping, so this function actually now
maps from compare type to opfamily-specific strategy numbers.  So this
name is no longer fitting.

Moreover, the index AM level also supports the opposite, a function to
map from strategy number to compare type.  This is currently not
supported in gist, but one might wonder what this function is supposed
to be called when it is added.

This patch changes the naming of the gist-level functionality to be
more in line with the index-AM-level functionality.  This makes sense
because these are essentially the same thing on different levels.
This also changes the names of the externally visible functions that
are provided for use as such a support function.

Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Discussion: https://www.postgresql.org/message-id/37ebb1d9-9036-485f-a215-e55435689917%40eisentraut.org
2025-06-02 08:41:27 +02:00
Daniel Gustafsson
fb844b9f06 Revert function to get memory context stats for processes
Due to concerns raised about the approach, and memory leaks found
in sensitive contexts the functionality is reverted. This reverts
commits 45e7e8ca9, f8c115a6c, d2a1ed172, 55ef7abf8 and 042a66291
for v18 with an intent to revisit this patch for v19.

Discussion: https://postgr.es/m/594293.1747708165@sss.pgh.pa.us
2025-05-23 15:44:54 +02:00
Nathan Bossart
16bf24e0e4 Remove pg_replication_origin's TOAST table.
A few places that access this catalog don't set up an active
snapshot before potentially accessing its TOAST table.  However,
roname (the replication origin name) is the only varlena column, so
this is only a problem if the name requires out-of-line storage.
This commit removes its TOAST table to avoid needing to set up a
snapshot.  It also places a limit on replication origin names so
that attempts to set long names will fail with a more user-friendly
error.  Those chosen limit of 512 bytes should be sufficient to
avoid "row is too big" errors independent of BLCKSZ, but it should
also be lenient enough for all reasonable use-cases.

Bumps catversion.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/ZvMSUPOqUU-VNADN%40nathan
2025-05-07 14:47:36 -05:00
Noah Misch
f4ece891fc Assert lack of hazardous buffer locks before possible catalog read.
Commit 0bada39c83 fixed a bug of this kind,
which existed in all branches for six days before detection.  While the
probability of reaching the trouble was low, the disruption was extreme.  No
new backends could start, and service restoration needed an immediate
shutdown.  Hence, add this to catch the next bug like it.

The new check in RelationIdGetRelation() suffices to make autovacuum detect
the bug in commit 243e9b40f1 that led to commit
0bada39.  This also checks in a number of similar places.  It replaces each
Assert(IsTransactionState()) that pertained to a conditional catalog read.

No back-patch for now, but a back-patch of commit 243e9b4 should back-patch
this, too.  A back-patch could omit the src/test/regress changes, since back
branches won't gain new index columns.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/20250410191830.0e.nmisch@google.com
Discussion: https://postgr.es/m/10ec0bc3-5933-1189-6bb8-5dec4114558e@gmail.com
2025-04-17 05:00:30 -07:00
Daniel Gustafsson
ef366b7d7e Perform missed catversion bump
Commit c57971034e renamed an argument for a function but missed
to bump the catversion to reflect this.

Reported-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvqOega=dPtu3h2C5fJWJEuaGCMDib_sVfhKQqgUNJVmFA@mail.gmail.com
2025-04-09 09:29:12 +02:00
Daniel Gustafsson
c57971034e Rename argument in pg_get_process_memory_contexts().
During development the third argument to pg_get_process_memory_contexts
was a retry count, but it was changed to a timeout instead.  The param
name was accidentally left in pg_proc.dat though.  Fix by renaming to
the correct parameter name.

Author: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/3eb40b3e-45c7-426a-b7f8-81f7d05a9b53@oss.nttdata.com
2025-04-08 23:09:13 +02:00
Daniel Gustafsson
042a66291b Add function to get memory context stats for processes
This adds a function for retrieving memory context statistics
and information from backends as well as auxiliary processes.
The intended usecase is cluster debugging when under memory
pressure or unanticipated memory usage characteristics.

When calling the function it sends a signal to the specified
process to submit statistics regarding its memory contexts
into dynamic shared memory.  Each memory context is returned
in detail, followed by a cumulative total in case the number
of contexts exceed the max allocated amount of shared memory.
Each process is limited to use at most 1Mb memory for this.

A summary can also be explicitly requested by the user, this
will return the TopMemoryContext and a cumulative total of
all lower contexts.

In order to not block on busy processes the caller specifies
the number of seconds during which to retry before timing out.
In the case where no statistics are published within the set
timeout,  the last known statistics are returned, or NULL if
no previously published statistics exist.  This allows dash-
board type queries to continually publish even if the target
process is temporarily congested.  Context records contain a
timestamp to indicate when they were submitted.

Author: Rahila Syed <rahilasyed90@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Discussion: https://postgr.es/m/CAH2L28v8mc9HDt8QoSJ8TRmKau_8FM_HKS41NeO9-6ZAkuZKXw@mail.gmail.com
2025-04-08 11:06:56 +02:00
Tomas Vondra
8cc139bec3 Introduce pg_shmem_allocations_numa view
Introduce new pg_shmem_alloctions_numa view with information about how
shared memory is distributed across NUMA nodes. For each shared memory
segment, the view returns one row for each NUMA node backing it, with
the total amount of memory allocated from that node.

The view may be relatively expensive, especially when executed for the
first time in a backend, as it has to touch all memory pages to get
reliable information about the NUMA node. This may also force allocation
of the shared memory.

Unlike pg_shmem_allocations, the view does not show anonymous shared
memory allocations. It also does not show memory allocated using the
dynamic shared memory infrastructure.

Author: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CAKZiRmxh6KWo0aqRqvmcoaX2jUxZYb4kGp3N%3Dq1w%2BDiH-696Xw%40mail.gmail.com
2025-04-07 23:08:17 +02:00
Tomas Vondra
65c298f61f Add support for basic NUMA awareness
Add basic NUMA awareness routines, using a minimal src/port/pg_numa.c
portability wrapper and an optional build dependency, enabled by
--with-libnuma configure option. For now this is Linux-only, other
platforms may be supported later.

A built-in SQL function pg_numa_available() allows checking NUMA
support, i.e. that the server was built/linked with the NUMA library.

The main function introduced is pg_numa_query_pages(), which allows
determining the NUMA node for individual memory pages. Internally the
function uses move_pages(2) syscall, as it allows batching, and is more
efficient than get_mempolicy(2).

Author: Jakub Wartak <jakub.wartak@enterprisedb.com>
Co-authored-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CAKZiRmxh6KWo0aqRqvmcoaX2jUxZYb4kGp3N%3Dq1w%2BDiH-696Xw%40mail.gmail.com
2025-04-07 23:08:17 +02:00
Tom Lane
b73e6d71a8 Fix erroneous construction of functions' dependencies on transforms.
The list of transform objects that a function should use is specified
in CREATE FUNCTION's TRANSFORM clause, and then represented indirectly
in pg_proc.protrftypes.  However, ProcedureCreate completely ignored
that for purposes of constructing pg_depend entries, and instead made
the function depend on any transforms that exist for its parameter or
return data types.  This is bad in both directions: the function could
be made dependent on a transform it does not actually use, or it
could try to use a transform that's since been dropped.  (The latter
scenario would require use of a transform that's not for any of the
parameter or return types, but that seems legit for cases where the
function performs SQL operations internally.)

To fix, pass in the list of transform objects that CreateFunction
identified, and build pg_depend entries from that not from the
parameter/return types.  This results in changes in the expected
test outputs in contrib/bool_plperl, which I guess are due to
different ordering of pg_depend entries -- that test case is
surely not exercising either of the problem scenarios.

This fix is not back-patchable as-is: changing the signature of
ProcedureCreate seems too risky in stable branches.  We could
do something like making ProcedureCreate a wrapper around
ProcedureCreateExt or so.  However, I'm more inclined to do
nothing in the back branches.  We had no field complaints up to
now, so the hazards don't seem to be a big issue in practice.
And we couldn't do anything about existing pg_depend entries,
so a back-patched fix would result in a mishmash of dependencies
created according to different rules.  That cure could be worse
than the disease, perhaps.

I bumped catversion just to lay down a marker that the expected
contents of pg_depend are a bit different than before.

Reported-by: Chapman Flack <jcflack@acm.org>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3112950.1743984111@sss.pgh.pa.us
2025-04-07 13:31:37 -04:00
Álvaro Herrera
a379061a22 Allow NOT NULL constraints to be added as NOT VALID
This allows them to be added without scanning the table, and validating
them afterwards without holding access exclusive lock on the table after
any violating rows have been deleted or fixed.

Doing ALTER TABLE ... SET NOT NULL for a column that has an invalid
not-null constraint validates that constraint.  ALTER TABLE .. VALIDATE
CONSTRAINT is also supported.  There are various checks on whether an
invalid constraint is allowed in a child table when the parent table has
a valid constraint; this should match what we do for enforced/not
enforced constraints.

pg_attribute.attnotnull is now only an indicator for whether a not-null
constraint exists for the column; whether it's valid or invalid must be
queried in pg_constraint.  Applications can continue to query
pg_attribute.attnotnull as before, but now it's possible that NULL rows
are present in the column even when that's set to true.

For backend internal purposes, we cache the nullability status in
CompactAttribute->attnullability that each tuple descriptor carries
(replacing CompactAttribute.attnotnull, which was a mirror of
Form_pg_attribute.attnotnull).  During the initial tuple descriptor
creation, based on the pg_attribute scan, we set this to UNRESTRICTED if
pg_attribute.attnotnull is false, or to UNKNOWN if it's true; then we
update the latter to VALID or INVALID depending on the pg_constraint
scan.  This flag is also copied when tupledescs are copied.

Comparing tuple descs for equality must also compare the
CompactAttribute.attnullability flag and return false in case of a
mismatch.

pg_dump deals with these constraints by storing the OIDs of invalid
not-null constraints in a separate array, and running a query to obtain
their properties.  The regular table creation SQL omits them entirely.
They are then dealt with in the same way as "separate" CHECK
constraints, and dumped after the data has been loaded.  Because no
additional pg_dump infrastructure was required, we don't bump its
version number.

I decided not to bump catversion either, because the old catalog state
works perfectly in the new world.  (Trying to run with new catalog state
and the old server version would likely run into issues, however.)

System catalogs do not support invalid not-null constraints (because
commit 14e87ffa5c didn't allow them to have pg_constraint rows
anyway.)

Author: Rushabh Lathia <rushabh.lathia@gmail.com>
Author: Jian He <jian.universality@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Tested-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/CAGPqQf0KitkNack4F5CFkFi-9Dqvp29Ro=EpcWt=4_hs-Rt+bQ@mail.gmail.com
2025-04-07 19:19:50 +02:00
Peter Geoghegan
92fe23d93a Add nbtree skip scan optimization.
Teach nbtree multi-column index scans to opportunistically skip over
irrelevant sections of the index given a query with no "=" conditions on
one or more prefix index columns.  When nbtree is passed input scan keys
derived from a predicate "WHERE b = 5", new nbtree preprocessing steps
output "WHERE a = ANY(<every possible 'a' value>) AND b = 5" scan keys.
That is, preprocessing generates a "skip array" (and an output scan key)
for the omitted prefix column "a", which makes it safe to mark the scan
key on "b" as required to continue the scan.  The scan is therefore able
to repeatedly reposition itself by applying both the "a" and "b" keys.

A skip array has "elements" that are generated procedurally and on
demand, but otherwise works just like a regular ScalarArrayOp array.
Preprocessing can freely add a skip array before or after any input
ScalarArrayOp arrays.  Index scans with a skip array decide when and
where to reposition the scan using the same approach as any other scan
with array keys.  This design builds on the design for array advancement
and primitive scan scheduling added to Postgres 17 by commit 5bf748b8.

Testing has shown that skip scans of an index with a low cardinality
skipped prefix column can be multiple orders of magnitude faster than an
equivalent full index scan (or sequential scan).  In general, the
cardinality of the scan's skipped column(s) limits the number of leaf
pages that can be skipped over.

The core B-Tree operator classes on most discrete types generate their
array elements with the help of their own custom skip support routine.
This infrastructure gives nbtree a way to generate the next required
array element by incrementing (or decrementing) the current array value.
It can reduce the number of index descents in cases where the next
possible indexable value frequently turns out to be the next value
stored in the index.  Opclasses that lack a skip support routine fall
back on having nbtree "increment" (or "decrement") a skip array's
current element by setting the NEXT (or PRIOR) scan key flag, without
directly changing the scan key's sk_argument.  These sentinel values
behave just like any other value from an array -- though they can never
locate equal index tuples (they can only locate the next group of index
tuples containing the next set of non-sentinel values that the scan's
arrays need to advance to).

A skip array's range is constrained by "contradictory" inequality keys.
For example, a skip array on "x" will only generate the values 1 and 2
given a qual such as "WHERE x BETWEEN 1 AND 2 AND y = 66".  Such a skip
array qual usually has near-identical performance characteristics to a
comparable SAOP qual "WHERE x = ANY('{1, 2}') AND y = 66".  However,
improved performance isn't guaranteed.  Much depends on physical index
characteristics.

B-Tree preprocessing is optimistic about skipping working out: it
applies static, generic rules when determining where to generate skip
arrays, which assumes that the runtime overhead of maintaining skip
arrays will pay for itself -- or lead to only a modest performance loss.
As things stand, these assumptions are much too optimistic: skip array
maintenance will lead to unacceptable regressions with unsympathetic
queries (queries whose scan can't skip over many irrelevant leaf pages).
An upcoming commit will address the problems in this area by enhancing
_bt_readpage's approach to saving cycles on scan key evaluation, making
it work in a way that directly considers the needs of = array keys
(particularly = skip array keys).

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiro Ikeda <masahiro.ikeda@nttdata.com>
Reviewed-By: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-By: Tomas Vondra <tomas@vondra.me>
Reviewed-By: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-By: Alena Rybakina <a.rybakina@postgrespro.ru>
Discussion: https://postgr.es/m/CAH2-Wzmn1YsLzOGgjAQZdn1STSG_y8qP__vggTaPAYXJP+G4bw@mail.gmail.com
2025-04-04 12:27:04 -04:00
Fujii Masao
0d6c477664 Extend ALTER DEFAULT PRIVILEGES to define default privileges for large objects.
Previously, ALTER DEFAULT PRIVILEGES did not support large objects.
This meant that to grant privileges to users other than the owner,
permissions had to be manually assigned each time a large object
was created, which was inconvenient.

This commit extends ALTER DEFAULT PRIVILEGES to allow defining default
access privileges for large objects. With this change, specified privileges
will automatically apply to newly created large objects, making privilege
management more efficient.

As a side effect, this commit introduces the new keyword OBJECTS
since it's used in the syntax of ALTER DEFAULT PRIVILEGES.

Original patch by Haruka Takatsuka, with some fixes and tests by Yugo Nagata,
and rebased by Laurenz Albe.

Author: Takatsuka Haruka <harukat@sraoss.co.jp>
Co-authored-by: Yugo Nagata <nagata@sraoss.co.jp>
Co-authored-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Masao Fujii <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/20240424115242.236b499b2bed5b7a27f7a418@sraoss.co.jp
2025-04-04 19:02:17 +09:00
Amit Kapila
4868c96bc8 Fix slot synchronization for two_phase enabled slots.
The issue is that the transactions prepared before two-phase decoding is
enabled can fail to replicate to the subscriber after being committed on a
promoted standby following a failover. This is because the two_phase_at
field of a slot, which tracks the LSN from which two-phase decoding
starts, is not synchronized to standby servers. Without two_phase_at, the
logical decoding might incorrectly identify prepared transaction as
already replicated to the subscriber after promotion of standby server,
causing them to be skipped.

To address the issue on HEAD, the two_phase_at field of the slot is
exposed by the pg_replication_slots view and allows the slot
synchronization to copy this value to the corresponding synced slot on the
standby server.

This bug is likely to occur if the user toggles the two_phase option to
true after initial slot creation. Given that altering the two_phase option
of a replication slot is not allowed in PostgreSQL 17, this bug is less
likely to occur. We can't change the view/function definition in
backbranch so we can't push the same fix but we are brainstorming an
appropriate solution for PG17.

Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/TYAPR01MB5724CC7C288535BBCEEE65DA94A72@TYAPR01MB5724.jpnprd01.prod.outlook.com
2025-04-03 12:26:54 +05:30
Heikki Linnakangas
e9e7b66044 Add GiST and btree sortsupport routines for range types
For GiST, having a sortsupport function allows building the index
using the "sorted build" method, which is much faster.

For b-tree, the sortsupport routine doesn't give any new
functionality, but speeds up sorting a tiny bit. The difference is not
very significant, about 2% in cursory testing on my laptop, because
the range type comparison function has quite a lot of overhead from
detoasting. In any case, since we have the function for GiST anyway,
we might as well register it for the btree opfamily too.

Author: Bernd Helmle <mailings@oopsware.de>
Discussion: https://www.postgresql.org/message-id/64d324ce2a6d535d3f0f3baeeea7b25beff82ce4.camel@oopsware.de
2025-04-02 19:51:28 +03:00
Tom Lane
6c12ae09f5 Introduce a SQL-callable function array_sort(anyarray).
Create a function that will sort the elements of an array
according to the element type's sort order.  If the array
has more than one dimension, the sub-arrays of the first
dimension are sorted per normal array-comparison rules,
leaving their contents alone.

In support of this, add pg_type.typarray to the set of fields
cached by the typcache.

Author: Junwang Zhao <zhjwpku@gmail.com>
Co-authored-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://postgr.es/m/CAEG8a3J41a4dpw_-F94fF-JPRXYxw-GfsgoGotKcjs9LVfEEvw@mail.gmail.com
2025-04-01 18:03:55 -04:00
Andres Freund
60f566b4f2 aio: Add pg_aios view
The new view lists all IO handles that are currently in use and is mainly
useful for PG developers, but may also be useful when tuning PG.

Bumps catversion.

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
2025-04-01 13:30:33 -04:00
Tom Lane
9324c8c580 Introduce PG_MODULE_MAGIC_EXT macro.
This macro allows dynamically loaded shared libraries (modules) to
provide a wired-in module name and version, and possibly other
compile-time-constant fields in future.  This information can be
retrieved with the new pg_get_loaded_modules() function.

This feature is expected to be particularly useful for modules
that do not have any exposed SQL functionality and thus are
not associated with a SQL-level extension object.  But even for
modules that do belong to extensions, being able to verify the
actual code version can be useful.

Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Yurii Rashkovskii <yrashk@omnigres.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/dd4d1b59-d0fe-49d5-b28f-1e463b68fa32@gmail.com
2025-03-26 11:06:12 -04:00