1
0
mirror of https://github.com/postgres/postgres.git synced 2025-09-11 00:12:06 +03:00
Commit Graph

4965 Commits

Author SHA1 Message Date
Peter Eisentraut
ed5e5f0710 Remove unnecessary (char *) casts [xlog]
Remove (char *) casts no longer needed after XLogRegisterData() and
XLogRegisterBufData() argument type change.

Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org
2025-02-13 10:57:07 +01:00
Michael Paquier
773c51dd39 Fix MakeTransitionCaptureState() to return a consistent result
When an UPDATE trigger referencing a new table and a DELETE trigger
referencing an old table are both present, MakeTransitionCaptureState()
returns an inconsistent result for UPDATE commands in its set of flags
and tuplestores holding the TransitionCaptureState for transition
tables.

As proved by the test added here, this issue causes a crash in v14 and
earlier versions (down to 11, actually, older versions do not support
triggers on partitioned tables) during cross-partition updates on a
partitioned table.  v15 and newer versions are safe thanks to
7103ebb7aa.

This commit fixes the function so that it returns a consistent state
by using portions of the changes made in commit 7103ebb7aa for v13 and
v14.  v15 and newer versions are slightly tweaked to match with the
older versions, mainly for consistency across branches.

Author: Kyotaro Horiguchi
Discussion: https://postgr.es/m/20250207.150238.968446820828052276.horikyota.ntt@gmail.com
Backpatch-through: 13
2025-02-13 16:30:58 +09:00
Nathan Bossart
bb8dff9995 Add cost-based vacuum delay time to progress views.
This commit adds the amount of time spent sleeping due to
cost-based delay to the pg_stat_progress_vacuum and
pg_stat_progress_analyze system views.  A new configuration
parameter named track_cost_delay_timing, which is off by default,
controls whether this information is gathered.  For vacuum, the
reported value includes the sleep time of any associated parallel
workers.  However, parallel workers only report their sleep time
once per second to avoid overloading the leader process.

Bumps catversion.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Masahiro Ikeda <ikedamsh@oss.nttdata.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Sergei Kornilov <sk@zsrv.org>
Discussion: https://postgr.es/m/ZmaXmWDL829fzAVX%40ip-10-97-1-34.eu-west-3.compute.internal
2025-02-11 16:38:14 -06:00
Nathan Bossart
e5b0b0ce15 Add is_analyze parameter to vacuum_delay_point().
This function is used in both vacuum and analyze code paths, and a
follow-up commit will require distinguishing between the two.  This
commit forces callers to specify whether they are in a vacuum or
analyze path, but it does not use that information for anything
yet.

Author: Nathan Bossart <nathandbossart@gmail.com>
Co-authored-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/ZmaXmWDL829fzAVX%40ip-10-97-1-34.eu-west-3.compute.internal
2025-02-11 16:38:14 -06:00
Melanie Plageman
052026c9b9 Eagerly scan all-visible pages to amortize aggressive vacuum
Aggressive vacuums must scan every unfrozen tuple in order to advance
the relfrozenxid/relminmxid. Because data is often vacuumed before it is
old enough to require freezing, relations may build up a large backlog
of pages that are set all-visible but not all-frozen in the visibility
map. When an aggressive vacuum is triggered, all of these pages must be
scanned. These pages have often been evicted from shared buffers and
even from the kernel buffer cache. Thus, aggressive vacuums often incur
large amounts of extra I/O at the expense of foreground workloads.

To amortize the cost of aggressive vacuums, eagerly scan some
all-visible but not all-frozen pages during normal vacuums.

All-visible pages that are eagerly scanned and set all-frozen in the
visibility map are counted as successful eager freezes and those not
frozen are counted as failed eager freezes.

If too many eager scans fail in a row, eager scanning is temporarily
suspended until a later portion of the relation. The number of failures
tolerated is configurable globally and per table.

To effectively amortize aggressive vacuums, we cap the number of
successes as well. Capping eager freeze successes also limits the amount
of potentially wasted work if these pages are modified again before the
next aggressive vacuum. Once we reach the maximum number of blocks
successfully eager frozen, eager scanning is disabled for the remainder
of the vacuum of the relation.

Original design idea from Robert Haas, with enhancements from
Andres Freund, Tomas Vondra, and me

Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZF_KCzZuOrPrOqjGVe8iRVWEAJSpzMgRQs%3D5-v84cXUg%40mail.gmail.com
2025-02-11 13:53:48 -05:00
Peter Eisentraut
bfe21b760e Support non-btree indexes for foreign keys
Previously, only btrees were supported as the referenced unique index
for foreign keys because there was no way to get the equality strategy
number for other index methods.  We have this now (commit
c09e5a6a01), so we can support this.  In fact, this is now just a
special case of the existing generalized "period" foreign key
support, since that already knows how to lookup equality strategy
numbers.

Note that this does not change the requirement that the referenced
index needs to be unique, and at the moment, only btree supports that,
so this does not change anything in practice, but it would allow
another index method that has amcanunique to be supported.

Co-authored-by: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-02-07 11:23:34 +01:00
Peter Eisentraut
83ea6c5402 Virtual generated columns
This adds a new variant of generated columns that are computed on read
(like a view, unlike the existing stored generated columns, which are
computed on write, like a materialized view).

The syntax for the column definition is

    ... GENERATED ALWAYS AS (...) VIRTUAL

and VIRTUAL is also optional.  VIRTUAL is the default rather than
STORED to match various other SQL products.  (The SQL standard makes
no specification about this, but it also doesn't know about VIRTUAL or
STORED.)  (Also, virtual views are the default, rather than
materialized views.)

Virtual generated columns are stored in tuples as null values.  (A
very early version of this patch had the ambition to not store them at
all.  But so much stuff breaks or gets confused if you have tuples
where a column in the middle is completely missing.  This is a
compromise, and it still saves space over being forced to use stored
generated columns.  If we ever find a way to improve this, a bit of
pg_upgrade cleverness could allow for upgrades to a newer scheme.)

The capabilities and restrictions of virtual generated columns are
mostly the same as for stored generated columns.  In some cases, this
patch keeps virtual generated columns more restricted than they might
technically need to be, to keep the two kinds consistent.  Some of
that could maybe be relaxed later after separate careful
considerations.

Some functionality that is currently not supported, but could possibly
be added as incremental features, some easier than others:

- index on or using a virtual column
- hence also no unique constraints on virtual columns
- extended statistics on virtual columns
- foreign-key constraints on virtual columns
- not-null constraints on virtual columns (check constraints are supported)
- ALTER TABLE / DROP EXPRESSION
- virtual column cannot have domain type
- virtual columns are not supported in logical replication

The tests in generated_virtual.sql have been copied over from
generated_stored.sql with the keyword replaced.  This way we can make
sure the behavior is mostly aligned, and the differences can be
visible.  Some tests for currently not supported features are
currently commented out.

Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Tested-by: Shlok Kyal <shlok.kyal.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/a368248e-69e4-40be-9c07-6c3b5880b0a6@eisentraut.org
2025-02-07 09:46:59 +01:00
Amit Langote
cbc127917e Track unpruned relids to avoid processing pruned relations
This commit introduces changes to track unpruned relations explicitly,
making it possible for top-level plan nodes, such as ModifyTable and
LockRows, to avoid processing partitions pruned during initial
pruning.  Scan-level nodes, such as Append and MergeAppend, already
avoid the unnecessary processing by accessing partition pruning
results directly via part_prune_index. In contrast, top-level nodes
cannot access pruning results directly and need to determine which
partitions remain unpruned.

To address this, this commit introduces a new bitmapset field,
es_unpruned_relids, which the executor uses to track the set of
unpruned relations.  This field is referenced during plan
initialization to skip initializing certain nodes for pruned
partitions. It is initialized with PlannedStmt.unprunableRelids,
a new field that the planner populates with RT indexes of relations
that cannot be pruned during runtime pruning. These include relations
not subject to partition pruning and those required for execution
regardless of pruning.

PlannedStmt.unprunableRelids is computed during set_plan_refs() by
removing the RT indexes of runtime-prunable relations, identified
from PartitionPruneInfos, from the full set of relation RT indexes.
ExecDoInitialPruning() then updates es_unpruned_relids by adding
partitions that survive initial pruning.

To support this, PartitionedRelPruneInfo and PartitionedRelPruningData
now include a leafpart_rti_map[] array that maps partition indexes to
their corresponding RT indexes. The former is used in set_plan_refs()
when constructing unprunableRelids, while the latter is used in
ExecDoInitialPruning() to convert partition indexes returned by
get_matching_partitions() into RT indexes, which are then added to
es_unpruned_relids.

These changes make it possible for ModifyTable and LockRows nodes to
process only relations that remain unpruned after initial pruning.
ExecInitModifyTable() trims lists, such as resultRelations,
withCheckOptionLists, returningLists, and updateColnosLists, to
consider only unpruned partitions. It also creates ResultRelInfo
structs only for these partitions. Similarly, child RowMarks for
pruned relations are skipped.

By avoiding unnecessary initialization of structures for pruned
partitions, these changes improve the performance of updates and
deletes on partitioned tables during initial runtime pruning.

Due to ExecInitModifyTable() changes as described above, EXPLAIN on a
plan for UPDATE and DELETE that uses runtime initial pruning no longer
lists partitions pruned during initial pruning.

Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions)
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
2025-02-07 17:15:09 +09:00
Nathan Bossart
401a6956fa Disallow COPY FREEZE on foreign tables.
This didn't actually work: the COPY succeeds, but the FREEZE
optimization isn't applied.  There doesn't seem to be an easy way
to support FREEZE on foreign tables, so let's follow the precedent
established by commit 5c9a5513a3 by raising an error early.  This
is arguably a bug fix, but due to the lack of reports, the minimal
discussion on the mailing list, and the potential to break existing
scripts, I am not back-patching it for now.

Author: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Zhang Mingli <zmlpostgres@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0ujeNgKpE3OrLtR%3DeJGa5LkGMekFzQTwjgw%3DrzaLufQLQ%40mail.gmail.com
2025-02-06 15:23:40 -06:00
Peter Eisentraut
622f678c10 Integrate GistTranslateCompareType() into IndexAmTranslateCompareType()
This turns GistTranslateCompareType() into a callback function of the
gist index AM instead of a standalone function.  The existing callers
are changed to use IndexAmTranslateCompareType().  This then makes
that code not hardcoded toward gist.

This means in particular that the temporal keys code is now
independent of gist.  Also, this generalizes commit 74edabce7a, so
other index access methods other than the previously hardcoded ones
could now work as REPLICA IDENTITY in a logical replication
subscriber.

Author: Mark Dilger <mark.dilger@enterprisedb.com>
Co-authored-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-02-03 10:53:18 +01:00
Peter Eisentraut
43493cceda Add get_opfamily_name() function
This refactors and simplifies various existing code to make use of the
new function.

Reviewed-by: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-02-01 10:42:58 +01:00
Peter Eisentraut
a5709b5bb2 Rename GistTranslateStratnum() to GistTranslateCompareType()
Follow up to commit 630f9a43ce.  The previous name had become
confusing, because it doesn't actually translate a strategy number but
a CompareType into a strategy number.  We might add the inverse at
some point, which would then probably be called something like
GistTranslateStratnum.

Reviewed-by: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-02-01 10:18:46 +01:00
Tom Lane
53a4936505 Doc: add commentary about cowboy assignment of maintenance_work_mem.
Whilst working on commit 041e8b95b I happened to notice that
parallel_vacuum_main() assigns directly to the maintenance_work_mem
GUC.  This is definitely not per project conventions, so I tried to
fix it to use SetConfigOption().  But that fails with "parameter
cannot be set during a parallel operation".  It doesn't seem worth
working on a cleaner answer, at least not till we have a few more
instances of similar problems.  But add some commentary, just so
nobody gets the idea that this is an approved way to set a GUC.
2025-01-31 15:17:15 -05:00
Tom Lane
041e8b95b8 Get rid of our dependency on type "long" for memory size calculations.
Consistently use "Size" (or size_t, or in some places int64 or double)
as the type for variables holding memory allocation sizes.  In most
places variables' data types were fine already, but we had an ancient
habit of computing bytes from kilobytes-units GUCs with code like
"work_mem * 1024L".  That risks overflow on Win64 where they did not
make "long" as wide as "size_t".  We worked around that by restricting
such GUCs' ranges, so you couldn't set work_mem et al higher than 2GB
on Win64.  This patch removes that restriction, after replacing such
calculations with "work_mem * (Size) 1024" or variants of that.

It should be noted that this patch was constructed by searching
outwards from the GUCs that have MAX_KILOBYTES as upper limit.
So I can't positively guarantee there are no other places doing
memory-size arithmetic in int or long variables.  I do however feel
pretty confident that increasing MAX_KILOBYTES on Win64 is safe now.
Also, nothing in our code should be dealing in multiple-gigabyte
allocations without authorization from a relevant GUC, so it seems
pretty likely that this search caught everything that could be at
risk of overflow.

Author: Vladlen Popolitov <v.popolitov@postgrespro.ru>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1a01f0-66ec2d80-3b-68487680@27595217
2025-01-31 13:52:40 -05:00
Amit Kapila
75eb9766ec Rename pubgencols_type to pubgencols in pg_publication.
The column added in commit e65dbc9927, pubgencols_type, was inconsistent
with the naming conventions of other columns in the pg_publication
catalog.

Author: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Discussion: https://postgr.es/m/CALDaNm1u-ufVOW-RUsXSooqzkpohxfZYy=z78fbcr_9Pq5hbCg@mail.gmail.com
2025-01-28 10:42:46 +05:30
Michael Paquier
30a6ed0ce4 Track per-relation cumulative time spent in [auto]vacuum and [auto]analyze
This commit adds four fields to the statistics of relations, aggregating
the amount of time spent for each operation on a relation:
- total_vacuum_time, for manual vacuum.
- total_autovacuum_time, for vacuum done by the autovacuum daemon.
- total_analyze_time, for manual analyze.
- total_autoanalyze_time, for analyze done by the autovacuum daemon.

This gives users the option to derive the average time spent for these
operations with the help of the related "count" fields.

Bump catalog version (for the catalog changes) and PGSTAT_FILE_FORMAT_ID
(for the additions in PgStat_StatTabEntry).

Author: Sami Imseih
Reviewed-by: Bertrand Drouvot, Michael Paquier
Discussion: https://postgr.es/m/CAA5RZ0uVOGBYmPEeGF2d1B_67tgNjKx_bKDuL+oUftuoz+=Y1g@mail.gmail.com
2025-01-28 09:57:32 +09:00
Michael Paquier
65281391a9 Print out error position for some ALTER TABLE ALTER COLUMN type
A ParseState exists in ATPrepAlterColumnType() since its introduction
in 077db40fa1, and it has never relied on a query string that could be
used to point at a location in the origin string on error.

The output of some regression tests are updated, showing the error
location where applicable.  Six error strings are upgraded with the
error location.

Author: Jian He
Discussion: https://postgr.es/m/CACJufxGfbPfWLjcEz33G9eW_epDW0UDi2H05i9eSTPKGJ4rxSA@mail.gmail.com
2025-01-27 13:51:23 +09:00
Álvaro Herrera
0a16c8326c Add missing CommandCounterIncrement
For commit b663b9436e I thought this was useless, but turns out not to
be for the case where a partitioned table has two identical foreign key
constraints which can both be matched by the same constraint in a
partition during attach.  This CCI makes the match search for the second
constraint in the parent ignore the constraint in the child that has
already been matched by the first constraint in the parent.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/c599253c-1ccd-4161-80fc-c9065e037a09@gmail.com
2025-01-26 17:34:28 +01:00
Tom Lane
01463e1ccc Ensure that AFTER triggers run as the instigating user.
With deferred triggers, it is possible that the current role changes
between the time when the trigger is queued and the time it is
executed (for example, the triggering data modification could have
been executed in a SECURITY DEFINER function).

Up to now, deferred trigger functions would run with the current role
set to whatever was active at commit time.  That does not matter for
foreign-key constraints, whose correctness doesn't depend on the
current role.  But for user-written triggers, the current role
certainly can matter.

Hence, fix things so that AFTER triggers are fired under the role
that was active when they were queued, matching the behavior of
BEFORE triggers which would have actually fired at that time.
(If the trigger function is marked SECURITY DEFINER, that of course
overrides this, as it always has.)

This does not create any new security exposure: if you do DML on a
table owned by a hostile user, that user has always had various ways
to exploit your permissions, such as the aforementioned BEFORE
triggers, default expressions, etc.  It might remove some security
exposure, because the old behavior could potentially expose some
other role besides the one directly modifying the table.

There was discussion of making a larger change, such as running as
the trigger's owner.  However, that would break the common idiom of
capturing the value of CURRENT_USER in a trigger for auditing/logging
purposes.  This change will make no difference in the typical scenario
where the current role doesn't change before commit.

Arguably this is a bug fix, but it seems too big a semantic change
to consider for back-patching.

Author: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Joseph Koshakow <koshy44@gmail.com>
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://postgr.es/m/77ee784cf248e842f74588418f55c2931e47bd78.camel@cybertec.at
2025-01-23 12:25:55 -05:00
Tom Lane
7921927bbb Reverse the search order in afterTriggerAddEvent().
When scanning existing AfterTriggerSharedData records in search
of a match to the event being queued, we were examining the
records from oldest to newest.  But it makes more sense to do
the opposite.  The newest record is likely to be from the current
query, while the oldest is likely to be from some previous command
in the same transaction, which will likely have different details.

There aren't expected to be very many active AfterTriggerSharedData
records at once, so that this change is unlikely to make any
spectacular difference.  Still, having added a nontrivially-expensive
bms_equal call to this loop yesterday, I feel a need to shave cycles
where possible.

Discussion: https://postgr.es/m/4166712.1737583961@sss.pgh.pa.us
2025-01-23 11:08:05 -05:00
Álvaro Herrera
b663b9436e Allow NOT VALID foreign key constraints on partitioned tables
This feature was intentionally omitted when FKs were first implemented
for partitioned tables, and had been requested a few times; the
usefulness is clear.

Validation can happen for each partition individually, which is useful
to contain the number of locks held and the duration; or it can be
executed for the partitioning hierarchy as a single command, which
validates all child constraints that haven't been validated already.

This is also useful to implement NOT ENFORCED constraints on top.

Author: Amul Sul <sulamul@gmail.com>
Discussion: https://postgr.es/m/CAAJ_b96Bp=-ZwihPPtuaNX=SrZ0U6ZsXD3+fgARO0JuKa8v2jQ@mail.gmail.com
2025-01-23 15:54:38 +01:00
Amit Kapila
e65dbc9927 Change publication's publish_generated_columns option type to enum.
The current boolean publish_generated_columns option only supports a
binary choice, which is insufficient for future enhancements where
generated columns can be of different types (e.g., stored or virtual). The
supported values for the publish_generated_columns option are 'none' and
'stored'.

Author: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/d718d219-dd47-4a33-bb97-56e8fc4da994@eisentraut.org
Discussion: https://postgr.es/m/B80D17B2-2C8E-4C7D-87F2-E5B4BE3C069E@gmail.com
2025-01-23 15:28:37 +05:30
Tom Lane
ea68ea6320 Repair incorrect handling of AfterTriggerSharedData.ats_modifiedcols.
This patch fixes two distinct errors that both ultimately trace
to commit 71d60e2aa, which added the ats_modifiedcols field.

The more severe error is that ats_modifiedcols wasn't accounted for
in afterTriggerAddEvent's scanning loop that looks for a pre-existing
duplicate AfterTriggerSharedData.  Thus, a new event could be
incorrectly matched to an AfterTriggerSharedData that has a different
value of ats_modifiedcols, resulting in the wrong tg_updatedcols
bitmap getting passed to the trigger whenever it finally gets fired.
We'd not noticed because (a) few triggers consult tg_updatedcols,
and (b) we had no tests exercising a case where such a trigger was
called as an AFTER trigger.  In the test case added by this commit,
contrib/lo's trigger fails to remove a large object when expected
because (without this fix) it thinks the LO OID column hasn't changed.

The other problem was introduced by commit ce5aaea8c, which copied the
modified-columns bitmap into trigger-related storage.  It made a copy
for every trigger event, whereas what we really want is to make a new
copy only when we make a new AfterTriggerSharedData entry.  (We could
imagine adding extra logic to reduce the number of bitmap copies still
more, but it doesn't look worthwhile at the moment.)  In a simple test
of an UPDATE of 10000000 rows with a single AFTER trigger, this thinko
roughly tripled the amount of memory consumed by the pending-triggers
data structures, from 160446744 to 480443440 bytes.

Fixing the first problem requires introducing a bms_equal() call into
afterTriggerAddEvent's scanning loop, which is slightly annoying from
a speed perspective.  However, getting rid of the excessive bms_copy()
calls from the second problem balances that out; overall speed of
trigger operations is the same or slightly better, in my tests.

Discussion: https://postgr.es/m/3496294.1737501591@sss.pgh.pa.us
Backpatch-through: 13
2025-01-22 11:58:20 -05:00
Álvaro Herrera
9b21f203dd Fix detach of a partition that has a toplevel FK to a partitioned table
In common cases, foreign keys are defined on the toplevel partitioned
table; but if instead one is defined on a partition and references a
partitioned table, and the referencing partition is detached, we would
examine the pg_constraint row on the partition being detached, and fail
to realize that the sub-constraints must be left alone.  This causes the
ALTER TABLE DETACH process to fail with

 ERROR:  could not find ON INSERT check triggers of foreign key constraint NNN

This is similar but not quite the same as what was fixed by
53af9491a0.  This bug doesn't affect branches earlier than 15, because
the detach procedure was different there, so we only backpatch down to
15.

Fix by skipping such modifying constraints that are children of other
constraints being detached.

Author: Amul Sul <sulamul@gmail.com>
Diagnosys-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAAJ_b97GuPh6wQPbxQS-Zpy16Oh+0aMv-w64QcGrLhCOZZ6p+g@mail.gmail.com
2025-01-21 14:53:46 +01:00
Peter Eisentraut
1772d554b0 Fix NO ACTION temporal foreign keys when the referenced endpoints change
If a referenced UPDATE changes the temporal start/end times, shrinking
the span the row is valid, we get a false return from
ri_Check_Pk_Match(), but overlapping references may still be valid, if
their reference didn't overlap with the removed span.

We need to consider what span(s) are still provided in the referenced
table.  Instead of returning that from ri_Check_Pk_Match(), we can
just look it up in the main SQL query.

Reported-by: Sam Gabrielsson <sam@movsom.se>
Author: Paul Jungwirth <pj@illuminatedcomputing.com>
Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com
2025-01-21 14:39:24 +01:00
Tom Lane
d7674c9fab Seek zone abbreviations in the IANA data before timezone_abbreviations.
If a time zone abbreviation used in datetime input is defined in
the currently active timezone, use that definition in preference
to looking in the timezone_abbreviations list.  That allows us to
correctly handle abbreviations that have different meanings in
different timezones.  Also, it eliminates an inconsistency between
datetime input and datetime output: the non-ISO datestyles for
timestamptz have always printed abbreviations taken from the IANA
data, not from timezone_abbreviations.  Before this fix, it was
possible to demonstrate cases where casting a timestamp to text
and back fails or changes the value significantly because of that
inconsistency.

While this change removes the ability to override the IANA data about
an abbreviation known in the current zone, it's not clear that there's
any real use-case for doing so.  But it is clear that this makes life
a lot easier for dealing with abbreviations that have conflicts across
different time zones.

Also update the pg_timezone_abbrevs view to report abbreviations
that are recognized via the IANA data, and *not* report any
timezone_abbreviations entries that are thereby overridden.
Under the hood, there are now two SRFs, one that pulls the IANA
data and one that pulls timezone_abbreviations entries.  They're
combined by logic in the view.  This approach was useful for
debugging (since the functions can be called on their own).
While I don't intend to document the functions explicitly,
they might be useful to call directly.

Also improve DecodeTimezoneAbbrev's caching logic so that it can
cache zone abbreviations found in the IANA data.  Without that,
this patch would have caused a noticeable degradation of the
runtime of timestamptz_in.

Per report from Aleksander Alekseev and additional investigation.

Discussion: https://postgr.es/m/CAJ7c6TOATjJqvhnYsui0=CO5XFMF4dvTGH+skzB--jNhqSQu5g@mail.gmail.com
2025-01-16 14:11:19 -05:00
Álvaro Herrera
86374c9a0e Split ATExecValidateConstraint into reusable pieces
With this, we have separate functions to add validation requests to
ALTER TABLE's phase 3 queue for check and foreign key constraints, which
allows reusing them in future commits -- particularly this will allow us
to perform validation of invalid foreign key constraints in partitioned
tables.

We could have let the check constraint code alone since we don't need to
reuse that for anything at this point, but it seems cleaner and more
consistent to do both at the same time.

Author: Amul Sul <sulamul@gmail.com>
Discussion: https://postgr.es/m/CAAJ_b96Bp=-ZwihPPtuaNX=SrZ0U6ZsXD3+fgARO0JuKa8v2jQ@mail.gmail.com
2025-01-16 16:44:24 +01:00
Peter Eisentraut
7407b2d48c Remove dead code
As of commit 9895b35cb8, AlterDomainAddConstraint() can only be
called with constraints of type CONSTR_CHECK and CONSTR_NOTNULL.  So
all the code to check for and reject other constraint type values is
dead and can be removed.

Author: jian he <jian.universality@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CACJufxHitd5LGLBSSAPShhtDWxT0ViVKTHinkYW-skBX93TcpA@mail.gmail.com
2025-01-16 14:37:28 +01:00
Peter Eisentraut
7a947ed25b refactor: split ATExecAlterConstrRecurse()
This splits out a couple of subroutines from
ATExecAlterConstrRecurse().  This makes the main function a bit
smaller, and a future patch (NOT ENFORCED foreign-key constraints)
will also want to call some of the pieces separately.

Author: Amul Sul <amul.sul@enterprisedb.com>
Reviewed-by: jian he <jian.universality@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b962c5AcYW9KUt_R_ER5qs3fUGbe4az-SP-vuwPS-w-AGA%40mail.gmail.com
2025-01-16 13:24:11 +01:00
Peter Eisentraut
630f9a43ce Change gist stratnum function to use CompareType
This changes commit 7406ab623f in that the gist strategy number
mapping support function is changed to use the CompareType enum as
input, instead of the "well-known" RT*StrategyNumber strategy numbers.

This is a bit cleaner, since you are not dealing with two sets of
strategy numbers.  Also, this will enable us to subsume this system
into a more general system of using CompareType to define operator
semantics across index methods.

Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2025-01-15 11:34:04 +01:00
Peter Eisentraut
ca87c415e2 Add support for NOT ENFORCED in CHECK constraints
This adds support for the NOT ENFORCED/ENFORCED flag for constraints,
with support for check constraints.

The plan is to eventually support this for foreign key constraints,
where it is typically more useful.

Note that CHECK constraints do not currently support ALTER operations,
so changing the enforceability of an existing constraint isn't
possible without dropping and recreating it.  This could be added
later.

Author: Amul Sul <amul.sul@enterprisedb.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: jian he <jian.universality@gmail.com>
Tested-by: Triveni N <triveni.n@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b962c5AcYW9KUt_R_ER5qs3fUGbe4az-SP-vuwPS-w-AGA@mail.gmail.com
2025-01-11 10:52:30 +01:00
Álvaro Herrera
cc811f92ba Adjust signature of cluster_rel() and its subroutines
cluster_rel() receives the OID of the relation to process, which it
opens and locks; but then its subroutine copy_table_data() also receives
the relation OID and opens it by itself.  This is a bit wasteful.  It's
better to have cluster_rel() receive the relation already open, and pass
it down to its subroutines as necessary; then cluster_rel closes the rel
before returning.  This simplifies things.

But a better motivation to make this change is that a future command to
do logical-decoding-based "concurrent VACUUM FULL" will need to release
all locks on the relation (and possibly on the clustering index) at some
point.  Since it makes little sense to keep the relation reference
without the lock, the cluster_rel() function will also close it (and
the index).  With this arrangement, neither the function nor its
subroutines need open extra references, which, again, makes things simpler.

Author: Antonin Houska <ah@cybertec.at>
Discussion: https://postgr.es/m/82651.1720540558@antos
2025-01-10 13:09:38 +01:00
Nathan Bossart
39e3bcae44 Fix an ALTER GROUP ... DROP USER error message.
This error message stated the privileges required to add a member
to a group even if the user was trying to drop a member:

	postgres=> alter group a drop user b;
	ERROR:  permission denied to alter role
	DETAIL:  Only roles with the ADMIN option on role "a" may add members.

Since the required privileges for both operations are the same, we
can fix this by modifying the message to mention both adding and
dropping members:

	postgres=> alter group a drop user b;
	ERROR:  permission denied to alter role
	DETAIL:  Only roles with the ADMIN option on role "a" may add or drop members.

Author: ChangAo Chen
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/tencent_FAA0D00E3514AAF0BBB6322542A6094FEF05%40qq.com
Backpatch-through: 16
2025-01-09 17:10:13 -06:00
Álvaro Herrera
ebd8fc7e47 Simplify signature of RewriteTable
This function doesn't need the lockmode to be passed: it was being used
to lock the new heap, but that's bogus, because the only caller has
already obtained the appropriate lock on the new heap (which is
unimportant anyway, because the relation's creation is not yet committed
and so no other session can see it).

Noticed while reviewed Antonin Houska's patch to add VACUUM FULL
CONCURRENTLY.
2025-01-09 14:17:12 +01:00
Álvaro Herrera
5b291d1c9c Remove unnecessary code to handle CONSTR_NOTNULL
Commit 14e87ffa5c needlessly added support for CONSTR_NOTNULL entries
to StoreConstraints.  It's dead code, so remove it.

To make the situation regarding constraint creation clearer, change
comments in heap_create_with_catalog, StoreConstraints, MergeAttributes
to explain which types of constraint are used on each.

Author: 何建 (Jian He) <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CACJufxFxzqrCiUNfjJ0tQU+=nKQkQCGtGzUBude=SMOwj5VNjQ@mail.gmail.com
2025-01-07 16:49:41 +01:00
David Rowley
11012c5037 Fix an assortment of spelling mistakes and typos
Author: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/5812a0b9-b0cf-4151-9a14-d9f00e4f2858@gmail.com
2025-01-02 12:42:01 +13:00
Bruce Momjian
50e6eb731d Update copyright for 2025
Backpatch-through: 13
2025-01-01 11:21:55 -05:00
David Rowley
eb53ff5517 Fix overly large values/nulls arrays
These arrays were sized with Natts_pg_trigger (19) when they should have
been sized with Natts_pg_event_trigger (7).  We'd better fix this as
it's clearly a mistake and it could become problematic if
pg_event_trigger were to gain a dozen or so more columns in the future.

No backpatch as there's no actual bug and the column count on those
tables isn't going to change in released versions.

Author: Xin Zhang <zhanghien@qq.com>
Discussion: https://postgr.es/m/tencent_05AD0FB321A414EC3661204D2102AA6EF605@qq.com
2024-12-29 23:57:43 +13:00
Noah Misch
ff90ee6145 In REASSIGN OWNED of a database, lock the tuple as mandated.
Commit aac2c9b4fd mandated such locking
and attempted to fulfill that mandate, but it missed REASSIGN OWNED.
Hence, it remained possible to lose VACUUM's inplace update of
datfrozenxid if a REASSIGN OWNED processed that database at the same
time.  This didn't affect the other inplace-updated catalog, pg_class.
For pg_class, REASSIGN OWNED calls ATExecChangeOwner() instead of the
generic AlterObjectOwner_internal(), and ATExecChangeOwner() fulfills
the locking mandate.

Like in GRANT, implement this by following the locking protocol for any
catalog subject to the generic AlterObjectOwner_internal().  It would
suffice to do this for IsInplaceUpdateOid() catalogs only.  Back-patch
to v13 (all supported versions).

Kirill Reshke.  Reported by Alexander Kukushkin.

Discussion: https://postgr.es/m/CAFh8B=mpKjAy4Cuun-HP-f_vRzh2HSvYFG3rhVfYbfEBUhBAGg@mail.gmail.com
2024-12-28 07:16:22 -08:00
Michael Paquier
a86cfcae7c Fix typo in comment of compute_return_type() in functioncmds.c
Author: Japin Li
Discussion: https://postgr.es/m/ME0P300MB0445D51BCFA8680F0B35FD6EB60C2@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
2024-12-26 12:53:55 +09:00
David Rowley
7ec4b9ff80 Fix incorrect source filename references
Jian He reported the src/include/utility/tcop.h one and the remainder
were found by using a script to look for src/* and check that we have a
filename or directory of that name.

In passing, fix some out-date comments.

Reported-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/CACJufxGoE3H-7VgO02=PrR4SNuVWDVbfTyUnwO0HvS-Lxurnog@mail.gmail.com
2024-12-23 19:41:49 +13:00
David Rowley
5983a4cffc Introduce CompactAttribute array in TupleDesc, take 2
The new compact_attrs array stores a few select fields from
FormData_pg_attribute in a more compact way, using only 16 bytes per
column instead of the 104 bytes that FormData_pg_attribute uses.  Using
CompactAttribute allows performance-critical operations such as tuple
deformation to be performed without looking at the FormData_pg_attribute
element in TupleDesc which means fewer cacheline accesses.

For some workloads, tuple deformation can be the most CPU intensive part
of processing the query.  Some testing with 16 columns on a table
where the first column is variable length showed around a 10% increase in
transactions per second for an OLAP type query performing aggregation on
the 16th column.  However, in certain cases, the increases were much
higher, up to ~25% on one AMD Zen4 machine.

This also makes pg_attribute.attcacheoff redundant.  A follow-on commit
will remove it, thus shrinking the FormData_pg_attribute struct by 4
bytes.

Author: David Rowley
Reviewed-by: Andres Freund, Victor Yegorov
Discussion: https://postgr.es/m/CAApHDvrBztXP3yx=NKNmo3xwFAFhEdyPnvrDg3=M0RhDs+4vYw@mail.gmail.com
2024-12-20 22:31:26 +13:00
Michael Paquier
0f23dedc91 Print out error position for some more DDLs
The following commands gain some information about the error position in
the query, should they fail when looking at the type used:
- CREATE TYPE (LIKE)
- CREATE TABLE OF

Both are related to typenameType() where the type name lookup is done.
These calls gain the ParseState that already exists in these paths.

Author: Kirill Reshke, Jian He
Reviewed-by: Álvaro Herrera, Michael Paquier
Discussion: https://postgr.es/m/CALdSSPhqfvKbDwqJaY=yEePi_aq61GmMpW88i6ZH7CMG_2Z4Cg@mail.gmail.com
2024-12-17 09:44:06 +09:00
Michael Paquier
39240bcad5 Print out error position for CREATE DOMAIN
This is simply done by pushing down the ParseState available in
ProcessUtility() to DefineDomain(), giving more information about the
position of an error when running a CREATE DOMAIN query.

Most of the queries impacted by this change have been added previously
in 0172b4c944.

Author: Kirill Reshke, Jian He
Reviewed-by: Álvaro Herrera, Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/CALdSSPhqfvKbDwqJaY=yEePi_aq61GmMpW88i6ZH7CMG_2Z4Cg@mail.gmail.com
2024-12-16 14:52:11 +09:00
David Rowley
c2a4078eba Enable BUFFERS with EXPLAIN ANALYZE by default
The topic of turning EXPLAIN's BUFFERS option on with the ANALYZE option
has come up a few times over the past few years.  In many ways, doing this
seems like a good idea as it may be more obvious to users why a given
query is running more slowly than they might expect.  Also, from my own
(David's) personal experience, I've seen users posting to the mailing
lists with two identical plans, one slow and one fast asking why their
query is sometimes slow.  In many cases, this is due to additional reads.
Having BUFFERS on by default may help reduce some of these questions, and
if not, make it more obvious to the user before they post, or save a
round-trip to the mailing list when additional I/O effort is the cause of
the slowness.

The general consensus is that we want BUFFERS on by default with
ANALYZE.  However, there were more than zero concerns raised with doing
so.  The primary reason against is the additional verbosity, making it
harder to read large plans.  Another concern was that buffer information
isn't always useful so may not make sense to have it on by default.

It's currently December, so let's commit this to see if anyone comes
forward with a strong objection against making this change.  We have over
half a year remaining in the v18 cycle where we could still easily consider
reverting this if someone were to come forward with a convincing enough
reason as to why doing this is a bad idea.

There were two patches independently submitted to achieve this goal, one
by me and the other by Guillaume.  This commit is a mix of both of these
patches with some additional work done by me to adjust various
additional places in the documentation which include EXPLAIN ANALYZE
output.

Author: Guillaume Lelarge, David Rowley
Reviewed-by: Robert Haas, Greg Sabino Mullane, Michael Christofides
Discussion: https://postgr.es/m/CANNMO++W7MM8T0KyXN3ZheXXt-uLVM3aEtZd+WNfZ=obxffUiA@mail.gmail.com
2024-12-11 22:35:11 +13:00
Michael Paquier
d37e856410 Fix comments of GUC hooks for timezone_abbreviations
The GUC assign and check hooks used "assign_timezone_abbreviations",
which was incorrect.

Issue noticed while browsing this area of the code, introduced in
0a20ff54f5.

Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/Z1eV6Y8yk77GZhZI@paquier.xyz
Backpatch-through: 16
2024-12-10 13:02:21 +09:00
Daniel Gustafsson
73a392d236 Fix small memory leaks in GUC checks
Follow-up commit to a9d58bfe8a.  Backpatch down to v16 where
this was added in order to keep the code consistent for future
backpatches.

Author: Tofig Aliev <t.aliev@postgrespro.ru>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/bba4313fdde9db46db279f96f3b748b1@postgrespro.ru
Backpatch-through: 16
2024-12-09 20:58:23 +01:00
Tom Lane
3eea7a0c97 Simplify executor's determination of whether to use parallelism.
Our parallel-mode code only works when we are executing a query
in full, so ExecutePlan must disable parallel mode when it is
asked to do partial execution.  The previous logic for this
involved passing down a flag (variously named execute_once or
run_once) from callers of ExecutorRun or PortalRun.  This is
overcomplicated, and unsurprisingly some of the callers didn't
get it right, since it requires keeping state that not all of
them have handy; not to mention that the requirements for it were
undocumented.  That led to assertion failures in some corner
cases.  The only state we really need for this is the existing
QueryDesc.already_executed flag, so let's just put all the
responsibility in ExecutePlan.  (It could have been done in
ExecutorRun too, leading to a slightly shorter patch -- but if
there's ever more than one caller of ExecutePlan, it seems better
to have this logic in the subroutine than the callers.)

This makes those ExecutorRun/PortalRun parameters unnecessary.
In master it seems okay to just remove them, returning the
API for those functions to what it was before parallelism.
Such an API break is clearly not okay in stable branches,
but for them we can just leave the parameters in place after
documenting that they do nothing.

Per report from Yugo Nagata, who also reviewed and tested
this patch.  Back-patch to all supported branches.

Discussion: https://postgr.es/m/20241206062549.710dc01cf91224809dd6c0e1@sraoss.co.jp
2024-12-09 14:38:19 -05:00
Tom Lane
3f9b962176 Ensure that pg_amop/amproc entries depend on their lefttype/righttype.
Usually an entry in pg_amop or pg_amproc does not need a dependency on
its amoplefttype/amoprighttype/amproclefttype/amprocrighttype types,
because there is an indirect dependency via the argument types of its
referenced operator or procedure, or via the opclass it belongs to.
However, for some support procedures in some index AMs, the argument
types of the support procedure might not mention the column data type
at all.  Also, the amop/amproc entry might be treated as "loose" in
the opfamily, in which case it lacks a dependency on any particular
opclass; or it might be a cross-type entry having a reference to a
datatype that is not its opclass' opcintype.

The upshot of all this is that there are cases where a datatype can
be dropped while leaving behind amop/amproc entries that mention it,
because there is no path in pg_depend showing that those entries
depend on that type.  Such entries are harmless in normal activity,
because they won't get used, but they cause problems for maintenance
actions such as dropping the operator family.  They also cause pg_dump
to produce bogus output.  The previous commit put a band-aid on the
DROP OPERATOR FAMILY failure, but a real fix is needed.

To fix, add pg_depend entries showing that a pg_amop/pg_amproc entry
depends on its lefttype/righttype.  To avoid bloating pg_depend too
much, skip this if the referenced operator or function has that type
as an input type.  (I did not bother with considering the possible
indirect dependency via the opclass' opcintype; at least in the
reported case, that wouldn't help anyway.)

Probably, the reason this has escaped notice for so long is that
add-on datatypes and relevant opclasses/opfamilies are usually
packaged as extensions nowadays, so that there's no way to drop
a type without dropping the referencing opclasses/opfamilies too.
Still, in the absence of pg_depend entries there's nothing that
constrains DROP EXTENSION to drop the opfamily entries before the
datatype, so it seems possible for a DROP failure to occur anyway.

The specific case that was reported doesn't fail in v13, because
v13 prefers to attach the support procedure to the opclass not the
opfamily.  But it's surely possible to construct other edge cases
that do fail in v13, so patch that too.

Per report from Yoran Heling.  Back-patch to all supported branches.

Discussion: https://postgr.es/m/Z1MVCOh1hprjK5Sf@gmai021
2024-12-07 15:56:28 -05:00
John Naylor
ccc8194e42 Fix use-after-free in parallel_vacuum_reset_dead_items
parallel_vacuum_reset_dead_items used a local variable to hold a
pointer from the passed vacrel, purely as a shorthand. This pointer
was later freed and a new allocation was made and stored to the
struct. Then the local pointer was mistakenly referenced again.

This apparently happened not to break anything since the freed chunk
would have been put on the context's freelist, so it was accidentally
the same pointer anyway, in which case the DSA handle was correctly
updated. The minimal fix is to change two places so they access
dead_items through the vacrel. This coding style is a maintenance
hazard, so while at it get rid of most other similar usages, which
were inconsistently used anyway.

Analysis and patch by Vallimaharajan G, with further defensive coding
by me

Backpath to v17, when TidStore came in

Discussion: https://postgr.es/m/1936493cc38.68cb2ef27266.7456585136086197135@zohocorp.com
2024-12-04 16:58:25 +07:00