1
0
mirror of https://github.com/postgres/postgres.git synced 2025-06-14 18:42:34 +03:00
Commit Graph

61398 Commits

Author SHA1 Message Date
d3b2e5e1ab Refactor convert_case() to prepare for optimizations.
Upcoming optimizations will add complexity to convert_case(). This
patch reorganizes slightly so that the complexity can be contained
within the logic to convert the case of a single character, rather
than mixing it in with logic to iterate through the string.

Reviewed-by: Alexander Borisov <lex.borisov@gmail.com>
Discussion: https://postgr.es/m/44005c3d-88f4-4a26-981f-fd82dfa8e313@gmail.com
2025-03-12 21:51:52 -07:00
3abe9dc188 Avoid invalidating all RelationSyncCache entries on publication rename.
On Publication rename, we need to only invalidate the RelationSyncCache
entries corresponding to relations that are part of the publication being
renamed.

As part of this patch, we introduce a new invalidation message to
invalidate the cache maintained by the logical decoding output plugin. We
can't use existing relcache invalidation for this purpose, as that would
unnecessarily cause relcache invalidations in other backends.

This will improve performance by building fewer relation cache entries
during logical replication.

Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/OSCPR01MB14966C09AA201EFFA706576A7F5C92@OSCPR01MB14966.jpnprd01.prod.outlook.com
2025-03-13 09:16:33 +05:30
75da2bece6 Fix read_stream.c for changing io_combine_limit.
In a couple of places, read_stream.c assumed that io_combine_limit would
be stable during the lifetime of a stream.  That is not true in at least
one unusual case: streams held by CURSORs where you could change the GUC
between FETCH commands, with unpredictable results.

Fix, by storing stream->io_combine_limit and referring only to that
after construction.  This mirrors the treatment of the other important
setting {effective,maintenance}_io_concurrency, which is stored in
stream->max_ios.

One of the cases was the queue overflow space, which was sized for
io_combine_limit and could be overrun if the GUC was increased.  Since
that coding was a little hard to follow, also introduce a variable for
better readability instead of open-coding the arithmetic.  Doing so
revealed an off-by-one thinko while clamping max_pinned_buffers to
INT16_MAX, though that wasn't a live bug due to the current limits on
GUC values.

Back-patch to 17.

Discussion: https://postgr.es/m/CA%2BhUKG%2B2T9p-%2BzM6Eeou-RAJjTML6eit1qn26f9twznX59qtCA%40mail.gmail.com
2025-03-13 15:43:34 +13:00
d4f79865d4 Fix copy-paste error in datum_to_jsonb_internal()
Commit 3c152a27b0 mistakenly repeated JSONTYPE_JSON in a condition,
omitting JSONTYPE_CAST. As a result, datum_to_jsonb_internal() failed
to reject inputs that were casts (e.g., from an enum to json as in the
example below) when used as keys in JSON constructors.

This led to a crash in cases like:

  SELECT JSON_OBJECT('happy'::mood: '123'::jsonb);

where 'happy'::mood is implicitly cast to json. The missing check
meant such casted values weren’t properly rejected as invalid
(non-scalar) JSON keys.

Reported-by: Maciek Sakrejda <maciek@pganalyze.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Maciek Sakrejda <maciek@pganalyze.com>
Discussion: https://postgr.es/m/CADXhmgTJtJZK9A3Na_ry+Xrq-ghjcejBRhcRMzWZvbd__QdgJA@mail.gmail.com
Backpatch-through: 17
2025-03-13 09:56:36 +09:00
4ecdd4110d pg_rewind: Add dbname to primary_conninfo when using --write-recovery-conf.
This commit enhances pg_rewind's --write-recovery-conf option to
include the dbname in the generated primary_conninfo value when
specified in the --source-server option. With this modification, the
rewound server can connect to the primary server without manual
configuration file modifications when sync_replication_slots is
enabled.

Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Discussion: https://postgr.es/m/CAD21AoAkW=Ht0k9dVoBTCcqLiiZ2MXhVr+d=j2T_EZMerGrLWQ@mail.gmail.com
2025-03-12 16:56:04 -07:00
cdc1471cc7 Add b955df443 to .git-blame-ignore-revs 2025-03-13 12:44:26 +13:00
b955df4434 Fix indentation issue
Introduced recently by 9e088f7dd

Per buildfarm member koel
2025-03-13 12:41:44 +13:00
9e088f7dd8 Fix compiler warning in pg_logicalinspect.
Oversight in bd65cb3cd4.

Reported-by: David Rowley <dgrowleyml@gmail.com>
Reported-by: Nathan Bossart <nathandbossart@gmail.com>
Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvqrhFfnetbcwgGkJ=z63T8HfQ_OyP=vX8BYiXyxFKt67w@mail.gmail.com
2025-03-12 14:23:56 -07:00
ac4494646d Rename alloc/free functions in reorderbuffer.c
There used to be bespoken pools for these structs to reduce the
palloc/pfree overhead, but that was ripped out a long time ago and
replaced with the generic, cheaper generational memory allocator
(commit a4ccc1cef5). The Get/Return terminology made sense with the
pools, as you "got" an object from the pool and "returned" it later,
but now it just looks weird. Rename to Alloc/Free.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/c9e43d2d-8e83-444f-b111-430377368989@iki.fi
2025-03-12 22:03:39 +02:00
025e7e1eb4 Remove count_one_bits() in acl.c.
The only caller, select_best_grantor(), can instead use
pg_popcount64().  This isn't performance-critical code, but we
might as well use the centralized implementation.  While at it, add
some test coverage for this part of select_best_grantor().

Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/Z9GtL7Nm6hsYyJnF%40nathan
2025-03-12 15:01:52 -05:00
ff79b5b2ab Increase default effective_io_concurrency to 16
The default effective_io_concurrency has been 1 since it was introduced
in b7b8f0b609. Referencing the associated discussion [1], it
seems 1 was chosen as a conservative value that seemed unlikely to cause
regressions.

Experimentation on high latency cloud storage as well as fast, local
nvme storage (see Discussion link) shows that even slightly higher
values improve query timings substantially. 1 actually performs worse
than 0 [2]. With effective_io_concurrency 1, we are not prefetching
enough to avoid I/O stalls, but we are issuing extra syscalls.

The new default is 16, which should be more appropriate for common
hardware while still avoiding flooding low IOPs devices with I/O
requests.

[1] https://www.postgresql.org/message-id/flat/FDDBA24E-FF4D-4654-BA75-692B3BA71B97%40enterprisedb.com
[2] https://www.postgresql.org/message-id/CAAKRu_Zv08Cic%3DqdCfzrQabpEXGrd9Z9UOW5svEVkCM6%3DFXA9g%40mail.gmail.com

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAAKRu_Z%2BJa-mwXebOoOERMMUMvJeRhzTjad4dSThxG0JLXESxw%40mail.gmail.com
2025-03-12 15:57:44 -04:00
af717317a0 Handle interrupts while waiting on Append's async subplans
We did not wake up on interrupts while waiting on async events on an
async-capable append node. For example, if you tried to cancel the
query, nothing would happen until one of the async subplans becomes
readable. To fix, add WL_LATCH_SET to the WaitEventSet.

Backpatch down to v14 where async Append execution was introduced.

Discussion: https://www.postgresql.org/message-id/37a40570-f558-40d3-b5ea-5c2079b3b30b@iki.fi
2025-03-12 20:53:09 +02:00
f4e7756ef9 Build whole-row Vars the same way during parsing and planning.
makeWholeRowVar() has different rules for constructing a
whole-row Var depending on the kind of RTE it's representing.
This turns out to be problematic because the rewriter and planner
can convert view RTEs and set-returning-function RTEs into
subquery RTEs; so a whole-row Var made during planning might
look different from one made by the parser.  In isolation this
doesn't cause any problem, but if a query contains Vars made
both ways for the same varno, there are cross-checks in the
executor that will complain.  This manifests for UPDATE, DELETE,
and MERGE queries that use whole-row table references.

To fix, we need makeWholeRowVar() to produce the same result
from an inlined RTE as it would have for the original.  For
an inlined view, we can use RangeTblEntry.relid to detect
that this had been a view RTE.  For inlined SRFs, make a
data structure definition change akin to commit 47bb9db75,
and say that we won't clear RangeTblEntry.functions until
the end of planning.  That allows makeWholeRowVar() to
repeat what it would have done with the unmodified RTE.

Reported-by: Duncan Sands <duncan.sands@deepbluecap.com>
Reported-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Diagnosed-by: Tender Wang <tndrwang@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/3518c50a-ab18-482f-b916-a37263622501@deepbluecap.com
Backpatch-through: 13
2025-03-12 11:47:38 -04:00
18cd15e706 Add connection establishment duration logging
Add log_connections option 'setup_durations' which logs durations of
several key parts of connection establishment and backend setup.

For an incoming connection, starting from when the postmaster gets a
socket from accept() and ending when the forked child backend is first
ready for query, there are multiple steps that could each take longer
than expected due to external factors. This logging provides visibility
into authentication and fork duration as well as the end-to-end
connection establishment and backend initialization time.

To make this portable, the timings captured in the postmaster (socket
creation time, fork initiation time) are passed through the
BackendStartupData.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Guillaume Lelarge <guillaume.lelarge@dalibo.com>
Discussion: https://postgr.es/m/flat/CAAKRu_b_smAHK0ZjrnL5GRxnAVWujEXQWpLXYzGbmpcZd3nLYw%40mail.gmail.com
2025-03-12 11:35:27 -04:00
9219093cab Modularize log_connections output
Convert the boolean log_connections GUC into a list GUC comprised of the
connection aspects to log.

This gives users more control over the volume and kind of connection
logging.

The current log_connections options are 'receipt', 'authentication', and
'authorization'. The empty string disables all connection logging. 'all'
enables all available connection logging.

For backwards compatibility, the most common values for the
log_connections boolean are still supported (on, off, 1, 0, true, false,
yes, no). Note that previously supported substrings of on, off, true,
false, yes, and no are no longer supported.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/flat/CAAKRu_b_smAHK0ZjrnL5GRxnAVWujEXQWpLXYzGbmpcZd3nLYw%40mail.gmail.com
2025-03-12 11:35:21 -04:00
f554a95379 Remove initialization from PendingBackendStats
9a8dd2c5a6 has added an initialization to PendingBackendStats, which
has been causing compilation warnings in the buildfarm.  This code does
not strictly require it as PendingBackendStats is always initialized
with memset(0), so let's remove it.

Per report from multiple buildfarm members, like ayu and batfish, via
Tom Lane.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/1870853.1741749264@sss.pgh.pa.us
2025-03-12 20:37:43 +09:00
72a3d0462b Prepare for Python "Limited API" in PL/Python
Using the Python Limited API would allow building PL/Python against
any Python 3.x version and using another Python 3.x version at run
time.  This commit does not activate that, but it prepares the code to
only use APIs supported by the Limited API.

Implementation details:

- Convert static types to heap types
  (https://docs.python.org/3/howto/isolating-extensions.html#heap-types).

- Replace PyRun_String() with component functions.

- Replace PyList_SET_ITEM() with PyList_SetItem().

This was previously committed as c47e8df815 and then reverted because
it wasn't working under Python older than 3.8.  That has been fixed in
this version.  There was a Python API change/bugfix between 3.7 and
3.8 that directly affects this patch.  The relevant commit is
<https://github.com/python/cpython/commit/364f0b0f19c>.  The
workarounds described there have been applied in this patch, and it
has been confirmed to work with Python 3.6 and 3.7.

Reviewed-by: Jakob Egger <jakob@eggerapps.at>
Discussion: https://www.postgresql.org/message-id/flat/ee410de1-1e0b-4770-b125-eeefd4726a24@eisentraut.org
2025-03-12 08:53:54 +01:00
c872516d8f Doc: silence A4 PDF build warnings.
Commit 0fbceae84 put a "&zwsp;" in almost but not quite the correct
place to avoid "The contents of fo:block line 1 exceed the available
area" warnings.  Per buildfarm.
2025-03-11 23:35:39 -04:00
043745c3a0 Improve snapmgr.c comment
Add more details on the different kinds of snapshots, how to use them,
and how the active snapshot stack works.

Discussion: https://www.postgresql.org/message-id/7c56f180-b9e1-481e-8c1d-efa63de3ecbb@iki.fi
2025-03-11 23:28:38 +02:00
8076c00592 Assert that a snapshot is active or registered before it's used
The comment in GetTransactionSnapshot() said that you "should call
RegisterSnapshot or PushActiveSnapshot on the returned snap if it is
to be used very long". That felt too unclear to me. Make the comment
more strongly worded.

To enforce that rule and to catch potential bugs where a snapshot
might get invalidated while it's still in use, add an assertion to
HeapTupleSatisfiesMVCC() to check that the snapshot is registered or
pushed to active stack. No new bugs were found by this, but it seems
like good future-proofing. It's not a great place for the check;
HeapTupleSatisfiesMVCC() is in fact safe to call with an unregistered
snapshot, and the assertion won't catch other unsafe uses. But it goes
a long way in practice.

Fix a few cases that were playing fast and loose with that and just
assumed that the snapshot cannot be invalidated during a scan. Those
assumptions were not wrong, but they're not performance critical, so
let's drop the excuses and just register the snapshot. These were
false positives found by the new assertion.

Discussion: https://www.postgresql.org/message-id/7c56f180-b9e1-481e-8c1d-efa63de3ecbb@iki.fi
2025-03-11 23:20:34 +02:00
bd65cb3cd4 pg_logicalinspect: Fix possible crash when passing a directory path.
Previously, pg_logicalinspect functions were too trusting of their
input and blindly passed it to SnapBuildRestoreSnapshot(). If the
input pointed to a directory, the server could a PANIC error while
attempting to fsync_fname() with isdir=false on a directory.

This commit adds validation checks for input filenames and passes the
LSN extracted from the filename to SnapBuildRestoreSnapshot() instead
of the filename itself. It also adds regression tests for various
input patterns and permission checks.

Bug: #18828
Reported-by: Robins Tharakan <tharakan@gmail.com>
Co-authored-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Co-authored-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/18828-0f4701c635064211@postgresql.org
2025-03-11 09:56:40 -07:00
a49927f04c pg_logicalinspect: Stabilize isolation tests.
The previous isolation tests did not account for the possibility that
the background writer or the checkpointer could write a RUNNING_XACTS
record, which could cause logical decoding to produce more logical
snapshots than expected.

This commit modifies the isolation tests to verify that at least one
logical snapshot contains the expected number of committed or ongoing
catalog-change transactions.

Per buildfarm member skink.

Reported-by: Andres Freund <andres@anarazel.de>
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/5qbxud4pvnvmtuoi7weiizm5hmumxaeohx4vztfhrwlfhyz6rj@buh4435mllwo
2025-03-11 09:30:00 -07:00
8b1b342544 Improve EXPLAIN's display of window functions.
Up to now we just punted on showing the window definitions used
in a plan, with window function calls represented as "OVER (?)".
To improve that, show the window definition implemented by each
WindowAgg plan node, and reference their window names in OVER.
For nameless window clauses generated by "OVER (...)", assign
unique names w1, w2, etc.

In passing, re-order the properties shown for a WindowAgg node
so that the Run Condition (if any) appears after the Window
property and before the Filter (if any).  This seems more
sensible since the Run Condition is associated with the Window
and acts before the Filter.

Thanks to David G. Johnston and Álvaro Herrera for design
suggestions.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/144530.1741469955@sss.pgh.pa.us
2025-03-11 11:19:54 -04:00
426ea61117 nbtree: Make BTMaxItemSize into object-like macro.
Make nbtree's "1/3 of a page limit" BTMaxItemSize function-like macro
(which accepts a "page" argument) into an object-like macro that can be
used from code that doesn't have convenient access to an nbtree page.

Preparation for an upcoming patch that adds skip scan to nbtree.
Parallel index scans that use skip scan will serialize datums (not just
SAOP array subscripts) when scheduling primitive scans.  BTMaxItemSize
will be used by btestimateparallelscan to determine how much DSM to
request.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-Wz=H_RG5weNGeUG_TkK87tRBnH9mGCQj6WpM4V4FNWKv2g@mail.gmail.com
2025-03-11 10:35:56 -04:00
0fbceae841 Show index search count in EXPLAIN ANALYZE, take 2.
Expose the count of index searches/index descents in EXPLAIN ANALYZE's
output for index scan/index-only scan/bitmap index scan nodes.  This
information is particularly useful with scans that use ScalarArrayOp
quals, where the number of index searches can be unpredictable due to
implementation details that interact with physical index characteristics
(at least with nbtree SAOP scans, since Postgres 17 commit 5bf748b8).
The information shown also provides useful context when EXPLAIN ANALYZE
runs a plan with an index scan node that successfully applied the skip
scan optimization (set to be added to nbtree by an upcoming patch).

The instrumentation works by teaching all index AMs to increment a new
nsearches counter whenever a new index search begins.  The counter is
incremented at exactly the same point that index AMs already increment
the pg_stat_*_indexes.idx_scan counter (we're counting the same event,
but at the scan level rather than the relation level).  Parallel queries
have workers copy their local counter struct into shared memory when an
index scan node ends -- even when it isn't a parallel aware scan node.
An earlier version of this patch that only worked with parallel aware
scans became commit 5ead85fb (though that was quickly reverted by commit
d00107cd following "debug_parallel_query=regress" buildfarm failures).

Our approach doesn't match the approach used when tracking other index
scan related costs (e.g., "Rows Removed by Filter:").  It is comparable
to the approach used in similar cases involving costs that are only
readily accessible inside an access method, not from the executor proper
(e.g., "Heap Blocks:" output for a Bitmap Heap Scan, which was recently
enhanced to show per-worker costs by commit 5a1e6df3, using essentially
the same scheme as the one used here).  It is necessary for index AMs to
have direct responsibility for maintaining the new counter, since the
counter might need to be incremented multiple times per amgettuple call
(or per amgetbitmap call).  But it is also necessary for the executor
proper to manage the shared memory now used to transfer each worker's
counter struct to the leader.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Tomas Vondra <tomas@vondra.me>
Reviewed-By: Masahiro Ikeda <ikedamsh@oss.nttdata.com>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-WzkRqvaqR2CTNqTZP0z6FuL4-3ED6eQB0yx38XBNj1v-4Q@mail.gmail.com
Discussion: https://postgr.es/m/CAH2-Wz=PKR6rB7qbx+Vnd7eqeB5VTcrW=iJvAsTsKbdG+kW_UA@mail.gmail.com
2025-03-11 09:20:50 -04:00
12c5f797ea Update nls.mk for newly added file
Commit f18231e817 moved some code to a new file, but the new file
wasn't added to nls.mk.
2025-03-11 13:48:14 +01:00
17ce344f86 BRIN: be more strict about required support procs
With improperly defined operator classes, it's possible to get a
Postgres crash because we'd try to invoke a procedure that doesn't
exist.  This is because the code is being a bit too trusting that the
opclass is correctly defined.  Add some ereport(ERROR)s for cases where
mandatory support procedures are not defined, transforming the crashes
into errors.

The particular case that was reported is an incomplete opclass in
PostGIS.

Backpatch all the way down to 13.

Reported-by: Tobias Wendorff <tobias.wendorff@tu-dortmund.de>
Diagnosed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/fb6d9a35-6c8e-4869-af80-0a4944a793a4@tu-dortmund.de
2025-03-11 12:50:35 +01:00
d35d32d711 Add special case fast-paths for strict functions
Many STRICT function calls will have one or two arguments, in which
case we can speed up checking for NULL input by avoiding setting up
a loop over the arguments. This adds EEOP_FUNCEXPR_STRICT_1 and the
corresponding EEOP_FUNCEXPR_STRICT_2 for functions with one and two
arguments respectively.

Author: Andres Freund <andres@anarazel.de>
Co-authored-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Discussion: https://postgr.es/m/415721CE-7D2E-4B74-B5D9-1950083BA03E@yesql.se
Discussion: https://postgr.es/m/20191023163849.sosqbfs5yenocez3@alap3.anarazel.de
2025-03-11 12:02:42 +01:00
8dd7c7cd0a Replace EEOP_DONE with special steps for return/no return
Knowing when the side-effects of an expression is the intended result
of the execution, rather than the returnvalue, is important for being
able generate more efficient JITed code. This replaces EEOP_DONE with
two new steps: EEOP_DONE_RETURN and EEOP_DONE_NO_RETURN.  Expressions
which return a value should use the former step; expressions used for
their side-effects which don't return value should use the latter.

Author: Andres Freund <andres@anarazel.de>
Co-authored-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Discussion: https://postgr.es/m/415721CE-7D2E-4B74-B5D9-1950083BA03E@yesql.se
Discussion: https://postgr.es/m/20191023163849.sosqbfs5yenocez3@alap3.anarazel.de
2025-03-11 12:02:38 +01:00
dabccf4513 Move RemoveInheritedConstraint() call slightly earlier
This change is harmless and does not affect the existing intended
operation.  It is necessary for a subsequent patch operation (NOT
ENFORCED foreign keys), where we may need to change the child
constraint to enforced.  In this case, we would create the necessary
triggers and queue the constraint for validation, so it is important
to remove any unnecessary constraints before proceeding.

This is a small change that could have been included in the previous
"split tryAttachPartitionForeignKey" refactoring patch (commit
1d26c2d2c4), but was kept separate to highlight the changes.

Author: Amul Sul <amul.sul@enterprisedb.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b962c5AcYW9KUt_R_ER5qs3fUGbe4az-SP-vuwPS-w-AGA%40mail.gmail.com
2025-03-11 10:43:48 +01:00
1d26c2d2c4 refactor: Split tryAttachPartitionForeignKey()
Split tryAttachPartitionForeignKey() into three functions:
AttachPartitionForeignKey(), RemoveInheritedConstraint(), and
DropForeignKeyConstraintTriggers(), so they can be reused in some
subsequent patches for the NOT ENFORCED feature.

Author: Amul Sul <amul.sul@enterprisedb.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b962c5AcYW9KUt_R_ER5qs3fUGbe4az-SP-vuwPS-w-AGA%40mail.gmail.com
2025-03-11 09:35:24 +01:00
64224a834c refactor: re-add ATExecAlterChildConstr()
ATExecAlterChildConstr() was removed in commit 80d7f99049, but it is
needed in some subsequent patches for the NOT ENFORCED feature, to
recurse over child constraints.  This adds it back in slightly altered
form.

Author: Amul Sul <amul.sul@enterprisedb.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b962c5AcYW9KUt_R_ER5qs3fUGbe4az-SP-vuwPS-w-AGA%40mail.gmail.com
2025-03-11 08:43:35 +01:00
76def4cdd7 Add WAL data to backend statistics
This commit adds per-backend WAL statistics, providing the same
information as pg_stat_wal, except that it is now possible to know how
much WAL activity is happening in each backend rather than an overall
aggregate of all the activity.  Like pg_stat_wal, the implementation
relies on pgWalUsage, tracking the difference of activity between two
reports to pgstats.

This data can be retrieved with a new system function called
pg_stat_get_backend_wal(), that returns one tuple based on the PID
provided in input.  Like pg_stat_get_backend_io(), this is useful when
joined with pg_stat_activity to get a live picture of the WAL generated
for each running backend, showing how the activity is [un]balanced.

pgstat_flush_backend() gains a new flag value, able to control the flush
of the WAL stats.

This commit relies mostly on the infrastructure provided by
9aea73fc61, that has introduced backend statistics.

Bump catalog version.  A bump of PGSTAT_FILE_FORMAT_ID is not required,
as backend stats do not persist on disk.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/Z3zqc4o09dM/Ezyz@ip-10-97-1-34.eu-west-3.compute.internal
2025-03-11 09:04:11 +09:00
59a1592e39 tests: Make postmaster/002_connection_limits deal verbose logs
When log_error_verbosity=verbose is configured the test would hand (and then
fail), because of the sqlstate being added between log level and message. Make
regex cope.

Reported-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/c7ba6bd0-3701-43d1-9087-017777fe9cd2%40dunslane.net
2025-03-10 19:32:26 -04:00
29d6808ede CREATE INDEX: do update index stats if autovacuum=off.
This fixes a thinko from commit d611f8b15.  The intent was to prevent
updating the stats of the pre-existing heap if autovacuum is off,
but it also disabled updating the stats of the just-created index.
There is AFAICS no good reason to do the latter, since there could not
be any pre-existing stats to refrain from overwriting, and the zeroed
stats that are there to begin with are very unlikely to be useful.
Moreover, the change broke our cross-version upgrade tests again.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1116282.1741374848@sss.pgh.pa.us
2025-03-10 17:49:27 -04:00
f7c566a1a2 Fix a few more redundant calls of GetLatestSnapshot()
Commit 2367503177 fixed this in RelationFindReplTupleByIndex(), but I
missed two other similar cases.

Per report from Ranier Vilela.

Discussion: https://www.postgresql.org/message-id/CAEudQArUT1dE45WN87F-Gb7XMy_hW6x1DFd3sqdhhxP-RMDa0Q@mail.gmail.com
Backpatch-through: 13
2025-03-10 18:58:10 +02:00
2367503177 Fix snapshot used in logical replication index lookup
The function calls GetLatestSnapshot() to acquire a fresh snapshot,
makes it active, and was meant to pass it to table_tuple_lock(), but
instead called GetLatestSnapshot() again to acquire yet another
snapshot. It was harmless because the heap AM and all other known
table AMs ignore the 'snapshot' argument anyway, but let's be tidy.

In the long run, this perhaps should be redesigned so that snapshot
was not needed in the first place. The table AM API uses TID +
snapshot as the unique identifier for the row version, which is
questionable when the row came from an index scan with a Dirty
snapshot. You might lock a different row version when you use a
different snapshot in the table_tuple_lock() call (a fresh MVCC
snapshot) than in the index scan (DirtySnapshot). However, in the heap
AM and other AMs where the TID alone identifies the row version, it
doesn't matter. So for now, just fix the obvious albeit harmless bug.

This has been wrong ever since the table AM API was introduced in
commit 5db6df0c01, so backpatch to all supported versions.

Discussion: https://www.postgresql.org/message-id/83d243d6-ad8d-4307-8b51-2ee5844f6230@iki.fi
Backpatch-through: 13
2025-03-10 17:07:38 +02:00
9f87e2593f Doc: improve description of window function processing.
The previous wording talked about a "single pass over the data",
which can be read as promising more than intended (to wit, that only
one WindowAgg plan node will be used).  What we promise is only what
the SQL spec requires, namely that the data not get re-sorted between
window functions with compatible PARTITION BY/ORDER BY clauses.
Adjust the wording in hopes of making this clearer.

Reported-by: Christopher Inokuchi <cinokuchi@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Discussion: https://postgr.es/m/CABde6B5va2wMsnM79u_x=n9KUgfKQje_pbLROEBmA9Ru5XWidw@mail.gmail.com
Backpatch-through: 13
2025-03-10 10:22:08 -04:00
6bb6a62f3c Use extended stats for precise estimation of bucket size in hash join
Recognizing the real-life complexity where columns in the table often have
functional dependencies, PostgreSQL's estimation of the number of distinct
values over a set of columns can be underestimated (or much rarely,
overestimated) when dealing with multi-clause JOIN.  In the case of hash
join, it can end up with a small number of predicted hash  buckets and, as
a result, picking non-optimal merge join.

To improve the situation, we introduce one additional stage of bucket size
estimation - having two or more join clauses estimator lookup for extended
statistics and use it for multicolumn estimation.  Clauses are grouped into
lists, each containing expressions referencing the same relation.  The result
of the multicolumn estimation made over such a list is combined with others
according to the caller's logic.  Clauses that are not estimated are returned
to the caller for further estimation.

Discussion: https://postgr.es/m/52257607-57f6-850d-399a-ec33a654457b%40postgrespro.ru
Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Andy Fan <zhihui.fan1213@gmail.com>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
2025-03-10 13:42:01 +02:00
fae535da0a Teach Append to consider tuple_fraction when accumulating subpaths.
This change is dedicated to more active usage of IndexScan and parameterized
NestLoop paths in partitioned cases under an Append node, as it already works
with plain tables.  As newly added regression tests demonstrate, it should
provide more smartness to the partitionwise technique.

With an indication of how many tuples are needed, it may be more meaningful
to use the 'fractional branch' subpaths of the Append path list, which are
more optimal for this specific number of tuples.  Planning on a higher level,
if the optimizer needs all the tuples, it will choose non-fractional paths.
In the case when, during execution, Append needs to return fewer tuples than
declared by tuple_fraction, it would not be harmful to use the 'intermediate'
variant of paths.  However, it will earn a considerable profit if a sensible
set of tuples is selected.

The change of the existing regression test demonstrates the positive outcome
of this feature: instead of scanning the whole table, the optimizer prefers
to use a parameterized scan, being aware of the only single tuple the join
has to produce to perform the query.

Discussion: https://www.postgresql.org/message-id/flat/CAN-LCVPxnWB39CUBTgOQ9O7Dd8DrA_tpT1EY3LNVnUuvAX1NjA%40mail.gmail.com
Author: Nikita Malakhov <hukutoc@gmail.com>
Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Andy Fan <zhihuifan1213@163.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
2025-03-10 13:38:39 +02:00
b83e8a2ca2 Remove support for temporal RESTRICT foreign keys
It isn't clear how these should behave, so let's wait to implement them
until we are sure how to do it.

This feature was initially added by commit 89f908a6d0, so it hasn't
been released yet.

Author: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Discussion: https://postgr.es/m/e773bc11-4ac1-40de-bb91-814e02f05b6d%40eisentraut.org
2025-03-10 11:31:01 +01:00
e033696596 Fix incorrect #endif comment
Noticed while reading code in this area.
2025-03-10 13:36:04 +13:00
03f8e9a7fe Fix incorrect assertion in libpqwalreceiver
Was supposed to check the length of the array, but was checking its
size in bytes.

Author: Jacob Brazeal <jacob.brazeal@gmail.com>
Discussion: https://www.postgresql.org/message-id/CA%2BCOZaA_9afJxj9ZuO73U5P7WXP%2BZM9NGnZvTDCmBFz0FGP%2BwA@mail.gmail.com
2025-03-09 20:40:45 +02:00
2a943afcff Fix test name and username used in failed connection attempts
The first failed connection tests the "regular" connections limit, not
the reserved limit.

In the second failed connection, the username doesn't really matter,
but since the previous successful connections used "regress_reserved",
it seems weird to switch back to "regress_regular" for the
expected-to-fail attempt.

Discussion: https://www.postgresql.org/message-id/fd5e9523-78d3-4270-86b2-fd1b1eeb4fc9@iki.fi
2025-03-09 19:47:55 +02:00
fedfcf6650 Don't try to parallelize array_agg() on an anonymous record type.
This doesn't work because record_recv requires the typmod that
identifies the specific record type (in our session) and
array_agg_deserialize has no convenient way to get that information.
The result is an "input of anonymous composite types is not
implemented" error.

We could probably make this work if we had to, but it does not seem
worth the trouble, given that it took this long to get a field report.
Just shut off parallelization, as though record_recv didn't exist.

Oversight in commit 16fd03e95.  Back-patch to v16 where that
came in.

Reported-by: Kirill Zdornyy <kirill@dineserve.com>
Diagnosed-by: Richard Guo <guofenglinux@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/atLI5Kce2ie1zcYjU0w_kjtVaxiYbYGTihrkLDmGZQnRDD4pnXukIATaABbnIj9pUnelC4ESvCXMm4HAyHg-v61XABaKpERj0A2IXzJZM7g=@dineserve.com
Backpatch-through: 16
2025-03-09 13:11:20 -04:00
3c472a1829 doc: Adjust note about pg_upgrade's --jobs option.
Presently, this section lists a couple of parallelized parts of
pg_upgrade and suggests a starting point for setting the --jobs
option.  The list of parallelized tasks is not particularly
actionable, and the phrasing for the --jobs recommendation is
confusing to some readers.

This commit attempts to improve this section by eliminating the
list of parallelized tasks and instead highlighting that --jobs is
most useful for clusters with multiple databases or tablespaces.
Additionally, the recommendation for setting --jobs is simplified
to suggest starting with the number of CPU cores.

Reported-by: Magnus Hagander <magnus@hagander.net>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Magnus Hagander <magnus@hagander.net>
Discussion: https://postgr.es/m/Z8dBn_5iGLNuYiPo%40nathan
2025-03-08 14:28:16 -06:00
1852aea3f5 Don't convert to and from floats in pg_dump.
Commit 8f427187db improved performance by remembering relation stats
as native types rather than issuing a new query for each relation.

Using native types is fine for integers like relpages; but reltuples
is floating point. The commit controllled for that complexity by using
setlocale(LC_NUMERIC, "C"). After that, Alexander Lakhin found a
problem in pg_strtof(), fixed in 00d61a08c5.

While we aren't aware of any more problems with that approach, it
seems wise to just use a string the whole way for floating point
values, as Corey's original patch did, and get rid of the
setlocale(). Integers are still converted to native types to avoid
wasting memory.

Co-authored-by: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://postgr.es/m/3049348.1740855411@sss.pgh.pa.us
Discussion: https://postgr.es/m/560cca3781740bd69881bb07e26eb8f65b09792c.camel%40j-davis.com
2025-03-08 11:25:36 -08:00
7fb8801021 Clear errno before calling strtol() in spell.c.
Per POSIX, a caller of strtol() that wishes to check for errors must
set errno to 0 beforehand.  Several places in spell.c neglected that,
so that they risked delivering a false overflow error in case errno
had been ERANGE already.  Given the lack of field reports, this case
may be unreachable at present --- but it's surely trouble waiting to
happen, so fix it.

Author: Jacob Brazeal <jacob.brazeal@gmail.com>
Discussion: https://postgr.es/m/CA+COZaBhsq6EromFm+knMJfzK6nTpG23zJ+K2=nfUQQXcj_xcQ@mail.gmail.com
Backpatch-through: 13
2025-03-08 11:24:25 -05:00
67fc4c9fd7 Make parallel nbtree index scans use an LWLock.
Teach parallel nbtree index scans to use an LWLock (not a spinlock) to
protect the scan's shared descriptor state.

Preparation for an upcoming patch that will add skip scan optimizations
to nbtree.  That patch will create the need to occasionally allocate
memory while the scan descriptor is locked, while copying datums that
were serialized by another backend.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wz=PKR6rB7qbx+Vnd7eqeB5VTcrW=iJvAsTsKbdG+kW_UA@mail.gmail.com
2025-03-08 11:10:14 -05:00
8021c77769 Make amcanorder independent of amconsistentordering
Follow-up to commit af4002b381: Make amconsistentordering not depend
on amcanorder.  Although they are related, they are independent
properties.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/E1tngY6-0000UL-2n%40gemulon.postgresql.org
2025-03-08 09:37:06 +01:00