mirror of https://github.com/postgres/postgres.git synced 2025-12-19 17:02:53 +03:00
Commit Graph

26043 Commits

Author SHA1 Message Date
Noah Misch
bcb784e7d2 Assert lack of hazardous buffer locks before possible catalog read.
Commit 0bada39c83 fixed a bug of this kind,
which existed in all branches for six days before detection.  While the
probability of reaching the trouble was low, the disruption was extreme.  No
new backends could start, and service restoration needed an immediate
shutdown.  Hence, add this to catch the next bug like it.

The new check in RelationIdGetRelation() suffices to make autovacuum detect
the bug in commit 243e9b40f1 that led to commit
0bada39.  This change also adds checks in a number of similar places,
replacing each Assert(IsTransactionState()) that pertained to a
conditional catalog read.

Back-patch to v14 - v17.  This is a back-patch of commit
f4ece891fc (from before v18 branched) to
all supported branches, to accompany the back-patch of commits 243e9b4
and 0bada39.  For catalog indexes, the bttextcmp() behavior that
motivated IsCatalogTextUniqueIndexOid() was v18-specific.  Hence, this
back-patch doesn't need that or its correction from commit
4a4ee0c2c1.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/20250410191830.0e.nmisch@google.com
Discussion: https://postgr.es/m/10ec0bc3-5933-1189-6bb8-5dec4114558e@gmail.com
Backpatch-through: 14-17
2025-12-16 16:13:54 -08:00
Noah Misch
d3e5d89504 WAL-log inplace update before revealing it to other sessions.
A buffer lock won't stop a reader having already checked tuple
visibility.  If a vac_update_datfrozenxid() and then a crash happened
during inplace update of a relfrozenxid value, datfrozenxid could
overtake relfrozenxid.  That could lead to "could not access status of
transaction" errors.

Back-patch to v14 - v17.  This is a back-patch of commits:

- 8e7e672cda
  (main change, on master, before v18 branched)
- 8180136652
  (defect fix, on master, before v18 branched)

It reverses commit bc6bad8857, my revert
of the original back-patch.

In v14, this also back-patches the assertion removal from commit
7fcf2faf9c.

Discussion: https://postgr.es/m/20240620012908.92.nmisch@google.com
Backpatch-through: 14-17
2025-12-16 16:13:54 -08:00
Noah Misch
0f69beddea For inplace update, send nontransactional invalidations.
The inplace update survives ROLLBACK.  The inval didn't, so another
backend's DDL could then update the row without incorporating the
inplace update.  In the test this fixes, a mix of CREATE INDEX and ALTER
TABLE resulted in a table with an index, yet relhasindex=f.  That is a
source of index corruption.

Back-patch to v14 - v17.  This is a back-patch of commits:

- 243e9b40f1
  (main change, on master, before v18 branched)
- 0bada39c83
  (defect fix, on master, before v18 branched)
- bae8ca82fd
  (cosmetics from post-commit review, on REL_18_STABLE)

It reverses commit c1099dd745, my revert
of the original back-patch of 243e9b4.

This back-patch omits the non-comment heap_decode() changes.  I find
those changes removed harmless code that was last necessary in v13.  See
discussion thread for details.  The back branches aren't the place to
remove such code.

Like the original back-patch, this doesn't change WAL, because these
branches use end-of-recovery SIResetAll().  All branches change the ABI
of extern function PrepareToInvalidateCacheTuple().  No PGXN extension
calls that, and there's no apparent use case in extensions.  Expect
".abi-compliance-history" edits to follow.

Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Surya Poondla <s_poondla@apple.com>
Reviewed-by: Ilyasov Ian <ianilyasov@outlook.com>
Reviewed-by: Nitin Motiani <nitinmotiani@google.com> (in earlier versions)
Reviewed-by: Andres Freund <andres@anarazel.de> (in earlier versions)
Discussion: https://postgr.es/m/20240523000548.58.nmisch@google.com
Backpatch-through: 14-17
2025-12-16 16:13:54 -08:00
Robert Haas
1d0fc2499f Switch memory contexts in ReinitializeParallelDSM.
We already do this in CreateParallelContext, InitializeParallelDSM, and
LaunchParallelWorkers. I suspect the reason why the matching logic was
omitted from ReinitializeParallelDSM is that I failed to realize that
any memory allocation was happening here -- but shm_mq_attach does
allocate, which could result in a shm_mq_handle being allocated in a
shorter-lived context than the ParallelContext which points to it.

That could result in a crash if the shorter-lived context is freed
before the parallel context is destroyed. As far as I am currently
aware, there is no way to reach a crash using only code that is
present in core PostgreSQL, but extensions could potentially trip
over this. Fixing this in the back-branches appears low-risk, so
back-patch to all supported versions.

Author: Jakub Wartak <jakub.wartak@enterprisedb.com>
Co-authored-by: Jeevan Chalke <jeevan.chalke@enterprisedb.com>
Backpatch-through: 14
Discussion: http://postgr.es/m/CAKZiRmwfVripa3FGo06=5D1EddpsLu9JY2iJOTgbsxUQ339ogQ@mail.gmail.com
2025-12-16 10:59:04 -05:00
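
A minimal sketch of the fix pattern described in the commit above, assuming
PostgreSQL's internal memory-context API; the function body is elided and the
choice of TopTransactionContext mirrors the sibling functions, so this is an
approximation rather than the committed code:

    /* Hedged sketch; "..." elides the existing ReinitializeParallelDSM body. */
    void
    ReinitializeParallelDSM(ParallelContext *pcxt)
    {
        MemoryContext oldcontext;

        /*
         * Switch to a context that lives at least as long as the
         * ParallelContext, so the shm_mq_handle that shm_mq_attach()
         * allocates cannot land in a shorter-lived context than the
         * ParallelContext pointing to it.
         */
        oldcontext = MemoryContextSwitchTo(TopTransactionContext);

        /* ... reset shared state and re-attach the error queues ... */

        MemoryContextSwitchTo(oldcontext);
    }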
Michael Paquier
f5927da4ff Fail recovery when missing redo checkpoint record without backup_label
This commit adds an extra check at the beginning of recovery to ensure
that the redo record of a checkpoint exists before attempting WAL
replay, logging a PANIC if the redo record referenced by the checkpoint
record could not be found.  This is the same level of failure as when a
checkpoint record is missing.  This check is added when a cluster is
started without a backup_label, after retrieving its checkpoint record.
The redo LSN used for the check is retrieved from the checkpoint record
successfully read.

In the case where a backup_label exists, the startup process already
fails if the redo record cannot be found after reading a checkpoint
record at the beginning of recovery.

Previously, the presence of the redo record was not checked.  If the
redo and checkpoint records were located on different WAL segments, an
entire range of WAL records that should have been replayed could be
silently ignored.  The consequences of a missing redo record depend on
the version involved, becoming worse the older the version:
- On HEAD, v18 and v17, recovery fails with a pointer dereference at the
beginning of the redo loop, as the redo record is expected but cannot be
found.  These versions at least detect a failure before doing anything,
even if the failure is misleading, taking the shape of a segmentation
fault that gives no hint that the redo record is missing.
- In v16 and v15, problems show at the end of recovery within
FinishWalRecovery(), with the startup process using a bogus LSN to
decide where to start writing WAL.  The cluster gets corrupted, but at
least it is noisy about it.
- v14 and older versions are worse: the cluster gets corrupted and is
entirely silent about it.  The missing redo record causes the startup
process to skip recovery entirely, because a missing record is treated
the same as no redo being required at all.  This leads to data loss, as
everything between the redo record and the checkpoint record is missed.

Note that I have tested this down to 9.4, reproducing the issue with a
slightly modified version of the author's reproducer.  The code has been
wrong since at least 9.2, but I did not look for the exact point of
origin.

This problem was found while debugging a cluster where the WAL segment
containing the redo record was missing due to an operator error, leading
to a crash; the investigation was done on v15.

Requesting archive recovery by creating a recovery.signal or a
standby.signal, even without a backup_label, would mitigate the issue:
if the record cannot be found in pg_wal/, the missing segment can be
retrieved with a restore_command when checking that the redo record
exists.  This was already the case without this commit, where recovery
would re-fetch the WAL segment that includes the redo record.  The check
introduced by this commit causes the segment to be retrieved earlier, to
make sure that the redo record can be found.

On HEAD, the code will be slightly changed in a follow-up commit to not
rely on a PANIC, to include a test able to emulate the original problem.
This is a minimal backpatchable fix, kept separated for clarity.

Reported-by: Andres Freund <andres@anarazel.de>
Analyzed-by: Andres Freund <andres@anarazel.de>
Author: Nitin Jadhav <nitinjadhavpostgres@gmail.com>
Discussion: https://postgr.es/m/20231023232145.cmqe73stvivsmlhs@awork3.anarazel.de
Discussion: https://postgr.es/m/CAMm1aWaaJi2w49c0RiaDBfhdCL6ztbr9m=daGqiOuVdizYWYaA@mail.gmail.com
Backpatch-through: 14
2025-12-16 13:29:39 +09:00
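
Roughly, the safeguard added by the commit above looks like the sketch
below; the reader setup is simplified and ReadRecord_sketch() is a
hypothetical stand-in for the startup process's record-reading routine,
with only the PANIC-on-missing-redo behavior taken from the commit message:

    /* Hedged sketch: verify the redo record exists before starting replay. */
    XLogBeginRead(xlogreader, checkPoint.redo); /* redo LSN from the checkpoint record */
    if (ReadRecord_sketch(xlogreader) == NULL)  /* hypothetical reader call */
        ereport(PANIC,
                (errmsg("could not find redo location referenced by checkpoint record")));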
Heikki Linnakangas
cd1a887fe9 Clarify comment on multixid offset wraparound check
Coverity complained that offset cannot be 0 here because there's an
explicit check for "offset == 0" earlier in the function, but it
didn't see the possibility that offset could've wrapped around to 0.
The code is correct, but clarify the comment about it.

The same code exists in the backbranches in the server's
GetMultiXactIdMembers() function and on 'master' in the pg_upgrade
GetOldMultiXactIdSingleMember() function.  In the backbranches Coverity
didn't complain about it because the check was merely an assertion, but
change the comment in all supported branches for consistency.

Per Tom Lane's suggestion.

Discussion: https://www.postgresql.org/message-id/1827755.1765752936@sss.pgh.pa.us
2025-12-15 11:47:58 +02:00
Michael Paquier
0bab0c3b74 Fix allocation formula in llvmjit_expr.c
An array of LLVMBasicBlockRef was allocated using "LLVMBasicBlockRef *"
rather than "LLVMBasicBlockRef" as the element size.  LLVMBasicBlockRef
is itself a pointer type, so both sizes are the same and this caused no
direct problem, but it is still incorrect.

This issue was spotted while reviewing a different patch, and has
existed since 2a0faed9d7, so backpatch all the way down.

Discussion: https://postgr.es/m/CA+hUKGLngd9cKHtTUuUdEo2eWEgUcZ_EQRbP55MigV2t_zTReg@mail.gmail.com
Backpatch-through: 14
2025-12-11 10:25:46 +09:00
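
The mistake is easy to reproduce in isolation.  A self-contained
illustration, where BlockRef stands in for LLVMBasicBlockRef (itself a
pointer type, which is why the two sizes happen to agree):

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct OpaqueBlock *BlockRef;   /* stand-in for LLVMBasicBlockRef */

    int
    main(void)
    {
        int n = 16;

        /* Wrong: element size taken from the pointer-to-pointer type. */
        BlockRef *wrong = malloc(sizeof(BlockRef *) * n);

        /* Right: element size matches what the array actually stores. */
        BlockRef *right = malloc(sizeof(BlockRef) * n);

        /* On mainstream ABIs both sizes are equal, which is why the bug
         * was harmless, but only by accident. */
        printf("sizeof(BlockRef *) = %zu, sizeof(BlockRef) = %zu\n",
               sizeof(BlockRef *), sizeof(BlockRef));

        free(wrong);
        free(right);
        return 0;
    }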
Heikki Linnakangas
998d100cdb Fix some near-bugs related to ResourceOwner function arguments
These functions took a ResourceOwner argument, but only checked if it
was NULL, and then used CurrentResourceOwner for the actual work.
Surely the intention was to use the passed-in resource owner. All
current callers passed CurrentResourceOwner or NULL, so this has no
consequences at the moment, but it's an accident waiting to happen for
future caller and extensions.

Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAEze2Whnfv8VuRZaohE-Af+GxBA1SNfD_rXfm84Jv-958UCcJA@mail.gmail.com
Backpatch-through: 17
2025-12-10 11:44:07 +02:00
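
The pattern of the near-bug, as a hedged sketch; FooRemember() and
ResourceOwnerRememberFoo() are hypothetical names, not the functions
actually touched by the commit:

    static void
    FooRemember(ResourceOwner owner, Foo *foo)
    {
        if (owner == NULL)
            return;

        /* Buggy: the argument was only ever used for the NULL check. */
        /* ResourceOwnerRememberFoo(CurrentResourceOwner, foo); */

        /* Fixed: actually use the owner the caller passed in. */
        ResourceOwnerRememberFoo(owner, foo);
    }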
Amit Kapila
f2818868ae Fix LOCK_TIMEOUT handling in slotsync worker.
Previously, the slotsync worker relied on SIGINT for graceful shutdown
during promotion. However, SIGINT is also used by the LOCK_TIMEOUT handler
to cancel queries. Since the slotsync worker can lock catalog tables while
parsing libpq tuples, this overlap caused it to ignore LOCK_TIMEOUT
signals and potentially wait indefinitely on locks.

This patch replaces the slotsync worker's SIGINT handler with
StatementCancelHandler to correctly process query-cancel interrupts.
Additionally, the startup process now uses SIGUSR1 to signal the slotsync
worker to stop during promotion. The worker exits after detecting that the
shared memory flag stopSignaled is set.

Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 17, here it was introduced
Discussion: https://postgr.es/m/TY4PR01MB169078F33846E9568412D878C94A2A@TY4PR01MB16907.jpnprd01.prod.outlook.com
2025-12-09 07:02:08 +00:00
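
A hedged sketch of the revised signal setup; the handler names exist in
PostgreSQL, but the exact wiring here, and the SlotSyncCtx->stopSignaled
field access, are approximations of what the commit message describes:

    /* In the slotsync worker's startup code: */
    pqsignal(SIGINT, StatementCancelHandler);       /* query cancel, incl. LOCK_TIMEOUT */
    pqsignal(SIGUSR1, procsignal_sigusr1_handler);  /* promotion now uses SIGUSR1 */

    /* In the main loop: exit once the startup process has flagged promotion. */
    if (SlotSyncCtx->stopSignaled)      /* shared-memory flag named in the commit */
        proc_exit(0);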
David Rowley
ca98d8ba10 Doc: fix typo in hash index documentation
Plus a similar fix to the README.

Backpatch as far back as the sgml issue exists.  The README issue does
also exist in v14, but leaving it there seems unlikely to harm anyone.

Author: David Geier <geidav.pg@gmail.com>
Discussion: https://postgr.es/m/ed3db7ea-55b4-4809-86af-81ad3bb2c7d3@gmail.com
Backpatch-through: 15
2025-12-09 14:42:40 +13:00
Heikki Linnakangas
cad40cec24 Fix setting next multixid's offset at offset wraparound
In commit 789d65364c, we started updating the next multixid's offset
too when recording a multixid, so that it can always be used to
calculate the number of members. I got it wrong at offset wraparound:
we need to skip over offset 0. Fix that.

Discussion: https://www.postgresql.org/message-id/d9996478-389a-4340-8735-bfad456b313c@iki.fi
Backpatch-through: 14
2025-12-05 11:35:44 +02:00
Heikki Linnakangas
8ba61bc063 Set next multixid's offset when creating a new multixid
With this commit, the next multixid's offset will always be set on the
offsets page, by the time that a backend might try to read it, so we
no longer need the waiting mechanism with the condition variable. In
other words, this eliminates "corner case 2" mentioned in the
comments.

The waiting mechanism was broken in a few scenarios:

- When nextMulti was advanced without WAL-logging the next
  multixid. For example, if a later multixid was already assigned and
  WAL-logged before the previous one was WAL-logged, and then the
  server crashed. In that case the next offset would never be set in
  the offsets SLRU, and a query trying to read it would get stuck
  waiting for it. Same thing could happen if pg_resetwal was used to
  forcibly advance nextMulti.

- In hot standby mode, a deadlock could happen where one backend waits
  for the next multixid assignment record, but WAL replay is not
  advancing because of a recovery conflict with the waiting backend.

The old TAP test used carefully placed injection points to exercise
the old waiting code, but now that the waiting code is gone, much of
the old test is no longer relevant. Rewrite the test to reproduce the
IPC/MultixactCreation hang after crash recovery instead, and to verify
that previously recorded multixids stay readable.

Backpatch to all supported versions. In back-branches, we still need
to be able to read WAL that was generated before this fix, so in the
back-branches this includes a hack to initialize the next offsets page
when replaying XLOG_MULTIXACT_CREATE_ID for the last multixid on a
page. On 'master', bump XLOG_PAGE_MAGIC instead to indicate that the
WAL is not compatible.

Author: Andrey Borodin <amborodin@acm.org>
Reviewed-by: Dmitry Yurichev <dsy.075@yandex.ru>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Ivan Bykov <i.bykov@modernsys.ru>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/172e5723-d65f-4eec-b512-14beacb326ce@yandex.ru
Backpatch-through: 14
2025-12-03 19:15:21 +02:00
Dean Rasheed
c090965036 Avoid rewriting data-modifying CTEs more than once.
Formerly, when updating an auto-updatable view, or a relation with
rules, if the original query had any data-modifying CTEs, the rewriter
would rewrite those CTEs multiple times as RewriteQuery() recursed
into the product queries. In most cases that was harmless, because
RewriteQuery() is mostly idempotent. However, if the CTE involved
updating an always-generated column, it would trigger an error because
any subsequent rewrite would appear to be attempting to assign a
non-default value to the always-generated column.

This could perhaps be fixed by attempting to make RewriteQuery() fully
idempotent, but that looks quite tricky to achieve, and would probably
be quite fragile, given that more generated-column-type features might
be added in the future.

Instead, fix by arranging for RewriteQuery() to rewrite each CTE
exactly once (by tracking the number of CTEs already rewritten as it
recurses). This has the advantage of being simpler and more efficient,
but it does make RewriteQuery() dependent on the order in which
rewriteRuleAction() joins the CTE lists from the original query and
the rule action, so care must be taken if that is ever changed.

Reported-by: Bernice Southey <bernice.southey@gmail.com>
Author: Bernice Southey <bernice.southey@gmail.com>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/CAEDh4nyD6MSH9bROhsOsuTqGAv_QceU_GDvN9WcHLtZTCYM1kA@mail.gmail.com
Backpatch-through: 14
2025-11-29 12:32:12 +00:00
Tom Lane
e79b276621 Allow indexscans on partial hash indexes with implied quals.
Normally, if a WHERE clause is implied by the predicate of a partial
index, we drop that clause from the set of quals used with the index,
since it's redundant to test it if we're scanning that index.
However, if it's a hash index (or any !amoptionalkey index), this
could result in dropping all available quals for the index's first
key, preventing us from generating an indexscan.

It's fair to question the practical usefulness of this case.  Since
hash only supports equality quals, the situation could only arise
if the index's predicate is "WHERE indexkey = constant", implying
that the index contains only one hash value, which would make hash
a really poor choice of index type.  However, perhaps there are
other !amoptionalkey index AMs out there with which such cases are
more plausible.

To fix, just don't filter the candidate indexquals this way if
the index is !amoptionalkey.  That's a bit hokey because it may
result in testing quals we didn't need to test, but to do it
more accurately we'd have to redundantly identify which candidate
quals are actually usable with the index, something we don't know
at this early stage of planning.  Doesn't seem worth the effort.

Reported-by: Sergei Glukhov <s.glukhov@postgrespro.ru>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/e200bf38-6b45-446a-83fd-48617211feff@postgrespro.ru
Backpatch-through: 14
2025-11-27 13:09:59 -05:00
Amit Langote
b5511fed50 Fix error reporting for SQL/JSON path type mismatches
transformJsonFuncExpr() used exprType()/exprLocation() on the
possibly coerced path expression, which could be NULL when
coercion to jsonpath failed, leading to "cache lookup failed
for type 0" errors.

Preserve the original expression node so that type and location
in the "must be of type jsonpath" error are reported correctly.
Add regression tests to cover these cases.

Reported-by: Jian He <jian.universality@gmail.com>
Author: Jian He <jian.universality@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/CACJufxHunVg81JMuNo8Yvv_hJD0DicgaVN2Wteu8aJbVJPBjZA@mail.gmail.com
Backpatch-through: 17
2025-11-27 11:59:36 +09:00
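
A hedged sketch of the shape of the fix in transformJsonFuncExpr(); the
coercion call and message wording are approximations, with the key point
being that the original node is kept for error reporting:

    Node   *rawexpr = pathspec;     /* preserve the uncoerced expression */

    pathspec = coerce_to_target_type(pstate, pathspec, exprType(pathspec),
                                     JSONPATHOID, -1,
                                     COERCION_ASSIGNMENT, COERCE_IMPLICIT_CAST, -1);
    if (pathspec == NULL)
        ereport(ERROR,
                (errcode(ERRCODE_DATATYPE_MISMATCH),
                 errmsg("JSON path expression must be of type jsonpath, not of type %s",
                        format_type_be(exprType(rawexpr))),
                 parser_errposition(pstate, exprLocation(rawexpr))));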
Nathan Bossart
2fc5c50622 Teach DSM registry to retry entry initialization if needed.
If DSM registry entry initialization fails, backends could try to
use an uninitialized DSM segment, DSA, or dshash table (since the
entry is still added to the registry).  To fix, restructure the
code so that the registry retries initialization as needed.  This
commit also modifies pg_get_dsm_registry_allocations() to leave out
partially-initialized entries, as they shouldn't have any allocated
memory.

DSM registry entry initialization shouldn't fail often in practice,
but retrying was deemed better than leaving entries in a
permanently failed state (as was done by commit 1165a933aa, which
has since been reverted).

Suggested-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/E1vJHUk-006I7r-37%40gemulon.postgresql.org
Backpatch-through: 17
2025-11-26 15:12:25 -06:00
Nathan Bossart
c7e0f263d6 Revert "Teach DSM registry to ERROR if attaching to an uninitialized entry."
This reverts commit 1165a933aa (and the corresponding commits on
the back-branches).  In a follow-up commit, we'll teach the
registry to retry entry initialization instead of leaving it in a
permanently failed state.

Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/E1vJHUk-006I7r-37%40gemulon.postgresql.org
Backpatch-through: 17
2025-11-26 11:37:21 -06:00
Andres Freund
427e886a79 lwlock: Fix, currently harmless, bug in LWLockWakeup()
The code in LWLockWakeup() accidentally checked the list of to-be-woken
processes to see whether LW_FLAG_HAS_WAITERS should be unset.  That
means that HAS_WAITERS would not get unset immediately, but only during
the next, unnecessary, call to LWLockWakeup().

Luckily, as the code stands, this is just a small efficiency issue.

However, if there were (as in a patch of mine) a case in which
LWLockWakeup() found no backend to wake despite the wait list not being
empty, we'd wrongly unset LW_FLAG_HAS_WAITERS, potentially leading to a
hang.

While the consequences in the backbranches are limited, the code as-is
is confusing, and it is possible that there are workloads where the
additional wait-list lock acquisitions hurt; therefore, backpatch.

Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
Backpatch-through: 14
2025-11-24 17:39:57 -05:00
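
In sketch form (hedged; the names follow lwlock.c, but the surrounding
compare-and-swap loop is omitted), the fix moves the emptiness test from
the wakeup list to the lock's own wait list:

    /* Wrong: decide based on the backends we are about to wake. */
    /* if (proclist_is_empty(&wakeup)) ... clear LW_FLAG_HAS_WAITERS ... */

    /* Right: decide based on what remains queued on the lock itself. */
    if (proclist_is_empty(&lock->waiters))
        pg_atomic_fetch_and_u32(&lock->state, ~LW_FLAG_HAS_WAITERS);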
Thomas Munro
60215eae7c jit: Adjust AArch64-only code for LLVM 21.
LLVM 21 changed the arguments of RTDyldObjectLinkingLayer's
constructor, breaking compilation with the backported
SectionMemoryManager from commit 9044fc1d.

cd585864c0

Backpatch-through: 14
Author: Holger Hoffstätte <holger@applied-asynchrony.com>
Reviewed-by: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Discussion: https://postgr.es/m/d25e6e4a-d1b4-84d3-2f8a-6c45b975f53d%40applied-asynchrony.com
2025-11-22 21:22:37 +13:00
Tom Lane
075a763e2d Don't allow CTEs to determine semantic levels of aggregates.
The fix for bug #19055 (commit b0cc0a71e) allowed CTE references in
sub-selects within aggregate functions to affect the semantic levels
assigned to such aggregates.  It turns out this broke some related
cases, leading to assertion failures or strange planner errors such
as "unexpected outer reference in CTE query".  After experimenting
with some alternative rules for assigning the semantic level in
such cases, we've come to the conclusion that changing the level
is more likely to break things than be helpful.

Therefore, this patch undoes what b0cc0a71e changed, and instead
installs logic to throw an error if there is any reference to a
CTE that's below the semantic level that standard SQL rules would
assign to the aggregate based on its contained Var and Aggref nodes.
(The SQL standard disallows sub-selects within aggregate functions,
so it can't reach the troublesome case and hence has no rule for
what to do.)

Perhaps someone will come along with a legitimate query that this
logic rejects, and if so probably the example will help us craft
a level-adjustment rule that works better than what b0cc0a71e did.
I'm not holding my breath for that though, because the previous
logic had been there for a very long time before bug #19055 without
complaints, and that bug report sure looks to have originated from
fuzzing not from real usage.

Like b0cc0a71e, back-patch to all supported branches, though
sadly that no longer includes v13.

Bug: #19106
Reported-by: Kamil Monicz <kamil@monicz.dev>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19106-9dd3668a0734cd72@postgresql.org
Backpatch-through: 14
2025-11-18 12:56:55 -05:00
Thomas Munro
d66a922f92 Define PS_USE_CLOBBER_ARGV on GNU/Hurd.
Until d2ea2d310d, the PS_USE_PS_STRINGS
option was used on GNU/Hurd.  As that option got removed, and
PS_USE_CLOBBER_ARGV appears to work fine nowadays on the Hurd, define
the latter to re-enable process title changes on this platform.

In the 14 and 15 branches, the existing test for __hurd__ (added 25
years ago by commit 209aa77d, removed in 16 by the above commit) is left
unchanged for now, as it was activating slightly different code paths
and would need investigation by a Hurd user.

Author: Michael Banck <mbanck@debian.org>
Discussion: https://postgr.es/m/CA%2BhUKGJMNGUAqf27WbckYFrM-Mavy0RKJvocfJU%3DJ2XcAZyv%2Bw%40mail.gmail.com
Backpatch-through: 16
2025-11-17 12:48:37 +13:00
Dean Rasheed
d6c415c4b4 Fix Assert failure in EXPLAIN ANALYZE MERGE with a concurrent update.
When instrumenting a MERGE command containing both WHEN NOT MATCHED BY
SOURCE and WHEN NOT MATCHED BY TARGET actions using EXPLAIN ANALYZE, a
concurrent update of the target relation could lead to an Assert
failure in show_modifytable_info(). In a non-assert build, this would
lead to an incorrect value for "skipped" tuples in the EXPLAIN output,
rather than a crash.

This could happen if the concurrent update caused a matched row to no
longer match, in which case ExecMerge() treats the single originally
matched row as a pair of not matched rows, and potentially executes 2
not-matched actions for the single source row. This could then lead to
a state where the number of rows processed by the ModifyTable node
exceeds the number of rows produced by its source node, causing
"skipped_path" in show_modifytable_info() to be negative, triggering
the Assert.

Fix this in ExecMergeMatched() by incrementing the instrumentation
tuple count on the source node whenever a concurrent update of this
kind is detected, if both kinds of merge actions exist, so that the
number of source rows matches the number of actions potentially
executed, and the "skipped" tuple count is correct.

Back-patch to v17, where support for WHEN NOT MATCHED BY SOURCE
actions was introduced.

Bug: #19111
Reported-by: Dilip Kumar <dilipbalaut@gmail.com>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Discussion: https://postgr.es/m/19111-5b06624513d301b3@postgresql.org
Backpatch-through: 17
2025-11-16 22:15:45 +00:00
Nathan Bossart
505ce19a20 Add note about CreateStatistics()'s selective use of check_rights.
Commit 5e4fcbe531 added a check_rights parameter to this function
for use by ALTER TABLE commands that re-create statistics objects.
However, we intentionally ignore check_rights when verifying
relation ownership because this function's lookup could return a
different answer than the caller's.  This commit adds a note to
this effect so that we remember it down the road.

Reviewed-by: Noah Misch <noah@leadboat.com>
Backpatch-through: 14
2025-11-14 13:20:09 -06:00
Nathan Bossart
ac2800ddc1 Teach DSM registry to ERROR if attaching to an uninitialized entry.
If DSM entry initialization fails, backends could try to use an
uninitialized DSM segment, DSA, or dshash table (since the entry is
still added to the registry).  To fix, keep track of whether
initialization completed, and ERROR if a backend tries to attach to
an uninitialized entry.  We could instead retry initialization as
needed, but that seemed complicated, error prone, and unlikely to
help most cases.  Furthermore, such problems probably indicate a
coding error.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/dd36d384-55df-4fc2-825c-5bc56c950fa9%40gmail.com
Backpatch-through: 17
2025-11-12 14:30:11 -06:00
Heikki Linnakangas
d80d5f0995 Clear 'xid' in dummy async notify entries written to fill up pages
Before we started to freeze async notify entries (commit 8eeb4a0f7c),
no one looked at the 'xid' on an entry with invalid 'dboid'. But now
we might actually need to freeze it later. Initialize them with
InvalidTransactionId to begin with, to avoid that work later.

Álvaro pointed this out in review of commit 8eeb4a0f7c, but I forgot
to include this change there.

Author: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://www.postgresql.org/message-id/202511071410.52ll56eyixx7@alvherre.pgsql
Backpatch-through: 14
2025-11-12 21:27:02 +02:00
Heikki Linnakangas
c2682810ab Fix remaining race condition with CLOG truncation and LISTEN/NOTIFY
Previous commit fixed a bug where VACUUM would truncate the CLOG
that's still needed to check the commit status of XIDs in the async
notify queue, but as mentioned in the commit message, it wasn't a full
fix. If a backend is executing asyncQueueReadAllNotifications() and
has just made a local copy of an async SLRU page which contains old
XIDs, vacuum can concurrently truncate the CLOG covering those XIDs,
and the backend still gets an error when it calls
TransactionIdDidCommit() on those XIDs in the local copy. This commit
fixes that race condition.

To fix, hold the SLRU bank lock across the TransactionIdDidCommit()
calls in NOTIFY processing.

Per Tom Lane's idea. Backpatch to all supported versions.

Reviewed-by: Joel Jacobson <joel@compiler.org>
Reviewed-by: Arseniy Mukhin <arseniy.mukhin.dev@gmail.com>
Discussion: https://www.postgresql.org/message-id/2759499.1761756503@sss.pgh.pa.us
Backpatch-through: 14
2025-11-12 21:01:16 +02:00
Heikki Linnakangas
d02c03ddc5 Fix bug where we truncated CLOG that was still needed by LISTEN/NOTIFY
The async notification queue contains the XID of the sender, and when
processing notifications we call TransactionIdDidCommit() on the
XID. But we had no safeguards to prevent the CLOG segments containing
those XIDs from being truncated away. As a result, if a backend didn't
for some reason process its notifications for a long time, or when a
new backend issued LISTEN, you could get an error like:

test=# listen c21;
ERROR:  58P01: could not access status of transaction 14279685
DETAIL:  Could not open file "pg_xact/000D": No such file or directory.
LOCATION:  SlruReportIOError, slru.c:1087

To fix, make VACUUM "freeze" the XIDs in the async notification queue
before truncating the CLOG. Old XIDs are replaced with
FrozenTransactionId or InvalidTransactionId.

Note: This commit is not a full fix. A race condition remains, where a
backend is executing asyncQueueReadAllNotifications() and has just
made a local copy of an async SLRU page which contains old XIDs, while
vacuum concurrently truncates the CLOG covering those XIDs. When the
backend then calls TransactionIdDidCommit() on those XIDs from the
local copy, you still get the error. The next commit will fix that
remaining race condition.

This was first reported by Sergey Zhuravlev in 2021, with many other
people hitting the same issue later. Thanks to:
- Alexandra Wang, Daniil Davydov, Andrei Varashen and Jacques Combrink
  for investigating and providing reproducible test cases,
- Matheus Alcantara and Arseniy Mukhin for review and earlier proposed
  patches to fix this,
- Álvaro Herrera and Masahiko Sawada for reviews,
- Yura Sokolov aka funny-falcon for the idea of marking transactions
  as committed in the notification queue, and
- Joel Jacobson for the final patch version. I hope I didn't forget
  anyone.

Backpatch to all supported versions. I believe the bug goes back all
the way to commit d1e027221d, which introduced the SLRU-based async
notification queue.

Discussion: https://www.postgresql.org/message-id/16961-25f29f95b3604a8a@postgresql.org
Discussion: https://www.postgresql.org/message-id/18804-bccbbde5e77a68c2@postgresql.org
Discussion: https://www.postgresql.org/message-id/CAK98qZ3wZLE-RZJN_Y%2BTFjiTRPPFPBwNBpBi5K5CU8hUHkzDpw@mail.gmail.com
Backpatch-through: 14
2025-11-12 21:01:13 +02:00
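
The freezing pass can be pictured with this hedged sketch; qe and
frozenXmin are simplified stand-ins for the notify queue entry and the
CLOG truncation horizon, while the TransactionId calls are real:

    TransactionId xid = qe->xid;

    if (TransactionIdIsNormal(xid) &&
        TransactionIdPrecedes(xid, frozenXmin))
    {
        /* Resolve the commit status now, while the CLOG still exists. */
        qe->xid = TransactionIdDidCommit(xid) ? FrozenTransactionId
                                              : InvalidTransactionId;
    }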
Heikki Linnakangas
b821c92920 Escalate ERRORs during async notify processing to FATAL
Previously, if async notify processing encountered an error, we would
report the error to the client and advance our read position past the
offending entry to prevent trying to process it over and over
again. Trying to continue after an error has a few problems however:

- We have no way of telling the client that a notification was
  lost. They get an ERROR, but that doesn't tell you much. As such,
  it's not clear if keeping the connection alive after losing a
  notification is a good thing. Depending on the application logic,
  missing a notification could cause the application to get stuck
  waiting, for example.

- If the connection is idle, PqCommReadingMsg is set and any ERROR is
  turned into FATAL anyway.

- We bailed out of the notification processing loop on first error
  without processing any subsequent notifications. The subsequent
  notifications would not be processed until another notify interrupt
  arrives. For example, if there were two notifications pending, and
  processing the first one caused an ERROR, the second notification
  would not be processed until someone sent a new NOTIFY.

This commit changes the behavior so that any ERROR while processing
async notifications is turned into FATAL, causing the client
connection to be terminated. That makes the behavior more consistent
as that's what happened in idle state already, and terminating the
connection is a clear signal to the application that it might've
missed some notifications.

The reason to do this now is that the next commits will change the
notification processing code in a way that would make it harder to
skip over just the offending notification entry on error.

Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Arseniy Mukhin <arseniy.mukhin.dev@gmail.com>
Discussion: https://www.postgresql.org/message-id/fedbd908-4571-4bbe-b48e-63bfdcc38f64@iki.fi
Backpatch-through: 14
2025-11-12 21:01:08 +02:00
Daniel Gustafsson
1aa5a029fc Fix range for commit_siblings in sample conf
The range for commit_siblings was incorrectly listed as starting at 1
instead of 0 in the sample configuration file.  Backpatch to all
supported branches.

Author: Man Zeng <zengman@halodbtech.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/tencent_53B70BA72303AE9C6889E78E@qq.com
Backpatch-through: 14
2025-11-12 13:51:53 +01:00
Michael Paquier
f30cd34b3f Report better object limits in error messages for injection points
Previously, error messages for oversized injection point names, libraries,
and functions showed the buffer sizes (64, 128, 128) instead of the usable
character limits (63, 127, 127), as they did not account for the
terminating zero byte, which was confusing.  These messages are adjusted
to reflect the actual limits.

The limit enforced for the private area was also too strict by one byte:
specifying an area of exactly INJ_PRIVATE_MAXLEN bytes should work,
because there is no terminating zero byte in that case.

This is mostly a stylistic change (a private_area of exactly 1024 bytes
can now be defined, something that nobody seems to care about based on
the lack of complaints).  However, as this is a testing facility, let's
keep the logic consistent across all the branches where this code
exists, since there is an argument in favor of out-of-core extensions
that use injection points.

Author: Xuneng Zhou <xunengzhou@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CABPTF7VxYp4Hny1h+7ejURY-P4O5-K8WZg79Q3GUx13cQ6B2kg@mail.gmail.com
Backpatch-through: 17
2025-11-12 10:19:20 +09:00
Nathan Bossart
e2fb3dfa81 Check for CREATE privilege on the schema in CREATE STATISTICS.
This omission allowed table owners to create statistics in any
schema, potentially leading to unexpected naming conflicts.  For
ALTER TABLE commands that require re-creating statistics objects,
skip this check in case the user has since lost CREATE on the
schema.  The addition of a second parameter to CreateStatistics()
breaks ABI compatibility, but we are unaware of any impacted
third-party code.

Reported-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Security: CVE-2025-12817
Backpatch-through: 13
2025-11-10 09:00:00 -06:00
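
A hedged sketch of the added check in CreateStatistics(); the variable
names are approximate, the ALTER TABLE re-creation path passes
check_rights = false, and object_aclcheck() is the spelling in recent
branches (older branches use pg_namespace_aclcheck()):

    if (check_rights)
    {
        AclResult   aclresult;

        aclresult = object_aclcheck(NamespaceRelationId, namespaceId,
                                    GetUserId(), ACL_CREATE);
        if (aclresult != ACLCHECK_OK)
            aclcheck_error(aclresult, OBJECT_SCHEMA,
                           get_namespace_name(namespaceId));
    }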
Peter Eisentraut
3b5a61d413 Translation updates
Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: 94da961c417317e08b3eb80ed33e7ebb7a165fce
2025-11-10 13:02:28 +01:00
Peter Eisentraut
07f787e573 Disallow generated columns in COPY WHERE clause
Stored generated columns are not yet computed when the filtering
happens, so we need to prohibit them to avoid incorrect behavior.

Co-authored-by: jian he <jian.universality@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CACJufxHb8YPQ095R_pYDr77W9XKNaXg5Rzy-WP525mkq+hRM3g@mail.gmail.com
2025-11-06 14:01:42 +01:00
Etsuro Fujita
d898e68469 Update obsolete comment in ExecScanReScan().
Commit 27cc7cd2b removed the epqScanDone flag from the EState struct,
and instead added an equivalent flag named relsubs_done to the EPQState
struct; but it failed to update this comment.

Author: Etsuro Fujita <etsuro.fujita@gmail.com>
Discussion: https://postgr.es/m/CAPmGK152zJ3fU5avDT5udfL0namrDeVfMTL3dxdOXw28SOrycg%40mail.gmail.com
Backpatch-through: 13
2025-11-06 12:25:02 +09:00
Tom Lane
a9515f294d Avoid possible crash within libsanitizer.
We've successfully used libsanitizer for awhile with the undefined
and alignment sanitizers, but with some other sanitizers (at least
thread and hwaddress) it crashes due to internal recursion before
it's fully initialized itself.  It turns out that that's due to the
"__ubsan_default_options" hack installed by commit f686ae82f, and we
can fix it by ensuring that __ubsan_default_options is built without
any sanitizer instrumentation hooks.

Reported-by: Emmanuel Sibi <emmanuelsibi.mec@gmail.com>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Diagnosed-by: Emmanuel Sibi <emmanuelsibi.mec@gmail.com>
Fix-suggested-by: Jacob Champion <jacob.champion@enterprisedb.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/F7543B04-E56C-4D68-A040-B14CCBAD38F1@gmail.com
Discussion: https://postgr.es/m/dbf77bf7-6e54-ed8a-c4ae-d196eeb664ce@gmail.com
Backpatch-through: 16
2025-11-05 11:09:30 -05:00
Andres Freund
3e45d8c67f jit: Fix accidentally-harmless type confusion
In 2a0faed9d7, which added JIT compilation support for expressions, I
accidentally used sizeof(LLVMBasicBlockRef *) instead of
sizeof(LLVMBasicBlockRef) as part of computing the size of an allocation. That
turns out to have no real negative consequences due to LLVMBasicBlockRef being
a pointer itself (and thus having the same size). It still is wrong and
confusing, so fix it.

Reported by Coverity.

Backpatch-through: 13
2025-11-04 18:42:03 -05:00
Álvaro Herrera
3b5007347b Fix snapshot handling bug in recent BRIN fix
Commit a95e3d84c0 added ActiveSnapshot push+pop when processing
work-items (BRIN autosummarization), but forgot to handle the case of
a transaction failing during the run, which drops the snapshot untimely.
Fix by making the pop conditional on an element being actually there.

Author: Álvaro Herrera <alvherre@kurilemu.de>
Backpatch-through: 13
Discussion: https://postgr.es/m/202511041648.nofajnuddmwk@alvherre.pgsql
2025-11-04 20:31:43 +01:00
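
The corrected pattern, as a minimal hedged sketch using the real
snapshot-manager calls; a failed transaction has already dropped the
snapshot during abort, so the pop must be conditional:

    PushActiveSnapshot(GetTransactionSnapshot());
    /* ... run the autosummarization work-item; this may error out ... */
    if (ActiveSnapshotSet())
        PopActiveSnapshot();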
Andres Freund
0f8a2b1336 Backpatch: Fix warnings about declaration of environ on MinGW
Backpatch commit 7bc9a8bdd2 to 13-17.  The motivation for backpatching
is that we want to update CI to Debian Trixie.  Trixie contains a newer
MinGW installation, which would trigger the warning addressed by
7bc9a8bdd2.  The risk of backpatching seems fairly low, given that the
commit did not cause issues in the branches where it is already present.

While CI is not present in 13-14, it seems better to be consistent
across branches.

Author: Thomas Munro <tmunro@postgresql.org>
Discussion: https://postgr.es/m/o5yadhhmyjo53svzwvaocww6zkrp63i4f32cw3treuh46pxtza@hyqio5b2tkt6
Backpatch-through: 13
2025-11-04 13:24:57 -05:00
Peter Eisentraut
0b44f2443f Tighten check for generated column in partition key expression
A generated column may end up being part of the partition key
expression if it is specified as an expression, e.g. "(<generated
column name>)", or if the partition key expression contains a whole-row
reference, even though we do not allow a generated column to be part of
a partition key expression.  Fix this hole.

Co-authored-by: jian he <jian.universality@gmail.com>
Co-authored-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Discussion: https://www.postgresql.org/message-id/flat/CACJufxF%3DWDGthXSAQr9thYUsfx_1_t9E6N8tE3B8EqXcVoVfQw%40mail.gmail.com
2025-11-04 15:18:09 +01:00
Álvaro Herrera
f4b68b0336 BRIN autosummarization may need a snapshot
It's possible to define BRIN indexes on functions that require a
snapshot to run, but the autosummarization feature introduced by commit
7526e10224 fails to provide one.  This causes autovacuum to leave a
BRIN placeholder tuple behind after a failed work-item execution, making
such indexes less efficient.  Repair by obtaining a snapshot prior to
running the task, and add a test to verify this behavior.

Author: Álvaro Herrera <alvherre@kurilemu.de>
Reported-by: Giovanni Fabris <giovanni.fabris@icon.it>
Reported-by: Arthur Nascimento <tureba@gmail.com>
Backpatch-through: 13
Discussion: https://postgr.es/m/202511031106.h4fwyuyui6fz@alvherre.pgsql
2025-11-04 13:23:26 +01:00
Peter Eisentraut
1e6dfdaa04 Error message stylistic correction
Fixup for commit ef5e60a9d3: The inconsistent use of articles was a
bit awkward.
2025-11-04 12:25:20 +01:00
Michael Paquier
e7340b484c Fix unconditional WAL receiver shutdown during stream-archive transition
Commit b4f584f9d2 (affecting v15~, later backpatched down to 13 as of
3635a0a35a) introduced an unconditional WAL receiver shutdown when
switching from streaming to archive WAL sources.  This causes problems
during a timeline switch, when a WAL receiver enters WALRCV_WAITING
state but remains alive, waiting for instructions.

The unconditional shutdown can break some monitoring scenarios as the
WAL receiver gets repeatedly terminated and re-spawned, causing
pg_stat_wal_receiver.status to show a "streaming" instead of "waiting"
status, masking the fact that the WAL receiver is waiting for a new TLI
and a new LSN to be able to continue streaming.

This commit changes the WAL receiver behavior so that the shutdown
becomes conditional, with InstallXLogFileSegmentActive always being
reset to prevent the regression fixed by b4f584f9d2: only terminate the
WAL receiver when it is actively streaming (WALRCV_STREAMING,
WALRCV_STARTING, or WALRCV_RESTARTING).  When in WALRCV_WAITING state,
just reset the InstallXLogFileSegmentActive flag to allow archive
restoration without killing the process.  WALRCV_STOPPED and
WALRCV_STOPPING are not reachable states in this code path.  For the
latter, the startup process is the one in charge of setting
WALRCV_STOPPING via ShutdownWalRcv(), waiting for the WAL receiver to
reach the WALRCV_STOPPED state after switching walRcvState, so
WaitForWALToBecomeAvailable() cannot be reached while a WAL receiver is
in the WALRCV_STOPPING state.

A regression test is added to check that the WAL receiver is not stopped
on a timeline jump; the test fails when the fix of this commit is
reverted.

Reported-by: Ryan Bird <ryanzxg@gmail.com>
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/19093-c4fff49a608f82a0@postgresql.org
Backpatch-through: 13
2025-11-04 10:52:35 +09:00
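
In sketch form (hedged; the state names come from walreceiver.h, while
the locking and flag reset are simplified away):

    switch (walRcvState)
    {
        case WALRCV_STREAMING:
        case WALRCV_STARTING:
        case WALRCV_RESTARTING:
            XLogShutdownWalRcv();   /* actively streaming: terminate as before */
            break;
        case WALRCV_WAITING:
            /*
             * Waiting for a new TLI/LSN: keep the process alive; just reset
             * InstallXLogFileSegmentActive (done under lock in the real
             * code) so that archive restoration can proceed.
             */
            break;
        default:
            /* WALRCV_STOPPED/WALRCV_STOPPING are unreachable here */
            break;
    }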
Noah Misch
27f714b5da Doc: cover index CONCURRENTLY causing errors in INSERT ... ON CONFLICT.
Author: Mikhail Nikalayeu <mihailnikalayeu@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/CANtu0ojXmqjmEzp-=aJSxjsdE76iAsRgHBoK0QtYHimb_mEfsg@mail.gmail.com
Backpatch-through: 13
2025-11-03 12:57:12 -08:00
Michael Paquier
f3fb6bc9fe Fix regression with slot invalidation checks
This commit reverts 818fefd8fd, which was introduced to address an
instability in some of the TAP tests due to the presence of random
standby snapshot WAL records when slots are invalidated by
InvalidatePossiblyObsoleteSlot().

However, that commit also introduced a behavior regression.  After
818fefd8fd, the code may determine that a slot needs to be invalidated
when it does not: the slot may have moved from a conflicting state to a
non-conflicting state between the moment the mutex is released and the
moment the slot is rechecked in InvalidatePossiblyObsoleteSlot().
Hence, the invalidations may be more aggressive than they actually need
to be.

105b2cb336 has tackled the test instability in a way that should be
hopefully sufficient for the buildfarm, even for slow members:
- In v18, the test relies on an injection point that bypasses the
creation of the random records generated for standby snapshots,
eliminating the random factor that impacted the test.  This option was
not available when 818fefd8fd was discussed.
- In v16 and v17, the problem was bypassed by disallowing a slot to
become active in some of the scenarios tested.

While at it, this commit adds a comment to document that it is fine for
a recheck to use the xmin and LSN values stored in the slot, without
storing and reusing them across multiple checks.

Reported-by: "suyu.cmj" <mengjuan.cmj@alibaba-inc.com>
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/f492465f-657e-49af-8317-987460cb68b0.mengjuan.cmj@alibaba-inc.com
Backpatch-through: 16
2025-10-30 13:13:34 +09:00
David Rowley
bd6f986c9e Fix bogus use of "long" in AllocSetCheck()
Because long is 32-bit on 64-bit Windows, it isn't a good datatype to
store the difference between 2 pointers.  The under-sized type could
overflow and lead to scary warnings in MEMORY_CONTEXT_CHECKING builds,
such as:

WARNING:  problem in alloc set ExecutorState: bad single-chunk %p in block %p

However, the problem lies only in the code running the check, not from
an actual memory accounting bug.

Fix by using "Size" instead of "long".  This means using an unsigned
type rather than the previous signed type.  If the block's freeptr was
corrupted, we'd still catch that if the unsigned type wrapped.  Unsigned
allows us to avoid further needless complexities around comparing signed
and unsigned types.

Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Backpatch-through: 13
Discussion: https://postgr.es/m/CAApHDvo-RmiT4s33J=aC9C_-wPZjOXQ232V-EZFgKftSsNRi4w@mail.gmail.com
2025-10-30 14:49:42 +13:00
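
The underlying portability trap is easy to demonstrate standalone; on
LLP64 platforms (64-bit Windows) long is 32 bits, while Size
(PostgreSQL's alias for size_t) is pointer-width:

    #include <stdio.h>
    #include <stddef.h>

    int
    main(void)
    {
        char    block[64];
        char   *freeptr = block + sizeof(block);

        /* Risky on LLP64: a pointer difference larger than 2GB would
         * overflow a 32-bit long. */
        long    bad = (long) (freeptr - block);

        /* size_t is pointer-width and unsigned, so even a corrupted
         * freeptr still shows up as an out-of-range value. */
        size_t  good = (size_t) (freeptr - block);

        printf("bad=%ld good=%zu\n", bad, good);
        return 0;
    }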
Amit Kapila
0024f5a102 Fix GUC check_hook validation for synchronized_standby_slots.
Previously, the check_hook for synchronized_standby_slots attempted to
validate that each specified slot existed and was physical. However, these
checks were not performed during server startup. As a result, if users
configured non-existent slots before startup, the misconfiguration would
go undetected initially. This could later cause parallel query failures,
as newly launched workers would detect the issue and raise an ERROR.

This patch improves the check_hook by validating the syntax and format of
slot names. Validation of slot existence and type is deferred to the WAL
sender process, aligning with the behavior of the check_hook for
primary_slot_name.

Reported-by: Fabrice Chapuis <fabrice636861@gmail.com>
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com>
Reviewed-by: Rahila Syed <rahilasyed90@gmail.com>
Backpatch-through: 17, where it was introduced
Discussion: https://postgr.es/m/CAA5-nLCeO4MQzWipCXH58qf0arruiw0OeUc1+Q=Z=4GM+=v1NQ@mail.gmail.com
2025-10-27 06:34:29 +00:00
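
A hedged sketch of what a syntax-only check_hook looks like; the hook
signature and SplitIdentifierString() are real, while the slot-name
validation call (assuming the two-argument name/elevel form) and the
error handling are simplified approximations of the patch:

    bool
    check_synchronized_standby_slots(char **newval, void **extra, GucSource source)
    {
        char       *rawstring = pstrdup(*newval);
        List       *namelist = NIL;
        ListCell   *lc;
        bool        ok = true;

        if (!SplitIdentifierString(rawstring, ',', &namelist))
        {
            GUC_check_errdetail("List syntax is invalid.");
            ok = false;
        }
        else
        {
            foreach(lc, namelist)
            {
                /* Validate name length/characters only; slot existence
                 * and type are checked later by the WAL sender. */
                if (!ReplicationSlotValidateName((char *) lfirst(lc), WARNING))
                {
                    ok = false;
                    break;
                }
            }
        }

        pfree(rawstring);
        list_free(namelist);
        return ok;
    }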
David Rowley
0d30746153 Fix incorrect logic for caching ResultRelInfos for triggers
When dealing with ResultRelInfos for partitions, there are cases with
mixed requirements for ri_RootResultRelInfo.  Within the same query, a
partition may at one point require a NULL ri_RootResultRelInfo and at
another a ResultRelInfo with its parent set in ri_RootResultRelInfo.
This could cause the column mapping between the partitioned table and
the partition not to be done, which could result in crashes if the
column attnums didn't match exactly.

The fix is simple.  We now check that the ri_RootResultRelInfo matches
what the caller passed to ExecGetTriggerResultRel() and only return a
cached ResultRelInfo when the ri_RootResultRelInfo matches what the
caller wants, otherwise we'll make a new one.

Author: David Rowley <dgrowleyml@gmail.com>
Author: Amit Langote <amitlangote09@gmail.com>
Reported-by: Dmitry Fomin <fomin.list@gmail.com>
Discussion: https://postgr.es/m/7DCE78D7-0520-4207-822B-92F60AEA14B4@gmail.com
Backpatch-through: 15
2025-10-26 11:01:53 +13:00
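
A hedged sketch of the corrected cache lookup in
ExecGetTriggerResultRel(); the field names are real, the loop is
condensed:

    foreach(l, estate->es_trig_target_relations)
    {
        ResultRelInfo *rInfo = (ResultRelInfo *) lfirst(l);

        if (RelationGetRelid(rInfo->ri_RelationDesc) == relid &&
            rInfo->ri_RootResultRelInfo == rootRelInfo) /* new: roots must match */
            return rInfo;
    }
    /* ... otherwise build a fresh ResultRelInfo with the requested root ... */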
Tom Lane
39d24475c1 Fix off-by-one Asserts in FreePageBtreeInsertInternal/Leaf.
These two functions expect there to be room to insert another item
in the FreePageBtree's array, but their assertions were too weak
to guarantee that.  This has little practical effect granting that
the callers are not buggy, but it seems to be misleading late-model
Coverity into complaining about possible array overrun.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/799984.1761150474@sss.pgh.pa.us
Backpatch-through: 13
2025-10-23 12:32:26 -04:00
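
Sketched (hedged; identifiers follow freepage.c), the tightened
assertion simply requires room for the item about to be inserted:

    /* Before: too weak, allows a page that is already full. */
    /* Assert(btp->hdr.nused <= FPM_ITEMS_PER_INTERNAL_PAGE); */

    /* After: there must be room for one more item. */
    Assert(btp->hdr.nused < FPM_ITEMS_PER_INTERNAL_PAGE);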
Fujii Masao
8ceab82ca5 Add comments explaining overflow entries in the replication lag tracker.
Commit 883a95646a introduced overflow entries in the replication lag tracker
to fix an issue where lag columns in pg_stat_replication could stall when
the replay LSN stopped advancing.

This commit adds comments clarifying the purpose and behavior of overflow
entries to improve code readability and understanding.

Since commit 883a95646a was recently applied and backpatched to all
supported branches, this follow-up commit is also backpatched accordingly.

Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CABPTF7VxqQA_DePxyZ7Y8V+ErYyXkmwJ1P6NC+YC+cvxMipWKw@mail.gmail.com
Backpatch-through: 13
2025-10-23 13:26:30 +09:00
David Rowley
10945148ee Fix incorrect zero extension of Datum in JIT tuple deform code
When JIT deformed tuples (controlled via the jit_tuple_deforming GUC),
types narrower than sizeof(Datum) would be zero-extended up to Datum
width.  This wasn't the same as what fetch_att() does in the standard
tuple deforming code.  Logically the values are the same when fetched
via the DatumGet*() macros, but negative numbers are not the same in
binary form.

In the report, the problem was manifesting itself with:

ERROR: could not find memoization table entry

in a query which had a "Cache Mode: binary" Memoize node. However, it's
currently unclear what else is affected.  Anything that uses
datum_image_eq() or datum_image_hash() on a Datum from a tuple deformed by
JIT could be affected, but it may not be limited to that.

The fix for this is simple: use sign extension instead of zero
extension.

Many thanks to Emmanuel Touzery for reporting this issue and providing
steps and a backup which allowed the problem to be easily recreated.

Reported-by: Emmanuel Touzery <emmanuel.touzery@plandela.si>
Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/DB8P194MB08532256D5BAF894F241C06393F3A@DB8P194MB0853.EURP194.PROD.OUTLOOK.COM
Backpatch-through: 13
2025-10-23 13:12:49 +13:00
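
The difference is easy to see in a standalone program: both extensions
round-trip through DatumGetInt16()-style narrowing, but their 8-byte
images differ, which is exactly what byte-wise code such as
datum_image_eq() compares:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    int
    main(void)
    {
        int16_t  val = -1;      /* a narrow, negative column value */

        uint64_t zext = (uint64_t) (uint16_t) val;  /* what the JIT code did */
        uint64_t sext = (uint64_t) (int64_t) val;   /* what fetch_att() does */

        /* Narrowing back to int16 recovers -1 from either, but the
         * 8-byte binary images are different Datums. */
        printf("zero-extended: 0x%016llx\n", (unsigned long long) zext);
        printf("sign-extended: 0x%016llx\n", (unsigned long long) sext);
        printf("binary equal:  %s\n",
               memcmp(&zext, &sext, sizeof(zext)) == 0 ? "yes" : "no");
        return 0;
    }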