If extensions of equal names were installed in different directories
in the path, the views pg_available_extensions and
pg_available_extension_versions would show all of them, even though
only the first one was actually reachable by CREATE EXTENSION. To
fix, have those views skip extensions found later in the path if they
have names already found earlier.
Also add a bit of documentation that only the first extension in the
path can be used.
Reported-by: Pierrick <pierrick.chovelon@dalibo.com>
Discussion: https://www.postgresql.org/message-id/flat/8f5a0517-1cb8-4085-ae89-77e7454e27ba%40dalibo.com
The TimescaleDB extension expects to be able to change an nbtree scan's
keys across rescans. The issue arises in the extension's implementation
of loose index scan. This is arguably a misuse of the index AM API,
though apparently it worked until recently. It stopped working when the
skipScan flag was added to BTScanOpaqueData by commit 8a510275, though.
The flag wouldn't reliably track whether the scan (actually, the current
rescan) has any skip arrays, leading to confusion in _bt_set_startikey.
nbtree preprocessing will now defensively initialize the scan's skipScan
flag in all cases, including the case where _bt_preprocess_array_keys
returns early due to the (re)scan not using arrays. While nbtree isn't
obligated to support this use case (at least not according to my reading
of the index AM API), it still seems like a good idea to be consistent
here, on general robustness grounds.
Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Natalya Aksman <natalya@timescale.com>
Discussion: https://postgr.es/m/CAJumhcirfMojbk20+W0YimbNDkwdECvJprQGQ-XqK--ph09nQw@mail.gmail.com
Backpatch-through: 18
Commit e3ffc3e91 fixed the translation of character classes in
SIMILAR TO regular expressions. Unfortunately the fix broke a corner
case: if there is an escape character right after the opening bracket
(for example in "[\q]"), a closing bracket right after the escape
sequence would not be seen as closing the character class.
There were two more oversights: a backslash or a nested opening bracket
right at the beginning of a character class should remove the special
meaning from any following caret or closing bracket.
This bug suggests that this code needs to be more readable, so also
rename the variables "charclass_depth" and "charclass_start" to
something more meaningful, rewrite an "if" cascade to be more
consistent, and improve the commentary.
Reported-by: Dominique Devienne <ddevienne@gmail.com>
Reported-by: Stephan Springl <springl-psql@bfw-online.de>
Author: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAFCRh-8NwJd0jq6P=R3qhHyqU7hw0BTor3W0SvUcii24et+zAw@mail.gmail.com
Backpatch-through: 13
Commit a0b99fc12 caused pg_event_trigger_dropped_objects()
to not fill the object_name field for schemas, which it
should have; and caused it to fill the object_name field
for default values, which it should not have.
In addition, triggers and RLS policies really should behave
the same way as we're making column defaults do; that is,
they should have is_temporary = true if they belong to a
temporary table.
Fix those things, and upgrade event_trigger.sql's woefully
inadequate test coverage of these secondary output columns.
As before, back-patch only to v15.
Reported-by: Sergey Shinderuk <s.shinderuk@postgrespro.ru>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/bd7b4651-1c26-4d30-832b-f942fabcb145@postgrespro.ru
Backpatch-through: 15
A recently added nbtree preprocessing step failed to account for the
fact that DESC columns already had their B-Tree strategy number commuted
at this point in preprocessing. As a result, preprocessing could output
a set of scan keys where one or more keys had the correct strategy
number, but used the wrong comparison routine.
To fix, make the faulty code path that looks up a more restrictive
replacement operator/comparison routine commute its requested inequality
strategy (while outputting the transformed strategy number as before).
This makes the final transformed scan key comport with the approach
preprocessing has always used to deal with DESC columns (which is
described by comments above _bt_fix_scankey_strategy).
Oversight in commit commit b3f1a13f, which made nbtree preprocessing
perform transformations on skip array inequalities that can reduce the
total number of index searches.
Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Natalya Aksman <natalya@timescale.com>
Discussion: https://postgr.es/m/19049-b7df801e71de41b2@postgresql.org
Backpatch-through: 18
Clang 21 shows some new compiler warnings, for example:
warning: variable 'dstsize' is uninitialized when passed as a const pointer argument here [-Wuninitialized-const-pointer]
The fix is to initialize the variables when they are defined. This is
similar to, for example, the existing situation in gistKeyIsEQ().
Discussion: https://www.postgresql.org/message-id/flat/6604ad6e-5934-43ac-8590-15113d6ae4b1%40eisentraut.org
pg_event_trigger_dropped_objects() would report a column default
object with is_temporary = false, even if it belongs to a
temporary table. This seems clearly wrong, so adjust it to
report the table's temp-ness.
While here, refactor EventTriggerSQLDropAddObject to make its
handling of namespace objects less messy and avoid duplication
of the schema-lookup code. And add some explicit test coverage
of dropped-object reports for dependencies of temp tables.
Back-patch to v15. The bug exists further back, but the
GetAttrDefaultColumnAddress function this patch depends on does not,
and it doesn't seem worth adjusting it to cope with the older code.
Author: Antoine Violin <violin.antuan@gmail.com>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAFjUV9x3-hv0gihf+CtUc-1it0hh7Skp9iYFhMS7FJjtAeAptA@mail.gmail.com
Backpatch-through: 15
If the hash functions used for hashing tuples leaked any memory,
we failed to clean that up, resulting in query-lifespan memory
leakage in queries using hashed subplans. One way that could
happen is if the values being hashed require de-toasting, since
most of our hash functions don't trouble to clean up de-toasted
inputs.
Prior to commit bf6c614a2, this leakage was largely masked
because TupleHashTableMatch would reset hashtable->tempcxt
(via execTuplesMatch). But it doesn't do that anymore, and
that's not really the right place for this anyway: doing it
there could reset the tempcxt many times per hash lookup,
or not at all. Instead put reset calls into ExecHashSubPlan
and buildSubPlanHash. Along the way to that, rearrange
ExecHashSubPlan so that there's just one place to call
MemoryContextReset instead of several.
This amounts to accepting the de-facto API spec that the caller
of the TupleHashTable routines is responsible for resetting the
tempcxt adequately often. Although the other callers seem to
get this right, it was not documented anywhere, so add a comment
about it.
Bug: #19040
Reported-by: Haiyang Li <mohen.lhy@alibaba-inc.com>
Author: Haiyang Li <mohen.lhy@alibaba-inc.com>
Reviewed-by: Fei Changhong <feichanghong@qq.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19040-c9b6073ef814f48c@postgresql.org
Backpatch-through: 13
The startup process does not process shared invalidation messages, only
sending them, and never calls AtEOXact_SMgr() which clean up any
unpinned SMgrRelations. Hence, it is never able to free SMgrRelations
on a periodic basis, bloating its hashtable over time.
Like the checkpointer and the bgwriter, this commit takes a conservative
approach by freeing periodically SMgrRelations when replaying a
checkpoint record, either online or shutdown, so as the startup process
has a way to perform a periodic cleanup.
Issue caused by 21d9c3ee4e, so backpatch down to v17.
Author: Jingtang Zhang <mrdrivingduck@gmail.com>
Reviewed-by: Yuhang Qiu <iamqyh@gmail.com>
Discussion: https://postgr.es/m/28C687D4-F335-417E-B06C-6612A0BD5A10@gmail.com
Backpatch-through: 17
A new pgstats entry is created as a two-step process:
- The entry is looked at in the shared hashtable of pgstats, and is
inserted if not found.
- When not found and inserted, its fields are then initialized. This
part include a DSA chunk allocation for the stats data of the new entry.
As currently coded, if the DSA chunk allocation fails due to an
out-of-memory failure, an ERROR is generated, leaving in the pgstats
shared hashtable an inconsistent entry due to the first step, as the
entry has already been inserted in the hashtable. These broken entries
can then be found by other backends, crashing them.
There are only two callers of pgstat_init_entry(), when loading the
pgstats file at startup and when creating a new pgstats entry. This
commit changes pgstat_init_entry() so as we use dsa_allocate_extended()
with DSA_ALLOC_NO_OOM, making it return NULL on allocation failure
instead of failing. This way, a backend failing an entry creation can
take appropriate cleanup actions in the shared hashtable before throwing
an error. Currently, this means removing the entry from the shared
hashtable before throwing the error for the allocation failure.
Out-of-memory errors unlikely happen in the wild, and we do not bother
with back-patches when these are fixed, usually. However, the problem
dealt with here is a degree worse as it breaks the shared memory state
of pgstats, impacting other processes that may look at an inconsistent
entry that a different process has failed to create.
Author: Mikhail Kot <mikhail.kot@databricks.com>
Discussion: https://postgr.es/m/CAAi9E7jELo5_-sBENftnc2E8XhW2PKZJWfTC3i2y-GMQd2bcqQ@mail.gmail.com
Backpatch-through: 15
When executing a MERGE UPDATE action, if there is more than one
concurrent update of the target row, the lock-and-retry code would
sometimes incorrectly identify the latest version of the target tuple,
leading to incorrect results.
This was caused by using the ctid field from the TM_FailureData
returned by table_tuple_lock() in a case where the result was TM_Ok,
which is unsafe because the TM_FailureData struct is not guaranteed to
be fully populated in that case. Instead, it should use the tupleid
passed to (and updated by) table_tuple_lock().
To reduce the chances of similar errors in the future, improve the
commentary for table_tuple_lock() and TM_FailureData to make it
clearer that table_tuple_lock() updates the tid passed to it, and most
fields of TM_FailureData should not be relied on in non-failure cases.
An exception to this is the "traversed" field, which is set in both
success and failure cases.
Reported-by: Dmitry <dsy.075@yandex.ru>
Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/1570d30e-2b95-4239-b9c3-f7bf2f2f8556@yandex.ru
Backpatch-through: 15
If an INSERT has an ON CONFLICT DO UPDATE clause, the executor must
check that the target relation supports UPDATE as well as INSERT. In
particular, it must check that the target relation has a REPLICA
IDENTITY if it publishes updates. Formerly, it was not doing this
check, which could lead to silently breaking replication.
Fix by adding such a check to CheckValidResultRel(), which requires
adding a new onConflictAction argument. In back-branches, preserve ABI
compatibility by introducing a wrapper function with the original
signature.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Tested-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/OS3PR01MB57180C87E43A679A730482DF94B62@OS3PR01MB5718.jpnprd01.prod.outlook.com
Backpatch-through: 13
The counters saved from pgWalUsage, used for the difference calculations
when flushing the backend WAL stats, are updated when calling
pgstat_flush_backend() under PGSTAT_BACKEND_FLUSH_WAL, and not
pgstat_report_wal(). The comment updated in this commit referenced the
latter, but it is perfectly OK to flush the backend stats independently
of the WAL stats.
Noticed while looking at this area of the code, introduced by
76def4cdd7 as a copy-pasto.
Backpatch-through: 18
SubPlan nodes are typically built very early, before any RelOptInfos
have been constructed for the parent query level. As a result, the
simple_rel_array in the parent root has not yet been initialized.
Currently, during cost estimation of a SubPlan's testexpr, we may call
examine_variable() to look up statistical data about the expressions.
This can lead to "no relation entry for relid" errors.
To fix, pass root as NULL to cost_qual_eval() in cost_subplan(), since
the root does not yet contain enough information to safely consult
statistics.
One exception is SubPlan nodes built for the initplans of MIN/MAX
aggregates from indexes. In this case, having a NULL root is safe
because testexpr will be NULL. Additionally, an initplan will by
definition not consult anything from the parent plan.
Backpatch to all supported branches. Although the reported call path
that triggers this error is not reachable prior to v17, there's no
guarantee that other code paths -- especially in extensions -- could
not encounter the same issue when cost_qual_eval() is called with a
root that lacks a valid simple_rel_array. The test case is not
included in pre-v17 branches though.
Bug: #19037
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Diagnosed-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19037-3d1c7bb553c7ce84@postgresql.org
Backpatch-through: 13
SLRU bank locks are referred as "bank locks" or "SLRU bank locks" in the
code comments. The comments updated in this commit use the latter term.
Oversight in 53c2a97a92, that has replaced the single ControlLock by
the bank control locks.
Author: Julien Rouhaud <julien.rouhaud@free.fr>
Discussion: https://postgr.es/m/aLUT2UO8RjJOzZNq@jrouhaud
Backpatch-through: 17
It's possible that if the only live partition is concurrently dropped
and try_table_open() fails, that the bms_del_member() will pfree the
live_parts Bitmapset. Since the bms_del_member() call does not assign
the result back to the live_parts local variable, the while loop could
segfault as that variable would still reference the pfree'd Bitmapset.
Backpatch to 15. 52f3de874 was backpatched to 14, but there's no
bms_del_member() there due to live_parts not yet existing in RelOptInfo in
that version. Technically there's no bug in version 15 as
bms_del_member() didn't pfree when the set became empty prior to
00b41463c (from v16). Applied to v15 anyway to keep the code similar and
to avoid the bad coding pattern.
Author: Bernd Reiß <bd_reiss@gmx.at>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/6b88f27a-c45c-4826-8e37-d61a04d90182@gmx.at
Backpatch-through: 15
For a child relation, we should not assume that its parent's
unique-ified relation (or unique-ified path in v18) always exists. In
cases where all RHS columns that need to be unique-ified are equated
to constants, the unique-ified relation/path for the parent table is
not built, as there are no columns left to unique-ify. Failing to
account for this can result in a SIGSEGV crash during planning.
This patch checks whether the parent's unique-ified relation or path
exists and skips unique-ification of the child relation if it does
not.
Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAMbWs49MOdLW2c+qbLHHBt8VBu=4ONpM91D19=AWeW93eFUF6A@mail.gmail.com
Backpatch-through: 18
One mechanism we have for implementing semi-joins is to de-duplicate
the output of the RHS and then treat the join as a plain inner join.
Initial construction of the join's SpecialJoinInfo identifies the
RHS columns that need to be de-duplicated, but later we may find that
some of those don't need to be handled explicitly, either because
they're known to be constant or because they are redundant with some
previous column.
Up to now, while sort-based de-duplication handled such cases well,
hash-based de-duplication didn't: we'd still hash on all of the
originally-identified columns. This is probably not a very big
deal performance-wise, but in the wake of commit a3179ab69 it can
cause planner errors. That happens when join elimination causes
recalculation of variables' attr_needed bitmapsets, and we decide
that a variable mentioned in a semijoin clause doesn't need to be
propagated up to the join level anymore.
There are a number of ways we could slice the blame for this, but the
only fix that doesn't result in pessimizing plans for loosely-related
cases is to be more careful about not hashing columns we don't
actually need to de-duplicate. We can install that consideration
into create_unique_paths in master, or the predecessor code in
create_unique_path in v18, without much refactoring.
(As follow-up work, it might be a good idea to look at more-invasive
refactoring, in hopes of preventing other bugs in this area. But
with v18 release so close, there's not time for that now, nor would
we be likely to want to put such refactoring into v18 anyway.)
Reported-by: Sergey Soloviev <sergey.soloviev@tantorlabs.ru>
Diagnosed-by: Richard Guo <guofenglinux@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/1fd1a421-4609-4d46-a1af-ab74d5de504a@tantorlabs.ru
Backpatch-through: 18
During an investigation into rather odd aio related errors on macos, observed
by Alexander and Konstantin, we started to wonder if bitfield access is
related to the error. At the moment it looks like it is related, we cannot
reproduce the failures when replacing the bitfields. In addition, the problem
can only be reproduced with some compiler [versions] and not everyone has been
able to reproduce the issue.
The observed problem is that, very rarely, PgAioHandle->{state,target} are in
an inconsistent state, after having been checked to be in a valid state not
long before, triggering an assertion failure. Unfortunately, this could be
caused by wrong compiler code generation or somehow of missing memory barriers
- we don't really know. In theory there should not be any concurrent write
access to the handle in the state the bug is triggered, as the handle was idle
and is just being initialized.
Separately from the bug, we observed that at least gcc and clang generate
rather terrible code for the bitfield access. Even if it's not clear if the
observed assertion failure is actually caused by the bitfield somehow, the bad
code generation alone is sufficient reason to stop using bitfields.
Therefore, replace the enum bitfields with uint8s and instead cast in each
switch statement.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reported-by: Konstantin Knizhnik <knizhnik@garret.ru>
Discussion: https://postgr.es/m/1500090.1745443021@sss.pgh.pa.us
Backpatch-through: 18
This patch fixes a bug in how 'load_external_function' handles
'$libdir/ prefixes in module paths.
Previously, 'load_external_function' would unconditionally strip
'$libdir/' from the beginning of the 'filename' string. This caused
an issue when the path was nested, such as "$libdir/nested/my_lib".
Stripping the prefix resulted in a path of "nested/my_lib", which
would fail to be found by the expand_dynamic_library_name function
because the original '$libdir' macro was removed.
To fix this, the code now checks for the presence of an additional
directory separator ('/' or '\') after the '$libdir/' prefix. The
prefix is only stripped if the remaining string does not contain a
separator. This ensures that simple filenames like '"$libdir/my_lib"'
are correctly handled, while nested paths are left intact for
'expand_dynamic_library_name' to process correctly.
Reported-by: Dilip Kumar <dilipbalaut@gmail.com>
Co-authored-by: Matheus Alcantara <matheusssilv97@gmail.com>
Co-authored-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAFiTN-uKNzAro4tVwtJhF1UqcygfJ%2BR%2BRL%3Db-_ZMYE3LdHoGhA%40mail.gmail.com
Commit 4b754d6c1 introduced the concept of an excludeOnly scan key,
which cannot select matching index entries but can reject
non-matching tuples, for example a tsquery such as '!term'. There are
poorly-documented assumptions that such scan keys do not appear as the
first scan key. ginNewScanKey did nothing to ensure that, however,
with the result that certain GIN index searches could go into an
infinite loop while apparently-equivalent queries with the clauses in
a different order were fine.
Fix by teaching ginNewScanKey to place all excludeOnly scan keys
after all not-excludeOnly ones. So far as we know at present,
it might be sufficient to avoid the case where the very first
scan key is excludeOnly; but I'm not very convinced that there
aren't other dependencies on the ordering.
Bug: #19031
Reported-by: Tim Wood <washwithcare@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19031-0638148643d25548@postgresql.org
Backpatch-through: 13
The CHECK_FOR_INTERRUPTS call in gingetbitmap turns out to be
inadequate to prevent a long uninterruptible loop, because
we now know a case where looping occurs within scanGetItem.
While the next patch will fix the bug that caused that, it
seems foolish to assume that no similar patterns are possible.
Let's do the CFI within scanGetItem's retry loop, instead.
This demonstrably allows canceling out of the loop exhibited
in bug #19031.
Bug: #19031
Reported-by: Tim Wood <washwithcare@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19031-0638148643d25548@postgresql.org
Backpatch-through: 13
The Self-Join Elimination SJE feature messes up keeping and removing RowMark's
in remove_self_joins_one_group(). That didn't lead to user-level error,
because the planned RowMark is only used to reference a rtable entry in later
execution stages. An RTE entry for keeping and removing relations is
identical and refers to the same relation OID.
To reduce confusion and prevent future issues, this commit cleans up the code
and fixes the incorrect behaviour. Furthermore, it includes sanity checks in
setrefs.c on existing non-null RTE and RelOptInfo entries for each RowMark.
Discussion: https://postgr.es/m/18c6bd6c-6d2a-419a-b0da-dfedef34b585%40gmail.com
Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com>
Backpatch-through: 18
Commit 1585ff7387 changed GetTransactionSnapshot() to throw an error
if it's called during logical decoding, instead of returning the
historic snapshot. I made that change for extra protection, because a
historic snapshot can only be used to access catalog tables while
GetTransactionSnapshot() is usually called when you're executing
arbitrary queries. You might get very subtle visibility problems if
you tried to use the historic snapshot for arbitrary queries.
There's no built-in code in PostgreSQL that calls
GetTransactionSnapshot() during logical decoding, but it turns out
that the pglogical extension does just that, to evaluate row filter
expressions. You would get weird results if the row filter runs
arbitrary queries, but it is sane as long as you don't access any
non-catalog tables. Even though there are no checks to enforce that in
pglogical, a typical row filter expression does not access any tables
and works fine. Accessing tables marked with the user_catalog_table =
true option is also OK.
To fix pglogical with row filters, and any other extensions that might
do similar things, revert GetTransactionSnapshot() to return a
historic snapshot during logical decoding.
To try to still catch the unsafe usage of historic snapshots, add
checks in heap_beginscan() and index_beginscan() to complain if you
try to use a historic snapshot to scan a non-catalog table. We're very
close to the version 18 release however, so add those new checks only
in master.
Backpatch-through: 18
Reported-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://www.postgresql.org/message-id/20250809222338.cc.nmisch@google.com
Temporary relations may share the same RelFileNumber with a permanent
relation, or other temporary relations associated with other sessions.
Being able to uniquely identify a temporary relation would require
RelidByRelfilenumber() to know about the proc number of the temporary
relation it wants to identify, something it is not designed for since
its introduction in f01d1ae3a1.
There are currently three callers of RelidByRelfilenumber():
- autoprewarm.
- Logical decoding, reorder buffer.
- pg_filenode_relation(), that attempts to find a relation OID based on
a tablespace OID and a RelFileNumber.
This makes the situation problematic particularly for the first two
cases, leading to the possibility of random ERRORs due to
inconsistencies that temporary relations can create in the cache
maintained by RelidByRelfilenumber(). The third case should be less of
an issue, as I suspect that there are few direct callers of
pg_filenode_relation().
The window where the ERRORs are happen is very narrow, requiring an OID
wraparound to create a lookup conflict in RelidByRelfilenumber() with a
temporary table reusing the same OID as another relation already cached.
The problem is easier to reach in workloads with a high OID consumption
rate, especially with a higher number of temporary relations created.
We could get pg_filenode_relation() and RelidByRelfilenumber() to work
with temporary relations if provided the means to identify them with an
optional proc number given in input, but the years have also shown that
we do not have a use case for it, yet. Note that this could not be
backpatched if pg_filenode_relation() needs changes. It is simpler to
ignore temporary relations.
Reported-by: Shenhao Wang <wangsh.fnst@fujitsu.com>
Author: Vignesh C <vignesh21@gmail.com>
Reviewed-By: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-By: Takamichi Osumi <osumi.takamichi@fujitsu.com>
Reviewed-By: Michael Paquier <michael@paquier.xyz>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Reported-By: Shenhao Wang <wangsh.fnst@fujitsu.com>
Discussion: https://postgr.es/m/bbaaf9f9-ebb2-645f-54bb-34d6efc7ac42@fujitsu.com
Backpatch-through: 13
If we error out during execution of a SQL-language function, we will
often leave behind non-null pointers in its SQLFunctionCache's cplan
and eslist fields. This is problematic if the SQLFunctionCache is
re-used, because those pointers will point at resources that were
released during error cleanup. This problem escaped detection so far
because ordinarily we won't re-use an FmgrInfo+SQLFunctionCache struct
after a query error. However, in the rather improbable case that
someone implements an opclass support function in SQL language, there
will be long-lived FmgrInfos for it in the relcache, and then the
problem is reachable after the function throws an error.
To fix, add a flag to SQLFunctionCache that tracks whether execution
escapes out of fmgr_sql, and clear out the relevant fields during
init_sql_fcache if so. (This is going to need more thought if we ever
try to share FMgrInfos across threads; but it's very far from being
the only problem such a project will encounter, since many functions
regard fn_extra as being query-local state.)
This broke at commit 0313c5dc6; before that we did not try to re-use
SQLFunctionCache state across calls. Hence, back-patch to v18.
Bug: #19026
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19026-90aed5e71d0c8af3@postgresql.org
Backpatch-through: 18
Some replication slot manipulations (logical decoding via SQL,
advancing) were failing an assertion when releasing a slot in
single-user mode, because active_pid was not set in a ReplicationSlot
when its slot is acquired.
ReplicationSlotAcquire() has some logic to be able to work with the
single-user mode. This commit sets ReplicationSlot->active_pid to
MyProcPid, to let the slot-related logic fall-through, considering the
single process as the one holding the slot.
Some TAP tests are added for various replication slot functions with the
single-user mode, while on it, for slot creation, drop, advancing, copy
and logical decoding with multiple slot types (temporary, physical vs
logical). These tests are skipped on Windows, as direct calls of
postgres --single would fail on permission failures. There is no
platform-specific behavior that needs to be checked, so living with this
restriction should be fine. The CI is OK with that, now let's see what
the buildfarm tells.
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Mutaamba Maasha <maasha@gmail.com>
Discussion: https://postgr.es/m/OSCPR01MB14966ED588A0328DAEBE8CB25F5FA2@OSCPR01MB14966.jpnprd01.prod.outlook.com
Backpatch-through: 13
The DROP SUBSCRIPTION command performs several operations: it stops the
subscription workers, removes subscription-related entries from system
catalogs, and deletes the replication slot on the publisher server.
Previously, this command acquired an AccessExclusiveLock on
pg_subscription before initiating these steps.
However, while holding this lock, the command attempts to connect to the
publisher to remove the replication slot. In cases where the connection is
made to a newly created database on the same server as subscriber, the
cache-building process during connection tries to acquire an
AccessShareLock on pg_subscription, resulting in a self-deadlock.
To resolve this issue, we reduce the lock level on pg_subscription during
DROP SUBSCRIPTION from AccessExclusiveLock to RowExclusiveLock. Earlier,
the higher lock level was used to prevent the launcher from starting a new
worker during the drop operation, as a restarted worker could become
orphaned.
Now, instead of relying on a strict lock, we acquire an AccessShareLock on
the specific subscription being dropped and re-validate its existence
after acquiring the lock. If the subscription is no longer valid, the
worker exits gracefully. This approach avoids the deadlock while still
ensuring that orphan workers are not created.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 13
Discussion: https://postgr.es/m/18988-7312c868be2d467f@postgresql.org
When extracting a timestamp from a UUIDv7, a conversion from
milliseconds to microseconds was using the incorrect constant
NS_PER_US instead of US_PER_MS. Although both constants have the same
value, this fix improves code clarity by using the semantically
correct constant.
Backpatch to v18, where UUIDv7 was introduced.
Author: Erik Nordström <erik@tigerdata.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/CACAa4V+i07eaP6h4MHNydZeX47kkLPwAg0sqe67R=M5tLdxNuQ@mail.gmail.com
Backpatch-through: 18