1
0
mirror of https://github.com/postgres/postgres.git synced 2025-05-28 05:21:27 +03:00

59297 Commits

Author SHA1 Message Date
Thomas Munro
44f400fbc6 Fix latch event policy that hid socket events.
If a WaitEventSetWait() caller asks for multiple events, an already set
latch would previously prevent other events from being reported at the
same time.  Now, we'll also poll the kernel for other events that would
fit in the caller's output buffer with a zero wait time.  This policy
change doesn't affect callers that ask for only one event.

The main caller affected is the postmaster.  If its latch is set
extremely frequently by backends launching workers and workers exiting,
we don't want it to handle only those jobs and ignore incoming client
connections.

Back-patch to 16 where the postmaster began using the API.  The
fast-return policy changed here is older than that, but doesn't cause
any known problems in earlier releases.

Reported-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/Z1n5UpAiGDmFcMmd%40nathan
2025-01-20 16:43:39 +13:00
Michael Paquier
e6767c0ed1 Fix header check for continuation records where standbys could be stuck
XLogPageRead() checks immediately for an invalid WAL record header on a
standby, to be able to handle the case of continuation records that need
to be read across two different sources.  As written, the check was too
generic, applying to any target LSN.  Based on an analysis by Kyotaro
Horiguchi, what really matters is to make sure that the page header is
checked when attempting to read a LSN at the boundary of a segment, to
handle the case of a continuation record that spawns across multiple
pages when dealing with multiple segments, as WAL receivers are spawned
they request WAL from the beginning of a segment.  This fix has been
proposed by Kyotaro Horiguchi.

This could cause standbys to loop infinitely when dealing with a
continuation record during a timeline jump, in the case where the
contents of the record in the follow-up page are invalid.

Some regression tests are added to check such scenarios, able to
reproduce the original problem.  In the test, the contents of a
continuation record are overwritten with junk zeros on its follow-up
page, and replayed on standbys.  This is inspired by 039_end_of_wal.pl,
and is enough to show how standbys should react on promotion by not
being stuck.  Without the fix, the test would fail with a timeout.  The
test to reproduce the problem has been written by Alexander Kukushkin.

The original check has been introduced in 066871980183, for a similar
problem.

Author: Kyotaro Horiguchi, Alexander Kukushkin
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/CAFh8B=mozC+e1wGJq0H=0O65goZju+6ab5AU7DEWCSUA2OtwDg@mail.gmail.com
Backpatch-through: 13
2025-01-20 09:30:33 +09:00
Michael Paquier
d1bf86a622 Revert recent changes related to handling of 2PC files at recovery
This commit reverts 8f67f994e8ea (down to v13) and c3de0f9eed38 (down to
v17), as these are proving to not be completely correct regarding two
aspects:
- In v17 and newer branches, c3de0f9eed38's check for epoch handling is
incorrect, and does not correctly handle frozen epochs.  A logic closer
to widen_snapshot_xid() should be used.  The 2PC code should try to
integrate deeper with FullTransactionIds, 5a1dfde8334b being not enough.
- In v13 and newer branches, 8f67f994e8ea is a workaround for the real
issue, which is that we should not attempt CLOG lookups without reaching
consistency.  This exists since 728bd991c3c4, and this is reachable with
ProcessTwoPhaseBuffer() called by restoreTwoPhaseData() at the beginning
of recovery.

Per discussion with Noah Misch.

Discussion: https://postgr.es/m/20250116010051.f3.nmisch@google.com
Backpatch-through: 13
2025-01-17 13:27:42 +09:00
Tom Lane
0671a71e02 Fix setrefs.c's failure to do expression processing on prune steps.
We should run the expression subtrees of PartitionedRelPruneInfo
structs through fix_scan_expr.  Failure to do so means that
AlternativeSubPlans within those expressions won't be cleaned up
properly, resulting in "unrecognized node type" errors since v14.

It seems fairly likely that at least some of the other steps done
by fix_scan_expr are important here as well, resulting in as-yet-
undetected bugs.  Therefore, I've chosen to back-patch this to
all supported branches including v13, even though the known
symptom doesn't manifest in v13.

Per bug #18778 from Alexander Lakhin.

Discussion: https://postgr.es/m/18778-24cd399df6c806af@postgresql.org
2025-01-16 20:40:07 -05:00
Michael Paquier
149ed87e22 Move routines to manipulate WAL into PostgreSQL::Test::Cluster
These facilities were originally in the recovery TAP test
039_end_of_wal.pl.  A follow-up bug fix with a TAP test doing similar
WAL manipulations requires them, and all these had better not be
duplicated due to their complexity.  The routine names are tweaked to
use "wal" more consistently, similarly to the existing "advance_wal".

In v14 and v13, the new routines are moved to PostgresNode.pm.
039_end_of_wal.pl is updated to use the refactored routines, without
changing its coverage.

Reviewed-by: Alexander Kukushkin
Discussion: https://postgr.es/m/CAFh8B=mozC+e1wGJq0H=0O65goZju+6ab5AU7DEWCSUA2OtwDg@mail.gmail.com
Backpatch-through: 13
2025-01-16 09:26:25 +09:00
Tom Lane
a0dfeae0dc Avoid symbol collisions between pqsignal.c and legacy-pqsignal.c.
In the name of ABI stability (that is, to avoid a library major
version bump for libpq), libpq still exports a version of pqsignal()
that we no longer want to use ourselves.  However, since that has
the same link name as the function exported by src/port/pqsignal.c,
there is a link ordering dependency determining which version will
actually get used by code that uses libpq as well as libpgport.a.

It now emerges that the wrong version has been used by pgbench and
psql since commit 06843df4a rearranged their link commands.  This
can result in odd failures in pgbench with the -T switch, since its
SIGALRM handler will now not be marked SA_RESTART.  psql may have
some edge-case problems in \watch, too.

Since we don't want to depend on link ordering effects anymore,
let's fix this in the same spirit as b6c7cfac8: use macros to change
the actual link names of the competing functions.  We cannot change
legacy-pqsignal.c's exported name of course, so the victim has to be
src/port/pqsignal.c.

In master, rename its exported name to be pqsignal_fe in frontend or
pqsignal_be in backend.  (We could perhaps have gotten away with using
the same symbol in both cases, but since the FE and BE versions now
work a little differently, it seems advisable to use different names.)

In back branches, rename to pqsignal_fe in frontend but keep it as
pqsignal in backend.  The frontend change could affect third-party
code that is calling pqsignal from libpgport.a or libpgport_shlib.a,
but only if the code is compiled against port.h from a different minor
release than libpgport.  Since we don't support using libpgport as a
shared library, it seems unlikely that there will be such a problem.
I left the backend symbol unchanged to avoid an ABI break for
extensions.  This means that the link ordering hazard still exists
for any extension that links against libpq.  However, none of our own
extensions use both pqsignal() and libpq, and we're not making things
any worse for third-party extensions that do.

Report from Andy Fan, diagnosis by Fujii Masao, patch by me.
Back-patch to all supported branches, as 06843df4a was.

Discussion: https://postgr.es/m/87msfz5qv2.fsf@163.com
2025-01-14 18:50:24 -05:00
Fujii Masao
ba2dbedd53 ecpg: Restore detection of unsupported COPY FROM STDIN.
The ecpg command includes code to warn about unsupported COPY FROM STDIN
statements in input files. However, since commit 3d009e45bd,
this functionality has been broken due to a bug introduced in that commit,
causing ecpg to fail to detect the statement.

This commit resolves the issue, restoring ecpg's ability to detect
COPY FROM STDIN and issue a warning as intended.

Back-patch to all supported versions.

Author: Ryo Kanbayashi
Reviewed-by: Hayato Kuroda, Tom Lane
Discussion: https://postgr.es/m/CANOn0Ez_t5uDCUEV8c1YORMisJiU5wu681eEVZzgKwOeiKhkqQ@mail.gmail.com
2025-01-15 01:24:24 +09:00
Heikki Linnakangas
96e61b2792 Fix catcache invalidation of a list entry that's being built
If a new catalog tuple is inserted that belongs to a catcache list
entry, and cache invalidation happens while the list entry is being
built, the list entry might miss the newly inserted tuple.

To fix, change the way we detect concurrent invalidations while a
catcache entry is being built. Keep a stack of entries that are being
built, and apply cache invalidation to those entries in addition to
the real catcache entries. This is similar to the in-progress list in
relcache.c.

Back-patch to all supported versions.

Reviewed-by: Noah Misch
Discussion: https://www.postgresql.org/message-id/2234dc98-06fe-42ed-b5db-ac17384dc880@iki.fi
2025-01-14 14:35:11 +02:00
Michael Paquier
e027ee9902 Fix potential integer overflow in bringetbitmap()
This function expects an "int64" as result and stores the number of
pages to add to the index scan bitmap as an "int", multiplying its final
result by 10.  For a relation large enough, this can theoretically
overflow if counting more than (INT32_MAX / 10) pages, knowing that the
number of pages is upper-bounded by MaxBlockNumber.

To avoid the overflow, this commit redefines "totalpages", used to
calculate the result, to be an "int64" rather than an "int".

Reported-by: Evgeniy Gorbanyov
Author: James Hunter
Discussion: https://www.postgresql.org/message-id/07704817-6fa0-460c-b1cf-cd18f7647041@basealt.ru
Backpatch-through: 13
2025-01-14 15:13:14 +09:00
Daniel Gustafsson
dc24c9ad52 Fix HBA option count
Commit 27a1f8d108 missed updating the max HBA option count to
account for the new option added.  Fix by bumping the counter
and adjust the relevant comment to match.  Backpatch down to
all supported branches like the erroneous commit.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/286764.1736697356@sss.pgh.pa.us
Backpatch-through: v13
2025-01-12 23:44:39 +01:00
Dean Rasheed
d037cc2af1 Fix JsonExpr deparsing to quote variable names in the PASSING clause.
When deparsing a JsonExpr, variable names in the PASSING clause were
not quoted. However, since they are parsed as ColLabel tokens, some
variable names require double quotes to ensure that they are properly
interpreted. Fix by using quote_identifier() in the deparsing code.

This oversight was limited to the SQL/JSON query functions
JSON_EXISTS(), JSON_QUERY(), and JSON_VALUE().

Back-patch to v17, where these functions were added.

Dean Rasheed, reviewed by Tom Lane.

Discussion: https://postgr.es/m/CAEZATCXTpAS%3DncfLNTZ7YS6O5puHeLg_SUYAit%2Bcs7wsrd9Msg%40mail.gmail.com
2025-01-12 13:36:44 +00:00
Dean Rasheed
61b12135f5 Fix XMLTABLE() deparsing to quote namespace names if necessary.
When deparsing an XMLTABLE() expression, XML namespace names were not
quoted. However, since they are parsed as ColLabel tokens, some names
require double quotes to ensure that they are properly interpreted.
Fix by using quote_identifier() in the deparsing code.

Back-patch to all supported versions.

Dean Rasheed, reviewed by Tom Lane.

Discussion: https://postgr.es/m/CAEZATCXTpAS%3DncfLNTZ7YS6O5puHeLg_SUYAit%2Bcs7wsrd9Msg%40mail.gmail.com
2025-01-12 12:56:52 +00:00
Tom Lane
e98df02df3 Repair memory leaks in plpython.
PLy_spi_execute_plan (PLyPlan.execute) and PLy_cursor_plan
(plpy.cursor) use PLy_output_convert to convert Python values
into Datums that can be passed to the query-to-execute.  But they
failed to pay much attention to its warning that it can leave "cruft
generated along the way" behind.  Repeated use of these methods can
result in a substantial memory leak for the duration of the calling
plpython function.

To fix, make a temporary memory context to invoke PLy_output_convert
in.  This also lets us get rid of the rather fragile code that was
here for retail pfree's of the converted Datums.  Indeed, we don't
need the PLyPlanObject.values field anymore at all, though I left it
in place in the back branches in the name of ABI stability.

Mat Arye and Tom Lane, per report from Mat Arye.  Back-patch to all
supported branches.

Discussion: https://postgr.es/m/CADsUR0DvVgnZYWwnmKRK65MZg7YLUSTDLV61qdnrwtrAJgU6xw@mail.gmail.com
2025-01-11 11:45:56 -05:00
Daniel Gustafsson
8ed9bf0a32 Fix missing ldapscheme option in pg_hba_file_rules()
The ldapscheme option was missed when inspecing the HbaLine for
assembling rows for the pg_hba_file_rules function.  Backpatch
to all supported versions.

Author: Laurenz Albe <laurenz.albe@cybertec.at>
Reported-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Bug: 18769
Discussion: https://postgr.es/m/18769-dd8610cbc0405172@postgresql.org
Backpatch-through: v13
2025-01-10 22:02:58 +01:00
David Rowley
5db9367e51 Fix UNION planner datatype issue
66c0185a3 gave the planner the ability to have union child queries
provide the union planner with pre-sorted input so that UNION queries
could be more efficiently implemented using Merge Append.

That commit overlooked checking that the UNION target list and the union
child target list's types all match.  In some corner cases, this could
result in the planner producing sorts using the sort operator of the
top-level UNION's target list type rather than of the union child's
target list's type.  The implications of this range from silently
working correctly, despite using the wrong sort operator all the way up
to a segmentation fault.

Here we fix by adjusting the planner so it makes no attempt to have the
subquery produce pre-sorted results when the data type of the UNION
target list and the types from the subquery target list don't match
exactly.

Backpatch to 17, where 66c0185a3 was introduced.

Reported-by: Jason Smith <dqetool@126.com>
Diagnosed-by: Tom Lane <tgl@sss.pgh.pa.us>
Bug: 18764
Discussion: https://postgr.es/m/18764-63ad667ea26e877a%40postgresql.org
Backpatch-through: 17
2025-01-10 14:31:31 +13:00
Nathan Bossart
c559f61b54 Fix an ALTER GROUP ... DROP USER error message.
This error message stated the privileges required to add a member
to a group even if the user was trying to drop a member:

	postgres=> alter group a drop user b;
	ERROR:  permission denied to alter role
	DETAIL:  Only roles with the ADMIN option on role "a" may add members.

Since the required privileges for both operations are the same, we
can fix this by modifying the message to mention both adding and
dropping members:

	postgres=> alter group a drop user b;
	ERROR:  permission denied to alter role
	DETAIL:  Only roles with the ADMIN option on role "a" may add or drop members.

Author: ChangAo Chen
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/tencent_FAA0D00E3514AAF0BBB6322542A6094FEF05%40qq.com
Backpatch-through: 16
2025-01-09 17:10:13 -06:00
Álvaro Herrera
ffd9b81346
Fix SLRU bank selection code
The originally submitted code (using bit masking) was correct when the
number of slots was restricted to be a power of two -- but that
limitation was removed during development that led to commit
53c2a97a9266, which made the bank selection code incorrect.  This led to
always using a smaller number of banks than available.  Change said code
to use integer modulo instead, which works correctly with an arbitrary
number of banks.

It's likely that we could improve on this to avoid runtime use of
integer division.  But with this change we're, at least, not wasting
memory on unused banks, and more banks mean less contention, which is
likely to have a much higher performance impact than a single
instruction's latency.

Author: Yura Sokolov <y.sokolov@postgrespro.ru>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/9444dc46-ca47-43ed-9058-89c456316306@postgrespro.ru
2025-01-09 07:44:30 +01:00
Thomas Munro
faee3185aa Fix off_t overflow in pg_basebackup on Windows.
walmethods.c used off_t to navigate around a pg_wal.tar file that could
exceed 2GB, which doesn't work on Windows and would fail with misleading
errors.  Use pgoff_t instead.

Back-patch to all supported branches.

Author: Davinder Singh <davinder.singh@enterprisedb.com>
Reported-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Discussion: https://postgr.es/m/CAKZiRmyM4YnokK6Oenw5JKwAQ3rhP0YTz2T-tiw5dAQjGRXE3Q%40mail.gmail.com
2025-01-09 16:05:01 +13:00
Thomas Munro
af109e3399 Provide 64-bit ftruncate() and lseek() on Windows.
Change our ftruncate() macro to use the 64-bit variant of chsize(), and
add a new macro to redirect lseek() to _lseeki64().

Back-patch to all supported releases, in preparation for a bug fix.

Tested-by: Davinder Singh <davinder.singh@enterprisedb.com>
Discussion: https://postgr.es/m/CAKZiRmyM4YnokK6Oenw5JKwAQ3rhP0YTz2T-tiw5dAQjGRXE3Q%40mail.gmail.com
2025-01-09 15:00:23 +13:00
Thomas Munro
45aef9f6bb Fix C error reported by Oracle compiler.
Commit 66aaabe7 (branches 13 - 17 only) was not acceptable to the Oracle
Developer Studio compiler on build farm animal wrasse.  It accidentally
used a C++ style return statement to wrap a void function.  None of the
usual compilers complained, but it is right, that is not allowed in C.
Fix.

Reported-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/Z33vgfVgvOnbFLN9%40paquier.xyz
2025-01-08 17:02:30 +13:00
Thomas Munro
66aaabe7a1 Restore smgrtruncate() prototype in back-branches.
It's possible that external code is calling smgrtruncate().  Any
external callers might like to consider the recent changes to
RelationTruncate(), but commit 38c579b0 should not have changed the
function prototype in the back-branches, per ABI stability policy.

Restore smgrtruncate()'s traditional argument list in the back-branches,
but make it a wrapper for a new function smgrtruncate2().  The three
callers in core can use smgrtruncate2() directly.  In master (18-to-be),
smgrtruncate2() is effectively renamed to smgrtruncate(), so this wart
is cleaned up.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CA%2BhUKG%2BThae6x6%2BjmQiuALQBT2Ae1ChjMh1%3DkMvJ8y_SBJZrvA%40mail.gmail.com
2025-01-08 10:43:40 +13:00
Nathan Bossart
e43537cdc3 Use PqMsg_* macros in postgres.c.
Commit f4b54e1ed9, which introduced macros for protocol characters,
missed updating a couple of places in postgres.c.

Author: Dave Cramer
Reviewed-by: Fabrízio de Royes Mello
Discussion: https://postgr.es/m/CADK3HHJUVBPoVOmFesPB-fN8_dYt%2BQELV2UB6jxOW2Z40qF-qw%40mail.gmail.com
Backpatch-through: 17
2025-01-07 15:34:19 -06:00
Álvaro Herrera
d7905aa1da
Fix error message wording
The originals are ambiguous and a bit out of style.

Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/202412141243.efesjyyvzxsz@alvherre.pgsql
2025-01-07 20:07:32 +01:00
Heikki Linnakangas
e998901b88 Remove duplicate definitions in proc.h
These are also present in procnumber.h

Reported-by: Peter Eisentraut
Discussion: https://www.postgresql.org/message-id/bd04d675-4672-4f87-800a-eb5d470c15fc@eisentraut.org
2025-01-06 11:56:28 +02:00
Andrew Dunstan
591128f9c9 Document strange jsonb sort order for empty top level arrays
Slightly faulty logic in the original jsonb code (commit d9134d0a355)
results in an empty top level array sorting less than a json null. We
can't change the sort order now since it would affect btree indexes over
jsonb, so document the anomaly.

Backpatch to all live branches (13 .. 17)

In master, also add a code comment noting the anomaly.

Reported-by: Yan Chengpen
Reviewed-by: Jian He

Discussion: https://postgr.es/m/OSBPR01MB45199DD8DA2D1CECD50518188E272@OSBPR01MB4519.jpnprd01.prod.outlook.com
2025-01-03 10:28:58 -05:00
Richard Guo
297b280abd Ignore nullingrels when looking up statistics
When looking up statistical data about an expression, we do not need
to concern ourselves with the outer joins that could null the
Vars/PHVs contained in the expression.  Accounting for nullingrels in
the expression could cause estimate_num_groups to count the same Var
multiple times if it's marked with different nullingrels.  This is
incorrect, and could lead to "ERROR:  corrupt MVNDistinct entry" when
searching for multivariate n-distinct.

Furthermore, the nullingrels could prevent us from matching an
expression to expressional index columns or to the expressions in
extended statistics, leading to inaccurate estimates.

To fix, strip out all the nullingrels from the expression before we
look up statistical data about it.  There is one ensuing plan change
in the regression tests, but it looks reasonable and does not
compromise its original purpose.

This patch could result in plan changes, but it fixes an actual bug,
so back-patch to v16 where the outer-join-aware-Var infrastructure was
introduced.

Author: Richard Guo
Discussion: https://postgr.es/m/CAMbWs4-2Z4k+nFTiZe0Qbu5n8juUWenDAtMzi98bAZQtwHx0-w@mail.gmail.com
2025-01-02 17:59:32 +09:00
Bruce Momjian
759620716a Update copyright for 2025
Backpatch-through: 13
2025-01-01 11:21:54 -05:00
Michael Paquier
836435424b Fix memory leak in pgoutput with relation attribute map
pgoutput caches the attribute map of a relation, that is free()'d only
when validating a RelationSyncEntry.  However, this code path is not
taken when calling any of the SQL functions able to do some logical
decoding, like pg_logical_slot_{get,peek}_changes(), leaking some memory
into CacheMemoryContext on repeated calls.

To address this, a relation's attribute map is allocated in
PGOutputData's cachectx, free()'d at the end of the execution of these
SQL functions when logical decoding ends.  This is available down to 15.
v13 and v14 have a similar leak, which will be dealt with later.

Reported-by: Masahiko Sawada
Author: Vignesh C
Reviewed-by: Hou Zhijie
Discussion: https://postgr.es/m/CAD21AoDkAhQVSukOfH3_reuF-j4EU0-HxMqU3dU+bSTxsqT14Q@mail.gmail.com
Discussion: https://postgr.es/m/CALDaNm1hewNAsZ_e6FF52a=9drmkRJxtEPrzCB6-9mkJyeBBqA@mail.gmail.com
Backpatch-through: 15
2024-12-30 13:33:58 +09:00
Michael Paquier
c3de0f9eed Fix failures with incorrect epoch handling for 2PC files at recovery
At the beginning of recovery, an orphaned two-phase file in an epoch
different than the one defined in the checkpoint record could not be
removed based on the assumptions that AdjustToFullTransactionId() relies
on, assuming that all files would be either from the current epoch or
from the previous epoch.

If the checkpoint epoch was 0 while the 2PC file was orphaned and in the
future, AdjustToFullTransactionId() would underflow the epoch used to
build the 2PC file path.  In non-assert builds, this would create a
WARNING message referring to a 2PC file with an epoch of "FFFFFFFF" (or
UINT32_MAX), as an effect of the underflow calculation, leaving the
orphaned file around.

Some tests are added with dummy 2PC files in the past and the future,
checking that these are properly removed.

Issue introduced by 5a1dfde8334b, that has switched two-phase state
files to use FullTransactionIds.

Reported-by: Vitaly Davydov
Author: Michael Paquier
Reviewed-by: Vitaly Davydov
Discussion: https://postgr.es/m/13b5b6-676c3080-4d-531db900@47931709
Backpatch-through: 17
2024-12-30 09:58:09 +09:00
Michael Paquier
03c46e1f26 Fix handling of orphaned 2PC files in the future at recovery
Before 728bd991c3c4, that has improved the support for 2PC files during
recovery, the initial logic scanning files in pg_twophase was done so as
files in the future of the transaction ID horizon were checked first,
followed by a check if a transaction ID is aborted or committed which
could involve a pg_xact lookup.  After this commit, these checks have
been done in reverse order.

Files detected as in the future do not have a state that can be checked
in pg_xact, hence this caused recovery to fail abruptly should an
orphaned 2PC file in the future of the transaction ID horizon exist in
pg_twophase at the beginning of recovery.

A test is added to check for this scenario, using an empty 2PC with a
transaction ID large enough to be in the future when running the test.
This test is added in 16 and older versions for now.  17 and newer
versions are impacted by a second bug caused by the addition of the
epoch in the 2PC file names.  An equivalent test will be added in these
branches in a follow-up commit, once the second set of issues reported
are fixed.

Author: Vitaly Davydov, Michael Paquier
Discussion: https://postgr.es/m/11e597-676ab680-8d-374f23c0@145466129
Backpatch-through: 13
2024-12-30 08:06:40 +09:00
Tom Lane
15b4c46c32 Exclude parallel workers from connection privilege/limit checks.
Cause parallel workers to not check datallowconn, rolcanlogin, and
ACL_CONNECT privileges.  The leader already checked these things
(except for rolcanlogin which might have been checked for a different
role).  Re-checking can accomplish little except to induce unexpected
failures in applications that might not even be aware that their query
has been parallelized.  We already had the principle that parallel
workers rely on their leader to pass a valid set of authorization
information, so this change just extends that a bit further.

Also, modify the ReservedConnections, datconnlimit and rolconnlimit
logic so that these limits are only enforced against regular backends,
and only regular backends are counted while checking if the limits
were already reached.  Previously, background processes that had an
assigned database or role were subject to these limits (with rather
random exclusions for autovac workers and walsenders), and the set of
existing processes that counted against each limit was quite haphazard
as well.  The point of these limits, AFAICS, is to ensure the
availability of PGPROC slots for regular backends.  Since all other
types of processes have their own separate pools of PGPROC slots, it
makes no sense either to enforce these limits against them or to count
them while enforcing the limit.

While edge-case failures of these sorts have been possible for a
long time, the problem got a good deal worse with commit 5a2fed911
(CVE-2024-10978), which caused parallel workers to make some of these
checks using the leader's current role where before we had used its
AuthenticatedUserId, thus allowing parallel queries to fail after
SET ROLE.  The previous behavior was fairly accidental and I have
no desire to return to it.

This patch includes reverting 73c9f91a1, which was an emergency hack
to suppress these same checks in some cases.  It wasn't complete,
as shown by a recent bug report from Laurenz Albe.  We can also revert
fd4d93d26 and 492217301, which hacked around the same problems in one
regression test.

In passing, remove the special case for autovac workers in
CheckMyDatabase; it seems cleaner to have AutoVacWorkerMain pass
the INIT_PG_OVERRIDE_ALLOW_CONNS flag, now that that does what's
needed.

Like 5a2fed911, back-patch to supported branches (which sadly no
longer includes v12).

Discussion: https://postgr.es/m/1808397.1735156190@sss.pgh.pa.us
2024-12-28 16:08:50 -05:00
Tom Lane
14141bbbc1 Reserve a PGPROC slot and semaphore for the slotsync worker process.
The need for this was missed in commit 93db6cbda, with the result
being that if we launch a slotsync worker it would consume one of
the PGPROCs in the max_connections pool.  That could lead to inability
to launch the worker, or to subsequent failures of connection requests
that should have succeeded according to the configured settings.

Rather than create some one-off infrastructure to support this,
let's group the slotsync worker with the existing autovac launcher
in a new category of "special worker" processes.  These are kind of
like auxiliary processes, but they cannot use that infrastructure
because they need to be able to run transactions.

For the moment, make these processes share the PGPROC freelist
used for autovac workers (which previously supplied the autovac
launcher too).  This is partly to avoid an ABI change in v17,
and partly because it seems silly to have a freelist with
at most two members.  This might be worth revisiting if we grow
enough workers in this category.

Tom Lane and Hou Zhijie.  Back-patch to v17.

Discussion: https://postgr.es/m/1808397.1735156190@sss.pgh.pa.us
2024-12-28 12:30:42 -05:00
Tom Lane
a46311ed72 Try to avoid semaphore-related test failures on NetBSD/OpenBSD.
These two platforms have a remarkably tight default limit on the
number of SysV semaphores in the system: SEMMNS is only 60
out-of-the-box.  Unless manual action is taken to raise that,
we'll only be able to allocate 3 sets of 16 usable semaphores
each, leading to initdb setting max_connections to just 20.
That's problematic because the core regression tests expect
to be able to launch 20 concurrent sessions, leaving us with
no headroom.  This seems to be the cause of intermittent
buildfarm failures on some machines.

While there's no getting around the fact that you'd better raise
SEMMNS for production use on these platforms, it does seem desirable
for "make check" to pass reliably without that.  We can make that
happen, at least for awhile longer, with two small changes:

* Change sysv_sema.c's SEMAS_PER_SET to 19, so that we can eat up
all of the available semas not just most of them.

* Change initdb to make the smallest max_connections value it will
consider be 25 not 20.

This is a back-patch of recent HEAD commit 38da05346 into v17.
The motivation for doing this now is that an upcoming bug-fix
patch will give the new-in-17 slotsync worker process its own
reserved PGPROC and hence also semaphore.  With that patch but
without this change, v17 would fail to start at all under the
default SEMMNS on these platforms.

Discussion: https://postgr.es/m/db2773a2-aca0-43d0-99c1-060efcd9954e@gmail.com
Discussion: https://postgr.es/m/1808397.1735156190@sss.pgh.pa.us
2024-12-28 11:48:13 -05:00
Noah Misch
fa61313770 In REASSIGN OWNED of a database, lock the tuple as mandated.
Commit aac2c9b4fde889d13f859c233c2523345e72d32b mandated such locking
and attempted to fulfill that mandate, but it missed REASSIGN OWNED.
Hence, it remained possible to lose VACUUM's inplace update of
datfrozenxid if a REASSIGN OWNED processed that database at the same
time.  This didn't affect the other inplace-updated catalog, pg_class.
For pg_class, REASSIGN OWNED calls ATExecChangeOwner() instead of the
generic AlterObjectOwner_internal(), and ATExecChangeOwner() fulfills
the locking mandate.

Like in GRANT, implement this by following the locking protocol for any
catalog subject to the generic AlterObjectOwner_internal().  It would
suffice to do this for IsInplaceUpdateOid() catalogs only.  Back-patch
to v13 (all supported versions).

Kirill Reshke.  Reported by Alexander Kukushkin.

Discussion: https://postgr.es/m/CAFh8B=mpKjAy4Cuun-HP-f_vRzh2HSvYFG3rhVfYbfEBUhBAGg@mail.gmail.com
2024-12-28 07:16:26 -08:00
Heikki Linnakangas
d8b0c64116 meson: Export all libcommon functions in Windows builds
This fixes "unresolved external symbol" errors with extensions that
use functions from libpgport that need special CFLAGS to
compile. Currently, that includes the CRC-32 functions.

Commit 2571c1d5cc did this for libcommon, but I missed that libpqport
has the same issue.

Reported-by: Tom Lane
Backpatch-through: 16, where Meson was introduced
Discussion: https://www.postgresql.org/message-id/CAOdR5yF0krWrxycA04rgUKCgKugRvGWzzGLAhDZ9bzNv8g0Lag@mail.gmail.com
2024-12-25 19:22:33 +02:00
Heikki Linnakangas
c80acbc6fa meson: Export all libcommon functions in Windows builds
This fixes "unresolved external symbol" errors with extensions that
use functions from libcommon. This was reported with pgvector.

Reported-by: Andrew Kane
Author: Vladlen Popolitov
Backpatch-through: 16, where Meson was introduced
Discussion: https://www.postgresql.org/message-id/CAOdR5yF0krWrxycA04rgUKCgKugRvGWzzGLAhDZ9bzNv8g0Lag@mail.gmail.com
2024-12-25 18:14:26 +02:00
Tom Lane
89962bfef6 postgres_fdw: re-issue cancel requests a few times if necessary.
Despite the best efforts of commit 0e5c82380, we're still seeing
occasional failures of postgres_fdw's query_cancel test in the
buildfarm.  Investigation suggests that its 100ms timeout is
still not enough to reliably ensure that the remote side starts
the query before receiving the cancel request --- and if it
hasn't, it will just discard the request because it's idle.

We discussed allowing a cancel request to kill the next-received
query, but that would have wide and perhaps unpleasant side-effects.
What seems safer is to make postgres_fdw do what a human user would
likely do, which is issue another cancel request if the first one
didn't seem to do anything.  We'll keep the same overall 30 second
grace period before concluding things are broken, but issue additional
cancel requests after 1 second, then 2 more seconds, then 4, then 8.
(The next one in series is 16 seconds, but we'll hit the 30 second
timeout before that.)

Having done that, revert the timeout in query_cancel.sql to 10 ms.
That will still be enough on most machines, most of the time, for
the remote query to start; but now we're intentionally risking the
race condition occurring sometimes in the buildfarm, so that the
repeat-cancel code path will get some testing.

As before, back-patch to v17.  We might eventually contemplate
back-patching this further, and/or adding similar logic to dblink.
But given the lack of field complaints to date, this feels like
mostly an exercise in test case stabilization, so v17 is enough.

Discussion: https://postgr.es/m/colnv3lzzmc53iu5qoawynr6qq7etn47lmggqr65ddtpjliq5d@glkveb4m6nop
2024-12-23 15:14:30 -05:00
Michael Paquier
bbe68c13ab Fix memory leak in pgoutput with publication list cache
The pgoutput module caches publication names in a list and frees it upon
invalidation.  However, the code forgot to free the actual publication
names within the list elements, as publication names are pstrdup()'d in
GetPublication().  This would cause memory to leak in
CacheMemoryContext, bloating it over time as this context is not
cleaned.

This is a problem for WAL senders running for a long time, as an
accumulation of invalidation requests would bloat its cache memory
usage.  A second case, where this leak is easier to see, involves a
backend calling SQL functions like pg_logical_slot_{get,peek}_changes()
which create a new decoding context with each execution.  More
publications create more bloat.

To address this, this commit adds a new memory context within the
logical decoding context and resets it each time the publication names
cache is invalidated, based on a suggestion from Amit Kapila.  This
ensures that the lifespan of the publication names aligns with that of
the logical decoding context.

Contrary to the HEAD-only commit f0c569d71515 that has changed
PGOutputData to track this new child memory context, the context is
tracked with a static variable whose state is reset with a MemoryContext
reset callback attached to PGOutputData->context, so as ABI
compatibility is preserved in stable branches.  This approach is based
on an suggestion from Amit Kapila.

Analyzed-by: Michael Paquier, Jeff Davis
Author: Masahiko Sawada
Reviewed-by: Amit Kapila, Michael Paquier, Euler Taveira, Hou Zhijie
Discussion: https://postgr.es/m/Z0khf9EVMVLOc_YY@paquier.xyz
Backpatch-through: 13
2024-12-23 12:48:06 +09:00
Heikki Linnakangas
7cfdb4d1e7 Update TransactionXmin when MyProc->xmin is updated
GetSnapshotData() set TransactionXmin = MyProc->xmin, but when
SnapshotResetXmin() advanced MyProc->xmin, it did not advance
TransactionXmin correspondingly. That meant that TransactionXmin could
be older than MyProc->xmin, and XIDs between than TransactionXmin and
the real MyProc->xmin could be vacuumed away. One known consequence is
in pg_subtrans lookups: we might try to look up the status of an XID
that was already truncated away.

Back-patch to all supported versions.

Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/d27a046d-a1e4-47d1-a95c-fbabe41debb4@iki.fi
2024-12-21 23:42:52 +02:00
Thomas Munro
0350b876b0 Fix corruption when relation truncation fails.
RelationTruncate() does three things, while holding an
AccessExclusiveLock and preventing checkpoints:

1. Logs the truncation.
2. Drops buffers, even if they're dirty.
3. Truncates some number of files.

Step 2 could previously be canceled if it had to wait for I/O, and step
3 could and still can fail in file APIs.  All orderings of these
operations have data corruption hazards if interrupted, so we can't give
up until the whole operation is done.  When dirty pages were discarded
but the corresponding blocks were left on disk due to ERROR, old page
versions could come back from disk, reviving deleted data (see
pgsql-bugs #18146 and several like it).  When primary and standby were
allowed to disagree on relation size, standbys could panic (see
pgsql-bugs #18426) or revive data unknown to visibility management on
the primary (theorized).

Changes:

 * WAL is now unconditionally flushed first
 * smgrtruncate() is now called in a critical section, preventing
   interrupts and causing PANIC on file API failure
 * smgrtruncate() has a new parameter for existing fork sizes,
   because it can't call smgrnblocks() itself inside a critical section

The changes apply to RelationTruncate(), smgr_redo() and
pg_truncate_visibility_map().  That last is also brought up to date with
other evolutions of the truncation protocol.

The VACUUM FileTruncate() failure mode had been discussed in older
reports than the ones referenced below, with independent analysis from
many people, but earlier theories on how to fix it were too complicated
to back-patch.  The more recently invented cancellation bug was
diagnosed by Alexander Lakhin.  Other corruption scenarios were spotted
by me while iterating on this patch and earlier commit 75818b3a.

Back-patch to all supported releases.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reported-by: rootcause000@gmail.com
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/18146-04e908c662113ad5%40postgresql.org
Discussion: https://postgr.es/m/18426-2d18da6586f152d6%40postgresql.org
2024-12-20 23:57:18 +13:00
Peter Geoghegan
9e85b20da7 Avoid nbtree index scan SAOP scanBehind confusion.
Consistently reset so->scanBehind at the beginning of nbtree array
advancement, even during sktrig_required=false calls (calls where array
advancement is triggered by an unsatisfied non-required array scan key).
Otherwise, it's possible for queries to fail to return all relevant
tuples to the scan given a low-order required scan key that was
previously deemed "satisfied" by a truncated high key attribute value.
This only happened at the point where a later non-required array scan
key needed to be "advanced" once on the next leaf page (that is, once
the right sibling of the truncated high key page was reached).

The underlying issue was that later code within _bt_advance_array_keys
assumed that the so->scanBehind flag must have been set using the
current page's high key (not the previous page's high key).  Any later
successful recheck call to _bt_check_compare would therefore spuriously
be prevented from making _bt_advance_array_keys return true, based on
the faulty belief that the truncated attribute must be from the scan's
current tuple (i.e. the non-pivot tuple at the start of the next page).
_bt_advance_array_keys would return false for the tuple, ultimately
resulting in _bt_checkkeys failing to return a matching tuple.

Oversight in commit 5bf748b8, which enhanced nbtree ScalarArrayOp
execution.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzkJKncfqyAUTeuB5GgRhT1vhsWO2q11dbZNqKmvjopP_g@mail.gmail.com
Backpatch: 17-, where commit 5bf748b8 first appears.
2024-12-19 11:08:53 -05:00
David Rowley
7b8d45d278 Fix Assert failure in WITH RECURSIVE UNION queries
If the non-recursive part of a recursive CTE ended up using
TTSOpsBufferHeapTuple as the table slot type, then a duplicate value
could cause an Assert failure in CheckOpSlotCompatibility() when
checking the hash table for the duplicate value.  The expected slot type
for the deform step was TTSOpsMinimalTuple so the Assert failed when the
TTSOpsBufferHeapTuple slot was used.

This is a long-standing bug which we likely didn't notice because it
seems much more likely that the non-recursive term would have required
projection and used a TTSOpsVirtual slot, which CheckOpSlotCompatibility
is ok with.

There doesn't seem to be any harm done here other than the Assert
failure.  Both TTSOpsMinimalTuple and TTSOpsBufferHeapTuple slot types
require tuple deformation, so the EEOP_*_FETCHSOME ExprState step would
have properly existed in the ExprState.

The solution is to pass NULL for the ExecBuildGroupingEqual's 'lops'
parameter.  This means the ExprState's EEOP_*_FETCHSOME step won't
expect a fixed slot type.  This makes CheckOpSlotCompatibility() happy as
no checking is performed when the ExprEvalStep is not expecting a fixed
slot type.

Reported-by: Richard Guo
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/CAMbWs4-8U9q2LAtf8+ghV11zeUReA3AmrYkxzBEv0vKnDxwkKA@mail.gmail.com
Backpatch-through: 13, all supported versions
2024-12-19 13:12:18 +13:00
Tom Lane
04b8601981 Fix memory leak in pg_restore with zstd-compressed data.
EndCompressorZstd() neglected to free everything.  This was
most visible with a lot of large objects in the dump.

Per report from Tomasz Szypowski.  Back-patch to v16
where this code came in.

Discussion: https://postgr.es/m/DU0PR04MB94193D038A128EF989F922D199042@DU0PR04MB9419.eurprd04.prod.outlook.com
2024-12-17 22:31:26 -05:00
Nathan Bossart
18452b70ac Accommodate very large dshash tables.
If a dshash table grows very large (e.g., the dshash table for
cumulative statistics when there are millions of tables), resizing
it may fail with an error like:

	ERROR: invalid DSA memory alloc request size 1073741824

To fix, permit dshash resizing to allocate more than 1 GB by
providing the DSA_ALLOC_HUGE flag.

Reported-by: Andreas Scherbaum
Author: Matthias van de Meent
Reviewed-by: Cédric Villemain, Michael Paquier, Andres Freund
Discussion: https://postgr.es/m/80a12d59-0d5e-4c54-866c-e69cd6536471%40pgug.de
Backpatch-through: 13
2024-12-17 15:24:45 -06:00
Tomas Vondra
3668c1d506 Detect version mismatch in brin_page_items
Commit dae761a87ed modified brin_page_items() to return the new "empty"
flag for each BRIN range. But the new output parameter was added in the
middle, which may cause crashes when using the new binary with old
function definition.

The ideal solution would be to introduce API versioning similar to what
pg_stat_statements does, but it's too late for that as PG17 was already
released (so we can't introduce a new extension version). We could do
something similar in brin_page_items() by checking the number of output
columns (and ignoring the new flag), but it doesn't seem very nice.

Instead, simply error out and suggest updating the extension to the
latest version. pageinspect is a superuser-only extension, and there's
not much reason to run an older version. Moreover, there's a precedent
for this approach in 691e8b2e18.

Reported by Ľuboslav Špilák, investigation and patch by me. Backpatch to
17, same as dae761a87ed.

Reported-by: Ľuboslav Špilák
Reviewed-by: Michael Paquier, Hayato Kuroda, Peter Geoghegan
Backpatch-through: 17
Discussion: https://postgr.es/m/VI1PR02MB63331C3D90E2104FD12399D38A5D2@VI1PR02MB6333.eurprd02.prod.outlook.com
Discussion: https://postgr.es/m/flat/3385a58f-5484-49d0-b790-9a198a0bf236@vondra.me
2024-12-17 17:50:13 +01:00
Tomas Vondra
42eae257cf Update comments about index parallel builds
Commit b43757171470 allowed parallel builds for BRIN, but left behind
two comments claiming only btree indexes support parallel builds.

Reported by Egor Rogov, along with similar issues in SGML docs.
Backpatch to 17, where parallel builds for BRIN were introduced.

Reported-by: Egor Rogov
Backpatch-through: 17
Discussion: https://postgr.es/m/114e2d5d-125e-07d8-94aa-5ad175fb7443@postgrespro.ru
2024-12-17 15:48:29 +01:00
Amit Kapila
88baa27914 Doc: Fix the wrong link on pg_createsubscriber page.
Commit 84db9a0eb1 has added the incorrect link to
'initial data synchronization'. It was a subsection of Row Filter and
didn't provide the required information.

Author: Peter Smith
Reviewed-by: Vignesh C, Pavel Luzanov
Backpatch-through: 17, where it was introduced
Discussion: https://postgr.es/m/CAHut+PtnA4DB_pcv4TDr4NjUSM1=P2N_cuZx5DX09k7LVmaqUA@mail.gmail.com
2024-12-17 15:02:14 +05:30
Michael Paquier
94e7e77a95 pg_combinebackup: Fix PITR comparison test in 002_compare_backups
The test was creating both the dumps to compare from the same database
on the same node, so it would never detect any mismatches when comparing
the logical dumps of the two servers.

Fixing this issue has revealed that there is a difference in the dumps:
the tablespaces paths are different.  This commit uses compare_text()
with a custom comparison function to erase the difference (slightly
tweaked to be able to work with WIN32 and non-WIN32 paths).  This way,
the non-relevant parts of the tablespace path are ignored from the check
with the basic structure of the query string still compared.

Author: Dagfinn Ilmari Mannsåker
Discussion: https://postgr.es/m/87h67653ns.fsf@wibble.ilmari.org
Backpatch-through: 17
2024-12-17 09:24:18 +09:00
Tomas Vondra
f5dec4e861 doc: Mention BRIN indexes support parallel builds
Two places in the documentation suggest B-tree is the only index access
method allowing parallel builds. Commit b4375717 added parallel builds
for BRIN too, but failed to update the docs. So fix that, and backpatch
to 17, where parallel BRIN builds were introduced.

Author: Egor Rogov
Backpatch-through: 17
Discussion: https://postgr.es/m/114e2d5d-125e-07d8-94aa-5ad175fb7443@postgrespro.ru
2024-12-16 19:20:22 +01:00
Heikki Linnakangas
d1253c67fa Make 009_twophase.pl test pass with recovery_min_apply_delay set
The test failed if you ran the regression tests with TEMP_CONFIG with
recovery_min_apply_delay = '500ms'. Fix the race condition by waiting
for transaction to be applied in the replica, like in a few other
tests.

The failing test was introduced in commit cbfbda7841. Backpatch to all
supported versions like that commit (except v12, which is no longer
supported).

Reported-by: Alexander Lakhin
Discussion: https://www.postgresql.org/message-id/09e2a70a-a6c2-4b5c-aeae-040a7449c9f2@gmail.com
2024-12-16 15:59:27 +02:00