LatchWaitSetPostmasterDeathPos, the latch event position for the
postmaster death event, is initialized under IsUnderPostmaster.
WaitLatch() considered it as a valid wait target in single-user mode
(!IsUnderPostmaster), which was incorrect.
One code path found to fail with an assertion failure is a database drop
in single-user mode while waiting in WaitForProcSignalBarrier() after
the drop.
Oversight in commit 84e5b2f07a.
Author: Patrick Stählin <me@packi.ch>
Co-authored-by: Ronan Dunklau <ronan.dunklau@aiven.io>
Discussion: https://postgr.es/m/18996-3a2744c8140488de@postgresql.org
Backpatch-through: 18
lwlock.c, lwlock.h, and wait_event_names.txt each contain a list of
built-in LWLock tranches. It is easy to miss one or the other when
adding or removing tranches, and discrepancies have adverse effects
(e.g., breaking JOINs between pg_stat_activity and pg_wait_events).
This commit moves the lists of built-in tranches in lwlock.{c,h} to
lwlocklist.h and adds a cross-check to the script that generates
lwlocknames.h. If the lists do not match exactly, building will
fail.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aHpOgwuFQfcFMZ/B%40ip-10-97-1-34.eu-west-3.compute.internal
Logical replication requires reliable conflict detection to maintain data
consistency across nodes. To achieve this, we must prevent premature
removal of tuples deleted by other origins and their associated commit_ts
data by VACUUM, which could otherwise lead to incorrect conflict reporting
and resolution.
This patch introduces a mechanism to retain deleted tuples on the
subscriber during the application of concurrent transactions from remote
nodes. Retaining these tuples allows us to correctly ignore concurrent
updates to the same tuple. Without this, an UPDATE might be misinterpreted
as an INSERT during resolutions due to the absence of the original tuple.
Additionally, we ensure that origin metadata is not prematurely removed by
vacuum freeze, which is essential for detecting update_origin_differs and
delete_origin_differs conflicts.
To support this, a new replication slot named pg_conflict_detection is
created and maintained by the launcher on the subscriber. Each apply
worker tracks its own non-removable transaction ID, which the launcher
aggregates to determine the appropriate xmin for the slot, thereby
retaining necessary tuples.
Conflict information retention (deleted tuples and commit_ts) can be
enabled per subscription via the retain_conflict_info option. This is
disabled by default to avoid unnecessary overhead for configurations that
do not require conflict resolution or logging.
During upgrades, if any subscription on the old cluster has
retain_conflict_info enabled, a conflict detection slot will be created to
protect relevant tuples from deletion when the new cluster starts.
This is a foundational work to correctly detect update_deleted conflict
which will be done in a follow-up patch.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/OS0PR01MB5716BE80DAEB0EE2A6A5D1F5949D2@OS0PR01MB5716.jpnprd01.prod.outlook.com
The terms used in wait_event_names.txt and lwlock.c were inconsistent
for MultiXactOffsetSLRU and MultiXactMemberSLRU, which could cause joins
between pg_wait_events and pg_stat_activity to fail. lwlock.c is
adjusted in this commit to what the historical name of the event has
always been, and what is documented.
Oversight in 53c2a97a92. 08b9b9e043 has fixed a similar
inconsistency some time ago.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/aHdxN0D0hKXzHFQG@ip-10-97-1-34.eu-west-3.compute.internal
Backpatch-through: 17
method_worker.c installed SignalHandlerForConfigReload, but it failed to
actually process reload requests. That hasn't yet produced any concrete
problem reports in terms of GUC changes it should have cared about in
v18, but it was inconsistent.
It did cause problems for a couple of patches in development that need
IO workers to react to ALTER SYSTEM + pg_reload_conf(). Fix extracted
from one of those patches.
Back-patch to 18.
Reported-by: Dmitry Dolgov <9erthalion6@gmail.com>
Discussion: https://postgr.es/m/sh5uqe4a4aqo5zkkpfy5fobe2rg2zzouctdjz7kou4t74c66ql%40yzpkxb7pgoxf
Adopt PgAioXXX convention for pgaio module type names. Rename a
function that didn't use a pgaio_worker_ submodule prefix. Rename the
internal submit function's arguments to match the indirectly relevant
function pointer declaration and nearby examples. Rename the array of
handle IDs in PgAioSubmissionQueue to sqes, a term of art seen in the
systems it emulates, also clarifying that they're not IO handle
pointers as the old name might imply.
No change in behavior, just type, variable and function name cleanup.
Back-patch to 18.
Discussion: https://postgr.es/m/CA%2BhUKG%2BwbaZZ9Nwc_bTopm4f-7vDmCwLk80uKDHj9mq%2BUp0E%2Bg%40mail.gmail.com
The new name more accurately reflects the effects of this flag on a
requested checkpoint. Checkpoint-related log messages (i.e., those
controlled by the log_checkpoints configuration parameter) will now
say "fast" instead of "immediate", too. Likewise, references to
"immediate" checkpoints in the documentation have been updated to
say "fast". This is preparatory work for a follow-up commit that
will add a MODE option to the CHECKPOINT command.
Author: Christoph Berg <myon@debian.org>
Discussion: https://postgr.es/m/aDnaKTEf-0dLiEfz%40msg.df7cb.de
The new name more accurately relects the effects of this flag on a
requested checkpoint. Checkpoint-related log messages (i.e., those
controlled by the log_checkpoints configuration parameter) will now
say "flush-unlogged" instead of "flush-all", too. This is
preparatory work for a follow-up commit that will add a
FLUSH_UNLOGGED option to the CHECKPOINT command.
Author: Christoph Berg <myon@debian.org>
Discussion: https://postgr.es/m/aDnaKTEf-0dLiEfz%40msg.df7cb.de
This commit adds a new system view that provides information about
entries in the dynamic shared memory (DSM) registry. Specifically,
it returns the name, type, and size of each entry. Note that since
we cannot discover the size of dynamic shared memory areas (DSAs)
and hash tables backed by DSAs (dshashes) without first attaching
to them, the size column is left as NULL for those.
Bumps catversion.
Author: Florents Tselai <florents.tselai@gmail.com>
Reviewed-by: Sungwoo Chang <swchangdev@gmail.com>
Discussion: https://postgr.es/m/4D445D3E-81C5-4135-95BB-D414204A0AB4%40gmail.com
By default io_uring creates a shared memory mapping for each io_uring
instance, leading to a large number of memory mappings. Unfortunately a large
number of memory mappings slows things down, backend exit is particularly
affected. To address that, newer kernels (6.5) support using user-provided
memory for the memory. By putting the relevant memory into shared memory we
don't need any additional mappings.
On a system with a new enough kernel and liburing, there is no discernible
overhead when doing a pgbench -S -C anymore.
Reported-by: MARK CALLAGHAN <mdcallag@gmail.com>
Reviewed-by: "Burd, Greg" <greg@burd.me>
Reviewed-by: Jim Nasby <jnasby@upgrade.com>
Discussion: https://postgr.es/m/CAFbpF8OA44_UG+RYJcWH9WjF7E3GA6gka3gvH6nsrSnEe9H0NA@mail.gmail.com
Backpatch-through: 18
This commit standardizes the output format for LSNs to ensure consistent
representation across various tools and messages. Previously, LSNs were
inconsistently printed as `%X/%X` in some contexts, while others used
zero-padding. This often led to confusion when comparing.
To address this, the LSN format is now uniformly set to `%X/%08X`,
ensuring the lower 32-bit part is always zero-padded to eight
hexadecimal digits.
Author: Japin Li <japinli@hotmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/ME0P300MB0445CA53CA0E4B8C1879AF84B641A@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
This refactoring is a follow-up of the work done in 5a1dfde833, that
has switched 2PC file names to use FullTransactionIds when written on
disk. This will help with the integration of a follow-up solution
related to the handling of two-phase files during recovery, to address
older defects while reading these from disk after a crash.
This change is useful in itself as it reduces the need to build the
file names from epoch numbers and TransactionIds, because we can use
directly FullTransactionIds from which the 2PC file names are guessed.
So this avoids a lot of back-and-forth between the FullTransactionIds
retrieved from the file names and how these are passed around in the
internal 2PC logic.
Note that the core of the change is the use of a FullTransactionId
instead of a TransactionId in GlobalTransactionData, that tracks 2PC
file information in shared memory. The change in TwoPhaseCallback makes
this commit unfit for stable branches.
Noah has contributed a good chunk of this patch. I have spent some time
on it as well while working on the issues with two-phase state files and
recovery.
Author: Noah Misch <noah@leadboat.com>
Co-Authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/Z5sd5O9JO7NYNK-C@paquier.xyz
Discussion: https://postgr.es/m/20250116205254.65.nmisch@google.com
Previously, truncating a temporary relation required scanning the entire
local buffer pool once per relation fork to invalidate buffers. This could
be slow, especially with a large local buffers, as the scan was repeated
multiple times.
A similar issue with regular tables (shared buffers) was addressed in
commit 6d05086c0a by scanning the buffer pool only once for all forks.
This commit applies the same optimization to temporary relations,
improving truncation performance.
Author: Daniil Davydov <3danissimo@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Maxim Orlov <orlovmg@gmail.com>
Discussion: https://postgr.es/m/CAJDiXggNqsJOH7C5co4jA8nDk8vw-=sokyh5s1_TENWnC6Ofcg@mail.gmail.com
Presently, the dynamic shared memory (DSM) registry only provides
GetNamedDSMSegment(), which allocates a fixed-size segment. To use
the DSM registry for more sophisticated things like dynamic shared
memory areas (DSAs) or a hash table backed by a DSA (dshash), users
need to create a DSM segment that stores various handles and LWLock
tranche IDs and to write fairly complicated initialization code.
Furthermore, there is likely little variation in this
initialization code between libraries.
This commit introduces functions that simplify allocating a DSA or
dshash within the DSM registry. These functions are very similar
to GetNamedDSMSegment(). Notable differences include the lack of
an initialization callback parameter and the prohibition of calling
the functions more than once for a given entry in each backend
(which should be trivially avoidable in most circumstances). While
at it, this commit bumps the maximum DSM registry entry name length
from 63 bytes to 127 bytes.
Also note that even though one could presumably detach/destroy the
DSAs and dshashes created in the registry, such use-cases are not
yet well-supported, if for no other reason than the associated DSM
registry entries cannot be removed. Adding such support is left as
a future exercise.
The test_dsm_registry test module contains tests for the new
functions and also serves as a complete usage example.
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Florents Tselai <florents.tselai@gmail.com>
Reviewed-by: Rahila Syed <rahilasyed90@gmail.com>
Discussion: https://postgr.es/m/aEC8HGy2tRQjZg_8%40nathan
When querying NUMA status of pages in shared memory, we need to touch
the memory first to get valid results. This may trigger valgrind
reports, because some of the memory (e.g. unpinned buffers) may be
marked as noaccess.
Solved by adding a valgrind suppresion. An alternative would be to
adjust the access/noaccess status before touching the memory, but that
seems far too invasive. It would require all those places to have
detailed knowledge of what the shared memory stores.
The pg_numa_touch_mem_if_required() macro is replaced with a function.
Macros are invisible to suppressions, so it'd have to suppress reports
for the caller - e.g. pg_get_shmem_allocations_numa(). So we'd suppress
reports for the whole function, and that seems to heavy-handed. It might
easily hide other valid issues.
Reviewed-by: Christoph Berg <myon@debian.org>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aEtDozLmtZddARdB@msg.df7cb.de
Backpatch-through: 18
Previously there was no memory barrier enforcing correct memory ordering when
waiting for a free IO handle. However, in the much more common case of waiting
for IO to complete, memory barriers already were present.
On strongly ordered architectures like x86 this had no negative consequences,
but on some armv8 hardware (observed on Apple hardware), it was possible for
the update, in the IO worker, to PgAioHandle->state to become visible before
->distilled_result becoming visible, leading to rather confusing assertion
failures. The failures were rare enough that the bug sometimes took days to
reproduce when running 027_stream_regress in a loop.
Once finally debugged, it was easy enough to come up with a much quicker
repro: Trigger a lot of very fast IO by limiting io_combine_limit to 1 and
ensure that we always have to wait for a free handle by setting
io_max_concurrency to 1. Triggering lots of concurrent seqscans in that setup
triggers the issue within seconds.
One reason this was hard to debug was that the assertion failure most commonly
happened in WaitReadBuffers(), rather than in the AIO subsystem itself. The
assertions added in this commit make problems like this easier to understand.
Also add a comment to the IO worker explaining that we rely on the lwlock
acquisition for correct memory ordering.
I think it'd be good to add a tap test that stress tests buffer IO, but that's
material for a separate patch.
Thanks a lot to Alexander and Konstantin for all the debugging help.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Investigated-by: Andres Freund <andres@anarazel.de>
Investigated-by: Alexander Lakhin <exclusion@gmail.com>
Investigated-by: Konstantin Knizhnik <knizhnik@garret.ru>
Discussion: https://postgr.es/m/2dkz7azclpeiqcmouamdixyn5xhlzy4rvikxrbovyzvi6rnv5c@pz7o7osv2ahf
This is a continuation of 15a79c7311, cleaning up the AIO io_uring
code that has been committed after that while still using %llu.
The code changed here is new in v18, so cleaning things now means less
conflicts if this area of the code changes on backpatch once the 18
stable branch is created.
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/aEZcGCnYFq642q8k@paquier.xyz
This commit replaces the formula used for "TotalProcs" with a call to
pgaio_uring_procs() in pgaio_uring_shmem_init() for the shared memory
initialization, which is exactly the same, removing a duplication.
pgaio_uring_procs() is used for shared memory sizing and a sanity check,
and it has some documentation explaining some reasoning behind the
formula.
Author: Japin Li <japinli@hotmail.com>
Discussion: https://postgr.es/m/ME0P300MB044521067A1EDDA9EDEC3793B66DA@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
Previously, XactLockTableWait() and ConditionalXactLockTableWait() could enter
a non-interruptible loop when they successfully acquired a lock on a transaction
but the transaction still appeared to be running. Since this loop continued
until the transaction completed, it could result in long, uninterruptible waits.
Although this scenario is generally unlikely since XactLockTableWait() and
ConditionalXactLockTableWait() can basically acquire a transaction lock
only when the transaction is not running, it can occur in a hot standby.
In such cases, the transaction may still appear active due to
the KnownAssignedXids list, even while no lock on the transaction exists.
For example, this situation can happen when creating a logical replication
slot on a standby.
The cause of the non-interruptible loop was the absence of CHECK_FOR_INTERRUPTS()
within it. This commit adds CHECK_FOR_INTERRUPTS() to the loop in both functions,
ensuring they can be interrupted safely.
Back-patch to all supported branches.
Author: Kevin K Biju <kevinkbiju@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAM45KeELdjhS-rGuvN=ZLJ_asvZACucZ9LZWVzH7bGcD12DDwg@mail.gmail.com
Backpatch-through: 13
pg_aios used the terms "read" and "write" for vectored I/O read and
write operations, respectively. The documentation refers to them as
"readv" and "writev", and the code uses internally the terms
PGAIO_OP_READV and PGAIO_OP_WRITEV for them, as of "vectored".
This commit adjusts these operation names to match with the code and the
documentation.
Oversight in 8e293e689b.
Author: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Discussion: https://postgr.es/m/6df1e949d1d759ad2767c18e5845963e@oss.nttdata.com
elog()/ereport() process interrupts, iff the log message is < ERROR and the
log message will be emitted. aio's debug messages are emitted via ereport(),
but in some places the code is not ready for interrupts to be processed.
Fix the issue using a few different methods:
1) handle interrupts arriving concurrently - in some places it's easy to
detect that by fetching the handle's generation a bit earlier
2) Check if interrupts made the work needing to be done obsolete
3) Disallow interrupts, as there's no sane way to make interrupt processing
safe
To prevent some similar issues from being re-introduced, assert that
interrupts are held in pgaio_io_update_state().
This commit also fixes the contents of a debug message I added in 039bfc457e.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/mvpm7ga3dfgz7bvum22hmuz26cariylmcppb3irayftc7bwk3r@l7gb6gr7azhc
This cleans up the code related to the testing infrastructure of AIO
that used injection points, switching the test code to use the new
facility for injection points added by 371f2db8b0 rather than tweaks
to pass and reset arguments to the callbacks run.
This removes all the dependencies to USE_INJECTION_POINTS in the AIO
code. pgaio_io_call_inj(), pgaio_inj_io_get() and pgaio_inj_cur_handle
are now gone.
Reviewed-by: Greg Burd <greg@burd.me>
Discussion: https://postgr.es/m/Z_y9TtnXubvYAApS@paquier.xyz
The macros INJECTION_POINT() and INJECTION_POINT_CACHED() are extended
with an optional argument that can be passed down to the callback
attached when an injection point is run, giving to callbacks the
possibility to manipulate a stack state given by the caller. The
existing callbacks in modules injection_points and test_aio have their
declarations adjusted based on that.
da7226993f (core AIO infrastructure) and 93bc3d75d8 (test_aio) and
been relying on a set of workarounds where a static variable called
pgaio_inj_cur_handle is used as runtime argument in the injection point
callbacks used by the AIO tests, in combination with a TRY/CATCH block
to reset the argument value. The infrastructure introduced in this
commit will be reused for the AIO tests, simplifying them.
Reviewed-by: Greg Burd <greg@burd.me>
Discussion: https://postgr.es/m/Z_y9TtnXubvYAApS@paquier.xyz
A 'void *' argument suggests that the caller might pass an arbitrary
struct, which is appropriate for functions like libc's read/write, or
pq_sendbytes(). 'uint8 *' is more appropriate for byte arrays that
have no structure, like the cancellation keys or SCRAM tokens. Some
places used 'char *', but 'uint8 *' is better because 'char *' is
commonly used for null-terminated strings. Change code around SCRAM,
MD5 authentication, and cancellation key handling to follow these
conventions.
Discussion: https://www.postgresql.org/message-id/61be9e31-7b7d-49d5-bc11-721800d89d64@eisentraut.org
This cleans up some loose ends left by commit e8ca9ed1d. I hadn't
looked closely enough at these places before, but now I have.
The use of double-quoted #includes for Perl headers in plperl_system.h
seems to be simply a mistake introduced in 6c944bf3c and faithfully
copied forward since then. (I had thought possibly it was required
by some weird Windows build setup, but there's no evidence of that in
our history.)
The occurrences in SectionMemoryManager.h and SectionMemoryManager.cpp
evidently stem from those files' origin as LLVM code. It's
understandable that LLVM would treat their own files as needing
double-quoted #includes; but they're still system headers to us.
I also applied the same check to *.c files, and found a few other
random incorrect usages in both directions.
Our ECPG headers and test files routinely use angle brackets to refer
to ECPG headers. I left those usages alone, since it seems reasonable
for an ECPG user to regard those headers as system headers.
c4d5cb71d2 adjusted the fast-path locking code to allow some
configuration of the number of fast-path locking slots via the
max_locks_per_transaction GUC. In that commit the FAST_PATH_REL_GROUP()
macro used integer division to determine the fast-path locking group slot
to use for the lock.
The divisor in this case is always a power-of-two value. Here we swap
out the divide by a bitwise-AND, which is a significantly faster
operation to perform.
In passing, adjust the code that's setting FastPathLockGroupsPerBackend
so that it's more clear that the value being set is a power-of-two.
Also, adjust some comments in the area which contained some magic
numbers. It seems better to justify the 1024 upper limit in the
location where the #define is made instead of where it is used.
Author: David Rowley <drowleyml@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CAApHDvodr3bcnpxcs7+k-3cFwYR0tP-BYhyd2PpDhe-bCx9i=g@mail.gmail.com
10f6646847 intended to limit the value of io_combine_limit to the minimum of
io_combine_limit and io_max_combine_limit. To avoid issues with interdependent
GUCs, it introduced io_combine_limit_guc and set io_combine_limit in assign
hooks. That plan was thwarted by guc_tables.c accidentally still referencing
io_combine_limit, instead of io_combine_limit_guc. That lead to the GUC
machinery overriding the work done in the assign hooks, potentially leaving
io_combine_limit with a too high value.
The consequence of this bug was that when running with io_combine_limit >
io_combine_limit_guc the AIO machinery would not have reserved large enough
iovec and IO data arrays, with one IO's arrays overlapping with another IO's,
leading to total confusion.
To make such a problem easier to detect in the future, add assertions to
pgaio_io_set_handle_data_* checking the length is smaller than
io_max_combine_limit (not just PG_IOV_MAX).
It'd be nice to have a few tests for this, but it's not entirely obvious how
to do so portably.
As remarked upon by Tom, the GUC assignment hooks really shouldn't set the
underlying variable, that's the job of the GUC machinery. Change that as well.
Discussion: https://postgr.es/m/c5jyqnuwrpigd35qe7xdypxsisdjrdba5iw63mhcse4mzjogxo@qdjpv22z763f
pgaio_io_reclaim() reset the fields in PgAioHandle before updating the state
to IDLE or incrementing the generation. For most things that's OK, but for
pg_get_aios() it is not - if it copied the PgAioHandle while fields were being
reset, we wouldn't detect that and could call
pgaio_io_get_target_description() with ioh->target == PGAIO_TID_INVALID,
leading to a crash.
Fix this issue by incrementing the generation and state earlier, before
resetting.
Also add an assertion to pgaio_io_get_target_description() for the target to
be valid - that'd have made this case a bit easier to debug. While at it,
add/update a few related assertions.
Author: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/062daca9-dfad-4750-9da8-b13388301ad9@gmail.com
The format of the injection point names used by the AIO code does not
match the existing naming convention used everywhere else in the code,
so let's be consistent. These points are used in test_aio.
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/Z_yTB80bdu1sYDqJ@paquier.xyz
Commit 0bada39c83 fixed a bug of this kind,
which existed in all branches for six days before detection. While the
probability of reaching the trouble was low, the disruption was extreme. No
new backends could start, and service restoration needed an immediate
shutdown. Hence, add this to catch the next bug like it.
The new check in RelationIdGetRelation() suffices to make autovacuum detect
the bug in commit 243e9b40f1 that led to commit
0bada39. This also checks in a number of similar places. It replaces each
Assert(IsTransactionState()) that pertained to a conditional catalog read.
No back-patch for now, but a back-patch of commit 243e9b4 should back-patch
this, too. A back-patch could omit the src/test/regress changes, since back
branches won't gain new index columns.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/20250410191830.0e.nmisch@google.com
Discussion: https://postgr.es/m/10ec0bc3-5933-1189-6bb8-5dec4114558e@gmail.com
Make sure that function declarations use names that exactly match the
corresponding names from function definitions in a few places. These
inconsistencies were all introduced during Postgres 18 development.
This commit was written with help from clang-tidy, by mechanically
applying the same rules as similar clean-up commits (the earliest such
commit was commit 035ce1fe).
This moves/renames some of the functions defined in pg_numa.c:
* pg_numa_get_pagesize() is renamed to pg_get_shmem_pagesize(), and
moved to src/backend/storage/ipc/shmem.c. The new name better reflects
that the page size is not related to NUMA, and it's specifically about
the page size used for the main shared memory segment.
* move pg_numa_available() to src/backend/storage/ipc/shmem.c, i.e. into
the backend (which more appropriate for functions callable from SQL).
While at it, improve the comment to explain what page size it returns.
* remove unnecessary includes from src/port/pg_numa.c, adding
unnecessary dependencies (src/port should be suitable for frontent).
These were either leftovers or unnecessary thanks to the other changes
in this commit.
This eliminates unnecessary dependencies on backend symbols, which we
don't want in src/port.
Reported-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
https://postgr.es/m/CALdSSPi5fj0a7UG7Fmw2cUD1uWuckU_e8dJ+6x-bJEokcSXzqA@mail.gmail.com
It can be set to either COPY (the default) or CLONE if the system
supports it. CLONE causes callers of copydir(), currently CREATE
DATABASE ... STRATEGY=FILE_COPY and ALTER DATABASE ... SET TABLESPACE =
..., to use copy_file_range (Linux, FreeBSD) or copyfile (macOS) to copy
files instead of a read-write loop over the contents.
CLONE gives the kernel the opportunity to share block ranges on
copy-on-write file systems and push copying down to storage on others,
depending on configuration. On some systems CLONE can be used to clone
large databases quickly with CREATE DATABASE ... TEMPLATE=source
STRATEGY=FILE_COPY.
Other operating systems could be supported; patches welcome.
Co-authored-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Ranier Vilela <ranier.vf@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGLM%2Bt%2BSwBU-cHeMUXJCOgBxSHLGZutV5zCwY4qrCcE02w%40mail.gmail.com