1
0
mirror of https://github.com/postgres/postgres.git synced 2025-11-18 02:02:55 +03:00
Commit Graph

2951 Commits

Author SHA1 Message Date
Heikki Linnakangas
daf3d99d2b Add comment to explain why PGReserveSemaphores() is called early
Before commit e25626677f, PGReserveSemaphores() had to be called
before SpinlockSemaInit() because spinlocks were implemented using
semaphores on some platforms (--disable-spinlocks). Add a comment
explaining that.

Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAExHW5seSZpPx-znjidVZNzdagGHOk06F+Ds88MpPUbxd1kTaA@mail.gmail.com
Backpatch-to: 18
2025-11-06 14:20:48 +02:00
Alexander Korotkov
447aae13b0 Implement WAIT FOR command
WAIT FOR is to be used on standby and specifies waiting for
the specific WAL location to be replayed.  This option is useful when
the user makes some data changes on primary and needs a guarantee to see
these changes are on standby.

WAIT FOR needs to wait without any snapshot held.  Otherwise, the snapshot
could prevent the replay of WAL records, implying a kind of self-deadlock.
This is why separate utility command seems appears to be the most robust
way to implement this functionality.  It's not possible to implement this as
a function.  Previous experience shows that stored procedures also have
limitation in this aspect.

Discussion: https://www.postgresql.org/message-id/flat/CAPpHfdsjtZLVzxjGT8rJHCYbM0D5dwkO+BBjcirozJ6nYbOW8Q@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/flat/CABPTF7UNft368x-RgOXkfj475OwEbp%2BVVO-wEXz7StgjD_%3D6sw%40mail.gmail.com
Author: Kartyshov Ivan <i.kartyshov@postgrespro.ru>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: jian he <jian.universality@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
2025-11-05 11:44:13 +02:00
Alexander Korotkov
3b4e53a075 Add infrastructure for efficient LSN waiting
Implement a new facility that allows processes to wait for WAL to reach
specific LSNs, both on primary (waiting for flush) and standby (waiting
for replay) servers.

The implementation uses shared memory with per-backend information
organized into pairing heaps, allowing O(1) access to the minimum
waited LSN. This enables fast-path checks: after replaying or flushing
WAL, the startup process or WAL writer can quickly determine if any
waiters need to be awakened.

Key components:
- New xlogwait.c/h module with WaitForLSNReplay() and WaitForLSNFlush()
- Separate pairing heaps for replay and flush waiters
- WaitLSN lightweight lock for coordinating shared state
- Wait events WAIT_FOR_WAL_REPLAY and WAIT_FOR_WAL_FLUSH for monitoring

This infrastructure can be used by features that need to wait for WAL
operations to complete.

Discussion: https://www.postgresql.org/message-id/flat/CAPpHfdsjtZLVzxjGT8rJHCYbM0D5dwkO+BBjcirozJ6nYbOW8Q@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/flat/CABPTF7UNft368x-RgOXkfj475OwEbp%2BVVO-wEXz7StgjD_%3D6sw%40mail.gmail.com
Author: Kartyshov Ivan <i.kartyshov@postgrespro.ru>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
2025-11-05 11:44:13 +02:00
Andres Freund
dae00f333b aio: Improve assertions related to io_method
First, the assertions in assign_io_method() were the wrong way round. Second,
the lengthof() assertion checked the length of io_method_options, which is the
wrong array to check and is always longer than pgaio_method_ops_table.

While add it, add a static assert to ensure pgaio_method_ops_table and
io_method_options stay in sync.

Per coverity and Tom Lane.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Backpatch-through: 18
2025-11-04 20:03:53 -05:00
Masahiko Sawada
8ae0f6a0c3 Add CHECK_FOR_INTERRUPTS in Evict{Rel,All}UnpinnedBuffers.
This commit adds CHECK_FOR_INTERRUPTS to the shared buffer iteration
loops in EvictRelUnpinnedBuffers and EvictAllUnpinnedBuffers. These
functions, used by pg_buffercache's pg_buffercache_evict_relation and
pg_buffercache_evict_all, can now be interrupted during long-running
operations.

Backpatch to version 18, where these functions and their corresponding
pg_buffercache functions were introduced.

Author: Yuhang Qiu <iamqyh@gmail.com>
Discussion: https://postgr.es/m/8DC280D4-94A2-4E7B-BAB9-C345891D0B78%40gmail.com
Backpatch-through: 18
2025-11-04 15:47:25 -08:00
Peter Eisentraut
e1ac846f3d Mark ItemPointer arguments as const throughout
This is a follow up 991295f.  I searched over src/ and made all
ItemPointer arguments as const as much as possible.

Note: We cut out from the original patch the pieces that would have
created incompatibilities in the index or table AM APIs.  Those could
be considered separately.

Author: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAEoWx2nBaypg16Z5ciHuKw66pk850RFWw9ACS2DqqJ_AkKeRsw%40mail.gmail.com
2025-10-30 14:12:06 +01:00
Peter Eisentraut
3479a0f823 const-qualify ItemPointer comparison functions
Add const qualifiers to ItemPointerEquals() and ItemPointerCompare().
This will allow further changes up the stack.  It also complements
commit aeb767ca0b, as we now have all of itemptr.h appropriately
const-qualified.

Author: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAEoWx2nBaypg16Z5ciHuKw66pk850RFWw9ACS2DqqJ_AkKeRsw@mail.gmail.com
2025-10-30 10:13:47 +01:00
Nathan Bossart
123661427b Fix a couple of comments.
These were discovered while reviewing Aleksander Alekseev's
proposed changes to pgindent.

Oversights in commits 393e0d2314 and 25a30bbd42.

Discussion: https://postgr.es/m/aP-H6kSsGOxaB21k%40nathan
2025-10-27 10:30:05 -05:00
Peter Eisentraut
76acf4b722 Remove Item type
This type is just char * underneath, it provides no real value, no
type safety, and just makes the code one level more mysterious.  It is
more idiomatic to refer to blobs of memory by a combination of void *
and size_t, so change it to that.

Also, since this type hides the pointerness, we can't apply qualifiers
to what is pointed to, which requires some unconstify nonsense.  This
change allows fixing that.

Extension code that uses the Item type can change its code to use
void * to be backward compatible.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://www.postgresql.org/message-id/flat/c75cccf5-5709-407b-a36a-2ae6570be766@eisentraut.org
2025-10-27 09:55:59 +01:00
Álvaro Herrera
b7cc6474e9 Make smgr access for a BufferManagerRelation safer in relcache inval
Currently there's no bug, because we have no code path where we
invalidate relcache entries where it'd cause a problem.  But it's more
robust to do it this way in case we introduce such a path later, as some
Postgres forks reportedly already have.

Author: Daniil Davydov <3danissimo@gmail.com>
Reviewed-by: Stepan Neretin <slpmcf@gmail.com>
Discussion: https://postgr.es/m/CAJDiXgj3FNzAhV+jjPqxMs3jz=OgPohsoXFj_fh-L+nS+13CKQ@mail.gmail.com
2025-10-21 10:51:55 +03:00
Michael Paquier
e4e496e88c Fix comment in pg_get_shmem_allocations_numa()
The comment fixed in this commit described the function as dealing with
database blocks, but in reality it processes shared memory allocations.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aH4DDhdiG9Gi0rG7@ip-10-97-1-34.eu-west-3.compute.internal
Backpatch-through: 18
2025-10-21 16:12:30 +09:00
Tom Lane
92cf557ffa Add static assertion that RELSEG_SIZE fits in an int.
Our configure script intended to ensure this, but it supposed that
expr(1) would report an error for integer overflow.  Maybe that
was true when the code was written (commit 3c6248a82 of 2008-05-02),
but all the modern expr's I tried will deliver bigger-than-int32
results without complaint.  Moreover, if you use --with-segsize-blocks
then there's no check at all.

Ideally we'd add a test in configure itself to check that the value
fits in int, but to do that we'd need to suppose that test(1) handles
bigger-than-int32 numbers correctly.  Probably modern ones do, but
that's an assumption I could do without; and I'm not too trusting
about meson either.  Instead, let's install a static assertion, so
that even people who ignore all the compiler warnings you get from
such values will be forced to confront the fact that it won't work.

This has been hazardous for awhile, but given that we hadn't heard
a complaint about it till now, I don't feel a need to back-patch.

Reported-by: Casey Shobe <casey.allen.shobe@icloud.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/C5DC82D6-C76D-4E8F-BC2E-DF03EFC4FA24@icloud.com
2025-10-19 18:28:46 -04:00
Andres Freund
c819d1017d bufmgr: Fix valgrind checking for buffers pinned in StrategyGetBuffer()
In 5e89985928 I made StrategyGetBuffer() pin buffers with a single CAS,
instead of using PinBuffer_Locked(). Unfortunately I missed that
PinBuffer_Locked() marked the page as defined for valgrind.

Fix this oversight by centralizing the valgrind initialization into
TrackNewBufferPin(), which also allows us to reduce the number of places doing
VALGRIND_MAKE_MEM_DEFINED.

Per buildfarm animal skink and Amit Langote.

Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
Discussion: https://postgr.es/m/CA+HiwqGKJ6nEXEPQW7EpykVsEtzxp5-up_xhtcUAkWFtATVQvQ@mail.gmail.com
2025-10-09 19:17:13 -04:00
Andres Freund
5e89985928 bufmgr: Don't lock buffer header in StrategyGetBuffer()
Previously StrategyGetBuffer() acquired the buffer header spinlock for every
buffer, whether it was reusable or not. If reusable, it'd be returned, with
the lock held, to GetVictimBuffer(), which then would pin the buffer with
PinBuffer_Locked(). That's somewhat violating the spirit of the guidelines for
holding spinlocks (i.e. that they are only held for a few lines of consecutive
code) and necessitates using PinBuffer_Locked(), which scales worse than
PinBuffer() due to holding the spinlock.  This alone makes it worth changing
the code.

However, the main reason to change this is that a future commit will make
PinBuffer_Locked() slower (due to making UnlockBufHdr() slower), to gain
scalability for the much more common case of pinning a pre-existing buffer. By
pinning the buffer with a single atomic operation, iff the buffer is reusable,
we avoid any potential regression for miss-heavy workloads. There strictly are
fewer atomic operations for each potential buffer after this change.

The price for this improvement is that freelist.c needs two CAS loops and
needs to be able to set up the resource accounting for pinned buffers. The
latter is achieved by exposing a new function for that purpose from bufmgr.c,
that seems better than exposing the entire private refcount infrastructure.
The improvement seems worth the complexity.

Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
2025-10-08 17:04:07 -04:00
Andres Freund
3baae90013 bufmgr: fewer calls to BufferDescriptorGetContentLock
We're planning to merge buffer content locks into BufferDesc.state. To reduce
the size of that patch, centralize calls to BufferDescriptorGetContentLock().

The biggest part of the change is in assertions, by introducing
BufferIsLockedByMe[InMode]() (and removing BufferIsExclusiveLocked()). This
seems like an improvement even without aforementioned plans.

Additionally replace some direct calls to LWLockAcquire() with calls to
LockBuffer().

Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
2025-10-08 16:06:19 -04:00
Andres Freund
2a2e1b470b bufmgr: Fix signedness of mask variable in BufferSync()
BM_PERMANENT is defined as 1U<<31, which is a negative number when interpreted
as a signed integer. Unfortunately the mask variable in BufferSync() was
signed. This has been wrong for a long time, but failed to fail, due to
integer conversion rules.

However, in an upcoming patch the width of the state variable will be
increased, with the wrong signedness leading to never flushing permanent
buffers - luckily caught in a test.

It seems better to fix this separately, instead of doing so as part of a
large, otherwise mechanical, patch.

Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
2025-10-08 14:34:30 -04:00
Andres Freund
3c2b97b29e bufmgr: Introduce FlushUnlockedBuffer
There were several copies of code locking a buffer, flushing its contents, and
unlocking the buffer. It seems worth centralizing that into a helper function.

Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
2025-10-08 14:34:30 -04:00
Andres Freund
819dc118c0 Improve ReadRecentBuffer() scalability
While testing a new potential use for ReadRecentBuffer(), Andres
reported that it scales badly when called concurrently for the same
buffer by many backends.  Instead of a naive (but wrong) coding with
PinBuffer(), it used the spinlock, so that it could be careful to pin
only if the buffer was valid and holding the expected block, to avoid
breaking invariants in eg GetVictimBuffer().  Unfortunately that made it
less scalable than PinBuffer(), which uses compare-exchange instead.

We can fix that by giving PinBuffer() a new skip_if_not_valid mode that
doesn't pin invalid buffers.  It might occasionally skip when it
shouldn't due to the unlocked read of the header flags, but that's
unlikely and perfectly acceptable for an opportunistic optimisation
routine, and it can only succeed when it really should due to the
compare-exchange loop.

Note that this fixes ReadRecentBuffer()'s failure to bump the usage
count. While this could be seen as a bug, there currently aren't cases
affected by this in core, so it doesn't seem worth backpatching that portion.

Author: Thomas Munro <thomas.munro@gmail.com>
Reported-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/20230627020546.t6z4tntmj7wmjrfh%40awork3.anarazel.de
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
2025-10-08 13:10:40 -04:00
Nathan Bossart
74b41f5a77 Make some use of anonymous unions [DSM registry].
Make some use of anonymous unions, which are allowed as of C11, as
examples and encouragement for future code, and to test compilers.

This commit changes the DSMRegistryEntry struct.

Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/aNKsDg0fJwqhZdXX%40nathan
2025-10-03 10:14:33 -05:00
Álvaro Herrera
7e638d7f50 Don't include execnodes.h in replication/conflict.h
... which silently propagates a lot of headers into many places
via pgstat.h, as evidenced by the variety of headers that this patch
needs to add to seemingly random places.  Add a minimum of typedefs to
conflict.h to be able to remove execnodes.h, and fix the fallout.

Backpatch to 18, where conflict.h first appeared.

Discussion: https://postgr.es/m/202509191927.uj2ijwmho7nv@alvherre.pgsql
2025-09-25 14:52:41 +02:00
Peter Eisentraut
a5b35fcedb Remove PointerIsValid()
This doesn't provide any value over the standard style of checking the
pointer directly or comparing against NULL.

Also remove related:
- AllocPointerIsValid() [unused]
- IndexScanIsValid() [had one user]
- HeapScanIsValid() [unused]
- InvalidRelation [unused]

Leaving HeapTupleIsValid(), ItemIdIsValid(), PortalIsValid(),
RelationIsValid for now, to reduce code churn.

Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/ad50ab6b-6f74-4603-b099-1cd6382fb13d%40eisentraut.org
Discussion: https://www.postgresql.org/message-id/CA+hUKG+NFKnr=K4oybwDvT35dW=VAjAAfiuLxp+5JeZSOV3nBg@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/bccf2803-5252-47c2-9ff0-340502d5bd1c@iki.fi
2025-09-24 15:17:20 +02:00
Nathan Bossart
c3cc2ab87d Fix re-initialization of LWLock-related shared memory.
When shared memory is re-initialized after a crash, the named
LWLock tranche request array that was copied to shared memory will
no longer be accessible.  To fix, save the pointer to the original
array in postmaster's local memory, and switch to it when
re-initializing the LWLock-related shared memory.

Oversight in commit ed1aad15e0.  Per buildfarm member batta.

Reported-by: Michael Paquier <michael@paquier.xyz>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aMoejB3iTWy1SxfF%40paquier.xyz
Discussion: https://postgr.es/m/f8ca018f-3479-49f6-a92c-e31db9f849d7%40gmail.com
2025-09-18 09:55:39 -05:00
Andres Freund
0110e2ec5c Mark shared buffer lookup table HASH_FIXED_SIZE
StrategyInitialize() calls InitBufTable() with maximum number of entries that
the buffer lookup table can ever have. Thus there should not be any need to
allocate more element after initialization. Hence mark the hash table as fixed
sized.

Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/CAExHW5v0jh3F_wj86yC=qBfWk0uiT94qy=Z41uzAHLHh0SerRA@mail.gmail.com
2025-09-17 20:28:43 -04:00
Michael Paquier
158c48303e Fix shared memory calculation size of PgAioCtl
The shared memory size was calculated based on an offset of io_handles,
which is itself a pointer included in the structure.  We tend to
overestimate the shared memory size overall, so this was unlikely an
issue in practice, but let's be correct and use the full size of the
structure in the calculation, so as the pointer for io_handles is
included.

Oversight in da7226993f.

Author: Madhukar Prasad <madhukarprasad@google.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAKi+wrbC2dTzh_vKJoAZXV5wqTbhY0n4wRNpCjJ=e36aoo0kFw@mail.gmail.com
Backpatch-through: 18
2025-09-17 09:33:32 +09:00
Peter Eisentraut
2aac62be8c Default to log_lock_waits=on
If someone is stuck behind a lock for more than a second, that is
almost always a problem that is worth a log entry.

Author: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-By: Michael Banck <mbanck@gmx.net>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Christoph Berg <myon@debian.org>
Reviewed-By: Stephen Frost <sfrost@snowman.net>
Discussion: https://postgr.es/m/b8b8502915e50f44deb111bc0b43a99e2733e117.camel%40cybertec.at
2025-09-12 07:57:06 +02:00
Peter Eisentraut
25f36066dd Remove traces of support for Sun Studio compiler
Per discussion, this compiler suite is no longer maintained, and
it has not been able to compile PostgreSQL since at least PostgreSQL
17.

This removes all the remaining support code for this compiler.

Note that the Solaris operating system continues to be supported, but
using GCC as the compiler.

Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/a0f817ee-fb86-483a-8a14-b6f7f5991b6e%40eisentraut.org
2025-09-12 07:39:05 +02:00
Nathan Bossart
ed1aad15e0 Move named LWLock tranche requests to shared memory.
In EXEC_BACKEND builds, GetNamedLWLockTranche() can segfault when
called outside of the postmaster process, as it might access
NamedLWLockTrancheRequestArray, which won't be initialized.  Given
the lack of reports, this is apparently unusual, presumably because
it is usually called from a shmem_startup_hook like this:

    mystruct = ShmemInitStruct(..., &found);
    if (!found)
    {
        mystruct->locks = GetNamedLWLockTranche(...);
        ...
    }

This genre of shmem_startup_hook evades the aforementioned
segfaults because the struct is initialized in the postmaster, so
all other callers skip the !found path.

We considered modifying the documentation or requiring
GetNamedLWLockTranche() to be called from the postmaster, but
ultimately we decided to simply move the request array to shared
memory (and add it to the BackendParameters struct), thereby
allowing calls outside postmaster on all platforms.  Since the main
shared memory segment is initialized after accepting LWLock tranche
requests, postmaster builds the request array in local memory first
and then copies it to shared memory later.

Given the lack of reports, back-patching seems unnecessary.

Reported-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0v1_15QPg5Sqd2Qz5rh_qcsyCeHHmRDY89xVHcy2yt5BQ%40mail.gmail.com
2025-09-11 16:13:55 -05:00
Jeff Davis
9af672bcb2 meson: build checksums with extra optimization flags.
Use -funroll-loops and -ftree-vectorize when building checksum.c to
match what autoconf does.

Discussion: https://postgr.es/m/a81f2f7ef34afc24a89c613671ea017e3651329c.camel@j-davis.com
Reviewed-by: Andres Freund <andres@anarazel.de>
2025-09-08 12:29:42 -07:00
Andres Freund
2c78940527 bufmgr: Remove freelist, always use clock-sweep
This set of changes removes the list of available buffers and instead simply
uses the clock-sweep algorithm to find and return an available buffer.  This
also removes the have_free_buffer() function and simply caps the
pg_autoprewarm process to at most NBuffers.

While on the surface this appears to be removing an optimization it is in fact
eliminating code that induces overhead in the form of synchronization that is
problematic for multi-core systems.

The main reason for removing the freelist, however, is not the moderate
improvement in scalability, but that having the freelist would require
dedicated complexity in several upcoming patches. As we have not been able to
find a case benefiting from the freelist...

Author: Greg Burd <greg@burd.me>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/70C6A5B5-2A20-4D0B-BC73-EB09DD62D61C@getmailspring.com
2025-09-05 12:25:59 -04:00
Andres Freund
50e4c6ace5 bufmgr: Use consistent naming of the clock-sweep algorithm
Minor edits to comments only.

Author: Greg Burd <greg@burd.me>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/70C6A5B5-2A20-4D0B-BC73-EB09DD62D61C@getmailspring.com
2025-09-05 12:25:59 -04:00
Nathan Bossart
d814d7fc3d Revert recent change to RequestNamedLWLockTranche().
Commit 38b602b028 modified this function to allocate enough space
for MAX_NAMED_TRANCHES (256) requests, which is likely far more
than most clusters need.  This commit reverts that change so that
it first allocates enough space for only 16 requests and resizes
the array when necessary.  While at it, remove the check for too
many tranches from this function.  We can now rely on
InitializeLWLocks() to do that check via its calls to
LWLockNewTrancheId() for the named tranches.

Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/aLmzwC2dRbqk14y6%40nathan
2025-09-04 15:34:48 -05:00
Nathan Bossart
38b602b028 Move dynamically-allocated LWLock tranche names to shared memory.
There are two ways for shared libraries to allocate their own
LWLock tranches.  One way is to call RequestNamedLWLockTranche() in
a shmem_request_hook, which requires the library to be loaded via
shared_preload_libraries.  The other way is to call
LWLockNewTrancheId(), which is not subject to the same
restrictions.  However, LWLockNewTrancheId() does require each
backend to store the tranche's name in backend-local memory via
LWLockRegisterTranche().  This API is a little cumbersome and leads
to things like unhelpful pg_stat_activity.wait_event values in
backends that haven't loaded the library.

This commit moves these LWLock tranche names to shared memory, thus
eliminating the need for each backend to call
LWLockRegisterTranche().  Instead, the tranche name must be
provided to LWLockNewTrancheId(), which immediately makes the name
available to all backends.  Since the tranche name array is
append-only, lookups can ordinarily avoid locking as long as their
local copy of the LWLock counter is greater than the requested
tranche ID.

One downside of this approach is that we now have a hard limit on
both the length of tranche names (NAMEDATALEN-1 bytes) and the
number of dynamically-allocated tranches (256).  Besides a limit of
NAMEDATALEN-1 bytes for tranche names registered via
RequestNamedLWLockTranche(), no such limits previously existed.  We
could avoid these new limits by using dynamic shared memory, but
the complexity involved didn't seem worth it.  We briefly
considered making the tranche limit user-configurable but
ultimately decided against that, too.  Since there is still a lot
of time left in the v19 development cycle, it's possible we will
revisit this choice.

Author: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Rahila Syed <rahilasyed90@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAA5RZ0vvED3naph8My8Szv6DL4AxOVK3eTPS0qXsaKi%3DbVdW2A%40mail.gmail.com
2025-09-03 13:57:48 -05:00
Nathan Bossart
5487058b56 Prepare DSM registry for upcoming changes to LWLock tranche names.
A proposed patch would place a limit of NAMEDATALEN-1 (i.e., 63)
bytes on the names of dynamically-allocated LWLock tranches, but
GetNamedDSA() and GetNamedDSHash() may register tranches with
longer names.  This commit lowers the maximum DSM registry entry
name length to NAMEDATALEN-1 bytes and modifies GetNamedDSHash() to
create only one tranche, thereby allowing us to keep the DSM
registry's tranche names below NAMEDATALEN bytes.

Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/aKzIg1JryN1qhNuy%40nathan
2025-08-29 20:34:53 -05:00
Tom Lane
f727b63e81 Provide error context when an error is thrown within WaitOnLock().
Show the requested lock level and the object being waited on,
in the same format we use for deadlock reports and similar errors.
This is particularly helpful for debugging lock-timeout errors,
since otherwise the user has very little to go on about which
lock timed out.  The performance cost of setting up the callback
should be negligible compared to the other tracing support already
present in WaitOnLock.

As in the deadlock-report case, we just show numeric object OIDs,
because it seems too scary to try to perform catalog lookups
in this context.

Reported-by: Steve Baldwin <steve.baldwin@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1602369.1752167154@sss.pgh.pa.us
2025-08-29 15:43:34 -04:00
Nathan Bossart
67fcf48c3b Make LWLockCounter a global variable.
Using the LWLockCounter requires first calculating its address in
shared memory like this:

	LWLockCounter = (int *) ((char *) MainLWLockArray - sizeof(int));

Commit 82e861fbe1 started this trend in order to fix EXEC_BACKEND
builds, but it could also be fixed by adding it to the
BackendParameters struct.  The current approach is somewhat
difficult to follow, so this commit switches to the latter.  While
at it, swap around the code in LWLockShmemSize() to match the order
of assignments in CreateLWLocks() for added readability.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aLDLnan9gNCS9fHx%40nathan
2025-08-29 12:13:37 -05:00
Peter Eisentraut
991295f387 Mark ItemPointer arguments as const in tuple/table lock functions
The functions LockTuple, ConditionalLockTuple, UnlockTuple, and
XactLockTableWait take an ItemPointer argument that they do not
modify, so the argument can be const-qualified to better convey intent
and allow the compiler to enforce immutability.

Author: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAEoWx2m9e4rECHBwpRE4%2BGCH%2BpbYZXLh2f4rB1Du5hDfKug%2BOg%40mail.gmail.com
2025-08-29 07:39:58 +02:00
Álvaro Herrera
325fc0ab14 Avoid including commands/dbcommands.h in so many places
This has been done historically because of get_database_name (which
since commit cb98e6fb8f belongs in lsyscache.c/h, so let's move it
there) and get_database_oid (which is in the right place, but whose
declaration should appear in pg_database.h rather than dbcommands.h).
Clean this up.

Also, xlogreader.h and stringinfo.h are no longer needed by dbcommands.h
since commit f1fd515b39, so remove them.

Author: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/202508191031.5ipojyuaswzt@alvherre.pgsql
2025-08-28 12:39:04 +02:00
Andres Freund
5865150b6d aio: Stop using enum bitfields due to bad code generation
During an investigation into rather odd aio related errors on macos, observed
by Alexander and Konstantin, we started to wonder if bitfield access is
related to the error. At the moment it looks like it is related, we cannot
reproduce the failures when replacing the bitfields. In addition, the problem
can only be reproduced with some compiler [versions] and not everyone has been
able to reproduce the issue.

The observed problem is that, very rarely, PgAioHandle->{state,target} are in
an inconsistent state, after having been checked to be in a valid state not
long before, triggering an assertion failure. Unfortunately, this could be
caused by wrong compiler code generation or somehow of missing memory barriers
- we don't really know. In theory there should not be any concurrent write
access to the handle in the state the bug is triggered, as the handle was idle
and is just being initialized.

Separately from the bug, we observed that at least gcc and clang generate
rather terrible code for the bitfield access. Even if it's not clear if the
observed assertion failure is actually caused by the bitfield somehow, the bad
code generation alone is sufficient reason to stop using bitfields.

Therefore, replace the enum bitfields with uint8s and instead cast in each
switch statement.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reported-by: Konstantin Knizhnik <knizhnik@garret.ru>
Discussion: https://postgr.es/m/1500090.1745443021@sss.pgh.pa.us
Backpatch-through: 18
2025-08-27 19:12:11 -04:00
Peter Eisentraut
e567e22290 Message style improvements
Mostly adding some quoting.
2025-08-26 22:52:11 +02:00
Michael Paquier
13b935cd52 Change dynahash.c and hsearch.h to use int64 instead of long
This code was relying on "long", which is signed 8 bytes everywhere
except on Windows where it is 4 bytes, that could potentially expose it
to overflows, even if the current uses in the code are fine as far as I
know.  This code is now able to rely on the same sizeof() variable
everywhere, with int64.  long was used for sizes, partition counts and
entry counts.

Some callers of the dynahash.c routines used long declarations, that can
be cleaned up to use int64 instead.  There was one shortcut based on
SIZEOF_LONG, that can be removed.  long is entirely removed from
dynahash.c and hsearch.h.

Similar work was done in b1e5c9fa9a.

Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aKQYp-bKTRtRauZ6@paquier.xyz
2025-08-22 11:59:02 +09:00
Peter Eisentraut
47932f3cdc Use consistent type for pgaio_io_get_id() result
The result of pgaio_io_get_id() was being assigned to a mix of int and
uint32 variables.  This fixes it to use int consistently, which seems
the most correct.  Also change the queue empty special value in
method_worker.c to -1 from UINT32_MAX.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/70c784b3-f60b-4652-b8a6-75e5f051243e%40eisentraut.org
2025-08-21 19:45:25 +02:00
Nathan Bossart
3eec0e6533 Fix comment for MAX_SIMUL_LWLOCKS.
This comment mentions that pg_buffercache locks all buffer
partitions simultaneously, but it hasn't done so since v10.

Oversight in commit 6e654546fb.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/aKTuAHVEuYCUmmIy%40nathan
2025-08-19 16:48:22 -05:00
Tom Lane
2a600a93c7 Make type Datum be 8 bytes wide everywhere.
This patch makes sizeof(Datum) be 8 on all platforms including
32-bit ones.  The objective is to allow USE_FLOAT8_BYVAL to be true
everywhere, and in consequence to remove a lot of code that is
specific to pass-by-reference handling of float8, int8, etc.  The
code for abbreviated sort keys can be simplified similarly.  In this
way we can reduce the maintenance effort involved in supporting 32-bit
platforms, without going so far as to actually desupport them.  Since
Datum is strictly an in-memory concept, this has no impact on on-disk
storage, though an initdb or pg_upgrade will be needed to fix affected
catalog entries.

We have required platforms to support [u]int64 for ages, so this
breaks no supported platform.  We can expect that this change will
make 32-bit builds a bit slower and more memory-hungry, although being
able to use pass-by-value handling of 8-byte types may buy back some
of that.  But we stopped optimizing for 32-bit cases a long time ago,
and this seems like just another step on that path.

This initial patch simply forces the correct type definition and
USE_FLOAT8_BYVAL setting, and cleans up a couple of minor compiler
complaints that ensued.  This is sufficient for testing purposes.
In the wake of a bunch of Datum-conversion cleanups by Peter
Eisentraut, this now compiles cleanly with gcc on a 32-bit platform.
(I'd only tested the previous version with clang, which it turns out
is less picky than gcc about width-changing coercions.)

There is a good deal of now-dead code that I'll remove in separate
follow-up patches.

A catversion bump is required because this affects initial catalog
contents (on 32-bit machines) in two ways: pg_type.typbyval changes
for some built-in types, and Const nodes in stored views/rules will
now have 8 bytes not 4 for pass-by-value types.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/1749799.1752797397@sss.pgh.pa.us
2025-08-13 17:18:22 -04:00
Thomas Munro
b421223172 Fix rare bug in read_stream.c's split IO handling.
The internal queue of buffers could become corrupted in a rare edge case
that failed to invalidate an entry, causing a stale buffer to be
"forwarded" to StartReadBuffers().  This is a simple fix for the
immediate problem.

A small API change might be able to remove this and related fragility
entirely, but that will have to wait a bit.

Defect in commit ed0b87ca.

Bug: 19006
Backpatch-through: 18
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/19006-80fcaaf69000377e%40postgresql.org
2025-08-09 13:04:38 +12:00
Peter Eisentraut
ff89e182d4 Add missing Datum conversions
Add various missing conversions from and to Datum.  The previous code
mostly relied on implicit conversions or its own explicit casts
instead of using the correct DatumGet*() or *GetDatum() functions.

We think these omissions are harmless.  Some actual bugs that were
discovered during this process have been committed
separately (80c758a2e1, fd2ab03fea).

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/8246d7ff-f4b7-4363-913e-827dadfeb145%40eisentraut.org
2025-08-08 22:06:57 +02:00
Thomas Munro
b5cd74612c Remove obsolete comment.
Remove a comment about potential for AIO in StartReadBuffersImpl(),
because that change happened.
2025-08-09 01:46:04 +12:00
Masahiko Sawada
b5c53b403c Suppress maybe-uninitialized warning.
Following commit e035863c9a, building with -O0 began triggering
warnings about potentially uninitialized 'workbuf' usage. While
theoretically the initialization isn't necessary since VARDATA()
doesn't access the contents of the pointed-to object, this commit
explicitly initializes the workbuf variable to suppress the warning.

Buildfarm members adder and flaviventris have shown the warning.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAD21AoCOZxfqnNgfM5yVKJZYnOq5m2Q96fBGy1fovEqQ9V4OZA@mail.gmail.com
2025-08-05 15:30:28 -07:00
Peter Eisentraut
2ad6e80de9 Fix various hash function uses
These instances were using Datum-returning functions where a
lower-level function returning uint32 would be more appropriate.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/8246d7ff-f4b7-4363-913e-827dadfeb145%40eisentraut.org
2025-08-05 11:47:23 +02:00
Tom Lane
9e9190154e Fix MemoryContextAllocAligned's interaction with Valgrind.
Arrange that only the "aligned chunk" part of the allocated space is
included in a Valgrind vchunk.  This suppresses complaints about that
vchunk being possibly lost because PG is retaining only pointers to
the aligned chunk.  Also make sure that trailing wasted space is
marked NOACCESS.

As a tiny performance improvement, arrange that MCXT_ALLOC_ZERO zeroes
only the returned "aligned chunk", not the wasted padding space.

In passing, fix GetLocalBufferStorage to use MemoryContextAllocAligned
instead of rolling its own implementation, which was equally broken
according to Valgrind.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
2025-08-02 21:59:46 -04:00
Heikki Linnakangas
613f647122 Handle cancel requests with PID 0 gracefully
If the client sent a query cancel request with backend PID 0, it
tripped an assertion. With assertions disabled, you got this in the
log instead:

    LOG:  invalid cancel request with PID 0
    LOG:  wrong key in cancel request for process 0

Query cancellations don't even require authentication, so we better
tolerate bogus requests. Fix by turning the assertion into a regular
runtime check.

Spotted while testing libpq behavior with a modified server that
didn't send BackendKeyData to the client.

Backpatch-through: 18
2025-07-30 00:39:49 +03:00