postgres

mirror of https://github.com/postgres/postgres.git synced 2025-11-18 02:02:55 +03:00

Author	SHA1	Message	Date
Heikki Linnakangas	daf3d99d2b	Add comment to explain why PGReserveSemaphores() is called early Before commit `e25626677f`, PGReserveSemaphores() had to be called before SpinlockSemaInit() because spinlocks were implemented using semaphores on some platforms (--disable-spinlocks). Add a comment explaining that. Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://www.postgresql.org/message-id/CAExHW5seSZpPx-znjidVZNzdagGHOk06F+Ds88MpPUbxd1kTaA@mail.gmail.com Backpatch-to: 18	2025-11-06 14:20:48 +02:00
Alexander Korotkov	447aae13b0	Implement WAIT FOR command WAIT FOR is to be used on standby and specifies waiting for the specific WAL location to be replayed. This option is useful when the user makes some data changes on primary and needs a guarantee to see these changes are on standby. WAIT FOR needs to wait without any snapshot held. Otherwise, the snapshot could prevent the replay of WAL records, implying a kind of self-deadlock. This is why separate utility command seems appears to be the most robust way to implement this functionality. It's not possible to implement this as a function. Previous experience shows that stored procedures also have limitation in this aspect. Discussion: https://www.postgresql.org/message-id/flat/CAPpHfdsjtZLVzxjGT8rJHCYbM0D5dwkO+BBjcirozJ6nYbOW8Q@mail.gmail.com Discussion: https://www.postgresql.org/message-id/flat/CABPTF7UNft368x-RgOXkfj475OwEbp%2BVVO-wEXz7StgjD_%3D6sw%40mail.gmail.com Author: Kartyshov Ivan <i.kartyshov@postgrespro.ru> Author: Alexander Korotkov <aekorotkov@gmail.com> Author: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Alexander Lakhin <exclusion@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Euler Taveira <euler@eulerto.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: jian he <jian.universality@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>	2025-11-05 11:44:13 +02:00
Alexander Korotkov	3b4e53a075	Add infrastructure for efficient LSN waiting Implement a new facility that allows processes to wait for WAL to reach specific LSNs, both on primary (waiting for flush) and standby (waiting for replay) servers. The implementation uses shared memory with per-backend information organized into pairing heaps, allowing O(1) access to the minimum waited LSN. This enables fast-path checks: after replaying or flushing WAL, the startup process or WAL writer can quickly determine if any waiters need to be awakened. Key components: - New xlogwait.c/h module with WaitForLSNReplay() and WaitForLSNFlush() - Separate pairing heaps for replay and flush waiters - WaitLSN lightweight lock for coordinating shared state - Wait events WAIT_FOR_WAL_REPLAY and WAIT_FOR_WAL_FLUSH for monitoring This infrastructure can be used by features that need to wait for WAL operations to complete. Discussion: https://www.postgresql.org/message-id/flat/CAPpHfdsjtZLVzxjGT8rJHCYbM0D5dwkO+BBjcirozJ6nYbOW8Q@mail.gmail.com Discussion: https://www.postgresql.org/message-id/flat/CABPTF7UNft368x-RgOXkfj475OwEbp%2BVVO-wEXz7StgjD_%3D6sw%40mail.gmail.com Author: Kartyshov Ivan <i.kartyshov@postgrespro.ru> Author: Alexander Korotkov <aekorotkov@gmail.com> Author: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Alexander Lakhin <exclusion@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Euler Taveira <euler@eulerto.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>	2025-11-05 11:44:13 +02:00
Andres Freund	dae00f333b	aio: Improve assertions related to io_method First, the assertions in assign_io_method() were the wrong way round. Second, the lengthof() assertion checked the length of io_method_options, which is the wrong array to check and is always longer than pgaio_method_ops_table. While add it, add a static assert to ensure pgaio_method_ops_table and io_method_options stay in sync. Per coverity and Tom Lane. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Backpatch-through: 18	2025-11-04 20:03:53 -05:00
Masahiko Sawada	8ae0f6a0c3	Add CHECK_FOR_INTERRUPTS in Evict{Rel,All}UnpinnedBuffers. This commit adds CHECK_FOR_INTERRUPTS to the shared buffer iteration loops in EvictRelUnpinnedBuffers and EvictAllUnpinnedBuffers. These functions, used by pg_buffercache's pg_buffercache_evict_relation and pg_buffercache_evict_all, can now be interrupted during long-running operations. Backpatch to version 18, where these functions and their corresponding pg_buffercache functions were introduced. Author: Yuhang Qiu <iamqyh@gmail.com> Discussion: https://postgr.es/m/8DC280D4-94A2-4E7B-BAB9-C345891D0B78%40gmail.com Backpatch-through: 18	2025-11-04 15:47:25 -08:00
Peter Eisentraut	e1ac846f3d	Mark ItemPointer arguments as const throughout This is a follow up `991295f`. I searched over src/ and made all ItemPointer arguments as const as much as possible. Note: We cut out from the original patch the pieces that would have created incompatibilities in the index or table AM APIs. Those could be considered separately. Author: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/CAEoWx2nBaypg16Z5ciHuKw66pk850RFWw9ACS2DqqJ_AkKeRsw%40mail.gmail.com	2025-10-30 14:12:06 +01:00
Peter Eisentraut	3479a0f823	const-qualify ItemPointer comparison functions Add const qualifiers to ItemPointerEquals() and ItemPointerCompare(). This will allow further changes up the stack. It also complements commit `aeb767ca0b`, as we now have all of itemptr.h appropriately const-qualified. Author: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAEoWx2nBaypg16Z5ciHuKw66pk850RFWw9ACS2DqqJ_AkKeRsw@mail.gmail.com	2025-10-30 10:13:47 +01:00
Nathan Bossart	123661427b	Fix a couple of comments. These were discovered while reviewing Aleksander Alekseev's proposed changes to pgindent. Oversights in commits `393e0d2314` and `25a30bbd42`. Discussion: https://postgr.es/m/aP-H6kSsGOxaB21k%40nathan	2025-10-27 10:30:05 -05:00
Peter Eisentraut	76acf4b722	Remove Item type This type is just char * underneath, it provides no real value, no type safety, and just makes the code one level more mysterious. It is more idiomatic to refer to blobs of memory by a combination of void * and size_t, so change it to that. Also, since this type hides the pointerness, we can't apply qualifiers to what is pointed to, which requires some unconstify nonsense. This change allows fixing that. Extension code that uses the Item type can change its code to use void * to be backward compatible. Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Peter Geoghegan <pg@bowt.ie> Discussion: https://www.postgresql.org/message-id/flat/c75cccf5-5709-407b-a36a-2ae6570be766@eisentraut.org	2025-10-27 09:55:59 +01:00
Álvaro Herrera	b7cc6474e9	Make smgr access for a BufferManagerRelation safer in relcache inval Currently there's no bug, because we have no code path where we invalidate relcache entries where it'd cause a problem. But it's more robust to do it this way in case we introduce such a path later, as some Postgres forks reportedly already have. Author: Daniil Davydov <3danissimo@gmail.com> Reviewed-by: Stepan Neretin <slpmcf@gmail.com> Discussion: https://postgr.es/m/CAJDiXgj3FNzAhV+jjPqxMs3jz=OgPohsoXFj_fh-L+nS+13CKQ@mail.gmail.com	2025-10-21 10:51:55 +03:00
Michael Paquier	e4e496e88c	Fix comment in pg_get_shmem_allocations_numa() The comment fixed in this commit described the function as dealing with database blocks, but in reality it processes shared memory allocations. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/aH4DDhdiG9Gi0rG7@ip-10-97-1-34.eu-west-3.compute.internal Backpatch-through: 18	2025-10-21 16:12:30 +09:00
Tom Lane	92cf557ffa	Add static assertion that RELSEG_SIZE fits in an int. Our configure script intended to ensure this, but it supposed that expr(1) would report an error for integer overflow. Maybe that was true when the code was written (commit `3c6248a82` of 2008-05-02), but all the modern expr's I tried will deliver bigger-than-int32 results without complaint. Moreover, if you use --with-segsize-blocks then there's no check at all. Ideally we'd add a test in configure itself to check that the value fits in int, but to do that we'd need to suppose that test(1) handles bigger-than-int32 numbers correctly. Probably modern ones do, but that's an assumption I could do without; and I'm not too trusting about meson either. Instead, let's install a static assertion, so that even people who ignore all the compiler warnings you get from such values will be forced to confront the fact that it won't work. This has been hazardous for awhile, but given that we hadn't heard a complaint about it till now, I don't feel a need to back-patch. Reported-by: Casey Shobe <casey.allen.shobe@icloud.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/C5DC82D6-C76D-4E8F-BC2E-DF03EFC4FA24@icloud.com	2025-10-19 18:28:46 -04:00
Andres Freund	c819d1017d	bufmgr: Fix valgrind checking for buffers pinned in StrategyGetBuffer() In `5e89985928` I made StrategyGetBuffer() pin buffers with a single CAS, instead of using PinBuffer_Locked(). Unfortunately I missed that PinBuffer_Locked() marked the page as defined for valgrind. Fix this oversight by centralizing the valgrind initialization into TrackNewBufferPin(), which also allows us to reduce the number of places doing VALGRIND_MAKE_MEM_DEFINED. Per buildfarm animal skink and Amit Langote. Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff Discussion: https://postgr.es/m/CA+HiwqGKJ6nEXEPQW7EpykVsEtzxp5-up_xhtcUAkWFtATVQvQ@mail.gmail.com	2025-10-09 19:17:13 -04:00
Andres Freund	5e89985928	bufmgr: Don't lock buffer header in StrategyGetBuffer() Previously StrategyGetBuffer() acquired the buffer header spinlock for every buffer, whether it was reusable or not. If reusable, it'd be returned, with the lock held, to GetVictimBuffer(), which then would pin the buffer with PinBuffer_Locked(). That's somewhat violating the spirit of the guidelines for holding spinlocks (i.e. that they are only held for a few lines of consecutive code) and necessitates using PinBuffer_Locked(), which scales worse than PinBuffer() due to holding the spinlock. This alone makes it worth changing the code. However, the main reason to change this is that a future commit will make PinBuffer_Locked() slower (due to making UnlockBufHdr() slower), to gain scalability for the much more common case of pinning a pre-existing buffer. By pinning the buffer with a single atomic operation, iff the buffer is reusable, we avoid any potential regression for miss-heavy workloads. There strictly are fewer atomic operations for each potential buffer after this change. The price for this improvement is that freelist.c needs two CAS loops and needs to be able to set up the resource accounting for pinned buffers. The latter is achieved by exposing a new function for that purpose from bufmgr.c, that seems better than exposing the entire private refcount infrastructure. The improvement seems worth the complexity. Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff	2025-10-08 17:04:07 -04:00
Andres Freund	3baae90013	bufmgr: fewer calls to BufferDescriptorGetContentLock We're planning to merge buffer content locks into BufferDesc.state. To reduce the size of that patch, centralize calls to BufferDescriptorGetContentLock(). The biggest part of the change is in assertions, by introducing BufferIsLockedByMe[InMode]() (and removing BufferIsExclusiveLocked()). This seems like an improvement even without aforementioned plans. Additionally replace some direct calls to LWLockAcquire() with calls to LockBuffer(). Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff	2025-10-08 16:06:19 -04:00
Andres Freund	2a2e1b470b	bufmgr: Fix signedness of mask variable in BufferSync() BM_PERMANENT is defined as 1U<<31, which is a negative number when interpreted as a signed integer. Unfortunately the mask variable in BufferSync() was signed. This has been wrong for a long time, but failed to fail, due to integer conversion rules. However, in an upcoming patch the width of the state variable will be increased, with the wrong signedness leading to never flushing permanent buffers - luckily caught in a test. It seems better to fix this separately, instead of doing so as part of a large, otherwise mechanical, patch. Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff	2025-10-08 14:34:30 -04:00
Andres Freund	3c2b97b29e	bufmgr: Introduce FlushUnlockedBuffer There were several copies of code locking a buffer, flushing its contents, and unlocking the buffer. It seems worth centralizing that into a helper function. Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff	2025-10-08 14:34:30 -04:00
Andres Freund	819dc118c0	Improve ReadRecentBuffer() scalability While testing a new potential use for ReadRecentBuffer(), Andres reported that it scales badly when called concurrently for the same buffer by many backends. Instead of a naive (but wrong) coding with PinBuffer(), it used the spinlock, so that it could be careful to pin only if the buffer was valid and holding the expected block, to avoid breaking invariants in eg GetVictimBuffer(). Unfortunately that made it less scalable than PinBuffer(), which uses compare-exchange instead. We can fix that by giving PinBuffer() a new skip_if_not_valid mode that doesn't pin invalid buffers. It might occasionally skip when it shouldn't due to the unlocked read of the header flags, but that's unlikely and perfectly acceptable for an opportunistic optimisation routine, and it can only succeed when it really should due to the compare-exchange loop. Note that this fixes ReadRecentBuffer()'s failure to bump the usage count. While this could be seen as a bug, there currently aren't cases affected by this in core, so it doesn't seem worth backpatching that portion. Author: Thomas Munro <thomas.munro@gmail.com> Reported-by: Andres Freund <andres@anarazel.de> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/20230627020546.t6z4tntmj7wmjrfh%40awork3.anarazel.de Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff	2025-10-08 13:10:40 -04:00
Nathan Bossart	74b41f5a77	Make some use of anonymous unions [DSM registry]. Make some use of anonymous unions, which are allowed as of C11, as examples and encouragement for future code, and to test compilers. This commit changes the DSMRegistryEntry struct. Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/aNKsDg0fJwqhZdXX%40nathan	2025-10-03 10:14:33 -05:00
Álvaro Herrera	7e638d7f50	Don't include execnodes.h in replication/conflict.h ... which silently propagates a lot of headers into many places via pgstat.h, as evidenced by the variety of headers that this patch needs to add to seemingly random places. Add a minimum of typedefs to conflict.h to be able to remove execnodes.h, and fix the fallout. Backpatch to 18, where conflict.h first appeared. Discussion: https://postgr.es/m/202509191927.uj2ijwmho7nv@alvherre.pgsql	2025-09-25 14:52:41 +02:00
Peter Eisentraut	a5b35fcedb	Remove PointerIsValid() This doesn't provide any value over the standard style of checking the pointer directly or comparing against NULL. Also remove related: - AllocPointerIsValid() [unused] - IndexScanIsValid() [had one user] - HeapScanIsValid() [unused] - InvalidRelation [unused] Leaving HeapTupleIsValid(), ItemIdIsValid(), PortalIsValid(), RelationIsValid for now, to reduce code churn. Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/ad50ab6b-6f74-4603-b099-1cd6382fb13d%40eisentraut.org Discussion: https://www.postgresql.org/message-id/CA+hUKG+NFKnr=K4oybwDvT35dW=VAjAAfiuLxp+5JeZSOV3nBg@mail.gmail.com Discussion: https://www.postgresql.org/message-id/bccf2803-5252-47c2-9ff0-340502d5bd1c@iki.fi	2025-09-24 15:17:20 +02:00
Nathan Bossart	c3cc2ab87d	Fix re-initialization of LWLock-related shared memory. When shared memory is re-initialized after a crash, the named LWLock tranche request array that was copied to shared memory will no longer be accessible. To fix, save the pointer to the original array in postmaster's local memory, and switch to it when re-initializing the LWLock-related shared memory. Oversight in commit `ed1aad15e0`. Per buildfarm member batta. Reported-by: Michael Paquier <michael@paquier.xyz> Reported-by: Alexander Lakhin <exclusion@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/aMoejB3iTWy1SxfF%40paquier.xyz Discussion: https://postgr.es/m/f8ca018f-3479-49f6-a92c-e31db9f849d7%40gmail.com	2025-09-18 09:55:39 -05:00
Andres Freund	0110e2ec5c	Mark shared buffer lookup table HASH_FIXED_SIZE StrategyInitialize() calls InitBufTable() with maximum number of entries that the buffer lookup table can ever have. Thus there should not be any need to allocate more element after initialization. Hence mark the hash table as fixed sized. Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://postgr.es/m/CAExHW5v0jh3F_wj86yC=qBfWk0uiT94qy=Z41uzAHLHh0SerRA@mail.gmail.com	2025-09-17 20:28:43 -04:00
Michael Paquier	158c48303e	Fix shared memory calculation size of PgAioCtl The shared memory size was calculated based on an offset of io_handles, which is itself a pointer included in the structure. We tend to overestimate the shared memory size overall, so this was unlikely an issue in practice, but let's be correct and use the full size of the structure in the calculation, so as the pointer for io_handles is included. Oversight in `da7226993f`. Author: Madhukar Prasad <madhukarprasad@google.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/CAKi+wrbC2dTzh_vKJoAZXV5wqTbhY0n4wRNpCjJ=e36aoo0kFw@mail.gmail.com Backpatch-through: 18	2025-09-17 09:33:32 +09:00
Peter Eisentraut	2aac62be8c	Default to log_lock_waits=on If someone is stuck behind a lock for more than a second, that is almost always a problem that is worth a log entry. Author: Laurenz Albe <laurenz.albe@cybertec.at> Reviewed-By: Michael Banck <mbanck@gmx.net> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Christoph Berg <myon@debian.org> Reviewed-By: Stephen Frost <sfrost@snowman.net> Discussion: https://postgr.es/m/b8b8502915e50f44deb111bc0b43a99e2733e117.camel%40cybertec.at	2025-09-12 07:57:06 +02:00
Peter Eisentraut	25f36066dd	Remove traces of support for Sun Studio compiler Per discussion, this compiler suite is no longer maintained, and it has not been able to compile PostgreSQL since at least PostgreSQL 17. This removes all the remaining support code for this compiler. Note that the Solaris operating system continues to be supported, but using GCC as the compiler. Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/a0f817ee-fb86-483a-8a14-b6f7f5991b6e%40eisentraut.org	2025-09-12 07:39:05 +02:00
Nathan Bossart	ed1aad15e0	Move named LWLock tranche requests to shared memory. In EXEC_BACKEND builds, GetNamedLWLockTranche() can segfault when called outside of the postmaster process, as it might access NamedLWLockTrancheRequestArray, which won't be initialized. Given the lack of reports, this is apparently unusual, presumably because it is usually called from a shmem_startup_hook like this: mystruct = ShmemInitStruct(..., &found); if (!found) { mystruct->locks = GetNamedLWLockTranche(...); ... } This genre of shmem_startup_hook evades the aforementioned segfaults because the struct is initialized in the postmaster, so all other callers skip the !found path. We considered modifying the documentation or requiring GetNamedLWLockTranche() to be called from the postmaster, but ultimately we decided to simply move the request array to shared memory (and add it to the BackendParameters struct), thereby allowing calls outside postmaster on all platforms. Since the main shared memory segment is initialized after accepting LWLock tranche requests, postmaster builds the request array in local memory first and then copies it to shared memory later. Given the lack of reports, back-patching seems unnecessary. Reported-by: Sami Imseih <samimseih@gmail.com> Reviewed-by: Sami Imseih <samimseih@gmail.com> Discussion: https://postgr.es/m/CAA5RZ0v1_15QPg5Sqd2Qz5rh_qcsyCeHHmRDY89xVHcy2yt5BQ%40mail.gmail.com	2025-09-11 16:13:55 -05:00
Jeff Davis	9af672bcb2	meson: build checksums with extra optimization flags. Use -funroll-loops and -ftree-vectorize when building checksum.c to match what autoconf does. Discussion: https://postgr.es/m/a81f2f7ef34afc24a89c613671ea017e3651329c.camel@j-davis.com Reviewed-by: Andres Freund <andres@anarazel.de>	2025-09-08 12:29:42 -07:00
Andres Freund	2c78940527	bufmgr: Remove freelist, always use clock-sweep This set of changes removes the list of available buffers and instead simply uses the clock-sweep algorithm to find and return an available buffer. This also removes the have_free_buffer() function and simply caps the pg_autoprewarm process to at most NBuffers. While on the surface this appears to be removing an optimization it is in fact eliminating code that induces overhead in the form of synchronization that is problematic for multi-core systems. The main reason for removing the freelist, however, is not the moderate improvement in scalability, but that having the freelist would require dedicated complexity in several upcoming patches. As we have not been able to find a case benefiting from the freelist... Author: Greg Burd <greg@burd.me> Reviewed-by: Tomas Vondra <tomas@vondra.me> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/70C6A5B5-2A20-4D0B-BC73-EB09DD62D61C@getmailspring.com	2025-09-05 12:25:59 -04:00
Andres Freund	50e4c6ace5	bufmgr: Use consistent naming of the clock-sweep algorithm Minor edits to comments only. Author: Greg Burd <greg@burd.me> Reviewed-by: Tomas Vondra <tomas@vondra.me> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/70C6A5B5-2A20-4D0B-BC73-EB09DD62D61C@getmailspring.com	2025-09-05 12:25:59 -04:00
Nathan Bossart	d814d7fc3d	Revert recent change to RequestNamedLWLockTranche(). Commit `38b602b028` modified this function to allocate enough space for MAX_NAMED_TRANCHES (256) requests, which is likely far more than most clusters need. This commit reverts that change so that it first allocates enough space for only 16 requests and resizes the array when necessary. While at it, remove the check for too many tranches from this function. We can now rely on InitializeLWLocks() to do that check via its calls to LWLockNewTrancheId() for the named tranches. Reviewed-by: Sami Imseih <samimseih@gmail.com> Discussion: https://postgr.es/m/aLmzwC2dRbqk14y6%40nathan	2025-09-04 15:34:48 -05:00
Nathan Bossart	38b602b028	Move dynamically-allocated LWLock tranche names to shared memory. There are two ways for shared libraries to allocate their own LWLock tranches. One way is to call RequestNamedLWLockTranche() in a shmem_request_hook, which requires the library to be loaded via shared_preload_libraries. The other way is to call LWLockNewTrancheId(), which is not subject to the same restrictions. However, LWLockNewTrancheId() does require each backend to store the tranche's name in backend-local memory via LWLockRegisterTranche(). This API is a little cumbersome and leads to things like unhelpful pg_stat_activity.wait_event values in backends that haven't loaded the library. This commit moves these LWLock tranche names to shared memory, thus eliminating the need for each backend to call LWLockRegisterTranche(). Instead, the tranche name must be provided to LWLockNewTrancheId(), which immediately makes the name available to all backends. Since the tranche name array is append-only, lookups can ordinarily avoid locking as long as their local copy of the LWLock counter is greater than the requested tranche ID. One downside of this approach is that we now have a hard limit on both the length of tranche names (NAMEDATALEN-1 bytes) and the number of dynamically-allocated tranches (256). Besides a limit of NAMEDATALEN-1 bytes for tranche names registered via RequestNamedLWLockTranche(), no such limits previously existed. We could avoid these new limits by using dynamic shared memory, but the complexity involved didn't seem worth it. We briefly considered making the tranche limit user-configurable but ultimately decided against that, too. Since there is still a lot of time left in the v19 development cycle, it's possible we will revisit this choice. Author: Sami Imseih <samimseih@gmail.com> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Rahila Syed <rahilasyed90@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAA5RZ0vvED3naph8My8Szv6DL4AxOVK3eTPS0qXsaKi%3DbVdW2A%40mail.gmail.com	2025-09-03 13:57:48 -05:00
Nathan Bossart	5487058b56	Prepare DSM registry for upcoming changes to LWLock tranche names. A proposed patch would place a limit of NAMEDATALEN-1 (i.e., 63) bytes on the names of dynamically-allocated LWLock tranches, but GetNamedDSA() and GetNamedDSHash() may register tranches with longer names. This commit lowers the maximum DSM registry entry name length to NAMEDATALEN-1 bytes and modifies GetNamedDSHash() to create only one tranche, thereby allowing us to keep the DSM registry's tranche names below NAMEDATALEN bytes. Author: Sami Imseih <samimseih@gmail.com> Discussion: https://postgr.es/m/aKzIg1JryN1qhNuy%40nathan	2025-08-29 20:34:53 -05:00
Tom Lane	f727b63e81	Provide error context when an error is thrown within WaitOnLock(). Show the requested lock level and the object being waited on, in the same format we use for deadlock reports and similar errors. This is particularly helpful for debugging lock-timeout errors, since otherwise the user has very little to go on about which lock timed out. The performance cost of setting up the callback should be negligible compared to the other tracing support already present in WaitOnLock. As in the deadlock-report case, we just show numeric object OIDs, because it seems too scary to try to perform catalog lookups in this context. Reported-by: Steve Baldwin <steve.baldwin@gmail.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/1602369.1752167154@sss.pgh.pa.us	2025-08-29 15:43:34 -04:00
Nathan Bossart	67fcf48c3b	Make LWLockCounter a global variable. Using the LWLockCounter requires first calculating its address in shared memory like this: LWLockCounter = (int ) ((char ) MainLWLockArray - sizeof(int)); Commit `82e861fbe1` started this trend in order to fix EXEC_BACKEND builds, but it could also be fixed by adding it to the BackendParameters struct. The current approach is somewhat difficult to follow, so this commit switches to the latter. While at it, swap around the code in LWLockShmemSize() to match the order of assignments in CreateLWLocks() for added readability. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/aLDLnan9gNCS9fHx%40nathan	2025-08-29 12:13:37 -05:00
Peter Eisentraut	991295f387	Mark ItemPointer arguments as const in tuple/table lock functions The functions LockTuple, ConditionalLockTuple, UnlockTuple, and XactLockTableWait take an ItemPointer argument that they do not modify, so the argument can be const-qualified to better convey intent and allow the compiler to enforce immutability. Author: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAEoWx2m9e4rECHBwpRE4%2BGCH%2BpbYZXLh2f4rB1Du5hDfKug%2BOg%40mail.gmail.com	2025-08-29 07:39:58 +02:00
Álvaro Herrera	325fc0ab14	Avoid including commands/dbcommands.h in so many places This has been done historically because of get_database_name (which since commit `cb98e6fb8f` belongs in lsyscache.c/h, so let's move it there) and get_database_oid (which is in the right place, but whose declaration should appear in pg_database.h rather than dbcommands.h). Clean this up. Also, xlogreader.h and stringinfo.h are no longer needed by dbcommands.h since commit `f1fd515b39`, so remove them. Author: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/202508191031.5ipojyuaswzt@alvherre.pgsql	2025-08-28 12:39:04 +02:00
Andres Freund	5865150b6d	aio: Stop using enum bitfields due to bad code generation During an investigation into rather odd aio related errors on macos, observed by Alexander and Konstantin, we started to wonder if bitfield access is related to the error. At the moment it looks like it is related, we cannot reproduce the failures when replacing the bitfields. In addition, the problem can only be reproduced with some compiler [versions] and not everyone has been able to reproduce the issue. The observed problem is that, very rarely, PgAioHandle->{state,target} are in an inconsistent state, after having been checked to be in a valid state not long before, triggering an assertion failure. Unfortunately, this could be caused by wrong compiler code generation or somehow of missing memory barriers - we don't really know. In theory there should not be any concurrent write access to the handle in the state the bug is triggered, as the handle was idle and is just being initialized. Separately from the bug, we observed that at least gcc and clang generate rather terrible code for the bitfield access. Even if it's not clear if the observed assertion failure is actually caused by the bitfield somehow, the bad code generation alone is sufficient reason to stop using bitfields. Therefore, replace the enum bitfields with uint8s and instead cast in each switch statement. Reported-by: Alexander Lakhin <exclusion@gmail.com> Reported-by: Konstantin Knizhnik <knizhnik@garret.ru> Discussion: https://postgr.es/m/1500090.1745443021@sss.pgh.pa.us Backpatch-through: 18	2025-08-27 19:12:11 -04:00
Peter Eisentraut	e567e22290	Message style improvements Mostly adding some quoting.	2025-08-26 22:52:11 +02:00
Michael Paquier	13b935cd52	Change dynahash.c and hsearch.h to use int64 instead of long This code was relying on "long", which is signed 8 bytes everywhere except on Windows where it is 4 bytes, that could potentially expose it to overflows, even if the current uses in the code are fine as far as I know. This code is now able to rely on the same sizeof() variable everywhere, with int64. long was used for sizes, partition counts and entry counts. Some callers of the dynahash.c routines used long declarations, that can be cleaned up to use int64 instead. There was one shortcut based on SIZEOF_LONG, that can be removed. long is entirely removed from dynahash.c and hsearch.h. Similar work was done in `b1e5c9fa9a`. Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/aKQYp-bKTRtRauZ6@paquier.xyz	2025-08-22 11:59:02 +09:00
Peter Eisentraut	47932f3cdc	Use consistent type for pgaio_io_get_id() result The result of pgaio_io_get_id() was being assigned to a mix of int and uint32 variables. This fixes it to use int consistently, which seems the most correct. Also change the queue empty special value in method_worker.c to -1 from UINT32_MAX. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/70c784b3-f60b-4652-b8a6-75e5f051243e%40eisentraut.org	2025-08-21 19:45:25 +02:00
Nathan Bossart	3eec0e6533	Fix comment for MAX_SIMUL_LWLOCKS. This comment mentions that pg_buffercache locks all buffer partitions simultaneously, but it hasn't done so since v10. Oversight in commit `6e654546fb`. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/aKTuAHVEuYCUmmIy%40nathan	2025-08-19 16:48:22 -05:00
Tom Lane	2a600a93c7	Make type Datum be 8 bytes wide everywhere. This patch makes sizeof(Datum) be 8 on all platforms including 32-bit ones. The objective is to allow USE_FLOAT8_BYVAL to be true everywhere, and in consequence to remove a lot of code that is specific to pass-by-reference handling of float8, int8, etc. The code for abbreviated sort keys can be simplified similarly. In this way we can reduce the maintenance effort involved in supporting 32-bit platforms, without going so far as to actually desupport them. Since Datum is strictly an in-memory concept, this has no impact on on-disk storage, though an initdb or pg_upgrade will be needed to fix affected catalog entries. We have required platforms to support [u]int64 for ages, so this breaks no supported platform. We can expect that this change will make 32-bit builds a bit slower and more memory-hungry, although being able to use pass-by-value handling of 8-byte types may buy back some of that. But we stopped optimizing for 32-bit cases a long time ago, and this seems like just another step on that path. This initial patch simply forces the correct type definition and USE_FLOAT8_BYVAL setting, and cleans up a couple of minor compiler complaints that ensued. This is sufficient for testing purposes. In the wake of a bunch of Datum-conversion cleanups by Peter Eisentraut, this now compiles cleanly with gcc on a 32-bit platform. (I'd only tested the previous version with clang, which it turns out is less picky than gcc about width-changing coercions.) There is a good deal of now-dead code that I'll remove in separate follow-up patches. A catversion bump is required because this affects initial catalog contents (on 32-bit machines) in two ways: pg_type.typbyval changes for some built-in types, and Const nodes in stored views/rules will now have 8 bytes not 4 for pass-by-value types. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/1749799.1752797397@sss.pgh.pa.us	2025-08-13 17:18:22 -04:00
Thomas Munro	b421223172	Fix rare bug in read_stream.c's split IO handling. The internal queue of buffers could become corrupted in a rare edge case that failed to invalidate an entry, causing a stale buffer to be "forwarded" to StartReadBuffers(). This is a simple fix for the immediate problem. A small API change might be able to remove this and related fragility entirely, but that will have to wait a bit. Defect in commit `ed0b87ca`. Bug: 19006 Backpatch-through: 18 Reported-by: Alexander Lakhin <exclusion@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com> Discussion: https://postgr.es/m/19006-80fcaaf69000377e%40postgresql.org	2025-08-09 13:04:38 +12:00
Peter Eisentraut	ff89e182d4	Add missing Datum conversions Add various missing conversions from and to Datum. The previous code mostly relied on implicit conversions or its own explicit casts instead of using the correct DatumGet() or GetDatum() functions. We think these omissions are harmless. Some actual bugs that were discovered during this process have been committed separately (`80c758a2e1`, `fd2ab03fea`). Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/8246d7ff-f4b7-4363-913e-827dadfeb145%40eisentraut.org	2025-08-08 22:06:57 +02:00
Thomas Munro	b5cd74612c	Remove obsolete comment. Remove a comment about potential for AIO in StartReadBuffersImpl(), because that change happened.	2025-08-09 01:46:04 +12:00
Masahiko Sawada	b5c53b403c	Suppress maybe-uninitialized warning. Following commit `e035863c9a`, building with -O0 began triggering warnings about potentially uninitialized 'workbuf' usage. While theoretically the initialization isn't necessary since VARDATA() doesn't access the contents of the pointed-to object, this commit explicitly initializes the workbuf variable to suppress the warning. Buildfarm members adder and flaviventris have shown the warning. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAD21AoCOZxfqnNgfM5yVKJZYnOq5m2Q96fBGy1fovEqQ9V4OZA@mail.gmail.com	2025-08-05 15:30:28 -07:00
Peter Eisentraut	2ad6e80de9	Fix various hash function uses These instances were using Datum-returning functions where a lower-level function returning uint32 would be more appropriate. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/8246d7ff-f4b7-4363-913e-827dadfeb145%40eisentraut.org	2025-08-05 11:47:23 +02:00
Tom Lane	9e9190154e	Fix MemoryContextAllocAligned's interaction with Valgrind. Arrange that only the "aligned chunk" part of the allocated space is included in a Valgrind vchunk. This suppresses complaints about that vchunk being possibly lost because PG is retaining only pointers to the aligned chunk. Also make sure that trailing wasted space is marked NOACCESS. As a tiny performance improvement, arrange that MCXT_ALLOC_ZERO zeroes only the returned "aligned chunk", not the wasted padding space. In passing, fix GetLocalBufferStorage to use MemoryContextAllocAligned instead of rolling its own implementation, which was equally broken according to Valgrind. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us	2025-08-02 21:59:46 -04:00
Heikki Linnakangas	613f647122	Handle cancel requests with PID 0 gracefully If the client sent a query cancel request with backend PID 0, it tripped an assertion. With assertions disabled, you got this in the log instead: LOG: invalid cancel request with PID 0 LOG: wrong key in cancel request for process 0 Query cancellations don't even require authentication, so we better tolerate bogus requests. Fix by turning the assertion into a regular runtime check. Spotted while testing libpq behavior with a modified server that didn't send BackendKeyData to the client. Backpatch-through: 18	2025-07-30 00:39:49 +03:00

1 2 3 4 5 ...

2951 Commits