Before this commit fsm_vacuum_page() modified the page without holding any
lock on it. Historically that was kind of OK, as we didn't rely on the
freespace to really stay consistent and we did not have checksums. But these
days pages are checksummed and there are ways for FSM pages to be included
in WAL records, even if the FSM itself is still not WAL-logged. If an FSM
page were ever modified while a WAL record referenced that page, we'd be in
trouble, as the WAL CRC could end up getting corrupted.
The reason to address this right now is a series of patches with the goal
of only allowing modifications of pages while holding an appropriate lock
level. Obviously not having any lock is not appropriate :)
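As a sketch, the fix amounts to taking the buffer lock around the
modification; the call sequence below is illustrative only, using the
standard bufmgr and fsmpage routines:

    /* illustrative: lock the FSM page before modifying it */
    LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
    fsm_set_avail(BufferGetPage(buf), slot, new_cat);
    MarkBufferDirtyHint(buf, false);
    LockBuffer(buf, BUFFER_LOCK_UNLOCK);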
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/4wggb7purufpto6x35fd2kwhasehnzfdy3zdcu47qryubs2hdz@fa5kannykekr
Discussion: https://postgr.es/m/e6a8f734-2198-4958-a028-aba863d4a204@iki.fi
The only user-visible change is the fix in the "malformed
pg_dependencies" error detail. That one is new in commit e1405aa5e3,
so no backpatching required.
Background workers gain a new flag, called BGWORKER_INTERRUPTIBLE, that
makes it possible to terminate the workers when they are connected to a
database involved in one of the following commands:
ALTER DATABASE RENAME TO
ALTER DATABASE SET TABLESPACE
CREATE DATABASE
DROP DATABASE
This is useful to give background workers the same behavior as backends
and autovacuum workers, which are stopped when these commands are
executed. The default behavior, which has existed since 9.3, is still to
never terminate bgworkers connected to the database involved in any of
these commands. The new flag has to be set to terminate the workers.
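For illustration, a worker opting into the new behavior sets the flag at
registration time; this sketch assumes the usual registration fields from
bgworker.h and omits the library/function setup:

    BackgroundWorker worker = {0};

    worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
                       BGWORKER_BACKEND_DATABASE_CONNECTION |
                       BGWORKER_INTERRUPTIBLE;    /* opt into termination */
    snprintf(worker.bgw_name, BGW_MAXLEN, "my interruptible worker");
    RegisterBackgroundWorker(&worker);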
A couple of tests are added to worker_spi to track the commands that
impact the termination of the workers. Additionally, there is a test case
for a non-interruptible worker that relies on an injection point to reduce
the wait time in CountOtherDBBackends() from 5s to 0.3s for faster test
runs. The tests rely on the contents of the server logs to check if a
worker has been started or terminated:
- LOG generated by worker_spi_main() at startup, once the connection to
the database is done.
- FATAL in bgworker_die() when terminated.
A couple of test runs in the CI have shown that this method is stable
enough. The safe_psql() calls that scan pg_stat_activity could be
replaced with some poll_query_until() for more stability, if the current
method proves to be an issue in the buildfarm.
Author: Aya Iwata <iwata.aya@fujitsu.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Ryo Matsumura <matsumura.ryo@fujitsu.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Discussion: https://postgr.es/m/OS7PR01MB11964335F36BE41021B62EAE8EAE4A@OS7PR01MB11964.jpnprd01.prod.outlook.com
Commit f54af9f267 added a check for
io_uring_queue_init_mem(). However, it used the macro name
HAVE_LIBURING_QUEUE_INIT_MEM in both meson.build and the C code, while
the Autotools build script defined HAVE_IO_URING_QUEUE_INIT_MEM. As a
result, the optimization was never enabled in builds configured with
Autotools, as the C code checked for the wrong macro name.
This commit changes the macro name to HAVE_IO_URING_QUEUE_INIT_MEM in
meson.build and the C code. This matches the actual function
name (io_uring_queue_init_mem), following the standard HAVE_<FUNCTION>
convention.
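A minimal sketch of the corrected guard in the C code:

    #ifdef HAVE_IO_URING_QUEUE_INIT_MEM
        /* liburing provides io_uring_queue_init_mem(); use it */
    #endif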
Backpatch to 18, where the macro was introduced.
Bug: #19368
Reported-by: Evan Si <evsi@amazon.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19368-016d79a7f3a1c599@postgresql.org
Backpatch-through: 18
This change has the advantage of removing some weird type casts, caused
by offset calculations based on pgoff_t but saved as int (on older
branches we use off_t, which could be 4 or 8 bytes depending on the
environment). These are currently safe because they are capped by
MAX_PHYSICAL_FILESIZE, but we would run into problems if we made
MAX_PHYSICAL_FILESIZE larger or allowed callers of these routines to use a
larger physical max size on demand.
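For illustration, the kind of narrowing involved looks like this (the
variable names are hypothetical):

    pgoff_t     curOffset = file->curOffset;    /* 8-byte file offset */

    /*
     * Narrowing to int is safe only while offsets within one physical
     * file are capped at MAX_PHYSICAL_FILESIZE (1GB); it would overflow
     * if that cap were ever raised beyond INT_MAX.
     */
    int         chunk = (int) (curOffset % MAX_PHYSICAL_FILESIZE);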
While on it, this improves BufFileDumpBuffer() so as not to use an offset
type for "availbytes", which is not a file offset per se, but a number of
available bytes.
This change should lead to no functional changes.
Author: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aUStrqoOCDRFAq1M@paquier.xyz
Previously logical decoding required wal_level to be set to 'logical'
at server start. This meant that users had to incur the overhead of
logical-level WAL logging even when no logical replication slots were
in use.
This commit adds functionality to automatically control logical decoding
availability based on the presence of logical replication slots. The newly
introduced module logicalctl.c allows logical decoding to be activated
dynamically when needed while wal_level is set to 'replica'.
When the first logical replication slot is created, the system
automatically increases the effective WAL level to maintain
logical-level WAL records. Conversely, after the last logical slot is
dropped or invalidated, it decreases back to 'replica' WAL level.
While activation occurs synchronously right after creating the first
logical slot, deactivation happens asynchronously through the checkpointer
process. This design avoids a race condition at the end of recovery: a
concurrent deactivation could happen while the startup process enables
logical decoding, yet WAL writes are not permitted until recovery fully
completes, so the checkpointer handles the deactivation once recovery is
done. Asynchronous deactivation also avoids excessive toggling of the
logical decoding status in workloads that repeatedly create and drop a
single logical slot. On the other hand, this lazy approach can delay
changes to effective_wal_level and the disabling of logical decoding,
especially when the checkpointer is busy with other tasks. We chose this
lazy approach in all deactivation paths to keep the implementation simple,
even though laziness is strictly required only for end-of-recovery cases.
Future work might address this limitation either by using a dedicated
worker instead of the checkpointer, or by implementing synchronous waiting
during slot drops if workloads are significantly affected by the lazy
deactivation of logical decoding.
The effective WAL level, determined internally by XLogLogicalInfo, is
allowed to change within a transaction until an XID is assigned. Once
an XID is assigned, the value becomes fixed for the remainder of the
transaction. This behavior ensures that the logging mode remains
consistent within a writing transaction, similar to the behavior of
GUC parameters.
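A minimal sketch of that pinning rule; the cache variable and lookup
helper below are hypothetical, not the committed code:

    static bool
    XLogLogicalInfo(void)
    {
        /* after XID assignment, keep the value captured at that time */
        if (TransactionIdIsValid(GetTopTransactionIdIfAny()))
            return xact_logical_info;            /* hypothetical cache */
        return EffectiveWalLevelIsLogical();     /* hypothetical lookup */
    }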
A new read-only GUC parameter effective_wal_level is introduced to
monitor the actual WAL level in effect. This parameter reflects the
current operational WAL level, which may differ from the configured
wal_level setting.
Bump PG_CONTROL_VERSION, as this commit adds a new field to the
CheckPoint struct.
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/CAD21AoCVLeLYq09pQPaWs+Jwdni5FuJ8v2jgq-u9_uFbcp6UbA@mail.gmail.com
off_t was previously used for offsets, which is 4 bytes on Windows, hence
imposing on the backend code a hard limit for files larger than 2GB. This
leads to some simplification in these files, removing some casts based on
long, also 4 bytes on Windows.
This commit removes one comment introduced in db3c4c3a2d, not relevant
anymore as pgoff_t is a safe 8-byte alternative on Windows.
This change is surprisingly not invasive, as the callers of
BufFileTell(), BufFileSeek() and BufFileTruncateFileSet() (worker.c,
tuplestore.c, etc.) track offsets in local structures that, for the most
part, just need to switch from off_t to pgoff_t.
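Illustratively, the declarations involved end up with this shape (the
exact committed signatures may differ slightly):

    extern void BufFileTell(BufFile *file, int *fileno, pgoff_t *offset);
    extern int  BufFileSeek(BufFile *file, int fileno, pgoff_t offset,
                            int whence);
    extern void BufFileTruncateFileSet(BufFile *file, int fileno,
                                       pgoff_t offset);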
The file is still relying on a maximum file size of
MAX_PHYSICAL_FILESIZE (1GB). This change allows the code to make this
maximum potentially larger in the future, or larger on demand.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aUStrqoOCDRFAq1M@paquier.xyz
The main optimization is for LockBufHdr() to delay initializing
SpinDelayStatus, similar to what LWLockWaitListLock() already did. The
initialization is sufficiently expensive, and buffer header lock
acquisitions sufficiently frequent, to make it worthwhile to instead have
a fastpath (via a likely() branch) that does not initialize the
SpinDelayStatus.
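A sketch of the fastpath's shape; the out-of-line slow path name is
illustrative:

    uint32
    LockBufHdr(BufferDesc *desc)
    {
        uint32      old_buf_state;

        /* try once; the uncontended case needs no SpinDelayStatus */
        old_buf_state = pg_atomic_fetch_or_u32(&desc->state, BM_LOCKED);
        if (likely(!(old_buf_state & BM_LOCKED)))
            return old_buf_state | BM_LOCKED;

        return LockBufHdrSlow(desc);    /* hypothetical slow path */
    }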
While LWLockWaitListLock() already had the aforementioned optimization,
it did not use likely(), and inspection of the assembly shows that this
indeed leads to worse code generation (also observed in a microbenchmark).
Fix that by adding the likely().
While the LockBufHdr() improvement is a small gain on its own, it mainly is
aimed at preventing a regression after a future commit, which requires
additional locking to set hint bits.
While touching both, also make the comments more similar to each other.
Reviewed-by: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
The private refcount entry for a buffer is often looked up repeatedly,
e.g. to pin and then unpin the same buffer. Benchmarking shows that it's
worthwhile to have a one-entry cache for that case. With that cache in
place, it's worth splitting GetPrivateRefCountEntry() into a small inline
portion (for the cache-hit case) and an out-of-line helper for the rest.
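A minimal sketch of the split described above; the cache variable and
slow-path helper are illustrative:

    static PrivateRefCountEntry *last_entry = NULL;   /* one-entry cache */

    static inline PrivateRefCountEntry *
    GetPrivateRefCountEntry(Buffer buffer, bool do_move)
    {
        /* fast path: repeated lookups of the same buffer hit the cache */
        if (last_entry != NULL && last_entry->buffer == buffer)
            return last_entry;
        return GetPrivateRefCountEntrySlow(buffer, do_move);
    }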
This is helpful for some workloads today, but becomes more important in an
upcoming patch that will utilize the private refcount infrastructure to also
store whether the buffer is currently locked, as that increases the rate of
lookups substantially.
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/6rgb2nvhyvnszz4ul3wfzlf5rheb2kkwrglthnna7qhe24onwr@vw27225tkyar
This makes lookups faster by allowing them to be auto-vectorized. It is
also beneficial for an upcoming patch, independent of auto-vectorization,
as that patch wants to track more information for each pinned buffer,
which would make the existing loop over an array of PrivateRefCountEntry
more expensive due to the increased entry size.
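For illustration, keeping the search keys in their own small array gives
the compiler a tight loop over plain integers that it can vectorize (a
sketch, not the committed layout):

    static Buffer buffer_ids[REFCOUNT_ARRAY_ENTRIES];   /* keys only */
    static PrivateRefCountEntry entries[REFCOUNT_ARRAY_ENTRIES];

    for (int i = 0; i < REFCOUNT_ARRAY_ENTRIES; i++)
    {
        if (buffer_ids[i] == buffer)    /* auto-vectorizable scan */
            return &entries[i];
    }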
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
This is the last batch of changes suggested by the author, this part
covering the non-trivial ones. Some of the suggested changes have been
discarded as they seemed to lead to more instructions being generated,
keeping only the parts that can be qualified as in-place replacements.
Similar work has been done in 1b105f9472, 0c3c5c3b06 and
31d3847a37.
Author: David Geier <geidav.pg@gmail.com>
Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com
These functions took a ResourceOwner argument, but only checked if it
was NULL, and then used CurrentResourceOwner for the actual work.
Surely the intention was to use the passed-in resource owner. All
current callers passed CurrentResourceOwner or NULL, so this has no
consequences at the moment, but it's an accident waiting to happen for
future callers and extensions.
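The shape of the problem and the fix, with a hypothetical function for
illustration:

    void
    DoSomethingWithOwner(ResourceOwner owner)
    {
        if (owner == NULL)
            owner = CurrentResourceOwner;

        /* before: mistakenly kept using CurrentResourceOwner here */
        ResourceOwnerEnlarge(owner);    /* after: use the argument */
    }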
Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAEze2Whnfv8VuRZaohE-Af+GxBA1SNfD_rXfm84Jv-958UCcJA@mail.gmail.com
Backpatch-through: 17
The idea is to further encourage the use of these new routines across the
tree, as they offer stronger type safety guarantees than palloc(). This
batch of changes includes most of the trivial changes suggested by the
author for src/backend/.
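For illustration, the replacements follow this pattern (the struct names
are hypothetical; palloc_object() and palloc_array() come from
utils/palloc.h):

    /* before: untyped allocation with an explicit cast and sizeof */
    state = (ParserState *) palloc(sizeof(ParserState));
    items = (Node **) palloc(nitems * sizeof(Node *));

    /* after: type-safe equivalents */
    state = palloc_object(ParserState);
    items = palloc_array(Node *, nitems);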
A total of 334 files are updated here. Among these files, 48 see their
builds change slightly; this is caused by line number changes, as the new
allocation formulas are simpler, shaving around 100 lines of code in
total.
Similar work has been done in 0c3c5c3b06 and 31d3847a37.
Author: David Geier <geidav.pg@gmail.com>
Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com
In an upcoming patch, more wait events will be added to the wait event
class (for buffer locking), making the current name too specific.
Alternatively, we could introduce a dedicated wait event class for those,
but it seems somewhat confusing to have both a BUFFERPIN and a BUFFER wait
event class.
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
Presently, this view reports NULL for the size of DSAs and dshash
tables because 1) the current backend might not be attached to them
and 2) the registry doesn't save the pointers to the dsa_area or
dshash_table in local memory. Also, the view doesn't show
partially-initialized entries to avoid ambiguity, since those
entries would report a NULL size as well.
This commit introduces a function that looks up the size of a DSA
given its handle (transiently attaching to the control segment if
needed) and teaches pg_dsm_registry_allocations to use it to show
the size of successfully-initialized DSA and dshash entries.
Furthermore, the view now reports partially-initialized entries
with a NULL size.
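A sketch of what such a lookup can look like, using the dsa.h attach and
detach APIs; the wrapper name is illustrative, and the real code needs to
attach only transiently when not already attached:

    static int64
    get_dsa_size_by_handle(dsa_handle handle)
    {
        dsa_area   *area = dsa_attach(handle);  /* transient attachment */
        int64       size = (int64) dsa_get_total_size(area);

        dsa_detach(area);
        return size;
    }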
Reviewed-by: Rahila Syed <rahilasyed90@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aSeEDeznAsHR1_YF%40nathan
This commit introduces new internal bufmgr routines for marking shared
buffers as dirty:
* MarkDirtyUnpinnedBuffer()
* MarkDirtyRelUnpinnedBuffers()
* MarkDirtyAllUnpinnedBuffers()
These functions provide an efficient mechanism to respectively mark one
buffer, all the buffers of a relation, or the entire shared buffer pool as
dirty, something that can be useful to force specific write patterns for
the checkpointer. An extra routine, MarkDirtyUnpinnedBufferInternal(), is
used by all three to mark a single unpinned buffer as dirty.
They are intended as developer tools to manipulate buffer dirtiness in
bulk, and will be used in a follow-up commit.
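A hypothetical developer-tool usage, mirroring the shape of the existing
Evict*UnpinnedBuffers() functions; the exact committed signatures may
differ:

    int32   dirtied = 0;
    int32   skipped = 0;

    /* mark every unpinned buffer of the relation as dirty */
    MarkDirtyRelUnpinnedBuffers(rel, &dirtied, &skipped);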
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Aidar Imamov <a.imamov@postgrespro.ru>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Joseph Koshakow <koshy44@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Yuhang Qiu <iamqyh@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/CAN55FZ0h_YoSqqutxV6DES1RW8ig6wcA8CR9rJk358YRMxZFmw@mail.gmail.com
If DSM registry entry initialization fails, backends could try to
use an uninitialized DSM segment, DSA, or dshash table (since the
entry is still added to the registry). To fix, restructure the
code so that the registry retries initialization as needed. This
commit also modifies pg_get_dsm_registry_allocations() to leave out
partially-initialized entries, as they shouldn't have any allocated
memory.
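The retry shape, as a sketch; the names are illustrative, assuming the
registry's dshash table:

    entry = dshash_find_or_insert(registry_table, &key, &found);
    if (!found || !entry->initialized)
    {
        /* may ERROR; the flag stays unset so later attachers retry */
        initialize_entry(entry);
        entry->initialized = true;  /* set only after full initialization */
    }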
DSM registry entry initialization shouldn't fail often in practice,
but retrying was deemed better than leaving entries in a
permanently failed state (as was done by commit 1165a933aa, which
has since been reverted).
Suggested-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/E1vJHUk-006I7r-37%40gemulon.postgresql.org
Backpatch-through: 17
The code in LWLockWakeup() accidentally checked the list of to-be-woken
processes to see if LW_FLAG_HAS_WAITERS should be unset. That meant that
HAS_WAITERS would not get unset immediately, but only during the next,
unnecessary, call to LWLockWakeup().
Luckily, as the code stands, this is just a small efficiency issue.
However, if there were (as in a patch of mine) a case in which
LWLockWakeup() would not find any backend to wake despite the wait list
not being empty, we'd wrongly unset LW_FLAG_HAS_WAITERS, potentially
leading to a hang.
While the consequences in the backbranches are limited, the code as-is is
confusing, and it is possible that there are workloads where the
additional wait list lock acquisitions hurt; therefore, backpatch.
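The shape of the fix, illustratively: clear the flag based on the lock's
wait list itself, not on the list of processes about to be woken:

    /* clear HAS_WAITERS only when nobody remains on the wait list */
    if (proclist_is_empty(&lock->waiters))
        new_state &= ~LW_FLAG_HAS_WAITERS;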
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
Backpatch-through: 14
This changes a few union members that only existed to ensure alignment
and replaces them with the C11 alignas specifier.
This change only uses fundamental alignments (meaning approximately
alignments of basic types), which all C11 compilers must support.
There are opportunities for similar changes using extended alignments,
for example in PGIOAlignedBlock, but these are not necessarily
supported by all compilers, so they are kept as a separate change.
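Illustratively, the pattern being replaced looks like this (a hypothetical
struct, showing the shape only):

    /* before: a union member whose only purpose is forcing alignment */
    typedef union
    {
        double      force_align;
        char        data[64];
    } OldBlob;

    /* after: C11 alignas with a fundamental alignment */
    typedef struct
    {
        alignas(double) char data[64];
    } NewBlob;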
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/46f05236-d4d4-4b4e-84d4-faa500f14691%40eisentraut.org
PostgreSQL's Windows port has never been able to handle files larger than
2GB due to the use of off_t for file offsets, which is only 32-bit on
Windows. This causes signed integer overflow at exactly 2^31 bytes when
trying to handle larger files, for the routines touched by this commit.
Note that large files are forbidden by ./configure (3c6248a828) and
meson (recent change, see 79cd66f28c). This restriction also exists
in v16 and older versions for the now-dead MSVC scripts.
The code base already defines pgoff_t as __int64 (64-bit) on Windows for
this purpose, and some function declarations in headers use it, but many
internals still rely on off_t. This commit switches more routines to
use pgoff_t, offering more portability, for areas mainly related to file
extensions and storage.
These are not critical for WAL segments yet, which currently have a
maximum allowed size of 1GB (well, this opens the door to allowing a
larger size for them). This matters more for segment files if we want to
lift the large-file restriction in ./configure and meson in the future,
a restriction that would make sense to remove once/if all traces of off_t
are gone from the tree. This can additionally matter for out-of-core code
that may want files larger than 2GB in places where off_t is four bytes in
size.
Note that off_t is still used in other parts of the tree like
buffile.c, WAL sender/receiver, base backup, pg_combinebackup, etc.
These other code paths can be addressed separately, and updating them
will be required if we want to remove the large-file restriction in the
future. This commit is hopefully a good first cut in itself towards more
portability.
On Unix-like systems, pgoff_t is defined as off_t, so this change only
affects Windows behavior.
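For reference, the definition has this shape, per the description above
(see src/include/port/win32_port.h for the Windows side):

    #ifdef WIN32
    typedef __int64 pgoff_t;
    #else
    typedef off_t pgoff_t;
    #endif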
Author: Bryan Green <dbryan.green@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/0f238ff4-c442-42f5-adb8-01b762c94ca1@gmail.com
If DSM entry initialization fails, backends could try to use an
uninitialized DSM segment, DSA, or dshash table (since the entry is still
added to the registry). To fix, keep track of whether initialization
completed, and ERROR if a backend tries to attach to an uninitialized
entry. We could instead retry initialization as needed, but that seemed
complicated, error-prone, and unlikely to help in most cases.
Furthermore, such problems probably indicate a coding error.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/dd36d384-55df-4fc2-825c-5bc56c950fa9%40gmail.com
Backpatch-through: 17
Until now BufferDesc.state was not allowed to be modified while the buffer
header spinlock was held. This meant that operations like unpinning buffers
needed to use a CAS loop, waiting for the buffer header spinlock to be
released before updating.
The benefit of that restriction is that it allowed us to unlock the buffer
header spinlock with just a write barrier and an unlocked write (instead of a
full atomic operation). That was important to avoid regressions in
48354581a4. However, since then the hottest buffer header spinlock uses have
been replaced with atomic operations (in particular, the most common use of
PinBuffer_Locked(), in GetVictimBuffer() (formerly in BufferAlloc()), has been
removed in 5e89985928).
This change will allow a subsequent commit to release buffer pins with a
single atomic-sub operation. That was not possible as long as such
modifications were forbidden while the buffer header spinlock was held, as
an atomic-sub would not have allowed a race-free check for the buffer
header lock being held.
Using atomic-sub to unpin buffers is a nice scalability win; however, it
is not the primary motivation for this change (although it would be a
sufficient one). The primary motivation is that we would like to merge the
buffer content lock into BufferDesc.state, which will result in more
frequent changes of the state variable. In some situations that can cause
a performance regression, due to an increased CAS failure rate when
unpinning buffers. The regression entirely vanishes when using atomic-sub.
Naively implementing this would require putting CAS loops in every place
that modifies the buffer state while holding the buffer header lock. To
avoid that, introduce UnlockBufHdrExt(), which can set flags and adjust
the refcount together with releasing the lock.
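An illustrative sketch of the helper's shape; the real code may differ.
Because the state word can now change while the header lock is held, the
release must itself be atomic rather than a plain write:

    static inline uint32
    UnlockBufHdrExt(BufferDesc *desc, uint32 buf_state,
                    uint32 set_bits, uint32 clear_bits)
    {
        for (;;)
        {
            uint32  new_state;

            new_state = (buf_state | set_bits) & ~clear_bits & ~BM_LOCKED;
            if (pg_atomic_compare_exchange_u32(&desc->state, &buf_state,
                                               new_state))
                return new_state;
            /* buf_state now holds the current value; retry */
        }
    }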
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
Before commit e25626677f, spinlocks were implemented using semaphores
on some platforms (--disable-spinlocks). That made it necessary to
initialize semaphores early, before any spinlocks could be used. Now
that we don't support --disable-spinlocks anymore, we can allocate the
shared memory needed for semaphores the same way as other shared
memory structures. Since the semaphores are used only in the PGPROC
array, move the semaphore shmem size estimation and initialization
calls to ProcGlobalShmemSize() and InitProcGlobal().
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAExHW5seSZpPx-znjidVZNzdagGHOk06F+Ds88MpPUbxd1kTaA@mail.gmail.com
First, the assertions in assign_io_method() were the wrong way round.
Second, the lengthof() assertion checked the length of io_method_options,
which is the wrong array to check and is always longer than
pgaio_method_ops_table.
While at it, add a static assert to ensure that pgaio_method_ops_table and
io_method_options stay in sync.
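The added assertion has roughly this shape (a sketch; io_method_options
carries a terminating NULL entry, hence the +1):

    StaticAssertDecl(lengthof(io_method_options) ==
                     lengthof(pgaio_method_ops_table) + 1,
                     "io_method_options out of sync with pgaio_method_ops_table");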
Per Coverity and Tom Lane.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Backpatch-through: 18
This commit adds CHECK_FOR_INTERRUPTS to the shared buffer iteration
loops in EvictRelUnpinnedBuffers and EvictAllUnpinnedBuffers. These
functions, used by pg_buffercache's pg_buffercache_evict_relation and
pg_buffercache_evict_all, can now be interrupted during long-running
operations.
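The change has this shape, illustratively:

    for (int buf = 1; buf <= NBuffers; buf++)
    {
        BufferDesc *desc = GetBufferDescriptor(buf - 1);

        /* newly added: allow query cancel/shutdown during the scan */
        CHECK_FOR_INTERRUPTS();

        /* ... evict the buffer if it is currently unpinned ... */
    }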
Backpatch to version 18, where these functions and their corresponding
pg_buffercache functions were introduced.
Author: Yuhang Qiu <iamqyh@gmail.com>
Discussion: https://postgr.es/m/8DC280D4-94A2-4E7B-BAB9-C345891D0B78%40gmail.com
Backpatch-through: 18