postgres

mirror of https://github.com/postgres/postgres.git synced 2026-01-26 09:41:40 +03:00

Author	SHA1	Message	Date
Michael Paquier	d756fa1019	Add pg_clear_extended_stats() This function clears the data associated with an extended statistics object, so that the object looks newly created. The caller of this function needs the following arguments to identify the extended stats to clear: - The name of the relation. - The schema name of the relation. - The name of the extended stats object. - The schema name of the extended stats object. - Whether the stats are inherited or not. The first two parameters are especially important to ensure a consistent lookup and ACL checks for the relation on which the extended stats object to be cleared is based, relying first on a RangeVar lookup where permissions are checked without locking a relation, which is critical to prevent denial-of-service attacks when using this kind of function (see also `688dc6299a` for a similar concern). The third to fifth arguments give a way to target the extended stats records to clear. This has been extracted from a larger patch by the same author, for a piece which is again useful on its own. I have rewritten large portions of it. The tests have been extended while discussing this piece, resulting in what this commit includes. The intention behind this feature is to add support for the import of extended statistics across dumps and upgrades, this change building one piece that we will be able to rely on for the rest of the changes. Bump catalog version. Author: Corey Huinker <corey.huinker@gmail.com> Co-authored-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com	2026-01-16 08:13:30 +09:00
Andres Freund	d40fd85187	lwlock: Remove support for disowned lwlocks This reverts commit `f8d7f29b3e`, plus parts of subsequent commits fixing a typo in a parameter name. Support for disowned lwlocks was added for the benefit of AIO, to be able to have content locks "owned" by the AIO subsystem. But as of commit `fcb9c977aa`, content locks do not use lwlocks anymore. It does not seem particularly likely that we need this facility outside of the AIO use-case, therefore remove the now unused functions. I did choose to keep the comment added in the aforementioned commit about lock->owner intentionally being left pointing to the last owner. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/cj5mcjdpucvw4a54hehslr3ctukavrbnxltvuzzhqnimvpju5e@cy3g3mnsefwz	2026-01-15 14:57:45 -05:00
Andres Freund	55fbfb738b	lwlock: Remove ForEachLWLockHeldByMe As of commit `fcb9c977aa`, ForEachLWLockHeldByMe(), introduced in `f4ece891fc`, is not used anymore, as content locks are now implemented in bufmgr.c. It doesn't seem that likely that a new user of the functionality will appear all that soon, making removal of the function seem like the most sensible path. It can easily be added back if necessary. Discussion: https://postgr.es/m/lneuyxqxamqoayd2ntau3lqjblzdckw6tjgeu4574ezwh4tzlg%40noioxkquezdw	2026-01-15 14:57:45 -05:00
Andres Freund	335f2231a3	pgindent fix for `8077649907` Per buildfarm member koel. Backpatch-through: 18	2026-01-15 14:57:45 -05:00
Andres Freund	fcb9c977aa	bufmgr: Implement buffer content locks independently of lwlocks Until now buffer content locks were implemented using lwlocks. That has the obvious advantage of not needing a separate efficient implementation of locks. However, the time for a dedicated buffer content lock implementation has come: 1) Hint bits are currently set while holding only a share lock. This leads to having to copy pages while they are being written out if checksums are enabled, which is not cheap. We would like to add AIO writes, however once many buffers can be written out at the same time, it gets a lot more expensive to copy them, particularly because that copy needs to reside in shared buffers (for worker mode to have access to the buffer). In addition, modifying buffers while they are being written out can cause issues with unbuffered/direct-IO, as some filesystems (like btrfs) do not like that, due to filesystem internal checksums getting corrupted. The solution to this is to require a new share-exclusive lock-level to set hint bits and to write out buffers, making those operations mutually exclusive. We could introduce such a lock-level into the generic lwlock implementation, however it does not look like there would be other users, and it does add some overhead into important code paths. 2) For AIO writes we need to be able to race-freely check whether a buffer is undergoing IO and whether an exclusive lock on the page can be acquired. That is rather hard to do efficiently when the buffer state and the lock state are separate atomic variables. This is a major hindrance to allowing writes to be done asynchronously. 3) Buffer locks are by far the most frequently taken locks. Optimizing them specifically for their use case is worth the effort. E.g. by merging content locks into buffer locks we will be able to release a buffer lock and pin in one atomic operation. 
4) There are more complicated optimizations, like long-lived "super pinned & locked" pages, that cannot realistically be implemented with the generic lwlock implementation. Therefore implement content locks inside bufmgr.c. The lockstate is stored as part of BufferDesc.state. The implementation of buffer content locks is fairly similar to lwlocks, with a few important differences: 1) An additional lock-level share-exclusive has been added. This lock-level conflicts with exclusive locks and itself, but not share locks. 2) Error recovery for content locks is implemented as part of the already existing private-refcount tracking mechanism in combination with resowners, instead of a bespoke mechanism as the case for lwlocks. This means we do not need to add dedicated error-recovery code paths to release all content locks (like done with LWLockReleaseAll() for lwlocks). 3) The lock state is embedded in BufferDesc.state instead of having its own struct. 4) The wakeup logic is a tad more complicated due to needing to support the additional lock-level This commit unfortunately introduces some code that is very similar to the code in lwlock.c, however the code is not equivalent enough to easily merge it. The future wins that this commit makes possible seem worth the cost. As of this commit nothing uses the new share-exclusive lock mode. It will be used in a future commit. It seemed too complicated to introduce the lock-level in a separate commit. It's worth calling out one wart in this commit: Despite content locks not being lwlocks anymore, they continue to use PGPROC->lw* - that seemed better than duplicating the relevant infrastructure. Another thing worth pointing out is that, after this change, content locks are not reported as LWLock wait events anymore, but as new wait events in the "Buffer" wait event class (see also `6c5c393b74`). The old BufferContent lwlock tranche has been removed. 
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Heikki Linnakangas <heikki.linnakangas@iki.fi> Reviewed-by: Greg Burd <greg@burd.me> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff	2026-01-15 14:26:53 -05:00
Andres Freund	dac328c8a6	bufmgr: Change BufferDesc.state to be a 64-bit atomic This is motivated by wanting to merge buffer content locks into BufferDesc.state in a future commit, rather than having a separate lwlock (see commit `c75ebc657f` for more details). As this change is rather mechanical, it seems to make sense to split it out into a separate commit, for easier review. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff	2026-01-15 14:20:41 -05:00
Tom Lane	282b1cde9d	Optimize LISTEN/NOTIFY via shared channel map and direct advancement. This patch reworks LISTEN/NOTIFY to avoid waking backends that have no need to process the notification messages we just sent. The primary change is to create a shared hash table that tracks which processes are listening to which channels (where a "channel" is defined by a database OID and channel name). This allows a notifying process to accurately determine which listeners are interested, replacing the previous weak approximation that listeners in other databases couldn't be interested. Secondly, if a listener is known not to be interested and is currently stopped at the old queue head, we avoid waking it at all and just directly advance its queue pointer past the notifications we inserted. These changes permit very significant improvements (integer multiples) in NOTIFY throughput, as well as a noticeable reduction in latency, when there are many listeners but only a few are interested in any specific message. There is no improvement for the simplest case where every listener reads every message, but any loss seems below the noise level. Author: Joel Jacobson <joel@compiler.org> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/6899c044-4a82-49be-8117-e6f669765f7e@app.fastmail.com	2026-01-15 14:12:15 -05:00
Heikki Linnakangas	23b25586dc	Fix 'unexpected data beyond EOF' on replica restart On restart, a replica can fail with an error like 'unexpected data beyond EOF in block 200 of relation T/D/R'. These are the steps to reproduce it: - A relation has a size of 400 blocks. - Blocks 201 to 400 are empty. - Block 200 has two rows. - Blocks 100 to 199 are empty. - A restartpoint is done - Vacuum truncates the relation to 200 blocks - A FPW deletes a row in block 200 - A checkpoint is done - A FPW deletes the last row in block 200 - Vacuum truncates the relation to 100 blocks - The replica restarts When the replica restarts: - The relation on disk starts at 100 blocks, because all the truncations were applied before restart. - The first truncate to 200 blocks is replayed. It silently fails, but it will still (incorrectly!) update the cache size to 200 blocks - The first FPW on block 200 is applied. XLogReadBufferForRead relies on the cached size and incorrectly assumes that the page already exists in the file, and thus won't extend the relation. - The online checkpoint record is replayed, calling smgrdestroyall which causes the cached size to be discarded - The second FPW on block 200 is applied. This time, the detected size is 100 blocks, an extend is attempted. However, the block 200 is already present in the buffer cache due to the first FPW. This triggers the 'unexpected data beyond EOF'. To fix, update the cached size in SmgrRelation with the current size rather than the requested new size, when the requested new size is greater. Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> Discussion: https://www.postgresql.org/message-id/CAO6_Xqrv-snNJNhbj1KjQmWiWHX3nYGDgAc=vxaZP3qc4g1Siw@mail.gmail.com Backpatch-through: 14	2026-01-15 21:02:49 +02:00
Álvaro Herrera	35e3fae738	Remove #include <math.h> where not needed Liujinyang reported the one in binaryheap.c, I then found and analyzed the rest. For future patches, we require git archaeological analysis before we accept patches of this nature. Co-authored-by: liujinyang <21043272@qq.com> Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/tencent_6B302BFCAF6F010E00AB5C2C0ECB7AA3F205@qq.com	2026-01-15 19:09:47 +01:00
Andres Freund	8077649907	aio: io_uring: Fix danger of completion getting reused before being read We called io_uring_cqe_seen(..., cqe) before reading cqe->res. That allows the completion to be reused, which in turn could lead to cqe->res being overwritten. The window for that is very narrow and the likelihood of it happening is very low, as we should never actually utilize all CQEs, but the consequences would be bad. This bug was reported to me privately. Backpatch-through: 18 Discussion: https://postgr.es/m/bwo3e5lj2dgi2wzq4yvbyzu7nmwueczvvzioqsqo6azu6lm5oy@pbx75g2ach3p	2026-01-15 11:09:07 -05:00
Heikki Linnakangas	d9c3c94365	Wake up autovacuum launcher from postmaster when a worker exits When an autovacuum worker exits, the launcher needs to be notified with SIGUSR2, so that it can rebalance and possibly launch a new worker. The launcher must be notified only after the worker has finished ProcKill(), so that the worker slot is available for a new worker. Before this commit, the autovacuum worker was responsible for that, which required a slightly complicated dance to pass the launcher's PID from FreeWorkerInfo() to ProcKill() in a global variable. Simplify that by moving the responsibility of the signaling to the postmaster. The postmaster was already doing it when it failed to fork a worker process, so it seems logical to make it responsible for notifying the launcher on worker exit too. That's also how the notification on background worker exit is done. Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: li carol <carol.li2025@outlook.com> Discussion: https://www.postgresql.org/message-id/a5e27d25-c7e7-45d5-9bac-a17c8f462def@iki.fi	2026-01-15 18:02:25 +02:00
Heikki Linnakangas	102bdaa9be	Add check for invalid offset at multixid truncation If a multixid with zero offset is left behind after a crash, and that multixid later becomes the oldest multixid, truncation might try to look up its offset and read the zero value. In the worst case, we might incorrectly use the zero offset to truncate valid SLRU segments that are still needed. I'm not sure if that can happen in practice, or if there are some other lower-level safeguards or incidental reasons that prevent the caller from passing an unwritten multixid as the oldest multi. But better safe than sorry, so let's add an explicit check for it. In stable branches, we should perhaps do the same check for 'oldestOffset', i.e. the offset of the old oldest multixid (in master, 'oldestOffset' is gone). But if the old oldest multixid has an invalid offset, the damage has been done already, and we would never advance past that point. It's not clear what we should do in that case. The check that this commit adds will prevent such a multixid with invalid offset from becoming the oldest multixid in the first place, which seems enough for now. Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru> Discussion: https://www.postgresql.org/message-id/000301b2-5b81-4938-bdac-90f6eb660843@iki.fi Backpatch-through: 14	2026-01-15 16:48:45 +02:00
Heikki Linnakangas	c4b71e6f60	Remove some unnecessary code from multixact truncation With 64-bit multixact offsets, PerformMembersTruncation() doesn't need the starting offset anymore. The 'oldestOffset' value that TruncateMultiXact() calculates is no longer used for anything. Remove it, and the code to calculate it. 'oldestOffset' was included in the WAL record as 'startTruncMemb', which sounds nice if you e.g. look at the WAL with pg_waldump, but it was also confusing because we didn't actually use the value for determining what to truncate. Replaying the WAL would remove all segments older than 'endTruncMemb', regardless of 'startTruncMemb'. The 'startTruncOff' stored in the WAL record was similarly unnecessary even before 64-bit multixid offsets, it was stored just for the sake of symmetry with 'startTruncMemb'. Remove both from the WAL record, and rename the remaining 'endTruncOff' to 'oldestMulti' and 'endTruncMemb' to 'oldestOffset', for consistency with the variable names used for them in other places. Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru> Discussion: https://www.postgresql.org/message-id/000301b2-5b81-4938-bdac-90f6eb660843@iki.fi	2026-01-15 13:34:50 +02:00
Peter Eisentraut	da265a8717	plpython: Streamline initialization The initialization of PL/Python (the Python interpreter, the global state, the plpy module) was arranged confusingly across different functions with unclear and confusing boundaries. For example, PLy_init_interp() said "Initialize the Python interpreter ..." but it didn't actually do this, and PLy_init_plpy() said "initialize plpy module" but it didn't do that either. After this change, all the global initialization is called directly from _PG_init(), and the plpy module initialization is all called from its registered initialization function PyInit_plpy(). Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com> Reviewed-by: li carol <carol.li2025@outlook.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Discussion: https://www.postgresql.org/message-id/f31333f1-fbb7-4098-b209-bf2d71fbd4f3%40eisentraut.org	2026-01-15 12:11:52 +01:00
Peter Eisentraut	3263a893fb	plpython: Remove duplicate PyModule_Create() This seems to have existed like this since Python 3 support was added (commit `dd4cd55c15`), but it's unclear what this second call is supposed to accomplish. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com> Reviewed-by: li carol <carol.li2025@outlook.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Discussion: https://www.postgresql.org/message-id/f31333f1-fbb7-4098-b209-bf2d71fbd4f3%40eisentraut.org	2026-01-15 10:32:41 +01:00
Peter Eisentraut	34d8111c3a	plpython: Clean up PyModule_AddObject() uses The comments "PyModule_AddObject does not add a refcount to the object, for some odd reason" seem distracting. Arguably, this behavior is expected, not odd. Also, the additional references created by the existing code are apparently not necessary. But we should clean up the reference in the error case, as suggested by the Python documentation. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com> Reviewed-by: li carol <carol.li2025@outlook.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Discussion: https://www.postgresql.org/message-id/f31333f1-fbb7-4098-b209-bf2d71fbd4f3%40eisentraut.org	2026-01-15 10:32:38 +01:00
Peter Eisentraut	8cb95a0645	plpython: Remove commented out code This code has been commented out since the first commit of plpython. It doesn't seem worth keeping. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com> Reviewed-by: li carol <carol.li2025@outlook.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Discussion: https://www.postgresql.org/message-id/f31333f1-fbb7-4098-b209-bf2d71fbd4f3%40eisentraut.org	2026-01-15 10:32:34 +01:00
Michael Paquier	32e27bd320	Introduce routines to validate and free MVNDistinct and MVDependencies These routines are useful to perform some basic validation checks on each object structure, working currently on attribute numbers for non-expression and expression attnums. These checks could be extended in the future. Note that this code is not used yet in the tree, and that these functions will become handy for an upcoming patch for the import of extended statistics data. However, they are worth their own independent change as they are actually useful by themselves, with at least the extension code argument in mind (or perhaps I am just feeling more pedantic today). Extracted from a larger patch by the same author, with many adjustments and fixes by me. Author: Corey Huinker <corey.huinker@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com	2026-01-15 09:36:05 +09:00
Jeff Davis	ed425b5a20	Remove redundant assignment in CreateWorkExprContext In CreateWorkExprContext(), maxBlockSize is initialized to ALLOCSET_DEFAULT_MAXSIZE and is then immediately reassigned, so the initialization is redundant. Author: Andreas Karlsson <andreas@proxel.se> Reported-by: Chao Li <lic@highgo.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/83a14f3c-f347-4769-9c01-30030b31f1eb@gmail.com	2026-01-14 12:01:36 -08:00
Andres Freund	556c92a689	lwlock: Improve local variable name In `9a385f6166` I used the variable name new_release_in_progress, but new_wake_in_progress makes more sense given the flag name. Suggested-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/AC5E365D-7AD9-47AE-B2C6-25756712B188@gmail.com	2026-01-14 11:15:38 -05:00
Peter Eisentraut	fa16e7fd84	Revert "Replace pg_restrict by standard restrict" This reverts commit `f0f2c0c1ae`. The original problem that led to the use of pg_restrict was that MSVC couldn't handle plain restrict, and defining it to something else would conflict with its __declspec(restrict) that is used in system header files. In C11 mode, this is no longer a problem, as MSVC handles plain restrict. This led to the commit to replace pg_restrict with restrict. But this did not take C++ into account. Standard C++ does not have restrict, so we defined it as something else (for example, MSVC supports __restrict). But this then again conflicts with __declspec(restrict) in system header files. So we have to revert this attempt. The comments are updated to clarify that the reason for this is now C++ only. Reported-by: Jelte Fennema-Nio <postgres@jeltef.nl> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/CAGECzQRoD7chJP1-dneSrhxUJv%2BBRcigoGOO4UwGzaShLot2Yw%40mail.gmail.com	2026-01-14 15:12:25 +01:00
Peter Eisentraut	794ba8b6a4	doc: Slightly correct advice on C/C++ linkage The documentation said that <literal>extern C</literal> should be used, but it should be <literal>extern "C"</literal>.	2026-01-14 15:05:29 +01:00
Peter Eisentraut	2bc60f8621	Enable Python Limited API for PL/Python on MSVC Previously, the Python Limited API was disabled on MSVC due to build failures caused by Meson not knowing to link against python3.lib instead of python3XX.lib when using the Limited API. This commit works around the Meson limitation by explicitly finding and linking against python3.lib on MSVC, and removes the preprocessor guard that was disabling the Limited API on MSVC in plpython.h. This requires python3.lib to be present in the Python installation, which is included when Python is installed. Author: Bryan Green <dbryan.green@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/ee410de1-1e0b-4770-b125-eeefd4726a24%40eisentraut.org	2026-01-14 10:43:51 +01:00
Álvaro Herrera	4196d6178a	Reword confusing comment to avoid "typo fixes" Author: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Reviewed-by: John Naylor <johncnaylorls@gmail.com> Discussion: https://postgr.es/m/CAApHDvqPmpa53jcTmfU8arFFm7=hB5cFoXX5dcUH=1qV0tRFHA@mail.gmail.com	2026-01-14 10:07:44 +01:00
Michael Paquier	6dcfac9696	Use more consistent *GetDatum() macros for some unsigned numbers This patch switches some code paths to use GetDatum() macros more in line with the data types of the variables they manipulate. This set of changes does not fix a problem, but it is always nice to be more consistent across the board. Author: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Roman Khapov <rkhapov@yandex-team.ru> Reviewed-by: Yuan Li <carol.li2025@outlook.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Man Zeng <zengman@halodbtech.com> Discussion: https://postgr.es/m/CALdSSPidtC7j3MwhkqRj0K2hyp36ztnnjSt6qzGxQtiePR1dzw@mail.gmail.com	2026-01-14 17:07:49 +09:00
Amit Kapila	e385a4e2fd	Prevent unintended dropping of active replication origins. Commit `5b148706c5` exposed functionality that allows multiple processes to use the same replication origin, enabling non-builtin logical replication solutions to implement parallel apply for large transactions. With this functionality, if two backends acquire the same replication origin and one of them resets it first, the acquired_by flag is cleared without acknowledging that another backend is still actively using the origin. This can lead to the origin being unintentionally dropped. If the shared memory for that dropped origin is later reused for a newly created origin, the remaining backend that still holds a pointer to the old memory may inadvertently advance the LSN of a completely different origin, causing unpredictable behavior. Although the underlying issue predates commit `5b148706c5`, it did not surface earlier because the internal parallel apply worker mechanism correctly coordinated origin resets and drops. This commit resolves the problem by introducing a reference counter for replication origins. The reference count increases when a backend sets the origin and decreases when it resets it. Additionally, the backend that first acquires the origin will not release it until all other backends using the origin have released it as well. The patch also prevents dropping a replication origin when acquired_by is zero but the reference counter is nonzero, covering the scenario where the first session exits without properly releasing the origin. Author: Hou Zhijie <houzj.fnst@fujitsu.com> Author: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Shveta Malik <shveta.malik@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/TY4PR01MB169077EE72ABE9E55BAF162D494B5A@TY4PR01MB16907.jpnprd01.prod.outlook.com Discussion: https://postgr.es/m/CAMPB6wfe4zLjJL8jiZV5kjjpwBM2=rTRme0UCL7Ra4L8MTVdOg@mail.gmail.com	2026-01-14 07:15:46 +00:00
Michael Paquier	4fe1ea7777	pg_waldump: Relax LSN comparison check in TAP test The test 002_save_fullpage.pl, checking --save-fullpage, fails with wal_consistency_checking enabled, due to the fact that the block saved in the file has the same LSN as the LSN used in the file name. The test required that the block LSN is strictly lower than file LSN. This commit relaxes the check a bit, by allowing the LSNs to match. While on it, the test name is reworded to include some information about the file and block LSNs, which is useful for debugging. Author: Andrey Borodin <x4mmm@yandex-team.ru> Discussion: https://postgr.es/m/4226AED7-E38F-419B-AAED-9BC853FB55DE@yandex-team.ru Backpatch-through: 16	2026-01-14 16:02:30 +09:00
Andres Freund	ff219c1987	bufmgr: Make definitions related to buffer descriptor easier to modify This is in preparation to widening the buffer state to 64 bits, which in turn is preparation for implementing content locks in bufmgr. This commit aims to make the subsequent commits a bit easier to review, by separating out reformatting etc from the actual changes. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/4csodkvvfbfloxxjlkgsnl2lgfv2mtzdl7phqzd4jxjadxm4o5@usw7feyb5bzf	2026-01-13 19:38:29 -05:00
Andres Freund	9a385f6166	lwlock: Invert meaning of LW_FLAG_RELEASE_OK Previously, a flag was set to indicate that a lock release should wake up waiters. Since waking waiters is the default behavior in the majority of cases, this logic has been inverted. The new LW_FLAG_WAKE_IN_PROGRESS flag is now set iff wakeups are explicitly inhibited. The motivation for this change is that in an upcoming commit, content locks will be implemented independently of lwlocks, with the lock state stored as part of BufferDesc.state. As all of a buffer's flags are cleared when the buffer is invalidated, without this change we would have to re-add the RELEASE_OK flag after clearing the flags; otherwise, the next lock release would not wake waiters. It seems good to keep the implementation of lwlocks and buffer content locks as similar as reasonably possible. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/4csodkvvfbfloxxjlkgsnl2lgfv2mtzdl7phqzd4jxjadxm4o5@usw7feyb5bzf	2026-01-13 19:38:29 -05:00
Michael Paquier	e217dc7484	Fix query jumbling with GROUP BY clauses RangeTblEntry.groupexprs was marked with the node attribute query_jumble_ignore, causing a list of GROUP BY expressions to be ignored during the query jumbling. For example, these two queries could be grouped together within the same query ID: SELECT count(*) FROM t GROUP BY a; SELECT count(*) FROM t GROUP BY b; However, as such queries use different GROUP BY clauses, they should be split across multiple entries. This fixes an oversight in `247dea89f7`, that has introduced an RTE for GROUP BY clauses. Query IDs are documented as being stable across minor releases, but as this is a regression new to v18 and that we are still early in its support cycle, a backpatch is exceptionally done as this has broken a behavior that exists since query jumbling is supported in core, since its introduction in pg_stat_statements. The tests of pg_stat_statements are expanded to cover this area, with patterns involving GROUP BY and GROUPING clauses. Author: Jian He <jian.universality@gmail.com> Discussion: https://postgr.es/m/CACJufxEy2W+tCqC7XuJ94r3ivWsM=onKJp94kRFx3hoARjBeFQ@mail.gmail.com Backpatch-through: 18	2026-01-14 08:44:12 +09:00
Fujii Masao	ad381d0d92	doc: Document DEFAULT option in file_fdw. Commit `9f8377f7a` introduced the DEFAULT option for file_fdw but did not update the documentation. This commit adds the missing description of the DEFAULT option to the file_fdw documentation. Backpatch to v16, where the DEFAULT option was introduced. Author: Shinya Kato <shinya11.kato@gmail.com> Reviewed-by: Fujii Masao <masao.fujii@gmail.com> Discussion: https://postgr.es/m/CAOzEurT_PE7QEh5xAdb7Cja84Rur5qPv2Fzt3Tuqi=NU0WJsbg@mail.gmail.com Backpatch-through: 16	2026-01-13 22:54:45 +09:00
Álvaro Herrera	8a47d9ee7f	Fix test_misc/010_index_concurrently_upsert for cache-clobbering builds The test script added by commit `e1c971945d` failed to handle the case of cache-clobbering builds (CLOBBER_CACHE_ALWAYS and CATCACHE_FORCE_RELEASE) properly -- it would only exit a loop on timeout, which is slow, and unfortunate because I (Álvaro) increased the timeout for that loop to the complete default TAP test timeout, causing the buildfarm to report the whole test run as a timeout failure. We can be much quicker: exit the loop as soon as the backend is seen as waiting on the injection point. In this commit we still reduce the timeout (of that loop and a nearby one just to be safe) to half of the default. I (Álvaro) had also changed Mihail's "sleep(1)" to "sleep(0.1)", which apparently turns a 1s sleep into a 0s sleep, because Perl -- probably making this a busy loop. Use Time::HiRes::usleep instead, like we do in other tests. Author: Mihail Nikalayeu <mihailnikalayeu@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/CADzfLwWOVyJygX6BFuyuhTKkJ7uw2e8OcVCDnf6iqnOFhMPE%2BA%40mail.gmail.com	2026-01-13 10:03:33 +01:00
John Naylor	94a24b4ee5	Improve some comment wording and grammar in extension.c Noted while looking at reports of grammatical errors. Reported-by: albert tan <alterttan1223@gmail.com> Reported-by: Yuan Li(carol) <carol.li2025@outlook.com> Discussion: https://postgr.es/m/CAEzortnJB7aue6miGT_xU2KLb3okoKgkBe4EzJ6yJ%3DY8LMB7gw%40mail.gmail.com	2026-01-13 12:33:08 +07:00
Jeff Davis	a00a25b6ce	Fix error message typo. Reported-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CAEoWx2mMmm9fTZYgE-r_T-KPTFR1rKO029QV-S-6n=7US_9EMA@mail.gmail.com	2026-01-12 19:07:00 -08:00
Andres Freund	0b96e734c5	heapam: Add batch mode mvcc check and use it in page mode There are two reasons for doing so: 1) It is generally faster to perform checks in a batched fashion and making sequential scans faster is nice. 2) We would like to stop setting hint bits while pages are being written out. The necessary locking becomes visible for page mode scans, if done for every tuple. With batching, the overhead can be amortized to only happen once per page. There are substantial further optimization opportunities along these lines: - Right now HeapTupleSatisfiesMVCCBatch() simply uses the single-tuple HeapTupleSatisfiesMVCC(), relying on the compiler to inline it. We could instead write an explicitly optimized version that avoids repeated xid tests. - Introduce batched version of the serializability test - Introduce batched version of HeapTupleSatisfiesVacuum Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/6rgb2nvhyvnszz4ul3wfzlf5rheb2kkwrglthnna7qhe24onwr@vw27225tkyar	2026-01-12 13:22:04 -05:00
Andres Freund	852558b9ec	heapam: Use exclusive lock on old page in CLUSTER To be able to guarantee that we can set the hint bit, acquire an exclusive lock on the old buffer. This is required as a future commit will only allow hint bits to be set with a new lock level, which is acquired as-needed in a non-blocking fashion. We need the hint bits, set in heapam_relation_copy_for_cluster() -> HeapTupleSatisfiesVacuum(), to be set, as otherwise reform_and_rewrite_tuple() -> rewrite_heap_tuple() will get confused. Specifically, rewrite_heap_tuple() checks for HEAP_XMAX_INVALID in the old tuple to determine whether to check the old-to-new mapping hash table. It'd be better if we somehow could avoid setting hint bits on the old page. A common reason to use VACUUM FULL is very bloated tables - rewriting most of the old table during VACUUM FULL doesn't exactly help. Reviewed-by: Heikki Linnakangas <heikki.linnakangas@iki.fi> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Discussion: https://postgr.es/m/4wggb7purufpto6x35fd2kwhasehnzfdy3zdcu47qryubs2hdz@fa5kannykekr	2026-01-12 12:40:13 -05:00
Andres Freund	45f658dacb	freespace: Don't modify page without any lock Before this commit fsm_vacuum_page() modified the page without any lock on the page. Historically that was kind of ok, as we didn't rely on the freespace to really stay consistent and we did not have checksums. But these days pages are checksummed and there are ways for FSM pages to be included in WAL records, even if the FSM itself is still not WAL logged. If a FSM page ever were modified while a WAL record referenced that page, we'd be in trouble, as the WAL CRC could end up getting corrupted. The reason to address this right now is a series of patches with the goal to only allow modifications of pages with an appropriate lock level. Obviously not having any lock is not appropriate :) Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Discussion: https://postgr.es/m/4wggb7purufpto6x35fd2kwhasehnzfdy3zdcu47qryubs2hdz@fa5kannykekr Discussion: https://postgr.es/m/e6a8f734-2198-4958-a028-aba863d4a204@iki.fi	2026-01-12 12:40:00 -05:00
Álvaro Herrera	225d1df1d2	Stop including {brin,gin}_tuple.h in tuplesort.h Doing this meant that those two headers, which are supposed to be internal to their corresponding index AMs, were being included pretty much universally, because tuplesort.h is included by execnodes.h which is very widely used. Stop that, and fix fallout. We also change indexing.h to no longer include execnodes.h (tuptable.h is sufficient), and relscan.h to no longer include buf.h (pointless since `c2fe139c20`). Author: Mario González <gonzalemario@gmail.com> Discussion: https://postgr.es/m/CAFsReFUcBFup=Ohv_xd7SNQ=e73TXi8YNEkTsFEE2BW7jS1noQ@mail.gmail.com	2026-01-12 18:09:49 +01:00
Jeff Davis	b96a9fd76f	fuzzystrmatch: use pg_ascii_toupper(). fuzzystrmatch is designed for ASCII, so no need to rely on the global LC_CTYPE setting. Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/dd0cdd1f-e786-426e-b336-1ffa9b2f1fc6%40eisentraut.org	2026-01-12 08:54:04 -08:00
Álvaro Herrera	2defd00062	Move instrumentation-related structs to instrument_node.h Some structs and enums related to parallel query instrumentation had organically grown scattered across various files, and were causing header pollution especially through execnodes.h. Create a single file where they can live together. This only moves the structs to the new file; cleaning up the pollution by removing no-longer-necessary cross-header inclusion will be done in future commits. Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de> Co-authored-by: Mario González <gonzalemario@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/202510051642.wwmn4mj77wch@alvherre.pgsql Discussion: https://postgr.es/m/CAFsReFUr4KrQ60z+ck9cRM4WuUw1TCghN7EFwvV0KvuncTRc2w@mail.gmail.com	2026-01-12 16:59:28 +01:00
Peter Eisentraut	c3c240537f	Avoid casting void * function arguments In many cases, the cast would silently drop a const qualifier. To fix, drop the unnecessary cast and let the compiler check the types and qualifiers. Add const to read-only local variables, preserving the const qualifiers from the function signatures. Co-authored-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Co-authored-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/aUQHy/MmWq7c97wK%40ip-10-97-1-34.eu-west-3.compute.internal	2026-01-12 16:12:56 +01:00
Peter Eisentraut	707f905399	Add const to read only TableInfo pointers in pg_dump Functions that dump table data receive their parameters through const void * but were casting away const. Add const qualifiers to functions that only read the table information. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/aUQHy/MmWq7c97wK%40ip-10-97-1-34.eu-west-3.compute.internal	2026-01-12 14:26:26 +01:00
Peter Eisentraut	e39ece0343	Make dmetaphone collation-aware The dmetaphone() SQL function internally upper-cases the argument string. It did this using the toupper() function. That way, it has a dependency on the global LC_CTYPE locale setting, which we want to get rid of. The "double metaphone" algorithm specifically supports the "C with cedilla" letter, so just using ASCII case conversion wouldn't work. To fix that, use the passed-in collation and use the str_toupper() function, which has full awareness of collations and collation providers. Note that this does not change the fact that this function only works correctly with single-byte encodings. The change to str_toupper() makes the case conversion multibyte-enabled, but the rest of the function is still not ready. Reviewed-by: Jeff Davis <pgsql@j-davis.com> Discussion: https://www.postgresql.org/message-id/108e07a2-0632-4f00-984d-fe0e0d0ec726%40eisentraut.org	2026-01-12 08:35:48 +01:00
Nathan Bossart	5d1f5079ab	pg_dump: Fix memory leak in dumpSequenceData(). Oversight in commit `7a485bd641`. Per Coverity. Backpatch-through: 18	2026-01-11 13:52:50 -06:00
Michael Paquier	540c39cc56	doc: Improve description of pg_restore --jobs The parameter name used for the option value was named "number-of-jobs", which was inconsistent with what all the other tools with an option called --jobs use. This commit updates the parameter name to "njobs". Author: Tatsuro Yamada <yamatattsu@gmail.com> Discussion: https://postgr.es/m/CAOKkKFvHqA6Tny0RKkezWVfVV91nPJyj4OGtMi3C1RznDVXqrg@mail.gmail.com	2026-01-11 15:24:02 +09:00
Michael Paquier	1c0f6c3879	Fix some typos across the board Found while browsing the code.	2026-01-11 08:16:46 +09:00
Andres Freund	e5a5e0a907	instrumentation: Keep time fields as instrtime, convert in callers Previously the instrumentation logic always converted to seconds, only for many of the callers to do unnecessary division to get to milliseconds. As an upcoming refactoring will split the Instrumentation struct, utilize instrtime always to keep things simpler. It's also a bit faster to not have to first convert to a double in functions like InstrEndLoop(), InstrAggNode(). Author: Lukas Fittl <lukas@fittl.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAP53PkzZ3UotnRrrnXWAv=F4avRq9MQ8zU+bxoN9tpovEu6fGQ@mail.gmail.com	2026-01-09 13:38:00 -05:00
Heikki Linnakangas	bba81f9d3d	Inline ginCompareAttEntries for speed It is called in tight loops during GIN index build. Author: David Geier <geidav.pg@gmail.com> Discussion: https://www.postgresql.org/message-id/5d366878-2007-4d31-861e-19294b7a583b@gmail.com	2026-01-09 20:31:43 +02:00
Jacob Champion	e2aae8d68f	doc: Improve description of publish_via_partition_root Reword publish_via_partition_root's opening paragraph. Describe its behavior more clearly, and directly state that its default is false. Per complaint by Peter Smith; final text of the patch made in collaboration with Chao Li. Author: Chao Li <li.evan.chao@gmail.com> Author: Peter Smith <peter.b.smith@fujitsu.com> Reported-by: Peter Smith <peter.b.smith@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/CAHut%2BPu7SpK%2BctOYoqYR3V4w5LKc9sCs6c_qotk9uTQJQ4zp6g%40mail.gmail.com Backpatch-through: 14	2026-01-09 10:11:37 -08:00
Tom Lane	7a1d422e39	Improve "constraint must include all partitioning columns" message. This formerly said "unique constraint must ...", which was accurate enough when it only applied to UNIQUE and PRIMARY KEY constraints. However, now we use it for exclusion constraints too, and in that case it's a tad confusing. Do what we already did in the errdetail message: print the constraint_type, so that it looks like "UNIQUE constraint ...", "EXCLUDE constraint ...", etc. Author: jian he <jian.universality@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CACJufxH6VhAf65Vghg4T2q315gY=Rt4BUfMyunkfRj0n2S9n-g@mail.gmail.com	2026-01-09 12:59:35 -05:00

63107 Commits