postgres

mirror of https://github.com/postgres/postgres.git synced 2025-10-16 17:07:43 +03:00

Author	SHA1	Message	Date
Tom Lane	26d1cd375f	Restore test coverage of LZ4Stream_gets(). In commit `a45c78e32` I removed the only regression test case that reaches this function, because it turns out that we only use it if reading an LZ4-compressed blobs.toc file in a directory dump, and that is a state that has to be created manually. That seems like a bad thing to not test, not so much for LZ4Stream_gets() itself as because it means the squirrely eol_flag logic in LZ4Stream_read_internal() is not tested. The reason for the change was that I thought the lz4 program did not have any way to perform compression without explicit specification of the output file name. However, it turns out that the syntax synopsis in its man page is a lie, and if you read enough of the man page you find out that with "-m" it will do what's needful. So restore the manual compression step in that test case. Noted while testing some proposed changes in pg_dump's compression logic. Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/3515357.1760128017@sss.pgh.pa.us Backpatch-through: 17	2025-10-11 16:33:55 -04:00
Álvaro Herrera	3231fd0455	Stop creating constraints during DETACH CONCURRENTLY Commit `71f4c8c6f7` (which implemented DETACH CONCURRENTLY) added code to create a separate table constraint when a table is detached concurrently, identical to the partition constraint, on the theory that such a constraint was needed in case the optimizer had constructed any query plans that depended on the constraint being there. However, that theory was apparently bogus because any such plans would be invalidated. For hash partitioning, those constraints are problematic, because their expressions reference the OID of the parent partitioned table, to which the detached table is no longer related; this causes all sorts of problems (such as inability of restoring a pg_dump of that table, and the table no longer working properly if the partitioned table is later dropped). We'd like to get rid of all those constraints. In fact, for branch master, do that -- no longer create any substitute constraints. However, out of fear that some users might somehow depend on these constraints for other partitioning strategies, for stable branches (back to 14, which added DETACH CONCURRENTLY), only do it for hash partitioning. (If you repeatedly DETACH CONCURRENTLY and then ATTACH a partition, then with this constraint addition you don't need to scan the table in the ATTACH step, which presumably is good. But if users really valued this feature, they would have requested that it worked for non-concurrent DETACH also.) Author: Haiyang Li <mohen.lhy@alibaba-inc.com> Reported-by: Fei Changhong <feichanghong@qq.com> Reported-by: Haiyang Li <mohen.lhy@alibaba-inc.com> Backpatch-through: 14 Discussion: https://postgr.es/m/18371-7fef49f63de13f02@postgresql.org Discussion: https://postgr.es/m/19070-781326347ade7c57@postgresql.org	2025-10-11 20:30:12 +02:00
Álvaro Herrera	ff47f9c16c	dbase_redo: Fix Valgrind-reported memory leak Introduced by my (Álvaro's) commit `9e4f914b5e`, which was itself backpatched to pg10, though only pg15 and up contain the problem because of commit `9c08aea6a3`. This isn't a particularly significant leak, but given the fix is trivial, we might as well backpatch to all branches where it applies, so do that. Author: Nathan Bossart <nathandbossart@gmail.com> Reported-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/x4odfdlrwvsjawscnqsqjpofvauxslw7b4oyvxgt5owoyf4ysn@heafjusodrz7	2025-10-11 16:39:22 +02:00
Peter Geoghegan	843e50208a	Remove overzealous _bt_killitems assertion. An assertion in _bt_killitems expected the scan's currPos state to contain a valid LSN, saved from when currPos's page was initially read. The assertion failed to account for the fact that even logged relations can have leaf pages with an invalid LSN when built with wal_level set to "minimal". Remove the faulty assertion. Oversight in commit `e6eed40e` (though note that the assertion was backpatched to stable branches before 18 by commit `7c319f54`). Author: Peter Geoghegan <pg@bowt.ie> Reported-By: Matthijs van der Vleuten <postgresql@zr40.nl> Bug: #19082 Discussion: https://postgr.es/m/19082-628e62160dbbc1c1@postgresql.org Backpatch-through: 13	2025-10-10 14:52:25 -04:00
Michael Paquier	3a36543d7d	Fix two typos in xlogstats.h and xlogstats.c Issue found while browsing this area of the code, introduced and copy-pasted around by `2258e76f90`. Backpatch-through: 15	2025-10-10 11:51:45 +09:00
Michael Paquier	912af1c7e9	Remove state.tmp when failing to save a replication slot An error happening while a slot data is saved on disk in SaveSlotToPath() could cause a state.tmp file (temporary file holding the slot state data, renamed to its permanent name at the end of the function) to remain around after it has been created. This temporary file is created with O_EXCL, meaning that if an existing state.tmp is found, its creation would fail. This would prevent the slot data to be saved, requiring a manual intervention to remove state.tmp before being able to save again a slot. Possible scenarios where this temporary file could remain on disk is for example a ENOSPC case (no disk space) while writing, syncing or renaming it. The bug reports point to a write failure as the principal cause of the problems. Using O_TRUNC has been argued back in 2019 as a potential solution to discard any temporary file that could exist. This solution was rejected as O_EXCL can also act as a safety measure when saving the slot state, crash recovery offering cleanup guarantees post-crash. This commit uses the alternative approach that has been suggested by Andres Freund back in 2019. When the temporary state file cannot be written, synced, closed or renamed (note: not when created!), an unlink() is used to remove the temporary state file while holding the in-progress I/O LWLock, so as any follow-up attempts to save a slot's data would not choke on an existing file that remained around because of a previous failure. This problem has been reported a few times across the years, going back to 2019, but for some reason I have never come back to do something about it and it has been forgotten. A recent report has reminded me that this was still a problem. Reported-by: Kevin K Biju <kevinkbiju@gmail.com> Reported-by: Sergei Kornilov <sk@zsrv.org> Reported-by: Grigory Smolkin <g.smolkin@postgrespro.ru> Discussion: https://postgr.es/m/CAM45KeHa32soKL_G8Vk38CWvTBeOOXcsxAPAs7Jt7yPRf2mbVA@mail.gmail.com Discussion: https://postgr.es/m/3559061693910326@qy4q4a6esb2lebnz.sas.yp-c.yandex.net Discussion: https://postgr.es/m/08bbfab1-a61d-3750-fc18-4ab2c1aa7f09@postgrespro.ru Backpatch-through: 13	2025-10-10 09:23:59 +09:00
Andres Freund	c819d1017d	bufmgr: Fix valgrind checking for buffers pinned in StrategyGetBuffer() In `5e89985928` I made StrategyGetBuffer() pin buffers with a single CAS, instead of using PinBuffer_Locked(). Unfortunately I missed that PinBuffer_Locked() marked the page as defined for valgrind. Fix this oversight by centralizing the valgrind initialization into TrackNewBufferPin(), which also allows us to reduce the number of places doing VALGRIND_MAKE_MEM_DEFINED. Per buildfarm animal skink and Amit Langote. Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff Discussion: https://postgr.es/m/CA+HiwqGKJ6nEXEPQW7EpykVsEtzxp5-up_xhtcUAkWFtATVQvQ@mail.gmail.com	2025-10-09 19:17:13 -04:00
Michael Paquier	9d46b86529	test_bitmapset: Improve random function test_random_operations() did not check the result returned by bms_is_member() in its last phase, when checking that the contents of the bitmap match with what is expected. This was impacting the reliability of the function and the coverage it could provide. This commit improves the whole function, adding more checks based on bms_is_member(), using a bitmap and a secondary array that tracks the members added by random additions and deletions. While on it, more comments are added to document the internals of the function. Reported-by: Ranier Vilela <ranier.vf@gmail.com> Author: Greg Burd <greg@burd.me> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAEudQAq_zOSA2NUQSWePTGV_=90Uw0WcXxGOWnN-vwF046OOqA@mail.gmail.com	2025-10-10 07:20:03 +09:00
Melanie Plageman	d96f87332b	Eliminate COPY FREEZE use of XLOG_HEAP2_VISIBLE Instead of emitting a separate WAL XLOG_HEAP2_VISIBLE record for setting bits in the VM, specify the VM block changes in the XLOG_HEAP2_MULTI_INSERT record. This halves the number of WAL records emitted by COPY FREEZE. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com	2025-10-09 16:29:01 -04:00
David Rowley	1b073cba49	Cleanup VACUUM option processing error messages The processing of the PARALLEL option for VACUUM was not quite following what the DefElem code had intended. defGetInt32() already has code to handle missing parameters and returns a perfectly good error message for when that happens. Here we get rid of the ExecVacuum() error: ERROR: parallel option requires a value between 0 and N and leave defGetInt32() handle it, which will give: ERROR: parallel requires an integer value defGetInt32() was already handling the non-integer parameter case, so it may as well handle the missing parameter case too. Additionally, parameterize the option name to make translator work easier, and also use errhint_internal() rather than errhint() for the BUFFER_USAGE_LIMIT option since there isn't any work for a translator to do for "%s". Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAApHDvovH14tNWB+WvP6TSbfi7-=TysQ9h5tQ5AgavwyWRWKHA@mail.gmail.com	2025-10-10 09:25:23 +13:00
Tom Lane	89d57c1fb3	Clean up memory leakage that occurs in context callback functions. An error context callback function might leak some memory into ErrorContext, since those functions are run with ErrorContext as current context. In the case where the elevel is ERROR, this is no problem since the code level that catches the error should do FlushErrorState to clean up, and that will reset ErrorContext. However, if the elevel is less than ERROR then no such cleanup occurs. In principle, repeated leaks while emitting log messages or client notices could accumulate arbitrarily much leaked data, if no ERROR occurs in the session. To fix, let errfinish() perform an ErrorContext reset if it is at the outermost error nesting level. (If it isn't, we'll delay cleanup until the outermost nesting level is exited.) The only actual leakage of this sort that I've been able to observe within our regression tests was recently introduced by commit `f727b63e8`. While it seems plausible that there are other such leaks not reached in the regression tests, the lack of field reports suggests that they're not a big problem. Accordingly, I won't take the risk of back-patching this now. We can always back-patch later if we get field reports of leaks. Reported-by: Andres Freund <andres@anarazel.de> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/jngsjonyfscoont4tnwi2qoikatpd5hifsg373vmmjvugwiu6g@m6opxh7uisgd	2025-10-09 15:37:42 -04:00
Masahiko Sawada	b46efe9048	Fix access-to-already-freed-memory issue in pgoutput. While pgoutput caches relation synchronization information in RelationSyncCache that resides in CacheMemoryContext, each entry's information (such as row filter expressions and column lists) is stored in the entry's private memory context (entry_cxt in RelationSyncEntry), which is a descendant memory context of the decoding context. If a logical decoding invoked via SQL functions like pg_logical_slot_get_binary_changes fails with an error, subsequent logical decoding executions could access already-freed memory of the entry's cache, resulting in a crash. With this change, it's ensured that RelationSyncCache is cleaned up even in error cases by using a memory context reset callback function. Backpatch to 15, where entry_cxt was introduced for column filtering and row filtering. While the backbranches v13 and v14 have a similar issue where RelationSyncCache persists even after an error when pgoutput is used via SQL API, we decided not to backport this fix. This decision was made because v13 is approaching its final minor release, and we won't have an chance to fix any new issues that might arise. Additionally, since using pgoutput via SQL API is not a common use case, the risk outwights the benefit. If we receive bug reports, we can consider backporting the fixes then. Author: vignesh C <vignesh21@gmail.com> Co-authored-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com> Reviewed-by: Euler Taveira <euler@eulerto.com> Discussion: https://postgr.es/m/CALDaNm0x-aCehgt8Bevs2cm=uhmwS28MvbYq1=s2Ekf0aDPkOA@mail.gmail.com Backpatch-through: 15	2025-10-09 10:59:27 -07:00
Tom Lane	71540dcdcb	Avoid uninitialized-variable warnings from older compilers. Some of the buildfarm is still unhappy with WinGetFuncArgInPartition even after `2273fa32b`. While it seems to be just very old compilers, we can suppress the warnings and arguably make the code more readable by not initializing these variables till closer to where they are used. While at it, make a couple of cosmetic comment improvements.	2025-10-09 10:33:55 -04:00
Richard Guo	36fd8bde1b	Fix comment in eager_aggregate.sql The comment stated that eager aggregation is disabled by default, which is no longer true. This patch removes that comment as well as the related GUC set statement. Reported-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAApHDvr4YWpiMR3RsgYwJWv-u8xoRqTAKRiYy9zUszjZOqG4Ug@mail.gmail.com	2025-10-09 17:50:54 +09:00
Richard Guo	f997d777ad	Remove unnecessary include of "utils/fmgroids.h" In initsplan.c, no macros for built-in function OIDs are used, so this include is unnecessary and can be removed. This was my oversight in commit `8e1185910`. Discussion: https://postgr.es/m/CAMbWs4_-sag-cAKrLJ+X+5njL1=oudk=+KfLmsLZ5a2jckn=kg@mail.gmail.com	2025-10-09 17:49:20 +09:00
Michael Paquier	8d02f49696	Remove duplicated log related to slot creation in pg_createsubscriber The creation of a replication slot done in a specific database on a publisher was logged twice, with the second log not mentioning the database where the slot creation happened. This commit removes the information logged after a slot has been successfully created, moving the information about the publisher from the second to the first log. Note that failing a slot creation is also logged, so there is no loss of information. Author: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CAHut+Pv7qDvLbDgc9PQGhULT3rPXTxdu_=w+iW-kMs+zPADR+w@mail.gmail.com	2025-10-09 14:02:24 +09:00
Amit Kapila	96b3784973	Add "ALL SEQUENCES" support to publications. This patch adds support for the ALL SEQUENCES clause in publications, enabling synchronization/replication of all sequences that is useful for upgrades. Publications can now include all sequences via FOR ALL SEQUENCES. psql enhancements: \d shows publications for a given sequence. \dRp indicates if a publication includes all sequences. ALL SEQUENCES can be combined with ALL TABLES, but not with other options like TABLE or TABLES IN SCHEMA. We can extend support for more granular clauses in future. The view pg_publication_sequences provides information about the mapping between publications and sequences. This patch enables publishing of sequences; subscriber-side support will be added in upcoming patches. Author: vignesh C <vignesh21@gmail.com> Author: Tomas Vondra <tomas@vondra.me> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Nisha Moond <nisha.moond412@gmail.com> Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com	2025-10-09 03:48:54 +00:00
Amit Langote	ef5e60a9d3	Fix internal error from CollateExpr in SQL/JSON DEFAULT expressions SQL/JSON functions such as JSON_VALUE could fail with "unrecognized node type" errors when a DEFAULT clause contained an explicit COLLATE expression. That happened because assign_collations_walker() could invoke exprSetCollation() on a JsonBehavior expression whose DEFAULT still contained a CollateExpr, which exprSetCollation() does not handle. For example: SELECT JSON_VALUE('{"a":1}', '$.c' RETURNING text DEFAULT 'A' COLLATE "C" ON EMPTY); Fix by validating in transformJsonBehavior() that the DEFAULT expression's collation matches the enclosing JSON expression’s collation. In exprSetCollation(), replace the recursive call on the JsonBehavior expression with an assertion that its collation already matches the target, since the parser now enforces that condition. Reported-by: Jian He <jian.universality@gmail.com> Author: Jian He <jian.universality@gmail.com> Reviewed-by: Amit Langote <amitlangote09@gmail.com> Discussion: https://postgr.es/m/CACJufxHVwYYSyiVQ6o+PsRX6zQ7rAFinh_fv1kCfTsT1xG4Zeg@mail.gmail.com Backpatch-through: 17	2025-10-09 01:07:59 -04:00
David Rowley	a5a68dd6d5	Make truncate_useless_pathkeys() consider WindowFuncs truncate_useless_pathkeys() seems to have neglected to account for PathKeys that might be useful for WindowClause evaluation. Modify it so that it properly accounts for that. Making this work required adjusting two things: 1. Change from checking query_pathkeys to check sort_pathkeys instead. 2. Add explicit check for window_pathkeys For #1, query_pathkeys gets set in standard_qp_callback() according to the sort order requirements for the first operation to be applied after the join planner is finished, so this changes depending on which upper planner operations a particular query needs. If the query has window functions and no GROUP BY, then query_pathkeys gets set to window_pathkeys. Before this change, this meant PathKeys useful for the ORDER BY were not accounted for in queries with window functions. Because of #1, #2 is now required so that we explicitly check to ensure we don't truncate away PathKeys useful for window functions. Author: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAApHDvrj3HTKmXoLMbUjTO=_MNMxM=cnuCSyBKidAVibmYPnrg@mail.gmail.com	2025-10-09 12:38:33 +13:00
Andres Freund	5e89985928	bufmgr: Don't lock buffer header in StrategyGetBuffer() Previously StrategyGetBuffer() acquired the buffer header spinlock for every buffer, whether it was reusable or not. If reusable, it'd be returned, with the lock held, to GetVictimBuffer(), which then would pin the buffer with PinBuffer_Locked(). That's somewhat violating the spirit of the guidelines for holding spinlocks (i.e. that they are only held for a few lines of consecutive code) and necessitates using PinBuffer_Locked(), which scales worse than PinBuffer() due to holding the spinlock. This alone makes it worth changing the code. However, the main reason to change this is that a future commit will make PinBuffer_Locked() slower (due to making UnlockBufHdr() slower), to gain scalability for the much more common case of pinning a pre-existing buffer. By pinning the buffer with a single atomic operation, iff the buffer is reusable, we avoid any potential regression for miss-heavy workloads. There strictly are fewer atomic operations for each potential buffer after this change. The price for this improvement is that freelist.c needs two CAS loops and needs to be able to set up the resource accounting for pinned buffers. The latter is achieved by exposing a new function for that purpose from bufmgr.c, that seems better than exposing the entire private refcount infrastructure. The improvement seems worth the complexity. Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff	2025-10-08 17:04:07 -04:00
Andres Freund	3baae90013	bufmgr: fewer calls to BufferDescriptorGetContentLock We're planning to merge buffer content locks into BufferDesc.state. To reduce the size of that patch, centralize calls to BufferDescriptorGetContentLock(). The biggest part of the change is in assertions, by introducing BufferIsLockedByMe[InMode]() (and removing BufferIsExclusiveLocked()). This seems like an improvement even without aforementioned plans. Additionally replace some direct calls to LWLockAcquire() with calls to LockBuffer(). Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff	2025-10-08 16:06:19 -04:00
Andres Freund	2a2e1b470b	bufmgr: Fix signedness of mask variable in BufferSync() BM_PERMANENT is defined as 1U<<31, which is a negative number when interpreted as a signed integer. Unfortunately the mask variable in BufferSync() was signed. This has been wrong for a long time, but failed to fail, due to integer conversion rules. However, in an upcoming patch the width of the state variable will be increased, with the wrong signedness leading to never flushing permanent buffers - luckily caught in a test. It seems better to fix this separately, instead of doing so as part of a large, otherwise mechanical, patch. Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff	2025-10-08 14:34:30 -04:00
Andres Freund	3c2b97b29e	bufmgr: Introduce FlushUnlockedBuffer There were several copies of code locking a buffer, flushing its contents, and unlocking the buffer. It seems worth centralizing that into a helper function. Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff	2025-10-08 14:34:30 -04:00
Andres Freund	819dc118c0	Improve ReadRecentBuffer() scalability While testing a new potential use for ReadRecentBuffer(), Andres reported that it scales badly when called concurrently for the same buffer by many backends. Instead of a naive (but wrong) coding with PinBuffer(), it used the spinlock, so that it could be careful to pin only if the buffer was valid and holding the expected block, to avoid breaking invariants in eg GetVictimBuffer(). Unfortunately that made it less scalable than PinBuffer(), which uses compare-exchange instead. We can fix that by giving PinBuffer() a new skip_if_not_valid mode that doesn't pin invalid buffers. It might occasionally skip when it shouldn't due to the unlocked read of the header flags, but that's unlikely and perfectly acceptable for an opportunistic optimisation routine, and it can only succeed when it really should due to the compare-exchange loop. Note that this fixes ReadRecentBuffer()'s failure to bump the usage count. While this could be seen as a bug, there currently aren't cases affected by this in core, so it doesn't seem worth backpatching that portion. Author: Thomas Munro <thomas.munro@gmail.com> Reported-by: Andres Freund <andres@anarazel.de> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/20230627020546.t6z4tntmj7wmjrfh%40awork3.anarazel.de Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff	2025-10-08 13:10:40 -04:00
Masahiko Sawada	d3b6183dd9	Add mem_exceeded_count column to pg_stat_replication_slots. This commit introduces a new column mem_exceeded_count to the pg_stat_replication_slots view. This counter tracks how often the memory used by logical decoding exceeds the logical_decoding_work_mem limit. The new statistic helps users determine whether exceeding the logical_decoding_work_mem limit is a rare occurrences or a frequent issue, information that wasn't available through existing statistics. Bumps catversion. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/978D21E8-9D3B-40EA-A4B1-F87BABE7868C@yesql.se	2025-10-08 10:05:04 -07:00
Tom Lane	14ad0d7bf2	Cleanup NAN code in float.h, too. In the same spirit as `3bf905692`, assume that all compilers we still support provide the NAN macro, and get rid of workarounds for that. The C standard allows implementations to omit NAN if the underlying float arithmetic lacks quiet (non-signaling) NaNs. However, we've required that feature for years: the workarounds only supported lack of the macro, not lack of the functionality. I put in a compile-time #error if there's no macro, just for clarity. Also fix up the copies of these functions in ecpglib, and leave a breadcrumb for the next hacker who touches them. History of the hacks being removed here can be found in commits `1bc2d544b`, `4d17a2146`, `cec8394b5`. Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/1952095.1759764279@sss.pgh.pa.us	2025-10-08 12:19:53 -04:00
Robert Haas	4685977cc5	Add extension_state member to PlannedStmt. Extensions can stash data computed at plan time into this list using planner_shutdown_hook (or perhaps other mechanisms) and then access it from any code that has access to the PlannedStmt (such as explain hooks), allowing for extensible debugging and instrumentation of plans. Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com	2025-10-08 09:07:49 -04:00
Robert Haas	94f3ad3961	Add planner_setup_hook and planner_shutdown_hook. These hooks allow plugins to get control at the earliest point at which the PlannerGlobal object is fully initialized, and then just before it gets destroyed. This is useful in combination with the extendable plan state facilities (see extendplan.h) and perhaps for other purposes as well. Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com	2025-10-08 09:05:38 -04:00
Robert Haas	c83ac02ec7	Add ExplainState argument to pg_plan_query() and planner(). This allows extensions to have access to any data they've stored in the ExplainState during planning. Unfortunately, it won't help with EXPLAIN EXECUTE is used, but since that case is less common, this still seems like an improvement. Since planner() has quite a few arguments now, also add some documentation of those arguments and the return value. Author: Robert Haas <rhaas@postgresql.org> Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com	2025-10-08 08:33:29 -04:00
Richard Guo	8e11859102	Implement Eager Aggregation Eager aggregation is a query optimization technique that partially pushes aggregation past a join, and finalizes it once all the relations are joined. Eager aggregation may reduce the number of input rows to the join and thus could result in a better overall plan. In the current planner architecture, the separation between the scan/join planning phase and the post-scan/join phase means that aggregation steps are not visible when constructing the join tree, limiting the planner's ability to exploit aggregation-aware optimizations. To implement eager aggregation, we collect information about aggregate functions in the targetlist and HAVING clause, along with grouping expressions from the GROUP BY clause, and store it in the PlannerInfo node. During the scan/join planning phase, this information is used to evaluate each base or join relation to determine whether eager aggregation can be applied. If applicable, we create a separate RelOptInfo, referred to as a grouped relation, to represent the partially-aggregated version of the relation and generate grouped paths for it. Grouped relation paths can be generated in two ways. The first method involves adding sorted and hashed partial aggregation paths on top of the non-grouped paths. To limit planning time, we only consider the cheapest or suitably-sorted non-grouped paths in this step. Alternatively, grouped paths can be generated by joining a grouped relation with a non-grouped relation. Joining two grouped relations is currently not supported. To further limit planning time, we currently adopt a strategy where partial aggregation is pushed only to the lowest feasible level in the join tree where it provides a significant reduction in row count. This strategy also helps ensure that all grouped paths for the same grouped relation produce the same set of rows, which is important to support a fundamental assumption of the planner. For the partial aggregation that is pushed down to a non-aggregated relation, we need to consider all expressions from this relation that are involved in upper join clauses and include them in the grouping keys, using compatible operators. This is essential to ensure that an aggregated row from the partial aggregation matches the other side of the join if and only if each row in the partial group does. This ensures that all rows within the same partial group share the same "destiny", which is crucial for maintaining correctness. One restriction is that we cannot push partial aggregation down to a relation that is in the nullable side of an outer join, because the NULL-extended rows produced by the outer join would not be available when we perform the partial aggregation, while with a non-eager-aggregation plan these rows are available for the top-level aggregation. Pushing partial aggregation in this case may result in the rows being grouped differently than expected, or produce incorrect values from the aggregate functions. If we have generated a grouped relation for the topmost join relation, we finalize its paths at the end. The final paths will compete in the usual way with paths built from regular planning. The patch was originally proposed by Antonin Houska in 2017. This commit reworks various important aspects and rewrites most of the current code. However, the original patch and reviews were very useful. Author: Richard Guo <guofenglinux@gmail.com> Author: Antonin Houska <ah@cybertec.at> (in an older version) Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Jian He <jian.universality@gmail.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Tomas Vondra <tomas@vondra.me> (in an older version) Reviewed-by: Andy Fan <zhihuifan1213@163.com> (in an older version) Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> (in an older version) Discussion: https://postgr.es/m/CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com	2025-10-08 17:04:23 +09:00
Richard Guo	185e304263	Allow negative aggtransspace to indicate unbounded state size This patch reuses the existing aggtransspace in pg_aggregate to signal that an aggregate's transition state can grow unboundedly. If aggtransspace is set to a negative value, it now indicates that the transition state may consume unpredictable or large amounts of memory, such as in aggregates like array_agg or string_agg that accumulate input rows. This information can be used by the planner to avoid applying memory-sensitive optimizations (e.g., eager aggregation) when there is a risk of excessive memory usage during partial aggregation. Bump catalog version. Per idea from Robert Haas, though applied differently than originally suggested. Discussion: https://postgr.es/m/CA+TgmoYbkvYwLa+1vOP7RDY7kO2=A7rppoPusoRXe44VDOGBPg@mail.gmail.com	2025-10-08 17:01:48 +09:00
Michael Paquier	138da727a1	Improve description of some WAL records for GIN The following information is added in the description of some GIN records: - In INSERT_LISTPAGE, the number of tuples and the right link block. - In UPDATE_META_PAGE, the number of tuples, the previous tail block, and the right link block. - In SPLIT, the left and right children blocks. Author: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru> Discussion: https://postgr.es/m/CALdSSPgnAt5L=D_xGXRXLYO5FK1H31_eYEESxdU1n-r4g+6GqA@mail.gmail.com	2025-10-08 14:02:26 +09:00
Michael Paquier	b71bae41a0	Add stats_reset to pg_stat_user_functions It is possible to call pg_stat_reset_single_function_counters() for a single function, but the reset time was missing the system view showing its statistics. Like all the fields of pg_stat_user_functions, the GUC track_functions needs to be enabled to show the statistics about function executions. Bump catalog version. Bump PGSTAT_FILE_FORMAT_ID, as a result of the new field added to PgStat_StatFuncEntry. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/aONjnsaJSx-nEdfU@paquier.xyz	2025-10-08 12:43:40 +09:00
Amit Kapila	035b09131d	Fix typo in function header comment. Reported-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CA+TgmoZYh_nw-2j_Fi9y6ZAvrpN+W1aSOFNM7Rus2Q-zTkCsQw@mail.gmail.com	2025-10-08 03:17:05 +00:00
Tatsuo Ishii	2273fa32bc	Fix Coverity issues reported in commit `25a30bbd42`. Fix several issues pointed out by Coverity (reported by Tome Lane). - In row_is_in_frame(), return value of window_gettupleslot() was not checked. - WinGetFuncArgInPartition() tried to derefference "isout" pointer even if it could be NULL in some places. Besides the issues, I also fixed a compiler warning reported by Álvaro Herrera. Moreover, in WinGetFuncArgInPartition refactor the do...while loop so that the codes inside the loop simpler. Also simplify the case when abs_pos < 0. Author: Tatsuo Ishii <ishii@postgresql.org> Reviewed-by: Paul Ramsey <pramsey@cleverelephant.ca> Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Reported-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/1686755.1759679957%40sss.pgh.pa.us Discussion: https://postgr.es/m/202510051612.gw67jlc2iqpw%40alvherre.pgsql	2025-10-08 09:26:49 +09:00
David Rowley	3bf905692c	Cleanup INFINITY code in float.h The INFINITY macro is always defined per C99 standard, so this should mean we can now get rid of the workaround code for when that macro isn't defined. Also, delete the (now unneeded) #pragma code which was disabling a compiler warning in MSVC. There was a comment explaining why the #pragma was placed outside the function body to work around a MSVC compiler bug, but the link explaining that was dead, as reported by jian he. Author: David Rowley <dgrowleyml@gmail.com> Reported-by: jian he <jian.universality@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CACJufxGARYETnNwtCK7QC0zE_7gq-tfN0mME=gT5rTNtC=VSHQ@mail.gmail.com	2025-10-08 12:07:17 +13:00
Robert Haas	64095d1574	Remove PlannerInfo's join_search_private method. Instead, use the new mechanism that allows planner extensions to store private state inside a PlannerInfo, treating GEQO as an in-core planner extension. This is a useful test of the new facility, and also buys back a few bytes of storage. To make this work, we must remove innerrel_is_unique_ext's hack of testing whether join_search_private is set as a proxy for whether the join search might be retried. Add a flag that extensions can use to explicitly signal their intentions instead. Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com	2025-10-07 12:43:45 -04:00
Robert Haas	0132dddab3	Allow private state in certain planner data structures. Extension that make extensive use of planner hooks may want to coordinate their efforts, for example to avoid duplicate computation, but that's currently difficult because there's no really good way to pass data between different hooks. To make that easier, allow for storage of extension-managed private state in PlannerGlobal, PlannerInfo, and RelOptInfo, along very similar lines to what we have permitted for ExplainState since commit `c65bc2e1d1`. Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com	2025-10-07 12:09:30 -04:00
Tom Lane	afd532c3a8	Adjust new TAP test to work on macOS. Seems Apple's version of "wc -l" puts spaces before the number. (I wonder why the cfbot didn't find this.) While here, make the failure case log what it got, to aid debugging future issues. Per buildfarm.	2025-10-07 11:47:27 -04:00
Tom Lane	27da1a796f	Improve psql's ability to select pager mode accurately. We try to use the pager only when more than a screenful's worth of data is to be printed. However, the code in print.c that's concerned with counting the number of lines that will be needed missed a lot of edge cases: * While plain aligned mode accounted for embedded newlines in column headers and table cells, unaligned and vertical output modes did not. * In particular, since vertical mode repeats the headers for each record, we need to account for embedded newlines in the headers for each record. * Multi-line table titles were not accounted for. * tuples_only mode (where headers aren't printed) wasn't accounted for. * Footers were accounted for as one line per footer, again missing the possibility of multi-line footers. (In some cases such as "\d+" on a view, there can be many lines in a footer.) Also, we failed to account for the default footer. To fix, move the entire responsibility for counting lines into IsPagerNeeded (or actually, into a new subroutine count_table_lines), and then expand the logic as appropriate. Also restructure to make it perhaps a bit easier to follow. It's still only completely accurate for ALIGNED/WRAPPED/UNALIGNED formats, but the other formats are not typically used with interactive output. Arrange to not run count_table_lines at all unless we will use its result, and teach it to quit early as soon as it's proven that the output is long enough to require use of the pager. When dealing with large tables this should save a noticeable amount of time, since pg_wcssize() isn't exactly cheap. In passing, move the "flog" output step to the bottom of printTable(), rather than running it when we've already opened the pager in some modes. In principle it shouldn't interfere with the pager because flog should always point to a non-interactive file; but it seems silly to risk any interference, especially when the existing positioning seems to have been chosen with the aid of a dartboard. Also add a TAP test to exercise pager mode. Up to now, we have had zero test coverage of these code paths, because they aren't reached unless isatty(stdout). We do have the test infrastructure to improve that situation, though. Following the lead of 010_tab_completion.pl, set up an interactive psql and feed it some test cases. To detect whether it really did invoke the pager, point PSQL_PAGER to "wc -l". The test is skipped if that utility isn't available. Author: Erik Wienhold <ewie@ewie.name> Test-authored-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/2dd2430f-dd20-4c89-97fd-242616a3d768@ewie.name	2025-10-07 10:57:56 -04:00
Robert Haas	8c49a484e8	Assign each subquery a unique name prior to planning it. Previously, subqueries were given names only after they were planned, which makes it difficult to use information from a previous execution of the query to guide future planning. If, for example, you knew something about how you want "InitPlan 2" to be planned, you won't know whether the subquery you're currently planning will end up being "InitPlan 2" until after you've finished planning it, by which point it's too late to use the information that you had. To fix this, assign each subplan a unique name before we begin planning it. To improve consistency, use textual names for all subplans, rather than, as we did previously, a mix of numbers (such as "InitPlan 1") and names (such as "CTE foo"), and make sure that the same name is never assigned more than once. We adopt the somewhat arbitrary convention of using the type of sublink to set the plan name; for example, a query that previously had two expression sublinks shown as InitPlan 2 and InitPlan 1 will now end up named expr_1 and expr_2. Because names are assigned before rather than after planning, some of the regression test outputs show the numerical part of the name switching positions: what was previously SubPlan 2 was actually the first one encountered, but we finished planning it later. We assign names even to subqueries that aren't shown as such within the EXPLAIN output. These include subqueries that are a FROM clause item or a branch of a set operation, rather than something that will be turned into an InitPlan or SubPlan. The purpose of this is to make sure that, below the topmost query level, there's always a name for each subquery that is stable from one planning cycle to the next (assuming no changes to the query or the database schema). Author: Robert Haas <rhaas@postgresql.org> Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com> Reviewed-by: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Discussion: http://postgr.es/m/3641043.1758751399@sss.pgh.pa.us	2025-10-07 09:18:54 -04:00
Daniel Gustafsson	8c2d5d4f11	doc: Add missing parenthesis in pg_stat_progress_analyze docs Author: Shinya Kato <shinya11.kato@gmail.com> Discussion: https://postgr.es/m/CAOzEurRgpAh9dsbEM88FPOhNaV_PkdL6p_9MJatcrNf9wXw1nw@mail.gmail.com Backpatch-through: 18	2025-10-07 15:02:20 +02:00
Álvaro Herrera	c53775185d	Fix compile of src/tutorial/funcs.c I broke this with recent #include removals. Fix by adding an explicit Reported-by: Devrim Gündüz <devrim@gunduz.org> Discussion: https://postgr.es/m/5e2c2d7c44434f3f0af7523864b27fe4fb590902.camel@gunduz.org	2025-10-07 10:45:57 +02:00
David Rowley	9c9d41af4d	Teach planner to short-circuit EXCEPT/INTERSECT with dummy inputs When either inputs of an INTERSECT [ALL] operator are proven not to return any results (a dummy rel), then mark the entire INTERSECT operation as dummy. Likewise, if an EXCEPT [ALL] operation's left input is proven empty, then mark the entire operation as dummy. With EXCEPT ALL, we can easily handle the right input being dummy as we can return the left input without any processing. That can lead to significant performance gains during query execution. We can't easily handle dummy right inputs for EXCEPT (without ALL), as that would require deduplication of the left input. Wiring up those Paths is likely more complex than it's worth as the gains during execution aren't that great, so let's leave that one to be handled by the normal Path generation code. Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAApHDvri53PPF76c3M94_QNWbJfXjyCnjXuj_2=LYM-0m8WZtw@mail.gmail.com	2025-10-07 17:17:52 +13:00
David Rowley	928df067d1	Fix incorrect targetlist in dummy UNIONs The prior code, added in `03d40e4b5` attempted to use the targetlist of the first UNION child when all UNION children were proven as dummy rels. That's not going to work when some operation atop of the Result node must find target entries within the Result's targetlist. This could have been something as simple as trying to sort the results of the UNION operation, which would lead to: ERROR: could not find pathkey item to sort Instead, use the top-level UNION's targetlist and fix the varnos in setrefs.c. Because set operation targetlists always use varno==0, we can rewrite those to become varno==1, i.e. use the Vars from the first UNION child. This does result in showing Vars from relations that are not present in the final plan, but that's no different to what we see when normal base relations are proven dummy. Without this fix it would be possible to see the following error in EXPLAIN VERBOSE when all UNION inputs were proven empty. ERROR: bogus varno: 0 Author: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAApHDvrUASy9sfULMEsM2udvZJP6AoBRCZvHYXYxZTy2tX9FYw@mail.gmail.com	2025-10-07 14:15:04 +13:00
Masahiko Sawada	771cfe22a0	Avoid unnecessary GinFormTuple() calls for incompressible posting lists. Previously, we attempted to form a posting list tuple even when ginCompressPostingList() failed to compress the posting list due to its size. While there was no functional failure, it always wasted one GinFormTuple() call when item pointers didn't fit in a posting list tuple. This commit ensures that a GIN index tuple is formed only when all item pointers in the posting list are successfully compressed. Author: Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAE7r3M+C=jcpTD93f_RBHrQp3C+=TAXFs+k4tTuZuuxboK8AvA@mail.gmail.com	2025-10-06 14:02:01 -07:00
Nathan Bossart	ec8719ccbf	Optimize hex_encode() and hex_decode() using SIMD. The hex_encode() and hex_decode() functions serve as the workhorses for hexadecimal data for bytea's text format conversion functions, and some workloads are sensitive to their performance. This commit adds new implementations that use routines from port/simd.h, which testing indicates are much faster for larger inputs. For small or invalid inputs, we fall back on the existing scalar versions. Since we are using port/simd.h, these optimizations apply to both x86-64 and AArch64. Author: Nathan Bossart <nathandbossart@gmail.com> Co-authored-by: Chiranmoy Bhattacharya <chiranmoy.bhattacharya@fujitsu.com> Co-authored-by: Susmitha Devanga <devanga.susmitha@fujitsu.com> Reviewed-by: John Naylor <johncnaylorls@gmail.com> Discussion: https://postgr.es/m/aLhVWTRy0QPbW2tl%40nathan	2025-10-06 12:28:50 -05:00
Andrew Dunstan	5b5e8a29c1	Revert "Improve docs syntax checking" This reverts commit `b292256272`. Further discussion is needed Discussion: https://postgr.es/m/0198ec0f-0269-4cf4-b4a7-22c05b3047cb@eisentraut.org	2025-10-06 07:53:31 -04:00
Amit Kapila	b93172ca59	Expose sequence page LSN via pg_get_sequence_data. This patch enhances the pg_get_sequence_data function to include the page-level LSN (Log Sequence Number) of the sequence. This additional metadata will be used by upcoming patches to support synchronization of sequences during logical replication. By exposing the LSN, we enable more accurate tracking of sequence changes, which is essential for maintaining consistency across replicated nodes. Author: vignesh C <vignesh21@gmail.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://www.postgresql.org/message-id/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com	2025-10-06 08:30:16 +00:00
Michael Paquier	42c6b74d89	Add comment in ginxlog.h about block used with ginxlogInsertListPage All the other structures describe the list of blocks used, and in the case of a GIN_INSERT_LISTPAGE record block 0 refers to a list page with the items added to it. Author: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru> Discussion: https://postgr.es/m/CALdSSPgk=9WRoXhZy5fdk+T1hiau7qbL_vn94w_L1N=gtEdbsg@mail.gmail.com	2025-10-06 16:23:51 +09:00

1 2 3 4 5 ...

62293 Commits