postgres

mirror of https://github.com/postgres/postgres.git synced 2025-11-22 12:22:45 +03:00

Author	SHA1	Message	Date
Álvaro Herrera	53af9491a0	Restructure foreign key handling code for ATTACH/DETACH ... to fix bugs when the referenced table is partitioned. The catalog representation we chose for foreign keys connecting partitioned tables (in commit `f56f8f8da6`) is inconvenient, in the sense that a standalone table has a different way to represent the constraint when referencing a partitioned table, than when the same table becomes a partition (and vice versa). Because of this, we need to create additional catalog rows on detach (pg_constraint and pg_trigger), and remove them on attach. We were doing some of those things, but not all of them, leading to missing catalog rows in certain cases. The worst problem seems to be that we are missing action triggers after detaching a partition, which means that you could update/delete rows from the referenced partitioned table that still had referencing rows on that table, the server failing to throw the required errors. !!! Note that this means existing databases with FKs that reference partitioned tables might have rows that break relational integrity, on tables that were once partitions on the referencing side of the FK. Another possible problem is that trying to reattach a table that had been detached would fail indicating that internal triggers cannot be found, which from the user's point of view is nonsensical. In branches 15 and above, we fix this by creating a new helper function addFkConstraint() which is in charge of creating a standalone pg_constraint row, and repurposing addFkRecurseReferencing() and addFkRecurseReferenced() so that they're only the recursive routine for each side of the FK, and they call addFkConstraint() to create pg_constraint at each partitioning level and add the necessary triggers. These new routines can be used during partition creation, partition attach and detach, and foreign key creation. This reduces redundant code and simplifies the flow. In branches 14 and 13, we have a much simpler fix that consists on simply removing the constraint on detach. The reason is that those branches are missing commit `f4566345cf`, which reworked the way this works in a way that we didn't consider back-patchable at the time. We opted to leave branch 12 alone, because it's different from branch 13 enough that the fix doesn't apply; and because it is going in EOL mode very soon, patching it now might be worse since there's no way to undo the damage if it goes wrong. Existing databases might need to be repaired. In the future we might want to rethink the catalog representation to avoid this problem, but for now the code seems to do what's required to make the constraints operate correctly. Co-authored-by: Jehan-Guillaume de Rorthais <jgdr@dalibo.com> Co-authored-by: Tender Wang <tndrwang@gmail.com> Co-authored-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Reported-by: Guillaume Lelarge <guillaume@lelarge.info> Reported-by: Jehan-Guillaume de Rorthais <jgdr@dalibo.com> Reported-by: Thomas Baehler (SBB CFF FFS) <thomas.baehler2@sbb.ch> Discussion: https://postgr.es/m/20230420144344.40744130@karst Discussion: https://postgr.es/m/20230705233028.2f554f73@karst Discussion: https://postgr.es/m/GVAP278MB02787E7134FD691861635A8BC9032@GVAP278MB0278.CHEP278.PROD.OUTLOOK.COM Discussion: https://postgr.es/m/18541-628a61bc267cd2d3@postgresql.org	2024-10-22 16:01:18 +02:00
Tom Lane	68ad9816c1	Fix wrong assertion and poor error messages in "COPY (query) TO". If the query is rewritten into a NOTIFY command by a DO INSTEAD rule, we'd get an assertion failure, or in non-assert builds issue a rather confusing error message. Improve that. Also fix a longstanding grammar mistake in a nearby error message. Per bug #18664 from Alexander Lakhin. Back-patch to all supported branches. Tender Wang and Tom Lane Discussion: https://postgr.es/m/18664-ffd0ebc2386598df@postgresql.org	2024-10-21 15:08:22 -04:00
Heikki Linnakangas	3c7d78427e	Update outdated comment on WAL-logged locks with invalid XID We haven't generated those for a long time. Discussion: https://www.postgresql.org/message-id/b439edfc-c5e5-43a9-802d-4cb51ec20646@iki.fi	2024-10-21 14:28:43 +03:00
Heikki Linnakangas	1a43de5e0a	Fix race condition in committing a serializable transaction The finished transaction list can contain XIDs that are older than the serializable global xmin. It's a short-lived state; ClearOldPredicateLocks() removes any such transactions from the list, and it's called whenever the global xmin advances. But if another backend calls SummarizeOldestCommittedSxact() in that window, it will call SerialAdd() on an XID that's older than the global xmin, or if there are no more transactions running, when global xmin is invalid. That trips the assertion in SerialAdd(). Fixes bug #18658 reported by Andrew Bille. Thanks to Alexander Lakhin for analysis. Backpatch to all versions. Discussion: https://www.postgresql.org/message-id/18658-7dab125ec688c70b%40postgresql.org	2024-10-21 09:49:21 +03:00
Michael Paquier	57a36e890d	Fix grammar of a comment in bufmgr.c Author: Junwang Zhao Discussion: https://postgr.es/m/CAEG8a3L5YjxXCjx0LhkwHdDGsNgpFGEqH7SqtXRPNP+dwFMVZQ@mail.gmail.com	2024-10-21 11:25:29 +09:00
Amit Langote	11c87216d1	SQL/JSON: Fix some oversights in commit `b6e1157e7` The decision in `b6e1157e7` to ignore raw_expr when evaluating a JsonValueExpr was incorrect. While its value is not ultimately used (since formatted_expr's value is), failing to initialize it can lead to problems, for instance, when the expression tree in raw_expr contains Aggref nodes, which must be initialized to ensure the parent Agg node works correctly. Also, optimize eval_const_expressions_mutator()'s handling of JsonValueExpr a bit. Currently, when formatted_expr cannot be folded into a constant, we end up processing it twice -- once directly in eval_const_expressions_mutator() and again recursively via ece_generic_processing(). This recursive processing is required to handle raw_expr. To avoid the redundant processing of formatted_expr, we now process raw_expr directly in eval_const_expressions_mutator(). Finally, update the comment of JsonValueExpr to describe the roles of raw_expr and formatted_expr more clearly. Bug: #18657 Reported-by: Alexander Lakhin <exclusion@gmail.com> Diagnosed-by: Fabio R. Sluzala <fabio3rs@gmail.com> Diagnosed-by: Tender Wang <tndrwang@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/18657-1b90ccce2b16bdb8@postgresql.org Backpatch-through: 16	2024-10-20 12:20:55 +09:00
Jeff Davis	b391d882ff	Allow pg_set_relation_stats() to set relpages to -1. While the default value for relpages is 0, if a partitioned table with at least one child has been analyzed, then the partititoned table will have a relpages value of -1. Author: Corey Huinker Discussion: https://postgr.es/m/CADkLM=fajh1Lpcyr_XsMmq-9Z=SGk-u+_Zeac7Pt0RAN3uiVCg@mail.gmail.com	2024-10-18 10:44:15 -07:00
Peter Geoghegan	1bd4bc85ca	Optimize nbtree backwards scans. Make nbtree backwards scans optimistically access the next page to be read to the left by following a prevPage block number that's now stashed in currPos when the leaf page is first read. This approach matches the one taken during forward scans, which follow a symmetric nextPage block number from currPos. We stash both a prevPage and a nextPage, since the scan direction might change (when fetching from a scrollable cursor). Backwards scans will no longer need to lock the same page twice, except in rare cases where the scan detects a concurrent page split (or page deletion). Testing has shown this optimization to be particularly effective during parallel index-only backwards scans: ~12% reductions in query execution time are quite possible. We're much better off being optimistic; concurrent left sibling page splits are rare in general. It's possible that we'll need to lock more pages than the pessimistic approach would have, but only when there are _multiple_ concurrent splits of the left sibling page we now start at. If there's just a single concurrent left sibling page split, the new approach to scanning backwards will at least break even relative to the old one (we'll acquire the same number of leaf page locks as before). The optimization from this commit has long been contemplated by comments added by commit `2ed5b87f96`, which changed the rules for locking/pinning during nbtree index scans. The approach that that commit introduced to leaf level link traversal when scanning forwards is now more or less applied all the time, regardless of the direction we're scanning in. Following uniform conventions around sibling link traversal is simpler. The only real remaining difference between our forward and backwards handling is that our backwards handling must still detect and recover from any concurrent left sibling splits (and concurrent page deletions), as documented in the nbtree README. That is structured as a single, isolated extra step that takes place in _bt_readnextpage. Also use this opportunity to further simplify the functions that deal with reading pages and traversing sibling links on the leaf level, and to document their preconditions and postconditions (with respect to things like buffer locks, buffer pins, and seizing the parallel scan). This enhancement completely supersedes the one recently added by commit `3f44959f`. Author: Matthias van de Meent <boekewurm+postgres@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAEze2WgpBGRgTTxTWVPXc9+PB6fc1a7t+VyGXHzfnrFXcQVxnA@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzkBTuFv7W2+84jJT8mWZLXVL0GHq2hMUTn6c9Vw=eYrCw@mail.gmail.com	2024-10-18 11:25:32 -04:00
Peter Eisentraut	24a36f91e3	Fix strsep() use for SCRAM secrets parsing The previous code (from commit `5d2e1cc117`) did not detect end of string correctly, so it would fail to error out if fewer than the expected number of fields were present, which could then later lead to a crash when NULL string pointers are accessed. Reported-by: Alexander Lakhin <exclusion@gmail.com> Reported-by: Ranier Vilela <ranier.vf@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/79692bf9-17d3-41e6-b9c9-fc8c3944222a@eisentraut.org	2024-10-18 11:15:54 +02:00
Fujii Masao	9272bdeac8	Remove unused code for unlogged materialized views. Commit `3bf3ab8c56` initially introduced support for unlogged materialized views, but this was later disallowed by commit `3223b25ff7`. Additionally, commit `d25f519107` added more code for handling unlogged materialized views. This commit cleans up all unused code related to them. If unlogged materialized views had been supported in any official release, psql would need to retain code to handle them for compatibility with older servers. However, since they were never included in an official release, this code is no longer necessary. Author: Pixian Shi Reviewed-by: Yugo Nagata, Fujii Masao Discussion: https://postgr.es/m/CAAccyYKRZ=OvAvgowiSH+OELbStLP=p2Ht=R3CgT=OaNSH5DAA@mail.gmail.com	2024-10-18 17:18:57 +09:00
Jeff Davis	eecd9138a0	Improve ThrowErrorData() comments for use with soft errors. Reviewed-by: Corey Huinker Discussion: https://postgr.es/m/901ab7cf01957f92ea8b30b6feeb0eacfb7505fc.camel@j-davis.com	2024-10-17 14:56:44 -07:00
Thomas Munro	98c7c7152d	Fix extreme skew detection in Parallel Hash Join. After repartitioning the inner side of a hash join that would have exceeded the allowed size, we check if all the tuples from a parent partition moved to one child partition. That is evidence that it contains duplicate keys and later attempts to repartition will also fail, so we should give up trying to limit memory (for lack of a better fallback strategy). A thinko prevented the check from working correctly in partition 0 (the one that is partially loaded into memory already). After repartitioning, we should check for extreme skew if the parent partition's space_exhausted flag was set, not the child partition's. The consequence was repeated futile repartitioning until per-partition data exceeded various limits including "ERROR: invalid DSA memory alloc request size 1811939328", OS allocation failure, or temporary disk space errors. (We could also do something about some of those symptoms, but that's material for separate patches.) This problem only became likely when PostgreSQL 16 introduced support for Parallel Hash Right/Full Join, allowing NULL keys into the hash table. Repartitioning always leaves NULL in partition 0, no matter how many times you do it, because the hash value is all zero bits. That's unlikely for other hashed values, but they might still have caused wasted extra effort before giving up. Back-patch to all supported releases. Reported-by: Craig Milhiser <craig@milhiser.com> Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Discussion: https://postgr.es/m/CA%2BwnhO1OfgXbmXgC4fv_uu%3DOxcDQuHvfoQ4k0DFeB0Qqd-X-rQ%40mail.gmail.com	2024-10-17 22:11:59 +13:00
Peter Eisentraut	665785d85f	Fix unnecessary casts of copyObject() result The result is already of the correct type, so these casts don't do anything. Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/637eeea8-5663-460b-a114-39572c0f6c6e%40eisentraut.org	2024-10-17 08:36:48 +02:00
Peter Eisentraut	eafda78fc4	Improve node type forward reference Instead of using Node *, we can use an incomplete struct. That way, everything has the correct type and fewer casts are required. This technique is already used elsewhere in node type definitions. Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/637eeea8-5663-460b-a114-39572c0f6c6e%40eisentraut.org	2024-10-17 08:36:48 +02:00
David Rowley	9ca67658d1	Don't store intermediate hash values in ExprState->resvalue `adf97c156` made it so ExprStates could support hashing and changed Hash Join to use that instead of manually extracting Datums from tuples and hashing them one column at a time. When hashing multiple columns or expressions, the code added in that commit stored the intermediate hash value in the ExprState's resvalue field. That was a mistake as steps may be injected into the ExprState between each hashing step that look at or overwrite the stored intermediate hash value. EEOP_PARAM_SET is an example of such a step. Here we fix this by adding a new dedicated field for storing intermediate hash values and adjust the code so that all apart from the final hashing step store their result in the intermediate field. In passing, rename a variable so that it's more aligned to the surrounding code and also so a few lines stay within the 80 char margin. Reported-by: Andres Freund Reviewed-by: Alena Rybakina <a.rybakina@postgrespro.ru> Discussion: https://postgr.es/m/CAApHDvqo9eenEFXND5zZ9JxO_k4eTA4jKMGxSyjdTrsmYvnmZw@mail.gmail.com	2024-10-17 14:25:08 +13:00
Michael Paquier	089aac631b	Fix validation of COPY FORCE_NOT_NULL/FORCE_NULL for the all-column cases This commit adds missing checks for COPY FORCE_NOT_NULL and FORCE_NULL when applied to all columns via "*". These options now correctly require CSV mode and are disallowed in COPY TO, making their behavior consistent with FORCE_QUOTE. Some regression tests are added to verify the correct behavior for the all-columns case, including FORCE_QUOTE, which was not tested. Backpatch down to 17, where support for the all-column grammar with FORCE_NOT_NULL and FORCE_NULL has been added. Author: Joel Jacobson Reviewed-by: Zhang Mingli Discussion: https://postgr.es/m/65030d1d-5f90-4fa4-92eb-f5f50389858e@app.fastmail.com Backpatch-through: 17	2024-10-17 08:44:50 +09:00
Peter Geoghegan	c0490b0ef7	nbtree: fix read page recheck typo. Oversight in commit `79fa7b3b`.	2024-10-16 17:38:38 -04:00
Tom Lane	c96de42c4b	Further refine _SPI_execute_plan's rule for atomic execution. Commit `2dc1deaea` turns out to have been still a brick shy of a load, because CALL statements executing within a plpgsql exception block could still pass the wrong snapshot to stable functions within the CALL's argument list. That happened because standard_ProcessUtility forces isAtomicContext to true if IsTransactionBlock is true, which it always will be inside a subtransaction. Then ExecuteCallStmt would think it does not need to push a new snapshot --- but _SPI_execute_plan didn't do so either, since it thought it was in nonatomic mode. The best fix for this seems to be for _SPI_execute_plan to operate in atomic execution mode if IsSubTransaction() is true, even when the SPI context as a whole is non-atomic. This makes _SPI_execute_plan have the same rules about when non-atomic execution is allowed as _SPI_commit/_SPI_rollback have about when COMMIT/ROLLBACK are allowed, which seems appropriately symmetric. (If anyone ever tries to allow COMMIT/ROLLBACK inside a subtransaction, this would all need to be rethought ... but I'm unconvinced that such a thing could be logically consistent at all.) For further consistency, also check IsSubTransaction() in SPI_inside_nonatomic_context. That does not matter for its one present-day caller StartTransaction, which can't be reached inside a subtransaction. But if any other callers ever arise, they'd presumably want this definition. Per bug #18656 from Alexander Alehin. Back-patch to all supported branches, like previous fixes in this area. Discussion: https://postgr.es/m/18656-cade1780866ef66c@postgresql.org	2024-10-16 17:36:40 -04:00
Jeff Davis	b360d1762b	Fix #include order from `e839c8ecc9`. Reported-by: Alexander Korotkov Discussion: https://postgr.es/m/CAPpHfduAiGSsvUc614Z-JOnyQffcMeJncWMF2HnUL8wFy4fuWA@mail.gmail.com	2024-10-16 12:13:40 -07:00
Masahiko Sawada	1b9b6cc345	Reduce memory block size for decoded tuple storage to 8kB. Commit `a4ccc1cef` introduced the Generation Context and modified the logical decoding process to use a Generation Context with a fixed block size of 8MB for storing tuple data decoded during logical decoding (i.e., rb->tup_context). Several reports have indicated that the logical decoding process can be terminated due to out-of-memory (OOM) situations caused by excessive memory usage in rb->tup_context. This issue can occur when decoding a workload involving several concurrent transactions, including a long-running transaction that modifies tuples. By design, the Generation Context does not free a memory block until all chunks within that block are released. Consequently, if tuples modified by the long-running transaction are stored across multiple memory blocks, these blocks remain allocated until the long-running transaction completes, leading to substantial memory fragmentation. The memory usage during logical decoding, tracked by rb->size, does not account for memory fragmentation, resulting in potentially much higher memory consumption than the value of the logical_decoding_work_mem parameter. Various improvement strategies were discussed in the relevant thread. This change reduces the block size of the Generation Context used in rb->tup_context from 8MB to 8kB. This modification significantly decreases the likelihood of substantial memory fragmentation occurring and is relatively straightforward to backport. Performance testing across multiple platforms has confirmed that this change will not introduce any performance degradation that would impact actual operation. Backport to all supported branches. Reported-by: Alex Richman, Michael Guissine, Avi Weinberg Reviewed-by: Amit Kapila, Fujii Masao, David Rowley Tested-by: Hayato Kuroda, Shlok Kyal Discussion: https://postgr.es/m/CAD21AoBTY1LATZUmvSXEssvq07qDZufV4AF-OHh9VD2pC0VY2A%40mail.gmail.com Backpatch-through: 12	2024-10-16 12:08:05 -07:00
Peter Geoghegan	79fa7b3b1a	Normalize nbtree truncated high key array behavior. Commit `5bf748b8` taught nbtree ScalarArrayOp index scans to decide when and how to start the next primitive index scan based on physical index characteristics. This included rules for deciding whether to start a new primitive index scan (or whether to move onto the right sibling leaf page instead) that specifically consider truncated lower-order columns (-inf columns) from leaf page high keys. These omitted columns were treated as satisfying the scan's required scan keys, though only for scan keys marked required in the current scan direction (forward). Scan keys that didn't get this behavior (those marked required in the backwards direction only) usually didn't give the scan reasonable cause to reposition itself to a later leaf page (via another descent of the index in _bt_first), but _bt_advance_array_keys would nevertheless always give up by forcing another call to _bt_first. _bt_advance_array_keys was unwilling to allow the scan to continue onto the next leaf page, to reconsider whether we really should start another primitive scan based on the details of the sibling page's tuples. This didn't match its behavior with similar cases involving keys required in the current scan direction (forward), which seems unprincipled. It led to an excessive number of primitive scans/index descents for queries with a higher-order = array scan key (with dense, contiguous values) mixed with a lower-order required > or >= scan key. Bring > and >= strategy scan keys in line with other required scan key types: treat truncated -inf scan keys as having satisfied scan keys required in either scan direction (forwards and backwards alike) during array advancement. That way affected scans can continue to the right sibling leaf page. Advancement must now schedule an explicit recheck of the right sibling page's high key in cases involving > or >= scan keys. The recheck gives the scan a way to back out and start another primitive index scan (we can't just rely on _bt_checkkeys with > or >= scan keys). This work can be considered a stand alone optimization on top of the work from commit `5bf748b8`. But it was written in preparation for an upcoming patch that will add skip scan to nbtree. In practice scans that use "skip arrays" will tend to be much more sensitive to any implementation deficiencies in this area. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Tomas Vondra <tomas@vondra.me> Discussion: https://postgr.es/m/CAH2-Wz=9A_UtM7HzUThSkQ+BcrQsQZuNhWOvQWK06PRkEp=SKQ@mail.gmail.com	2024-10-16 12:17:49 -04:00
Amit Langote	c259b1578e	Fix typo in comment of transformJsonAggConstructor() An oversight of `3a8a1f3254`. Reported-by: Tender Wang <tndrwang@gmail.com> Author: Tender Wang <tndrwang@gmail.com> Backpatch-through: 16	2024-10-16 20:37:02 +09:00
David Rowley	2453196107	Move clause_sides_match_join() into restrictinfo.h Two near-identical copies of clause_sides_match_join() existed in joinpath.c and analyzejoins.c. Deduplicate this by moving the function into restrictinfo.h. It isn't quite clear that keeping the inline property of this function is worthwhile, but this commit is just an exercise in code deduplication. More effort would be required to determine if the inline property is worth keeping. Author: James Hunter <james.hunter.pg@gmail.com> Discussion: https://postgr.es/m/CAJVSvF7Nm_9kgMLOch4c-5fbh3MYg%3D9BdnDx3Dv7Fcb64zr64Q%40mail.gmail.com	2024-10-15 21:14:21 +13:00
Masahiko Sawada	7cdfeee320	Add contrib/pg_logicalinspect. This module provides SQL functions that allow to inspect logical decoding components. It currently allows to inspect the contents of serialized logical snapshots of a running database cluster, which is useful for debugging or educational purposes. Author: Bertrand Drouvot Reviewed-by: Amit Kapila, Shveta Malik, Peter Smith, Peter Eisentraut Reviewed-by: David G. Johnston Discussion: https://postgr.es/m/ZscuZ92uGh3wm4tW%40ip-10-97-1-34.eu-west-3.compute.internal	2024-10-14 17:22:02 -07:00
Masahiko Sawada	e2fd615ecc	Move SnapBuild and SnapBuildOnDisk structs to snapshot_internal.h. This commit moves the definitions of the SnapBuild and SnapBuildOnDisk structs, related to logical snapshots, to the snapshot_internal.h file. This change allows external tools, such as pg_logicalinspect (with an upcoming patch), to access and utilize the contents of logical snapshots. Author: Bertrand Drouvot Reviewed-by: Amit Kapila, Shveta Malik, Peter Smith Discussion: https://postgr.es/m/ZscuZ92uGh3wm4tW%40ip-10-97-1-34.eu-west-3.compute.internal	2024-10-14 17:19:33 -07:00
Jeff Davis	66ac94cdc7	Move libc-specific code from pg_locale.c into pg_locale_libc.c. Move implementation of pg_locale_t code for libc collations into pg_locale_libc.c. Other locale-related code, such as pg_perm_setlocale(), remains in pg_locale.c for now. Discussion: https://postgr.es/m/flat/2830211e1b6e6a2e26d845780b03e125281ea17b.camel@j-davis.com	2024-10-14 12:48:43 -07:00
Jeff Davis	f244a2bb4c	Move ICU-specific code from pg_locale.c into pg_locale_icu.c. Discussion: https://postgr.es/m/flat/2830211e1b6e6a2e26d845780b03e125281ea17b.camel@j-davis.com	2024-10-14 12:13:26 -07:00
Masahiko Sawada	4681ad4b2f	Use construct_array_builtin for FLOAT8OID instead of construct_array. Commit `d746021de1` introduced construct_array_builtin() for built-in data types, but forgot some replacements linked to FLOAT8OID. Author: Bertrand Drouvot Reviewed-by: Peter Eisentraut Discussion: https://postgr.es/m/CAD21AoCERkwmttY44dqUw%3Dm_9QCctu7W%2Bp6B7w_VqxRJA1Qq_Q%40mail.gmail.com	2024-10-14 09:49:29 -07:00
Peter Eisentraut	c594f1ad2b	Track scan reversals in MergeJoin The MergeJoin struct was tracking "mergeStrategies", which were an array of btree strategy numbers, purely for the purpose of comparing it later against btree strategies to determine if the scan direction was forward or reverse. Change that. Instead, track "mergeReversals", an array of bool, to indicate the same without an unfortunate assumption that a strategy number refers specifically to a btree strategy. Author: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com	2024-10-14 15:36:18 +02:00
Peter Eisentraut	0d2aa4d493	Track sort direction in SortGroupClause Functions make_pathkey_from_sortop() and transformWindowDefinitions(), which receive a SortGroupClause, were determining the sort order (ascending vs. descending) by comparing that structure's operator strategy to BTLessStrategyNumber, but could just as easily have gotten it from the SortGroupClause object, if it had such a field, so add one. This reduces the number of places that hardcode the assumption that the strategy refers specifically to a btree strategy, rather than some other index AM's operators. Author: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com	2024-10-14 15:36:02 +02:00
Jeff Davis	35a015a600	Fixup for pg_set_relation_stats(). Reported-by: Noriyoshi Shinoda Discussion: https://postgr.es/m/DM4PR84MB17345E2DFF28A5557B7CBC3CEE7A2@DM4PR84MB1734.NAMPRD84.PROD.OUTLOOK.COM	2024-10-13 13:44:23 -07:00
Michael Paquier	c0b74323dc	Use MAX_PARALLEL_WORKER_LIMIT for max_parallel_maintenance_workers max_parallel_maintenance_workers has been introduced in `9da0cc3528`, and used a hardcoded limit of 1024 rather than this variable. max_parallel_workers and max_parallel_workers_per_gather already used MAX_PARALLEL_WORKER_LIMIT (1024) as their upper-bound since `6599c9ac33`. Author: Matthias van de Meent Reviewed-by: Zhang Mingli Discussion: https://postgr.es/m/CAEze2WiCiJD+8Wig_wGPyn4vgdPjbnYXy2Rw+9KYi6izTMuP=w@mail.gmail.com	2024-10-13 11:20:30 +09:00
Tom Lane	9f954177b1	Correctly identify which EC members are computable at a plan node. find_computable_ec_member() had the wrong mental model of what its primary caller prepare_sort_from_pathkeys() would do with the selected EquivalenceClass member expression. We will not compute the EC expression in a plan node atop the one returning the passed-in targetlist; rather, the EC expression will be computed as an additional column of that targetlist. So any Var or quasi-Var used in the given tlist is also available to the EC expression. In simple cases this makes no difference because the given tlist is just a list of Vars or quasi-Vars --- but if we are considering an appendrel member produced by flattening a UNION ALL, the tlist may contain expressions, resulting in failure to match and a "could not find pathkey item to sort" error. To fix, we can flatten both the tlist and the EC members with pull_var_clause(), and then just check for subset-ness, so that the code is actually shorter than before. While this bug is quite old, the present patch only works back to v13. We could possibly make it work in v12 by back-patching parts of `375398244`. On the whole though I don't like the risk/reward ratio of that idea. v12's final release is next month, meaning there would be no chance to correct matters if the patch causes a regression. Since this failure has escaped notice for 14 years, it's likely nobody will hit it in the field with v12. Per bug #18652 from Alexander Lakhin. Andrei Lepikhov and Tom Lane Discussion: https://postgr.es/m/18652-deaa782ebcca85d1@postgresql.org	2024-10-12 14:56:08 -04:00
Jeff Davis	98c5b191e7	Fix missed case for builtin collation provider. A missed check for the builtin collation provider could result in falling through to call isalpha(). This does not appear to have practical consequences because it only happens for characters in the ASCII range. Regardless, the builtin provider should not be calling libc functions, so backpatch. Discussion: https://postgr.es/m/1bd5a0a5192f82c22ee7527e825b18ab0028b2c7.camel@j-davis.com Backpatch-through: 17	2024-10-11 16:59:29 -07:00
Jeff Davis	e839c8ecc9	Create functions pg_set_relation_stats, pg_clear_relation_stats. These functions are used to tweak statistics on any relation, provided that the user has MAINTAIN privilege on the relation, or is the database owner. Bump catalog version. Author: Corey Huinker Discussion: https://postgr.es/m/CADkLM=eErgzn7ECDpwFcptJKOk9SxZEk5Pot4d94eVTZsvj3gw@mail.gmail.com	2024-10-11 16:55:11 -07:00
Daniel Gustafsson	6f782a2a17	Avoid mixing custom and OpenSSL BIO functions PostgreSQL has for a long time mixed two BIO implementations, which can lead to subtle bugs and inconsistencies. This cleans up our BIO by just just setting up the methods we need. This patch does not introduce any functionality changes. The following methods are no longer defined due to not being needed: - gets: Not used by libssl - puts: Not used by libssl - create: Sets up state not used by libpq - destroy: Not used since libpq use BIO_NOCLOSE, if it was used it close the socket from underneath libpq - callback_ctrl: Not implemented by sockets The following methods are defined for our BIO: - read: Used for reading arbitrary length data from the BIO. No change in functionality from the previous implementation. - write: Used for writing arbitrary length data to the BIO. No change in functionality from the previous implementation. - ctrl: Used for processing ctrl messages in the BIO (similar to ioctl). The only ctrl message which matters is BIO_CTRL_FLUSH used for writing out buffered data (or signal EOF and that no more data will be written). BIO_CTRL_FLUSH is mandatory to implement and is implemented as a no-op since there is no intermediate buffer to flush. BIO_CTRL_EOF is the out-of-band method for signalling EOF to read_ex based BIO's. Our BIO is not read_ex based but someone could accidentally call BIO_CTRL_EOF on us so implement mainly for completeness sake. As the implementation is no longer related to BIO_s_socket or calling SSL_set_fd, methods have been renamed to reference the PGconn and Port types instead. This also reverts back to using BIO_set_data, with our fallback, as a small optimization as BIO_set_app_data require the ex_data mechanism in OpenSSL. Author: David Benjamin <davidben@google.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/CAF8qwaCZ97AZWXtg_y359SpOHe+HdJ+p0poLCpJYSUxL-8Eo8A@mail.gmail.com	2024-10-11 21:58:58 +02:00
Nathan Bossart	4e1fad3787	Add pg_ls_summariesdir(). This function returns the name, size, and last modification time of each regular file in pg_wal/summaries. This allows administrators to grant privileges to view the contents of this directory without granting privileges on pg_ls_dir(), which allows listing the contents of many other directories. This commit also gives the pg_monitor predefined role EXECUTE privileges on the new pg_ls_summariesdir() function. Bumps catversion. Author: Yushi Ogiwara Reviewed-by: Michael Paquier, Fujii Masao Discussion: https://postgr.es/m/a0a3af15a9b9daa107739eb45aa9a9bc%40oss.nttdata.com	2024-10-11 11:02:09 -05:00
Álvaro Herrera	099c572d33	Use deconstruct_array_builtin instead of deconstruct_array Commit `062a844424` introduced use of deconstruct_array when deconstruct_array_builtin can be used instead. Do that to save some code. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/Zwi5g2GzlUX1NqxR@ip-10-97-1-34.eu-west-3.compute.internal	2024-10-11 09:54:18 +02:00
David Rowley	161320b4b9	Adjust EXPLAIN's output for disabled nodes `c01743aa4` added EXPLAIN output to display the plan node's disabled_node count whenever that count is above 0. Seemingly, there weren't many people who liked that output as each parent of a disabled node would also have a "Disabled Nodes" output due to the way disabled_nodes is accumulated towards the root plan node. It was often hard and sometimes impossible to figure out which nodes were disabled from looking at EXPLAIN. You might think it would be possible to manually add up the numbers from the "Disabled Nodes" output of a given node's children to figure out if that node has a higher disabled_nodes count than its children, but that wouldn't have worked for Append and Merge Append nodes if some disabled child nodes were run-time pruned during init plan. Those children are not displayed in EXPLAIN. Here we attempt to improve this output by only showing "Disabled: true" against only the nodes which are explicitly disabled themselves. That seems to be the output that's desired by the most people who voiced their opinion. This is done by summing up the disabled_nodes of the given node's children and checking if that number is less than the disabled_nodes of the current node. This commit also fixes a bug in make_sort() which was neglecting to set the Sort's disabled_nodes field. This should have copied what was done in cost_sort(), but it hadn't been updated. With the new output, the choice to not maintain that field properly was clearly wrong as the disabled-ness of the node was attributed to the Sort's parent instead. Reviewed-by: Laurenz Albe, Alena Rybakina Discussion: https://postgr.es/m/9e4ad616bebb103ec2084bf6f724cfc739e7fabb.camel@cybertec.at	2024-10-11 17:19:59 +13:00
Álvaro Herrera	fd64ed60b6	Unbreak overflow test for attinhcount/coninhcount Commit `90189eefc1` narrowed pg_attribute.attinhcount and pg_constraint.coninhcount from 32 to 16 bits, but kept other related structs with 32-bit wide fields: ColumnDef and CookedConstraint contain an int 'inhcount' field which is itself checked for overflow on increments, but there's no check that the values aren't above INT16_MAX before assigning to the catalog columns. This means that a creative user can get a inconsistent table definition and override some protections. Fix it by changing those other structs to also use int16. Also, modernize style by using pg_add_s16_overflow for overflow testing instead of checking for negative values. We also have Constraint.inhcount, which is here removed completely. This was added by commit `b0e96f3119` and not removed by its revert at `6f8bb7c1e9`. It is not needed by the upcoming not-null constraints patch. This is mostly academic, so we agreed not to backpatch to avoid ABI problems. Bump catversion because of the changes to parse nodes. Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Co-authored-by: 何建 (jian he) <jian.universality@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/202410081611.up4iyofb5ie7@alvherre.pgsql	2024-10-10 17:41:01 +02:00
Tom Lane	5a4416192d	Avoid crash in estimate_array_length with null root pointer. Commit `9391f7152` added a "PlannerInfo *root" parameter to estimate_array_length, but failed to consider the possibility that NULL would be passed for that, leading to a null pointer dereference. We could rectify the particular case shown in the bug report by fixing simplify_function/inline_function to pass through the root pointer. However, as long as eval_const_expressions is documented to accept NULL for root, similar hazards would remain. For now, let's just do the narrow fix of hardening estimate_array_length to not crash. Its behavior with NULL root will be the same as it was before `9391f7152`, so this is not too awful. Per report from Fredrik Widlert (via Paul Ramsey). Back-patch to v17 where `9391f7152` came in. Discussion: https://postgr.es/m/518339E7-173E-45EC-A0FF-9A4A62AA4F40@cleverelephant.ca	2024-10-09 17:07:53 -04:00
Michael Paquier	f3f06b1330	Apply GUC name from central table in more places of guc.c The name extracted from the record of the GUC tables is applied to more internal places of guc.c. This change has the advantage to simplify parse_and_validate_value(), where the "name" was only used in elog messages, while it was required to match with the name from the GUC record. pg_parameter_aclcheck() now passes the name of the GUC from its record in two places rather than the caller's argument. The value given to this function goes through convert_GUC_name_for_parameter_acl() that does a simple ASCII downcasing. Few GUCs mix character casing in core; one test is added for one of these code paths with "IntervalStyle". Author: Peter Smith, Michael Paquier Discussion: https://postgr.es/m/ZwNh4vkc2NHJHnND@paquier.xyz	2024-10-09 18:47:34 +09:00
Richard Guo	67a54b9e83	Allow pushdown of HAVING clauses with grouping sets In some cases, we may want to transfer a HAVING clause into WHERE in hopes of eliminating tuples before aggregation instead of after. Previously, we couldn't do this if there were any nonempty grouping sets, because we didn't have a way to tell if the HAVING clause referenced any columns that were nullable by the grouping sets, and moving such a clause into WHERE could potentially change the results. Now, with expressions marked nullable by grouping sets with the RT index of the RTE_GROUP RTE, it is much easier to identify those clauses that reference any nullable-by-grouping-sets columns: we just need to check if the RT index of the RTE_GROUP RTE is present in the clause. For other HAVING clauses, they can be safely pushed down. Author: Richard Guo Discussion: https://postgr.es/m/CAMbWs4-NpzPgtKU=hgnvyn+J-GanxQCjrUi7piNzZ=upiCV=2Q@mail.gmail.com	2024-10-09 17:19:04 +09:00
Richard Guo	828e94c9d2	Consider explicit incremental sort for mergejoins For a mergejoin, if the given outer path or inner path is not already well enough ordered, we need to do an explicit sort. Currently, we only consider explicit full sort and do not account for incremental sort. In this patch, for the outer path of a mergejoin, we choose to use explicit incremental sort if it is enabled and there are presorted keys. For the inner path, though, we cannot use incremental sort because it does not support mark/restore at present. The rationale is based on the assumption that incremental sort is always faster than full sort when there are presorted keys, a premise that has been applied in various parts of the code. In addition, the current cost model tends to favor incremental sort as being cheaper than full sort in the presence of presorted keys, making it reasonable not to consider full sort in such cases. It could be argued that what if a mergejoin with an incremental sort as the outer path is selected as the inner path of another mergejoin. However, this should not be a problem, because mergejoin itself does not support mark/restore either, and we will add a Material node on top of it anyway in this case (see final_cost_mergejoin). There is one ensuing plan change in the regression tests, and we have to modify that test case to ensure that it continues to test what it is intended to. No backpatch as this could result in plan changes. Author: Richard Guo Reviewed-by: David Rowley, Tomas Vondra Discussion: https://postgr.es/m/CAMbWs49x425QrX7h=Ux05WEnt8GS757H-jOP3_xsX5t1FoUsZw@mail.gmail.com	2024-10-09 17:14:42 +09:00
Michael Paquier	de3a2ea3b2	Introduce two fields in EState to track parallel worker activity These fields can be set by executor nodes to record how many parallel workers were planned to be launched and how many of them have been actually launched within the number initially planned. This data is able to give an approximation of the parallel worker draught a system is facing, making easier the tuning of related configuration parameters. These fields will be used by some follow-up patches to populate other parts of the system with their data. Author: Guillaume Lelarge, Benoit Lobréau Discussion: https://postgr.es/m/783bc7f7-659a-42fa-99dd-ee0565644e25@dalibo.com Discussion: https://postgr.es/m/CAECtzeWtTGOK0UgKXdDGpfTVSa5bd_VbUt6K6xn8P7X+_dZqKw@mail.gmail.com	2024-10-09 08:07:48 +09:00
Tom Lane	2d24fd942c	Add min and max aggregates for bytea type. Similar to `a0f1fce80`, although we chose to duplicate logic rather than invoke byteacmp, primarily to avoid repeat detoasting. Marat Buharov, Aleksander Alekseev Discussion: https://postgr.es/m/CAPCEVGXiASjodos4P8pgyV7ixfVn-ZgG9YyiRZRbVqbGmfuDyg@mail.gmail.com	2024-10-08 13:52:14 -04:00
Andres Freund	57f3702471	Use aux process resource owner in walsender AIO will need a resource owner to do IO. Right now we create a resowner on-demand during basebackup, and we could do the same for AIO. But it seems easier to just always create an aux process resowner. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/1f6b50a7-38ef-4d87-8246-786d39f46ab9@iki.fi	2024-10-08 11:37:45 -04:00
Andres Freund	755a4c10d1	bufmgr/smgr: Don't cross segment boundaries in StartReadBuffers() With real AIO it doesn't make sense to cross segment boundaries with one IO. Add smgrmaxcombine() to allow upper layers to query which buffers can be merged. We could continue to cross segment boundaries when not using AIO, but it doesn't really make sense, because md.c will never be able to perform the read across the segment boundary in one system call. Which means we'll mark more buffers as undergoing IO than really makes sense - if another backend desires to read the same blocks, it'll be blocked longer than necessary. So it seems better to just never cross the boundary. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/1f6b50a7-38ef-4d87-8246-786d39f46ab9@iki.fi	2024-10-08 11:37:45 -04:00
Andres Freund	488f826c72	bufmgr: Return early in ScheduleBufferTagForWriteback() if fsync=off As pg_flush_data() doesn't do anything with fsync disabled, there's no point in tracking the buffer for writeback. Arguably the better fix would be to change pg_flush_data() to flush data even with fsync off, but that's a behavioral change, whereas this is just a small optimization. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/1f6b50a7-38ef-4d87-8246-786d39f46ab9@iki.fi	2024-10-08 11:37:45 -04:00
Heikki Linnakangas	2bbc261ddb	Use an shmem_exit callback to remove backend from PMChildFlags on exit This seems nicer than having to duplicate the logic between InitProcess() and ProcKill() for which child processes have a PMChildFlags slot. Move the MarkPostmasterChildActive() call earlier in InitProcess(), out of the section protected by the spinlock. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/a102f15f-eac4-4ff2-af02-f9ff209ec66f@iki.fi	2024-10-08 15:06:34 +03:00

1 2 3 4 5 ...

26054 Commits