postgres

mirror of https://github.com/postgres/postgres.git synced 2025-09-02 04:21:28 +03:00

Author	SHA1	Message	Date
Melanie Plageman	2128cebcdb	Remove extra prefetch iterator setup for Bitmap Table Scan `1a0da347a7` replaced Bitmap Table Scan's separate private and shared bitmap iterators with a unified iterator. It accidentally set up the prefetch iterator twice for non-parallel bitmap table scans. Remove the extra set up call to tbm_begin_iterate().	2024-12-19 11:55:18 -05:00
Melanie Plageman	754c610e13	Fix bitmap table scan crash on iterator release `1a0da347a7` replaced Bitmap Table Scan's individual private and shared iterators with a unified iterator. It neglected, however, to check if the iterator had already been cleaned up before doing so on rescan. Add this check both on rescan and end scan to be safe. Reported-by: Richard Guo Author: Richard Guo Discussion: https://postgr.es/m/CAMbWs48nrhcLY1kcd-u9oD%2B6yiS631F_8Fx8ZGsO-BYDwH%2Bbyw%40mail.gmail.com	2024-12-19 11:55:03 -05:00
David Rowley	08cdb079d4	Optimize grouping equality checks with virtual slots `8f4ee9626` fixed an old Assert failure that could happen when the slot type used to look up the hash table for BuildTupleHashTableExt() users wasn't a TTSOpsMinimalTuple slot. The fix for that in the back branches had to be to pass the TupleTableSlotOps as NULL, however in master, since we have the inputOps parameter as was added by `d96d1d515`, we can pass that down instead. At least one caller uses a fixed slot that's always TTSOpsVirtual, so passing down inputOps for these cases allows ExecBuildGroupingEqual() to skip adding the EEOP_INNER_FETCHSOME ExprEvalStep. This should increase the performance of hashed subplans very slightly. Author: Tom Lane, David Rowley Discussion: https://postgr.es/m/2543667.1734483723@sss.pgh.pa.us	2024-12-19 13:57:21 +13:00
David Rowley	8f4ee96269	Fix Assert failure in WITH RECURSIVE UNION queries If the non-recursive part of a recursive CTE ended up using TTSOpsBufferHeapTuple as the table slot type, then a duplicate value could cause an Assert failure in CheckOpSlotCompatibility() when checking the hash table for the duplicate value. The expected slot type for the deform step was TTSOpsMinimalTuple so the Assert failed when the TTSOpsBufferHeapTuple slot was used. This is a long-standing bug which we likely didn't notice because it seems much more likely that the non-recursive term would have required projection and used a TTSOpsVirtual slot, which CheckOpSlotCompatibility is ok with. There doesn't seem to be any harm done here other than the Assert failure. Both TTSOpsMinimalTuple and TTSOpsBufferHeapTuple slot types require tuple deformation, so the EEOP__FETCHSOME ExprState step would have properly existed in the ExprState. The solution is to pass NULL for the ExecBuildGroupingEqual's 'lops' parameter. This means the ExprState's EEOP__FETCHSOME step won't expect a fixed slot type. This makes CheckOpSlotCompatibility() happy as no checking is performed when the ExprEvalStep is not expecting a fixed slot type. Reported-by: Richard Guo Reviewed-by: Tom Lane Discussion: https://postgr.es/m/CAMbWs4-8U9q2LAtf8+ghV11zeUReA3AmrYkxzBEv0vKnDxwkKA@mail.gmail.com Backpatch-through: 13, all supported versions	2024-12-19 13:11:39 +13:00
Melanie Plageman	1a0da347a7	Bitmap Table Scans use unified TBMIterator With the repurposing of TBMIterator as an interface for both parallel and serial iteration through TIDBitmaps in commit `7f9d4187e7`, bitmap table scans may now use it. Modify bitmap table scan code to use the TBMIterator. This requires moving around a bit of code, so a few variables are initialized elsewhere. Author: Melanie Plageman Reviewed-by: Tomas Vondra Discussion: https://postgr.es/m/c736f6aa-8b35-4e20-9621-62c7c82e2168%40vondra.me	2024-12-18 18:43:39 -05:00
Melanie Plageman	7f9d4187e7	Add common interface for TBMIterators Add and use TBMPrivateIterator, which replaces the current TBMIterator for serial use cases, and repurpose TBMIterator to be a unified interface for both the serial ("private") and parallel ("shared") TID Bitmap iterator interfaces. This encapsulation simplifies call sites for callers supporting both parallel and serial TID Bitmap access. TBMIterator is not yet used in this commit. Author: Melanie Plageman Reviewed-by: Tomas Vondra, Heikki Linnakangas Discussion: https://postgr.es/m/063e4eb4-32d9-439e-a0b1-75565a9835a8%40iki.fi	2024-12-18 18:19:28 -05:00
David Rowley	d96d1d5152	Fix incorrect slot type in BuildTupleHashTableExt `0f5738202` adjusted the execGrouping.c code so it made use of ExprStates to generate hash values. That commit made a wrong assumption that the slot type to pass to ExecBuildHash32FromAttrs() is always &TTSOpsMinimalTuple. That's not the case as the slot type depends on the slot type passed to LookupTupleHashEntry(), which for nodeRecursiveunion.c, could be any of the current slot types. Here we fix this by adding a new parameter to BuildTupleHashTableExt() to allow the slot type to be passed in. In the case of nodeSubplan.c and nodeAgg.c the slot type is always &TTSOpsVirtual, so for both of those cases, it's beneficial to pass the known slot type as that allows ExecBuildHash32FromAttrs() to skip adding the tuple deform step to the resulting ExprState. Another possible fix would have been to have ExecBuildHash32FromAttrs() set "fetch.kind" to NULL so that ExecComputeSlotInfo() always determines the EEOP_INNER_FETCHSOME is required, however, that option isn't favorable as slows down aggregation and hashed subplan evaluation due to the extra (needless) deform step. Thanks to Nathan Bossart for bisecting to find the offending commit based on Paul's report. Reported-by: Paul Ramsey <pramsey@cleverelephant.ca> Discussion: https://postgr.es/m/99F064C1-B3EB-4BE7-97D2-D2A0AA487A71@cleverelephant.ca	2024-12-18 12:05:55 +13:00
David Rowley	0f5738202b	Use ExprStates for hashing in GROUP BY and SubPlans This speeds up obtaining hash values for GROUP BY and hashed SubPlans by using the ExprState support for hashing, thus allowing JIT compilation for obtaining hash values for these operations. This, even without JIT compilation, has been shown to improve Hash Aggregate performance in some cases by around 15% and hashed NOT IN queries in one case by over 30%, however, real-world cases are likely to see smaller gains as the test cases used were purposefully designed to have high hashing overheads by keeping the hash table small to prevent additional memory overheads that would be a factor when working with large hash tables. In passing, fix a hypothetical bug in ExecBuildHash32Expr() so that the initial value is stored directly in the ExprState's result field if there are no expressions to hash. None of the current users of this function use an initial value, so the bug is only hypothetical. Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Discussion: https://postgr.es/m/CAApHDvpYSO3kc9UryMevWqthTBrxgfd9djiAjKHMPUSQeX9vdQ@mail.gmail.com	2024-12-11 13:47:16 +13:00
David Rowley	50416cc484	Speedup Hash Joins with dedicated functions for ExprState hashing Hashing of a single Var is a very common operation for ExprState to perform. Here we add dedicated ExecJust* functions which helps speed up Hash Joins by removing the interpretation overhead in ExecInterpExpr(). This change currently only affects Hash Joins on a single column. Hash Joins with multiple join keys or an expression still run through ExecInterpExpr(). Some testing has shown up to 10% query performance increases on recent AMD hardware and nearly 7% increase on an Apple M2 for a query performing a hash join with a large number of lookups on a small hash table. This change was extracted from a larger patch which adjusts GROUP BY / hashed subplans / hashed set operations to use ExprState hashing. Discussion: https://postgr.es/m/CAApHDvr8Zc0ZgzVoCZLdHGOFNhiJeQ6vrUcS9V7N23zMWQb-eA@mail.gmail.com	2024-12-11 11:32:15 +13:00
Tom Lane	9828905303	Doc: add some commentary about ExecutorRun's NoMovement special case. Robert Haas expressed concern about whether commit `3eea7a0c9` exposed the parallel-execution machinery to a case it isn't tested for, namely a second non-parallel execution of a plan after a parallel execution. Investigation shows that that can't happen because of pquery.c's manipulation of the scan direction, but it sure wasn't obvious to start with. Add some commentary about that. Discussion: https://postgr.es/m/CA+TgmoagyKQy=HFw+wLo0AKTYWHui+iKswZ8Jnqqd-cFby-WVg@mail.gmail.com	2024-12-10 17:17:28 -05:00
Peter Eisentraut	74edabce7a	Support for GiST in get_equal_strategy_number() A WITHOUT OVERLAPS primary key or unique constraint is accepted as a REPLICA IDENTITY, since it guarantees uniqueness. But subscribers applying logical decoding messages would fail because there was not support for looking up the equals operator for a gist index. This fixes that: For GiST indexes we can use the stratnum GiST support function. Reviewed-by: Paul Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: vignesh C <vignesh21@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com	2024-12-10 13:26:09 +01:00
Peter Eisentraut	a2a475b011	Replace get_equal_strategy_number_for_am() by get_equal_strategy_number() get_equal_strategy_number_for_am() gets the equal strategy number for an AM. This currently only supports btree and hash. In the more general case, this also depends on the operator class (see for example GistTranslateStratnum()). To support that, replace this function with get_equal_strategy_number() that takes an opclass and derives it from there. (This function already existed before as a static function, so the signature is kept for simplicity.) This patch is only a refactoring, it doesn't add support for other index AMs such as gist. This will be done separately. Reviewed-by: Paul Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: vignesh C <vignesh21@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com	2024-12-10 12:53:27 +01:00
Peter Eisentraut	321c287351	Improve internal logical replication error for missing equality strategy This "shouldn't happen", except right now it can with a temporal gist index (to be fixed soon), because of missing gist support in get_equal_strategy_number(). But right now, the error is not caught right away, but instead you get the subsequent error about a "missing operator 0". This makes the error more accurate. Author: Paul Jungwirth <pj@illuminatedcomputing.com> Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com	2024-12-10 12:30:42 +01:00
Tom Lane	3eea7a0c97	Simplify executor's determination of whether to use parallelism. Our parallel-mode code only works when we are executing a query in full, so ExecutePlan must disable parallel mode when it is asked to do partial execution. The previous logic for this involved passing down a flag (variously named execute_once or run_once) from callers of ExecutorRun or PortalRun. This is overcomplicated, and unsurprisingly some of the callers didn't get it right, since it requires keeping state that not all of them have handy; not to mention that the requirements for it were undocumented. That led to assertion failures in some corner cases. The only state we really need for this is the existing QueryDesc.already_executed flag, so let's just put all the responsibility in ExecutePlan. (It could have been done in ExecutorRun too, leading to a slightly shorter patch -- but if there's ever more than one caller of ExecutePlan, it seems better to have this logic in the subroutine than the callers.) This makes those ExecutorRun/PortalRun parameters unnecessary. In master it seems okay to just remove them, returning the API for those functions to what it was before parallelism. Such an API break is clearly not okay in stable branches, but for them we can just leave the parameters in place after documenting that they do nothing. Per report from Yugo Nagata, who also reviewed and tested this patch. Back-patch to all supported branches. Discussion: https://postgr.es/m/20241206062549.710dc01cf91224809dd6c0e1@sraoss.co.jp	2024-12-09 14:38:19 -05:00
Richard Guo	5668a857de	Fix right-semi-joins in HashJoin rescans When resetting a HashJoin node for rescans, if it is a single-batch join and there are no parameter changes for the inner subnode, we can just reuse the existing hash table without rebuilding it. However, for join types that depend on the inner-tuple match flags in the hash table, we need to reset these match flags to avoid incorrect results. This applies to right, right-anti, right-semi, and full joins. When I introduced "Right Semi Join" plan shapes in `aa86129e1`, I failed to reset the match flags in the hash table for right-semi joins in rescans. This oversight has been shown to produce incorrect results. This patch fixes it. Author: Richard Guo Discussion: https://postgr.es/m/CAMbWs4-nQF9io2WL2SkD0eXvfPdyBc9Q=hRwfQHCGV2usa0jyA@mail.gmail.com	2024-12-09 20:36:23 +09:00
Amit Kapila	2d0152d614	Improve the error message introduced in commit `87ce27de69`. The error detail message "Replica identity consists of an unpublished generated column." implies that the entire replica identity is made up of an unpublished generated column which may not be the case. Reported-by: Peter Smith Author: Shlok Kyal Reviewed-by: Peter Smith, Amit Kapila Discussion: https://postgr.es/m/CAHut+PuwMhKx0PhOA4APhJTLoBGNykbeCQpr_CuwGT-SkswG5w@mail.gmail.com	2024-12-09 09:11:45 +05:30
David Rowley	1fe5a347e3	Fix possible crash during WindowAgg evaluation When short-circuiting WindowAgg node evaluation on the top-level WindowAgg node using quals on monotonic window functions, because the WindowAgg run condition can mean there's no need to evaluate subsequent window function results in the same partition once the run condition becomes false, it was possible that the executor would use stale results from the previous invocation of the window function in some cases. A fix for this was partially done by a5832722, but that commit only fixed the issue for non-top-level WindowAgg nodes. I mistakenly thought that the top-level WindowAgg didn't have this issue, but Jayesh's example case clearly shows that's incorrect. At the time, I also thought that this only affected 32-bit systems as all window functions which then supported run conditions returned BIGINT, however, that's wrong as ExecProject is still called and that could cause evaluation of any other window function belonging to the same WindowAgg node, one of which may return a byref type. The only queries affected by this are WindowAggs with a "Run Condition" which contains at least one window function with a byref result type, such as lead() or lag() on a byref column. The window clause must also contain a PARTITION BY clause (without a PARTITION BY, execution of the WindowAgg stops immediately when the run condition becomes false and there's no risk of using the stale results). Reported-by: Jayesh Dehankar Discussion: https://postgr.es/m/193261e2c4d.3dd3cd7c1842.871636075166132237@zohocorp.com Backpatch-through: 15, where WindowAgg run conditions were added	2024-12-09 14:23:21 +13:00
Amit Kapila	87ce27de69	Ensure stored generated columns must be published when required. Ensure stored generated columns that are part of REPLICA IDENTITY must be published explicitly for UPDATE and DELETE operations to be published. We can publish generated columns by listing them in the column list or by enabling the publish_generated_columns option. This commit changes the behavior of the test added in commit `adedf54e65` by giving an ERROR for the UPDATE operation in such cases. There is no way to trigger the bug reported in commit `adedf54e65` but we didn't remove the corresponding code change because it is still relevant when replicating changes from a publisher with version less than 18. We decided not to backpatch this behavior change to avoid the risk of breaking existing output plugins that may be sending generated columns by default although we are not aware of any such plugin. Also, we didn't see any reports related to this on STABLE branches which is another reason not to backpatch this change. Author: Shlok Kyal, Hou Zhijie Reviewed-by: Vignesh C, Amit Kapila Discussion: https://postgr.es/m/CANhcyEVw4V2Awe2AB6i0E5AJLNdASShGfdBLbUd1XtWDboymCA@mail.gmail.com	2024-12-04 09:45:18 +05:30
David Rowley	4171c44c9b	Revert "Introduce CompactAttribute array in TupleDesc" This reverts commit `d28dff3f6c`. Quite a large number of buildfarm members didn't like this commit and it's not yet clear why. Reverting this before too many animals turn red. Discussion: https://postgr.es/m/CAApHDvr9i6T5=iAwQCxFDgMsthr_obVxgwBaEJkC8KUH6yM3Hw@mail.gmail.com	2024-12-03 17:12:38 +13:00
David Rowley	d28dff3f6c	Introduce CompactAttribute array in TupleDesc The new compact_attrs array stores a few select fields from FormData_pg_attribute in a more compact way, using only 16 bytes per column instead of the 104 bytes that FormData_pg_attribute uses. Using CompactAttribute allows performance-critical operations such as tuple deformation to be performed without looking at the FormData_pg_attribute element in TupleDesc which means fewer cacheline accesses. With this change, NAMEDATALEN could be increased with a much smaller negative impact on performance. For some workloads, tuple deformation can be the most CPU intensive part of processing the query. Some testing with 16 columns on a table where the first column is variable length showed around a 10% increase in transactions per second for an OLAP type query performing aggregation on the 16th column. However, in certain cases, the increases were much higher, up to ~25% on one AMD Zen4 machine. This also makes pg_attribute.attcacheoff redundant. A follow-on commit will remove it, thus shrinking the FormData_pg_attribute struct by 4 bytes. Author: David Rowley Discussion: https://postgr.es/m/CAApHDvrBztXP3yx=NKNmo3xwFAFhEdyPnvrDg3=M0RhDs+4vYw@mail.gmail.com Reviewed-by: Andres Freund, Victor Yegorov	2024-12-03 16:50:59 +13:00
Peter Eisentraut	7f798aca1d	Remove useless casts to (void *) Many of them just seem to have been copied around for no real reason. Their presence causes (small) risks of hiding actual type mismatches or silently discarding qualifiers Discussion: https://www.postgresql.org/message-id/flat/461ea37c-8b58-43b4-9736-52884e862820@eisentraut.org	2024-11-28 08:27:20 +01:00
Tom Lane	5b8728cd7f	Fix NULLIF()'s handling of read-write expanded objects. If passed a read-write expanded object pointer, the EEOP_NULLIF code would hand that same pointer to the equality function and then (unless equality was reported) also return the same pointer as its value. This is no good, because a function that receives a read-write expanded object pointer is fully entitled to scribble on or even delete the object, thus corrupting the NULLIF output. (This problem is likely unobservable with the equality functions provided in core Postgres, but it's easy to demonstrate with one coded in plpgsql.) To fix, make sure the pointer passed to the equality function is read-only. We can still return the original read-write pointer as the NULLIF result, allowing optimization of later operations. Per bug #18722 from Alexander Lakhin. This has been wrong since we invented expanded objects, so back-patch to all supported branches. Discussion: https://postgr.es/m/18722-fd9e645448cc78b4@postgresql.org	2024-11-25 18:09:09 -05:00
Nathan Bossart	cd7ab57532	Ensure cached plans are correctly marked as dependent on role. If a CTE, subquery, sublink, security invoker view, or coercion projection references a table with row-level security policies, we neglected to mark the plan as potentially dependent on which role is executing it. This could lead to later executions in the same session returning or hiding rows that should have been hidden or returned instead. Reported-by: Wolfgang Walther Reviewed-by: Noah Misch Security: CVE-2024-10976 Backpatch-through: 12	2024-11-11 09:00:00 -06:00
Michael Paquier	e7a9496de9	Add two attributes to pg_stat_database for parallel workers activity Two attributes are added to pg_stat_database: * parallel_workers_to_launch, counting the total number of parallel workers that were planned to be launched. * parallel_workers_launched, counting the total number of parallel workers actually launched. The ratio of both fields can provide hints that there are not enough slots available when launching parallel workers, also useful when pg_stat_statements is not deployed on an instance (i.e. `cf54a2c002`). This commit relies on `de3a2ea3b2`, that has added two fields to EState, that get incremented when executing Gather or GatherMerge nodes. A test is added in select_parallel, where parallel workers are spawned. Bump catalog version. Author: Benoit Lobréau Discussion: https://postgr.es/m/783bc7f7-659a-42fa-99dd-ee0565644e25@dalibo.com	2024-11-11 10:40:48 +09:00
Tom Lane	b8df690492	Improve fix for not entering parallel mode when holding interrupts. Commit `ac04aa84a` put the shutoff for this into the planner, which is not ideal because it doesn't prevent us from re-using a previously made parallel plan. Revert the planner change and instead put the shutoff into InitializeParallelDSM, modeling it on the existing code there for recovering from failure to allocate a DSM segment. However, that code path is mostly untested, and testing a bit harder showed there's at least one bug: ExecHashJoinReInitializeDSM is not prepared for us to have skipped doing parallel DSM setup. I also thought the Assert in ReinitializeParallelWorkers is pretty ill-advised, and replaced it with a silent Min() operation. The existing test case added by `ac04aa84a` serves fine to test this version of the fix, so no change needed there. Patch by me, but thanks to Noah Misch for the core idea that we could shut off worker creation when !INTERRUPTS_CAN_BE_PROCESSED. Back-patch to v12, as `ac04aa84a` was. Discussion: https://postgr.es/m/CAC-SaSzHUKT=vZJ8MPxYdC_URPfax+yoA1hKTcF4ROz_Q6z0_Q@mail.gmail.com	2024-11-08 13:42:10 -05:00
David Rowley	87f81a5563	Fix hypothetical bug in ExprState building for hashing `adf97c156` gave ExprStates the ability to hash expressions and return a single hash value. That commit supports seeding the hash value with an initial value to have that blended into the final hash value. Here we fix a hypothetical bug where if there are zero expressions to hash, the initial value is stored in the wrong location. The existing code stored the initial value in an intermediate location expecting that when the expressions were hashed that those steps would store the final hash value in the ExprState.resvalue field. However, that wouldn't happen when there are zero expressions to hash. The correct thing to do instead is to have a special case for zero expressions and when we hit that case, store the initial value directly in the ExprState.resvalue. The reason that this is a hypothetical bug is that no code currently calls ExecBuildHash32Expr passing a non-zero initial value. Discussion: https://postgr.es/m/CAApHDvpMAL_zxbMRr1LOex3O7Y7R7ZN2i8iUFLQhqQiJMAg3qw@mail.gmail.com	2024-11-06 09:16:00 +13:00
Daniel Gustafsson	fb7e27abfb	Remove duplicate words in comments A few comments contained duplicate "the" in sentences, fix by removing one occurrence. Author: Vignesh C <vignesh21@gmail.com> Discussion: https://postgr.es/m/CALDaNm2aEEiPwGJmPdzBxROVvs8n75yCjKz4K1f1B2TdWpzxTA@mail.gmail.com	2024-10-31 11:38:03 +01:00
David Rowley	3974bc3196	Remove unused field from SubPlanState struct `bf6c614a2` did some conversion work to use ExprState instead of manually calling equality functions to check if one set of values is not distinct from another set. That patch removed many of the fields that became redundant as a result of that change, but it forgot to remove SubPlanState.tab_eq_funcs. Fix that. In passing, fix the header comment for TupleHashEntryData to correctly spell the field name it's talking about. Author: Rafia Sabih <rafia.pghackers@gmail.com> Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Discussion: https://postgr.es/m/CA+FpmFeycdombFzrjZw7Rmc29CVm4OOzCWwu=dVBQ6q=PX8SvQ@mail.gmail.com Discussion: https://postgr.es/m/CAApHDvrWR2jYVhec=COyF2g2BE_ns91NDsCHAMFiXbyhEujKdQ@mail.gmail.com	2024-10-31 13:44:15 +13:00
Peter Eisentraut	e18512c000	Remove unused #include's from backend .c files as determined by IWYU These are mostly issues that are new since commit `dbbca2cf29`. Discussion: https://www.postgresql.org/message-id/flat/0df1d5b1-8ca8-4f84-93be-121081bde049%40eisentraut.org	2024-10-27 08:26:50 +01:00
Melanie Plageman	de380a62b5	Make table_scan_bitmap_next_block() async-friendly Move all responsibility for indicating a block is exhuasted into table_scan_bitmap_next_tuple() and advance the main iterator in heap-specific code. This flow control makes more sense and is a step toward using the read stream API for bitmap heap scans. Previously, table_scan_bitmap_next_block() returned false to indicate table_scan_bitmap_next_tuple() should not be called for the tuples on the page. This happened both when 1) there were no visible tuples on the page and 2) when the block returned by the iterator was past the end of the table. BitmapHeapNext() (generic bitmap table scan code) handled the case when the bitmap was exhausted. It makes more sense for table_scan_bitmap_next_tuple() to return false when there are no visible tuples on the page and table_scan_bitmap_next_block() to return false when the bitmap is exhausted or there are no more blocks in the table. As part of this new design, TBMIterateResults are no longer used as a flow control mechanism in BitmapHeapNext(), so we removed table_scan_bitmap_next_tuple's TBMIterateResult parameter. Note that the prefetch iterator is still saved in the BitmapHeapScanState node and advanced in generic bitmap table scan code. This is because 1) it was not necessary to change the prefetch iterator location to change the flow control in BitmapHeapNext() 2) modifying prefetch iterator management requires several more steps better split over multiple commits and 3) the prefetch iterator will be removed once the read stream API is used. Author: Melanie Plageman Reviewed-by: Tomas Vondra, Andres Freund, Heikki Linnakangas, Mark Dilger Discussion: https://postgr.es/m/063e4eb4-32d9-439e-a0b1-75565a9835a8%40iki.fi	2024-10-25 10:11:58 -04:00
Melanie Plageman	7bd7aa4d30	Move EXPLAIN counter increment to heapam_scan_bitmap_next_block Increment the lossy and exact page counters for EXPLAIN of bitmap heap scans in heapam_scan_bitmap_next_block(). Note that other table AMs will need to do this as well Pushing the counters into heapam_scan_bitmap_next_block() is required to be able to use the read stream API for bitmap heap scans. The bitmap iterator must be advanced from inside the read stream callback, so TBMIterateResults cannot be used as a flow control mechanism in BitmapHeapNext(). Author: Melanie Plageman Reviewed-by: Tomas Vondra, Heikki Linnakangas Discussion: https://postgr.es/m/063e4eb4-32d9-439e-a0b1-75565a9835a8%40iki.fi	2024-10-25 10:11:46 -04:00
Amit Langote	55e6d712af	Remove unnecessary word in a comment Relations opened by the executor are only closed once in ExecCloseRangeTableRelations(), so the word "again" in the comment for ExecGetRangeTableRelation() is misleading and unnecessary. Discussion: https://postgr.es/m/CA+HiwqHnw-zR+u060i3jp4ky5UR0CjByRFQz50oZ05de7wUg=Q@mail.gmail.com Backpatch-through: 12	2024-10-23 17:54:48 +09:00
Amit Langote	11c87216d1	SQL/JSON: Fix some oversights in commit `b6e1157e7` The decision in `b6e1157e7` to ignore raw_expr when evaluating a JsonValueExpr was incorrect. While its value is not ultimately used (since formatted_expr's value is), failing to initialize it can lead to problems, for instance, when the expression tree in raw_expr contains Aggref nodes, which must be initialized to ensure the parent Agg node works correctly. Also, optimize eval_const_expressions_mutator()'s handling of JsonValueExpr a bit. Currently, when formatted_expr cannot be folded into a constant, we end up processing it twice -- once directly in eval_const_expressions_mutator() and again recursively via ece_generic_processing(). This recursive processing is required to handle raw_expr. To avoid the redundant processing of formatted_expr, we now process raw_expr directly in eval_const_expressions_mutator(). Finally, update the comment of JsonValueExpr to describe the roles of raw_expr and formatted_expr more clearly. Bug: #18657 Reported-by: Alexander Lakhin <exclusion@gmail.com> Diagnosed-by: Fabio R. Sluzala <fabio3rs@gmail.com> Diagnosed-by: Tender Wang <tndrwang@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/18657-1b90ccce2b16bdb8@postgresql.org Backpatch-through: 16	2024-10-20 12:20:55 +09:00
Thomas Munro	98c7c7152d	Fix extreme skew detection in Parallel Hash Join. After repartitioning the inner side of a hash join that would have exceeded the allowed size, we check if all the tuples from a parent partition moved to one child partition. That is evidence that it contains duplicate keys and later attempts to repartition will also fail, so we should give up trying to limit memory (for lack of a better fallback strategy). A thinko prevented the check from working correctly in partition 0 (the one that is partially loaded into memory already). After repartitioning, we should check for extreme skew if the parent partition's space_exhausted flag was set, not the child partition's. The consequence was repeated futile repartitioning until per-partition data exceeded various limits including "ERROR: invalid DSA memory alloc request size 1811939328", OS allocation failure, or temporary disk space errors. (We could also do something about some of those symptoms, but that's material for separate patches.) This problem only became likely when PostgreSQL 16 introduced support for Parallel Hash Right/Full Join, allowing NULL keys into the hash table. Repartitioning always leaves NULL in partition 0, no matter how many times you do it, because the hash value is all zero bits. That's unlikely for other hashed values, but they might still have caused wasted extra effort before giving up. Back-patch to all supported releases. Reported-by: Craig Milhiser <craig@milhiser.com> Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Discussion: https://postgr.es/m/CA%2BwnhO1OfgXbmXgC4fv_uu%3DOxcDQuHvfoQ4k0DFeB0Qqd-X-rQ%40mail.gmail.com	2024-10-17 22:11:59 +13:00
David Rowley	9ca67658d1	Don't store intermediate hash values in ExprState->resvalue `adf97c156` made it so ExprStates could support hashing and changed Hash Join to use that instead of manually extracting Datums from tuples and hashing them one column at a time. When hashing multiple columns or expressions, the code added in that commit stored the intermediate hash value in the ExprState's resvalue field. That was a mistake as steps may be injected into the ExprState between each hashing step that look at or overwrite the stored intermediate hash value. EEOP_PARAM_SET is an example of such a step. Here we fix this by adding a new dedicated field for storing intermediate hash values and adjust the code so that all apart from the final hashing step store their result in the intermediate field. In passing, rename a variable so that it's more aligned to the surrounding code and also so a few lines stay within the 80 char margin. Reported-by: Andres Freund Reviewed-by: Alena Rybakina <a.rybakina@postgrespro.ru> Discussion: https://postgr.es/m/CAApHDvqo9eenEFXND5zZ9JxO_k4eTA4jKMGxSyjdTrsmYvnmZw@mail.gmail.com	2024-10-17 14:25:08 +13:00
Tom Lane	c96de42c4b	Further refine _SPI_execute_plan's rule for atomic execution. Commit `2dc1deaea` turns out to have been still a brick shy of a load, because CALL statements executing within a plpgsql exception block could still pass the wrong snapshot to stable functions within the CALL's argument list. That happened because standard_ProcessUtility forces isAtomicContext to true if IsTransactionBlock is true, which it always will be inside a subtransaction. Then ExecuteCallStmt would think it does not need to push a new snapshot --- but _SPI_execute_plan didn't do so either, since it thought it was in nonatomic mode. The best fix for this seems to be for _SPI_execute_plan to operate in atomic execution mode if IsSubTransaction() is true, even when the SPI context as a whole is non-atomic. This makes _SPI_execute_plan have the same rules about when non-atomic execution is allowed as _SPI_commit/_SPI_rollback have about when COMMIT/ROLLBACK are allowed, which seems appropriately symmetric. (If anyone ever tries to allow COMMIT/ROLLBACK inside a subtransaction, this would all need to be rethought ... but I'm unconvinced that such a thing could be logically consistent at all.) For further consistency, also check IsSubTransaction() in SPI_inside_nonatomic_context. That does not matter for its one present-day caller StartTransaction, which can't be reached inside a subtransaction. But if any other callers ever arise, they'd presumably want this definition. Per bug #18656 from Alexander Alehin. Back-patch to all supported branches, like previous fixes in this area. Discussion: https://postgr.es/m/18656-cade1780866ef66c@postgresql.org	2024-10-16 17:36:40 -04:00
Peter Eisentraut	c594f1ad2b	Track scan reversals in MergeJoin The MergeJoin struct was tracking "mergeStrategies", which were an array of btree strategy numbers, purely for the purpose of comparing it later against btree strategies to determine if the scan direction was forward or reverse. Change that. Instead, track "mergeReversals", an array of bool, to indicate the same without an unfortunate assumption that a strategy number refers specifically to a btree strategy. Author: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com	2024-10-14 15:36:18 +02:00
Michael Paquier	de3a2ea3b2	Introduce two fields in EState to track parallel worker activity These fields can be set by executor nodes to record how many parallel workers were planned to be launched and how many of them have been actually launched within the number initially planned. This data is able to give an approximation of the parallel worker draught a system is facing, making easier the tuning of related configuration parameters. These fields will be used by some follow-up patches to populate other parts of the system with their data. Author: Guillaume Lelarge, Benoit Lobréau Discussion: https://postgr.es/m/783bc7f7-659a-42fa-99dd-ee0565644e25@dalibo.com Discussion: https://postgr.es/m/CAECtzeWtTGOK0UgKXdDGpfTVSa5bd_VbUt6K6xn8P7X+_dZqKw@mail.gmail.com	2024-10-09 08:07:48 +09:00
Michael Paquier	4572d59e3c	Improve style of two code paths In execGrouping.c, execTuplesMatchPrepare() was doing a memory allocation that was not necessary when the number of columns was 0. In foreign.c, pg_options_to_table() was assigning twice a variable to the same value. Author: Ranier Vilela Discussion: https://postgr.es/m/CAEudQAqup0agbSzMjSLSTn=OANyCzxENF1+HrSYnr3WyZib7=Q@mail.gmail.com	2024-10-08 10:51:20 +09:00
Michael Paquier	4dd3087300	Remove assertion checking query ID in execMain.c This assertion has been added by `24f5205948`, but Alexander Lakhin has proved that the ExecutorRun() one can be broken by using a PL function that manipulates compute_query_id and track_activities, while the ones in ExecutorFinish() and ExecutorEnd() could be triggered when cleaning up portals at the beginning of a new query execution. Discussion: https://postgr.es/m/b37d8e6c-e83d-e157-8865-1b2460a6aef2@gmail.com	2024-10-04 12:51:17 +09:00
Michael Paquier	5deb56387e	Expand assertion check for query ID reporting in executor As formulated, the assertion added in the executor by `24f5205948` to check that a query ID is set had two problems: - track_activities may be disabled while compute_query_id is enabled, causing the query ID to not be reported to pg_stat_activity. - debug_query_string may not be set in some context. The only path where this would matter is visibly autovacuum, should parallel workers be enabled there at some point. This is not the case currently. There was no test showing the interactions between the query ID and track_activities, so let's add one based on a scan of pg_stat_activity. This assertion is still an experimentation at this stage, but let's see if this shows more paths where query IDs are not properly set while they should. Discussion: https://postgr.es/m/Zvn5616oYXmpXyHI@paquier.xyz	2024-10-01 08:56:21 +09:00
Noah Misch	aac2c9b4fd	For inplace update durability, make heap_update() callers wait. The previous commit fixed some ways of losing an inplace update. It remained possible to lose one when a backend working toward a heap_update() copied a tuple into memory just before inplace update of that tuple. In catalogs eligible for inplace update, use LOCKTAG_TUPLE to govern admission to the steps of copying an old tuple, modifying it, and issuing heap_update(). This includes MERGE commands. To avoid changing most of the pg_class DDL, don't require LOCKTAG_TUPLE when holding a relation lock sufficient to exclude inplace updaters. Back-patch to v12 (all supported versions). In v13 and v12, "UPDATE pg_class" or "UPDATE pg_database" can still lose an inplace update. The v14+ UPDATE fix needs commit `86dc90056d`, and it wasn't worth reimplementing that fix without such infrastructure. Reviewed by Nitin Motiani and (in earlier versions) Heikki Linnakangas. Discussion: https://postgr.es/m/20231027214946.79.nmisch@google.com	2024-09-24 15:25:18 -07:00
David Rowley	5d56d07ca3	Optimize tuplestore usage for WITH RECURSIVE CTEs nodeRecursiveunion.c makes use of two tuplestores and, until now, would delete and recreate one of these tuplestores after every recursive iteration. Here we adjust that behavior and instead reuse one of the existing tuplestores and just empty it of all tuples using tuplestore_clear(). This saves some free/malloc roundtrips and has shown a 25-30% performance improvement for queries that perform very little work between recursive iterations. This also paves the way to add some EXPLAIN ANALYZE telemetry output for recursive common table expressions, similar to what was done in `1eff8279d` and `95d6e9af0`. Previously calling tuplestore_end() would have caused the maximum storage space used to be lost. Reviewed-by: Tatsuo Ishii Discussion: https://postgr.es/m/CAApHDvr9yW0YRiK8A2J7nvyT8g17YzbSfOviEWrghazKZbHbig@mail.gmail.com	2024-09-19 15:20:35 +12:00
Michael Paquier	24f5205948	Add some sanity checks in executor for query ID reporting This commit adds three sanity checks in code paths of the executor where it is possible to use hooks, checking that a query ID is reported in pg_stat_activity if compute_query_id is enabled: - ExecutorRun() - ExecutorFinish() - ExecutorEnd() This causes the test in pg_stat_statements added in `933848d16d` to complain immediately in ExecutorRun(). The idea behind this commit is to help extensions to detect if they are missing query ID reports when a query goes through the executor. Perhaps this will prove to be a bad idea, but let's see where this experience goes in v18 and newer versions. Reviewed-by: Sami Imseih Discussion: https://postgr.es/m/ZuJb5xCKHH0A9tMN@paquier.xyz	2024-09-18 14:43:37 +09:00
Michael Paquier	933848d16d	Add missing query ID reporting in extended query protocol This commit adds query ID reports for two code paths when processing extended query protocol messages: - When receiving a bind message, setting it to the first Query retrieved from a cached cache. - When receiving an execute message, setting it to the first PlannedStmt stored in a portal. An advantage of this method is that this is able to cover all the types of portals handled in the extended query protocol, particularly these two when the report done in ExecutorStart() is not enough (neither is an addition in ExecutorRun(), actually, for the second point): - Multiple execute messages, with multiple ExecutorRun(). - Portal with execute/fetch messages, like a query with a RETURNING clause and a fetch size that stores the tuples in a first execute message going though ExecutorStart() and ExecuteRun(), followed by one or more execute messages doing only fetches from the tuplestore created in the first message. This corresponds to the case where execute_is_fetch is set, for example. Note that the query ID reporting done in ExecutorStart() is still necessary, as an EXECUTE requires it. Query ID reporting is optimistic and more calls to pgstat_report_query_id() don't matter as the first report takes priority except if the report is forced. The comment in ExecutorStart() is adjusted to reflect better the reality with the extended query protocol. The test added in pg_stat_statements is a courtesy of Robert Haas. This uses psql's \bind metacommand, hence this part is backpatched down to v16. Reported-by: Kaido Vaikla, Erik Wienhold Author: Sami Imseih Reviewed-by: Jian He, Andrei Lepikhov, Michael Paquier Discussion: https://postgr.es/m/CA+427g8DiW3aZ6pOpVgkPbqK97ouBdf18VLiHFesea2jUk3XoQ@mail.gmail.com Discussion: https://postgr.es/m/CA+TgmoZxtnf_jZ=VqBSyaU8hfUkkwoJCJ6ufy4LGpXaunKrjrg@mail.gmail.com Discussion: https://postgr.es/m/1391613709.939460.1684777418070@office.mailbox.org Backpatch-through: 14	2024-09-18 09:59:09 +09:00
Peter Eisentraut	fc0438b4e8	Add temporal PRIMARY KEY and UNIQUE constraints Add WITHOUT OVERLAPS clause to PRIMARY KEY and UNIQUE constraints. These are backed by GiST indexes instead of B-tree indexes, since they are essentially exclusion constraints with = for the scalar parts of the key and && for the temporal part. (previously committed as `46a0cd4cef`, reverted by 46a0cd4cefb; the new part is this:) Because 'empty' && 'empty' is false, the temporal PK/UQ constraint allowed duplicates, which is confusing to users and breaks internal expectations. For instance, when GROUP BY checks functional dependencies on the PK, it allows selecting other columns from the table, but in the presence of duplicate keys you could get the value from any of their rows. So we need to forbid empties. This all means that at the moment we can only support ranges and multiranges for temporal PK/UQs, unlike the original patch (above). Documentation and tests for this are added. But this could conceivably be extended by introducing some more general support for the notion of "empty" for other types. Author: Paul A. Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: jian he <jian.universality@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com	2024-09-17 11:29:30 +02:00
Amit Langote	dd8bea88ab	SQL/JSON: Avoid initializing unnecessary ON ERROR / ON EMPTY steps When the ON ERROR / ON EMPTY behavior is to return NULL, returning NULL directly from ExecEvalJsonExprPath() suffices. Therefore, there's no need to create separate steps to check the error/empty flag or those to evaluate the the constant NULL expression. This speeds up common cases because the default ON ERROR / ON EMPTY behavior for JSON_QUERY() and JSON_VALUE() is to return NULL. However, these steps are necessary if the RETURNING type is a domain, as constraints on the domain may need to be checked. Reported-by: Jian He <jian.universality@gmail.com> Author: Jian He <jian.universality@gmail.com> Author: Amit Langote <amitlangote09@gmail.com> Discussion: https://postgr.es/m/CACJufxEo4sUjKCYtda0_qt9tazqqKPmF1cqhW9KBOUeJFqQd2g@mail.gmail.com Backpatch-through: 17	2024-09-09 13:46:58 +09:00
Amit Langote	4d7e24e0f4	Revert recent SQL/JSON related commits Reverts `68222851d5`, `565caaa79a`, and `3a97460970`, because a few BF animals didn't like one or all of them.	2024-09-06 12:53:01 +09:00
Amit Langote	3a97460970	SQL/JSON: Avoid initializing unnecessary ON ERROR / ON EMPTY steps When the ON ERROR / ON EMPTY behavior is to return NULL, returning NULL directly from ExecEvalJsonExprPath() suffices. Therefore, there's no need to create separate steps to check the error/empty flag or those to evaluate the the constant NULL expression. This speeds up common cases because the default ON ERROR / ON EMPTY behavior for JSON_QUERY() and JSON_VALUE() is to return NULL. However, these steps are necessary if the RETURNING type is a domain, as constraints on the domain may need to be checked. Reported-by: Jian He <jian.universality@gmail.com> Author: Jian He <jian.universality@gmail.com> Author: Amit Langote <amitlangote09@gmail.com> Discussion: https://postgr.es/m/CACJufxEo4sUjKCYtda0_qt9tazqqKPmF1cqhW9KBOUeJFqQd2g@mail.gmail.com Backpatch-through: 17	2024-09-06 12:05:40 +09:00
Peter Eisentraut	4af123ad45	Fix misleading error message context Author: Pavel Stehule <pavel.stehule@gmail.com> Reviewed-by: Stepan Neretin <sncfmgg@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRAw+OkVW=FgMKHKyvY3CgtWy3cWdY7XT+S5TJaTttu=oA@mail.gmail.com	2024-09-05 15:19:00 +02:00

1 2 3 4 5 ...

2585 Commits