postgres

mirror of https://github.com/postgres/postgres.git synced 2025-11-19 13:42:17 +03:00

Author	SHA1	Message	Date
Richard Guo	f5050f795a	Mark expressions nullable by grouping sets When generating window_pathkeys, distinct_pathkeys, or sort_pathkeys, we failed to realize that the grouping/ordering expressions might be nullable by grouping sets. As a result, we may incorrectly deem that the PathKeys are redundant by EquivalenceClass processing and thus remove them from the pathkeys list. That would lead to wrong results in some cases. To fix this issue, we mark the grouping expressions nullable by grouping sets if that is the case. If the grouping expression is a Var or PlaceHolderVar or constructed from those, we can just add the RT index of the RTE_GROUP RTE to the existing nullingrels field(s); otherwise we have to add a PlaceHolderVar to carry on the nullingrel bit. However, we have to manually remove this nullingrel bit from expressions in various cases where these expressions are logically below the grouping step, such as when we generate groupClause pathkeys for grouping sets, or when we generate PathTarget for initial input to grouping nodes. Furthermore, in set_upper_references, the targetlist and quals of an Agg node should have nullingrels that include the effects of the grouping step, ie they will have nullingrels equal to the input Vars/PHVs' nullingrels plus the nullingrel bit that references the grouping RTE. In order to perform exact nullingrels matches, we also need to manually remove this nullingrel bit. Bump catversion because this changes the querytree produced by the parser. Thanks to Tom Lane for the idea to invent a new kind of RTE. Per reports from Geoff Winkless, Tobias Wendorff, Richard Guo from various threads. Author: Richard Guo Reviewed-by: Ashutosh Bapat, Sutou Kouhei Discussion: https://postgr.es/m/CAMbWs4_dp7e7oTwaiZeBX8+P1rXw4ThkZxh1QG81rhu9Z47VsQ@mail.gmail.com	2024-09-10 12:36:48 +09:00
Richard Guo	247dea89f7	Introduce an RTE for the grouping step If there are subqueries in the grouping expressions, each of these subqueries in the targetlist and HAVING clause is expanded into distinct SubPlan nodes. As a result, only one of these SubPlan nodes would be converted to reference to the grouping key column output by the Agg node; others would have to get evaluated afresh. This is not efficient, and with grouping sets this can cause wrong results issues in cases where they should go to NULL because they are from the wrong grouping set. Furthermore, during re-evaluation, these SubPlan nodes might use nulled column values from grouping sets, which is not correct. This issue is not limited to subqueries. For other types of expressions that are part of grouping items, if they are transformed into another form during preprocessing, they may fail to match lower target items. This can also lead to wrong results with grouping sets. To fix this issue, we introduce a new kind of RTE representing the output of the grouping step, with columns that are the Vars or expressions being grouped on. In the parser, we replace the grouping expressions in the targetlist and HAVING clause with Vars referencing this new RTE, so that the output of the parser directly expresses the semantic requirement that the grouping expressions be gotten from the grouping output rather than computed some other way. In the planner, we first preprocess all the columns of this new RTE and then replace any Vars in the targetlist and HAVING clause that reference this new RTE with the underlying grouping expressions, so that we will have only one instance of a SubPlan node for each subquery contained in the grouping expressions. Bump catversion because this changes the querytree produced by the parser. Thanks to Tom Lane for the idea to invent a new kind of RTE. Per reports from Geoff Winkless, Tobias Wendorff, Richard Guo from various threads. Author: Richard Guo Reviewed-by: Ashutosh Bapat, Sutou Kouhei Discussion: https://postgr.es/m/CAMbWs4_dp7e7oTwaiZeBX8+P1rXw4ThkZxh1QG81rhu9Z47VsQ@mail.gmail.com	2024-09-10 12:35:34 +09:00
Richard Guo	87b6c3c0b7	Fix order of parameters in a cost_sort call In label_sort_with_costsize, the cost_sort function is called with the parameters 'input_disabled_nodes' and 'input_cost' in the wrong order. This does not cause any plan diffs in the regression tests, because label_sort_with_costsize is only used to label the Sort node nicely for EXPLAIN, and cost numbers are not displayed in regression tests. Oversight in `e22253467`. Fixed by passing arguments in the right order. Per report from Alexander Lakhin running UBSan. Author: Alexander Lakhin Discussion: https://postgr.es/m/a9b7231d-68bc-f117-a07c-96688f3e6aef@gmail.com	2024-09-09 12:58:31 +09:00
Richard Guo	9626068f13	Avoid unnecessary post-sort projection When generating paths for the ORDER BY clause, one thing we need to ensure is that the output paths project the correct final_target. To achieve this, in create_ordered_paths, we compare the pathtarget of each generated path with the given 'target', and add a post-sort projection step if the two targets do not match. Currently we perform a simple pointer comparison between the two targets. It turns out that this is not sufficient. Each sorted_path generated in create_ordered_paths initially projects the correct target required by the preceding steps of sort. If it is the same pointer as sort_input_target, pointer comparison suffices, because sort_input_target is always identical to final_target when no post-sort projection is needed. However, sorted_path's initial pathtarget may not be the same pointer as sort_input_target, because in apply_scanjoin_target_to_paths, if the target to be applied has the same expressions as the existing reltarget, we only inject the sortgroupref info into the existing pathtargets, rather than create new projection paths. As a result, pointer comparison in create_ordered_paths is not reliable. Instead, we can compare PathTarget.exprs to determine whether a projection step is needed. If the expressions match, we can be confident that a post-sort projection is not required. It could be argued that this change adds extra check cost each time we decide whether a post-sort projection is needed. However, as explained in apply_scanjoin_target_to_paths, by avoiding the creation of projection paths, we save effort both immediately and at plan creation time. This, I think, justifies the extra check cost. There are two ensuing plan changes in the regression tests, but they look reasonable and are exactly what we are fixing here. So no additional test cases are added. No backpatch as this could result in plan changes. Author: Richard Guo Reviewed-by: Peter Eisentraut, David Rowley, Tom Lane Discussion: https://postgr.es/m/CAMbWs48TosSvmnz88663_2yg3hfeOFss-J2PtnENDH6J_rLnRQ@mail.gmail.com	2024-09-04 12:19:19 +09:00
Richard Guo	4f1124548f	Check the validity of commutators for merge/hash clauses When creating merge or hash join plans in createplan.c, the merge or hash clauses may need to get commuted to ensure that the outer var is on the left and the inner var is on the right if they are not already in the expected form. This requires that their operators have commutators. Failing to find a commutator at this stage would result in 'ERROR: could not find commutator for operator xxx', with no opportunity to select an alternative plan. Typically, this is not an issue because mergejoinable or hashable operators are expected to always have valid commutators. But in some artificial cases this assumption may not hold true. Therefore, here in this patch we check the validity of commutators for clauses in the form "inner op outer" when selecting mergejoin/hash clauses, and consider a clause unusable for the current pair of outer and inner relations if it lacks a commutator. There are not (and should not be) any such operators built into Postgres that are mergejoinable or hashable but have no commutators; so we leverage the alias type 'int8alias1' created in equivclass.sql to build the test case. This is why the test case is included in equivclass.sql rather than in join.sql. Although this is arguably a bug fix, it cannot be reproduced without installing an incomplete opclass, which is unlikely to happen in practice, so no back-patch. Reported-by: Alexander Pyhalov Author: Richard Guo Reviewed-by: Tom Lane Discussion: https://postgr.es/m/c59ec04a2fef94d9ffc35a9b17dfc081@postgrespro.ru	2024-09-04 12:17:11 +09:00
Tom Lane	cb8e50a4a0	Avoid inserting PlaceHolderVars in cases where pre-v16 PG did not. Commit `2489d76c4` removed some logic from pullup_replace_vars() that avoided wrapping a PlaceHolderVar around a pulled-up subquery output expression if the expression could be proven to go to NULL anyway (because it contained Vars or PHVs of the pulled-up relation and did not contain non-strict constructs). But removing that logic turns out to cause performance regressions in some cases, because the extra PHV blocks subexpression folding, and will do so even if outer-join reduction later turns it into a no-op with no phnullingrels bits. This can for example prevent an expression from being matched to an index. The reason for always adding a PHV was to ensure we had someplace to put the varnullingrels marker bits of the Var being replaced. However, it turns out we can optimize in exactly the same cases that the previous code did, because we can instead attach the needed varnullingrels bits to the contained Var(s)/PHV(s). This is not a complete solution --- it would be even better if we could remove PHVs after reducing them to no-ops. It doesn't look practical to back-patch such an improvement, but this change seems safe and at least gets rid of the performance-regression cases. Per complaint from Nikhil Raj. Back-patch to v16 where the problem appeared. Discussion: https://postgr.es/m/CAG1ps1xvnTZceKK24OUfMKLPvDP2vjT-d+F2AOCWbw_v3KeEgg@mail.gmail.com	2024-08-30 12:42:12 -04:00
Robert Haas	c01743aa48	Show number of disabled nodes in EXPLAIN ANALYZE output. Now that disable_cost is not included in the cost estimate, there's no visible sign in EXPLAIN output of which plan nodes are disabled. Fix that by propagating the number of disabled nodes from Path to Plan, and then showing it in the EXPLAIN output. There is some question about whether this is a desirable change. While I personally believe that it is, it seems best to make it a separate commit, in case we decide to back out just this part, or rework it. Reviewed by Andres Freund, Heikki Linnakangas, and David Rowley. Discussion: http://postgr.es/m/CA+TgmoZ_+MS+o6NeGK2xyBv-xM+w1AfFVuHE4f_aq6ekHv7YSQ@mail.gmail.com	2024-08-21 10:14:35 -04:00
Robert Haas	e222534679	Treat number of disabled nodes in a path as a separate cost metric. Previously, when a path type was disabled by e.g. enable_seqscan=false, we either avoided generating that path type in the first place, or more commonly, we added a large constant, called disable_cost, to the estimated startup cost of that path. This latter approach can distort planning. For instance, an extremely expensive non-disabled path could seem to be worse than a disabled path, especially if the full cost of that path node need not be paid (e.g. due to a Limit). Or, as in the regression test whose expected output changes with this commit, the addition of disable_cost can make two paths that would normally be distinguishible in cost seem to have fuzzily the same cost. To fix that, we now count the number of disabled path nodes and consider that a high-order component of both the startup cost and the total cost. Hence, the path list is now sorted by disabled_nodes and then by total_cost, instead of just by the latter, and likewise for the partial path list. It is important that this number is a count and not simply a Boolean; else, as soon as we're unable to respect disabled path types in all portions of the path, we stop trying to avoid them where we can. Because the path list is now sorted by the number of disabled nodes, the join prechecks must compute the count of disabled nodes during the initial cost phase instead of postponing it to final cost time. Counts of disabled nodes do not cross subquery levels; at present, there is no reason for them to do so, since the we do not postpone path selection across subquery boundaries (see make_subplan). Reviewed by Andres Freund, Heikki Linnakangas, and David Rowley. Discussion: http://postgr.es/m/CA+TgmoZ_+MS+o6NeGK2xyBv-xM+w1AfFVuHE4f_aq6ekHv7YSQ@mail.gmail.com	2024-08-21 10:12:30 -04:00
Alvaro Herrera	52f3de874b	Avoid failure to open dropped detached partition When a partition is detached and immediately dropped, a prepared statement could try to compute a new partition descriptor that includes it. This leads to this kind of error: ERROR: could not open relation with OID 457639 Avoid this by skipping the partition in expand_partitioned_rtentry if it doesn't exist. Noted by me while investigating bug #18559. Kuntal Gosh helped to identify the exact failure. Backpatch to 14, where DETACH CONCURRENTLY was introduced. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Kuntal Ghosh <kuntalghosh.2007@gmail.com> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Discussion: https://postgr.es/m/202408122233.bo4adt3vh5bi@alvherre.pgsql	2024-08-19 16:09:10 -04:00
Peter Eisentraut	a67a49648d	Rename C23 keyword constexpr is a keyword in C23. Rename a conflicting identifier for future-proofing. Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/08abc832-1384-4aca-a535-1a79765b565e%40eisentraut.org	2024-08-13 06:15:28 +02:00
Tom Lane	b919a97a6c	Fix "failed to find plan for subquery/CTE" errors in EXPLAIN. To deparse a reference to a field of a RECORD-type output of a subquery, EXPLAIN normally digs down into the subquery's plan to try to discover exactly which anonymous RECORD type is meant. However, this can fail if the subquery has been optimized out of the plan altogether on the grounds that no rows could pass the WHERE quals, which has been possible at least since `3fc6e2d7f`. There isn't anything remaining in the plan tree that would help us, so fall back to printing the field name as "fN" for the N'th column of the record. (This will actually be the right thing some of the time, since it matches the column names we assign to RowExprs.) In passing, fix a comment typo in create_projection_plan, which I noticed while experimenting with an alternative fix for this. Per bug #18576 from Vasya B. Back-patch to all supported branches. Richard Guo and Tom Lane Discussion: https://postgr.es/m/18576-9feac34e132fea9e@postgresql.org	2024-08-09 11:21:39 -04:00
Masahiko Sawada	66e94448ab	Restrict accesses to non-system views and foreign tables during pg_dump. When pg_dump retrieves the list of database objects and performs the data dump, there was possibility that objects are replaced with others of the same name, such as views, and access them. This vulnerability could result in code execution with superuser privileges during the pg_dump process. This issue can arise when dumping data of sequences, foreign tables (only 13 or later), or tables registered with a WHERE clause in the extension configuration table. To address this, pg_dump now utilizes the newly introduced restrict_nonsystem_relation_kind GUC parameter to restrict the accesses to non-system views and foreign tables during the dump process. This new GUC parameter is added to back branches too, but these changes do not require cluster recreation. Back-patch to all supported branches. Reviewed-by: Noah Misch Security: CVE-2024-7348 Backpatch-through: 12	2024-08-05 06:05:33 -07:00
Richard Guo	9b282a9359	Fix partitionwise join with partially-redundant join clauses To determine if the two relations being joined can use partitionwise join, we need to verify the existence of equi-join conditions involving pairs of matching partition keys for all partition keys. Currently we do that by looking through the join's restriction clauses. However, it has been discovered that this approach is insufficient, because there might be partition keys known equal by a specific EC, but they do not form a join clause because it happens that other members of the EC than the partition keys are constrained to become a join clause. To address this issue, in addition to examining the join's restriction clauses, we also check if any partition keys are known equal by ECs, by leveraging function exprs_known_equal(). To accomplish this, we enhance exprs_known_equal() to check equality per the semantics of the opfamily, if provided. It could be argued that exprs_known_equal() could be called O(N^2) times, where N is the number of partition key expressions, resulting in noticeable performance costs if there are a lot of partition key expressions. But I think this is not a problem. The number of a joinrel's partition key expressions would only be equal to the join degree, since each base relation within the join contributes only one partition key expression. That is to say, it does not scale with the number of partitions. A benchmark with a query involving 5-way joins of partitioned tables, each with 3 partition keys and 1000 partitions, shows that the planning time is not significantly affected by this patch (within the margin of error), particularly when compared to the impact caused by partitionwise join. Thanks to Tom Lane for the idea of leveraging exprs_known_equal() to check if partition keys are known equal by ECs. Author: Richard Guo, Tom Lane Reviewed-by: Tom Lane, Ashutosh Bapat, Robert Haas Discussion: https://postgr.es/m/CAN_9JTzo_2F5dKLqXVtDX5V6dwqB0Xk+ihstpKEt3a1LT6X78A@mail.gmail.com	2024-07-30 15:51:54 +09:00
Richard Guo	2309eff62b	Refactor the checks for parameterized partial paths Parameterized partial paths are not supported, and we have several checks in try_partial_xxx_path functions to enforce this. For a partial nestloop join path, we need to ensure that if the inner path is parameterized, the parameterization is fully satisfied by the proposed outer path. For a partial merge/hashjoin join path, we need to ensure that the inner path is not parameterized. In all cases, we need to ensure that the outer path is not parameterized. However, the comment in try_partial_hashjoin_path does not describe this correctly. This patch fixes that. In addtion, this patch simplifies the checks peformed in try_partial_hashjoin_path and try_partial_mergejoin_path with the help of macro PATH_REQ_OUTER, and also adds asserts that the outer path is not parameterized in try_partial_xxx_path functions. Author: Richard Guo Discussion: https://postgr.es/m/CAMbWs48mKJ6g_GnYNa7dnw04MHaMK-jnAEBrMVhTp2uUg3Ut4A@mail.gmail.com	2024-07-30 15:49:44 +09:00
Richard Guo	cc9daa09ee	Short-circuit sort_inner_and_outer if there are no mergejoin clauses In sort_inner_and_outer, we create mergejoin join paths by explicitly sorting both relations on each possible ordering of the available mergejoin clauses. However, if there are no available mergejoin clauses, we can skip this process entirely. This patch introduces a check for mergeclause_list at the beginning of sort_inner_and_outer and exits the function if it is found to be empty. This might help skip all the statements that come before the call to select_outer_pathkeys_for_merge, including the build of UniquePaths in the case of JOIN_UNIQUE_OUTER or JOIN_UNIQUE_INNER. I doubt there's any measurable performance improvement, but throughout the run of the regression tests, sort_inner_and_outer is called a total of 44,424 times. Among these calls, there are 11,064 instances where mergeclause_list is found to be empty, which accounts for approximately one-fourth. I think this suggests that implementing this shortcut is worthwhile. Author: Richard Guo Reviewed-by: Ashutosh Bapat Discussion: https://postgr.es/m/CAMbWs48RKiZGFEd5A0JtztRY5ZdvVvNiHh0AKeuoz21F+0dVjQ@mail.gmail.com	2024-07-30 15:46:39 +09:00
Richard Guo	513f4472a4	Reduce memory used by partitionwise joins In try_partitionwise_join, we aim to break down the join between two partitioned relations into joins between matching partitions. To achieve this, we iterate through each pair of partitions from the two joining relations and create child-join relations for them. With potentially thousands of partitions, the local objects allocated in each iteration can accumulate significant memory usage. Therefore, we opt to eagerly free these local objects at the end of each iteration. In line with this approach, this patch frees the bitmap set that represents the relids of child-join relations at the end of each iteration. Additionally, it modifies build_child_join_rel() to reuse the AppendRelInfo structures generated within each iteration. Author: Ashutosh Bapat Reviewed-by: David Christensen, Richard Guo Discussion: https://postgr.es/m/CAExHW5s4EqY43oB=ne6B2=-xLgrs9ZGeTr1NXwkGFt2j-OmaQQ@mail.gmail.com	2024-07-29 11:35:51 +09:00
Richard Guo	f47b33a191	Simplify create_merge_append_path for clarity We don't currently support parameterized MergeAppend paths: there's little use for an ordered path on the inside of a nestloop. Given this, we can simplify create_merge_append_path by directly setting param_info to NULL instead of calling get_appendrel_parampathinfo. We can also simplify the Assert for child paths a little bit. This change won't make any measurable difference in performance; it's just for clarity's sake. Author: Richard Guo Reviewed-by: Alena Rybakina, Paul A Jungwirth Discussion: https://postgr.es/m/CAMbWs4_n1bgH2nACMuGsXZct3KH6PBFS0tPdQsXdstRfyxTunQ@mail.gmail.com	2024-07-29 11:33:18 +09:00
Richard Guo	8b2e9fd26a	Remove redundant code in create_gather_merge_path In create_gather_merge_path, we should always guarantee that the subpath is adequately ordered, and we do not add a Sort node in createplan.c for a Gather Merge node. Therefore, the 'else' branch in create_gather_merge_path, which computes the cost for a Sort node, is redundant. This patch removes the redundant code and emits an error if the subpath is not sufficiently ordered. Meanwhile, this patch changes the check for the subpath's pathkeys in create_gather_merge_plan to an Assert. Author: Richard Guo Discussion: https://postgr.es/m/CAMbWs48u=0bWf3epVtULjJ-=M9Hbkz+ieZQAOS=BfbXZFqbDCg@mail.gmail.com	2024-07-23 11:18:53 +09:00
Richard Guo	581df21487	Fix rowcount estimate for gather (merge) paths In the case of a parallel plan, when computing the number of tuples processed per worker, we divide the total number of tuples by the parallel_divisor obtained from get_parallel_divisor(), which accounts for the leader's contribution in addition to the number of workers. Accordingly, when estimating the number of tuples for gather (merge) nodes, we should multiply the number of tuples per worker by the same parallel_divisor to reverse the division. However, currently we use parallel_workers rather than parallel_divisor for the multiplication. This could result in an underestimation of the number of tuples for gather (merge) nodes, especially when there are fewer than four workers. This patch fixes this issue by using the same parallel_divisor for the multiplication. There is one ensuing plan change in the regression tests, but it looks reasonable and does not compromise its original purpose of testing parallel-aware hash join. In passing, this patch removes an unnecessary assignment for path.rows in create_gather_merge_path, and fixes an uninitialized-variable issue in generate_useful_gather_paths. No backpatch as this could result in plan changes. Author: Anthonin Bonnefoy Reviewed-by: Rafia Sabih, Richard Guo Discussion: https://postgr.es/m/CAO6_Xqr9+51NxgO=XospEkUeAg-p=EjAWmtpdcZwjRgGKJ53iA@mail.gmail.com	2024-07-23 10:33:26 +09:00
Robert Haas	e4326fbc60	Remove grotty use of disable_cost for TID scan plans. Previously, the code charged disable_cost for CurrentOfExpr, and then subtracted disable_cost from the cost of a TID path that used CurrentOfExpr as the TID qual, effectively disabling all paths except that one. Now, we instead suppress generation of the disabled paths entirely, and generate only the one that the executor will actually understand. With this approach, we do not need to rely on disable_cost being large enough to prevent the wrong path from being chosen, and we save some CPU cycle by avoiding generating paths that we can't actually use. In my opinion, the code is also easier to understand like this. Patch by me. Review by Heikki Linnakangas. Discussion: http://postgr.es/m/591b3596-2ea0-4b8e-99c6-fad0ef2801f5@iki.fi	2024-07-22 14:57:53 -04:00
Richard Guo	069d0ff022	Check lateral references within PHVs for memoize cache keys If we intend to generate a Memoize node on top of a path, we need cache keys of some sort. Currently we search for the cache keys in the parameterized clauses of the path as well as the lateral_vars of its parent. However, it turns out that this is not sufficient because there might be lateral references derived from PlaceHolderVars, which we fail to take into consideration. This oversight can cause us to miss opportunities to utilize the Memoize node. Moreover, in some plans, failing to recognize all the cache keys could result in performance regressions. This is because without identifying all the cache keys, we would need to purge the entire cache every time we get a new outer tuple during execution. This patch fixes this issue by extracting lateral Vars from within PlaceHolderVars and subsequently including them in the cache keys. In passing, this patch also includes a comment clarifying that Memoize nodes are currently not added on top of join relation paths. This explains why this patch only considers PlaceHolderVars that are due to be evaluated at baserels. Author: Richard Guo Reviewed-by: Tom Lane, David Rowley, Andrei Lepikhov Discussion: https://postgr.es/m/CAMbWs48jLxn0pAPZpJ50EThZ569Xrw+=4Ac3QvkpQvNszbeoNg@mail.gmail.com	2024-07-15 10:26:33 +09:00
Richard Guo	22d946b0f8	Consider materializing the cheapest inner path in parallel nestloop When generating non-parallel nestloop paths for each available outer path, we always consider materializing the cheapest inner path if feasible. Similarly, in this patch, we also consider materializing the cheapest inner path when building partial nestloop paths. This approach potentially reduces the need to rescan the inner side of a partial nestloop path for each outer tuple. Author: Tender Wang Reviewed-by: Richard Guo, Robert Haas, David Rowley, Alena Rybakina Reviewed-by: Tomasz Rybak, Paul Jungwirth, Yuki Fujii Discussion: https://postgr.es/m/CAHewXNkPmtEXNfVQMou_7NqQmFABca9f4etjBtdbbm0ZKDmWvw@mail.gmail.com	2024-07-12 11:16:43 +09:00
Richard Guo	aa86129e19	Support "Right Semi Join" plan shapes Hash joins can support semijoin with the LHS input on the right, using the existing logic for inner join, combined with the assurance that only the first match for each inner tuple is considered, which can be achieved by leveraging the HEAP_TUPLE_HAS_MATCH flag. This can be very useful in some cases since we may now have the option to hash the smaller table instead of the larger. Merge join could likely support "Right Semi Join" too. However, the benefit of swapping inputs tends to be small here, so we do not address that in this patch. Note that this patch also modifies a test query in join.sql to ensure it continues testing as intended. With this patch the original query would result in a right-semi-join rather than semi-join, compromising its original purpose of testing the fix for neqjoinsel's behavior for semi-joins. Author: Richard Guo Reviewed-by: wenhui qiu, Alena Rybakina, Japin Li Discussion: https://postgr.es/m/CAMbWs4_X1mN=ic+SxcyymUqFx9bB8pqSLTGJ-F=MHy4PW3eRXw@mail.gmail.com	2024-07-05 09:26:48 +09:00
Michael Paquier	b81a71aa05	Assign error codes where missing for user-facing failures All the errors triggered in the code paths patched here would cause the backend to issue an internal_error errcode, which is a state that should be used only for "can't happen" situations. However, these code paths are reachable by the regression tests, and could be seen by users in valid cases. Some regression tests expect internal errcodes as they manipulate the backend state to cause corruption (like checksums), or use elog() because it is more convenient (like injection points), these have no need to change. This reduces the number of internal failures triggered in a check-world by more than half, while providing correct errcodes for these valid cases. Reviewed-by: Robert Haas Discussion: https://postgr.es/m/Zic_GNgos5sMxKoa@paquier.xyz	2024-07-04 09:48:40 +09:00
David Rowley	65b71dec2d	Use TupleDescAttr macro consistently A few places were directly accessing the attrs[] array. This goes against the standards set by `2cd708452`. Fix that. Discussion: https://postgr.es/m/CAApHDvrBztXP3yx=NKNmo3xwFAFhEdyPnvrDg3=M0RhDs+4vYw@mail.gmail.com	2024-07-02 13:41:47 +12:00
David Rowley	aa901a37cf	Fix possible Assert failure in cost_memoize_rescan In cost_memoize_rescan(), when calculating the hit_ratio using the calls and ndistinct estimations, if the value that was set in MemoizePath.calls had not been processed through clamp_row_est(), then it was possible that it was set to some non-integer value which could result in ndistinct being 1 higher than calls due to estimate_num_groups() performing clamp_row_est() on its input_rows. This could result in hit_ratio values slightly below 0.0, which would cause an Assert failure. The value of MemoizePath.calls comes from the final parameter in the create_memoize_path() function, of which we only have one true caller of. That caller passes outer_path->rows. All the core code I looked at always seems to call clamp_row_est() on the Path.rows, so there might have been no issues with any core Paths causing troubles here. The bug report was about a CustomPath with a non-clamped row estimated. The misbehavior as a result of this seems to be mostly limited to the Assert() failing. Aside from that, it seems the Memoize costs would just come out slightly higher than they should have, which is likely fairly harmless. Reported-by: Kohei KaiGai <kaigai@heterodb.com> Discussion: https://postgr.es/m/CAOP8fzZnTU+N64UYJYogb1hN-5hFP+PwTb3m_cnGAD7EsQwrKw@mail.gmail.com Reviewed-by: Richard Guo Backpatch-through: 14, where Memoize was introduced	2024-06-19 10:20:24 +12:00
Peter Geoghegan	6207f08f70	Harmonize function parameter names for Postgres 17. Make sure that function declarations use names that exactly match the corresponding names from function definitions in a few places. These inconsistencies were all introduced during Postgres 17 development. pg_bsd_indent still has a couple of similar inconsistencies, which I (pgeoghegan) have left untouched for now. This commit was written with help from clang-tidy, by mechanically applying the same rules as similar clean-up commits (the earliest such commit was commit `035ce1fe`).	2024-06-12 17:01:51 -04:00
Tom Lane	915de706d2	Fix infer_arbiter_indexes() to not assume resultRelation is 1. infer_arbiter_indexes failed to renumber varnos in index expressions or predicates that it got from the catalogs. This escaped detection up to now because the stored varnos in such trees will be 1, and an INSERT's result relation is usually the first rangetable entry, so that that was fine. However, in cases such as inserting through an updatable view, it's not fine, leading to failure to match the expressions to the query with ensuing "there is no unique or exclusion constraint matching the ON CONFLICT specification" errors. Fix by copy-and-paste from get_relation_info(). Per bug #18502 from Michael Wang. Back-patch to all supported versions. Discussion: https://postgr.es/m/18502-545b53f5b81e54e0@postgresql.org	2024-06-11 17:57:46 -04:00
Richard Guo	3cb19f45a3	Fix comment about cross-checking the varnullingrels The nullingrels match checks are not limited to debugging builds. Oversight in commit `867be9c07`. Author: Richard Guo Reviewed-by: Alvaro Herrera, Tom Lane, Robert Haas Discussion: https://postgr.es/m/CAMbWs4_SDsdYD7DdQw7RXc3jv3axbg+RGZ7aSi9GaqX=F8hNVw@mail.gmail.com	2024-06-10 13:05:20 +09:00
Alexander Korotkov	505c008ca3	Restore preprocess_groupclause() `0452b461bc` made optimizer explore alternative orderings of group-by pathkeys. It eliminated preprocess_groupclause(), which was intended to match items between GROUP BY and ORDER BY. Instead, get_useful_group_keys_orderings() function generates orderings of GROUP BY elements at the time of grouping paths generation. The get_useful_group_keys_orderings() function takes into account 3 orderings of GROUP BY pathkeys and clauses: original order as written in GROUP BY, matching ORDER BY clauses as much as possible, and matching the input path as much as possible. Given that even before `0452b461b`, preprocess_groupclause() could change the original order of GROUP BY clauses we don't need to consider it apart from ordering matching ORDER BY clauses. This commit restores preprocess_groupclause() to provide an ordering of GROUP BY elements matching ORDER BY before generation of paths. The new version of preprocess_groupclause() takes into account an incremental sort. The get_useful_group_keys_orderings() function now takes into 2 orderings of GROUP BY elements: the order generated preprocess_groupclause() and the order matching the input path as much as possible. Discussion: https://postgr.es/m/CAPpHfdvyWLMGwvxaf%3D7KAp-z-4mxbSH8ti2f6mNOQv5metZFzg%40mail.gmail.com Author: Alexander Korotkov Reviewed-by: Andrei Lepikhov, Pavel Borisov	2024-06-06 13:44:34 +03:00
Alexander Korotkov	0c1af2c35c	Rename PathKeyInfo to GroupByOrdering `0452b461bc` made optimizer explore alternative orderings of group-by pathkeys. The PathKeyInfo data structure was used to store the particular ordering of group-by pathkeys and corresponding clauses. It turns out that PathKeyInfo is not the best name for that purpose. This commit renames this data structure to GroupByOrdering, and revises its comment. Discussion: https://postgr.es/m/db0fc3a4-966c-4cec-a136-94024d39212d%40postgrespro.ru Reported-by: Tom Lane Author: Andrei Lepikhov Reviewed-by: Alexander Korotkov, Pavel Borisov	2024-06-06 13:43:24 +03:00
Alexander Korotkov	91143c03d4	Add invariants check to get_useful_group_keys_orderings() This commit introduces invariants checking of generated orderings in get_useful_group_keys_orderings() for assert-enabled builds. Discussion: https://postgr.es/m/a663f0f6-cbf6-49aa-af2e-234dc6768a07%40postgrespro.ru Reported-by: Tom Lane Author: Andrei Lepikhov Reviewed-by: Alexander Korotkov, Pavel Borisov	2024-06-06 13:42:47 +03:00
Alexander Korotkov	199012a3d8	Fix asymmetry in setting EquivalenceClass.ec_sortref `0452b461bc` made get_eclass_for_sort_expr() always set EquivalenceClass.ec_sortref if it's not done yet. This leads to an asymmetric situation when whoever first looks for the EquivalenceClass sets the ec_sortref. It is also counterintuitive that get_eclass_for_sort_expr() performs modification of data structures. This commit makes make_pathkeys_for_sortclauses_extended() responsible for setting EquivalenceClass.ec_sortref. Now we set the EquivalenceClass.ec_sortref's needed to explore alternative GROUP BY ordering specifically during building pathkeys by the list of grouping clauses. Discussion: https://postgr.es/m/17037754-f187-4138-8285-0e2bfebd0dea%40postgrespro.ru Reported-by: Tom Lane Author: Andrei Lepikhov Reviewed-by: Alexander Korotkov, Pavel Borisov	2024-06-06 13:41:34 +03:00
David Rowley	7b71e5bbcc	Fix some grammatical errors in some comments Introduced by `9f1337639`. Author: James Coleman <jtc331@gmail.com> Discussion: https://postgr.es/m/CAAaqYe9ZQ_1+QiF_Nv7b37opicBu+35ZQK1CetQ54r5UdrF1eg@mail.gmail.com	2024-06-05 21:31:28 +12:00
Robert Haas	c37267162e	Fix generate_union_paths for non-sortable types. The previous logic would fail to set groupList when grouping_is_sortable() returned false, which could cause queries to return wrong answers when some columns were not sortable. David Rowley, reviewed by Heikki Linnakangas and Álvaro Herrera. Reported by Hubert "depesz" Lubaczewski. Discussion: http://postgr.es/m/Zktzf926vslR35Fv@depesz.com Discussion: http://postgr.es/m/CAApHDvra=c8_zZT0J-TftByWN2Y+OJfnjNJFg4Dfdi2s+QzmqA@mail.gmail.com	2024-05-21 12:54:09 -04:00
Robert Haas	12933dc604	Re-allow planner to use Merge Append to efficiently implement UNION. This reverts commit `7204f35919`, thus restoring `66c0185a3` (Allow planner to use Merge Append to efficiently implement UNION) as well as the follow-on commits `d5d2205c8`, `3b1a7eb28`, `7487044d6`. Per further discussion on pgsql-release, we wish to ship beta1 with this feature, and patch the bug that was found just before wrap, rather than shipping beta1 with the feature reverted.	2024-05-21 12:44:51 -04:00
Tom Lane	7204f35919	Revert commit `66c0185a3` and follow-on patches. This reverts `66c0185a3` (Allow planner to use Merge Append to efficiently implement UNION) as well as the follow-on commits `d5d2205c8`, `3b1a7eb28`, `7487044d6`. In addition to those, `07746a8ef` had to be removed then re-applied in a different place, because `66c0185a3` moved the relevant code. The reason for this last-minute thrashing is that depesz found a case in which the patched code creates a completely wrong plan that silently gives incorrect query results. It's unclear what the cause is or how many cases are affected, but with beta1 wrap staring us in the face, there's no time for closer investigation. After we figure that out, we can decide whether to un-revert this for beta2 or hold it for v18. Discussion: https://postgr.es/m/Zktzf926vslR35Fv@depesz.com (also some private discussion among pgsql-release)	2024-05-20 15:08:30 -04:00
Peter Eisentraut	8aee330af5	Revert temporal primary keys and foreign keys This feature set did not handle empty ranges correctly, and it's now too late for PostgreSQL 17 to fix it. The following commits are reverted: `6db4598fcb` Add stratnum GiST support function `46a0cd4cef` Add temporal PRIMARY KEY and UNIQUE constraints `86232a49a4` Fix comment on gist_stratnum_btree `030e10ff1a` Rename pg_constraint.conwithoutoverlaps to conperiod `a88c800deb` Use daterange and YMD in without_overlaps tests instead of tsrange. `5577a71fb0` Use half-open interval notation in without_overlaps tests `34768ee361` Add temporal FOREIGN KEY contraints `482e108cd3` Add test for REPLICA IDENTITY with a temporal key `c3db1f30cb` doc: clarify PERIOD and WITHOUT OVERLAPS in CREATE TABLE `144c2ce0cc` Fix ON CONFLICT DO NOTHING/UPDATE for temporal indexes Discussion: https://www.postgresql.org/message-id/d0b64a7a-dfe4-4b84-a906-c7dedfa40a3e@eisentraut.org	2024-05-16 08:17:46 +02:00
Alvaro Herrera	6f8bb7c1e9	Revert structural changes to not-null constraints There are some problems with the new way to handle these constraints that were detected at the last minute, and require fixes that appear too invasive to be doing this late in the cycle. Revert this (again) for now, we'll try again with these problems fixed. The following commits are reverted: `b0e96f3119` Catalog not-null constraints `9b581c5341` Disallow changing NO INHERIT status of a not-null constraint `d0ec2ddbe0` Fix not-null constraint test `ac22a9545c` Move privilege check to the right place `b0f7dd915b` Check stack depth in new recursive functions `3af7217942` Update information_schema definition for not-null constraints `c3709100be` Fix propagating attnotnull in multiple inheritance `d9f686a72e` Fix restore of not-null constraints with inheritance `d72d32f52d` Don't try to assign smart names to constraints `0cd711271d` Better handle indirect constraint drops `13daa33fa5` Disallow NO INHERIT not-null constraints on partitioned tables `d45597f72f` Disallow direct change of NO INHERIT of not-null constraints `21ac38f498` Fix inconsistencies in error messages Discussion: https://postgr.es/m/202405110940.joxlqcx4dogd@alvherre.pgsql	2024-05-13 11:31:09 +02:00
Peter Eisentraut	144c2ce0cc	Fix ON CONFLICT DO NOTHING/UPDATE for temporal indexes A PRIMARY KEY or UNIQUE constraint with WITHOUT OVERLAPS will be a GiST index, not a B-Tree, but it will still have indisunique set. The code for ON CONFLICT fails if it sees a non-btree index that has indisunique. This commit fixes that and adds some tests. But now that we can't just test indisunique, we also need some extra checks to prevent DO UPDATE from running against a WITHOUT OVERLAPS constraint (because the conflict could happen against more than one row, and we'd only update one). Author: Paul A. Jungwirth <pj@illuminatedcomputing.com> Discussion: https://www.postgresql.org/message-id/1426589a-83cb-4a89-bf40-713970c07e63@illuminatedcomputing.com	2024-05-10 14:55:31 +02:00
Tom Lane	c7be3c015b	Make left-join removal safe under -DREALLOCATE_BITMAPSETS. The initial building of RestrictInfos and SpecialJoinInfos tends to create structures that share relid sets (such as syn_lefthand and outer_relids). There's nothing wrong with that in itself, but when we modify those relid sets during join removal, we have to be sure not to corrupt the values that other structures are pointing at. remove_rel_from_query() was careless about this. It accidentally worked anyway because (1) we'd never be reducing the sets to empty, so they wouldn't get pfree'd; and (2) the in-place modification is the same one that we did or will apply to the other struct's relid set, so that there wasn't visible corruption at the end of the process. While there's no live bug in a standard build, of course this is way too fragile to accept going forward. (Maybe we should back-patch this change too for safety, but I've refrained for now.) This bug was exposed by the recent invention of REALLOCATE_BITMAPSETS. Commit `e0477837c` had installed a fix, but that went away with the reversion of SJE, so now we need to fix it again. David Rowley and Tom Lane Discussion: https://postgr.es/m/CACJufxFVQmr4=JWHAOSLuKA5Zy9H26nY6tVrRFBOekHoALyCkQ@mail.gmail.com	2024-05-09 11:01:50 -04:00
Tom Lane	6572bd55b0	Prevent RLS filters on ctid from breaking WHERE CURRENT OF <cursor>. The executor only supports CurrentOfExpr as the sole tidqual of a TidScan plan node. tidpath.c failed to take any particular care about that, but would just take the first ctid equality qual it could find in the target relation's baserestrictinfo list. Originally that was fine because the grammar prevents any other WHERE conditions from being combined with CURRENT OF <cursor>. However, if the relation has RLS visibility policies then those would get included in the list. Should such a policy include a condition on ctid, we'd typically grab the wrong qual and produce a malfunctioning plan. To fix, introduce a simplistic priority ordering scheme for which ctid equality qual to prefer. Real-world cases involving more than one such qual are so rare that it doesn't seem worth going to any great trouble to choose one over another, so I didn't work very hard; but this code could be extended in future if someone thinks differently. It's extremely difficult to think of a reasonable use-case for an RLS restriction involving ctid, and certainly we've heard no field reports of this failure. So this doesn't seem worthy of back-patching, but in the name of cleanliness let's fix it going forward. Patch by me, per report from Robert Haas. Discussion: https://postgr.es/m/3914881.1715038270@sss.pgh.pa.us	2024-05-07 13:35:10 -04:00
Tom Lane	07746a8ef2	Finish incomplete revert of `ec63622c0`. The code change this made might well be fine to keep, but the comment justifying it by reference to self-join removal isn't. Let's just go back to the status quo ante, pending a more thorough review/redesign of SJE. (I found this by grepping to see if any references to self-join removal remained in the tree.)	2024-05-06 14:22:45 -04:00
Alexander Korotkov	d1d286d83c	Revert: Remove useless self-joins This commit reverts `d3d55ce571` and subsequent fixes `2b26a69455`, `93c85db3b5`, `b44a1708ab`, `b7f315c9d7`, `8a8ed916f7`, `b5fb6736ed`, `0a93f803f4`, `e0477837ce`, `a7928a57b9`, `5ef34a8fc3`, `30b4955a46`, `8c441c0827`, `028b15405b`, `fe093994db`, `489072ab7a`, and `466979ef03`. We are quite late in the release cycle and new bugs continue to appear. Even though we have fixes for all known bugs, there is a risk of throwing many bugs to end users. The plan for self-join elimination would be to do more review and testing, then re-commit in the early v18 cycle. Reported-by: Tom Lane Discussion: https://postgr.es/m/2422119.1714691974%40sss.pgh.pa.us	2024-05-06 14:36:36 +03:00
David Rowley	7d2c7f08d9	Fix query pullup issue with WindowClause runCondition `94985c210` added code to detect when WindowFuncs were monotonic and allowed additional quals to be "pushed down" into the subquery to be used as WindowClause runConditions in order to short-circuit execution in nodeWindowAgg.c. The Node representation of runConditions wasn't well selected and because we do qual pushdown before planning the subquery, the planning of the subquery could perform subquery pull-up of nested subqueries. For WindowFuncs with args, the arguments could be changed after pushing the qual down to the subquery. This was made more difficult by the fact that the code duplicated the WindowFunc inside an OpExpr to include in the WindowClauses runCondition field. This could result in duplication of subqueries and a pull-up of such a subquery could result in another initplan parameter being issued for the 2nd version of the subplan. This could result in errors such as: ERROR: WindowFunc not found in subplan target lists To fix this, we change the node representation of these run conditions and instead of storing an OpExpr containing the WindowFunc in a list inside WindowClause, we now store a new node type named WindowFuncRunCondition within a new field in the WindowFunc. These get transformed into OpExprs later in planning once subquery pull-up has been performed. This problem did exist in v15 and v16, but that was fixed by `9d36b883b` and e5d20bbd. Cat version bump due to new node type and modifying WindowFunc struct. Bug: #18305 Reported-by: Zuming Jiang Discussion: https://postgr.es/m/18305-33c49b4c830b37b3%40postgresql.org	2024-05-05 12:54:46 +12:00
Dean Rasheed	2e068db56e	Use macro NUM_MERGE_MATCH_KINDS instead of '3' in MERGE code. Code quality improvement for `0294df2f1f`. Aleksander Alekseev, reviewed by Richard Guo. Discussion: https://postgr.es/m/CAJ7c6TMsiaV5urU_Pq6zJ2tXPDwk69-NKVh4AMN5XrRiM7N%2BGA%40mail.gmail.com	2024-04-19 09:40:20 +01:00
Daniel Gustafsson	950d4a2cb1	Fix typos and duplicate words This fixes various typos, duplicated words, and tiny bits of whitespace mainly in code comments but also in docs. Author: Daniel Gustafsson <daniel@yesql.se> Author: Heikki Linnakangas <hlinnaka@iki.fi> Author: Alexander Lakhin <exclusion@gmail.com> Author: David Rowley <dgrowleyml@gmail.com> Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/3F577953-A29E-4722-98AD-2DA9EFF2CBB8@yesql.se	2024-04-18 21:28:07 +02:00
Tom Lane	03107b4eda	Ensure generated join clauses for child rels have correct relids. When building a join clause derived from an EquivalenceClass, if the clause is to be used with an appendrel child relation then make sure its clause_relids include the relids of that child relation. Normally this would be true already because the EquivalenceMember would be a Var of that relation. However, if the appendrel represents a flattened UNION ALL construct then some child EquivalenceMembers could be constants with no relids. The resulting under-marked clause is problematic because it could mislead join_clause_is_movable_into about where the clause should be evaluated. We do not have an example showing incorrect plan generation, but there are existing cases in the regression tests that will fail the Asserts this patch adds to get_baserel_parampathinfo. A similarly wrong conclusion about a clause being considered by get_joinrel_parampathinfo would lead to wrong placement of the clause. (This also squares with the way that clause_relids is calculated for non-equijoin clauses in adjust_appendrel_attrs.) The other reason for wanting these new Asserts is that the previous blithe assumption that the results of generate_join_implied_equalities "necessarily satisfy join_clause_is_movable_into" turns out to be wrong pre-v16. If it's still wrong it'd be good to find out. Per bug #18429 from Benoît Ryder. The bug as filed was fixed by commit `2489d76c4`, but these changes correlate with the fix we will need to apply in pre-v16 branches. Discussion: https://postgr.es/m/18429-8982d4a348cc86c6@postgresql.org	2024-04-16 11:22:51 -04:00
Tom Lane	e0df80828a	Fix type-checking of RECORD-returning functions in FROM, redux. Commit `2ed8f9a01` intended to institute a policy that if a RangeTblFunction has a coldeflist, then the function return type is certainly RECORD, and we should use the coldeflist as the source of truth about what the columns of the record type are. When the original function has been folded to a constant, inspection of the constant might give a different answer. This situation will lead to a tuple-type-mismatch error at execution, but up until that point we need to consistently believe the coldeflist, or we'll have problems from different bits of code reaching different conclusions. expandRTE didn't get that memo though, and would try to produce a tupdesc based on the constant in this situation, leading to an assertion failure. (Desultory testing suggests that non-assert builds often manage to give the expected error, although I also saw a "cache lookup failed for type 0" error, and it seems at least possible that a crash could happen.) Some other callers of get_expr_result_type and get_expr_result_tupdesc were also being incautious about this. While none of them seem to have actual bugs, they're working harder than necessary in this case, besides which it seems safest to have an explicit policy of not using those functions on an RTE with a coldeflist. Adjust the code accordingly, and add commentary to funcapi.c about this policy. Also fix an obsolete comment that claimed "get_expr_result_type() doesn't know how to extract type info from a RECORD constant". That hasn't been true since commit `d57534740`. Per bug #18422 from Alexander Lakhin. As with the previous commit, back-patch to all supported branches. Discussion: https://postgr.es/m/18422-89ca86c8eac5246d@postgresql.org	2024-04-15 12:56:56 -04:00
David Rowley	b9ecefecc7	Fix recently introduced typo in code comment Reported-by: Richard Guo Discussion: https://postgr.es/m/CAMbWs49kAsZUsj7-0SBLvE9+uKz0RCqMEmM3NVytc1YvS8sTrQ@mail.gmail.com	2024-04-12 23:15:52 +12:00

1 2 3 4 5 ...

2756 Commits