1
0
mirror of https://github.com/postgres/postgres.git synced 2025-11-07 19:06:32 +03:00
Commit Graph

2739 Commits

Author SHA1 Message Date
Richard Guo
8b2e9fd26a Remove redundant code in create_gather_merge_path
In create_gather_merge_path, we should always guarantee that the
subpath is adequately ordered, and we do not add a Sort node in
createplan.c for a Gather Merge node.  Therefore, the 'else' branch in
create_gather_merge_path, which computes the cost for a Sort node, is
redundant.

This patch removes the redundant code and emits an error if the
subpath is not sufficiently ordered.  Meanwhile, this patch changes
the check for the subpath's pathkeys in create_gather_merge_plan to an
Assert.

Author: Richard Guo
Discussion: https://postgr.es/m/CAMbWs48u=0bWf3epVtULjJ-=M9Hbkz+ieZQAOS=BfbXZFqbDCg@mail.gmail.com
2024-07-23 11:18:53 +09:00
Richard Guo
581df21487 Fix rowcount estimate for gather (merge) paths
In the case of a parallel plan, when computing the number of tuples
processed per worker, we divide the total number of tuples by the
parallel_divisor obtained from get_parallel_divisor(), which accounts
for the leader's contribution in addition to the number of workers.

Accordingly, when estimating the number of tuples for gather (merge)
nodes, we should multiply the number of tuples per worker by the same
parallel_divisor to reverse the division.  However, currently we use
parallel_workers rather than parallel_divisor for the multiplication.
This could result in an underestimation of the number of tuples for
gather (merge) nodes, especially when there are fewer than four
workers.

This patch fixes this issue by using the same parallel_divisor for the
multiplication.  There is one ensuing plan change in the regression
tests, but it looks reasonable and does not compromise its original
purpose of testing parallel-aware hash join.

In passing, this patch removes an unnecessary assignment for path.rows
in create_gather_merge_path, and fixes an uninitialized-variable issue
in generate_useful_gather_paths.

No backpatch as this could result in plan changes.

Author: Anthonin Bonnefoy
Reviewed-by: Rafia Sabih, Richard Guo
Discussion: https://postgr.es/m/CAO6_Xqr9+51NxgO=XospEkUeAg-p=EjAWmtpdcZwjRgGKJ53iA@mail.gmail.com
2024-07-23 10:33:26 +09:00
Robert Haas
e4326fbc60 Remove grotty use of disable_cost for TID scan plans.
Previously, the code charged disable_cost for CurrentOfExpr, and then
subtracted disable_cost from the cost of a TID path that used
CurrentOfExpr as the TID qual, effectively disabling all paths except
that one. Now, we instead suppress generation of the disabled paths
entirely, and generate only the one that the executor will actually
understand.

With this approach, we do not need to rely on disable_cost being
large enough to prevent the wrong path from being chosen, and we
save some CPU cycle by avoiding generating paths that we can't
actually use. In my opinion, the code is also easier to understand
like this.

Patch by me. Review by Heikki Linnakangas.

Discussion: http://postgr.es/m/591b3596-2ea0-4b8e-99c6-fad0ef2801f5@iki.fi
2024-07-22 14:57:53 -04:00
Richard Guo
069d0ff022 Check lateral references within PHVs for memoize cache keys
If we intend to generate a Memoize node on top of a path, we need
cache keys of some sort.  Currently we search for the cache keys in
the parameterized clauses of the path as well as the lateral_vars of
its parent.  However, it turns out that this is not sufficient because
there might be lateral references derived from PlaceHolderVars, which
we fail to take into consideration.

This oversight can cause us to miss opportunities to utilize the
Memoize node.  Moreover, in some plans, failing to recognize all the
cache keys could result in performance regressions.  This is because
without identifying all the cache keys, we would need to purge the
entire cache every time we get a new outer tuple during execution.

This patch fixes this issue by extracting lateral Vars from within
PlaceHolderVars and subsequently including them in the cache keys.

In passing, this patch also includes a comment clarifying that Memoize
nodes are currently not added on top of join relation paths.  This
explains why this patch only considers PlaceHolderVars that are due to
be evaluated at baserels.

Author: Richard Guo
Reviewed-by: Tom Lane, David Rowley, Andrei Lepikhov
Discussion: https://postgr.es/m/CAMbWs48jLxn0pAPZpJ50EThZ569Xrw+=4Ac3QvkpQvNszbeoNg@mail.gmail.com
2024-07-15 10:26:33 +09:00
Richard Guo
22d946b0f8 Consider materializing the cheapest inner path in parallel nestloop
When generating non-parallel nestloop paths for each available outer
path, we always consider materializing the cheapest inner path if
feasible.  Similarly, in this patch, we also consider materializing
the cheapest inner path when building partial nestloop paths.  This
approach potentially reduces the need to rescan the inner side of a
partial nestloop path for each outer tuple.

Author: Tender Wang
Reviewed-by: Richard Guo, Robert Haas, David Rowley, Alena Rybakina
Reviewed-by: Tomasz Rybak, Paul Jungwirth, Yuki Fujii
Discussion: https://postgr.es/m/CAHewXNkPmtEXNfVQMou_7NqQmFABca9f4etjBtdbbm0ZKDmWvw@mail.gmail.com
2024-07-12 11:16:43 +09:00
Richard Guo
aa86129e19 Support "Right Semi Join" plan shapes
Hash joins can support semijoin with the LHS input on the right, using
the existing logic for inner join, combined with the assurance that only
the first match for each inner tuple is considered, which can be
achieved by leveraging the HEAP_TUPLE_HAS_MATCH flag.  This can be very
useful in some cases since we may now have the option to hash the
smaller table instead of the larger.

Merge join could likely support "Right Semi Join" too.  However, the
benefit of swapping inputs tends to be small here, so we do not address
that in this patch.

Note that this patch also modifies a test query in join.sql to ensure it
continues testing as intended.  With this patch the original query would
result in a right-semi-join rather than semi-join, compromising its
original purpose of testing the fix for neqjoinsel's behavior for
semi-joins.

Author: Richard Guo
Reviewed-by: wenhui qiu, Alena Rybakina, Japin Li
Discussion: https://postgr.es/m/CAMbWs4_X1mN=ic+SxcyymUqFx9bB8pqSLTGJ-F=MHy4PW3eRXw@mail.gmail.com
2024-07-05 09:26:48 +09:00
Michael Paquier
b81a71aa05 Assign error codes where missing for user-facing failures
All the errors triggered in the code paths patched here would cause the
backend to issue an internal_error errcode, which is a state that should
be used only for "can't happen" situations.  However, these code paths
are reachable by the regression tests, and could be seen by users in
valid cases.  Some regression tests expect internal errcodes as they
manipulate the backend state to cause corruption (like checksums), or
use elog() because it is more convenient (like injection points), these
have no need to change.

This reduces the number of internal failures triggered in a check-world
by more than half, while providing correct errcodes for these valid
cases.

Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/Zic_GNgos5sMxKoa@paquier.xyz
2024-07-04 09:48:40 +09:00
David Rowley
65b71dec2d Use TupleDescAttr macro consistently
A few places were directly accessing the attrs[] array. This goes
against the standards set by 2cd708452. Fix that.

Discussion: https://postgr.es/m/CAApHDvrBztXP3yx=NKNmo3xwFAFhEdyPnvrDg3=M0RhDs+4vYw@mail.gmail.com
2024-07-02 13:41:47 +12:00
David Rowley
aa901a37cf Fix possible Assert failure in cost_memoize_rescan
In cost_memoize_rescan(), when calculating the hit_ratio using the calls
and ndistinct estimations, if the value that was set in
MemoizePath.calls had not been processed through clamp_row_est(), then it
was possible that it was set to some non-integer value which could result
in ndistinct being 1 higher than calls due to estimate_num_groups()
performing clamp_row_est() on its input_rows.  This could result in
hit_ratio values slightly below 0.0, which would cause an Assert failure.

The value of MemoizePath.calls comes from the final parameter in the
create_memoize_path() function, of which we only have one true caller of.
That caller passes outer_path->rows.  All the core code I looked at
always seems to call clamp_row_est() on the Path.rows, so there might
have been no issues with any core Paths causing troubles here.  The bug
report was about a CustomPath with a non-clamped row estimated.

The misbehavior as a result of this seems to be mostly limited to the
Assert() failing.  Aside from that, it seems the Memoize costs would
just come out slightly higher than they should have, which is likely
fairly harmless.

Reported-by: Kohei KaiGai <kaigai@heterodb.com>
Discussion: https://postgr.es/m/CAOP8fzZnTU+N64UYJYogb1hN-5hFP+PwTb3m_cnGAD7EsQwrKw@mail.gmail.com
Reviewed-by: Richard Guo
Backpatch-through: 14, where Memoize was introduced
2024-06-19 10:20:24 +12:00
Peter Geoghegan
6207f08f70 Harmonize function parameter names for Postgres 17.
Make sure that function declarations use names that exactly match the
corresponding names from function definitions in a few places.  These
inconsistencies were all introduced during Postgres 17 development.

pg_bsd_indent still has a couple of similar inconsistencies, which I
(pgeoghegan) have left untouched for now.

This commit was written with help from clang-tidy, by mechanically
applying the same rules as similar clean-up commits (the earliest such
commit was commit 035ce1fe).
2024-06-12 17:01:51 -04:00
Tom Lane
915de706d2 Fix infer_arbiter_indexes() to not assume resultRelation is 1.
infer_arbiter_indexes failed to renumber varnos in index expressions
or predicates that it got from the catalogs.  This escaped detection
up to now because the stored varnos in such trees will be 1, and an
INSERT's result relation is usually the first rangetable entry,
so that that was fine.  However, in cases such as inserting through
an updatable view, it's not fine, leading to failure to match the
expressions to the query with ensuing "there is no unique or exclusion
constraint matching the ON CONFLICT specification" errors.

Fix by copy-and-paste from get_relation_info().

Per bug #18502 from Michael Wang.  Back-patch to all supported
versions.

Discussion: https://postgr.es/m/18502-545b53f5b81e54e0@postgresql.org
2024-06-11 17:57:46 -04:00
Richard Guo
3cb19f45a3 Fix comment about cross-checking the varnullingrels
The nullingrels match checks are not limited to debugging builds.
Oversight in commit 867be9c07.

Author: Richard Guo
Reviewed-by: Alvaro Herrera, Tom Lane, Robert Haas
Discussion: https://postgr.es/m/CAMbWs4_SDsdYD7DdQw7RXc3jv3axbg+RGZ7aSi9GaqX=F8hNVw@mail.gmail.com
2024-06-10 13:05:20 +09:00
Alexander Korotkov
505c008ca3 Restore preprocess_groupclause()
0452b461bc made optimizer explore alternative orderings of group-by pathkeys.
It eliminated preprocess_groupclause(), which was intended to match items
between GROUP BY and ORDER BY.  Instead, get_useful_group_keys_orderings()
function generates orderings of GROUP BY elements at the time of grouping
paths generation.  The get_useful_group_keys_orderings() function takes into
account 3 orderings of GROUP BY pathkeys and clauses: original order as written
in GROUP BY, matching ORDER BY clauses as much as possible, and matching the
input path as much as possible.  Given that even before 0452b461b,
preprocess_groupclause() could change the original order of GROUP BY clauses
we don't need to consider it apart from ordering matching ORDER BY clauses.

This commit restores preprocess_groupclause() to provide an ordering of
GROUP BY elements matching ORDER BY before generation of paths.  The new
version of preprocess_groupclause() takes into account an incremental sort.
The get_useful_group_keys_orderings() function now takes into 2 orderings of
GROUP BY elements: the order generated preprocess_groupclause() and the order
matching the input path as much as possible.

Discussion: https://postgr.es/m/CAPpHfdvyWLMGwvxaf%3D7KAp-z-4mxbSH8ti2f6mNOQv5metZFzg%40mail.gmail.com
Author: Alexander Korotkov
Reviewed-by: Andrei Lepikhov, Pavel Borisov
2024-06-06 13:44:34 +03:00
Alexander Korotkov
0c1af2c35c Rename PathKeyInfo to GroupByOrdering
0452b461bc made optimizer explore alternative orderings of group-by pathkeys.
The PathKeyInfo data structure was used to store the particular ordering of
group-by pathkeys and corresponding clauses.  It turns out that PathKeyInfo
is not the best name for that purpose.  This commit renames this data structure
to GroupByOrdering, and revises its comment.

Discussion: https://postgr.es/m/db0fc3a4-966c-4cec-a136-94024d39212d%40postgrespro.ru
Reported-by: Tom Lane
Author: Andrei Lepikhov
Reviewed-by: Alexander Korotkov, Pavel Borisov
2024-06-06 13:43:24 +03:00
Alexander Korotkov
91143c03d4 Add invariants check to get_useful_group_keys_orderings()
This commit introduces invariants checking of generated orderings
in get_useful_group_keys_orderings() for assert-enabled builds.

Discussion: https://postgr.es/m/a663f0f6-cbf6-49aa-af2e-234dc6768a07%40postgrespro.ru
Reported-by: Tom Lane
Author: Andrei Lepikhov
Reviewed-by: Alexander Korotkov, Pavel Borisov
2024-06-06 13:42:47 +03:00
Alexander Korotkov
199012a3d8 Fix asymmetry in setting EquivalenceClass.ec_sortref
0452b461bc made get_eclass_for_sort_expr() always set
EquivalenceClass.ec_sortref if it's not done yet.  This leads to an asymmetric
situation when whoever first looks for the EquivalenceClass sets the
ec_sortref.  It is also counterintuitive that get_eclass_for_sort_expr()
performs modification of data structures.

This commit makes make_pathkeys_for_sortclauses_extended() responsible for
setting EquivalenceClass.ec_sortref.  Now we set the
EquivalenceClass.ec_sortref's needed to explore alternative GROUP BY ordering
specifically during building pathkeys by the list of grouping clauses.

Discussion: https://postgr.es/m/17037754-f187-4138-8285-0e2bfebd0dea%40postgrespro.ru
Reported-by: Tom Lane
Author: Andrei Lepikhov
Reviewed-by: Alexander Korotkov, Pavel Borisov
2024-06-06 13:41:34 +03:00
David Rowley
7b71e5bbcc Fix some grammatical errors in some comments
Introduced by 9f1337639.

Author: James Coleman <jtc331@gmail.com>
Discussion: https://postgr.es/m/CAAaqYe9ZQ_1+QiF_Nv7b37opicBu+35ZQK1CetQ54r5UdrF1eg@mail.gmail.com
2024-06-05 21:31:28 +12:00
Robert Haas
c37267162e Fix generate_union_paths for non-sortable types.
The previous logic would fail to set groupList when
grouping_is_sortable() returned false, which could cause queries
to return wrong answers when some columns were not sortable.

David Rowley, reviewed by Heikki Linnakangas and Álvaro Herrera.
Reported by Hubert "depesz" Lubaczewski.

Discussion: http://postgr.es/m/Zktzf926vslR35Fv@depesz.com
Discussion: http://postgr.es/m/CAApHDvra=c8_zZT0J-TftByWN2Y+OJfnjNJFg4Dfdi2s+QzmqA@mail.gmail.com
2024-05-21 12:54:09 -04:00
Robert Haas
12933dc604 Re-allow planner to use Merge Append to efficiently implement UNION.
This reverts commit 7204f35919,
thus restoring 66c0185a3 (Allow planner to use Merge Append to
efficiently implement UNION) as well as the follow-on commits
d5d2205c8, 3b1a7eb28, 7487044d6.

Per further discussion on pgsql-release, we wish to ship beta1 with
this feature, and patch the bug that was found just before wrap,
rather than shipping beta1 with the feature reverted.
2024-05-21 12:44:51 -04:00
Tom Lane
7204f35919 Revert commit 66c0185a3 and follow-on patches.
This reverts 66c0185a3 (Allow planner to use Merge Append to
efficiently implement UNION) as well as the follow-on commits
d5d2205c8, 3b1a7eb28, 7487044d6.  In addition to those, 07746a8ef
had to be removed then re-applied in a different place, because
66c0185a3 moved the relevant code.

The reason for this last-minute thrashing is that depesz found a
case in which the patched code creates a completely wrong plan
that silently gives incorrect query results.  It's unclear what
the cause is or how many cases are affected, but with beta1 wrap
staring us in the face, there's no time for closer investigation.
After we figure that out, we can decide whether to un-revert this
for beta2 or hold it for v18.

Discussion: https://postgr.es/m/Zktzf926vslR35Fv@depesz.com
(also some private discussion among pgsql-release)
2024-05-20 15:08:30 -04:00
Peter Eisentraut
8aee330af5 Revert temporal primary keys and foreign keys
This feature set did not handle empty ranges correctly, and it's now
too late for PostgreSQL 17 to fix it.

The following commits are reverted:

    6db4598fcb Add stratnum GiST support function
    46a0cd4cef Add temporal PRIMARY KEY and UNIQUE constraints
    86232a49a4 Fix comment on gist_stratnum_btree
    030e10ff1a Rename pg_constraint.conwithoutoverlaps to conperiod
    a88c800deb Use daterange and YMD in without_overlaps tests instead of tsrange.
    5577a71fb0 Use half-open interval notation in without_overlaps tests
    34768ee361 Add temporal FOREIGN KEY contraints
    482e108cd3 Add test for REPLICA IDENTITY with a temporal key
    c3db1f30cb doc:  clarify PERIOD and WITHOUT OVERLAPS in CREATE TABLE
    144c2ce0cc Fix ON CONFLICT DO NOTHING/UPDATE for temporal indexes

Discussion: https://www.postgresql.org/message-id/d0b64a7a-dfe4-4b84-a906-c7dedfa40a3e@eisentraut.org
2024-05-16 08:17:46 +02:00
Alvaro Herrera
6f8bb7c1e9 Revert structural changes to not-null constraints
There are some problems with the new way to handle these constraints
that were detected at the last minute, and require fixes that appear too
invasive to be doing this late in the cycle.  Revert this (again) for
now, we'll try again with these problems fixed.

The following commits are reverted:

    b0e96f3119  Catalog not-null constraints
    9b581c5341  Disallow changing NO INHERIT status of a not-null constraint
    d0ec2ddbe0  Fix not-null constraint test
    ac22a9545c  Move privilege check to the right place
    b0f7dd915b  Check stack depth in new recursive functions
    3af7217942  Update information_schema definition for not-null constraints
    c3709100be  Fix propagating attnotnull in multiple inheritance
    d9f686a72e  Fix restore of not-null constraints with inheritance
    d72d32f52d  Don't try to assign smart names to constraints
    0cd711271d  Better handle indirect constraint drops
    13daa33fa5  Disallow NO INHERIT not-null constraints on partitioned tables
    d45597f72f  Disallow direct change of NO INHERIT of not-null constraints
    21ac38f498  Fix inconsistencies in error messages

Discussion: https://postgr.es/m/202405110940.joxlqcx4dogd@alvherre.pgsql
2024-05-13 11:31:09 +02:00
Peter Eisentraut
144c2ce0cc Fix ON CONFLICT DO NOTHING/UPDATE for temporal indexes
A PRIMARY KEY or UNIQUE constraint with WITHOUT OVERLAPS will be a
GiST index, not a B-Tree, but it will still have indisunique set.  The
code for ON CONFLICT fails if it sees a non-btree index that has
indisunique.  This commit fixes that and adds some tests.  But now
that we can't just test indisunique, we also need some extra checks to
prevent DO UPDATE from running against a WITHOUT OVERLAPS constraint
(because the conflict could happen against more than one row, and we'd
only update one).

Author: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Discussion: https://www.postgresql.org/message-id/1426589a-83cb-4a89-bf40-713970c07e63@illuminatedcomputing.com
2024-05-10 14:55:31 +02:00
Tom Lane
c7be3c015b Make left-join removal safe under -DREALLOCATE_BITMAPSETS.
The initial building of RestrictInfos and SpecialJoinInfos tends to
create structures that share relid sets (such as syn_lefthand and
outer_relids).  There's nothing wrong with that in itself, but when
we modify those relid sets during join removal, we have to be sure
not to corrupt the values that other structures are pointing at.
remove_rel_from_query() was careless about this.  It accidentally
worked anyway because (1) we'd never be reducing the sets to empty,
so they wouldn't get pfree'd; and (2) the in-place modification is the
same one that we did or will apply to the other struct's relid set,
so that there wasn't visible corruption at the end of the process.

While there's no live bug in a standard build, of course this is way
too fragile to accept going forward.  (Maybe we should back-patch
this change too for safety, but I've refrained for now.)

This bug was exposed by the recent invention of REALLOCATE_BITMAPSETS.
Commit e0477837c had installed a fix, but that went away with the
reversion of SJE, so now we need to fix it again.

David Rowley and Tom Lane

Discussion: https://postgr.es/m/CACJufxFVQmr4=JWHAOSLuKA5Zy9H26nY6tVrRFBOekHoALyCkQ@mail.gmail.com
2024-05-09 11:01:50 -04:00
Tom Lane
6572bd55b0 Prevent RLS filters on ctid from breaking WHERE CURRENT OF <cursor>.
The executor only supports CurrentOfExpr as the sole tidqual of a
TidScan plan node.  tidpath.c failed to take any particular care about
that, but would just take the first ctid equality qual it could find
in the target relation's baserestrictinfo list.  Originally that was
fine because the grammar prevents any other WHERE conditions from
being combined with CURRENT OF <cursor>.  However, if the relation has
RLS visibility policies then those would get included in the list.
Should such a policy include a condition on ctid, we'd typically grab
the wrong qual and produce a malfunctioning plan.

To fix, introduce a simplistic priority ordering scheme for which ctid
equality qual to prefer.  Real-world cases involving more than one
such qual are so rare that it doesn't seem worth going to any great
trouble to choose one over another, so I didn't work very hard; but
this code could be extended in future if someone thinks differently.

It's extremely difficult to think of a reasonable use-case for an RLS
restriction involving ctid, and certainly we've heard no field reports
of this failure.  So this doesn't seem worthy of back-patching, but
in the name of cleanliness let's fix it going forward.

Patch by me, per report from Robert Haas.

Discussion: https://postgr.es/m/3914881.1715038270@sss.pgh.pa.us
2024-05-07 13:35:10 -04:00
Tom Lane
07746a8ef2 Finish incomplete revert of ec63622c0.
The code change this made might well be fine to keep, but the
comment justifying it by reference to self-join removal isn't.
Let's just go back to the status quo ante, pending a more thorough
review/redesign of SJE.

(I found this by grepping to see if any references to self-join
removal remained in the tree.)
2024-05-06 14:22:45 -04:00
Alexander Korotkov
d1d286d83c Revert: Remove useless self-joins
This commit reverts d3d55ce571 and subsequent fixes 2b26a69455, 93c85db3b5,
b44a1708ab, b7f315c9d7, 8a8ed916f7, b5fb6736ed, 0a93f803f4, e0477837ce,
a7928a57b9, 5ef34a8fc3, 30b4955a46, 8c441c0827, 028b15405b, fe093994db,
489072ab7a, and 466979ef03.

We are quite late in the release cycle and new bugs continue to appear.  Even
though we have fixes for all known bugs, there is a risk of throwing many
bugs to end users.

The plan for self-join elimination would be to do more review and testing,
then re-commit in the early v18 cycle.

Reported-by: Tom Lane
Discussion: https://postgr.es/m/2422119.1714691974%40sss.pgh.pa.us
2024-05-06 14:36:36 +03:00
David Rowley
7d2c7f08d9 Fix query pullup issue with WindowClause runCondition
94985c210 added code to detect when WindowFuncs were monotonic and
allowed additional quals to be "pushed down" into the subquery to be
used as WindowClause runConditions in order to short-circuit execution
in nodeWindowAgg.c.

The Node representation of runConditions wasn't well selected and
because we do qual pushdown before planning the subquery, the planning
of the subquery could perform subquery pull-up of nested subqueries.
For WindowFuncs with args, the arguments could be changed after pushing
the qual down to the subquery.

This was made more difficult by the fact that the code duplicated the
WindowFunc inside an OpExpr to include in the WindowClauses runCondition
field.  This could result in duplication of subqueries and a pull-up of
such a subquery could result in another initplan parameter being issued
for the 2nd version of the subplan.  This could result in errors such as:

ERROR:  WindowFunc not found in subplan target lists

To fix this, we change the node representation of these run conditions
and instead of storing an OpExpr containing the WindowFunc in a list
inside WindowClause, we now store a new node type named
WindowFuncRunCondition within a new field in the WindowFunc.  These get
transformed into OpExprs later in planning once subquery pull-up has been
performed.

This problem did exist in v15 and v16, but that was fixed by 9d36b883b
and e5d20bbd.

Cat version bump due to new node type and modifying WindowFunc struct.

Bug: #18305
Reported-by: Zuming Jiang
Discussion: https://postgr.es/m/18305-33c49b4c830b37b3%40postgresql.org
2024-05-05 12:54:46 +12:00
Dean Rasheed
2e068db56e Use macro NUM_MERGE_MATCH_KINDS instead of '3' in MERGE code.
Code quality improvement for 0294df2f1f.

Aleksander Alekseev, reviewed by Richard Guo.

Discussion: https://postgr.es/m/CAJ7c6TMsiaV5urU_Pq6zJ2tXPDwk69-NKVh4AMN5XrRiM7N%2BGA%40mail.gmail.com
2024-04-19 09:40:20 +01:00
Daniel Gustafsson
950d4a2cb1 Fix typos and duplicate words
This fixes various typos, duplicated words, and tiny bits of whitespace
mainly in code comments but also in docs.

Author: Daniel Gustafsson <daniel@yesql.se>
Author: Heikki Linnakangas <hlinnaka@iki.fi>
Author: Alexander Lakhin <exclusion@gmail.com>
Author: David Rowley <dgrowleyml@gmail.com>
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/3F577953-A29E-4722-98AD-2DA9EFF2CBB8@yesql.se
2024-04-18 21:28:07 +02:00
Tom Lane
03107b4eda Ensure generated join clauses for child rels have correct relids.
When building a join clause derived from an EquivalenceClass, if the
clause is to be used with an appendrel child relation then make sure
its clause_relids include the relids of that child relation.
Normally this would be true already because the EquivalenceMember
would be a Var of that relation.  However, if the appendrel represents
a flattened UNION ALL construct then some child EquivalenceMembers
could be constants with no relids.  The resulting under-marked clause
is problematic because it could mislead join_clause_is_movable_into
about where the clause should be evaluated.  We do not have an example
showing incorrect plan generation, but there are existing cases in
the regression tests that will fail the Asserts this patch adds to
get_baserel_parampathinfo.  A similarly wrong conclusion about a
clause being considered by get_joinrel_parampathinfo would lead to
wrong placement of the clause.  (This also squares with the way
that clause_relids is calculated for non-equijoin clauses in
adjust_appendrel_attrs.)

The other reason for wanting these new Asserts is that the previous
blithe assumption that the results of generate_join_implied_equalities
"necessarily satisfy join_clause_is_movable_into" turns out to be
wrong pre-v16.  If it's still wrong it'd be good to find out.

Per bug #18429 from Benoît Ryder.  The bug as filed was fixed by
commit 2489d76c4, but these changes correlate with the fix we
will need to apply in pre-v16 branches.

Discussion: https://postgr.es/m/18429-8982d4a348cc86c6@postgresql.org
2024-04-16 11:22:51 -04:00
Tom Lane
e0df80828a Fix type-checking of RECORD-returning functions in FROM, redux.
Commit 2ed8f9a01 intended to institute a policy that if a
RangeTblFunction has a coldeflist, then the function return type is
certainly RECORD, and we should use the coldeflist as the source of
truth about what the columns of the record type are.  When the
original function has been folded to a constant, inspection of the
constant might give a different answer.  This situation will lead to
a tuple-type-mismatch error at execution, but up until that point we
need to consistently believe the coldeflist, or we'll have problems
from different bits of code reaching different conclusions.

expandRTE didn't get that memo though, and would try to produce a
tupdesc based on the constant in this situation, leading to an
assertion failure.  (Desultory testing suggests that non-assert
builds often manage to give the expected error, although I also
saw a "cache lookup failed for type 0" error, and it seems at
least possible that a crash could happen.)

Some other callers of get_expr_result_type and get_expr_result_tupdesc
were also being incautious about this.  While none of them seem to
have actual bugs, they're working harder than necessary in this case,
besides which it seems safest to have an explicit policy of not using
those functions on an RTE with a coldeflist.  Adjust the code
accordingly, and add commentary to funcapi.c about this policy.

Also fix an obsolete comment that claimed "get_expr_result_type()
doesn't know how to extract type info from a RECORD constant".
That hasn't been true since commit d57534740.

Per bug #18422 from Alexander Lakhin.
As with the previous commit, back-patch to all supported branches.

Discussion: https://postgr.es/m/18422-89ca86c8eac5246d@postgresql.org
2024-04-15 12:56:56 -04:00
David Rowley
b9ecefecc7 Fix recently introduced typo in code comment
Reported-by: Richard Guo
Discussion: https://postgr.es/m/CAMbWs49kAsZUsj7-0SBLvE9+uKz0RCqMEmM3NVytc1YvS8sTrQ@mail.gmail.com
2024-04-12 23:15:52 +12:00
David Rowley
3af7040985 Fix IS [NOT] NULL qual optimization for inheritance tables
b262ad440 added code to have the planner remove redundant IS NOT NULL
quals and eliminate needless scans for IS NULL quals on tables where the
qual's column has a NOT NULL constraint.

That commit failed to consider that an inheritance parent table could
have differing NOT NULL constraints between the parent and the child.
This caused issues as if we eliminated a qual on the parent, when
applying the quals to child tables in apply_child_basequals(), the qual
might not have been added to the parent's baserestrictinfo.

Here we fix this by not applying the optimization to remove redundant
quals to RelOptInfos belonging to inheritance parents and applying the
optimization again in apply_child_basequals().  Effectively, this means
that the parent and child are considered independently as the parent has
both an inh=true and inh=false RTE and we still apply the optimization
to the RelOptInfo corresponding to the inh=false RTE.

We're able to still apply the optimization in add_base_clause_to_rel()
for partitioned tables as the NULLability of partitions must match that
of their parent.  And, if we ever expand restriction_is_always_false()
and restriction_is_always_true() to handle partition constraints then we
can apply the same logic as, even in multi-level partitioned tables,
there's no way to route values to a partition when the qual does not
match the partition qual of the partitioned table's parent partition.
The same is true for CHECK constraints as those must also match between
arent partitioned tables and their partitions.

Author: Richard Guo, David Rowley
Discussion: https://postgr.es/m/CAMbWs4930gQSZmjR7aANzEapdy61gCg6z8dT-kAEYD0sYWKPdQ@mail.gmail.com
2024-04-12 20:07:53 +12:00
Alexander Korotkov
ff9f72c68f revert: Transform OR clauses to ANY expression
This commit reverts 72bd38cc99 due to implementation and design issues.

Reported-by: Tom Lane
Discussion: https://postgr.es/m/3604469.1712628736%40sss.pgh.pa.us
2024-04-10 02:28:09 +03:00
Alexander Korotkov
beabea6c20 Fix usage of same ListCell transform_or_to_any()'s in nested loops
Discussion: https://postgr.es/m/CAAKRu_b4SXNW4GAM0bv3e6wcL5ODSXg1ZdRCn6uyLLjSPbveBg%40mail.gmail.com
Author: Melanie Plageman
2024-04-08 01:38:37 +03:00
Alexander Korotkov
72bd38cc99 Transform OR clauses to ANY expression
Replace (expr op C1) OR (expr op C2) ... with expr op ANY(ARRAY[C1, C2, ...])
on the preliminary stage of optimization when we are still working with the
expression tree.

Here Cn is a n-th constant expression, 'expr' is non-constant expression, 'op'
is an operator which returns boolean result and has a commuter (for the case
of reverse order of constant and non-constant parts of the expression,
like 'Cn op expr').

Sometimes it can lead to not optimal plan.  This is why there is a
or_to_any_transform_limit GUC.  It specifies a threshold value of length of
arguments in an OR expression that triggers the OR-to-ANY transformation.
Generally, more groupable OR arguments mean that transformation will be more
likely to win than to lose.

Discussion: https://postgr.es/m/567ED6CA.2040504%40sigaev.ru
Author: Alena Rybakina <lena.ribackina@yandex.ru>
Author: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Reviewed-by: Ranier Vilela <ranier.vf@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
2024-04-08 01:27:52 +03:00
Peter Geoghegan
5bf748b86b Enhance nbtree ScalarArrayOp execution.
Commit 9e8da0f7 taught nbtree to handle ScalarArrayOpExpr quals
natively.  This works by pushing down the full context (the array keys)
to the nbtree index AM, enabling it to execute multiple primitive index
scans that the planner treats as one continuous index scan/index path.
This earlier enhancement enabled nbtree ScalarArrayOp index-only scans.
It also allowed scans with ScalarArrayOp quals to return ordered results
(with some notable restrictions, described further down).

Take this general approach a lot further: teach nbtree SAOP index scans
to decide how to execute ScalarArrayOp scans (when and where to start
the next primitive index scan) based on physical index characteristics.
This can be far more efficient.  All SAOP scans will now reliably avoid
duplicative leaf page accesses (just like any other nbtree index scan).
SAOP scans whose array keys are naturally clustered together now require
far fewer index descents, since we'll reliably avoid starting a new
primitive scan just to get to a later offset from the same leaf page.

The scan's arrays now advance using binary searches for the array
element that best matches the next tuple's attribute value.  Required
scan key arrays (i.e. arrays from scan keys that can terminate the scan)
ratchet forward in lockstep with the index scan.  Non-required arrays
(i.e. arrays from scan keys that can only exclude non-matching tuples)
"advance" without the process ever rolling over to a higher-order array.

Naturally, only required SAOP scan keys trigger skipping over leaf pages
(non-required arrays cannot safely end or start primitive index scans).
Consequently, even index scans of a composite index with a high-order
inequality scan key (which we'll mark required) and a low-order SAOP
scan key (which we won't mark required) now avoid repeating leaf page
accesses -- that benefit isn't limited to simpler equality-only cases.
In general, all nbtree index scans now output tuples as if they were one
continuous index scan -- even scans that mix a high-order inequality
with lower-order SAOP equalities reliably output tuples in index order.
This allows us to remove a couple of special cases that were applied
when building index paths with SAOP clauses during planning.

Bugfix commit 807a40c5 taught the planner to avoid generating unsafe
path keys: path keys on a multicolumn index path, with a SAOP clause on
any attribute beyond the first/most significant attribute.  These cases
are now all safe, so we go back to generating path keys without regard
for the presence of SAOP clauses (just like with any other clause type).
Affected queries can now exploit scan output order in all the usual ways
(e.g., certain "ORDER BY ... LIMIT n" queries can now terminate early).

Also undo changes from follow-up bugfix commit a4523c5a, which taught
the planner to produce alternative index paths, with path keys, but
without low-order SAOP index quals (filter quals were used instead).
We'll no longer generate these alternative paths, since they can no
longer offer any meaningful advantages over standard index qual paths.
Affected queries thereby avoid all of the disadvantages that come from
using filter quals within index scan nodes.  They can avoid extra heap
page accesses from using filter quals to exclude non-matching tuples
(index quals will never have that problem).  They can also skip over
irrelevant sections of the index in more cases (though only when nbtree
determines that starting another primitive scan actually makes sense).

There is a theoretical risk that removing restrictions on SAOP index
paths from the planner will break compatibility with amcanorder-based
index AMs maintained as extensions.  Such an index AM could have the
same limitations around ordered SAOP scans as nbtree had up until now.
Adding a pro forma incompatibility item about the issue to the Postgres
17 release notes seems like a good idea.

Author: Peter Geoghegan <pg@bowt.ie>
Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-By: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-By: Tomas Vondra <tomas.vondra@enterprisedb.com>
Discussion: https://postgr.es/m/CAH2-Wz=ksvN_sjcnD1+Bt-WtifRA5ok48aDYnq3pkKhxgMQpcw@mail.gmail.com
2024-04-06 11:47:10 -04:00
David Rowley
7487044d6c Don't adjust ressortgroupref in generate_setop_child_grouplist()
This is already done inside assignSortGroupRef(), therefore is
redundant.

Oversight from 66c0185a3.

Reported-by: Tom Lane
Discussion: https://postgr.es/m/3703023.1711654574@sss.pgh.pa.us
2024-04-03 15:39:29 +13:00
David Rowley
3b1a7eb289 Don't zero tuple_fraction when planning UNIONs with ORDER BYs
Since 66c0185a3, the planner is able to use Merge Append -> Unique to
implement UNION queries and each subquery is prompted to produce Paths
correctly sorted by the UNION's targetlist.

Here we remove some now redundant code which was zeroing the
tuple_fraction at the parent level.  This will allow the planner to
consider cheap startup paths when planning the UNION's subqueries.

EXCEPT and INTERSECT set operations still have the tuple_fraction zeroed
in generate_nonunion_paths().  These operations currently always read
all of their subqueries' tuples.

Reported-by: Tom Lane
Discussion: https://postgr.es/m/3703023.1711654574@sss.pgh.pa.us
2024-04-03 11:40:33 +13:00
David Rowley
d5d2205c8d Fix assert failure when planning setop subqueries with CTEs
66c0185a3 adjusted the UNION planner to request that union child queries
produce Paths correctly ordered to implement the UNION by way of
MergeAppend followed by Unique.  The code there made a bad assumption
that if the root->parent_root->parse had setOperations set that the
query must be the child subquery of a set operation.  That's not true
when it comes to planning a non-inlined CTE which is parented by a set
operation.  This causes issues as the CTE's targetlist has no
requirement to match up to the SetOperationStmt's groupClauses

Fix this by adding a new parameter to both subquery_planner() and
grouping_planner() to explicitly pass the SetOperationStmt only when
planning set operation child subqueries.

Thank you to Tom Lane for helping to rationalize the decision on the
best function signature for subquery_planner().

Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/242fc7c6-a8aa-2daf-ac4c-0a231e2619c1@gmail.com
2024-04-02 12:15:45 +13:00
Dean Rasheed
0294df2f1f Add support for MERGE ... WHEN NOT MATCHED BY SOURCE.
This allows MERGE commands to include WHEN NOT MATCHED BY SOURCE
actions, which operate on rows that exist in the target relation, but
not in the data source. These actions can execute UPDATE, DELETE, or
DO NOTHING sub-commands.

This is in contrast to already-supported WHEN NOT MATCHED actions,
which operate on rows that exist in the data source, but not in the
target relation. To make this distinction clearer, such actions may
now be written as WHEN NOT MATCHED BY TARGET.

Writing WHEN NOT MATCHED without specifying BY SOURCE or BY TARGET is
equivalent to writing WHEN NOT MATCHED BY TARGET.

Dean Rasheed, reviewed by Alvaro Herrera, Ted Yu and Vik Fearing.

Discussion: https://postgr.es/m/CAEZATCWqnKGc57Y_JanUBHQXNKcXd7r=0R4NEZUVwP+syRkWbA@mail.gmail.com
2024-03-30 10:00:26 +00:00
Tom Lane
be98a550cc Update comment in set_dummy_rel_pathlist().
This comment claimed that set_dummy_rel_pathlist() has callers
other than (possibly indirectly) set_rel_size().  It doesn't,
so revise the argument to not rely on that.

Noted by Richard Guo.

Discussion: https://postgr.es/m/CAMbWs4-KFEU_fDuJPNCOkUu3rwvZvKBEytkd9VrM4kH4-2h1CQ@mail.gmail.com
2024-03-28 11:44:00 -04:00
Tom Lane
9d00cf4772 Remove some redundant set_cheapest() calls.
Commit e2fa76d80 centralized the responsibility for doing
set_cheapest() for a baserel, but these functions added later
seemingly didn't get the memo.  There's no apparent reason why
we need the cheapest path for these relation types to be available
any sooner than it is for other base relation types, so delete the
duplicate calls.  Doesn't save much since there's only one path
in these cases, but it might improve clarity.

Richard Guo

Discussion: https://postgr.es/m/CAMbWs4-KFEU_fDuJPNCOkUu3rwvZvKBEytkd9VrM4kH4-2h1CQ@mail.gmail.com
2024-03-26 16:02:44 -04:00
Tom Lane
a65724dfa7 Propagate pathkeys from CTEs up to the outer query.
If we know the sort order of a CTE's output, and it is relevant
to the outer query, label the CTE's outer-query access path using
those pathkeys.  This may enable optimizations such as avoiding
a sort in the outer query.

The code for hoisting pathkeys into the outer query already exists
for regular RTE_SUBQUERY subqueries, but it wasn't getting used for
CTEs, possibly out of concern for maintaining an optimization fence
between the CTE and the outer query.  However, on the same arguments
used for commit f7816aec2, there seems no harm in letting the outer
query know what the inner query decided to do.

In support of this, we now remember the best Path as well as Plan
for each subquery for the rest of the planner run.  There may be
future applications for having that at hand, and it surely costs
little to build one more List.

Richard Guo (minor mods by me)

Discussion: https://postgr.es/m/CAMbWs49xYd3f8CrE8-WW3--dV1zH_sDSDn-vs2DzHj81Wcnsew@mail.gmail.com
2024-03-26 13:05:51 -04:00
Tom Lane
c7076ba6ad Refactor predicate_{implied,refuted}_by_simple_clause.
Put the node-type-dependent operations into switches on nodeTag.
This should ease addition of new proof rules for other expression
node types.  There is no functional change, although some tests
are made in a different order than before.

Also, add a couple of new cross-checks in test_predtest.c.

James Coleman (part of a larger patch series)

Discussion: https://postgr.es/m/CAAaqYe8Bo4bf_i6qKj8KBsmHMYXhe3Xt6vOe3OBQnOaf3_XBWg@mail.gmail.com
2024-03-25 17:45:15 -04:00
Amit Langote
0f7863afef Code review for 6190d828cd
* Fix the comment of init_dummy_sjinfo() to remove references to
  non-existing parameters 'rel1' and 'rel2'.

* Adjust consider_new_or_clause() to call init_dummy_sjinfo() to make
  up a SpecialJoinInfo for inner joins like other code sites that
  were adjusted in 6190d828cd to do so.

Author: Richard Guo <guofenglinux@gmail.com>
Reported-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAExHW5tHqEf3ASVqvFFcghYGPfpy7o3xnvhHwBGbJFMRH8KjNw@mail.gmail.com
2024-03-25 19:45:27 +09:00
Amit Langote
6190d828cd Do not translate dummy SpecialJoinInfos for child joins
This teaches build_child_join_sjinfo() to create the dummy
SpecialJoinInfos (those created for inner joins) directly for a given
child join, skipping the unnecessary overhead of translating the
parent joinrel's SpecialJoinInfo.

To that end, this commit moves the code to initialize the dummy
SpecialJoinInfos to a new function named init_dummy_sjinfo() and
changes the few existing sites that have this code and
build_child_join_sjinfo() to call this new function.

Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Discussion: https://postgr.es/m/CAExHW5tHqEf3ASVqvFFcghYGPfpy7o3xnvhHwBGbJFMRH8KjNw@mail.gmail.com
2024-03-25 18:06:47 +09:00
Amit Langote
5278d0a2e8 Reduce memory used by partitionwise joins
Specifically, this commit reduces the memory consumed by the
SpecialJoinInfos that are allocated for child joins in
try_partitionwise_join() by freeing them at the end of creating paths
for each child join.

A SpecialJoinInfo allocated for a given child join is a copy of the
parent join's SpecialJoinInfo, which contains the translated copies
of the various Relids bitmapsets and semi_rhs_exprs, which is a List
of Nodes.  The newly added freeing step frees the struct itself and
the various bitmapsets, but not semi_rhs_exprs, because there's no
handy function to free the memory of Node trees.

Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Discussion: https://postgr.es/m/CAExHW5tHqEf3ASVqvFFcghYGPfpy7o3xnvhHwBGbJFMRH8KjNw@mail.gmail.com
2024-03-25 18:06:46 +09:00
David Rowley
66c0185a3d Allow planner to use Merge Append to efficiently implement UNION
Until now, UNION queries have often been suboptimal as the planner has
only ever considered using an Append node and making the results unique
by either using a Hash Aggregate, or by Sorting the entire Append result
and running it through the Unique operator.  Both of these methods
always require reading all rows from the union subqueries.

Here we adjust the union planner so that it can request that each subquery
produce results in target list order so that these can be Merge Appended
together and made unique with a Unique node.  This can improve performance
significantly as the union child can make use of the likes of btree
indexes and/or Merge Joins to provide the top-level UNION with presorted
input.  This is especially good if the top-level UNION contains a LIMIT
node that limits the output rows to a small subset of the unioned rows as
cheap startup plans can be used.

Author: David Rowley
Reviewed-by: Richard Guo, Andy Fan
Discussion: https://postgr.es/m/CAApHDvpb_63XQodmxKUF8vb9M7CxyUyT4sWvEgqeQU-GB7QFoQ@mail.gmail.com
2024-03-25 14:31:14 +13:00