As I'd feared, commit 5c9d8636d was still a few bricks shy of a load.
We can't just leave pulled-up lateral-reference Vars with no new
nullingrels: we have to carefully compute what subset of the
to-be-replaced Var's nullingrels apply to them, else we still get
"wrong varnullingrels" errors. This is a bit tedious, but it looks
like we can use the nullingrel data this patch computes for other
purposes, enabling better optimization. We don't want to inject
unnecessary plan changes into stable branches though, so leave that
idea for a later HEAD-only patch.
Patch by me, but thanks to Richard Guo for devising a test case that
broke 5c9d8636d, and for preliminary investigation about how to fix
it. As before, back-patch to v16.
Discussion: https://postgr.es/m/E1tGn4j-0003zi-MP@gemulon.postgresql.org
There is no point in transforming OR-clauses into SAOP's if the target index
doesn't support SAOP scans anyway. This commit adds corresponding checks
to match_orclause_to_indexcol() and group_similar_or_args(). The first check
fixes the actual bug, while the second just saves some cycles.
Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/8174de69-9e1a-0827-0e81-ef97f56a5939%40gmail.com
Author: Alena Rybakina
Reviewed-by: Ranier Vilela, Alexander Korotkov, Andrei Lepikhov
If we are pulling up a subquery that's under an outer join, and
the subquery's target list contains a strict expression that uses
both a subquery variable and a lateral-reference variable, it's okay
to pull up the expression without wrapping it in a PlaceHolderVar.
That's safe because if the subquery variable is forced to NULL
by the outer join, the expression result will come out as NULL too,
so we don't have to force that outcome by evaluating the expression
below the outer join. It'd be correct to wrap in a PHV, but that can
lead to very significantly worse plans, since we'd then have to use
a nestloop plan to pass down the lateral reference to where the
expression will be evaluated.
However, when we do that, we should not mark the lateral reference
variable as being nulled by the outer join, because it isn't after
we pull up the expression in this way. So the marking logic added
by cb8e50a4a was incorrect in this detail, leading to "wrong
varnullingrels" errors from the consistency-checking logic in
setrefs.c. It seems to be sufficient to just not mark lateral
references at all in this case. (I have a nagging feeling that more
complexity may be needed in cases where there are several levels of
outer join, but some attempts to break it with that didn't succeed.)
Per report from Bertrand Mamasam. Back-patch to v16, as the previous
patch was.
Discussion: https://postgr.es/m/CACZ67_UA_EVrqiFXJu9XK50baEpH=ofEPJswa2kFxg6xuSw-ww@mail.gmail.com
The function get_param_path_clause_serials() is used to get the set of
pushed-down clauses enforced within a parameterized Path. Since we
don't currently support parameterized MergeAppend paths, and it
doesn't look like that is going to change anytime soon (as explained
in the comments for generate_orderedappend_paths), we don't need to
consider MergeAppendPath in this function.
This change won't make any measurable difference in performance; it's
just for clarity's sake.
Author: Richard Guo
Reviewed-by: Andrei Lepikhov
Discussion: https://postgr.es/m/CAMbWs4_Puie4DQ2ODvjQB_3CxYkUODnrJm8jn_ObMAcrjYNW7Q@mail.gmail.com
The ordering of DISTINCT items is semantically insignificant, so we
can reorder them as needed. In fact, in the parser, we absorb the
sorting semantics of the sortClause as much as possible into the
distinctClause, ensuring that one clause is a prefix of the other.
This can help avoid a possible need to re-sort.
In this commit, we attempt to adjust the DISTINCT keys to match the
input path's pathkeys. This can likewise help avoid re-sorting, or
allow us to use incremental-sort to save efforts.
For DISTINCT ON expressions, the parser already ensures that they
match the initial ORDER BY expressions. When reordering the DISTINCT
keys, we must ensure that the resulting pathkey list matches the
initial distinctClause pathkeys.
This introduces a new GUC, enable_distinct_reordering, which allows
the optimization to be disabled if needed.
Author: Richard Guo
Reviewed-by: Andrei Lepikhov
Discussion: https://postgr.es/m/CAMbWs48dR26cCcX0f=8bja2JKQPcU64136kHk=xekHT9xschiQ@mail.gmail.com
When optimizer generates bitmap paths, it considers breaking OR-clause
arguments one-by-one. But now, a group of similar OR-clauses can be
transformed into SAOP during index matching. So, bitmap paths should
keep up.
This commit teaches bitmap paths generation machinery to group similar
OR-clauses into dedicated RestrictInfos. Those RestrictInfos are considered
both to match index as a whole (as SAOP), or to match as a set of individual
OR-clause argument one-by-one (the old way).
Therefore, bitmap path generation will takes advantage of OR-clauses to SAOP's
transformation. The old way of handling them is also considered. So, there
shouldn't be planning regression.
Discussion: https://postgr.es/m/CAPpHfdu5iQOjF93vGbjidsQkhHvY2NSm29duENYH_cbhC6x%2BMg%40mail.gmail.com
Author: Alexander Korotkov, Andrey Lepikhov
Reviewed-by: Alena Rybakina, Andrei Lepikhov, Jian he, Robert Haas
Reviewed-by: Peter Geoghegan
This commit makes match_clause_to_indexcol() match
"(indexkey op C1) OR (indexkey op C2) ... (indexkey op CN)" expression
to the index while transforming it into "indexkey op ANY(ARRAY[C1, C2, ...])"
(ScalarArrayOpExpr node).
This transformation allows handling long OR-clauses with single IndexScan
avoiding diving them into a slower BitmapOr.
We currently restrict Ci to be either Const or Param to apply this
transformation only when it's clearly beneficial. However, in the future,
we might switch to a liberal understanding of constants, as it is in other
cases.
Discussion: https://postgr.es/m/567ED6CA.2040504%40sigaev.ru
Author: Alena Rybakina, Andrey Lepikhov, Alexander Korotkov
Reviewed-by: Peter Geoghegan, Ranier Vilela, Alexander Korotkov, Robert Haas
Reviewed-by: Jian He, Tom Lane, Nikolay Shaplov
In the dim past we figured it was okay to ignore collations
when combining UNION set-operation nodes into a single N-way
UNION operation. I believe that was fine at the time, but
it stopped being fine when we added nondeterministic collations:
the semantics of distinct-ness are affected by those. v17 made
it even less fine by allowing per-child sorting operations to
be merged via MergeAppend, although I think we accidentally
avoided any live bug from that.
Add a check that collations match before deciding that two
UNION nodes are equivalent. I also failed to resist the
temptation to comment plan_union_children() a little better.
Back-patch to all supported branches (v13 now), since they
all have nondeterministic collations.
Discussion: https://postgr.es/m/3605568.1731970579@sss.pgh.pa.us
Commit ac04aa84a put the shutoff for this into the planner, which is
not ideal because it doesn't prevent us from re-using a previously
made parallel plan. Revert the planner change and instead put the
shutoff into InitializeParallelDSM, modeling it on the existing code
there for recovering from failure to allocate a DSM segment.
However, that code path is mostly untested, and testing a bit harder
showed there's at least one bug: ExecHashJoinReInitializeDSM is not
prepared for us to have skipped doing parallel DSM setup. I also
thought the Assert in ReinitializeParallelWorkers is pretty
ill-advised, and replaced it with a silent Min() operation.
The existing test case added by ac04aa84a serves fine to test this
version of the fix, so no change needed there.
Patch by me, but thanks to Noah Misch for the core idea that we
could shut off worker creation when !INTERRUPTS_CAN_BE_PROCESSED.
Back-patch to v12, as ac04aa84a was.
Discussion: https://postgr.es/m/CAC-SaSzHUKT=vZJ8MPxYdC_URPfax+yoA1hKTcF4ROz_Q6z0_Q@mail.gmail.com
We now create contype='n' pg_constraint rows for not-null constraints on
user tables. Only one such constraint is allowed for a column.
We propagate these constraints to other tables during operations such as
adding inheritance relationships, creating and attaching partitions and
creating tables LIKE other tables. These related constraints mostly
follow the well-known rules of conislocal and coninhcount that we have
for CHECK constraints, with some adaptations: for example, as opposed to
CHECK constraints, we don't match not-null ones by name when descending
a hierarchy to alter or remove it, instead matching by the name of the
column that they apply to. This means we don't require the constraint
names to be identical across a hierarchy.
The inheritance status of these constraints can be controlled: now we
can be sure that if a parent table has one, then all children will have
it as well. They can optionally be marked NO INHERIT, and then children
are free not to have one. (There's currently no support for altering a
NO INHERIT constraint into inheriting down the hierarchy, but that's a
desirable future feature.)
This also opens the door for having these constraints be marked NOT
VALID, as well as allowing UNIQUE+NOT NULL to be used for functional
dependency determination, as envisioned by commit e49ae8d3bc. It's
likely possible to allow DEFERRABLE constraints as followup work, as
well.
psql shows these constraints in \d+, though we may want to reconsider if
this turns out to be too noisy. Earlier versions of this patch hid
constraints that were on the same columns of the primary key, but I'm
not sure that that's very useful. If clutter is a problem, we might be
better off inventing a new \d++ command and not showing the constraints
in \d+.
For now, we omit these constraints on system catalog columns, because
they're unlikely to achieve anything.
The main difference to the previous attempt at this (b0e96f3119) is
that we now require that such a constraint always exists when a primary
key is in the column; we didn't require this previously which had a
number of unpalatable consequences. With this requirement, the code is
easier to reason about. For example:
- We no longer have "throwaway constraints" during pg_dump. We needed
those for the case where a table had a PK without a not-null
underneath, to prevent a slow scan of the data during restore of the
PK creation, which was particularly problematic for pg_upgrade.
- We no longer have to cope with attnotnull being set spuriously in
case a primary key is dropped indirectly (e.g., via DROP COLUMN).
Some bits of code in this patch were authored by Jian He.
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Author: Bernd Helmle <mailings@oopsware.de>
Reviewed-by: 何建 (jian he) <jian.universality@gmail.com>
Reviewed-by: 王刚 (Tender Wang) <tndrwang@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/202408310358.sdhumtyuy2ht@alvherre.pgsql
If the collation of any join key column doesn’t match the collation of
the corresponding partition key, partitionwise joins can yield incorrect
results. For example, rows that would match under the join key collation
might be located in different partitions due to the partitioning
collation. In such cases, a partitionwise join would yield different
results from a non-partitionwise join, so disallow it in such cases.
Reported-by: Tender Wang <tndrwang@gmail.com>
Author: Jian He <jian.universality@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/CAHewXNno_HKiQ6PqyLYfuqDtwp7KKHZiH1J7Pqyz0nr+PS2Dwg@mail.gmail.com
Backpatch-through: 12
If the collation of any grouping column doesn’t match the collation of
the corresponding partition key, partitionwise grouping can yield
incorrect results. For example, rows that would be grouped under the
grouping collation may end up in different partitions under the
partitioning collation. In such cases, full partitionwise grouping
would produce results that differ from those without partitionwise
grouping, so disallowed that.
Partial partitionwise aggregation is still allowed, as the Finalize
step reconciles partition-level aggregates with grouping requirements
across all partitions, ensuring that the final output remains
consistent.
This commit also fixes group_by_has_partkey() by ensuring the
RelabelType node is stripped from grouping expressions when matching
them to partition key expressions to avoid false mismatches.
Bug: #18568
Reported-by: Webbo Han <1105066510@qq.com>
Author: Webbo Han <1105066510@qq.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/18568-2a9afb6b9f7e6ed3@postgresql.org
Discussion: https://postgr.es/m/tencent_9D9103CDA420C07768349CC1DFF88465F90A@qq.com
Discussion: https://postgr.es/m/CAHewXNno_HKiQ6PqyLYfuqDtwp7KKHZiH1J7Pqyz0nr+PS2Dwg@mail.gmail.com
Backpatch-through: 12
When we generate multiple clones of the same qual condition to cope
with outer join identity 3, we need to ensure that all the clones get
the same serial number. To achieve this, we reset the
root->last_rinfo_serial counter each time we produce RestrictInfo(s)
from the qual list (see deconstruct_distribute_oj_quals). This
approach works only if we ensure that we are not changing the qual
list in any way that'd affect the number of RestrictInfos built from
it.
However, with b262ad440, an IS NULL qual on a NOT NULL column might
result in an additional constant-FALSE RestrictInfo. And different
versions of the same qual clause can lead to different conclusions
about whether it can be reduced to constant-FALSE. This would affect
the number of RestrictInfos built from the qual list for different
versions, causing inconsistent RestrictInfo serial numbers across
multiple clones of the same qual. This inconsistency can confuse
users of these serial numbers, such as rebuild_joinclause_attr_needed,
and lead to planner errors such as "ERROR: variable not found in
subplan target lists".
To fix, reset the root->last_rinfo_serial counter after generating the
additional constant-FALSE RestrictInfo.
Back-patch to v17 where the issue crept in. In v17, I failed to make
a test case that would expose this bug, so no test case for v17.
Author: Richard Guo
Discussion: https://postgr.es/m/CAMbWs4-B6kafn+LmPuh-TYFwFyEm-vVj3Qqv7Yo-69CEv14rRg@mail.gmail.com
For an EXISTS subquery, the only thing that matters is whether it
returns zero or more than zero rows. Therefore, we remove certain SQL
features that won't affect that, among them the GROUP BY clauses.
After we drop the groupClause, we'd better remove the RTE_GROUP RTE
and clear the hasGroupRTE flag, as they depend on the groupClause.
Failing to do so could result in a bogus RTE_GROUP entry in the parent
query, leading to an assertion failure on the hasGroupRTE flag.
Reported-by: David Rowley
Author: Richard Guo
Discussion: https://postgr.es/m/CAApHDvp2_yht8uPLyWO-kVGWZhYvx5zjGfSrg4fBQ9fsC13V0g@mail.gmail.com
The decision in b6e1157e7 to ignore raw_expr when evaluating a
JsonValueExpr was incorrect. While its value is not ultimately
used (since formatted_expr's value is), failing to initialize it
can lead to problems, for instance, when the expression tree in
raw_expr contains Aggref nodes, which must be initialized to
ensure the parent Agg node works correctly.
Also, optimize eval_const_expressions_mutator()'s handling of
JsonValueExpr a bit. Currently, when formatted_expr cannot be folded
into a constant, we end up processing it twice -- once directly in
eval_const_expressions_mutator() and again recursively via
ece_generic_processing(). This recursive processing is required to
handle raw_expr. To avoid the redundant processing of formatted_expr,
we now process raw_expr directly in eval_const_expressions_mutator().
Finally, update the comment of JsonValueExpr to describe the roles of
raw_expr and formatted_expr more clearly.
Bug: #18657
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Diagnosed-by: Fabio R. Sluzala <fabio3rs@gmail.com>
Diagnosed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/18657-1b90ccce2b16bdb8@postgresql.org
Backpatch-through: 16
Two near-identical copies of clause_sides_match_join() existed in
joinpath.c and analyzejoins.c. Deduplicate this by moving the function
into restrictinfo.h.
It isn't quite clear that keeping the inline property of this function
is worthwhile, but this commit is just an exercise in code
deduplication. More effort would be required to determine if the inline
property is worth keeping.
Author: James Hunter <james.hunter.pg@gmail.com>
Discussion: https://postgr.es/m/CAJVSvF7Nm_9kgMLOch4c-5fbh3MYg%3D9BdnDx3Dv7Fcb64zr64Q%40mail.gmail.com
The MergeJoin struct was tracking "mergeStrategies", which were an
array of btree strategy numbers, purely for the purpose of comparing
it later against btree strategies to determine if the scan direction
was forward or reverse. Change that. Instead, track
"mergeReversals", an array of bool, to indicate the same without an
unfortunate assumption that a strategy number refers specifically to a
btree strategy.
Author: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
Functions make_pathkey_from_sortop() and transformWindowDefinitions(),
which receive a SortGroupClause, were determining the sort order
(ascending vs. descending) by comparing that structure's operator
strategy to BTLessStrategyNumber, but could just as easily have gotten
it from the SortGroupClause object, if it had such a field, so add
one. This reduces the number of places that hardcode the assumption
that the strategy refers specifically to a btree strategy, rather than
some other index AM's operators.
Author: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
find_computable_ec_member() had the wrong mental model of what
its primary caller prepare_sort_from_pathkeys() would do with
the selected EquivalenceClass member expression. We will not
compute the EC expression in a plan node atop the one returning
the passed-in targetlist; rather, the EC expression will be
computed as an additional column of that targetlist. So any
Var or quasi-Var used in the given tlist is also available to the
EC expression. In simple cases this makes no difference because
the given tlist is just a list of Vars or quasi-Vars --- but if
we are considering an appendrel member produced by flattening
a UNION ALL, the tlist may contain expressions, resulting in
failure to match and a "could not find pathkey item to sort"
error.
To fix, we can flatten both the tlist and the EC members with
pull_var_clause(), and then just check for subset-ness, so
that the code is actually shorter than before.
While this bug is quite old, the present patch only works back to
v13. We could possibly make it work in v12 by back-patching parts
of 375398244. On the whole though I don't like the risk/reward
ratio of that idea. v12's final release is next month, meaning
there would be no chance to correct matters if the patch causes a
regression. Since this failure has escaped notice for 14 years,
it's likely nobody will hit it in the field with v12.
Per bug #18652 from Alexander Lakhin.
Andrei Lepikhov and Tom Lane
Discussion: https://postgr.es/m/18652-deaa782ebcca85d1@postgresql.org
c01743aa4 added EXPLAIN output to display the plan node's disabled_node
count whenever that count is above 0. Seemingly, there weren't many
people who liked that output as each parent of a disabled node would
also have a "Disabled Nodes" output due to the way disabled_nodes is
accumulated towards the root plan node. It was often hard and sometimes
impossible to figure out which nodes were disabled from looking at
EXPLAIN. You might think it would be possible to manually add up the
numbers from the "Disabled Nodes" output of a given node's children to
figure out if that node has a higher disabled_nodes count than its
children, but that wouldn't have worked for Append and Merge Append nodes
if some disabled child nodes were run-time pruned during init plan. Those
children are not displayed in EXPLAIN.
Here we attempt to improve this output by only showing "Disabled: true"
against only the nodes which are explicitly disabled themselves. That
seems to be the output that's desired by the most people who voiced
their opinion. This is done by summing up the disabled_nodes of the
given node's children and checking if that number is less than the
disabled_nodes of the current node.
This commit also fixes a bug in make_sort() which was neglecting to set
the Sort's disabled_nodes field. This should have copied what was done
in cost_sort(), but it hadn't been updated. With the new output, the
choice to not maintain that field properly was clearly wrong as the
disabled-ness of the node was attributed to the Sort's parent instead.
Reviewed-by: Laurenz Albe, Alena Rybakina
Discussion: https://postgr.es/m/9e4ad616bebb103ec2084bf6f724cfc739e7fabb.camel@cybertec.at
Commit 9391f7152 added a "PlannerInfo *root" parameter to
estimate_array_length, but failed to consider the possibility that
NULL would be passed for that, leading to a null pointer dereference.
We could rectify the particular case shown in the bug report by fixing
simplify_function/inline_function to pass through the root pointer.
However, as long as eval_const_expressions is documented to accept
NULL for root, similar hazards would remain. For now, let's just do
the narrow fix of hardening estimate_array_length to not crash.
Its behavior with NULL root will be the same as it was before
9391f7152, so this is not too awful.
Per report from Fredrik Widlert (via Paul Ramsey). Back-patch to v17
where 9391f7152 came in.
Discussion: https://postgr.es/m/518339E7-173E-45EC-A0FF-9A4A62AA4F40@cleverelephant.ca
In some cases, we may want to transfer a HAVING clause into WHERE in
hopes of eliminating tuples before aggregation instead of after.
Previously, we couldn't do this if there were any nonempty grouping
sets, because we didn't have a way to tell if the HAVING clause
referenced any columns that were nullable by the grouping sets, and
moving such a clause into WHERE could potentially change the results.
Now, with expressions marked nullable by grouping sets with the RT
index of the RTE_GROUP RTE, it is much easier to identify those
clauses that reference any nullable-by-grouping-sets columns: we just
need to check if the RT index of the RTE_GROUP RTE is present in the
clause. For other HAVING clauses, they can be safely pushed down.
Author: Richard Guo
Discussion: https://postgr.es/m/CAMbWs4-NpzPgtKU=hgnvyn+J-GanxQCjrUi7piNzZ=upiCV=2Q@mail.gmail.com
For a mergejoin, if the given outer path or inner path is not already
well enough ordered, we need to do an explicit sort. Currently, we
only consider explicit full sort and do not account for incremental
sort.
In this patch, for the outer path of a mergejoin, we choose to use
explicit incremental sort if it is enabled and there are presorted
keys. For the inner path, though, we cannot use incremental sort
because it does not support mark/restore at present.
The rationale is based on the assumption that incremental sort is
always faster than full sort when there are presorted keys, a premise
that has been applied in various parts of the code. In addition, the
current cost model tends to favor incremental sort as being cheaper
than full sort in the presence of presorted keys, making it reasonable
not to consider full sort in such cases.
It could be argued that what if a mergejoin with an incremental sort
as the outer path is selected as the inner path of another mergejoin.
However, this should not be a problem, because mergejoin itself does
not support mark/restore either, and we will add a Material node on
top of it anyway in this case (see final_cost_mergejoin).
There is one ensuing plan change in the regression tests, and we have
to modify that test case to ensure that it continues to test what it
is intended to.
No backpatch as this could result in plan changes.
Author: Richard Guo
Reviewed-by: David Rowley, Tomas Vondra
Discussion: https://postgr.es/m/CAMbWs49x425QrX7h=Ux05WEnt8GS757H-jOP3_xsX5t1FoUsZw@mail.gmail.com
If a MERGE command contains WHEN NOT MATCHED BY SOURCE actions, the
source relation appears on the outer side of the join. Thus, any Vars
referring to the source in the merge join condition, actions, and
RETURNING list should be marked as nullable by the join, since they
are used in the ModifyTable node above the join. Note that this only
applies to the copy of join condition used in the executor to
distinguish MATCHED from NOT MATCHED BY SOURCE cases. Vars in the
original join condition, inside the join node itself, should not be
marked.
Failure to correctly mark these Vars led to a "wrong varnullingrels"
error in the final stage of query planning, in some circumstances. We
happened to get away without this in all previous tests, since they
all involved a ModifyTable node directly on top of the join node, so
that the top plan targetlist coincided with the output of the join,
and the varnullingrels check was more lax. However, if another plan
node, such as a one-time filter Result node, gets inserted between the
ModifyTable node and the join node, then a stricter check is applied,
which fails.
Per bug #18634 from Alexander Lakhin. Thanks to Tom Lane and Richard
Guo for review and analysis.
Back-patch to v17, where WHEN NOT MATCHED BY SOURCE support was added
to MERGE.
Discussion: https://postgr.es/m/18634-db5299c937877f2b%40postgresql.org
If a MERGE command contains WHEN NOT MATCHED BY SOURCE actions, the
merge join condition is used by the executor to distinguish MATCHED
from NOT MATCHED BY SOURCE cases. However, this qual is executed using
the output from the join subplan node, which nulls the output from the
source relation in the not matched case, and so the result may be
incorrect if the join condition is "non-strict" -- for example,
something like "src.col IS NOT DISTINCT FROM tgt.col".
Fix this by enhancing the join recheck condition with an additional
"src IS NOT NULL" check, so that it does the right thing when
evaluated using the output from the join subplan.
Noted by Tom Lane while investigating bug #18634 from Alexander
Lakhin.
Back-patch to v17, where WHEN NOT MATCHED BY SOURCE support was added
to MERGE.
Discussion: https://postgr.es/m/18634-db5299c937877f2b%40postgresql.org
Up to now, remove_rel_from_query() has done a pretty shoddy job
of updating our where-needed bitmaps (per-Var attr_needed and
per-PlaceHolderVar ph_needed relid sets). It removed direct mentions
of the to-be-removed baserel and outer join, which is the minimum
amount of effort needed to keep the data structures self-consistent.
But it didn't account for the fact that the removed join ON clause
probably mentioned Vars of other relations, and those Vars might now
not be needed as high up in the join tree as before. It's easy to
show cases where this results in failing to remove a lower outer join
that could also have been removed.
To fix, recalculate the where-needed bitmaps from scratch after
each successful join removal. This sounds expensive, but it seems
to add only negligible planner runtime. (We cheat a little bit
by preserving "relation 0" entries in the bitmaps, allowing us to
skip re-scanning the targetlist and HAVING qual.)
The submitted test case drew attention because we had successfully
optimized away the lower join prior to v16. I suspect that that's
somewhat accidental and there are related cases that were never
optimized before and now can be. I've not tried to come up with
one, though.
Perhaps we should back-patch this into v16 and v17 to repair the
performance regression. However, since it took a year for anyone
to notice the problem, it can't be affecting too many people. Let's
let the patch bake awhile in HEAD, and see if we get more complaints.
Per bug #18627 from Mikaël Gourlaouen. No back-patch for now.
Discussion: https://postgr.es/m/18627-44f950eb6a8416c2@postgresql.org
Add WITHOUT OVERLAPS clause to PRIMARY KEY and UNIQUE constraints.
These are backed by GiST indexes instead of B-tree indexes, since they
are essentially exclusion constraints with = for the scalar parts of
the key and && for the temporal part.
(previously committed as 46a0cd4cef, reverted by 46a0cd4cefb; the new
part is this:)
Because 'empty' && 'empty' is false, the temporal PK/UQ constraint
allowed duplicates, which is confusing to users and breaks internal
expectations. For instance, when GROUP BY checks functional
dependencies on the PK, it allows selecting other columns from the
table, but in the presence of duplicate keys you could get the value
from any of their rows. So we need to forbid empties.
This all means that at the moment we can only support ranges and
multiranges for temporal PK/UQs, unlike the original patch (above).
Documentation and tests for this are added. But this could
conceivably be extended by introducing some more general support for
the notion of "empty" for other types.
Author: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: jian he <jian.universality@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com
When generating window_pathkeys, distinct_pathkeys, or sort_pathkeys,
we failed to realize that the grouping/ordering expressions might be
nullable by grouping sets. As a result, we may incorrectly deem that
the PathKeys are redundant by EquivalenceClass processing and thus
remove them from the pathkeys list. That would lead to wrong results
in some cases.
To fix this issue, we mark the grouping expressions nullable by
grouping sets if that is the case. If the grouping expression is a
Var or PlaceHolderVar or constructed from those, we can just add the
RT index of the RTE_GROUP RTE to the existing nullingrels field(s);
otherwise we have to add a PlaceHolderVar to carry on the nullingrel
bit.
However, we have to manually remove this nullingrel bit from
expressions in various cases where these expressions are logically
below the grouping step, such as when we generate groupClause pathkeys
for grouping sets, or when we generate PathTarget for initial input to
grouping nodes.
Furthermore, in set_upper_references, the targetlist and quals of an
Agg node should have nullingrels that include the effects of the
grouping step, ie they will have nullingrels equal to the input
Vars/PHVs' nullingrels plus the nullingrel bit that references the
grouping RTE. In order to perform exact nullingrels matches, we also
need to manually remove this nullingrel bit.
Bump catversion because this changes the querytree produced by the
parser.
Thanks to Tom Lane for the idea to invent a new kind of RTE.
Per reports from Geoff Winkless, Tobias Wendorff, Richard Guo from
various threads.
Author: Richard Guo
Reviewed-by: Ashutosh Bapat, Sutou Kouhei
Discussion: https://postgr.es/m/CAMbWs4_dp7e7oTwaiZeBX8+P1rXw4ThkZxh1QG81rhu9Z47VsQ@mail.gmail.com
If there are subqueries in the grouping expressions, each of these
subqueries in the targetlist and HAVING clause is expanded into
distinct SubPlan nodes. As a result, only one of these SubPlan nodes
would be converted to reference to the grouping key column output by
the Agg node; others would have to get evaluated afresh. This is not
efficient, and with grouping sets this can cause wrong results issues
in cases where they should go to NULL because they are from the wrong
grouping set. Furthermore, during re-evaluation, these SubPlan nodes
might use nulled column values from grouping sets, which is not
correct.
This issue is not limited to subqueries. For other types of
expressions that are part of grouping items, if they are transformed
into another form during preprocessing, they may fail to match lower
target items. This can also lead to wrong results with grouping sets.
To fix this issue, we introduce a new kind of RTE representing the
output of the grouping step, with columns that are the Vars or
expressions being grouped on. In the parser, we replace the grouping
expressions in the targetlist and HAVING clause with Vars referencing
this new RTE, so that the output of the parser directly expresses the
semantic requirement that the grouping expressions be gotten from the
grouping output rather than computed some other way. In the planner,
we first preprocess all the columns of this new RTE and then replace
any Vars in the targetlist and HAVING clause that reference this new
RTE with the underlying grouping expressions, so that we will have
only one instance of a SubPlan node for each subquery contained in the
grouping expressions.
Bump catversion because this changes the querytree produced by the
parser.
Thanks to Tom Lane for the idea to invent a new kind of RTE.
Per reports from Geoff Winkless, Tobias Wendorff, Richard Guo from
various threads.
Author: Richard Guo
Reviewed-by: Ashutosh Bapat, Sutou Kouhei
Discussion: https://postgr.es/m/CAMbWs4_dp7e7oTwaiZeBX8+P1rXw4ThkZxh1QG81rhu9Z47VsQ@mail.gmail.com
In label_sort_with_costsize, the cost_sort function is called with the
parameters 'input_disabled_nodes' and 'input_cost' in the wrong order.
This does not cause any plan diffs in the regression tests, because
label_sort_with_costsize is only used to label the Sort node nicely
for EXPLAIN, and cost numbers are not displayed in regression tests.
Oversight in e22253467. Fixed by passing arguments in the right
order.
Per report from Alexander Lakhin running UBSan.
Author: Alexander Lakhin
Discussion: https://postgr.es/m/a9b7231d-68bc-f117-a07c-96688f3e6aef@gmail.com
When generating paths for the ORDER BY clause, one thing we need to
ensure is that the output paths project the correct final_target. To
achieve this, in create_ordered_paths, we compare the pathtarget of
each generated path with the given 'target', and add a post-sort
projection step if the two targets do not match.
Currently we perform a simple pointer comparison between the two
targets. It turns out that this is not sufficient. Each sorted_path
generated in create_ordered_paths initially projects the correct
target required by the preceding steps of sort. If it is the same
pointer as sort_input_target, pointer comparison suffices, because
sort_input_target is always identical to final_target when no
post-sort projection is needed.
However, sorted_path's initial pathtarget may not be the same pointer
as sort_input_target, because in apply_scanjoin_target_to_paths, if
the target to be applied has the same expressions as the existing
reltarget, we only inject the sortgroupref info into the existing
pathtargets, rather than create new projection paths. As a result,
pointer comparison in create_ordered_paths is not reliable.
Instead, we can compare PathTarget.exprs to determine whether a
projection step is needed. If the expressions match, we can be
confident that a post-sort projection is not required.
It could be argued that this change adds extra check cost each time we
decide whether a post-sort projection is needed. However, as
explained in apply_scanjoin_target_to_paths, by avoiding the creation
of projection paths, we save effort both immediately and at plan
creation time. This, I think, justifies the extra check cost.
There are two ensuing plan changes in the regression tests, but they
look reasonable and are exactly what we are fixing here. So no
additional test cases are added.
No backpatch as this could result in plan changes.
Author: Richard Guo
Reviewed-by: Peter Eisentraut, David Rowley, Tom Lane
Discussion: https://postgr.es/m/CAMbWs48TosSvmnz88663_2yg3hfeOFss-J2PtnENDH6J_rLnRQ@mail.gmail.com
When creating merge or hash join plans in createplan.c, the merge or
hash clauses may need to get commuted to ensure that the outer var is
on the left and the inner var is on the right if they are not already
in the expected form. This requires that their operators have
commutators. Failing to find a commutator at this stage would result
in 'ERROR: could not find commutator for operator xxx', with no
opportunity to select an alternative plan.
Typically, this is not an issue because mergejoinable or hashable
operators are expected to always have valid commutators. But in some
artificial cases this assumption may not hold true. Therefore, here
in this patch we check the validity of commutators for clauses in the
form "inner op outer" when selecting mergejoin/hash clauses, and
consider a clause unusable for the current pair of outer and inner
relations if it lacks a commutator.
There are not (and should not be) any such operators built into
Postgres that are mergejoinable or hashable but have no commutators;
so we leverage the alias type 'int8alias1' created in equivclass.sql
to build the test case. This is why the test case is included in
equivclass.sql rather than in join.sql.
Although this is arguably a bug fix, it cannot be reproduced without
installing an incomplete opclass, which is unlikely to happen in
practice, so no back-patch.
Reported-by: Alexander Pyhalov
Author: Richard Guo
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/c59ec04a2fef94d9ffc35a9b17dfc081@postgrespro.ru
Commit 2489d76c4 removed some logic from pullup_replace_vars()
that avoided wrapping a PlaceHolderVar around a pulled-up
subquery output expression if the expression could be proven
to go to NULL anyway (because it contained Vars or PHVs of the
pulled-up relation and did not contain non-strict constructs).
But removing that logic turns out to cause performance regressions
in some cases, because the extra PHV blocks subexpression folding,
and will do so even if outer-join reduction later turns it into a
no-op with no phnullingrels bits. This can for example prevent
an expression from being matched to an index.
The reason for always adding a PHV was to ensure we had someplace
to put the varnullingrels marker bits of the Var being replaced.
However, it turns out we can optimize in exactly the same cases that
the previous code did, because we can instead attach the needed
varnullingrels bits to the contained Var(s)/PHV(s).
This is not a complete solution --- it would be even better if we
could remove PHVs after reducing them to no-ops. It doesn't look
practical to back-patch such an improvement, but this change seems
safe and at least gets rid of the performance-regression cases.
Per complaint from Nikhil Raj. Back-patch to v16 where the
problem appeared.
Discussion: https://postgr.es/m/CAG1ps1xvnTZceKK24OUfMKLPvDP2vjT-d+F2AOCWbw_v3KeEgg@mail.gmail.com
Now that disable_cost is not included in the cost estimate, there's
no visible sign in EXPLAIN output of which plan nodes are disabled.
Fix that by propagating the number of disabled nodes from Path to
Plan, and then showing it in the EXPLAIN output.
There is some question about whether this is a desirable change.
While I personally believe that it is, it seems best to make it a
separate commit, in case we decide to back out just this part, or
rework it.
Reviewed by Andres Freund, Heikki Linnakangas, and David Rowley.
Discussion: http://postgr.es/m/CA+TgmoZ_+MS+o6NeGK2xyBv-xM+w1AfFVuHE4f_aq6ekHv7YSQ@mail.gmail.com
Previously, when a path type was disabled by e.g. enable_seqscan=false,
we either avoided generating that path type in the first place, or
more commonly, we added a large constant, called disable_cost, to the
estimated startup cost of that path. This latter approach can distort
planning. For instance, an extremely expensive non-disabled path
could seem to be worse than a disabled path, especially if the full
cost of that path node need not be paid (e.g. due to a Limit).
Or, as in the regression test whose expected output changes with this
commit, the addition of disable_cost can make two paths that would
normally be distinguishible in cost seem to have fuzzily the same cost.
To fix that, we now count the number of disabled path nodes and
consider that a high-order component of both the startup cost and the
total cost. Hence, the path list is now sorted by disabled_nodes and
then by total_cost, instead of just by the latter, and likewise for
the partial path list. It is important that this number is a count
and not simply a Boolean; else, as soon as we're unable to respect
disabled path types in all portions of the path, we stop trying to
avoid them where we can.
Because the path list is now sorted by the number of disabled nodes,
the join prechecks must compute the count of disabled nodes during
the initial cost phase instead of postponing it to final cost time.
Counts of disabled nodes do not cross subquery levels; at present,
there is no reason for them to do so, since the we do not postpone
path selection across subquery boundaries (see make_subplan).
Reviewed by Andres Freund, Heikki Linnakangas, and David Rowley.
Discussion: http://postgr.es/m/CA+TgmoZ_+MS+o6NeGK2xyBv-xM+w1AfFVuHE4f_aq6ekHv7YSQ@mail.gmail.com
When a partition is detached and immediately dropped, a prepared
statement could try to compute a new partition descriptor that includes
it. This leads to this kind of error:
ERROR: could not open relation with OID 457639
Avoid this by skipping the partition in expand_partitioned_rtentry if it
doesn't exist.
Noted by me while investigating bug #18559. Kuntal Gosh helped to
identify the exact failure.
Backpatch to 14, where DETACH CONCURRENTLY was introduced.
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Kuntal Ghosh <kuntalghosh.2007@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/202408122233.bo4adt3vh5bi@alvherre.pgsql
To deparse a reference to a field of a RECORD-type output of a
subquery, EXPLAIN normally digs down into the subquery's plan to try
to discover exactly which anonymous RECORD type is meant. However,
this can fail if the subquery has been optimized out of the plan
altogether on the grounds that no rows could pass the WHERE quals,
which has been possible at least since 3fc6e2d7f. There isn't
anything remaining in the plan tree that would help us, so fall back
to printing the field name as "fN" for the N'th column of the record.
(This will actually be the right thing some of the time, since it
matches the column names we assign to RowExprs.)
In passing, fix a comment typo in create_projection_plan, which
I noticed while experimenting with an alternative fix for this.
Per bug #18576 from Vasya B. Back-patch to all supported branches.
Richard Guo and Tom Lane
Discussion: https://postgr.es/m/18576-9feac34e132fea9e@postgresql.org
When pg_dump retrieves the list of database objects and performs the
data dump, there was possibility that objects are replaced with others
of the same name, such as views, and access them. This vulnerability
could result in code execution with superuser privileges during the
pg_dump process.
This issue can arise when dumping data of sequences, foreign
tables (only 13 or later), or tables registered with a WHERE clause in
the extension configuration table.
To address this, pg_dump now utilizes the newly introduced
restrict_nonsystem_relation_kind GUC parameter to restrict the
accesses to non-system views and foreign tables during the dump
process. This new GUC parameter is added to back branches too, but
these changes do not require cluster recreation.
Back-patch to all supported branches.
Reviewed-by: Noah Misch
Security: CVE-2024-7348
Backpatch-through: 12
To determine if the two relations being joined can use partitionwise
join, we need to verify the existence of equi-join conditions
involving pairs of matching partition keys for all partition keys.
Currently we do that by looking through the join's restriction
clauses. However, it has been discovered that this approach is
insufficient, because there might be partition keys known equal by a
specific EC, but they do not form a join clause because it happens
that other members of the EC than the partition keys are constrained
to become a join clause.
To address this issue, in addition to examining the join's restriction
clauses, we also check if any partition keys are known equal by ECs,
by leveraging function exprs_known_equal(). To accomplish this, we
enhance exprs_known_equal() to check equality per the semantics of the
opfamily, if provided.
It could be argued that exprs_known_equal() could be called O(N^2)
times, where N is the number of partition key expressions, resulting
in noticeable performance costs if there are a lot of partition key
expressions. But I think this is not a problem. The number of a
joinrel's partition key expressions would only be equal to the join
degree, since each base relation within the join contributes only one
partition key expression. That is to say, it does not scale with the
number of partitions. A benchmark with a query involving 5-way joins
of partitioned tables, each with 3 partition keys and 1000 partitions,
shows that the planning time is not significantly affected by this
patch (within the margin of error), particularly when compared to the
impact caused by partitionwise join.
Thanks to Tom Lane for the idea of leveraging exprs_known_equal() to
check if partition keys are known equal by ECs.
Author: Richard Guo, Tom Lane
Reviewed-by: Tom Lane, Ashutosh Bapat, Robert Haas
Discussion: https://postgr.es/m/CAN_9JTzo_2F5dKLqXVtDX5V6dwqB0Xk+ihstpKEt3a1LT6X78A@mail.gmail.com
Parameterized partial paths are not supported, and we have several
checks in try_partial_xxx_path functions to enforce this. For a
partial nestloop join path, we need to ensure that if the inner path
is parameterized, the parameterization is fully satisfied by the
proposed outer path. For a partial merge/hashjoin join path, we need
to ensure that the inner path is not parameterized. In all cases, we
need to ensure that the outer path is not parameterized.
However, the comment in try_partial_hashjoin_path does not describe
this correctly. This patch fixes that.
In addtion, this patch simplifies the checks peformed in
try_partial_hashjoin_path and try_partial_mergejoin_path with the help
of macro PATH_REQ_OUTER, and also adds asserts that the outer path is
not parameterized in try_partial_xxx_path functions.
Author: Richard Guo
Discussion: https://postgr.es/m/CAMbWs48mKJ6g_GnYNa7dnw04MHaMK-jnAEBrMVhTp2uUg3Ut4A@mail.gmail.com
In sort_inner_and_outer, we create mergejoin join paths by explicitly
sorting both relations on each possible ordering of the available
mergejoin clauses. However, if there are no available mergejoin
clauses, we can skip this process entirely.
This patch introduces a check for mergeclause_list at the beginning of
sort_inner_and_outer and exits the function if it is found to be
empty. This might help skip all the statements that come before the
call to select_outer_pathkeys_for_merge, including the build of
UniquePaths in the case of JOIN_UNIQUE_OUTER or JOIN_UNIQUE_INNER.
I doubt there's any measurable performance improvement, but throughout
the run of the regression tests, sort_inner_and_outer is called a
total of 44,424 times. Among these calls, there are 11,064 instances
where mergeclause_list is found to be empty, which accounts for
approximately one-fourth. I think this suggests that implementing
this shortcut is worthwhile.
Author: Richard Guo
Reviewed-by: Ashutosh Bapat
Discussion: https://postgr.es/m/CAMbWs48RKiZGFEd5A0JtztRY5ZdvVvNiHh0AKeuoz21F+0dVjQ@mail.gmail.com
In try_partitionwise_join, we aim to break down the join between two
partitioned relations into joins between matching partitions. To
achieve this, we iterate through each pair of partitions from the two
joining relations and create child-join relations for them. With
potentially thousands of partitions, the local objects allocated in
each iteration can accumulate significant memory usage. Therefore, we
opt to eagerly free these local objects at the end of each iteration.
In line with this approach, this patch frees the bitmap set that
represents the relids of child-join relations at the end of each
iteration. Additionally, it modifies build_child_join_rel() to reuse
the AppendRelInfo structures generated within each iteration.
Author: Ashutosh Bapat
Reviewed-by: David Christensen, Richard Guo
Discussion: https://postgr.es/m/CAExHW5s4EqY43oB=ne6B2=-xLgrs9ZGeTr1NXwkGFt2j-OmaQQ@mail.gmail.com
We don't currently support parameterized MergeAppend paths: there's
little use for an ordered path on the inside of a nestloop. Given
this, we can simplify create_merge_append_path by directly setting
param_info to NULL instead of calling get_appendrel_parampathinfo. We
can also simplify the Assert for child paths a little bit.
This change won't make any measurable difference in performance; it's
just for clarity's sake.
Author: Richard Guo
Reviewed-by: Alena Rybakina, Paul A Jungwirth
Discussion: https://postgr.es/m/CAMbWs4_n1bgH2nACMuGsXZct3KH6PBFS0tPdQsXdstRfyxTunQ@mail.gmail.com