join_is_legal() needs to reject forming certain outer joins in cases
where that would lead the planner down a blind alley. However, it
mistakenly supposed that the way to handle full joins was to treat them
as applying the same constraints as for left joins, only to both sides.
That doesn't work, as shown in bug #15741 from Anthony Skorski: given
a lateral reference out of a join that's fully enclosed by a full join,
the code would fail to believe that any join ordering is legal, resulting
in errors like "failed to build any N-way joins".
However, we don't really need to consider full joins at all for this
purpose, because we effectively force them to be evaluated in syntactic
order, and that order is always legal for lateral references. Hence,
get rid of this broken logic for full joins and just ignore them instead.
This seems to have been an oversight in commit 7e19db0c0.
Back-patch to all supported branches, as that was.
Discussion: https://postgr.es/m/15741-276f1f464b3f40eb@postgresql.org
The assertions added by commits 34ea1ab7f et al found another problem:
set_dummy_rel_pathlist and mark_dummy_rel were failing to label
the dummy paths they create with the correct outer_relids, in case
the relation is necessarily parameterized due to having lateral
references in its tlist. It's likely that this has no user-visible
consequences in production builds, at the moment; but still an assertion
failure is a bad thing, so back-patch the fix.
Per bug #15694 from Roman Zharkov (via Alexander Lakhin)
and an independent report by Tushar Ahuja.
Discussion: https://postgr.es/m/15694-74f2ca97e7044f7f@postgresql.org
Discussion: https://postgr.es/m/7d72ab20-c725-3ce2-f99d-4e64dd8a0de6@enterprisedb.com
In the case where inheritance_planner() finds that every table has
been excluded by constraints, it thought it could get away with
making a plan consisting of just a dummy Result node. While certainly
there's no updating or deleting to be done, this had two user-visible
problems: the plan did not report the correct set of output columns
when a RETURNING clause was present, and if there were any
statement-level triggers that should be fired, it didn't fire them.
Hence, rather than only generating the dummy Result, we need to
stick a valid ModifyTable node on top, which requires a tad more
effort here.
It's been broken this way for as long as inheritance_planner() has
known about deleting excluded subplans at all (cf commit 635d42e9c),
so back-patch to all supported branches.
Amit Langote and Tom Lane, per a report from Petr Fedorov.
Discussion: https://postgr.es/m/5da6f0f0-1364-1876-6978-907678f89a3e@phystech.edu
Check ec_relids before bothering to iterate through the EC members.
On a perhaps extreme, but still real-world, query in which
match_eclasses_to_foreign_key_col() accounts for the bulk of the
planner's runtime, this saves nearly 40% of the runtime. It's a bit
of a stopgap fix, but it's simple enough to be back-patched to 9.6
where this code came in; so let's do that.
David Rowley
Discussion: https://postgr.es/m/6970.1545327857@sss.pgh.pa.us
The previous ordering of these steps satisfied the nominal requirement
that set_rel_pathlist_hook could editorialize on the whole set of Paths
constructed for a base relation. In practice, though, trying to change
the set of partial paths was impossible. Adding one didn't work because
(a) it was too late to be included in Gather paths made by the core code,
and (b) calling add_partial_path after generate_gather_paths is unsafe,
because it might try to delete a path it thinks is dominated, but that
is already embedded in some Gather path(s). Nor could the hook safely
remove partial paths, for the same reason that they might already be
embedded in Gathers.
Better to call extensions first, let them add partial paths as desired,
and then gather. In v11 and up, we already doubled down on that ordering
by postponing gathering even further for single-relation queries; so even
if the hook wished to editorialize on Gather path construction, it could
not.
Report and patch by KaiGai Kohei. Back-patch to 9.6 where Gather paths
were added.
Discussion: https://postgr.es/m/CAOP8fzahwpKJRTVVTqo2AE=mDTz_efVzV6Get_0=U3SO+-ha1A@mail.gmail.com
As reported in bug #15613 from Srinivasan S A, file_fdw and postgres_fdw
neglected to mark plain baserel foreign paths as parameterized when the
relation has lateral_relids. Other FDWs have surely copied this mistake,
so rather than just patching those two modules, install a band-aid fix
in create_foreignscan_path to rectify the mistake centrally.
Although the band-aid is enough to fix the visible symptom, correct
the calls in file_fdw and postgres_fdw anyway, so that they are valid
examples for external FDWs.
Also, since the band-aid isn't enough to make this work for parameterized
foreign joins, throw an elog(ERROR) if such a case is passed to
create_foreignscan_path. This shouldn't pose much of a problem for
existing external FDWs, since it's likely they aren't trying to make such
paths anyway (though some of them may need a defense against joins with
lateral_relids, similar to the one this patch installs into postgres_fdw).
Add some assertions in relnode.c to catch future occurrences of the same
error --- in particular, as backstop against core-code mistakes like the
one fixed by commit bdd9a99aa.
Discussion: https://postgr.es/m/15613-092be1be9576c728@postgresql.org
create_lateral_join_info() computes a bunch of information about lateral
references between base relations, and then attempts to propagate those
markings to appendrel children of the original base relations. But the
original coding neglected the possibility of indirect descendants
(grandchildren etc). During v11 development we noticed that this was
wrong for partitioned-table cases, but failed to realize that it was just
as wrong for any appendrel. While the case can't arise for appendrels
derived from traditional table inheritance (because we make a flat
appendrel for that), nested appendrels can arise from nested UNION ALL
subqueries. Failure to mark the lower-level relations as having lateral
references leads to confusion in add_paths_to_append_rel about whether
unparameterized paths can be built. It's not very clear whether that
leads to any user-visible misbehavior; the lack of field reports suggests
that it may cause nothing worse than minor cost misestimation. Still,
it's a bug, and it leads to failures of Asserts that I intend to add
later.
To fix, we need to propagate information from all appendrel parents,
not just those that are RELOPT_BASERELs. We can still do it in one
pass, if we rely on the append_rel_list to be ordered with ancestor
relationships before descendant ones; add assertions checking that.
While fixing this, we can make a small performance improvement by
traversing the append_rel_list just once instead of separately for
each appendrel parent relation.
Noted while investigating bug #15613, though this patch does not fix
that (which is why I'm not committing the related Asserts yet).
Discussion: https://postgr.es/m/3951.1549403812@sss.pgh.pa.us
Up to now, createplan.c attempted to share PARAM_EXEC slots for
NestLoopParams across different plan levels, if the same underlying Var
was being fed down to different righthand-side subplan trees by different
NestLoops. This was, I think, more of an artifact of using subselect.c's
PlannerParamItem infrastructure than an explicit design goal, but anyway
that was the end result.
This works well enough as long as the plan tree is executing synchronously,
but the feature whereby Gather can execute the parallelized subplan locally
breaks it. An upper NestLoop node might execute for a row retrieved from
a parallel worker, and assign a value for a PARAM_EXEC slot from that row,
while the leader's copy of the parallelized subplan is suspended with a
different active value of the row the Var comes from. When control
eventually returns to the leader's subplan, it gets the wrong answers if
the same PARAM_EXEC slot is being used within the subplan, as reported
in bug #15577 from Bartosz Polnik.
This is pretty reminiscent of the problem fixed in commit 46c508fbc, and
the proper fix seems to be the same: don't try to share PARAM_EXEC slots
across different levels of controlling NestLoop nodes.
This requires decoupling NestLoopParam handling from PlannerParamItem
handling, although the logic remains somewhat similar. To avoid bizarre
division of labor between subselect.c and createplan.c, I decided to move
all the param-slot-assignment logic for both cases out of those files
and put it into a new file paramassign.c. Hopefully it's a bit better
documented now, too.
A regression test case for this might be nice, but we don't know a
test case that triggers the problem with a suitably small amount
of data.
Back-patch to 9.6 where we added Gather nodes. It's conceivable that
related problems exist in older branches; but without some evidence
for that, I'll leave the older branches alone.
Discussion: https://postgr.es/m/15577-ca61ab18904af852@postgresql.org
MinMaxExpr invokes the btree comparison function for its input datatype,
so it's only leakproof if that function is. Many such functions are
indeed leakproof, but others are not, and we should not just assume that
they are. Hence, adjust contain_leaked_vars to verify the leakproofness
of the referenced function explicitly.
I didn't add a regression test because it would need to depend on
some particular comparison function being leaky, and that's a moving
target, per discussion.
This has been wrong all along, so back-patch to supported branches.
Discussion: https://postgr.es/m/31042.1546194242@sss.pgh.pa.us
"rescanratio" was computed as 1 + rescanned-tuples / total-inner-tuples,
which is sensible if it's to be multiplied by total-inner-tuples or a cost
value corresponding to scanning all the inner tuples. But in reality it
was (mostly) multiplied by inner_rows or a related cost, numbers that take
into account the possibility of stopping short of scanning the whole inner
relation thanks to a limited key range in the outer relation. This'd
still make sense if we could expect that stopping short would result in a
proportional decrease in the number of tuples that have to be rescanned.
It does not, however. The argument that establishes the validity of our
estimate for that number is independent of whether we scan all of the inner
relation or stop short, and experimentation also shows that stopping short
doesn't reduce the number of rescanned tuples. So the correct calculation
is 1 + rescanned-tuples / inner_rows, and we should be sure to multiply
that by inner_rows or a corresponding cost value.
Most of the time this doesn't make much difference, but if we have
both a high rescan rate (due to lots of duplicate values) and an outer
key range much smaller than the inner key range, then the error can
be significant, leading to a large underestimate of the cost associated
with rescanning.
Per report from Vijaykumar Jain. This thinko appears to go all the way
back to the introduction of the rescan estimation logic in commit
70fba7043, so back-patch to all supported branches.
Discussion: https://postgr.es/m/CAE7uO5hMb_TZYJcZmLAgO6iD68AkEK6qCe7i=vZUkCpoKns+EQ@mail.gmail.com
postgres_fdw's postgresGetForeignPlan() assumes without checking that the
outer_plan it's given for a join relation must have a NestLoop, MergeJoin,
or HashJoin node at the top. That's been wrong at least since commit
4bbf6edfb (which could cause insertion of a Sort node on top) and it seems
like a pretty unsafe thing to Just Assume even without that.
Through blind good fortune, this doesn't seem to have any worse
consequences today than strange EXPLAIN output, but it's clearly trouble
waiting to happen.
To fix, test the node type explicitly before touching Join-specific
fields, and avoid jamming the new tlist into a node type that can't
do projection. Export a new support function from createplan.c
to avoid building low-level knowledge about the latter into FDWs.
Back-patch to 9.6 where the faulty coding was added. Note that the
associated regression test cases don't show any changes before v11,
apparently because the tests back-patched with 4bbf6edfb don't actually
exercise the problem case before then (there's no top-level Sort
in those plans).
Discussion: https://postgr.es/m/8946.1544644803@sss.pgh.pa.us
Our support for multiple-set-clauses in UPDATE assumes that the Params
referencing a MULTIEXPR_SUBLINK SubPlan will appear before that SubPlan
in the targetlist of the plan node that calculates the updated row.
(Yeah, it's a hack...) In some PG branches it's possible that a Result
node gets inserted between the primary calculation of the update tlist
and the ModifyTable node. setrefs.c did the wrong thing in this case
and left the upper-level Params as Params, causing a crash at runtime.
What it should do is replace them with "outer" Vars referencing the child
plan node's output. That's a result of careless ordering of operations
in fix_upper_expr_mutator, so we can fix it just by reordering the code.
Fix fix_join_expr_mutator similarly for consistency, even though join
nodes could never appear in such a context. (In general, it seems
likely to be a bit cheaper to use Vars than Params in such situations
anyway, so this patch might offer a tiny performance improvement.)
The hazard extends back to 9.5 where the MULTIEXPR_SUBLINK stuff
was introduced, so back-patch that far. However, this may be a live
bug only in 9.6.x and 10.x, as the other branches don't seem to want
to calculate the final tlist below the Result node. (That plan shape
change between branches might be a mini-bug in itself, but I'm not
really interested in digging into the reasons for that right now.
Still, add a regression test memorializing what we expect there,
so we'll notice if it changes again.)
Per bug report from Eduards Bezverhijs.
Discussion: https://postgr.es/m/b6cd572a-3e44-8785-75e9-c512a5a17a73@tieto.com
classify_index_clause_usage() is O(N^2) in the number of distinct index
qual clauses it considers, because of its use of a simple search list to
store them. For nearly all queries, that's fine because only a few clauses
will be considered. But Alexander Kuzmenkov reported a machine-generated
query with 80000 (!) index qual clauses, which caused this code to take
forever. Somewhat remarkably, this is the only O(N^2) behavior we now
have for such a query, so let's fix it.
We can get rid of the O(N^2) runtime for cases like this without much
damage to the functionality of choose_bitmap_and() by separating out
paths with "too many" qual or pred clauses, and deeming them to always
be nonredundant with other paths. Then their clauses needn't go into
the search list, so it doesn't get too long, but we don't lose the
ability to consider bitmap AND plans altogether. I set the threshold
for "too many" to be 100 clauses per path, which should be plenty to
ensure no change in planning behavior for normal queries.
There are other things we could do to make this go faster, but it's not
clear that it's worth any additional effort. 80000 qual clauses require
a whole lot of work in many other places, too.
The code's been like this for a long time, so back-patch to all supported
branches. The troublesome query only works back to 9.5 (in 9.4 it fails
with stack overflow in the parser); so I'm not sure that fixing this in
9.4 has any real-world benefit, but perhaps it does.
Discussion: https://postgr.es/m/90c5bdfa-d633-dabe-9889-3cf3e1acd443@postgrespro.ru
Allowing sub-select containing LIMIT/OFFSET in workers can lead to
inconsistent results at the top-level as there is no guarantee that the
row order will be fully deterministic. The fix is to prohibit pushing
LIMIT/OFFSET within sub-selects to workers.
Reported-by: Andrew Fletcher
Bug: 15324
Author: Amit Kapila
Reviewed-by: Dilip Kumar
Backpatch-through: 9.6
Discussion: https://postgr.es/m/153417684333.10284.11356259990921828616@wrigleys.postgresql.org
workers.
Allowing window function calculation in workers leads to inconsistent
results because if the input row ordering is not fully deterministic, the
output of window functions might vary across workers. The fix is to treat
them as parallel-restricted.
In the passing, improve the coding pattern in max_parallel_hazard_walker
so that it has a chain of mutually-exclusive if ... else if ... else if
... else if ... IsA tests.
Reported-by: Marko Tiikkaja
Bug: 15324
Author: Amit Kapila
Reviewed-by: Tom Lane
Backpatch-through: 9.6
Discussion: https://postgr.es/m/CAL9smLAnfPJCDUUG4ckX2iznj53V7VSMsYefzZieN93YxTNOcw@mail.gmail.com
GatherMergePath (introduced in 10) and CustomPath (introduced in 9.5)
have gone missing. The order of the Path nodes was inconsistent with
what is listed in nodes.h, so make the order consistent at the same time
to ease future checks and additions.
Author: Sawada Masahiko
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/CAD21AoBQMLoc=ohH-oocuAPsELrmk8_EsRJjOyR8FQLZkbE0wA@mail.gmail.com
We should only run apply_pathtarget_labeling_to_tlist if CP_LABEL_TLIST
was specified, because only in that case has use_physical_tlist checked
that the labeling will succeed; otherwise we may get an "ORDER/GROUP BY
expression not found in targetlist" error. (This subsumes the previous
test about gating_clauses, because we reset "flags" to zero earlier
if there are gating clauses to apply.)
The only known case in which a failure can occur is with a ProjectSet
path directly atop a table scan path, although it seems likely that there
are other cases or will be such in future. This means that the failure
is currently only visible in the v10 branch: 9.6 didn't have ProjectSet,
while in v11 and HEAD, apply_scanjoin_target_to_paths for some weird
reason is using create_projection_path not apply_projection_to_path,
masking the problem because there's a ProjectionPath in between.
Nonetheless this code is clearly wrong on its own terms, so back-patch
to 9.6 where this logic was introduced.
Per report from Regina Obe.
Discussion: https://postgr.es/m/001501d40f88$75186950$5f493bf0$@pcorp.us
canonicalize_ec_expression() is supposed to agree with coerce_type() as to
whether a RelabelType should be inserted to make a subexpression be valid
input for the operators of a given opclass. However, it did the wrong
thing with named-composite-type inputs to record_eq(): it put in a
RelabelType to RECORDOID, which the parser doesn't. In some cases this was
harmless because all code paths involving a particular equivalence class
did the same thing, but in other cases this would result in failing to
recognize a composite-type expression as being a member of an equivalence
class that it actually is a member of. The most obvious bad effect was to
fail to recognize that an index on a composite column could provide the
sort order needed for a mergejoin on that column, as reported by Teodor
Sigaev. I think there might be other, subtler, cases that result in
misoptimization. It also seems possible that an unwanted RelabelType
would sometimes get into an emitted plan --- but because record_eq and
friends don't examine the declared type of their input expressions, that
would not create any visible problems.
To fix, just treat RECORDOID as if it were a polymorphic type, which in
some sense it is. We might want to consider formalizing that a bit more
someday, but for the moment this seems to be the only place where an
IsPolymorphicType() test ought to include RECORDOID as well.
This has been broken for a long time, so back-patch to all supported
branches.
Discussion: https://postgr.es/m/a6b22369-e3bf-4d49-f59d-0c41d3551e81@sigaev.ru
On further reflection, commit e5d83995e didn't go far enough: pretty much
everywhere in the planner that examines a clause's is_pushed_down flag
ought to be changed to use the more complicated behavior where we also
check the clause's required_relids. Otherwise we could make incorrect
decisions about whether, say, a clause is safe to use as a hash clause.
Some (many?) of these places are safe as-is, either because they are
never reached while considering a parameterized path, or because there
are additional checks that would reject a pushed-down clause anyway.
However, it seems smarter to just code them all the same way rather
than rely on easily-broken reasoning of that sort.
In support of that, invent a new macro RINFO_IS_PUSHED_DOWN that should
be used in place of direct tests on the is_pushed_down flag.
Like the previous patch, back-patch to all supported branches.
Discussion: https://postgr.es/m/f8128b11-c5bf-3539-48cd-234178b2314d@proxel.se
In some cases a clause attached to an outer join can be pushed down into
the outer join's RHS even though the clause is not degenerate --- this
can happen if we choose to make a parameterized path for the RHS. If
the clause ends up attached to a lower outer join, we'd misclassify it
as being a "join filter" not a plain "filter" condition at that node,
leading to wrong query results.
To fix, teach extract_actual_join_clauses to examine each join clause's
required_relids, not just its is_pushed_down flag. (The latter now
seems vestigial, or at least in need of rethinking, but we won't do
anything so invasive as redefining it in a bug-fix patch.)
This has been wrong since we introduced parameterized paths in 9.2,
though it's evidently hard to hit given the lack of previous reports.
The test case used here involves a lateral function call, and I think
that a lateral reference may be required to get the planner to select
a broken plan; though I wouldn't swear to that. In any case, even if
LATERAL is needed to trigger the bug, it still affects all supported
branches, so back-patch to all.
Per report from Andreas Karlsson. Thanks to Andrew Gierth for
preliminary investigation.
Discussion: https://postgr.es/m/f8128b11-c5bf-3539-48cd-234178b2314d@proxel.se
One of the things canonicalize_qual() does is to remove constant-NULL
subexpressions of top-level AND/OR clauses. It does that on the assumption
that what it's given is a top-level WHERE clause, so that NULL can be
treated like FALSE. Although this is documented down inside a subroutine
of canonicalize_qual(), it wasn't mentioned in the documentation of that
function itself, and some callers hadn't gotten that memo.
Notably, commit d007a9505 caused get_relation_constraints() to apply
canonicalize_qual() to CHECK constraints. That allowed constraint
exclusion to misoptimize situations in which a CHECK constraint had a
provably-NULL subclause, as seen in the regression test case added here,
in which a child table that should be scanned is not. (Although this
thinko is ancient, the test case doesn't fail before 9.2, for reasons
I've not bothered to track down in detail. There may be related cases
that do fail before that.)
More recently, commit f0e44751d added an independent bug by applying
canonicalize_qual() to index expressions, which is even sillier since
those might not even be boolean. If they are, though, I think this
could lead to making incorrect index entries for affected index
expressions in v10. I haven't attempted to prove that though.
To fix, add an "is_check" parameter to canonicalize_qual() to specify
whether it should assume WHERE or CHECK semantics, and make it perform
NULL-elimination accordingly. Adjust the callers to apply the right
semantics, or remove the call entirely in cases where it's not known
that the expression has one or the other semantics. I also removed
the call in some cases involving partition expressions, where it should
be a no-op because such expressions should be canonical already ...
and was a no-op, independently of whether it could in principle have
done something, because it was being handed the qual in implicit-AND
format which isn't what it expects. In HEAD, add an Assert to catch
that type of mistake in future.
This represents an API break for external callers of canonicalize_qual().
While that's intentional in HEAD to make such callers think about which
case applies to them, it seems like something we probably wouldn't be
thanked for in released branches. Hence, in released branches, the
extra parameter is added to a new function canonicalize_qual_ext(),
and canonicalize_qual() is a wrapper that retains its old behavior.
Patch by me with suggestions from Dean Rasheed. Back-patch to all
supported branches.
Discussion: https://postgr.es/m/24475.1520635069@sss.pgh.pa.us
Since 9.5, it's possible that some but not all columns of an index
support returning the indexed value for index-only scans. If the
same indexed column appears in index columns that behave both ways,
check_index_only() supposed that it'd be OK to do an index-only scan
testing that column; but that fails if we have to recheck the indexed
condition on one of the columns that doesn't support this.
In principle we could make this work by remapping the recheck expressions
to pull the value from a column that does support returning the indexed
value. But such cases are so weird and rare that, at least for now,
it doesn't seem worth the trouble. Instead, just teach check_index_only
that a value is returnable only if all the index columns containing it
are returnable, rather than any of them.
Per report from David Pereiro Lagares. Back-patch to 9.5 where the
possibility of this situation appeared.
Kyotaro Horiguchi
Discussion: https://postgr.es/m/1516210494.1798.16.camel@nlpgo.com
Given overlapping or partially redundant join clauses, for example
t1 JOIN t2 ON t1.a = t2.x AND t1.b = t2.x
the planner's EquivalenceClass machinery will ordinarily refactor the
clauses as "t1.a = t1.b AND t1.a = t2.x", so that join processing doesn't
see multiple references to the same EquivalenceClass in a list of join
equality clauses. However, if the join is outer, it's incorrect to derive
a restriction clause on the outer side from the join conditions, so the
clause refactoring does not happen and we end up with overlapping join
conditions. The code that attempted to deal with such cases had several
subtle bugs, which could result in "left and right pathkeys do not match in
mergejoin" or "outer pathkeys do not match mergeclauses" planner errors,
if the selected join plan type was a mergejoin. (It does not appear that
any actually incorrect plan could have been emitted.)
The core of the problem really was failure to recognize that the outer and
inner relations' pathkeys have different relationships to the mergeclause
list. A join's mergeclause list is constructed by reference to the outer
pathkeys, so it will always be ordered the same as the outer pathkeys, but
this cannot be presumed true for the inner pathkeys. If the inner sides of
the mergeclauses contain multiple references to the same EquivalenceClass
({t2.x} in the above example) then a simplistic rendering of the required
inner sort order is like "ORDER BY t2.x, t2.x", but the pathkey machinery
recognizes that the second sort column is redundant and throws it away.
The mergejoin planning code failed to account for that behavior properly.
One error was to try to generate cut-down versions of the mergeclause list
from cut-down versions of the inner pathkeys in the same way as the initial
construction of the mergeclause list from the outer pathkeys was done; this
could lead to choosing a mergeclause list that fails to match the outer
pathkeys. The other problem was that the pathkey cross-checking code in
create_mergejoin_plan treated the inner and outer pathkey lists
identically, whereas actually the expectations for them must be different.
That led to false "pathkeys do not match" failures in some cases, and in
principle could have led to failure to detect bogus plans in other cases,
though there is no indication that such bogus plans could be generated.
Reported by Alexander Kuzmenkov, who also reviewed this patch. This has
been broken for years (back to around 8.3 according to my testing), so
back-patch to all supported branches.
Discussion: https://postgr.es/m/5dad9160-4632-0e47-e120-8e2082000c01@postgrespro.ru
create_plan_recurse lacked any stack depth check. This is not per
our normal coding rules, but I'd supposed it was safe because earlier
planner processing is more complex and presumably should eat more
stack. But bug #15033 from Andrew Grossman shows this isn't true,
at least not for queries having the form of a many-thousand-way
INTERSECT stack.
Further testing showed that recurse_set_operations is also capable
of being crashed in this way, since it likewise will recurse to the
bottom of a parsetree before calling any support functions that
might themselves contain any stack checks. However, its stack
consumption is only perhaps a third of create_plan_recurse's.
It's possible that this particular problem with create_plan_recurse can
only manifest in 9.6 and later, since before that we didn't build a Path
tree for set operations. But having seen this example, I now have no
faith in the proposition that create_plan_recurse doesn't need a stack
check, so back-patch to all supported branches.
Discussion: https://postgr.es/m/20180127050845.28812.58244@wrigleys.postgresql.org
If we're inside a lateral subquery, there may be no unparameterized paths
for a particular child relation of an appendrel, in which case we *must*
be able to create similarly-parameterized paths for each other child
relation, else the planner will fail with "could not devise a query plan
for the given query". This means that there are situations where we'd
better be able to reparameterize at least one path for each child.
This calls into question the assumption in reparameterize_path() that
it can just punt if it feels like it. However, the only case that is
known broken right now is where the child is itself an appendrel so that
all its paths are AppendPaths. (I think possibly I disregarded that in
the original coding on the theory that nested appendrels would get folded
together --- but that only happens *after* reparameterize_path(), so it's
not excused from handling a child AppendPath.) Given that this code's been
like this since 9.3 when LATERAL was introduced, it seems likely we'd have
heard of other cases by now if there were a larger problem.
Per report from Elvis Pranskevichus. Back-patch to 9.3.
Discussion: https://postgr.es/m/5981018.zdth1YWmNy@hammer.magicstack.net
If a query against an inheritance tree runs concurrently with an ALTER
TABLE that's disinheriting one of the tree members, it's possible to get
a "could not find inherited attribute" error because after obtaining lock
on the removed member, make_inh_translation_list sees that its columns
have attinhcount=0 and decides they aren't the columns it's looking for.
An ideal fix, perhaps, would avoid including such a just-removed member
table in the query at all; but there seems no way to accomplish that
without adding expensive catalog rechecks or creating a likelihood of
deadlocks. Instead, let's just drop the check on attinhcount. In this
way, a query that's included a just-disinherited child will still
succeed, which is not a completely unreasonable behavior.
This problem has existed for a long time, so back-patch to all supported
branches. Also add an isolation test verifying related behaviors.
Patch by me; the new isolation test is based on Kyotaro Horiguchi's work.
Discussion: https://postgr.es/m/20170626.174612.23936762.horiguchi.kyotaro@lab.ntt.co.jp
If we flatten a subquery whose target list contains constants or
expressions, when those output columns are used in GROUPING SET columns,
the planner was capable of doing the wrong thing by merging a pulled-up
expression into the surrounding expression during const-simplification.
Then the late processing that attempts to match subexpressions to grouping
sets would fail to match those subexpressions to grouping sets, with the
effect that they'd not go to null when expected.
To fix, wrap such subquery outputs in PlaceHolderVars, ensuring that
they preserve their separate identity throughout the planner's expression
processing. This is a bit of a band-aid, because the wrapper defeats
const-simplification even in places where it would be safe to allow.
But a nicer fix would likely be too invasive to back-patch, and the
consequences of the missed optimizations probably aren't large in most
cases.
Back-patch to 9.5 where grouping sets were introduced.
Heikki Linnakangas, with small mods and better test cases by me;
additional review by Andrew Gierth
Discussion: https://postgr.es/m/7dbdcf5c-b5a6-ef89-4958-da212fe10176@iki.fi
Since 9.4, we've allowed the syntax "select union select" and variants
of that. However, the planner wasn't expecting a no-column set operation
and ended up treating the set operation as if it were UNION ALL.
Pre-v10, there seem to be some executor issues that would need to be
fixed to support such cases, and it doesn't really seem worth expending
much effort on. Just disallow it, instead.
Per report from Victor Yegorov.
Discussion: https://postgr.es/m/CAGnEbojGJrRSOgJwNGM7JSJZpVAf8xXcVPbVrGdhbVEHZ-BUMw@mail.gmail.com
rewriteTargetListUD's processing is dependent on the relkind of the query's
target table. That was fine at the time it was made to act that way, even
for queries on inheritance trees, because all tables in an inheritance tree
would necessarily be plain tables. However, the 9.5 feature addition
allowing some members of an inheritance tree to be foreign tables broke the
assumption that rewriteTargetListUD's output tlist could be applied to all
child tables with nothing more than column-number mapping. This led to
visible failures if foreign child tables had row-level triggers, and would
also break in cases where child tables belonged to FDWs that used methods
other than CTID for row identification.
To fix, delay running rewriteTargetListUD until after the planner has
expanded inheritance, so that it is applied separately to the (already
mapped) tlist for each child table. We can conveniently call it from
preprocess_targetlist. Refactor associated code slightly to avoid the
need to heap_open the target relation multiple times during
preprocess_targetlist. (The APIs remain a bit ugly, particularly around
the point of which steps scribble on parse->targetList and which don't.
But avoiding such scribbling would require a change in FDW callback APIs,
which is more pain than it's worth.)
Also fix ExecModifyTable to ensure that "tupleid" is reset to NULL when
we transition from rows providing a CTID to rows that don't. (That's
really an independent bug, but it manifests in much the same cases.)
Add a regression test checking one manifestation of this problem, which
was that row-level triggers on a foreign child table did not work right.
Back-patch to 9.5 where the problem was introduced.
Etsuro Fujita, reviewed by Ildus Kurbangaliev and Ashutosh Bapat
Discussion: https://postgr.es/m/20170514150525.0346ba72@postgrespro.ru
Previously, we skipped using search_indexed_tlist_for_sortgroupref()
if the tlist expression being sought in the child plan node was merely
a Var. This is purely an optimization, based on the theory that
search_indexed_tlist_for_var() is faster, and one copy of a Var should
be as good as another. However, the GROUPING SETS patch broke the
latter assumption: grouping columns containing the "same" Var can
sometimes have different outputs, as shown in the test case added here.
So do it the hard way whenever a ressortgroupref marking exists.
(If this seems like a bottleneck, we could imagine building a tlist index
data structure for ressortgroupref values, as we do for Vars. But I'll
let that idea go until there's some evidence it's worthwhile.)
Back-patch to 9.6. The problem also exists in 9.5 where GROUPING SETS
came in, but this patch is insufficient to resolve the problem in 9.5:
there is some obscure dependency on the upper-planner-pathification
work that happened in 9.6. Given that this is such a weird corner case,
and no end users have complained about it, it doesn't seem worth the work
to develop a fix for 9.5.
Patch by me, per a report from Heikki Linnakangas. (This does not fix
Heikki's original complaint, just the follow-on one.)
Discussion: https://postgr.es/m/aefc657e-edb2-64d5-6df1-a0828f6e9104@iki.fi
If the user modifies a view that has CHECK OPTIONs and this gets
translated into a modification to an underlying relation which happens
to be a foreign table, the check options should be enforced. In the
normal code path, that was happening properly, but it was not working
properly for "direct" modification because the whole operation gets
pushed to the remote side in that case and we never have an option to
enforce the constraint against individual tuples. Fix by disabling
direct modification when there is a need to enforce CHECK OPTIONs.
Etsuro Fujita, reviewed by Kyotaro Horiguchi and by me.
Discussion: http://postgr.es/m/f8a48f54-6f02-9c8a-5250-9791603171ee@lab.ntt.co.jp
David Rowley found that the "use the smallest per-column selectivity"
heuristic applied in some cases by get_foreign_key_join_selectivity()
was badly off if the FK columns are independent, producing estimates
much worse than we got before that code was added in 9.6.
One case where that heuristic was used was for LEFT and FULL outer joins
with the referenced rel on the outside of the join. But we should not
really need to special-case those here. eqjoinsel() never has had such a
special case; the correction is applied by calc_joinrel_size_estimate()
instead. Let's just estimate such cases like inner joins and rely on that
later adjustment. (I think there was something of a thinko here, in that
the comments seem to be thinking about the selectivity as defined for
semi/anti joins; but that shouldn't apply to left/full joins.) Add a
regression test exercising such a case to show that this is sane in
at least some cases.
The other case where we used that heuristic was for SEMI/ANTI outer joins,
either if the referenced rel was on the outside, or if it was on the inside
but was part of a join within the RHS. In either case, the FK doesn't give
us a lot of traction towards estimating the selectivity. To ensure that
we don't have regressions from what happened before 9.6, let's punt by
ignoring the FK in such cases and applying the traditional selectivity
calculation. (We might be able to improve on that later, but for now
I just want to be sure it's not worse than 9.5.)
Report and patch by David Rowley, simplified a bit by me. Back-patch
to 9.6 where this code was added.
Discussion: https://postgr.es/m/CAKJS1f8NO8oCDcxrteohG6O72uU1saEVT9qX=R8pENr5QWerXw@mail.gmail.com
We were applying the use_physical_tlist optimization to all relation
scan plans, even those implemented by custom scan providers. However,
that's a bad idea for a couple of reasons. The custom provider might
be unable to provide columns that it hadn't expected to be asked for
(for example, the custom scan might depend on an index-only scan).
Even more to the point, there's no good reason to suppose that this
"optimization" is a win for a custom scan; whatever the custom provider
is doing is likely not based on simply returning physical heap tuples.
(As a counterexample, if the custom scan is an interface to a column store,
demanding all columns would be a huge loss.) If it is a win, the custom
provider could make that decision for itself and insert a suitable
pathtarget into the path, anyway.
Per discussion with Dmitry Ivanov. Back-patch to 9.5 where custom scan
support was introduced. The argument that the custom provider can adjust
the behavior by changing the pathtarget only applies to 9.6+, but on
balance it seems more likely that use_physical_tlist will hurt custom
scans than help them.
Discussion: https://postgr.es/m/e29ddd30-8ef9-4da5-a50b-2bb7b8c7198d@postgrespro.ru
As reported by Sean Johnston in bug #14614, since 9.6 the planner can fail
due to trying to look up the referent of a Var with varno 0. This happens
because we generate such Vars in generate_append_tlist, for lack of any
better way to describe the output of a SetOp node. In typical situations
nothing really cares about that, but given nested set-operation queries
we will call estimate_num_groups on the output of the subquery, and that
wants to know what a Var actually refers to. That logic used to look at
subquery->targetList, but in commit 3fc6e2d7f I'd switched it to look at
subroot->processed_tlist, ie the actual output of the subquery plan not the
parser's idea of the result. It seemed like a good idea at the time :-(.
As a band-aid fix, change it back.
Really we ought to have an honest way of naming the outputs of SetOp steps,
which suggests that it'd be a good idea for the parser to emit an RTE
corresponding to each one. But that's a task for another day, and it
certainly wouldn't yield a back-patchable fix.
Report: https://postgr.es/m/20170407115808.25934.51866@wrigleys.postgresql.org
Commit 45be99f8cd removed GatherPath's
num_workers field, but this is entirely bogus. Normally, a path's
parallel_workers flag is supposed to indicate the number of workers
that it wants, and should be 0 for a non-partial path. In that
commit, I mistakenly thought that GatherPath could also use that field
to indicate the number of workers that it would try to start, but
that's disastrous, because then it can propagate up to higher nodes in
the plan tree, which will then get incorrect rowcounts because the
parallel_workers flag is involved in computing those values. Repair
by putting the separate field back.
Report by Tomas Vondra. Patch by me, reviewed by Amit Kapila.
Discussion: http://postgr.es/m/f91b4a44-f739-04bd-c4b6-f135bd643669@2ndquadrant.com
If we forcibly place a Material node atop a finished subplan, we need
to move any initPlans attached to the subplan up to the Material node,
in order to keep SS_finalize_plan() happy. I'd figured this out in
commit 7b67a0a49 for the case of materializing a cursor plan, but out of
an abundance of caution, I put the initPlan movement hack at the call
site for that case, rather than inside materialize_finished_plan().
That was the wrong thing, because it turns out to also be necessary for
the only other caller of materialize_finished_plan(), ie subselect.c.
We lacked any test cases that exposed the mistake, but bug#14524 from
Wei Congrui shows that it's possible to get an initPlan reference into
the top tlist in that case too, and then SS_finalize_plan() complains.
Hence, move the hack into materialize_finished_plan().
In HEAD, also relocate some recently-added tests in subselect.sql, which
I'd unthinkingly dropped into the middle of a sequence of related tests.
Report: https://postgr.es/m/20170202060020.1400.89021@wrigleys.postgresql.org
For a partial path, the cardinality estimate needs to reflect the
number of rows we think each worker will see, rather than the total
number of rows; otherwise, costing will go wrong. The previous coding
got this completely wrong for parallel joins.
Unfortunately, this change may destabilize plans for users of 9.6 who
have enabled parallel query, but since 9.6 is still fairly new I'm
hoping expectations won't be too settled yet. Also, this is really a
brown-paper-bag bug, so leaving it unfixed for the entire lifetime of
9.6 seems unwise.
Related reports (whose import I initially failed to recognize) by
Tomas Vondra and Tom Lane.
Discussion: http://postgr.es/m/CA+TgmoaDxZ5z5Kw_oCQoymNxNoVaTCXzPaODcOuao=CzK8dMZw@mail.gmail.com
This case wasn't thought through sufficiently in commit 100340e2d.
It's true that the FK proves that every outer row has a match in the
inner table, but we forgot that some of the inner rows might be filtered
away by WHERE conditions located within the semijoin's RHS.
If the RHS is just one table, we can reasonably take the semijoin
selectivity as equal to the fraction of the referenced table's rows
that are expected to survive its restriction clauses.
If the RHS is a join, it's not clear how much of the referenced table
might get through the join, so fall back to the same rule we were
already using for other outer-join cases: use the minimum of the
regular per-clause selectivity estimates. This gives the same result
as if we hadn't considered the FK at all when there's a single FK
column, but it should still help for multi-column FKs, which is the
case that 100340e2d is really meant to help with.
Back-patch to 9.6 where the previous commit came in.
Discussion: https://postgr.es/m/16149.1481835103@sss.pgh.pa.us
The existing tests in preprocess_minmax_aggregates() usually prevent it
from trying to do anything with queries containing CTEs, but there's an
exception: a CTE could be present as a member of an appendrel, if we
flattened a UNION ALL that contains CTE references. If it did try to
generate an optimized path for a query using a CTE, it failed with
"could not find plan for CTE", as reported by Torsten Förtsch.
The proximate cause is an unwise decision in commit 3fc6e2d7f to clear
subroot->cte_plan_ids in build_minmax_path(). That left the subroot's
cte_plan_ids list out of step with its parse->cteList.
Removing the "subroot->cte_plan_ids = NIL;" assignment is enough to let
the case work again, but really it's pretty silly to be expending any
cycles at all in this module when there are CTEs: we always treat their
outputs as unordered so there's no way for the optimization to win.
Hence, also add an early-exit test so we don't waste time like that.
Back-patch to 9.6 where the misbehavior was introduced.
Report: https://postgr.es/m/CAKkG4_=gjY5QiHtqSZyWMwDuTd_CftKoTaCqxjJ7uUz1-Gw=qw@mail.gmail.com
consider_parallel_nestloop passed the wrong jointype down to its
subroutines for JOIN_UNIQUE_INNER cases (it should pass JOIN_INNER), and it
thought that it could pass paths other than innerrel->cheapest_total_path
to create_unique_path, which create_unique_path is not on board with.
These bugs would lead to assertion failures or other errors, suggesting
that this code path hasn't been tested much.
hash_inner_and_outer's code for parallel join effectively treated both
JOIN_UNIQUE_OUTER and JOIN_UNIQUE_INNER the same as JOIN_INNER (for
different reasons :-(), leading to incorrect plans that treated a semijoin
as if it were a plain join.
Michael Day submitted a test case demonstrating that hash_inner_and_outer
failed for JOIN_UNIQUE_OUTER, and I found the other cases through code
review.
Report: https://postgr.es/m/D0E8A029-D1AC-42E8-979A-5DE4A77E4413@rcmail.com
Andreas Seltenreich found another case where we were being too optimistic
about allowing a plan to be considered parallelizable despite it containing
initPlans. It seems like the real issue here is that if we know we are
going to tack initPlans onto the topmost Plan node for a subquery, we
had better mark that subquery's result Paths as not-parallel-safe. That
fixes this problem and allows reversion of a kluge (added in commit
7b67a0a49 and extended in f24cf960d) to not trust the parallel_safe flag
at top level.
Discussion: <874m2w4k5d.fsf@ex.ansel.ydns.eu>
We mustn't force parallel mode if the query has any subplans, since
ExecSerializePlan doesn't transmit them to workers. Testing
top_plan->initPlan is inadequate because (1) there might be initPlans
attached to lower plan nodes, and (2) non-initPlan subplans don't
work either. There's certainly room for improvement in those
restrictions, but for the moment that's what we've got.
Amit Kapila, per report from Andreas Seltenreich
Discussion: <8737im6pmh.fsf@credativ.de>
The plan generated for sorted partial aggregation with "GROUP BY constant"
included a Sort node with no sort keys, which the executor does not like.
Per report from Steve Randall. I'd add a regression test case if I could
think of a compact one, but it doesn't seem worth expending lots of cycles
on.
Report: <CABVd52UAdGXpg_rCk46egpNKYdXOzCjuJ1zG26E2xBe_8bj+Fg@mail.gmail.com>