aggregate function. By definition, such a sub-SELECT cannot reference any
variables of query levels between itself and the aggregate's semantic level
(else the aggregate would've been assigned to that lower level instead).
So the correct, most efficient implementation is to treat the sub-SELECT as
being a sub-select of that outer query level, not the level the aggregate
syntactically appears in. Not doing so also confuses the heck out of our
parameter-passing logic, as illustrated in bug report from Daniel Grace.
Fortunately, we were already copying the whole Aggref expression up to the
outer query level, so all that's needed is to delay SS_process_sublinks
processing of the sub-SELECT until control returns to the outer level.
This has been broken since we introduced spec-compliant treatment of
outer aggregates in 7.4; so patch all the way back.
constants through full joins, as in
select * from tenk1 a full join tenk1 b using (unique1)
where unique1 = 42;
which should generate a fairly cheap plan where we apply the constraint
unique1 = 42 in each relation scan. This had been broken by my patch of
2008-06-27, which is now reverted in favor of a more invasive but hopefully
less incorrect approach. That patch was meant to prevent incorrect extraction
of OR'd indexclauses from OR conditions above an outer join. To do that
correctly we need more information than the outerjoin_delay flag can provide,
so add a nullable_relids field to RestrictInfo that records exactly which
relations are nulled by outer joins that are underneath a particular qual
clause. A side benefit is that we can make the test in create_or_index_quals
more specific: it is now smart enough to extract an OR'd indexclause into the
outer side of an outer join, even though it must not do so in the inner side.
The old coding couldn't distinguish these cases so it could not do either.
by the planning process. This prevents the "failed to locate grouping columns"
error recently reported by Dickson Guedes. That happens because planning
replaces SubLinks by SubPlans in the subquery's targetlist, and exprTypmod()
is smarter about the former than the latter, causing the apparent type of
the subquery's output columns to change. This seems to be a deficiency we
should fix in exprTypmod(), but that will be a much more invasive patch
with possible side-effects elsewhere, so I'll do that only in HEAD.
Back-patch to 8.3. Arguably the lack of a copying step is broken/dangerous
all the way back, but in the absence of known problems I'll refrain from
making the older branches pay the extra cost. (The reason this particular
symptom didn't appear before is that exprTypmod() wasn't smart about SubLinks
either, until 8.3.)
outer join clauses. Given, say,
... from a left join b on a.a1 = b.b1 where a.a1 = 42;
we'll deduce a clause b.b1 = 42 and then mark the original join clause
redundant (we can't remove it completely for reasons I don't feel like
squeezing into this log entry). However the original implementation of
that wasn't bulletproof, because clause_selectivity() wouldn't honor
this_selec if given nonzero varRelid --- which in practice meant that
it worked as desired *except* when considering index scan quals. Which
resulted in bogus underestimation of the size of the indexscan result for
an inner indexscan in an outer join, and consequently a possibly bad
choice of indexscan vs. bitmap scan. Fix by introducing an explicit test
into clause_selectivity(). Also, to make sure we don't trigger that test
in corner cases, change the convention to be that this_selec > 1, not
this_selec = 1, means it's been marked redundant. Per trouble report from
Scara Maccai.
Back-patch to 8.2, where the problem was introduced.
AND, OR, or equivalent clauses: if there are too many (more than 100) just
exit without proving anything. This ensures that we don't spend O(N^2) time
trying (and most likely failing) to prove anything about very long IN lists
and similar cases.
Also, install a couple of CHECK_FOR_INTERRUPTS calls to ensure that a long
proof attempt can be interrupted.
Per gripe from Sergey Konoplev.
Back-patch the whole patch to 8.2 and just the CHECK_FOR_INTERRUPTS addition
to 8.1. (The rest of the patch doesn't apply cleanly, and since 8.1 doesn't
show the complained-of behavior anyway, it doesn't seem necessary to work
hard on it.)
we extended the appendrel mechanism to support UNION ALL optimization. The
reason nobody noticed was that we are not actually using attr_needed data for
appendrel children; hence it seems more reasonable to rip it out than fix it.
Back-patch to 8.2 because an Assert failure is possible in corner cases.
Per examination of an example from Jim Nasby.
In HEAD, also get rid of AppendRelInfo.col_mappings, which is quite inadequate
to represent UNION ALL situations; depend entirely on translated_vars instead.
btree. We can't easily tell whether clauses generated from the equivalence
class could be used with such an index, so just assume that they might be.
This bit of over-optimization prevented use of non-btree indexes for nestloop
inner indexscans, in any case where the join uses an equality operator that
is also a btree operator --- which in particular is typically true for hash
indexes. Noted while trying to test the current hash index patch.
when its input is constant and the element coercion function is immutable
(or nonexistent, ie, binary-coercible case). This is an oversight in the
8.3 implementation of ArrayCoerceExpr, and its result is that certain cases
involving IN or NOT IN with constants don't get optimized as they should be.
Per experimentation with an example from Ow Mun Heng.
parent, not only those with RangeTblRefs. We need them in ExecCheckRTPerms.
Report by Brendan O'Shea. Back-patch to 8.2, where pull_up_simple_union_all
was introduced.
bug #4290. The fundamental bug is that masking extParam by outer_params,
as finalize_plan had been doing, caused us to lose the information that
an initPlan depended on the output of a sibling initPlan. On reflection
the best thing to do seemed to be not to try to adjust outer_params for
this case but get rid of it entirely. The only thing it was really doing
for us was to filter out param IDs associated with SubPlan nodes, and that
can be done (with greater accuracy) while processing individual SubPlan
nodes in finalize_primnode. This approach was vindicated by the discovery
that the masking method was hiding a second bug: SS_finalize_plan failed to
remove extParam bits for initPlan output params that were referenced in the
main plan tree (it only got rid of those referenced by other initPlans).
It's not clear that this caused any real problems, given the limited use
of extParam by the executor, but it's certainly not what was intended.
I originally thought that there was also a problem with needing to include
indirect dependencies on external params in initPlans' param sets, but it
turns out that the executor handles this correctly so long as the depended-on
initPlan is earlier in the initPlans list than the one using its output.
That seems a bit of a fragile assumption, but it is true at the moment,
so I just documented it in some code comments rather than making what would
be rather invasive changes to remove the assumption.
Back-patch to 8.1. Previous versions don't have the case of initPlans
referring to other initPlans' outputs, so while the existing logic is still
questionable for them, there are not any known bugs to be fixed. So I'll
refrain from changing them for now.
of any lower outer join, even if it also references the non-nullable side and
so could not get pushed below the outer join anyway. We need this in case
the clause is an OR clause: if it doesn't get marked outerjoin_delayed,
create_or_index_quals() could pull an indexable restriction for the nullable
side out of it, leading to wrong results as demonstrated by today's bug
report from toruvinn. (See added regression test case for an example.)
In principle this has been wrong for quite a while. In practice I don't
think any branch before 8.3 can really show the failure, because
create_or_index_quals() will only pull out indexable conditions, and before
8.3 those were always strict. So though we might have improperly generated
null-extended rows in the outer join, they'd get discarded from the result
anyway. The gating factor that makes the failure visible is that 8.3
considers "col IS NULL" to be indexable. Hence I'm not going to risk
back-patching further than 8.3.
that it depends on for replan-forcing purposes. We need to consider plain OID
constants too, because eval_const_expressions folds a RelabelType atop a Const
to just a Const. This change could result in OID values that aren't really
for tables getting added to the dependency list, but the worst-case
consequence would be occasional useless replans. Per report from Gabriele
Messineo.
output is not of the same type that's needed for the IN comparison (ie,
where the parser inserted an implicit coercion above the subselect result).
We should record the coerced expression, not just a raw Var referencing
the subselect output, as the quantity that needs to be unique-ified if
we choose to implement the IN as Unique followed by a plain join.
As of 8.3 this error was causing crashes, as seen in bug #4113 from Javier
Hernandez, because the executor was being told to hash or sort the raw
subselect output column using operators appropriate to the coerced type.
In prior versions there was no crash because the executor chose the
hash or sort operators for itself based on the column type it saw.
However, that's still not really right, because what's unique for one data
type might not be unique for another. In corner cases we could get multiple
outputs of a row that should appear only once, as demonstrated by the
regression test case included in this commit.
However, this patch doesn't apply cleanly to 8.2 or before, and the code
involved has shifted enough over time that I'm hesitant to try to back-patch.
Given the lack of complaints from the field about such corner cases, I think
the bug may not be important enough to risk breaking other things with a
back-patch.
we had several code paths where a physical tlist could be used for the input
to a Sort node, which is a dumb idea because any unneeded table columns will
increase the volume of data the sort has to push around.
(Unfortunately the easy-looking fix of calling disuse_physical_tlist during
make_sort_xxx doesn't work because in most cases we're already committed to
the current input tlist --- it's been marked with sort column numbers, or
we've built grouping column numbers using it, etc. The tlist has to be
selected properly at the calling level before we start constructing sort-col
information. This is easy enough to do, we were just failing to take the
point into consideration.)
Back-patch to 8.3. I believe the problem probably exists clear back to 7.4
when the physical tlist optimization was added, but I'm afraid to back-patch
further than 8.3 without a great deal more study than I want to put into it.
The code in this area has drifted a lot over time. The real-world importance
of these code paths is uncertain anyway --- I think in many cases we'd
probably prefer hash-based methods.
eval_const_expressions needs to be passed the PlannerInfo ("root") structure,
because in some cases we want it to substitute values for Param nodes.
(So "constant" is not so constant as all that ...) This mistake partially
disabled optimization of unnamed extended-Query statements in 8.3: in
particular the LIKE-to-indexscan optimization would never be applied if the
LIKE pattern was passed as a parameter, and constraint exclusion depending
on a parameter value didn't work either.
the query result must be exactly one row (since we don't do this when there's
any GROUP BY). Therefore any ORDER BY or DISTINCT attached to the query is
useless and can be dropped. Aside from saving useless cycles, this protects
us against problems with matching the hacked-up tlist entries to sort clauses,
as seen in a bug report from Taiki Yamaguchi. We might need to work harder
if we ever try to optimize grouped queries with this approach, but this
solution will do for now.
knowledge up through any joins it participates in. We were doing that already
in some special cases but not in the general case. Also, defend against zero
row estimates for the input relations in cost_mergejoin --- this fix may have
eliminated the only scenario in which that can happen, but be safe. Per
report from Alex Solovey.
subquery output column exactly once left-to-right. Although this is the case
in the original parser output, it might not be so after rewriting and
constant-folding, as illustrated by bug #3882 from Jan Mate. Instead
scan the subquery's target list to obtain needed per-column information;
this is duplicative of what the parser did, but only a couple dozen lines
need be copied, and we can clean up a couple of notational uglinesses.
Bug was introduced in 8.2 as part of revision of SubLink representation.
constraint yields TRUE for every row of its table, only that it does not
yield FALSE (a NULL result isn't disallowed). This breaks a couple of
implications that would be true in two-valued logic. I had put in one such
mistake in an 8.2.5 patch: foo IS NULL doesn't refute a strict operator
on foo. But there was another in the original 8.2 release: NOT foo doesn't
refute an expression whose truth would imply the truth of foo.
Per report from Rajesh Kumar Mallah.
To preserve the ability to do constraint exclusion with one partition
holding NULL values, extend relation_excluded_by_constraints() to check
for attnotnull flags, and add col IS NOT NULL expressions to the set of
constraints we hope to refute.
checking of argument compatibility right; although the problem is only exposed
with multiple-input aggregates in which some arguments are polymorphic and
some are not. Per bug #3852 from Sokolov Yura.
for unhandled clause types ought to be 0.5, not 1.0. I fear I introduced
this silliness due to misreading the intent of the very-poorly-structured
code that was there when we inherited the file from Berkeley. The lack
of sanity in this behavior was exposed by an example from Sim Zacks.
(Arguably this is a bug fix and should be back-patched, but I'm a bit
hesitant to introduce a possible planner behavior change in the back
branches; it might detune queries that worked acceptably in the past.)
While at it, make estimation for DistinctExpr do something marginally
realistic, rather than just defaulting.
clauseless joins of relations that have unexploited join clauses. Rather
than looking at every other base relation in the query, the correct thing is
to examine the other relations in the "initial_rels" list of the current
make_rel_from_joinlist() invocation, because those are what we actually have
the ability to join against. This might be a subset of the whole query in
cases where join_collapse_limit or from_collapse_limit or full joins have
prevented merging the whole query into a single join problem. This is a bit
untidy because we have to pass those rels down through a new PlannerInfo
field, but it's necessary. Per bug #3865 from Oleg Kharin.
of poorer planning in 8.3 than 8.2:
1. After pushing a constant across an outer join --- ie, given
"a LEFT JOIN b ON (a.x = b.y) WHERE a.x = 42", we can deduce that b.y is
sort of equal to 42, in the sense that we needn't fetch any b rows where
it isn't 42 --- loop to see if any additional deductions can be made.
Previous releases did that by recursing, but I had mistakenly thought that
this was no longer necessary given the EquivalenceClass machinery.
2. Allow pushing constants across outer join conditions even if the
condition is outerjoin_delayed due to a lower outer join. This is safe
as long as the condition is strict and we re-test it at the upper join.
3. Keep the outer-join clause even if we successfully push a constant
across it. This is *necessary* in the outerjoin_delayed case, but
even in the simple case, it seems better to do this to ensure that the
join search order heuristics will consider the join as reasonable to
make. Mark such a clause as having selectivity 1.0, though, since it's
not going to eliminate very many rows after application of the constant
condition.
4. Tweak have_relevant_eclass_joinclause to report that two relations
are joinable when they have vars that are equated to the same constant.
We won't actually generate any joinclause from such an EquivalenceClass,
but again it seems that in such a case it's a good idea to consider
the join as worth costing out.
5. Fix a bug in select_mergejoin_clauses that was exposed by these
changes: we have to reject candidate mergejoin clauses if either side was
equated to a constant, because we can't construct a canonical pathkey list
for such a clause. This is an implementation restriction that might be
worth fixing someday, but it doesn't seem critical to get it done for 8.3.
the two join variables at both ends: not only trailing rows that need not be
scanned because there cannot be a match on the other side, but initial rows
that will be scanned without possibly having a match. This allows a more
realistic estimate of startup cost to be made, per recent pgsql-performance
discussion. In passing, fix a couple of bugs that had crept into
mergejoinscansel: it was not quite up to speed for the task of estimating
descending-order scans, which is a new requirement in 8.3.
indexable-clauses list for a btree index. Formerly it just Asserted that
all such clauses were opclauses, but that's no longer true in 8.3.
Per bug #3796 from Matthias Schoeneich.
clauselist_selectivity skip some analysis that's useless when there's only
one clause in the given list. Actually this can win even for not-so-simple
queries, because we also apply clauselist_selectivity to sublists such as the
quals matching an index; which are likely to have only a single entry even
when the total query is quite complicated.
where rtoffset == 0. In that case there is no need to change Var nodes,
and since filling in unset opfuncid fields is always safe, scribbling on the
input tree to that extent is not objectionable. This brings the cost of this
operation back down to what it was in 8.2 for simple queries. Per
investigation of performance gripe from Guillaume Smet.
where the EquivalenceClass machinery is unable to deduce anything more from a
simple "var = const" qual clause. There are probably some more cases where
this could be done, but this seems to take care of most of the added overhead
for simple queries. Per gripe from Guillaume Smet.
In passing, fix a problem that was exposed by this change:
reconsider_outer_join_clause and friends were passing the wrong relids to
build_implied_join_equality, resulting in RestrictInfos with the wrong
required_relids. This mistake was masked in typical cases since the bogus
RestrictInfos would never have escaped from the EquivalenceClass machinery,
but I think there might be corner cases involving "broken" ECs where there
would have been a visible failure even without the new optimization. In any
case the code was certainly not operating as intended.
OpExpr and related nodes. We're going to have to set the opfuncid of
such nodes eventually (if we haven't already), so we might as well
exploit the opportunity to cache the function OID. Buys back some
of the extra planner overhead noted by Guillaume Smet, though I still
need to fool with equivclass.c to really respond to that.
predictable manner; in particular that if you say ORDER BY output-column-ref,
it will in fact sort by that specific column even if there are multiple
syntactic matches. An example is
SELECT random() AS a, random() AS b FROM ... ORDER BY b, a;
While the use-case for this might be a bit debatable, it worked as expected
in earlier releases, so we should preserve the behavior for 8.3. Per my
recent proposal.
While at it, fix convert_subquery_pathkeys() to handle RelabelType stripping
in both directions; it needs this for the same reasons make_sort_from_pathkeys
does.
to be able to discard top-level RelabelType nodes on *both* sides of the
equivalence-class-to-target-list comparison, since make_pathkey_from_sortinfo
might either add or remove a RelabelType. Also fix the latter to do the
removal case cleanly. Per example from Peter.
make_greater_string() try harder to generate a string that's actually greater
than its input string. Before we just assumed that making a string that was
memcmp-greater was enough, but it is easy to generate examples where this is
not so when the locale is not C. Instead, loop until the relevant comparison
function agrees that the generated string is greater than the input.
Unfortunately this is probably not enough to guarantee that the generated
string is greater than all extensions of the input, so we cannot relax the
restriction to C locale for the LIKE/regex index optimization. But it should
at least improve the odds of getting a useful selectivity estimate in
prefix_selectivity(). Per example from Guillaume Smet.
Backpatch to 8.1, mainly because that's what the complainant is using...
RelabelType nodes when the sort key is binary-compatible with the sort
operator rather than having exactly its input type. We did this correctly
for index columns but not sort keys, leading to failure to notice that
a varchar index matches an ORDER BY request. This requires a bit more work
in make_sort_from_pathkeys, but not anyplace else that I can find.
Per bug report and subsequent discussion.
This doubles the planning workload for mergejoins while not actually
accomplishing much. The only useful case is where one of the directions
matches the query's ORDER BY request; therefore, put a thumb on the scales
in that direction, and otherwise arbitrarily consider only the ASC direction.
(This is a lot easier now than it would've been before 8.3, since we have
more semantic knowledge embedded in PathKeys now.)
if either of the input relations can legally be joined to any other rels using
join clauses. This avoids uselessly (and expensively) considering a lot of
really stupid join paths when there is a join restriction with a large
footprint, that is, lots of relations inside its LHS or RHS. My patch of
15-Feb-2007 had been causing the code to consider joining *every* combination
of rels inside such a group, which is exponentially bad :-(. With this
behavior, clauseless bushy joins will be done if necessary, but they'll be
put off as long as possible. Per report from Jakub Ouhrabka.
Backpatch to 8.2. We might someday want to backpatch to 8.1 as well, but 8.1
does not have the problem for OUTER JOIN nests, only for IN-clauses, so it's
not clear anyone's very likely to hit it in practice; and the current patch
doesn't apply cleanly to 8.1.
neglected to test whether an outer join's join-condition actually refers to
the lower outer join it is looking at. (The comment correctly described what
was supposed to happen, but the code didn't do it...) This often resulted in
adding an unnecessary constraint on the join order of the two outer joins,
which was bad enough. However, it also seems to expose a performance
problem in an older patch (from 15-Feb): once we've decided that there is a
join ordering constraint, we will start trying clauseless joins between every
combination of rels within the constraint, which pointlessly eats up lots of
time and space if there are numerous rels below the outer join. That probably
needs to be revisited :-(. Per gripe from Jakub Ouhrabka.
then-delete on the current cursor row. The basic fix is that nodeTidscan.c
has to apply heap_get_latest_tid() to the current-scan-TID obtained from the
cursor query; this ensures we get the latest row version to work with.
However, since that only works if the query plan is a TID scan, we also have
to hack the planner to make sure only that type of plan will be selected.
(Formerly, the planner might decide to apply a seqscan if the table is very
small. This change is probably a Good Thing anyway, since it's hard to see
how a seqscan could really win.) That means the execQual.c code to support
CurrentOfExpr as a regular expression type is dead code, so replace it with
just an elog(). Also, add regression tests covering these cases. Note
that the added tests expose the fact that re-fetching an updated row
misbehaves if the cursor used FOR UPDATE. That's an independent bug that
should be fixed later. Per report from Dharmendra Goyal.
used to perform MIN(foo) or MAX(foo), since we want to discard null rows in
the indexscan anyway. (This would probably fall out for free if we were
injecting the IS NOT NULL clause somewhere earlier, but given the current
anatomy of the MIN/MAX optimization code we have to do it explicitly.
Fortunately, very little added code is needed.) Per a discussion with
Henk de Wit.
simplification gets detoasted before it is incorporated into a Const node.
Otherwise, if an immutable function were to return a TOAST pointer (an
unlikely case, but it can be made to happen), we would end up with a plan
that depends on the continued existence of the out-of-line toast datum.
a relation as a reason to invalidate a plan when the relation changes. This
handles scenarios such as dropping/recreating a sequence that is referenced by
nextval('seq') in a cached plan. Rather than teach plancache.c all about
digging through plan trees to find regclass Consts, we charge the planner's
setrefs.c with making a list of the relation OIDs on which each plan depends.
That way the list can be built cheaply during a plan tree traversal that has
to happen anyway. Per bug #3662 and subsequent discussion.
eval_const_expressions simplifies this to just "WHERE false", but we have
already done pull_up_IN_clauses so the IN join will be done, or at least
planned, anyway. The trouble case comes when the sub-SELECT is itself a join
and we decide to implement the IN by unique-ifying the sub-SELECT outputs:
with no remaining reference to the output Vars in WHERE, we won't have
propagated the Vars up to the upper join point, leading to "variable not found
in subplan target lists" error. Fix by adding an extra scan of in_info_list
and forcing all Vars mentioned therein to be propagated up to the IN join
point. Per bug report from Miroslav Sulc.
join search order portion of the planner; this is specifically intended to
simplify developing a replacement for GEQO planning. Patch by Julius
Stroffek, editorialized on by me. I renamed make_one_rel_by_joins to
standard_join_search and make_rels_by_joins to join_search_one_level to better
reflect their place within this scheme.