As of commit a87c72915 (which later got backpatched as far as 9.1),
we're explicitly supporting the notion that append relations can be
nested; this can occur when UNION ALL constructs are nested, or when
a UNION ALL contains a table with inheritance children.
Bug #11457 from Nelson Page, as well as an earlier report from Elvis
Pranskevichus, showed that there were still nasty bugs associated with such
cases: in particular the EquivalenceClass mechanism could try to generate
"join" clauses connecting an appendrel child to some grandparent appendrel,
which would result in assertion failures or bogus plans.
Upon investigation I concluded that all current callers of
find_childrel_appendrelinfo() need to be fixed to explicitly consider
multiple levels of parent appendrels. The most complex fix was in
processing of "broken" EquivalenceClasses, which are ECs for which we have
been unable to generate all the derived equality clauses we would like to
because of missing cross-type equality operators in the underlying btree
operator family. That code path is more or less entirely untested by
the regression tests to date, because no standard opfamilies have such
holes in them. So I wrote a new regression test script to try to exercise
it a bit, which turned out to be quite a worthwhile activity as it exposed
existing bugs in all supported branches.
The present patch is essentially the same as far back as 9.2, which is
where parameterized paths were introduced. In 9.0 and 9.1, we only need
to back-patch a small fragment of commit 5b7b5518d, which fixes failure to
propagate out the original WHERE clauses when a broken EC contains constant
members. (The regression test case results show that these older branches
are noticeably stupider than 9.2+ in terms of the quality of the plans
generated; but we don't really care about plan quality in such cases,
only that the plan not be outright wrong. A more invasive fix in the
older branches would not be a good idea anyway from a plan-stability
standpoint.)
When we committed a87c729153, we somehow
failed to notice that it didn't merely improve plan quality for expression
indexes; there were very closely related cases that failed outright with
"could not find pathkey item to sort". The failing cases seem to be those
where the planner was already capable of selecting a MergeAppend plan,
and there was inheritance involved: the lack of appropriate eclass child
members would prevent prepare_sort_from_pathkeys() from succeeding on the
MergeAppend's child plan nodes for inheritance child tables.
Accordingly, back-patch into 9.1 through 9.3, along with an extra
regression test case covering the problem.
Per trouble report from Michael Glaesemann.
This was not changed in HEAD, but will be done later as part of a
pgindent run. Future pgindent runs will also do this.
Report by Tom Lane
Backpatch through all supported branches, but not HEAD
The planner is aware that it mustn't push down upper-level quals into
subqueries if the quals reference subquery output columns that contain
set-returning functions or volatile functions, or are non-DISTINCT outputs
of a DISTINCT ON subquery. However, it missed making this check when
there were one or more levels of UNION or INTERSECT above the dangerous
expression. This could lead to "set-valued function called in context that
cannot accept a set" errors, as seen in bug #8213 from Eric Soroos, or to
silently wrong answers in the other cases.
To fix, refactor the checks so that we make the column-is-unsafe checks
during subquery_is_pushdown_safe(), which already has to recursively
inspect all arms of a set-operation tree. This makes
qual_is_pushdown_safe() considerably simpler, at the cost that we will
spend some cycles checking output columns that possibly aren't referenced
in any upper qual. But the cases where this code gets executed at all
are already nontrivial queries, so it's unlikely anybody will notice any
slowdown of planning.
This has been broken since commit 05f916e6ad,
which makes the bug over ten years old. A bit surprising nobody noticed it
before now.
In commit 0f61d4dd1b, I added code to copy up
column width estimates for each column of a subquery. That code supposed
that the subquery couldn't have any output columns that didn't correspond
to known columns of the current query level --- which is true when a query
is parsed from scratch, but the assumption fails when planning a view that
depends on another view that's been redefined (adding output columns) since
the upper view was made. This results in an assertion failure or even a
crash, as per bug #8025 from lindebg. Remove the Assert and instead skip
the column if its resno is out of the expected range.
In a query such as "SELECT DISTINCT min(x) FROM tab", the DISTINCT is
pretty useless (there being only one output row), but nonetheless it
shouldn't fail. But it could fail if "tab" is an inheritance parent,
because planagg.c's code for fixing up equivalence classes after making the
index-optimized MIN/MAX transformation wasn't prepared to find child-table
versions of the aggregate expression. The least ugly fix seems to be
to add an option to mutate_eclass_expressions() to skip child-table
equivalence class members, which aren't used anymore at this stage of
planning so it's not really necessary to fix them. Since child members
are ignored in many cases already, it seems plausible for
mutate_eclass_expressions() to have an option to ignore them too.
Per bug #7703 from Maxim Boguk.
Back-patch to 9.1. Although the same code exists before that, it cannot
encounter child-table aggregates AFAICS, because the index optimization
transformation cannot succeed on inheritance trees before 9.1 (for lack
of MergeAppend).
generate_base_implied_equalities_const() should prefer plain Consts over
other em_is_const eclass members when choosing the "pivot" value that
all the other members will be equated to. This makes it more likely that
the generated equalities will be useful in constraint-exclusion proofs.
Per report from Rushabh Lathia.
If a potential equivalence clause references a variable from the nullable
side of an outer join, the planner needs to take care that derived clauses
are not pushed to below the outer join; else they may use the wrong value
for the variable. (The problem arises only with non-strict clauses, since
if an upper clause can be proven strict then the outer join will get
simplified to a plain join.) The planner attempted to prevent this type
of error by checking that potential equivalence clauses aren't
outerjoin-delayed as a whole, but actually we have to check each side
separately, since the two sides of the clause will get moved around
separately if it's treated as an equivalence. Bugs of this type can be
demonstrated as far back as 7.4, even though releases before 8.3 had only
a very ad-hoc notion of equivalence clauses.
In addition, we neglected to account for the possibility that such clauses
might have nonempty nullable_relids even when not outerjoin-delayed; so the
equivalence-class machinery lacked logic to compute correct nullable_relids
values for clauses it constructs. This oversight was harmless before 9.2
because we were only using RestrictInfo.nullable_relids for OR clauses;
but as of 9.2 it could result in pushing constructed equivalence clauses
to incorrect places. (This accounts for bug #7604 from Bill MacArthur.)
Fix the first problem by adding a new test check_equivalence_delay() in
distribute_qual_to_rels, and fix the second one by adding code in
equivclass.c and called functions to set correct nullable_relids for
generated clauses. Although I believe the second part of this is not
currently necessary before 9.2, I chose to back-patch it anyway, partly to
keep the logic similar across branches and partly because it seems possible
we might find other reasons why we need valid values of nullable_relids in
the older branches.
Add regression tests illustrating these problems. In 9.0 and up, also
add test cases checking that we can push constants through outer joins,
since we've broken that optimization before and I nearly broke it again
with an overly simplistic patch for this problem.
The planner previously assumed that parameter Vars having the same absolute
query level, varno, and varattno could safely be assigned the same runtime
PARAM_EXEC slot, even though they might be different Vars appearing in
different subqueries. This was (probably) safe before the introduction of
CTEs, but the lazy-evalution mechanism used for CTEs means that a CTE can
be executed during execution of some other subquery, causing the lifespan
of Params at the same syntactic nesting level as the CTE to overlap with
use of the same slots inside the CTE. In 9.1 we created additional hazards
by using the same parameter-assignment technology for nestloop inner scan
parameters, but it was broken before that, as illustrated by the added
regression test.
To fix, restructure the planner's management of PlannerParamItems so that
items having different semantic lifespans are kept rigorously separated.
This will probably result in complex queries using more runtime PARAM_EXEC
slots than before, but the slots are cheap enough that this hardly matters.
Also, stop generating PlannerParamItems containing Params for subquery
outputs: all we really need to do is reserve the PARAM_EXEC slot number,
and that now only takes incrementing a counter. The planning code is
simpler and probably faster than before, as well as being more correct.
Per report from Vik Reykja.
Back-patch of commit 46c508fbcf into all
branches that support WITH.
Previously, pattern_fixed_prefix() was defined to return whatever fixed
prefix it could extract from the pattern, plus the "rest" of the pattern.
That definition was sensible for LIKE patterns, but not so much for
regexes, where reconstituting a valid pattern minus the prefix could be
quite tricky (certainly the existing code wasn't doing that correctly).
Since the only thing that callers ever did with the "rest" of the pattern
was to pass it to like_selectivity() or regex_selectivity(), let's cut out
the middle-man and just have pattern_fixed_prefix's subroutines do this
directly. Then pattern_fixed_prefix can return a simple selectivity
number, and the question of how to cope with partial patterns is removed
from its API specification.
While at it, adjust the API spec so that callers who don't actually care
about the pattern's selectivity (which is a lot of them) can pass NULL for
the selectivity pointer to skip doing the work of computing a selectivity
estimate.
This patch is only an API refactoring that doesn't actually change any
processing, other than allowing a little bit of useless work to be skipped.
However, it's necessary infrastructure for my upcoming fix to regex prefix
extraction, because after that change there won't be any simple way to
identify the "rest" of the regex, not even to the low level of fidelity
needed by regex_selectivity. We can cope with that if regex_fixed_prefix
and regex_selectivity communicate directly, but not if we have to work
within the old API. Hence, back-patch to all active branches.
We can do this without creating an API break for estimation functions
by passing the collation using the existing fmgr functionality for
passing an input collation as a hidden parameter.
The need for this was foreseen at the outset, but we didn't get around to
making it happen in 9.1 because of the decision to sort all pg_statistic
histograms according to the database's default collation. That meant that
selectivity estimators generally need to use the default collation too,
even if they're estimating for an operator that will do something
different. The reason it's suddenly become more interesting is that
regexp interpretation also uses a collation (for its LC_TYPE not LC_COLLATE
property), and we no longer want to use the wrong collation when examining
regexps during planning. It's not that the selectivity estimate is likely
to change much from this; rather that we are thinking of caching compiled
regexps during planner estimation, and we won't get the intended benefit
if we cache them with a different collation than the executor will use.
Back-patch to 9.1, both because the regexp change is likely to get
back-patched and because we might as well get this right in all
collation-supporting branches, in case any third-party code wants to
rely on getting the collation. The patch turns out to be minuscule
now that I've done it ...
cost_index tries to estimate the per-tuple costs of evaluating filter
conditions (a/k/a qpquals) by subtracting the estimated cost of the
indexqual conditions from that of the baserestrictinfo conditions. This is
correct so long as the indexquals list is a subset of the baserestrictinfo
list. However, in the presence of derived indexable conditions it's
completely wrong, leading to bogus or even negative scan cost estimates,
as seen for example in bug #6579 from Istvan Endredy. In practice the
problem isn't severe except in the specific case of a LIKE optimization on
a functional index containing a very expensive function.
A proper fix for this might change cost estimates by more than people would
like for stable branches, so in the back branches let's just clamp the cost
difference to be not less than zero. That will at least prevent completely
insane behavior, while not changing the results normally.
In commit 57664ed25e I tried to fix a bug
reported by Teodor Sigaev by making non-simple-Var output columns distinct
(by wrapping their expressions with dummy PlaceHolderVar nodes). This did
not work too well. Commit b28ffd0fcc fixed
some ensuing problems with matching to child indexes, but per a recent
report from Claus Stadler, constraint exclusion of UNION ALL subqueries was
still broken, because constant-simplification didn't handle the injected
PlaceHolderVars well either. On reflection, the original patch was quite
misguided: there is no reason to expect that EquivalenceClass child members
will be distinct. So instead of trying to make them so, we should ensure
that we can cope with the situation when they're not.
Accordingly, this patch reverts the code changes in the above-mentioned
commits (though the regression test cases they added stay). Instead, I've
added assorted defenses to make sure that duplicate EC child members don't
cause any problems. Teodor's original problem ("MergeAppend child's
targetlist doesn't match MergeAppend") is addressed more directly by
revising prepare_sort_from_pathkeys to let the parent MergeAppend's sort
list guide creation of each child's sort list.
In passing, get rid of add_sort_column; as far as I can tell, testing for
duplicate sort keys at this stage is dead code. Certainly it doesn't
trigger often enough to be worth expending cycles on in ordinary queries.
And keeping the test would've greatly complicated the new logic in
prepare_sort_from_pathkeys, because comparing pathkey list entries against
a previous output array requires that we not skip any entries in the list.
Back-patch to 9.1, like the previous patches. The only known issue in
this area that wasn't caused by the ill-advised previous patches was the
MergeAppend planning failure, which of course is not relevant before 9.1.
It's possible that we need some of the new defenses against duplicate child
EC entries in older branches, but until there's some clear evidence of that
I'm going to refrain from back-patching further.
This fixes an oversight in commit 11cad29c91,
which introduced MergeAppend plans. Before that happened, we never
particularly cared about the sort ordering of scans of inheritance child
relations, since appending their outputs together would destroy any
ordering anyway. But now it's important to be able to match child relation
sort orderings to those of the surrounding query. The original coding of
add_child_rel_equivalences skipped ec_has_const EquivalenceClasses, on the
originally-correct grounds that adding child expressions to them was
useless. The effect of this is that when a parent variable is equated to
a constant, we can't recognize that index columns on the equivalent child
variables are not sort-significant; that is, we can't recognize that a
child index on, say, (x, y) is able to generate output in "ORDER BY y"
order when there is a clause "WHERE x = constant". Adding child
expressions to the (x, constant) EquivalenceClass fixes this, without any
downside that I can see other than a few more planner cycles expended on
such queries.
Per recent gripe from Robert McGehee. Back-patch to 9.1 where MergeAppend
was introduced.
In commit 57664ed25e, I made the planner
wrap non-simple-variable outputs of appendrel children (IOW, child SELECTs
of UNION ALL subqueries) inside PlaceHolderVars, in order to solve some
issues with EquivalenceClass processing. However, this means that any
upper-level WHERE clauses mentioning such outputs will now contain
PlaceHolderVars after they're pushed down into the appendrel child,
and that prevents indxpath.c from recognizing that they could be matched
to index expressions. To fix, add explicit stripping of PlaceHolderVars
from index operands, same as we have long done for RelabelType nodes.
Add a regression test covering both this and the plain-UNION case (which
is a totally different code path, but should also be able to do it).
Per bug #6416 from Matteo Beccati. Back-patch to 9.1, same as the
previous change.
The uniqueness condition might fail to hold intra-transaction, and assuming
it does can give incorrect query results. Per report from Marti Raudsepp,
though this is not his proposed patch.
Back-patch to 9.0, where both these features were introduced. In the
released branches, add the new IndexOptInfo field to the end of the struct,
to try to minimize ABI breakage for third-party code that may be examining
that struct.
If an indexable operator for a non-collatable indexed datatype has a
collatable right-hand input type, any OpExpr for it will be marked with a
nonzero inputcollid (since having one collatable input is sufficient to
make that happen). However, an index on a non-collatable column certainly
doesn't have any collation. This caused us to fail to match such operators
to their indexes, because indxpath.c required an exact match of index
collation and clause collation. It seems correct to allow a match when the
index is collation-less regardless of the clause's inputcollid: an operator
with both noncollatable and collatable inputs could perhaps depend on the
collation of the collatable input, but it could hardly expect the index for
the noncollatable input to have that same collation.
Per bug #6232 from Pierre Ducroquet. His example is specifically about
"hstore ? text" but the problem seems quite generic.
set_append_rel_pathlist supposed that, while computing per-column width
estimates for the appendrel, it could ignore child rels for which the
translated reltargetlist entry wasn't a Var. This gave rise to completely
silly estimates in some common cases, such as constant outputs from some or
all of the arms of a UNION ALL. Instead, fall back on get_typavgwidth to
estimate from the value's datatype; which might be a poor estimate but at
least it's not completely wacko.
That problem was exposed by an Assert in set_subquery_size_estimates, which
unfortunately was still overoptimistic even with that fix, since we don't
compute attr_widths estimates for appendrels that are entirely excluded by
constraints. So remove the Assert; we'll just fall back on get_typavgwidth
in such cases.
Also, since set_subquery_size_estimates calls set_baserel_size_estimates
which calls set_rel_width, there's no need for set_subquery_size_estimates
to call get_typavgwidth; set_rel_width will handle it for us if we just
leave the estimate set to zero. Remove the unnecessary code.
Per report from Erik Rijkers and subsequent investigation.
A PlaceHolderVar's expression might contain another, lower-level
PlaceHolderVar. If the outer PlaceHolderVar is used, the inner one
certainly will be also, and so we have to make sure that both of them get
into the placeholder_list with correct ph_may_need values during the
initial pre-scan of the query (before deconstruct_jointree starts).
We did this correctly for PlaceHolderVars appearing in the query quals,
but overlooked the issue for those appearing in the top-level targetlist;
with the result that nested placeholders referenced only in the targetlist
did not work correctly, as illustrated in bug #6154.
While at it, add some error checking to find_placeholder_info to ensure
that we don't try to create new placeholders after it's too late to do so;
they have to all be created before deconstruct_jointree starts.
Back-patch to 8.4 where the PlaceHolderVar mechanism was introduced.
Regular aggregate functions in combination with, or within the arguments
of, window functions are OK per spec; they have the semantics that the
aggregate output rows are computed and then we run the window functions
over that row set. (Thus, this combination is not really useful unless
there's a GROUP BY so that more than one aggregate output row is possible.)
The case without GROUP BY could fail, as recently reported by Jeff Davis,
because sloppy construction of the Agg node's targetlist resulted in extra
references to possibly-ungrouped Vars appearing outside the aggregate
function calls themselves. See the added regression test case for an
example.
Fixing this requires modifying the API of flatten_tlist and its underlying
function pull_var_clause. I chose to make pull_var_clause's API for
aggregates identical to what it was already doing for placeholders, since
the useful behaviors turn out to be the same (error, report node as-is, or
recurse into it). I also tightened the error checking in this area a bit:
if it was ever valid to see an uplevel Var, Aggref, or PlaceHolderVar here,
that was a long time ago, so complain instead of ignoring them.
Backpatch into 9.1. The failure exists in 8.4 and 9.0 as well, but seeing
that it only occurs in a basically-useless corner case, it doesn't seem
worth the risks of changing a function API in a minor release. There might
be third-party code using pull_var_clause.
The previous coding failed to account properly for the costs of evaluating
the input expressions of aggregates and window functions, as seen in a
recent gripe from Claudio Freire. (I said at the time that it wasn't
counting these costs at all; but on closer inspection, it was effectively
charging these costs once per output tuple. That is completely wrong for
aggregates, and not exactly right for window functions either.)
There was also a hard-wired assumption that aggregates and window functions
had procost 1.0, which is now fixed to respect the actual cataloged costs.
The costing of WindowAgg is still pretty bogus, since it doesn't try to
estimate the effects of spilling data to disk, but that seems like a
separate issue.
Although rowcount estimates really ought not be NaN, a bug elsewhere
could perhaps result in that, and that would cause Assert failure in
cost_mergejoin, which I believe to be the explanation for bug #5977 from
Anton Kuznetsov. Seems like a good idea to expend a couple more cycles
to prevent that, even though the real bug is elsewhere. Not back-patching,
though, because we don't encourage running production systems with
Asserts on.
When we are doing GEQO join planning, the current memory context is a
short-lived context that will be reset at the end of geqo_eval(). However,
the RelOptInfos for base relations are set up before that and then re-used
across many GEQO cycles. Hence, any code that modifies a baserel during
join planning has to be careful not to put pointers to the short-lived
context into the baserel struct. mark_dummy_rel got this wrong, leading to
easy-to-reproduce-once-you-know-how crashes in 8.4, as reported off-list by
Leo Carson of SDSC. Some improvements made in 9.0 make it difficult to
demonstrate the crash in 9.0 or HEAD; but there's no doubt that there's
still a risk factor here, so patch all branches that have the function.
(Note: 8.3 has a similar function, but it's only applied to joinrels and
thus is not a hazard.)
Since collation is effectively an argument, not a property of the function,
FmgrInfo is really the wrong place for it; and this becomes critical in
cases where a cached FmgrInfo is used for varying purposes that might need
different collation settings. Fix by passing it in FunctionCallInfoData
instead. In particular this allows a clean fix for bug #5970 (record_cmp
not working). This requires touching a bit more code than the original
method, but nobody ever thought that collations would not be an invasive
patch...
This is necessary, not optional, now that ILIKE and regexes are collation
aware --- else we might derive a wrong comparison constant for index
optimized pattern matches.
Get rid of bogus collation test in match_special_index_operator (even for
ILIKE, the pattern match operator's collation doesn't matter here, and even
if it did the test was testing the wrong thing).
Fix broken looping logic in expand_indexqual_rowcompare.
Add collation check in match_clause_to_ordering_op.
Make naming and argument ordering more consistent; improve comments.
I'm not sure these have any non-cosmetic implications, but I'm not sure
they don't, either. In particular, ensure the CaseTestExpr generated
by transformAssignmentIndirection to represent the base target column
carries the correct collation, because parse_collate.c won't fix that.
Tweak lsyscache.c API so that we can get the appropriate collation
without an extra syscache lookup.
In nearly all cases, the caller already knows the correct collation, and
in a number of places, the value the caller has handy is more correct than
the default for the type would be. (In particular, this patch makes it
significantly less likely that eval_const_expressions will result in
changing the exposed collation of an expression.) So an internal lookup
is both expensive and wrong.
Instead of playing cute games with pathkeys, just build a direct
representation of the intended sub-select, and feed it through
query_planner to get a Path for the index access. This is a bit slower
than 9.1's previous method, since we'll duplicate most of the overhead of
query_planner; but since the whole optimization only applies to rather
simple single-table queries, that probably won't be much of a problem in
practice. The advantage is that we get to do the right thing when there's
a partial index that needs the implicit IS NOT NULL clause to be usable.
Also, although this makes planagg.c be a bit more closely tied to the
ordering of operations in grouping_planner, we can get rid of some coupling
to lower-level parts of the planner. Per complaint from Marti Raudsepp.
All expression nodes now have an explicit output-collation field, unless
they are known to only return a noncollatable data type (such as boolean
or record). Also, nodes that can invoke collation-aware functions store
a separate field that is the collation value to pass to the function.
This avoids confusion that arises when a function has collatable inputs
and noncollatable output type, or vice versa.
Also, replace the parser's on-the-fly collation assignment method with
a post-pass over the completed expression tree. This allows us to use
a more complex (and hopefully more nearly spec-compliant) assignment
rule without paying for it in extra storage in every expression node.
Fix assorted bugs in the planner's handling of collations by making
collation one of the defining properties of an EquivalenceClass and
by converting CollateExprs into discardable RelabelType nodes during
expression preprocessing.
While this will give wrong answers when estimating selectivity for a
comparison operator that's using a non-default collation, the estimation
error probably won't be large; and anyway the former approach created
estimation errors of its own by trying to use a histogram that might have
been computed with some other collation. So we'll adopt this simplified
approach for now and perhaps improve it sometime in the future.
This patch incorporates changes from Andres Freund to make sure that
selfuncs.c passes a valid collation OID to any datatype-specific function
it calls, in case that function wants collation information. Said OID will
now always be DEFAULT_COLLATION_OID, but at least we won't get errors.
The recent additions for FDW support required checking foreign-table-ness
in several places in the parse/plan chain. While it's not clear whether
that would really result in a noticeable slowdown, it seems best to avoid
any performance risk by keeping a copy of the relation's relkind in
RangeTblEntry. That might have some other uses later, anyway.
Per discussion.
This commit provides the core code and documentation needed. A contrib
module test case will follow shortly.
Shigeru Hanada, Jan Urbanski, Heikki Linnakangas
This adds collation support for columns and domains, a COLLATE clause
to override it per expression, and B-tree index support.
Peter Eisentraut
reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch
Per my note of a couple days ago, create_index_paths would refuse to
consider any path at all for GIN indexes if the selectivity estimate came
out as 1.0; not even if you tried to force it with enable_seqscan. While
this isn't really a bad outcome in practice, it could be annoying for
testing purposes. Adjust the test for "is this path only useful for
sorting" so that it doesn't fire on paths with nil pathkeys, which will
include all GIN paths.
This is advantageous first because it allows us to hash the smaller table
regardless of the outer-join type, and second because hash join can be more
flexible than merge join in dealing with arbitrary join quals in a FULL
join. For merge join all the join quals have to be mergejoinable, but hash
join will work so long as there's at least one hashjoinable qual --- the
others can be any condition. (This is true essentially because we don't
keep per-inner-tuple match flags in merge join, while hash join can do so.)
To do this, we need a has-it-been-matched flag for each tuple in the
hashtable, not just one for the current outer tuple. The key idea that
makes this practical is that we can store the match flag in the tuple's
infomask, since there are lots of bits there that are of no interest for a
MinimalTuple. So we aren't increasing the size of the hashtable at all for
the feature.
To write this without turning the hash code into even more of a pile of
spaghetti than it already was, I rewrote ExecHashJoin in a state-machine
style, similar to ExecMergeJoin. Other than that decision, it was pretty
straightforward.
This is a heavily revised version of builtin_knngist_core-0.9. The
ordering operators are no longer mixed in with actual quals, which would
have confused not only humans but significant parts of the planner.
Instead, ordering operators are carried separately throughout planning and
execution.
Since the API for ambeginscan and amrescan functions had to be changed
anyway, this commit takes the opportunity to rationalize that a bit.
RelationGetIndexScan no longer forces a premature index_rescan call;
instead, callers of index_beginscan must call index_rescan too. Aside from
making the AM-side initialization logic a bit less peculiar, this has the
advantage that we do not make a useless extra am_rescan call when there are
runtime key values. AMs formerly could not assume that the key values
passed to amrescan were actually valid; now they can.
Teodor Sigaev and Tom Lane
Formerly we looked up the operators associated with each index (caching
them in relcache) and then the planner looked up the btree opfamily
containing such operators in order to build the btree-centric pathkey
representation that describes the index's sort order. This is quite
pointless for btree indexes: we might as well just use the index's opfamily
information directly. That saves syscache lookup cycles during planning,
and furthermore allows us to eliminate the relcache's caching of operators
altogether, which may help in reducing backend startup time.
I added code to plancat.c to perform the same type of double lookup
on-the-fly if it's ever faced with a non-btree amcanorder index AM.
If such a thing actually becomes interesting for production, we should
replace that logic with some more-direct method for identifying the
corresponding btree opfamily; but it's not worth spending effort on now.
There is considerably more to do pursuant to my recent proposal to get rid
of sort-operator-based representations of sort orderings, but this patch
grabs some of the low-hanging fruit. I'll look at the remainder of that
work after the current commitfest.
We no longer need the terminating zero entry in opfamily[], so get rid of
it. Also replace assorted ad-hoc looping logic with simple for and foreach
constructs. This code is now noticeably more readable than it was an hour
ago; credit to Robert for seeing that it could be simplified.
As per the ancient comment for set_rel_width, it really wasn't much good
for relations that aren't plain tables: it would never find any stats and
would always fall back on datatype-based estimates, which are often pretty
silly. Fix that by copying up width estimates from the subquery planning
process.
At some point we might want to do this for CTEs too, but that would be a
significantly more invasive patch because the sub-PlannerInfo is no longer
accessible by the time it's needed. I refrained from doing anything about
that, partly for fear of breaking the unmerged CTE-related patches.
In passing, also generate less bogus width estimates for whole-row Vars.
Per a gripe from Jon Nelson.
Per my recent proposal, get rid of all the direct inspection of indexes
and manual generation of paths in planagg.c. Instead, set up
EquivalenceClasses for the aggregate argument expressions, and let the
regular path generation logic deal with creating paths that can satisfy
those sort orders. This makes planagg.c a bit more visible to the rest
of the planner than it was originally, but the approach is basically a lot
cleaner than before. A major advantage of doing it this way is that we get
MIN/MAX optimization on inheritance trees (using MergeAppend of indexscans)
practically for free, whereas in the old way we'd have had to add a whole
lot more duplicative logic.
One small disadvantage of this approach is that MIN/MAX aggregates can no
longer exploit partial indexes having an "x IS NOT NULL" predicate, unless
that restriction or something that implies it is specified in the query.
The previous implementation was able to use the added "x IS NOT NULL"
condition as an extra predicate proof condition, but in this version we
rely entirely on indexes that are considered usable by the main planning
process. That seems a fair tradeoff for the simplicity and functionality
gained.
It was reporting that these were fully indexed (hence cheap), when of
course they're the exact opposite of that. I'm not certain if the case
would arise in practice, since a clauseless semijoin is hard to produce
in SQL, but if it did happen we'd make some dumb decisions.
The core of this patch is hash_array() and associated typcache
infrastructure, which works just about exactly like the existing support
for array comparison.
In addition I did some work to ensure that the planner won't think that an
array type is hashable unless its element type is hashable, and similarly
for sorting. This includes adding a datatype parameter to op_hashjoinable
and op_mergejoinable, and adding an explicit "hashable" flag to
SortGroupClause. The lack of a cross-check on the element type was a
pre-existing bug in mergejoin support --- but it didn't matter so much
before, because if you couldn't sort the element type there wasn't any good
alternative to failing anyhow. Now that we have the alternative of hashing
the array type, there are cases where we can avoid a failure by being picky
at the planner stage, so it's time to be picky.
The issue of exactly how to combine the per-element hash values to produce
an array hash is still open for discussion, but the rest of this is pretty
solid, so I'll commit it as-is.
Zoltan Boszormenyi exhibited a test case in which planning time was
dominated by construction of EquivalenceClasses and PathKeys that had no
actual relevance to the query (and in fact got discarded immediately).
This happened because we generated PathKeys describing the sort ordering of
every index on every table in the query, and only after that checked to see
if the sort ordering was relevant. The EC/PK construction code is O(N^2)
in the number of ECs, which is all right for the intended number of such
objects, but it gets out of hand if there are ECs for lots of irrelevant
indexes.
To fix, twiddle the handling of mergeclauses a little bit to ensure that
every interesting EC is created before we begin path generation. (This
doesn't cost anything --- in fact I think it's a bit cheaper than before
--- since we always eventually created those ECs anyway.) Then, if an
index column can't be found in any pre-existing EC, we know that that sort
ordering is irrelevant for the query. Instead of creating a useless EC,
we can just not build a pathkey for the index column in the first place.
The index will still be considered if it's useful for non-order-related
reasons, but we will think of its output as unsorted.
This patch eliminates the former need to sort the output of an Append scan
when an ordered scan of an inheritance tree is wanted. This should be
particularly useful for fast-start cases such as queries with LIMIT.
Original patch by Greg Stark, with further hacking by Hans-Jurgen Schonig,
Robert Haas, and Tom Lane.