of a relation in a flat 'joininfo' list. The former arrangement grouped
the join clauses according to the set of unjoined relids used in each;
however, profiling on test cases involving lots of joins proves that
that data structure is a net loss. It takes more time to group the
join clauses together than is saved by avoiding duplicate tests later.
It doesn't help any that there are usually not more than one or two
clauses per group ...
a new PlannerInfo struct, which is passed around instead of the bare
Query in all the planning code. This commit is essentially just a
code-beautification exercise, but it does open the door to making
larger changes to the planner data structures without having to muck
with the widely-known Query struct.
aren't doing anything useful (ie, neither selection nor projection).
Also, extend to SubqueryScan the hacks already in place to avoid
unnecessary ExecProject calls when the result would just be the same
tuple the subquery already delivered. This saves some overhead in
UNION and other set operations, as well as avoiding overhead for
unflatten-able subqueries. Per example from Sokolov Yura.
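For illustration, a set operation of the kind that benefits (table and
column names here are hypothetical):
select a from t1 union all select a from t2;
Each UNION ALL arm formerly carried a do-nothing SubqueryScan node above
the underlying scan; such nodes are now dropped from the finished plan.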
node, as this behavior is now better done as a bitmap OR indexscan.
This allows considerable simplification in nodeIndexscan.c itself, as
well as in several planner modules concerned with indexscan plan generation.
Also we can improve the sharing of code between regular and bitmap
indexscans, since they are now working with nigh-identical Plan nodes.
but the code is basically working. Along the way, rewrite the entire
approach to processing OR index conditions, and make it work in join
cases for the first time ever. orindxpath.c is now basically obsolete,
but I left it in for the time being to allow easy comparison testing
against the old implementation.
logic operations during planning. Seems cleaner to create two new Path
node types, instead --- this avoids duplication of cost-estimation code.
Also, create an enable_bitmapscan GUC parameter to control use of bitmap
plans.
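For example, one can compare plans with and without bitmap scans
(the table and condition here are hypothetical):
set enable_bitmapscan = off;
explain select * from tab where x = 1 or y = 2;
set enable_bitmapscan = on;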
scans, using in-memory tuple ID bitmaps as the intermediary. The planner
frontend (path creation and cost estimation) is not there yet, so none
of this code can be executed. I have tested it using some hacked planner
code that is far too ugly to see the light of day, however. Committing
now so that the bulk of the infrastructure changes go in before the tree
drifts under me.
into indexscans on matching indexes. For the moment, it only handles
int4 and text datatypes; next step is to add a column to pg_aggregate
so that all MIN/MAX aggregates can be handled. Per my recent proposal.
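As a sketch of the transformation (table and column hypothetical), a
query such as
select min(f1) from tab;
can now, given an index on f1, be planned along the lines of
select f1 from tab order by f1 limit 1;
so that a full scan plus aggregation is replaced by a single index probe.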
few palloc's. I also chose to eliminate the restype and restypmod fields
entirely, since they are redundant with information stored in the node's
contained expression; re-examining the expression at need seems simpler
and more reliable than trying to keep restype/restypmod up to date.
initdb forced due to change in contents of stored rules.
really ought to run before canonicalize_qual, because it can now produce
forms that canonicalize_qual knows how to improve (eg, NOT clauses).
Also, because eval_const_expressions already knows about flattening
nested ANDs and ORs into N-argument form, the initial flatten_andors
pass in canonicalize_qual is now completely redundant and can be
removed. This doesn't save a whole lot of code, but the time and
palloc traffic eliminated is a useful gain on large expression trees.
structs. There are many places in the planner where we were passing
both a rel and an index to subroutines, and now need only pass the
index struct. Notationally simpler, and perhaps a tad faster.
for boolean indexes. Previously we would only use such an index with
WHERE clauses like 'indexkey = true' or 'indexkey = false'. The new
code transforms the cases 'indexkey', 'NOT indexkey', 'indexkey IS TRUE',
and 'indexkey IS FALSE' into one of these. While this is only marginally
useful in itself, I intend soon to change constant-expression simplification
so that 'foo = true' and 'foo = false' are reduced to just 'foo' and
'NOT foo' ... which would lose the ability to use boolean indexes for
such queries at all, if the indexscan machinery couldn't make the
reverse transformation.
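A minimal illustration (names hypothetical):
create table t (flag bool, data text);
create index t_flag_idx on t (flag);
select * from t where flag;            -- treated as flag = true
select * from t where not flag;        -- treated as flag = false
select * from t where flag is true;    -- treated as flag = true
All of these can now consider t_flag_idx.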
grouping_planner() to preprocess_targetlist(), according to a comment
in grouping_planner(). I think the refactoring makes sense, and moves
some extraneous details out of grouping_planner().
Formerly, if such a clause contained no aggregate functions we mistakenly
treated it as equivalent to WHERE. Per spec it must cause the query to
be treated as a grouped query of a single group, the same as appearance
of aggregate functions would do. Also, the HAVING filter must execute
after aggregate function computation even if it itself contains no
aggregate functions.
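A minimal example of the distinction (table name hypothetical):
select 1 as one from tab having 1 < 2;
This now returns a single row, as a grouped query of a single group,
rather than one row per row of tab as the old WHERE-equivalent
treatment would have produced.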
look at the actual aggregate transition datatypes and the actual overhead
needed by nodeAgg.c, instead of using pessimistic round numbers.
Per a discussion with Michael Tiemann.
Also performed an initial pass at upgrading our Copyright dates to
extend to 2005. This first pass was very simple: change everything
matching both 'grep 1996-2004' and the word 'Copyright'. I scanned
through the generated list with 'less', before and after, to make sure
that I picked up only the right entries.
at the top level of the column's old default expression before adding
an implicit coercion to the new column type. This seems to satisfy the
principle of least surprise, as per discussion of bug #1290.
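A sketch of the behavior (names hypothetical):
create table t (f1 varchar(5) default 'abcde');
alter table t alter column f1 type varchar(10);
The stored default carries an implicit coercion to varchar(5) on top;
that top-level coercion is now stripped before the implicit coercion to
varchar(10) is added, instead of nesting one coercion inside the other.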
for scanning one term of an OR clause if the index's predicate is implied
by that same OR clause term (possibly in conjunction with top-level WHERE
clauses). Per recent example from Dawid Kuroczko,
http://archives.postgresql.org/pgsql-performance/2004-10/msg00095.php
Also, fix a very long-standing bug in index predicate testing, namely the
bizarre ordering of decomposition of predicate and restriction clauses.
AFAICS the correct way is to break down the predicate all the way, and
then for each component term see if you can prove it from the entire
restriction set. The original coding had a purely-implementation-artifact
distinction between ANDing at the top level and ANDing below that, and
proceeded to get the decomposition order wrong everywhere below the top
level, with the result that even slightly complicated AND/OR predicates
could not be proven. For instance, given
create index foop on foo(f2)
  where f1 = 42 or f1 = 1 or (f1 = 11 and f2 = 55);
the old code would fail to match this index to the query
select * from foo where f1 = 11 and f2 = 55;
when it obviously ought to match.
from Sebastian Böck. The fix involves being more consistent about
when rangetable entries are copied or modified. Someday we really
need to fix this stuff to not scribble on its input data structures
in the first place...
until Bind is received, so that actual parameter values are visible to the
planner. Make use of the parameter values for estimation purposes (but
don't fold them into the actual plan). This buys back most of the
potential loss of plan quality that ensues from using out-of-line
parameters instead of putting literal values right into the query text.
This patch creates a notion of constant-folding expressions 'for
estimation purposes only', in which case we can be more aggressive than
the normal eval_const_expressions() logic can be. Right now the only
difference in behavior is inserting bound values for Params, but it will
be interesting to look at other possibilities. One that we've seen
come up repeatedly is reducing now() and related functions to current
values, so that queries like ... WHERE timestampcol > now() - '1 day'
have some chance of being planned effectively.
Oliver Jowett, with some kibitzing from Tom Lane.
rather than allowing them only in a few special cases as before. In
particular you can now pass a ROW() construct to a function that accepts
a rowtype parameter. Internal generation of RowExprs fixes a number of
corner cases that used to not work very well, such as referencing the
whole-row result of a JOIN or subquery. This represents a further step in
the work I started a month or so back to make rowtype values into
first-class citizens.
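For instance (type and function hypothetical), the aim is to allow
usage along these lines:
create type complex_t as (r float8, i float8);
create function magsq(complex_t) returns float8
  as 'select $1.r * $1.r + $1.i * $1.i' language sql;
select magsq(row(3.0, 4.0));  -- ROW() passed to a rowtype parameter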
by the set operation, so that redundant sorts at higher levels can be
avoided. This was foreseen a good while back, but not done. Per request
from Karel Zak.
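For example (names hypothetical), in
select x from t1 union select x from t2 order by x;
the sort performed to implement UNION's duplicate elimination is now
known to the planner, so no second sort is needed to satisfy the
ORDER BY.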
corner cases that could stand improvement, but it does all the basic
stuff. A byproduct is that the selectivity routines are no longer
constrained to working on simple Vars; we might in future be able to
improve the behavior for subexpressions that don't match indexes.
that it's good to join where there are join clauses rather than where there
are not. Also enable it to generate bushy plans at need, so that it doesn't
fail in the presence of multiple IN clauses containing sub-joins. These
changes appear to improve the behavior enough that we can substantially reduce
the default pool size and generations count, thereby decreasing the runtime,
and yet get as good or better plans as we were getting in 7.4. Consequently,
adjust the default GEQO parameters. I also modified the way geqo_effort is
used so that it affects both population size and number of generations;
it's now useful as a single control to adjust the GEQO runtime-vs-plan-quality
tradeoff. Bump geqo_threshold to 12, since even with these changes GEQO
seems to be slower than the regular planner at 11 relations.
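A hypothetical session illustrating the knobs:
set geqo_threshold = 12;  -- use GEQO only for 12 or more relations
set geqo_effort = 8;      -- single control: scales both population size
                          -- and number of generations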
default value for geqo_effort is supposed to be 40, not 1. The actual
'genetic' component of the GEQO algorithm has been practically disabled
since 7.1 because of this mistake. Improve documentation while at it.
check instead of hardwiring assumptions that only certain plan node types
can appear at the places where we are testing. This was always a pretty
fragile assumption, and it turns out to be broken in 7.4 for certain cases
involving IN-subselect tests that need type coercion.
Also, modify code that builds finished Plan tree so that node types that
don't do projection always copy their input node's targetlist, rather than
having the tlist passed in from the caller. The old method makes it too
easy to write broken code that thinks it can modify the tlist when it
cannot.
with index qual clauses in the Path representation. This saves a little
work during createplan and (probably more importantly) allows reuse of
cached selectivity estimates during indexscan planning. Also fix latent
bug: wrong plan would have been generated for a 'special operator' used
in a nestloop-inner-indexscan join qual, because the special operator
would not have gotten into the list of quals to recheck. This bug is
only latent because at present the special-operator code could never
trigger on a join qual, but sooner or later someone will want to do it.
join conditions in which each OR subclause includes a constraint on
the same relation. This implements the other useful side-effect of
conversion to CNF format, without its unpleasant side-effects. As
per pghackers discussion of a few weeks ago.
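For example (names hypothetical), given
select * from a, b
where (a.x = 1 and b.y = 2) or (a.x = 3 and b.y = 4);
the restriction 'a.x = 1 or a.x = 3' can be extracted and applied at
the scan of a, without converting the whole qual to CNF.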
teaching the latter to accept either RestrictInfo nodes or bare
clause expressions; and cache the selectivity result in the RestrictInfo
node when possible. This extends the caching behavior of approx_selectivity
to many more contexts, and should reduce duplicate selectivity
calculations.
first time generate an OR indexscan for a two-column index when the WHERE
condition is like 'col1 = foo AND (col2 = bar OR col2 = baz)' --- before,
the OR had to be on the first column of the index or we'd not notice the
possibility of using it. Some progress towards extracting OR indexscans
from subclauses of an OR that references multiple relations, too, although
this code is #ifdef'd out because it needs more work.
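Concretely (names hypothetical):
create index ij on tab (col1, col2);
select * from tab where col1 = 'foo' and (col2 = 'bar' or col2 = 'baz');
This can now produce an OR indexscan on ij even though the OR is over
the index's second column.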
fields: now they are valid whenever the clause is a binary opclause,
not only when it is a potential join clause (there is a new boolean
field canjoin to signal the latter condition). This lets us avoid
recomputing the relid sets over and over while examining indexes.
Still more work to do to make this as useful as it could be, because
there are places that could use the info but don't have access to the
RestrictInfo node.
about whether it is applied before or after eval_const_expressions().
I believe there were some corner cases where the system would fail to
recognize that a partial index is applicable because of the previous
inconsistency. Store normal rather than 'implicit AND' representations
of constraints and index predicates in the catalogs.
initdb forced due to representation change of constraints/predicates.
sequence every time it's called is bogus --- it interferes with user
control over the seed, and actually decreases randomness overall
(because a seed based on time(NULL) is pretty predictable). If you really
want a reproducible result from geqo, do 'set seed = 0' before planning
a query.
node emits only those vars that are actually needed above it in the
plan tree. (There were comments in the code suggesting that this was
done at some point in the dim past, but for a long time we have just
made join nodes emit everything that either input emitted.) Aside from
being marginally more efficient, this fixes the problem noted by Peter
Eisentraut where a join above an IN-implemented-as-join might fail,
because the subplan targetlist constructed in the latter case didn't
meet the expectation of including everything.
Along the way, fix some places that were O(N^2) in the targetlist
length. This is not all the trouble spots for wide queries by any
means, but it's a step forward.
'scalar op ANY (array)' and 'scalar op ALL (array)', where the operator
is applied between the lefthand scalar and each element of the array.
The operator must yield boolean; the result of the construct is the OR
(for ANY) or AND (for ALL) of the per-element results.
Original coding by Joe Conway, after an idea of Peter's. Rewritten
by Tom to keep the implementation strictly separate from subqueries.
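For example:
select 5 = any (array[1, 5, 9]);   -- true: OR of the per-element results
select 5 < all (array[6, 7, 8]);   -- true: AND of the per-element results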