clauselist_selectivity skip some analysis that's useless when there's only
one clause in the given list. Actually this can win even for not-so-simple
queries, because we also apply clauselist_selectivity to sublists such as the
quals matching an index; which are likely to have only a single entry even
when the total query is quite complicated.
the number of rows likely to be produced by a query such as
SELECT * FROM t1 LEFT JOIN t2 USING (key) WHERE t2.key IS NULL;
What this is doing is selecting for t1 rows with no match in t2, and thus
it may produce a significant number of rows even if the t2.key table column
contains no nulls at all. 8.2 thinks the table column's null fraction is
relevant and thus may estimate no rows out, which results in terrible plans
if there are more joins above this one. A proper fix for this will involve
passing much more information about the context of a clause to the selectivity
estimator functions than we ever have. There's no time left to write such a
patch for 8.3, and it wouldn't be back-patchable into 8.2 anyway. Instead,
put in an ad-hoc test to defeat the normal table-stats-based estimation when
an IS NULL test is evaluated at an outer join, and just use a constant
estimate instead --- I went with 0.5 for lack of a better idea. This won't
catch every case but it will catch the typical ways of writing such queries,
and it seems unlikely to make things worse for other queries.
Along the way, allow FOR UPDATE in non-WITH-HOLD cursors; there may once
have been a reason to disallow that, but it seems to work now, and it's
really rather necessary if you want to select a row via a cursor and then
update it in a concurrent-safe fashion.
Original patch by Arul Shaji, rather heavily editorialized by Tom Lane.
are mostly excluded by constraints: do the CE test a bit earlier to save
some adjust_appendrel_attrs() work on excluded children, and arrange to
use array indexing rather than rt_fetch() to fetch RTEs in the main body
of the planner. The latter is something I'd wanted to do for awhile anyway,
but seeing list_nth_cell() as 35% of the runtime gets one's attention.
this code was last gone over, there wasn't really any alternative to
globals because we didn't have the PlannerInfo struct being passed all
through the planner code. Now that we do, we can restructure things
to avoid non-reentrancy. I'm fooling with this because otherwise I'd
have had to add another global variable for the planned compact
range table list.
clauses containing no variables and no volatile functions. Such a clause
can be used as a one-time qual in a gating Result plan node, to suppress
plan execution entirely when it is false. Even when the clause is true,
putting it in a gating node wins by avoiding repeated evaluation of the
clause. In previous PG releases, query_planner() would do this for
pseudoconstant clauses appearing at the top level of the jointree, but
there was no ability to generate a gating Result deeper in the plan tree.
To fix it, get rid of the special case in query_planner(), and instead
process pseudoconstant clauses through the normal RestrictInfo qual
distribution mechanism. When a pseudoconstant clause is found attached to
a path node in create_plan(), pull it out and generate a gating Result at
that point. This requires special-casing pseudoconstants in selectivity
estimation and cost_qual_eval, but on the whole it's pretty clean.
It probably even makes the planner a bit faster than before for the normal
case of no pseudoconstants, since removing pull_constant_clauses saves one
useless traversal of the qual tree. Per gripe from Phil Frost.
not likely ever to be implemented seeing it's been removed from SQL2003.
This allows getting rid of the 'filter' version of yylex() that we had in
parser.c, which should save at least a few microseconds in parsing.
qualification when the underlying operator is indexable and useOr is true.
That is, indexkey op ANY (ARRAY[...]) is effectively translated into an
OR combination of one indexscan for each array element. This only works
for bitmap index scans, of course, since regular indexscans no longer
support OR'ing of scans. There are still some loose ends to clean up
before changing 'x IN (list)' to translate as a ScalarArrayOpExpr;
for instance predtest.c ought to be taught about it. But this gets the
basic functionality in place.
A RestrictInfo representing an OR clause now contains two versions of
the contained expression, one with sub-RestrictInfos and one without.
clause_selectivity() should descend to the version with sub-RestrictInfos
so that it has a chance of caching its results for the OR's sub-clauses.
Failing to do so resulted in redundant planner effort.
a new PlannerInfo struct, which is passed around instead of the bare
Query in all the planning code. This commit is essentially just a
code-beautification exercise, but it does open the door to making
larger changes to the planner data structures without having to muck
with the widely-known Query struct.
Also performed an initial run through of upgrading our Copyright date to
extend to 2005 ... first run here was very simple ... change everything
where: grep 1996-2004 && the word 'Copyright' ... scanned through the
generated list with 'less' first, and after, to make sure that I only
picked up the right entries ...
estimates when combining the estimates for a range query. As pointed out
by Miquel van Smoorenburg, the existing check for an impossible combined
result would quite possibly fail to detect one default and one non-default
input. It seems better to use the default range query estimate in such
cases. To do so, add a check for an estimate of exactly DEFAULT_INEQ_SEL.
This is a bit ugly because it introduces additional coupling between
clauselist_selectivity and scalarltsel/scalargtsel, but it's not like
there wasn't plenty already...
until Bind is received, so that actual parameter values are visible to the
planner. Make use of the parameter values for estimation purposes (but
don't fold them into the actual plan). This buys back most of the
potential loss of plan quality that ensues from using out-of-line
parameters instead of putting literal values right into the query text.
This patch creates a notion of constant-folding expressions 'for
estimation purposes only', in which case we can be more aggressive than
the normal eval_const_expressions() logic can be. Right now the only
difference in behavior is inserting bound values for Params, but it will
be interesting to look at other possibilities. One that we've seen
come up repeatedly is reducing now() and related functions to current
values, so that queries like ... WHERE timestampcol > now() - '1 day'
have some chance of being planned effectively.
Oliver Jowett, with some kibitzing from Tom Lane.
In the past, we used a 'Lispy' linked list implementation: a "list" was
merely a pointer to the head node of the list. The problem with that
design is that it makes lappend() and length() linear time. This patch
fixes that problem (and others) by maintaining a count of the list
length and a pointer to the tail node along with each head node pointer.
A "list" is now a pointer to a structure containing some meta-data
about the list; the head and tail pointers in that structure refer
to ListCell structures that maintain the actual linked list of nodes.
The function names of the list API have also been changed to, I hope,
be more logically consistent. By default, the old function names are
still available; they will be disabled-by-default once the rest of
the tree has been updated to use the new API names.
rather than allowing them only in a few special cases as before. In
particular you can now pass a ROW() construct to a function that accepts
a rowtype parameter. Internal generation of RowExprs fixes a number of
corner cases that used to not work very well, such as referencing the
whole-row result of a JOIN or subquery. This represents a further step in
the work I started a month or so back to make rowtype values into
first-class citizens.
teaching the latter to accept either RestrictInfo nodes or bare
clause expressions; and cache the selectivity result in the RestrictInfo
node when possible. This extends the caching behavior of approx_selectivity
to many more contexts, and should reduce duplicate selectivity
calculations.
'scalar op ALL (array)', where the operator is applied between the
lefthand scalar and each element of the array. The operator must
yield boolean; the result of the construct is the OR or AND of the
per-element results, respectively.
Original coding by Joe Conway, after an idea of Peter's. Rewritten
by Tom to keep the implementation strictly separate from subqueries.
startup, not in the parser; this allows ALTER DOMAIN to work correctly
with domain constraint operations stored in rules. Rod Taylor;
code review by Tom Lane.
passed to join selectivity estimators. Make use of this in eqjoinsel
to derive non-bogus selectivity for IN clauses. Further tweaking of
cost estimation for IN.
initdb forced because of pg_proc.h changes.
containing a volatile function), rather than only on 'Var = Var' clauses
as before. This makes it practical to do flatten_join_alias_vars at the
start of planning, which in turn eliminates a bunch of klugery inside the
planner to deal with alias vars. As a free side effect, we now detect
implied equality of non-Var expressions; for example in
SELECT ... WHERE a.x = b.y and b.y = 42
we will deduce a.x = 42 and use that as a restriction qual on a. Also,
we can remove the restriction introduced 12/5/02 to prevent pullup of
subqueries whose targetlists contain sublinks.
Still TODO: make statistical estimation routines in selfuncs.c and costsize.c
smarter about expressions that are more complex than plain Vars. The need
for this is considerably greater now that we have to be able to estimate
the suitability of merge and hash join techniques on such expressions.
so that all executable expression nodes inherit from a common supertype
Expr. This is somewhat of an exercise in code purity rather than any
real functional advance, but getting rid of the extra Oper or Func node
formerly used in each operator or function call should provide at least
a little space and speed improvement.
initdb forced by changes in stored-rules representation.
Ray Ontko 28-June-02. Also, fix prefix_selectivity for NAME lefthand
variables (it was bogusly assuming binary compatibility), and adjust
make_greater_string() to not call pg_mbcliplen() with invalid multibyte
data (this last per bug report that I can't find at the moment, but it
was in July '02).
some kibitzing from Tom Lane. Not everything works yet, and there's
no documentation or regression test, but let's commit this so Joe
doesn't need to cope with tracking changes in so many files ...
o Change all current CVS messages of NOTICE to WARNING. We were going
to do this just before 7.3 beta but it has to be done now, as you will
see below.
o Change current INFO messages that should be controlled by
client_min_messages to NOTICE.
o Force remaining INFO messages, like from EXPLAIN, VACUUM VERBOSE, etc.
to always go to the client.
o Remove INFO from the client_min_messages options and add NOTICE.
Seems we do need three non-ERROR elog levels to handle the various
behaviors we need for these messages.
Regression passed.
IS TRUE, etc, with some degree of verisimilitude. Split out
selectivity support functions from builtins.h into a new header
file selfuncs.h, so as to reduce the number of header files builtins.h
must depend on. Fix a few missing inclusions exposed thereby.
From Joe Conway, with some kibitzing from Tom Lane.
of costsize.c routines to pass Query root, so that costsize can figure
more things out by itself and not be so dependent on its callers to tell
it everything it needs to know. Use selectivity of hash or merge clause
to estimate number of tuples processed internally in these joins
(this is more useful than it would've been before, since eqjoinsel is
somewhat more accurate than before).
create_index_paths are not immediately discarded, but are available for
subsequent planner work. This allows avoiding redundant syscache lookups
in several places. Change interface to operator selectivity estimation
procedures to allow faster and more flexible estimation.
Initdb forced due to change of pg_proc entries for selectivity functions!
right thing with variable-free clauses that contain noncachable functions,
such as 'WHERE random() < 0.5' --- these are evaluated once per
potential output tuple. Expressions that contain only Params are
now candidates to be indexscan quals --- for example, 'var = ($1 + 1)'
can now be indexed. Cope with RelabelType nodes atop potential indexscan
variables --- this oversight prevents 7.0.* from recognizing some
potentially indexscanable situations.