1
0
mirror of https://github.com/postgres/postgres.git synced 2025-07-18 17:42:25 +03:00
Commit Graph

546 Commits

Author SHA1 Message Date
31468d05d8 Dept of better ideas: refrain from creating the planner's placeholder_list
until vars are distributed to rels during query_planner() startup.  We don't
really need it before that, and not building it early has some advantages.
First, we don't need to put it through the various preprocessing steps, which
saves some cycles and eliminates the need for a number of routines to support
PlaceHolderInfo nodes at all.  Second, this means one less unused plan for any
sub-SELECT appearing in a placeholder's expression, since we don't build
placeholder_list until after sublink expansion is complete.
2008-10-22 20:17:52 +00:00
e6ae3b5dbf Add a concept of "placeholder" variables to the planner. These are variables
that represent some expression that we desire to compute below the top level
of the plan, and then let that value "bubble up" as though it were a plain
Var (ie, a column value).

The immediate application is to allow sub-selects to be flattened even when
they are below an outer join and have non-nullable output expressions.
Formerly we couldn't flatten because such an expression wouldn't properly
go to NULL when evaluated above the outer join.  Now, we wrap it in a
PlaceHolderVar and arrange for the actual evaluation to occur below the outer
join.  When the resulting Var bubbles up through the join, it will be set to
NULL if necessary, yielding the correct results.  This fixes a planner
limitation that's existed since 7.1.

In future we might want to use this mechanism to re-introduce some form of
Hellerstein's "expensive functions" optimization, ie place the evaluation of
an expensive function at the most suitable point in the plan tree.
2008-10-21 20:42:53 +00:00
44d5be0e53 Implement SQL-standard WITH clauses, including WITH RECURSIVE.
There are some unimplemented aspects: recursive queries must use UNION ALL
(should allow UNION too), and we don't have SEARCH or CYCLE clauses.
These might or might not get done for 8.4, but even without them it's a
pretty useful feature.

There are also a couple of small loose ends and definitional quibbles,
which I'll send a memo about to pgsql-hackers shortly.  But let's land
the patch now so we can get on with other development.

Yoshiyuki Asaba, with lots of help from Tatsuo Ishii and Tom Lane
2008-10-04 21:56:55 +00:00
ee33b95d9c Improve the plan cache invalidation mechanism to make it invalidate plans
when user-defined functions used in a plan are modified.  Also invalidate
plans when schemas, operators, or operator classes are modified; but for these
cases we just invalidate everything rather than tracking exact dependencies,
since these types of objects seldom change in a production database.

Tom Lane; loosely based on a patch by Martin Pihlak.
2008-09-09 18:58:09 +00:00
19e34b6239 Improve sublink pullup code to handle ANY/EXISTS sublinks that are at top
level of a JOIN/ON clause, not only at top level of WHERE.  (However, we
can't do this in an outer join's ON clause, unless the ANY/EXISTS refers
only to the nullable side of the outer join, so that it can effectively
be pushed down into the nullable side.)  Per request from Kevin Grittner.

In passing, fix a bug in the initial implementation of EXISTS pullup:
it would Assert if the EXIST's WHERE clause used a join alias variable.
Since we haven't yet flattened join aliases when this transformation
happens, it's necessary to include join relids in the computed set of
RHS relids.
2008-08-17 01:20:00 +00:00
e006a24ad1 Implement SEMI and ANTI joins in the planner and executor. (Semijoins replace
the old JOIN_IN code, but antijoins are new functionality.)  Teach the planner
to convert appropriate EXISTS and NOT EXISTS subqueries into semi and anti
joins respectively.  Also, LEFT JOINs with suitable upper-level IS NULL
filters are recognized as being anti joins.  Unify the InClauseInfo and
OuterJoinInfo infrastructure into "SpecialJoinInfo".  With that change,
it becomes possible to associate a SpecialJoinInfo with every join attempt,
which permits some cleanup of join selectivity estimation.  That needs to be
taken much further than this patch does, but the next step is to change the
API for oprjoin selectivity functions, which seems like material for a
separate patch.  So for the moment the output size estimates for semi and
especially anti joins are quite bogus.
2008-08-14 18:48:00 +00:00
2d1d96b1ce Teach the system how to use hashing for UNION. (INTERSECT/EXCEPT will follow,
but seem like a separate patch since most of the remaining work is on the
executor side.)  I took the opportunity to push selection of the grouping
operators for set operations into the parser where it belongs.  Otherwise this
is just a small exercise in making prepunion.c consider both alternatives.

As with the recent DISTINCT patch, this means we can UNION on datatypes that
can hash but not sort, and it means that UNION without ORDER BY is no longer
certain to produce sorted output.
2008-08-07 01:11:52 +00:00
c78248c91d Department of second thoughts: fix newly-added code in planner.c to make real
sure that DISTINCT ON does what it's supposed to, ie, sort by the full ORDER
BY list before unique-ifying.  The error seems masked in simple cases by the
fact that query_planner won't return query pathkeys that only partially match
the requested sort order, but I wouldn't want to bet that it couldn't be
exposed in some way or other.
2008-08-05 16:03:10 +00:00
be3b265c94 Improve SELECT DISTINCT to consider hash aggregation, as well as sort/uniq,
as methods for implementing the DISTINCT step.  This eliminates the former
performance gap between DISTINCT and GROUP BY, and also makes it possible
to do SELECT DISTINCT on datatypes that only support hashing not sorting.

SELECT DISTINCT ON is still always implemented by sorting; it would take
executor changes to support hashing that, and it's not clear it's worth
the trouble.

This is a release-note-worthy incompatibility from previous PG versions,
since SELECT DISTINCT can no longer be counted on to deliver sorted output
without explicitly saying ORDER BY.  (Anyone who can't cope with that
can consider turning off enable_hashagg.)

Several regression test queries needed to have ORDER BY added to preserve
stable output order.  I fixed the ones that manifested here, but there
might be some other cases that show up on other platforms.
2008-08-05 02:43:18 +00:00
ec73b56a31 Make GROUP BY work properly for datatypes that only support hashing and not
sorting.  The infrastructure for this was all in place already; it's only
necessary to fix the planner to not assume that sorting is always an available
option.
2008-08-03 19:10:52 +00:00
9511304752 Rearrange the querytree representation of ORDER BY/GROUP BY/DISTINCT items
as per my recent proposal:

1. Fold SortClause and GroupClause into a single node type SortGroupClause.
We were already relying on them to be struct-equivalent, so using two node
tags wasn't accomplishing much except to get in the way of comparing items
with equal().

2. Add an "eqop" field to SortGroupClause to carry the associated equality
operator.  This is cheap for the parser to get at the same time it's looking
up the sort operator, and storing it eliminates the need for repeated
not-so-cheap lookups during planning.  In future this will also let us
represent GROUP/DISTINCT operations on datatypes that have hash opclasses
but no btree opclasses (ie, they have equality but no natural sort order).
The previous representation simply didn't work for that, since its only
indicator of comparison semantics was a sort operator.

3. Add a hasDistinctOn boolean to struct Query to explicitly record whether
the distinctClause came from DISTINCT or DISTINCT ON.  This allows removing
some complicated and not 100% bulletproof code that attempted to figure
that out from the distinctClause alone.

This patch doesn't in itself create any new capability, but it's necessary
infrastructure for future attempts to use hash-based grouping for DISTINCT
and UNION/INTERSECT/EXCEPT.
2008-08-02 21:32:01 +00:00
63247bec28 Fix parser so that we don't modify the user-written ORDER BY list in order
to represent DISTINCT or DISTINCT ON.  This gets rid of a longstanding
annoyance that a view or rule using SELECT DISTINCT will be dumped out
with an overspecified ORDER BY list, and is one small step along the way
to decoupling DISTINCT and ORDER BY enough so that hash-based implementation
of DISTINCT will be possible.  In passing, improve transformDistinctClause
so that it doesn't reject duplicate DISTINCT ON items, as was reported by
Steve Midgley a couple weeks ago.
2008-07-31 22:47:56 +00:00
eaf1b5d348 Tighten up SS_finalize_plan's computation of valid_params to exclude Params of
the current query level that aren't in fact output parameters of the current
initPlans.  (This means, for example, output parameters of regular subplans.)
To make this work correctly for output parameters coming from sibling
initplans requires rejiggering the API of SS_finalize_plan just a bit:
we need the siblings to be visible to it, rather than hidden as
SS_make_initplan_from_plan had been doing.  This is really part of my response
to bug #4290, but I concluded this part probably shouldn't be back-patched,
since all that it's doing is to make a debugging cross-check tighter.
2008-07-10 02:14:03 +00:00
db147b3483 Allow the planner's estimate of the fraction of a cursor's rows that will be
retrieved to be controlled through a GUC variable.

Robert Hell
2008-05-02 21:26:10 +00:00
25e46a504b Fix a couple of oversights associated with the "physical tlist" optimization:
we had several code paths where a physical tlist could be used for the input
to a Sort node, which is a dumb idea because any unneeded table columns will
increase the volume of data the sort has to push around.

(Unfortunately the easy-looking fix of calling disuse_physical_tlist during
make_sort_xxx doesn't work because in most cases we're already committed to
the current input tlist --- it's been marked with sort column numbers, or
we've built grouping column numbers using it, etc.  The tlist has to be
selected properly at the calling level before we start constructing sort-col
information.  This is easy enough to do, we were just failing to take the
point into consideration.)

Back-patch to 8.3.  I believe the problem probably exists clear back to 7.4
when the physical tlist optimization was added, but I'm afraid to back-patch
further than 8.3 without a great deal more study than I want to put into it.
The code in this area has drifted a lot over time.  The real-world importance
of these code paths is uncertain anyway --- I think in many cases we'd
probably prefer hash-based methods.
2008-04-17 21:22:14 +00:00
6b73d7e567 Fix an oversight I made in a cleanup patch over a year ago:
eval_const_expressions needs to be passed the PlannerInfo ("root") structure,
because in some cases we want it to substitute values for Param nodes.
(So "constant" is not so constant as all that ...)  This mistake partially
disabled optimization of unnamed extended-Query statements in 8.3: in
particular the LIKE-to-indexscan optimization would never be applied if the
LIKE pattern was passed as a parameter, and constraint exclusion depending
on a parameter value didn't work either.
2008-04-01 00:48:33 +00:00
6fc9d4272c Revert my erroneous fix for Taiki Yamaguchi's DISTINCT MAX() bug.
Whatever we do about that, this isn't the path to the solution.
2008-03-29 00:15:28 +00:00
2e4094dad8 Department of second thoughts: the rule that ORDER BY and DISTINCT are
useless for an ungrouped-aggregate query holds regardless of whether
optimize_minmax_aggregates succeeds.  So we might as well apply the
optimization in any case.

I'll leave 8.3 as it was, since this version is a tad more invasive
than my earlier patch.
2008-03-28 02:00:11 +00:00
ff72280c9e When we have successfully optimized a MIN or MAX aggregate into an indexscan,
the query result must be exactly one row (since we don't do this when there's
any GROUP BY).  Therefore any ORDER BY or DISTINCT attached to the query is
useless and can be dropped.  Aside from saving useless cycles, this protects
us against problems with matching the hacked-up tlist entries to sort clauses,
as seen in a bug report from Taiki Yamaguchi.  We might need to work harder
if we ever try to optimize grouped queries with this approach, but this
solution will do for now.
2008-03-27 19:06:14 +00:00
0d49838df6 Arrange to "inline" SQL functions that appear in a query's FROM clause,
are declared to return set, and consist of just a single SELECT.  We
can replace the FROM-item with a sub-SELECT and then optimize much as
if we were dealing with a view.  Patch from Richard Rowell, cleaned up
by me.
2008-03-18 22:04:14 +00:00
9098ab9e32 Update copyrights in source tree to 2008. 2008-01-01 19:46:01 +00:00
f6e8730d11 Re-run pgindent with updated list of typedefs. (Updated README should
avoid this problem in the future.)
2007-11-15 22:25:18 +00:00
fdf5a5efb7 pgindent run for 8.3. 2007-11-15 21:14:46 +00:00
82d8ab6fc4 Fix the plan-invalidation mechanism to treat regclass constants that refer to
a relation as a reason to invalidate a plan when the relation changes.  This
handles scenarios such as dropping/recreating a sequence that is referenced by
nextval('seq') in a cached plan.  Rather than teach plancache.c all about
digging through plan trees to find regclass Consts, we charge the planner's
setrefs.c with making a list of the relation OIDs on which each plan depends.
That way the list can be built cheaply during a plan tree traversal that has
to happen anyway.  Per bug #3662 and subsequent discussion.
2007-10-11 18:05:27 +00:00
282d2a03dd HOT updates. When we update a tuple without changing any of its indexed
columns, and the new version can be stored on the same heap page, we no longer
generate extra index entries for the new version.  Instead, index searches
follow the HOT-chain links to ensure they find the correct tuple version.

In addition, this patch introduces the ability to "prune" dead tuples on a
per-page basis, without having to do a complete VACUUM pass to recover space.
VACUUM is still needed to clean up dead index entries, however.

Pavan Deolasee, with help from a bunch of other people.
2007-09-20 17:56:33 +00:00
cadb78330e Repair two constraint-exclusion corner cases triggered by proving that an
inheritance child of an UPDATE/DELETE target relation can be excluded by
constraints.  I had rearranged some code in set_append_rel_pathlist() to
avoid "useless" work when a child is excluded, but overdid it and left
the child with no cheapest_path entry, causing possible failure later
if the appendrel was involved in a join.  Also, it seems that the dummy
plan generated by inheritance_planner() when all branches are excluded
has to be a bit less dummy now than was required in 8.2.
Per report from Jan Wieck.  Add his test case to the regression tests.
2007-05-26 18:23:02 +00:00
604ffd280b Create hooks to let a loadable plugin monitor (or even replace) the planner
and/or create plans for hypothetical situations; in particular, investigate
plans that would be generated using hypothetical indexes.  This is a
heavily-rewritten version of the hooks proposed by Gurjeet Singh for his
Index Advisor project.  In this formulation, the index advisor can be
entirely a loadable module instead of requiring a significant part to be
in the core backend, and plans can be generated for hypothetical indexes
without requiring the creation and rolling-back of system catalog entries.

The index advisor patch as-submitted is not compatible with these hooks,
but it needs significant work anyway due to other 8.2-to-8.3 planner
changes.  With these hooks in the core backend, development of the advisor
can proceed as a pgfoundry project.
2007-05-25 17:54:25 +00:00
d26559dbf3 Teach tuplesort.c about "top N" sorting, in which only the first N tuples
need be returned.  We keep a heap of the current best N tuples and sift-up
new tuples into it as we scan the input.  For M input tuples this means
only about M*log(N) comparisons instead of M*log(M), not to mention a lot
less workspace when N is small --- avoiding spill-to-disk for large M
is actually the most attractive thing about it.  Patch includes planner
and executor support for invoking this facility in ORDER BY ... LIMIT
queries.  Greg Stark, with some editorialization by moi.
2007-05-04 01:13:45 +00:00
bbbe825f5f Modify processing of DECLARE CURSOR and EXPLAIN so that they can resolve the
types of unspecified parameters when submitted via extended query protocol.
This worked in 8.2 but I had broken it during plancache changes.  DECLARE
CURSOR is now treated almost exactly like a plain SELECT through parse
analysis, rewrite, and planning; only just before sending to the executor
do we divert it away to ProcessUtility.  This requires a special-case check
in a number of places, but practically all of them were already special-casing
SELECT INTO, so it's not too ugly.  (Maybe it would be a good idea to merge
the two by treating IntoClause as a form of utility statement?  Not going to
worry about that now, though.)  That approach doesn't work for EXPLAIN,
however, so for that I punted and used a klugy solution of running parse
analysis an extra time if under extended query protocol.
2007-04-27 22:05:49 +00:00
66888f7424 Expose more cursor-related functionality in SPI: specifically, allow
access to the planner's cursor-related planning options, and provide new
FETCH/MOVE routines that allow access to the full power of those commands.
Small refactoring of planner(), pg_plan_query(), and pg_plan_queries()
APIs to make it convenient to pass the planning options down from SPI.

This is the core-code portion of Pavel Stehule's patch for scrollable
cursor support in plpgsql; I'll review and apply the plpgsql changes
separately.
2007-04-16 01:14:58 +00:00
c7ff7663e4 Get rid of the separate EState for subplans, and just let them share the
parent query's EState.  Now that there's a single flat rangetable for both
the main plan and subplans, there's no need anymore for a separate EState,
and removing it allows cleaning up some crufty code in nodeSubplan.c and
nodeSubqueryscan.c.  Should be a tad faster too, although any difference
will probably be hard to measure.  This is the last bit of subsidiary
mop-up work from changing to a flat rangetable.
2007-02-27 01:11:26 +00:00
eab6b8b27e Turn the rangetable used by the executor into a flat list, and avoid storing
useless substructure for its RangeTblEntry nodes.  (I chose to keep using the
same struct node type and just zero out the link fields for unneeded info,
rather than making a separate ExecRangeTblEntry type --- it seemed too
fragile to have two different rangetable representations.)

Along the way, put subplans into a list in the toplevel PlannedStmt node,
and have SubPlan nodes refer to them by list index instead of direct pointers.
Vadim wanted to do that years ago, but I never understood what he was on about
until now.  It makes things a *whole* lot more robust, because we can stop
worrying about duplicate processing of subplans during expression tree
traversals.  That's been a constant source of bugs, and it's finally gone.

There are some consequent simplifications yet to be made, like not using
a separate EState for subplans in the executor, but I'll tackle that later.
2007-02-22 22:00:26 +00:00
9cbd0c155d Remove the Query structure from the executor's API. This allows us to stop
storing mostly-redundant Query trees in prepared statements, portals, etc.
To replace Query, a new node type called PlannedStmt is inserted by the
planner at the top of a completed plan tree; this carries just the fields of
Query that are still needed at runtime.  The statement lists kept in portals
etc. now consist of intermixed PlannedStmt and bare utility-statement nodes
--- no Query.  This incidentally allows us to remove some fields from Query
and Plan nodes that shouldn't have been there in the first place.

Still to do: simplify the execution-time range table; at the moment the
range table passed to the executor still contains Query trees for subqueries.

initdb forced due to change of stored rules.
2007-02-20 17:32:18 +00:00
7c5e5439d2 Get rid of some old and crufty global variables in the planner. When
this code was last gone over, there wasn't really any alternative to
globals because we didn't have the PlannerInfo struct being passed all
through the planner code.  Now that we do, we can restructure things
to avoid non-reentrancy.  I'm fooling with this because otherwise I'd
have had to add another global variable for the planned compact
range table list.
2007-02-19 07:03:34 +00:00
f41803bb39 Refactor planner's pathkeys data structure to create a separate, explicit
representation of equivalence classes of variables.  This is an extensive
rewrite, but it brings a number of benefits:
* planner no longer fails in the presence of "incomplete" operator families
that don't offer operators for every possible combination of datatypes.
* avoid generating and then discarding redundant equality clauses.
* remove bogus assumption that derived equalities always use operators
named "=".
* mergejoins can work with a variety of sort orders (e.g., descending) now,
instead of tying each mergejoinable operator to exactly one sort order.
* better recognition of redundant sort columns.
* can make use of equalities appearing underneath an outer join.
2007-01-20 20:45:41 +00:00
a191a169d6 Change the planner-to-executor API so that the planner tells the executor
which comparison operators to use for plan nodes involving tuple comparison
(Agg, Group, Unique, SetOp).  Formerly the executor looked up the default
equality operator for the datatype, which was really pretty shaky, since it's
possible that the data being fed to the node is sorted according to some
nondefault operator class that could have an incompatible idea of equality.
The planner knows what it has sorted by and therefore can provide the right
equality operator to use.  Also, this change moves a couple of catalog lookups
out of the executor and into the planner, which should help startup time for
pre-planned queries by some small amount.  Modify the planner to remove some
other cavalier assumptions about always being able to use the default
operators.  Also add "nulls first/last" info to the Plan node for a mergejoin
--- neither the executor nor the planner can cope yet, but at least the API is
in place.
2007-01-10 18:06:05 +00:00
29dccf5fe0 Update CVS HEAD for 2007 copyright. Back branches are typically not
back-stamped for this.
2007-01-05 22:20:05 +00:00
f99a569a2e pgindent run for 8.2. 2006-10-04 00:30:14 +00:00
7a3e30e608 Add INSERT/UPDATE/DELETE RETURNING, with basic docs and regression tests.
plpgsql support to come later.  Along the way, convert execMain's
SELECT INTO support into a DestReceiver, in order to eliminate some ugly
special cases.

Jonah Harris and Tom Lane
2006-08-12 02:52:06 +00:00
635d42e9c3 Fix inheritance_planner() to delete dummy subplans from its Append plan
list, when some of the child rels have been excluded by constraint
exclusion.  This doesn't save a huge amount of time but it'll save some,
and it makes the EXPLAIN output look saner.  We already did the
equivalent thing in set_append_rel_pathlist(), but not here.
2006-08-05 17:21:52 +00:00
9caafda579 Add support for multi-row VALUES clauses as part of INSERT statements
(e.g. "INSERT ... VALUES (...), (...), ...") and elsewhere as allowed
by the spec. (e.g. similar to a FROM clause subselect). initdb required.
Joe Conway and Tom Lane.
2006-08-02 01:59:48 +00:00
a998a69247 Code review for bigint-LIMIT patch. Fix missed planner dependency,
eliminate unnecessary code, force initdb because stored rules change
(limit nodes are now supposed to be int8 not int4 expressions).
Update comments and error messages, which still all said 'integer'.
2006-07-26 19:31:51 +00:00
085e559654 Change LIMIT/OFFSET to use int8
Dhanaraj M
2006-07-26 00:34:48 +00:00
e0522505bd Remove 576 references of include files that were not needed. 2006-07-14 14:52:27 +00:00
0ff3461bcc Alphabetically order reference to include files, "N" - "S". 2006-07-11 17:26:59 +00:00
cffd89ca73 Revise the planner's handling of "pseudoconstant" WHERE clauses, that is
clauses containing no variables and no volatile functions.  Such a clause
can be used as a one-time qual in a gating Result plan node, to suppress
plan execution entirely when it is false.  Even when the clause is true,
putting it in a gating node wins by avoiding repeated evaluation of the
clause.  In previous PG releases, query_planner() would do this for
pseudoconstant clauses appearing at the top level of the jointree, but
there was no ability to generate a gating Result deeper in the plan tree.
To fix it, get rid of the special case in query_planner(), and instead
process pseudoconstant clauses through the normal RestrictInfo qual
distribution mechanism.  When a pseudoconstant clause is found attached to
a path node in create_plan(), pull it out and generate a gating Result at
that point.  This requires special-casing pseudoconstants in selectivity
estimation and cost_qual_eval, but on the whole it's pretty clean.
It probably even makes the planner a bit faster than before for the normal
case of no pseudoconstants, since removing pull_constant_clauses saves one
useless traversal of the qual tree.  Per gripe from Phil Frost.
2006-07-01 18:38:33 +00:00
1c1ecd5124 Improve planner estimates for size of tuple hash tables. 2006-06-28 20:04:38 +00:00
f2f5b05655 Update copyright for 2006. Update scripts. 2006-03-05 15:59:11 +00:00
8b109ebf14 Teach planner to convert simple UNION ALL subqueries into append relations,
thereby sharing code with the inheritance case.  This puts the UNION-ALL-view
approach to partitioned tables on par with inheritance, so far as constraint
exclusion is concerned: it works either way.  (Still need to update the docs
to say so.)  The definition of "simple UNION ALL" is a little simpler than
I would like --- basically the union arms can only be SELECT * FROM foo
--- but it's good enough for partitioned-table cases.
2006-02-03 21:08:49 +00:00
8a1468af4e Restructure planner's handling of inheritance. Rather than processing
inheritance trees on-the-fly, which pretty well constrained us to considering
only one way of planning inheritance, expand inheritance sets during the
planner prep phase, and build a side data structure that can be consulted
later to find which RTEs are members of which inheritance sets.  As proof of
concept, use the data structure to plan joins against inheritance sets more
efficiently: we can now use indexes on the set members in inner-indexscan
joins.  (The generated plans could be improved further, but it'll take some
executor changes.)  This data structure will also support handling UNION ALL
subqueries in the same way as inheritance sets, but that aspect of it isn't
finished yet.
2006-01-31 21:39:25 +00:00