The SQL standard does not have general functions-in-FROM, but it does
allow UNNEST() there (see the <collection derived table> production),
and the semantics of that are defined to include lateral references.
So spec compliance requires allowing lateral references within UNNEST()
even without an explicit LATERAL keyword. Rather than making UNNEST()
a special case, it seems best to extend this flexibility to any
function-in-FROM. We'll still allow LATERAL to be written explicitly
for clarity's sake, but it's now a noise word in this context.
In theory this change could result in a change in behavior of existing
queries, by allowing what had been an outer reference in a function-in-FROM
to be captured by an earlier FROM-item at the same level. However, all
pre-9.3 PG releases have a bug that causes them to match variable
references to earlier FROM-items in preference to outer references (and
then throw an error). So no previously-working query could contain the
type of ambiguity that would risk a change of behavior.
Per a suggestion from Andrew Gierth, though I didn't use his patch.
Formerly we relied on checking after-the-fact to see if an expression
contained aggregates, window functions, or sub-selects when it shouldn't.
This is grotty, easily forgotten (indeed, we had forgotten to teach
DefineIndex about rejecting window functions), and none too efficient
since it requires extra traversals of the parse tree. To improve matters,
define an enum type that classifies all SQL sub-expressions, store it in
ParseState to show what kind of expression we are currently parsing, and
make transformAggregateCall, transformWindowFuncCall, and transformSubLink
check the expression type and throw error if the type indicates the
construct is disallowed. This allows removal of a large number of ad-hoc
checks scattered around the code base. The enum type is sufficiently
fine-grained that we can still produce error messages of at least the
same specificity as before.
Bringing these error checks together revealed that we'd been none too
consistent about phrasing of the error messages, so standardize the wording
a bit.
Also, rewrite checking of aggregate arguments so that it requires only one
traversal of the arguments, rather than up to three as before.
In passing, clean up some more comments left over from add_missing_from
support, and annotate some tests that I think are dead code now that that's
gone. (I didn't risk actually removing said dead code, though.)
Now that we are storing structs in these lists, the distinction between
the two lists can be represented with a couple of extra flags while using
only a single list. This simplifies the code and should save a little
bit of palloc traffic, since the majority of RTEs are represented in both
lists anyway.
This patch implements the standard syntax of LATERAL attached to a
sub-SELECT in FROM, and also allows LATERAL attached to a function in FROM,
since set-returning function calls are expected to be one of the principal
use-cases.
The main change here is a rewrite of the mechanism for keeping track of
which relations are visible for column references while the FROM clause is
being scanned. The parser "namespace" lists are no longer lists of bare
RTEs, but are lists of ParseNamespaceItem structs, which carry an RTE
pointer as well as some visibility-controlling flags. Aside from
supporting LATERAL correctly, this lets us get rid of the ancient hacks
that required rechecking subqueries and JOIN/ON and function-in-FROM
expressions for invalid references after they were initially parsed.
Invalid column references are now always correctly detected on sight.
In passing, remove assorted parser error checks that are now dead code by
virtue of our having gotten rid of add_missing_from, as well as some
comments that are obsolete for the same reason. (It was mainly
add_missing_from that caused so much fudging here in the first place.)
The planner support for this feature is very minimal, and will be improved
in future patches. It works well enough for testing purposes, though.
catversion bump forced due to new field in RangeTblEntry.
Fix loss of previous expression-simplification work when a transform
function fires: we must not simply revert to untransformed input tree.
Instead build a dummy FuncExpr node to pass to the transform function.
This has the additional advantage of providing a simpler, more uniform
API for transform functions.
Move documentation to a somewhat less buried spot, relocate some
poorly-placed code, be more wary of null constants and invalid typmod
values, add an opr_sanity check on protransform function signatures,
and some other minor cosmetic adjustments.
Note: although this patch touches pg_proc.h, no need for catversion
bump, because the changes are cosmetic and don't actually change the
intended catalog contents.
Making this operation look like a utility statement seems generally a good
idea, and particularly so in light of the desire to provide command
triggers for utility statements. The original choice of representing it as
SELECT with an IntoClause appendage had metastasized into rather a lot of
places, unfortunately, so that this patch is a great deal more complicated
than one might at first expect.
In particular, keeping EXPLAIN working for SELECT INTO and CREATE TABLE AS
subcommands required restructuring some EXPLAIN-related APIs. Add-on code
that calls ExplainOnePlan or ExplainOneUtility, or uses
ExplainOneQuery_hook, will need adjustment.
Also, the cases PREPARE ... SELECT INTO and CREATE RULE ... SELECT INTO,
which formerly were accepted though undocumented, are no longer accepted.
The PREPARE case can be replaced with use of CREATE TABLE AS EXECUTE.
The CREATE RULE case doesn't seem to have much real-world use (since the
rule would work only once before failing with "table already exists"),
so we'll not bother with that one.
Both SELECT INTO and CREATE TABLE AS still return a command tag of
"SELECT nnnn". There was some discussion of returning "CREATE TABLE nnnn",
but for the moment backwards compatibility wins the day.
Andres Freund and Tom Lane
walsender.h should depend on xlog.h, not vice versa. (Actually, the
inclusion was circular until a couple hours ago, which was even sillier;
but Bruce broke it in the expedient rather than logically correct
direction.) Because of that poor decision, plus blind application of
pgrminclude, we had a situation where half the system was depending on
xlog.h to include such unrelated stuff as array.h and guc.h. Clean up
the header inclusion, and manually revert a lot of what pgrminclude had
done so things build again.
This episode reinforces my feeling that pgrminclude should not be run
without adult supervision. Inclusion changes in header files in particular
need to be reviewed with great care. More generally, it'd be good if we
had a clearer notion of module layering to dictate which headers can sanely
include which others ... but that's a big task for another day.
Initially, we use this only to eliminate calls to the varchar()
function in cases where the length is not being reduced and, therefore,
the function call is equivalent to a RelabelType operation. The most
significant effect of this is that we can avoid a table rewrite when
changing a varchar(X) column to a varchar(Y) column, where Y > X.
Noah Misch, reviewed by me and Alexey Klyukin
All expression nodes now have an explicit output-collation field, unless
they are known to only return a noncollatable data type (such as boolean
or record). Also, nodes that can invoke collation-aware functions store
a separate field that is the collation value to pass to the function.
This avoids confusion that arises when a function has collatable inputs
and noncollatable output type, or vice versa.
Also, replace the parser's on-the-fly collation assignment method with
a post-pass over the completed expression tree. This allows us to use
a more complex (and hopefully more nearly spec-compliant) assignment
rule without paying for it in extra storage in every expression node.
Fix assorted bugs in the planner's handling of collations by making
collation one of the defining properties of an EquivalenceClass and
by converting CollateExprs into discardable RelabelType nodes during
expression preprocessing.
This patch implements data-modifying WITH queries according to the
semantics that the updates all happen with the same command counter value,
and in an unspecified order. Therefore one WITH clause can't see the
effects of another, nor can the outer query see the effects other than
through the RETURNING values. And attempts to do conflicting updates will
have unpredictable results. We'll need to document all that.
This commit just fixes the code; documentation updates are waiting on
author.
Marko Tiikkaja and Hitoshi Harada
This adds collation support for columns and domains, a COLLATE clause
to override it per expression, and B-tree index support.
Peter Eisentraut
reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch
The core of this patch is hash_array() and associated typcache
infrastructure, which works just about exactly like the existing support
for array comparison.
In addition I did some work to ensure that the planner won't think that an
array type is hashable unless its element type is hashable, and similarly
for sorting. This includes adding a datatype parameter to op_hashjoinable
and op_mergejoinable, and adding an explicit "hashable" flag to
SortGroupClause. The lack of a cross-check on the element type was a
pre-existing bug in mergejoin support --- but it didn't matter so much
before, because if you couldn't sort the element type there wasn't any good
alternative to failing anyhow. Now that we have the alternative of hashing
the array type, there are cases where we can avoid a failure by being picky
at the planner stage, so it's time to be picky.
The issue of exactly how to combine the per-element hash values to produce
an array hash is still open for discussion, but the rest of this is pretty
solid, so I'll commit it as-is.
any implicit casting previously applied to the targetlist item. This is
reasonable because the implicit cast, by definition, wasn't written by the
user; so we are preserving the expected behavior that ORDER BY items match
textually equivalent tlist items. The case never arose before because there
couldn't be any implicit casting of a top-level SELECT item before we process
ORDER BY etc. But now it can arise in the context of aggregates containing
ORDER BY clauses, since the "targetlist" is the already-casted list of
arguments for the aggregate. The net effect is that the datatype used for
ORDER BY/DISTINCT purposes is the aggregate's declared input type, not that
of the original input column; which is a bit debatable but not horrendous,
and to do otherwise would require major rework that doesn't seem justified.
Per bug #5564 from Daniel Grace. Back-patch to 9.0 where aggregate ORDER BY
was implemented.
This patch allows the frame to start from CURRENT ROW (in either RANGE or
ROWS mode), and it also adds support for ROWS n PRECEDING and ROWS n FOLLOWING
start and end points. (RANGE value PRECEDING/FOLLOWING isn't there yet ---
the grammar works, but that's all.)
Hitoshi Harada, reviewed by Pavel Stehule
of shared or nailed system catalogs. This has two key benefits:
* The new CLUSTER-based VACUUM FULL can be applied safely to all catalogs.
* We no longer have to use an unsafe reindex-in-place approach for reindexing
shared catalogs.
CLUSTER on nailed catalogs now works too, although I left it disabled on
shared catalogs because the resulting pg_index.indisclustered update would
only be visible in one database.
Since reindexing shared system catalogs is now fully transactional and
crash-safe, the former special cases in REINDEX behavior have been removed;
shared catalogs are treated the same as non-shared.
This commit does not do anything about the recently-discussed problem of
deadlocks between VACUUM FULL/CLUSTER on a system catalog and other
concurrent queries; will address that in a separate patch. As a stopgap,
parallel_schedule has been tweaked to run vacuum.sql by itself, to avoid
such failures during the regression tests.
non-kluge method for controlling the order in which values are fed to an
aggregate function. At the same time eliminate the old implementation
restriction that DISTINCT was only supported for single-argument aggregates.
Possibly release-notable behavioral change: formerly, agg(DISTINCT x)
dropped null values of x unconditionally. Now, it does so only if the
agg transition function is strict; otherwise nulls are treated as DISTINCT
normally would, ie, you get one copy.
Andrew Gierth, reviewed by Hitoshi Harada
for example in
WITH w AS (SELECT * FROM foo) SELECT * FROM w, bar ... FOR UPDATE
the FOR UPDATE will now affect bar but not foo. This is more useful and
consistent than the original 8.4 behavior, which tried to propagate FOR UPDATE
into the WITH query but always failed due to assorted implementation
restrictions. Even though we are in process of removing those restrictions,
it seems correct on philosophical grounds to not let the outer query's
FOR UPDATE affect the WITH query.
In passing, fix isLockedRel which frequently got things wrong in
nested-subquery cases: "FOR UPDATE OF foo" applies to an alias foo in the
current query level, not subqueries. This has been broken for a long time,
but it doesn't seem worth back-patching further than 8.4 because the actual
consequences are minimal. At worst the parser would sometimes get
RowShareLock on a relation when it should be AccessShareLock or vice versa.
That would only make a difference if someone were using ExclusiveLock
concurrently, which no standard operation does, and anyway FOR UPDATE
doesn't result in visible changes so it's not clear that the someone would
notice any problem. Between that and the fact that FOR UPDATE barely works
with subqueries at all in existing releases, I'm not excited about worrying
about it.
code was already okay with this, but the hack that obtained the output
column types of a recursive union in advance of doing real parse analysis
of the recursive union forgot to handle the case where there was an inner
WITH clause available to the non-recursive term. Best fix seems to be to
refactor so that we don't need the "throwaway" parse analysis step at all.
Instead, teach the transformSetOperationStmt code to set up the CTE's output
column information after it's processed the non-recursive term normally.
Per report from David Fetter.
so that their elements are always taken as simple expressions over the
query's input columns. It originally seemed like a good idea to make them
act exactly like GROUP BY and ORDER BY, right down to the SQL92-era behavior
of accepting output column names or numbers. However, that was not such a
great idea, for two reasons:
1. It permits circular references, as exhibited in bug #5018: the output
column could be the one containing the window function itself. (We actually
had a regression test case illustrating this, but nobody thought twice about
how confusing that would be.)
2. It doesn't seem like a good idea for, eg, "lead(foo) OVER (ORDER BY foo)"
to potentially use two completely different meanings for "foo".
Accordingly, narrow down the behavior of window clauses to use only the
SQL99-compliant interpretation that the expressions are simple expressions.
This alters various incidental uses of C++ key words to use other similar
identifiers, so that a C++ compiler won't choke outright. You still
(probably) need extern "C" { }; around the inclusion of backend headers.
based on a patch by Kurt Harriman <harriman@acm.org>
Also add a script cpluspluscheck to check for C++ compatibility in the
future. As of right now, this passes without error for me.
of adding optional namespace and action fields to DefElem. Having three
node types that do essentially the same thing bloats the code and leads
to errors of confusion, such as in yesterday's bug report from Khee Chin.
qualifier, and add support for this in pg_dump.
This allows TOAST tables to have user-defined fillfactor, and will also
enable us to move the autovacuum parameters to reloptions without taking
away the possibility of setting values for TOAST tables.
patch. This includes the ability to force the frame to cover the whole
partition, and the ability to make the frame end exactly on the current row
rather than its last ORDER BY peer. Supporting any more of the full SQL
frame-clause syntax will require nontrivial hacking on the window aggregate
code, so it'll have to wait for 8.5 or beyond.
well as regular tables. Per discussion, this seems necessary to meet the
principle of least astonishment.
In passing, simplify the error messages in warnAutoRange(). Now that we
have parser error position info for these errors, it doesn't seem very
useful to word the error message differently depending on whether we are
inside a sub-select or not.
There are some unimplemented aspects: recursive queries must use UNION ALL
(should allow UNION too), and we don't have SEARCH or CYCLE clauses.
These might or might not get done for 8.4, but even without them it's a
pretty useful feature.
There are also a couple of small loose ends and definitional quibbles,
which I'll send a memo about to pgsql-hackers shortly. But let's land
the patch now so we can get on with other development.
Yoshiyuki Asaba, with lots of help from Tatsuo Ishii and Tom Lane
SELECT foo.*) so that it cannot be confused with a quoted identifier "*".
Instead create a separate node type A_Star to represent this notation.
Per pgsql-hackers discussion of 2007-Sep-27.
most node types used in expression trees (both before and after parse
analysis). This allows us to place an error cursor in many situations
where we formerly could not, because the information wasn't available
beyond the very first level of parse analysis. There's a fair amount
of work still to be done to persuade individual ereport() calls to actually
include an error location, but this gets the initdb-forcing part of the
work out of the way; and the situation is already markedly better than
before for complaints about unimplementable implicit casts, such as
CASE and UNION constructs with incompatible alternative data types.
Per my proposal of a few days ago.
into nodes/nodeFuncs, so as to reduce wanton cross-subsystem #includes inside
the backend. There's probably more that should be done along this line,
but this is a start anyway.
but seem like a separate patch since most of the remaining work is on the
executor side.) I took the opportunity to push selection of the grouping
operators for set operations into the parser where it belongs. Otherwise this
is just a small exercise in making prepunion.c consider both alternatives.
As with the recent DISTINCT patch, this means we can UNION on datatypes that
can hash but not sort, and it means that UNION without ORDER BY is no longer
certain to produce sorted output.
as methods for implementing the DISTINCT step. This eliminates the former
performance gap between DISTINCT and GROUP BY, and also makes it possible
to do SELECT DISTINCT on datatypes that only support hashing not sorting.
SELECT DISTINCT ON is still always implemented by sorting; it would take
executor changes to support hashing that, and it's not clear it's worth
the trouble.
This is a release-note-worthy incompatibility from previous PG versions,
since SELECT DISTINCT can no longer be counted on to deliver sorted output
without explicitly saying ORDER BY. (Anyone who can't cope with that
can consider turning off enable_hashagg.)
Several regression test queries needed to have ORDER BY added to preserve
stable output order. I fixed the ones that manifested here, but there
might be some other cases that show up on other platforms.
sorting. The infrastructure for this was all in place already; it's only
necessary to fix the planner to not assume that sorting is always an available
option.
as per my recent proposal:
1. Fold SortClause and GroupClause into a single node type SortGroupClause.
We were already relying on them to be struct-equivalent, so using two node
tags wasn't accomplishing much except to get in the way of comparing items
with equal().
2. Add an "eqop" field to SortGroupClause to carry the associated equality
operator. This is cheap for the parser to get at the same time it's looking
up the sort operator, and storing it eliminates the need for repeated
not-so-cheap lookups during planning. In future this will also let us
represent GROUP/DISTINCT operations on datatypes that have hash opclasses
but no btree opclasses (ie, they have equality but no natural sort order).
The previous representation simply didn't work for that, since its only
indicator of comparison semantics was a sort operator.
3. Add a hasDistinctOn boolean to struct Query to explicitly record whether
the distinctClause came from DISTINCT or DISTINCT ON. This allows removing
some complicated and not 100% bulletproof code that attempted to figure
that out from the distinctClause alone.
This patch doesn't in itself create any new capability, but it's necessary
infrastructure for future attempts to use hash-based grouping for DISTINCT
and UNION/INTERSECT/EXCEPT.
to represent DISTINCT or DISTINCT ON. This gets rid of a longstanding
annoyance that a view or rule using SELECT DISTINCT will be dumped out
with an overspecified ORDER BY list, and is one small step along the way
to decoupling DISTINCT and ORDER BY enough so that hash-based implementation
of DISTINCT will be possible. In passing, improve transformDistinctClause
so that it doesn't reject duplicate DISTINCT ON items, as was reported by
Steve Midgley a couple weeks ago.
corresponding struct definitions. This allows other headers to avoid including
certain highly-loaded headers such as rel.h and relscan.h, instead using just
relcache.h, heapam.h or genam.h, which are more lightweight and thus cause less
unnecessary dependencies.
This was probably protecting some implementation limitation when it was
put in, but as far as I can tell the planner and executor have no such
assumption anymore; the case seems to work fine. Per a gripe from
Grzegorz Jaskiewicz.