This patch adds the server infrastructure to support extensions.
There is still one significant loose end, namely how to make it play nice
with pg_upgrade, so I am not yet committing the changes that would make
all the contrib modules depend on this feature.
In passing, fix a disturbingly large amount of breakage in
AlterObjectNamespace() and callers.
Dimitri Fontaine, reviewed by Anssi Kääriäinen,
Itagaki Takahiro, Tom Lane, and numerous others
This adds collation support for columns and domains, a COLLATE clause
to override it per expression, and B-tree index support.
Peter Eisentraut
reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch
This feature allows a unique or pkey constraint to be created using an
already-existing unique index. While the constraint isn't very
functionally different from the bare index, it's nice to be able to do that
for documentation purposes. The main advantage over just issuing a plain
ALTER TABLE ADD UNIQUE/PRIMARY KEY is that the index can be created with
CREATE INDEX CONCURRENTLY, so that there is not a long interval where the
table is locked against updates.
On the way, refactor some of the code in DefineIndex() and index_create()
so that we don't have to pass through those functions in order to create
the index constraint's catalog entries. Also, in parse_utilcmd.c, pass
around the ParseState pointer in struct CreateStmtContext to save on
notation, and add error location pointers to some error reports that didn't
have one before.
Gurjeet Singh, reviewed by Steve Singer and Tom Lane
In an inherited UPDATE/DELETE, each target table has its own subplan,
because it might have a column set different from other targets. This
means that the resjunk columns we add to support EvalPlanQual might be
at different physical column numbers in each subplan. The EvalPlanQual
rewrite I did for 9.0 failed to account for this, resulting in possible
misbehavior or even crashes during concurrent updates to the same row,
as seen in a recent report from Gordon Shannon. Revise the data structure
so that we track resjunk column numbers separately for each subplan.
I also chose to move responsibility for identifying the physical column
numbers back to executor startup, instead of assuming that numbers derived
during preprocess_targetlist would stay valid throughout subsequent
massaging of the plan. That's a bit slower, so we might want to consider
undoing it someday; but it would complicate the patch considerably and
didn't seem justifiable in a bug fix that has to be back-patched to 9.0.
Foreign tables are a core component of SQL/MED. This commit does
not provide a working SQL/MED infrastructure, because foreign tables
cannot yet be queried. Support for foreign table scans will need to
be added in a future patch. However, this patch creates the necessary
system catalog structure, syntax support, and support for ancillary
operations such as COMMENT and SECURITY LABEL.
Shigeru Hanada, heavily revised by Robert Haas
This commit replaces pg_class.relistemp with pg_class.relpersistence;
and also modifies the RangeVar node type to carry relpersistence rather
than istemp. It also removes removes rd_istemp from RelationData and
instead performs the correct computation based on relpersistence.
For clarity, we add three new macros: RelationNeedsWAL(),
RelationUsesLocalBuffers(), and RelationUsesTempNamespace(), so that we
can clarify the purpose of each check that previous depended on
rd_istemp.
This is intended as infrastructure for the upcoming unlogged tables
patch, as well as for future possible work on global temporary tables.
There are some code paths, such as SPI_execute(), where we invoke
copyObject() on raw parse trees before doing parse analysis on them. Since
the bison grammar is capable of building heavily nested parsetrees while
itself using only minimal stack depth, this means that copyObject() can be
the front-line function that hits stack overflow before anything else does.
Accordingly, it had better have a check_stack_depth() call. I did a bit of
performance testing and found that this slows down copyObject() by only a
few percent, so the hit ought to be negligible in the context of complete
processing of a query.
Per off-list report from Toshihide Katayama. Back-patch to all supported
branches.
This is a heavily revised version of builtin_knngist_core-0.9. The
ordering operators are no longer mixed in with actual quals, which would
have confused not only humans but significant parts of the planner.
Instead, ordering operators are carried separately throughout planning and
execution.
Since the API for ambeginscan and amrescan functions had to be changed
anyway, this commit takes the opportunity to rationalize that a bit.
RelationGetIndexScan no longer forces a premature index_rescan call;
instead, callers of index_beginscan must call index_rescan too. Aside from
making the AM-side initialization logic a bit less peculiar, this has the
advantage that we do not make a useless extra am_rescan call when there are
runtime key values. AMs formerly could not assume that the key values
passed to amrescan were actually valid; now they can.
Teodor Sigaev and Tom Lane
This commit adds columns amoppurpose and amopsortfamily to pg_amop, and
column amcanorderbyop to pg_am. For the moment all the entries in
amcanorderbyop are "false", since the underlying support isn't there yet.
Also, extend the CREATE OPERATOR CLASS/ALTER OPERATOR FAMILY commands with
[ FOR SEARCH | FOR ORDER BY sort_operator_family ] clauses to allow the new
columns of pg_amop to be populated, and create pg_dump support for dumping
that information.
I also added some documentation, although it's perhaps a bit premature
given that the feature doesn't do anything useful yet.
Teodor Sigaev, Robert Haas, Tom Lane
Per my recent proposal, get rid of all the direct inspection of indexes
and manual generation of paths in planagg.c. Instead, set up
EquivalenceClasses for the aggregate argument expressions, and let the
regular path generation logic deal with creating paths that can satisfy
those sort orders. This makes planagg.c a bit more visible to the rest
of the planner than it was originally, but the approach is basically a lot
cleaner than before. A major advantage of doing it this way is that we get
MIN/MAX optimization on inheritance trees (using MergeAppend of indexscans)
practically for free, whereas in the old way we'd have had to add a whole
lot more duplicative logic.
One small disadvantage of this approach is that MIN/MAX aggregates can no
longer exploit partial indexes having an "x IS NOT NULL" predicate, unless
that restriction or something that implies it is specified in the query.
The previous implementation was able to use the added "x IS NOT NULL"
condition as an extra predicate proof condition, but in this version we
rely entirely on indexes that are considered usable by the main planning
process. That seems a fair tradeoff for the simplicity and functionality
gained.
The core of this patch is hash_array() and associated typcache
infrastructure, which works just about exactly like the existing support
for array comparison.
In addition I did some work to ensure that the planner won't think that an
array type is hashable unless its element type is hashable, and similarly
for sorting. This includes adding a datatype parameter to op_hashjoinable
and op_mergejoinable, and adding an explicit "hashable" flag to
SortGroupClause. The lack of a cross-check on the element type was a
pre-existing bug in mergejoin support --- but it didn't matter so much
before, because if you couldn't sort the element type there wasn't any good
alternative to failing anyhow. Now that we have the alternative of hashing
the array type, there are cases where we can avoid a failure by being picky
at the planner stage, so it's time to be picky.
The issue of exactly how to combine the per-element hash values to produce
an array hash is still open for discussion, but the rest of this is pretty
solid, so I'll commit it as-is.
After much expenditure of effort, we've got this to the point where the
performance penalty is pretty minimal in typical cases.
Andrew Dunstan, reviewed by Brendan Jurd, Dean Rasheed, and Tom Lane
This is not the hoped-for facility of using INSERT/UPDATE/DELETE inside
a WITH, but rather the other way around. It seems useful in its own
right anyway.
Note: catversion bumped because, although the contents of stored rules
might look compatible, there's actually a subtle semantic change.
A single Query containing a WITH and INSERT...VALUES now represents
writing the WITH before the INSERT, not before the VALUES. While it's
not clear that that matters to anyone, it seems like a good idea to
have it cited in the git history for catversion.h.
Original patch by Marko Tiikkaja, with updating and cleanup by
Hitoshi Harada.
This patch eliminates the former need to sort the output of an Append scan
when an ordered scan of an inheritance tree is wanted. This should be
particularly useful for fast-start cases such as queries with LIMIT.
Original patch by Greg Stark, with further hacking by Hans-Jurgen Schonig,
Robert Haas, and Tom Lane.
This patch adds the SQL-standard concept of an INSTEAD OF trigger, which
is fired instead of performing a physical insert/update/delete. The
trigger function is passed the entire old and/or new rows of the view,
and must figure out what to do to the underlying tables to implement
the update. So this feature can be used to implement updatable views
using trigger programming style rather than rule hacking.
In passing, this patch corrects the names of some columns in the
information_schema.triggers view. It seems the SQL committee renamed
them somewhere between SQL:99 and SQL:2003.
Dean Rasheed, reviewed by Bernd Helmle; some additional hacking by me.
The point of a PlaceHolderVar is to allow a non-strict expression to be
evaluated below an outer join, after which its value bubbles up like a Var
and can be forced to NULL when the outer join's semantics require that.
However, there was a serious design oversight in that, namely that we
didn't ensure that there was actually a correct place in the plan tree
to evaluate the placeholder :-(. It may be necessary to delay evaluation
of an outer join to ensure that a placeholder that should be evaluated
below the join can be evaluated there. Per recent bug report from Kirill
Simonov.
Back-patch to 8.4 where the PlaceHolderVar mechanism was introduced.
This is intended as infrastructure to support integration with label-based
mandatory access control systems such as SE-Linux. Further changes (mostly
hooks) will be needed, but this is a big chunk of it.
KaiGai Kohei and Robert Haas
The implicitly created sequence was created as owned by the current user,
who could be different from the table owner, eg if current user is a
superuser or some member of the table's owning role. This caused sanity
checks in the SEQUENCE OWNED BY code to spit up. Although possibly we
don't need those sanity checks, the safest fix seems to be to make sure
the implicit sequence is assigned the same owner role as the table has.
(We still do all permissions checks as the current user, however.)
Per report from Josh Berkus.
Back-patch to 9.0. The bug goes back to the invention of SEQUENCE OWNED BY
in 8.2, but the fix requires an API change for DefineRelation(), which seems
to have potential for breaking third-party code if done in a minor release.
Given the lack of prior complaints, it's probably not worth fixing in the
stable branches.
_outPlannedStmt is only debug support, so the omission there was not very
serious, but the omission in _copyPlannedStmt is a real bug. The consequence
would be that a copied plan tree would never be marked as a transient plan,
so that we would forget we ought to replan it after some not-yet-ready index
becomes ready for use. This might explain some past complaints about indexes
created with CREATE INDEX CONCURRENTLY not being used right away. Problem
spotted by Yeb Havinga.
Back-patch to 8.3, where the field was added.
other columns to be referenced without listing them in GROUP BY, so long as
the primary key column(s) are listed in GROUP BY.
Eventually we should also allow functional dependency on a UNIQUE constraint
when the columns are marked NOT NULL, but that has to wait until NOT NULL
constraints are represented in pg_constraint, because we need to have
pg_constraint OIDs for all the conditions needed to ensure functional
dependency.
Peter Eisentraut, reviewed by Alex Hunsaker and Tom Lane
relation using the general PARAM_EXEC executor parameter mechanism, rather
than the ad-hoc kluge of passing the outer tuple down through ExecReScan.
The previous method was hard to understand and could never be extended to
handle parameters coming from multiple join levels. This patch doesn't
change the set of possible plans nor have any significant performance effect,
but it's necessary infrastructure for future generalization of the concept
of an inner indexscan plan.
ExecReScan's second parameter is now unused, so it's removed.
This operates in the same way as other CREATE OR REPLACE commands, ie,
it replaces everything but the ownership and ACL lists of an existing
entry, and requires the caller to have owner privileges for that entry.
While modifying an existing language has some use in development scenarios,
in typical usage all the "replaced" values come from pg_pltemplate so there
will be no actual change in the language definition. The reason for adding
this is mainly to allow programs to ensure that a language exists without
triggering an error if it already does exist.
This commit just adds and documents the new option. A followon patch
will use it to clean up some unpleasant cases in pg_dump and pg_regress.
In addition, add support for a "payload" string to be passed along with
each notify event.
This implementation should be significantly more efficient than the old one,
and is also more compatible with Hot Standby usage. There is not yet any
facility for HS slaves to receive notifications generated on the master,
although such a thing is possible in future.
Joachim Wieland, reviewed by Jeff Davis; also hacked on by me.
This patch allows the frame to start from CURRENT ROW (in either RANGE or
ROWS mode), and it also adds support for ROWS n PRECEDING and ROWS n FOLLOWING
start and end points. (RANGE value PRECEDING/FOLLOWING isn't there yet ---
the grammar works, but that's all.)
Hitoshi Harada, reviewed by Pavel Stehule
This patch only supports seq_page_cost and random_page_cost as parameters,
but it provides the infrastructure to scalably support many more.
In particular, we may want to add support for effective_io_concurrency,
but I'm leaving that as future work for now.
Thanks to Tom Lane for design help and Alvaro Herrera for the review.
8.2beta but never carried out. This avoids repetitive tests of whether the
argument is of scalar or composite type. Also, be a bit more paranoid about
composite arguments in some places where we previously weren't checking.
and teach ANALYZE to compute such stats for tables that have subclasses.
Per my proposal of yesterday.
autovacuum still needs to be taught about running ANALYZE on parent tables
when their subclasses change, but the feature is useful even without that.
Index expression columns are now named after the FigureColname result for
their expressions, rather than always being "pg_expression_N". Digits are
appended to this name if needed to make the column name unique within the
index. (That happens for regular columns too, thus fixing the old problem
that CREATE INDEX fooi ON foo (f1, f1) fails. Before exclusion indexes
there was no real reason to do such a thing, but now maybe there is.)
Default names for indexes and associated constraints now include the column
names of all their columns, not only the first one as in previous practice.
(Of course, this will be truncated as needed to fit in NAMEDATALEN. Also,
pkey indexes retain the historical behavior of not naming specific columns
at all.)
An example of the results:
regression=# create table foo (f1 int, f2 text,
regression(# exclude (f1 with =, lower(f2) with =));
NOTICE: CREATE TABLE / EXCLUDE will create implicit index "foo_f1_lower_exclusion" for table "foo"
CREATE TABLE
regression=# \d foo_f1_lower_exclusion
Index "public.foo_f1_lower_exclusion"
Column | Type | Definition
--------+---------+------------
f1 | integer | f1
lower | text | lower(f2)
btree, for table "public.foo"
non-kluge method for controlling the order in which values are fed to an
aggregate function. At the same time eliminate the old implementation
restriction that DISTINCT was only supported for single-argument aggregates.
Possibly release-notable behavioral change: formerly, agg(DISTINCT x)
dropped null values of x unconditionally. Now, it does so only if the
agg transition function is strict; otherwise nulls are treated as DISTINCT
normally would, ie, you get one copy.
Andrew Gierth, reviewed by Hitoshi Harada
support any indexable commutative operator, not just equality. Two rows
violate the exclusion constraint if "row1.col OP row2.col" is TRUE for
each of the columns in the constraint.
Jeff Davis, reviewed by Robert Haas
checked to determine whether the trigger should be fired.
For BEFORE triggers this is mostly a matter of spec compliance; but for AFTER
triggers it can provide a noticeable performance improvement, since queuing of
a deferred trigger event and re-fetching of the row(s) at end of statement can
be short-circuited if the trigger does not need to be fired.
Takahiro Itagaki, reviewed by KaiGai Kohei.
adopted for EXPLAIN. This will allow additional options to be implemented
in future without having to make them fully-reserved keywords. The old syntax
remains available for existing options, however.
Itagaki Takahiro
underneath the Limit node, not atop it. This fixes the old problem that such
a query might unexpectedly return fewer rows than the LIMIT says, due to
LockRows discarding updated rows.
There is a related problem that LockRows might destroy the sort ordering
produced by earlier steps; but fixing that by pushing LockRows below Sort
would create serious performance problems that are unjustified in many
real-world applications, as well as potential deadlock problems from locking
many more rows than expected. Instead, keep the present semantics of applying
FOR UPDATE after ORDER BY within a single query level; but allow the user to
specify the other way by writing FOR UPDATE in a sub-select. To make that
work, track whether FOR UPDATE appeared explicitly in sub-selects or got
pushed down from the parent, and don't flatten a sub-select that contained an
explicit FOR UPDATE.
a lot of strange behaviors that occurred in join cases. We now identify the
"current" row for every joined relation in UPDATE, DELETE, and SELECT FOR
UPDATE/SHARE queries. If an EvalPlanQual recheck is necessary, we jam the
appropriate row into each scan node in the rechecking plan, forcing it to emit
only that one row. The former behavior could rescan the whole of each joined
relation for each recheck, which was terrible for performance, and what's much
worse could result in duplicated output tuples.
Also, the original implementation of EvalPlanQual could not re-use the recheck
execution tree --- it had to go through a full executor init and shutdown for
every row to be tested. To avoid this overhead, I've associated a special
runtime Param with each LockRows or ModifyTable plan node, and arranged to
make every scan node below such a node depend on that Param. Thus, by
signaling a change in that Param, the EPQ machinery can just rescan the
already-built test plan.
This patch also adds a prohibition on set-returning functions in the
targetlist of SELECT FOR UPDATE/SHARE. This is needed to avoid the
duplicate-output-tuple problem. It seems fairly reasonable since the
other restrictions on SELECT FOR UPDATE are meant to ensure that there
is a unique correspondence between source tuples and result tuples,
which an output SRF destroys as much as anything else does.
are named in the UPDATE's SET list.
Note: the schema of pg_trigger has not actually changed; we've just started
to use a column that was there all along. catversion bumped anyway so that
this commit is included in the history of potentially interesting changes
to system catalog contents.
Itagaki Takahiro
execMain.c and into a new plan node type LockRows. Like the recent change
to put table updating into a ModifyTable plan node, this increases planning
flexibility by allowing the operations to occur below the top level of the
plan tree. It's necessary in any case to restore the previous behavior of
having FOR UPDATE locking occur before ModifyTable does.
This partially refactors EvalPlanQual to allow multiple rows-under-test
to be inserted into the EPQ machinery before starting an EPQ test query.
That isn't sufficient to fix EPQ's general bogosity in the face of plans
that return multiple rows per test row, though. Since this patch is
mostly about getting some plan node infrastructure in place and not about
fixing ten-year-old bugs, I will leave EPQ improvements for another day.
Another behavioral change that we could now think about is doing FOR UPDATE
before LIMIT, but that too seems like it should be treated as a followon
patch.
They are now handled by a new plan node type called ModifyTable, which is
placed at the top of the plan tree. In itself this change doesn't do much,
except perhaps make the handling of RETURNING lists and inherited UPDATEs a
tad less klugy. But it is necessary preparation for the intended extension of
allowing RETURNING queries inside WITH.
Marko Tiikkaja
Create a new catalog pg_db_role_setting where they are now stored, and better
encapsulate the code that deals with settings into its realm. The old
datconfig and rolconfig columns are removed.
psql has gained a \drds command to display the settings.
Backwards compatibility warning: while the backwards-compatible system views
still have the config columns, they no longer completely represent the
configuration for a user or database.
Catalog version bumped.
inheritance parent tables are compared using equal(), instead of doing
strcmp() on the nodeToString representation. The old implementation was
always a tad cheesy, and it finally fails completely as of 8.4, now that the
node tree might contain syntax location information. equal() knows it's
supposed to ignore those fields, but strcmp() hardly can. Per recent
report from Scott Ribe.