scanning; GiST and GIN do not, and it seems like too much trouble to make
them do so. By teaching ExecSupportsBackwardScan() about this restriction,
we ensure that the planner will protect a scroll cursor from the problem
by adding a Materialize node.
In passing, fix another longstanding bug in the same area: backwards scan of
a plan with set-returning functions in the targetlist did not work either,
since the TupFromTlist expansion code pays no attention to direction (and
has no way to run a SRF backwards anyway). Again the fix is to make
ExecSupportsBackwardScan check this restriction.
Also adjust the index AM API specification to note that mark/restore support
is unnecessary if the AM can't produce ordered output.
In the previous coding, the list of columns that needed to be hashed on
was allocated in the per-query context, but we reallocated every time
the Agg node was rescanned. Since this information doesn't change over
a rescan, just construct the list of columns once during ExecInitAgg().
according to the TupleDesc's natts, not the number of physical columns in the
tuple. The previous coding would do the wrong thing in cases where natts is
different from the tuple's column count: either incorrectly report error when
it should just treat the column as null, or actually crash due to indexing off
the end of the TupleDesc's attribute array. (The second case is probably not
possible in modern PG versions, due to more careful handling of inheritance
cases than we once had. But it's still a clear lack of robustness here.)
The incorrect error indication is ignored by all callers within the core PG
distribution, so this bug has no symptoms visible within the core code, but
it might well be an issue for add-on packages. So patch all the way back.
RecursiveUnion to which it refers. It turns out that we can just postpone the
relevant initialization steps until the first exec call for the node, by which
time the ancestor node must surely be initialized. Per report from Greg Stark.
implementation uses an in-memory hash table, so it will poop out for very
large recursive results ... but the performance characteristics of a
sort-based implementation would be pretty unpleasant too.
There are some unimplemented aspects: recursive queries must use UNION ALL
(should allow UNION too), and we don't have SEARCH or CYCLE clauses.
These might or might not get done for 8.4, but even without them it's a
pretty useful feature.
There are also a couple of small loose ends and definitional quibbles,
which I'll send a memo about to pgsql-hackers shortly. But let's land
the patch now so we can get on with other development.
Yoshiyuki Asaba, with lots of help from Tatsuo Ishii and Tom Lane
This facility replaces the former mark/restore support but is otherwise
upward-compatible with previous uses. It's expected to be needed for
single evaluation of CTEs and also for window functions, so I'm committing
it separately instead of waiting for either one of those patches to be
finished. Per discussion with Greg Stark and Hitoshi Harada.
Note: I removed nodeFunctionscan's mark/restore support, instead of bothering
to update it for this change, because it was dead code anyway.
we regenerate the SQL query text not merely the plan derived from it. This
is needed to handle contingencies such as renaming of a table or column
used in an FK. Pre-8.3, such cases worked despite the lack of replanning
(because the cached plan needn't actually change), so this is a regression.
Per bug #4417 from Benjamin Bihler.
GetOldestXmin() instead of RecentGlobalXmin; this is safer because we do not
depend on the latter being correctly set elsewhere, and while it is more
expensive, this code path is not performance-critical. This is a real
risk for autovacuum, because it can execute whole cycles without doing
a single vacuum, which would mean that RecentGlobalXmin would stay at its
initialization value, FirstNormalTransactionId, causing a bogus value to be
inserted in pg_database. This bug could explain some recent reports of
failure to truncate pg_clog.
At the same time, change the initialization of RecentGlobalXmin to
InvalidTransactionId, and ensure that it's set to something else whenever
it's going to be used. Using it as FirstNormalTransactionId in HOT page
pruning could incur in data loss. InitPostgres takes care of setting it
to a valid value, but the extra checks are there to prevent "special"
backends from behaving in unusual ways.
Per Tom Lane's detailed problem dissection in 29544.1221061979@sss.pgh.pa.us
into nodes/nodeFuncs, so as to reduce wanton cross-subsystem #includes inside
the backend. There's probably more that should be done along this line,
but this is a start anyway.
checks in ExecIndexBuildScanKeys() that were inadequate anyway: it's better
to verify the correct varno on an expected index key, not just reject OUTER
and INNER.
This makes the entire current contents of nodeFuncs.c dead code. I'll be
replacing it with some other stuff later, as per recent proposal.
subqueries into the same thing you'd have gotten from IN (except always with
unknownEqFalse = true, so as to get the proper semantics for an EXISTS).
I believe this fixes the last case within CVS HEAD in which an EXISTS could
give worse performance than an equivalent IN subquery.
The tricky part of this is that if the upper query probes the EXISTS for only
a few rows, the hashing implementation can actually be worse than the default,
and therefore we need to make a cost-based decision about which way to use.
But at the time when the planner generates plans for subqueries, it doesn't
really know how many times the subquery will be executed. The least invasive
solution seems to be to generate both plans and postpone the choice until
execution. Therefore, in a query that has been optimized this way, EXPLAIN
will show two subplans for the EXISTS, of which only one will actually get
executed.
There is a lot more that could be done based on this infrastructure: in
particular it's interesting to consider switching to the hash plan if we start
out using the non-hashed plan but find a lot more upper rows going by than we
expected. I have therefore left some minor inefficiencies in place, such as
initializing both subplans even though we will currently only use one.
match in antijoin mode, we should advance to next outer tuple not next inner.
We know we don't want to return this outer tuple, and there is no point in
advancing over matching inner tuples now, because we'd just have to do it
again if the next outer tuple has the same merge key. This makes a noticeable
difference if there are lots of duplicate keys in both inputs.
Similarly, after finding a match in semijoin mode, arrange to advance to
the next outer tuple after returning the current match; or immediately,
if it fails the extra quals. The rationale is the same. (This is a
performance bug in existing releases; perhaps worth back-patching? The
planner tries to avoid using mergejoin with lots of duplicates, so it may
not be a big issue in practice.)
Nestloop and hash got this right to start with, but I made some cosmetic
adjustments there to make the corresponding bits of logic look more similar.
the old JOIN_IN code, but antijoins are new functionality.) Teach the planner
to convert appropriate EXISTS and NOT EXISTS subqueries into semi and anti
joins respectively. Also, LEFT JOINs with suitable upper-level IS NULL
filters are recognized as being anti joins. Unify the InClauseInfo and
OuterJoinInfo infrastructure into "SpecialJoinInfo". With that change,
it becomes possible to associate a SpecialJoinInfo with every join attempt,
which permits some cleanup of join selectivity estimation. That needs to be
taken much further than this patch does, but the next step is to change the
API for oprjoin selectivity functions, which seems like material for a
separate patch. So for the moment the output size estimates for semi and
especially anti joins are quite bogus.
INSERT or UPDATE will match the target table's current rowtype. In pre-8.3
releases inconsistency can arise with stale cached plans, as reported by
Merlin Moncure. (We patched the equivalent hazard on the SELECT side in Feb
2007; I'm not sure why we thought there was no risk on the insertion side.)
In 8.3 and HEAD this problem should be impossible due to plan cache
invalidation management, but it seems prudent to make the check anyway.
Back-patch as far as 8.0. 7.x versions lack ALTER COLUMN TYPE, so there
seems no way to abuse a stale plan comparably.
hashtable entries for tuples that are found only in the second input: they
can never contribute to the output. Furthermore, this implies that the
planner should endeavor to put first the smaller (in number of groups) input
relation for an INTERSECT. Implement that, and upgrade prepunion's estimation
of the number of rows returned by setops so that there's some amount of sanity
in the estimate of which one is smaller.
This completes my project of improving usage of hashing for duplicate
elimination (aggregate functions with DISTINCT remain undone, but that's
for some other day).
As with the previous patches, this means we can INTERSECT/EXCEPT on datatypes
that can hash but not sort, and it means that INTERSECT/EXCEPT without ORDER
BY are no longer certain to produce sorted output.
would work, but in fact it didn't return the same rows when moving backwards
as when moving forwards. This would have no visible effect in a DISTINCT
query (at least assuming the column datatypes use a strong definition of
equality), but it gave entirely wrong answers for DISTINCT ON queries.
as per my recent proposal:
1. Fold SortClause and GroupClause into a single node type SortGroupClause.
We were already relying on them to be struct-equivalent, so using two node
tags wasn't accomplishing much except to get in the way of comparing items
with equal().
2. Add an "eqop" field to SortGroupClause to carry the associated equality
operator. This is cheap for the parser to get at the same time it's looking
up the sort operator, and storing it eliminates the need for repeated
not-so-cheap lookups during planning. In future this will also let us
represent GROUP/DISTINCT operations on datatypes that have hash opclasses
but no btree opclasses (ie, they have equality but no natural sort order).
The previous representation simply didn't work for that, since its only
indicator of comparison semantics was a sort operator.
3. Add a hasDistinctOn boolean to struct Query to explicitly record whether
the distinctClause came from DISTINCT or DISTINCT ON. This allows removing
some complicated and not 100% bulletproof code that attempted to figure
that out from the distinctClause alone.
This patch doesn't in itself create any new capability, but it's necessary
infrastructure for future attempts to use hash-based grouping for DISTINCT
and UNION/INTERSECT/EXCEPT.
filter to be used when INSERT or SELECT INTO has a plan that returns raw
disk tuples. The virtual-tuple-slot optimizations that were put in place
awhile ago mean that ExecInsert has to do ExecMaterializeSlot, and that
already copies the tuple if it's raw (and does so more efficiently than
a junk filter, too). So get rid of that logic. This in turn means that
we can throw away ExecMayReturnRawTuples, which wasn't used for any other
purpose, and was always a kluge anyway.
In passing, move a couple of SELECT-INTO-specific fields out of EState
and into the private state of the SELECT INTO DestReceiver, as was foreseen
in an old comment there. Also make intorel_receive use ExecMaterializeSlot
not ExecCopySlotTuple, for consistency with ExecInsert and to possibly save
a tuple copy step in some cases.
a portal are never NULL, but reliably provide the source text of the query.
It turns out that there was only one place that was really taking a short-cut,
which was the 'EXECUTE' utility statement. That doesn't seem like a
sufficiently critical performance hotspot to justify not offering a guarantee
of validity of the portal source text. Fix it to copy the source text over
from the cached plan. Add Asserts in the places that set up cached plans and
portals to reject null source strings, and simplify a bunch of places that
formerly needed to guard against nulls.
There may be a few places that cons up statements for execution without
having any source text at all; I found one such in ConvertTriggerToFK().
It seems sufficient to inject a phony source string in such a case,
for instance
ProcessUtility((Node *) atstmt,
"(generated ALTER TABLE ADD FOREIGN KEY command)",
NULL, false, None_Receiver, NULL);
We should take a second look at the usage of debug_query_string,
particularly the recently added current_query() SQL function.
ITAGAKI Takahiro and Tom Lane
bug #4290. The fundamental bug is that masking extParam by outer_params,
as finalize_plan had been doing, caused us to lose the information that
an initPlan depended on the output of a sibling initPlan. On reflection
the best thing to do seemed to be not to try to adjust outer_params for
this case but get rid of it entirely. The only thing it was really doing
for us was to filter out param IDs associated with SubPlan nodes, and that
can be done (with greater accuracy) while processing individual SubPlan
nodes in finalize_primnode. This approach was vindicated by the discovery
that the masking method was hiding a second bug: SS_finalize_plan failed to
remove extParam bits for initPlan output params that were referenced in the
main plan tree (it only got rid of those referenced by other initPlans).
It's not clear that this caused any real problems, given the limited use
of extParam by the executor, but it's certainly not what was intended.
I originally thought that there was also a problem with needing to include
indirect dependencies on external params in initPlans' param sets, but it
turns out that the executor handles this correctly so long as the depended-on
initPlan is earlier in the initPlans list than the one using its output.
That seems a bit of a fragile assumption, but it is true at the moment,
so I just documented it in some code comments rather than making what would
be rather invasive changes to remove the assumption.
Back-patch to 8.1. Previous versions don't have the case of initPlans
referring to other initPlans' outputs, so while the existing logic is still
questionable for them, there are not any known bugs to be fixed. So I'll
refrain from changing them for now.
corresponding struct definitions. This allows other headers to avoid including
certain highly-loaded headers such as rel.h and relscan.h, instead using just
relcache.h, heapam.h or genam.h, which are more lightweight and thus cause less
unnecessary dependencies.
the PARAM_FLAG_CONST flag on the parameters that are passed into the portal,
while the former's behavior is unchanged. This should only affect the case
where the portal is executing an EXPLAIN; it will cause the generated plan to
look more like what would be generated if the portal were actually executing
the command being explained. Per gripe from Pavel.
functions.
Note that because this patch changes FmgrInfo, any external C functions
you might be testing with 8.4 will need to be recompiled.
Patch by Martin Pihlak, some editorialization by me (principally, removing
tracking of getrusage() numbers)
file portability/instr_time.h, and add a couple more macros to eliminate
some abstraction leakage we formerly had. Also update psql to use this
header instead of its own copy of nearly the same code.
This commit in itself is just code cleanup and shouldn't change anything.
It lays some groundwork for the upcoming function-stats patch, though.
There are two ways to track a snapshot: there's the "registered" list, which
is used for arbitrary long-lived snapshots; and there's the "active stack",
which is used for the snapshot that is considered "active" at any time.
This also allows users of snapshots to stop worrying about snapshot memory
allocation and freeing, and about using PG_TRY blocks around ActiveSnapshot
assignment. This is all done automatically now.
As a consequence, this allows us to reset MyProc->xmin when there are no
more snapshots registered in the current backend, reducing the impact that
long-running transactions have on VACUUM.
unnecessary #include lines in it. Also, move some tuple routine prototypes and
macros to htup.h, which allows removal of heapam.h inclusion from some .c
files.
For this to work, a new header file access/sysattr.h needed to be created,
initially containing attribute numbers of system columns, for pg_dump usage.
While at it, make contrib ltree, intarray and hstore header files more
consistent with our header style.
as those for inherited columns; that is, it's no longer allowed for a child
table to not have a check constraint matching one that exists on a parent.
This satisfies the principle of least surprise (rows selected from the parent
will always appear to meet its check constraints) and eliminates some
longstanding bogosity in pg_dump, which formerly had to guess about whether
check constraints were really inherited or not.
The implementation involves adding conislocal and coninhcount columns to
pg_constraint (paralleling attislocal and attinhcount in pg_attribute)
and refactoring various ALTER TABLE actions to be more like those for
columns.
Alex Hunsaker, Nikhil Sontakke, Tom Lane
a user-supplied TID is out of range for the relation. This is needed to
preserve compatibility with our pre-8.3 behavior, and it is sensible anyway
since if the query were implemented by brute force rather than optimized
into a TidScan, the behavior for a non-existent TID would be zero rows out,
never an error. Per gripe from Gurjeet Singh.
UPDATE/SHARE couldn't occur as a subquery in a query with a non-SELECT
top-level operation. Symptoms included outright failure (as in report from
Mark Mielke) and silently neglecting to take the requested row locks.
Back-patch to 8.3, because the visible failure in the INSERT ... SELECT case
is a regression from 8.2. I'm a bit hesitant to back-patch further given the
lack of field complaints.
no particular need to do get_op_opfamily_properties() while building an
indexscan plan. Postpone that lookup until executor start. This simplifies
createplan.c a lot more than it complicates nodeIndexscan.c, and makes things
more uniform since we already had to do it that way for RowCompare
expressions. Should be a bit faster too, at least for plans that aren't
re-used many times, since we avoid palloc'ing and perhaps copying the
intermediate list data structure.
instead of plan time. Extend the amgettuple API so that the index AM returns
a boolean indicating whether the indexquals need to be rechecked, and make
that rechecking happen in nodeIndexscan.c (currently the only place where
it's expected to be needed; other callers of index_getnext are just erroring
out for now). For the moment, GIN and GIST have stub logic that just always
sets the recheck flag to TRUE --- I'm hoping to get Teodor to handle pushing
that control down to the opclass consistent() functions. The planner no
longer pays any attention to amopreqcheck, and that catalog column will go
away in due course.
indexscan always occurs in one call, and the results are returned in a
TIDBitmap instead of a limited-size array of TIDs. This should improve
speed a little by reducing AM entry/exit overhead, and it is necessary
infrastructure if we are ever to support bitmap indexes.
In an only slightly related change, add support for TIDBitmaps to preserve
(somewhat lossily) the knowledge that particular TIDs reported by an index
need to have their quals rechecked when the heap is visited. This facility
is not really used yet; we'll need to extend the forced-recheck feature to
plain indexscans before it's useful, and that hasn't been coded yet.
The intent is to use it to clean up 8.3's horrid @@@ kluge for text search
with weighted queries. There might be other uses in future, but that one
alone is sufficient reason.
Heikki Linnakangas, with some adjustments by me.
responsible for copying the query string into the new Portal. Such copying
is unnecessary in the common code path through exec_simple_query, and in
this case it can be enormously expensive because the string might contain
a large number of individual commands; we were copying the entire, long
string for each command, resulting in O(N^2) behavior for N commands.
(This is the cause of bug #4079.) A second problem with it is that
PortalDefineQuery really can't risk error, because if it elog's before
having set up the Portal, we will leak the plancache refcount that the
caller is trying to hand off to the portal. So go back to the design in
which the caller is responsible for making sure everything is copied into
the portal if necessary.
that is commands that have out-of-line parameters but the plan is prepared
assuming that the parameter values are constants. This is needed for the
plpgsql EXECUTE USING patch, but will probably have use elsewhere.
This commit includes the SPI functions and documentation, but no callers
nor regression tests. The upcoming EXECUTE USING patch will provide
regression-test coverage. I thought committing this separately made
sense since it's logically a distinct feature.
snapmgmt.c file for the former. The header files have also been reorganized
in three parts: the most basic snapshot definitions are now in a new file
snapshot.h, and the also new snapmgmt.h keeps the definitions for snapmgmt.c.
tqual.h has been reduced to the bare minimum.
This patch is just a first step towards managing live snapshots within a
transaction; there is no functionality change.
Per my proposal to pgsql-patches on 20080318191940.GB27458@alvh.no-ip.org and
subsequent discussion.
strings. This patch introduces four support functions cstring_to_text,
cstring_to_text_with_len, text_to_cstring, and text_to_cstring_buffer, and
two macros CStringGetTextDatum and TextDatumGetCString. A number of
existing macros that provided variants on these themes were removed.
Most of the places that need to make such conversions now require just one
function or macro call, in place of the multiple notational layers that used
to be needed. There are no longer any direct calls of textout or textin,
and we got most of the places that were using handmade conversions via
memcpy (there may be a few still lurking, though).
This commit doesn't make any serious effort to eliminate transient memory
leaks caused by detoasting toasted text objects before they reach
text_to_cstring. We changed PG_GETARG_TEXT_P to PG_GETARG_TEXT_PP in a few
places where it was easy, but much more could be done.
Brendan Jurd and Tom Lane
identical to tuplestore_puttuple(), except it operates on arrays of
Datums + nulls rather than a fully-formed HeapTuple. In several places
that use the tuplestore API, this means we can avoid creating a
HeapTuple altogether, saving a copy.
are declared to return set, and consist of just a single SELECT. We
can replace the FROM-item with a sub-SELECT and then optimize much as
if we were dealing with a view. Patch from Richard Rowell, cleaned up
by me.