hashtable entries for tuples that are found only in the second input: they
can never contribute to the output. Furthermore, this implies that the
planner should endeavor to put first the smaller (in number of groups) input
relation for an INTERSECT. Implement that, and upgrade prepunion's estimation
of the number of rows returned by setops so that there's some amount of sanity
in the estimate of which one is smaller.
This completes my project of improving usage of hashing for duplicate
elimination (aggregate functions with DISTINCT remain undone, but that's
for some other day).
As with the previous patches, this means we can INTERSECT/EXCEPT on datatypes
that can hash but not sort, and it means that INTERSECT/EXCEPT without ORDER
BY are no longer certain to produce sorted output.
would work, but in fact it didn't return the same rows when moving backwards
as when moving forwards. This would have no visible effect in a DISTINCT
query (at least assuming the column datatypes use a strong definition of
equality), but it gave entirely wrong answers for DISTINCT ON queries.
as per my recent proposal:
1. Fold SortClause and GroupClause into a single node type SortGroupClause.
We were already relying on them to be struct-equivalent, so using two node
tags wasn't accomplishing much except to get in the way of comparing items
with equal().
2. Add an "eqop" field to SortGroupClause to carry the associated equality
operator. This is cheap for the parser to get at the same time it's looking
up the sort operator, and storing it eliminates the need for repeated
not-so-cheap lookups during planning. In future this will also let us
represent GROUP/DISTINCT operations on datatypes that have hash opclasses
but no btree opclasses (ie, they have equality but no natural sort order).
The previous representation simply didn't work for that, since its only
indicator of comparison semantics was a sort operator.
3. Add a hasDistinctOn boolean to struct Query to explicitly record whether
the distinctClause came from DISTINCT or DISTINCT ON. This allows removing
some complicated and not 100% bulletproof code that attempted to figure
that out from the distinctClause alone.
This patch doesn't in itself create any new capability, but it's necessary
infrastructure for future attempts to use hash-based grouping for DISTINCT
and UNION/INTERSECT/EXCEPT.
filter to be used when INSERT or SELECT INTO has a plan that returns raw
disk tuples. The virtual-tuple-slot optimizations that were put in place
awhile ago mean that ExecInsert has to do ExecMaterializeSlot, and that
already copies the tuple if it's raw (and does so more efficiently than
a junk filter, too). So get rid of that logic. This in turn means that
we can throw away ExecMayReturnRawTuples, which wasn't used for any other
purpose, and was always a kluge anyway.
In passing, move a couple of SELECT-INTO-specific fields out of EState
and into the private state of the SELECT INTO DestReceiver, as was foreseen
in an old comment there. Also make intorel_receive use ExecMaterializeSlot
not ExecCopySlotTuple, for consistency with ExecInsert and to possibly save
a tuple copy step in some cases.
a portal are never NULL, but reliably provide the source text of the query.
It turns out that there was only one place that was really taking a short-cut,
which was the 'EXECUTE' utility statement. That doesn't seem like a
sufficiently critical performance hotspot to justify not offering a guarantee
of validity of the portal source text. Fix it to copy the source text over
from the cached plan. Add Asserts in the places that set up cached plans and
portals to reject null source strings, and simplify a bunch of places that
formerly needed to guard against nulls.
There may be a few places that cons up statements for execution without
having any source text at all; I found one such in ConvertTriggerToFK().
It seems sufficient to inject a phony source string in such a case,
for instance
ProcessUtility((Node *) atstmt,
"(generated ALTER TABLE ADD FOREIGN KEY command)",
NULL, false, None_Receiver, NULL);
We should take a second look at the usage of debug_query_string,
particularly the recently added current_query() SQL function.
ITAGAKI Takahiro and Tom Lane
bug #4290. The fundamental bug is that masking extParam by outer_params,
as finalize_plan had been doing, caused us to lose the information that
an initPlan depended on the output of a sibling initPlan. On reflection
the best thing to do seemed to be not to try to adjust outer_params for
this case but get rid of it entirely. The only thing it was really doing
for us was to filter out param IDs associated with SubPlan nodes, and that
can be done (with greater accuracy) while processing individual SubPlan
nodes in finalize_primnode. This approach was vindicated by the discovery
that the masking method was hiding a second bug: SS_finalize_plan failed to
remove extParam bits for initPlan output params that were referenced in the
main plan tree (it only got rid of those referenced by other initPlans).
It's not clear that this caused any real problems, given the limited use
of extParam by the executor, but it's certainly not what was intended.
I originally thought that there was also a problem with needing to include
indirect dependencies on external params in initPlans' param sets, but it
turns out that the executor handles this correctly so long as the depended-on
initPlan is earlier in the initPlans list than the one using its output.
That seems a bit of a fragile assumption, but it is true at the moment,
so I just documented it in some code comments rather than making what would
be rather invasive changes to remove the assumption.
Back-patch to 8.1. Previous versions don't have the case of initPlans
referring to other initPlans' outputs, so while the existing logic is still
questionable for them, there are not any known bugs to be fixed. So I'll
refrain from changing them for now.
corresponding struct definitions. This allows other headers to avoid including
certain highly-loaded headers such as rel.h and relscan.h, instead using just
relcache.h, heapam.h or genam.h, which are more lightweight and thus cause less
unnecessary dependencies.
the PARAM_FLAG_CONST flag on the parameters that are passed into the portal,
while the former's behavior is unchanged. This should only affect the case
where the portal is executing an EXPLAIN; it will cause the generated plan to
look more like what would be generated if the portal were actually executing
the command being explained. Per gripe from Pavel.
functions.
Note that because this patch changes FmgrInfo, any external C functions
you might be testing with 8.4 will need to be recompiled.
Patch by Martin Pihlak, some editorialization by me (principally, removing
tracking of getrusage() numbers)
file portability/instr_time.h, and add a couple more macros to eliminate
some abstraction leakage we formerly had. Also update psql to use this
header instead of its own copy of nearly the same code.
This commit in itself is just code cleanup and shouldn't change anything.
It lays some groundwork for the upcoming function-stats patch, though.
There are two ways to track a snapshot: there's the "registered" list, which
is used for arbitrary long-lived snapshots; and there's the "active stack",
which is used for the snapshot that is considered "active" at any time.
This also allows users of snapshots to stop worrying about snapshot memory
allocation and freeing, and about using PG_TRY blocks around ActiveSnapshot
assignment. This is all done automatically now.
As a consequence, this allows us to reset MyProc->xmin when there are no
more snapshots registered in the current backend, reducing the impact that
long-running transactions have on VACUUM.
unnecessary #include lines in it. Also, move some tuple routine prototypes and
macros to htup.h, which allows removal of heapam.h inclusion from some .c
files.
For this to work, a new header file access/sysattr.h needed to be created,
initially containing attribute numbers of system columns, for pg_dump usage.
While at it, make contrib ltree, intarray and hstore header files more
consistent with our header style.
as those for inherited columns; that is, it's no longer allowed for a child
table to not have a check constraint matching one that exists on a parent.
This satisfies the principle of least surprise (rows selected from the parent
will always appear to meet its check constraints) and eliminates some
longstanding bogosity in pg_dump, which formerly had to guess about whether
check constraints were really inherited or not.
The implementation involves adding conislocal and coninhcount columns to
pg_constraint (paralleling attislocal and attinhcount in pg_attribute)
and refactoring various ALTER TABLE actions to be more like those for
columns.
Alex Hunsaker, Nikhil Sontakke, Tom Lane
a user-supplied TID is out of range for the relation. This is needed to
preserve compatibility with our pre-8.3 behavior, and it is sensible anyway
since if the query were implemented by brute force rather than optimized
into a TidScan, the behavior for a non-existent TID would be zero rows out,
never an error. Per gripe from Gurjeet Singh.
UPDATE/SHARE couldn't occur as a subquery in a query with a non-SELECT
top-level operation. Symptoms included outright failure (as in report from
Mark Mielke) and silently neglecting to take the requested row locks.
Back-patch to 8.3, because the visible failure in the INSERT ... SELECT case
is a regression from 8.2. I'm a bit hesitant to back-patch further given the
lack of field complaints.
no particular need to do get_op_opfamily_properties() while building an
indexscan plan. Postpone that lookup until executor start. This simplifies
createplan.c a lot more than it complicates nodeIndexscan.c, and makes things
more uniform since we already had to do it that way for RowCompare
expressions. Should be a bit faster too, at least for plans that aren't
re-used many times, since we avoid palloc'ing and perhaps copying the
intermediate list data structure.
instead of plan time. Extend the amgettuple API so that the index AM returns
a boolean indicating whether the indexquals need to be rechecked, and make
that rechecking happen in nodeIndexscan.c (currently the only place where
it's expected to be needed; other callers of index_getnext are just erroring
out for now). For the moment, GIN and GIST have stub logic that just always
sets the recheck flag to TRUE --- I'm hoping to get Teodor to handle pushing
that control down to the opclass consistent() functions. The planner no
longer pays any attention to amopreqcheck, and that catalog column will go
away in due course.
indexscan always occurs in one call, and the results are returned in a
TIDBitmap instead of a limited-size array of TIDs. This should improve
speed a little by reducing AM entry/exit overhead, and it is necessary
infrastructure if we are ever to support bitmap indexes.
In an only slightly related change, add support for TIDBitmaps to preserve
(somewhat lossily) the knowledge that particular TIDs reported by an index
need to have their quals rechecked when the heap is visited. This facility
is not really used yet; we'll need to extend the forced-recheck feature to
plain indexscans before it's useful, and that hasn't been coded yet.
The intent is to use it to clean up 8.3's horrid @@@ kluge for text search
with weighted queries. There might be other uses in future, but that one
alone is sufficient reason.
Heikki Linnakangas, with some adjustments by me.
responsible for copying the query string into the new Portal. Such copying
is unnecessary in the common code path through exec_simple_query, and in
this case it can be enormously expensive because the string might contain
a large number of individual commands; we were copying the entire, long
string for each command, resulting in O(N^2) behavior for N commands.
(This is the cause of bug #4079.) A second problem with it is that
PortalDefineQuery really can't risk error, because if it elog's before
having set up the Portal, we will leak the plancache refcount that the
caller is trying to hand off to the portal. So go back to the design in
which the caller is responsible for making sure everything is copied into
the portal if necessary.
that is commands that have out-of-line parameters but the plan is prepared
assuming that the parameter values are constants. This is needed for the
plpgsql EXECUTE USING patch, but will probably have use elsewhere.
This commit includes the SPI functions and documentation, but no callers
nor regression tests. The upcoming EXECUTE USING patch will provide
regression-test coverage. I thought committing this separately made
sense since it's logically a distinct feature.
snapmgmt.c file for the former. The header files have also been reorganized
in three parts: the most basic snapshot definitions are now in a new file
snapshot.h, and the also new snapmgmt.h keeps the definitions for snapmgmt.c.
tqual.h has been reduced to the bare minimum.
This patch is just a first step towards managing live snapshots within a
transaction; there is no functionality change.
Per my proposal to pgsql-patches on 20080318191940.GB27458@alvh.no-ip.org and
subsequent discussion.
strings. This patch introduces four support functions cstring_to_text,
cstring_to_text_with_len, text_to_cstring, and text_to_cstring_buffer, and
two macros CStringGetTextDatum and TextDatumGetCString. A number of
existing macros that provided variants on these themes were removed.
Most of the places that need to make such conversions now require just one
function or macro call, in place of the multiple notational layers that used
to be needed. There are no longer any direct calls of textout or textin,
and we got most of the places that were using handmade conversions via
memcpy (there may be a few still lurking, though).
This commit doesn't make any serious effort to eliminate transient memory
leaks caused by detoasting toasted text objects before they reach
text_to_cstring. We changed PG_GETARG_TEXT_P to PG_GETARG_TEXT_PP in a few
places where it was easy, but much more could be done.
Brendan Jurd and Tom Lane
identical to tuplestore_puttuple(), except it operates on arrays of
Datums + nulls rather than a fully-formed HeapTuple. In several places
that use the tuplestore API, this means we can avoid creating a
HeapTuple altogether, saving a copy.
are declared to return set, and consist of just a single SELECT. We
can replace the FROM-item with a sub-SELECT and then optimize much as
if we were dealing with a view. Patch from Richard Rowell, cleaned up
by me.
during a bitmap index scan. This cannot affect the query results
(since we're just dumping the TIDs into a bitmap) but it might offer
some advantage in locality of access to the index. Per Greg Stark.
"multi_call_ctx" to be a distinct sub-context of the EState's per-query
context, and delete the multi_call_ctx as soon as the SRF finishes
execution. This avoids leaking SRF memory until the end of the current
query, which is particularly egregious when the SRF is scanned
multiple times. This change also fixes a leak of the fields of the
AttInMetadata struct in shutdown_MultiFuncCall().
Also fix a leak of the SRF result TupleDesc when rescanning a
FunctionScan node. The TupleDesc is allocated in the per-query context
for every call to ExecMakeTableFunctionResult(), so we should free it
after calling that function. Since the SRF might choose to return
a non-expendable TupleDesc, we only free the TupleDesc if it is
not being reference-counted.
Backpatch to 8.3 and 8.2 stable branches.
doing anything interesting, such as calling RevalidateCachedPlan(). The
necessity of this is demonstrated by an example from Willem Buitendyk:
during a replan, the planner might try to evaluate SPI-using functions,
and so we'd better be in a clean SPI context.
A small downside of this fix is that these two functions will now fail
outright if called when not inside a SPI-using procedure (ie, a
SPI_connect/SPI_finish pair). The documentation never promised or suggested
that that would work, though; and they are normally used in concert with
other functions, mainly SPI_prepare, that always have failed in such a case.
So the odds of breaking something seem pretty low.
In passing, make SPI_is_cursor_plan's error handling convention clearer,
and fix documentation's erroneous claim that SPI_cursor_open would
return NULL on error.
Before 8.3 these functions could not invoke replanning, so there is probably
no need for back-patching.
tablespace permissions failures when copying an index that is in the
database's default tablespace. A side-effect of the change is that explicitly
specifying the default tablespace no longer triggers a permissions check;
this is not how it was done in pre-8.3 releases but is argued to be more
consistent. Per bug #3921 from Andrew Gilligan. (Note: I argued in the
subsequent discussion that maybe LIKE shouldn't copy index tablespaces
at all, but since no one indicated agreement with that idea, I've refrained
from doing it.)
checking of argument compatibility right; although the problem is only exposed
with multiple-input aggregates in which some arguments are polymorphic and
some are not. Per bug #3852 from Sokolov Yura.
but no database changes have been made since the last CommandCounterIncrement.
This should result in a significant improvement in the number of "commands"
that can typically be performed within a transaction before hitting the 2^32
CommandId size limit. In particular this buys back (and more) the possible
adverse consequences of my previous patch to fix plan caching behavior.
The implementation requires tracking whether the current CommandCounter
value has been "used" to mark any tuples. CommandCounter values stored into
snapshots are presumed not to be used for this purpose. This requires some
small executor changes, since the executor used to conflate the curcid of
the snapshot it was using with the command ID to mark output tuples with.
Separating these concepts allows some small simplifications in executor APIs.
Something for the TODO list: look into having CommandCounterIncrement not do
AcceptInvalidationMessages. It seems fairly bogus to be doing it there,
but exactly where to do it instead isn't clear, and I'm disinclined to mess
with asynchronous behavior during late beta.
plan before the effects of DDL executed in an immediately prior SPI operation
had been absorbed. Per report from Chris Wood.
This patch has an unpleasant side effect of causing the number of
CommandCounterIncrement()s done by a typical plpgsql function to
approximately double. Amelioration of the consequences of that
will be undertaken in a separate patch.
in corner cases such as re-fetching a just-deleted row. We may be able to
relax this someday, but let's find out how many people really care before
we invest a lot of work in it. Per report from Heikki and subsequent
discussion.
While in the neighborhood, make the combination of INSENSITIVE and FOR UPDATE
throw an error, since they are semantically incompatible. (Up to now we've
accepted but just ignored the INSENSITIVE option of DECLARE CURSOR.)
then-delete on the current cursor row. The basic fix is that nodeTidscan.c
has to apply heap_get_latest_tid() to the current-scan-TID obtained from the
cursor query; this ensures we get the latest row version to work with.
However, since that only works if the query plan is a TID scan, we also have
to hack the planner to make sure only that type of plan will be selected.
(Formerly, the planner might decide to apply a seqscan if the table is very
small. This change is probably a Good Thing anyway, since it's hard to see
how a seqscan could really win.) That means the execQual.c code to support
CurrentOfExpr as a regular expression type is dead code, so replace it with
just an elog(). Also, add regression tests covering these cases. Note
that the added tests expose the fact that re-fetching an updated row
misbehaves if the cursor used FOR UPDATE. That's an independent bug that
should be fixed later. Per report from Dharmendra Goyal.
columns, and the new version can be stored on the same heap page, we no longer
generate extra index entries for the new version. Instead, index searches
follow the HOT-chain links to ensure they find the correct tuple version.
In addition, this patch introduces the ability to "prune" dead tuples on a
per-page basis, without having to do a complete VACUUM pass to recover space.
VACUUM is still needed to clean up dead index entries, however.
Pavan Deolasee, with help from a bunch of other people.