checks in ExecIndexBuildScanKeys() that were inadequate anyway: it's better
to verify the correct varno on an expected index key, not just reject OUTER
and INNER.
This makes the entire current contents of nodeFuncs.c dead code. I'll be
replacing it with some other stuff later, as per recent proposal.
corresponding struct definitions. This allows other headers to avoid including
certain highly-loaded headers such as rel.h and relscan.h, instead using just
relcache.h, heapam.h or genam.h, which are more lightweight and thus cause less
unnecessary dependencies.
no particular need to do get_op_opfamily_properties() while building an
indexscan plan. Postpone that lookup until executor start. This simplifies
createplan.c a lot more than it complicates nodeIndexscan.c, and makes things
more uniform since we already had to do it that way for RowCompare
expressions. Should be a bit faster too, at least for plans that aren't
re-used many times, since we avoid palloc'ing and perhaps copying the
intermediate list data structure.
instead of plan time. Extend the amgettuple API so that the index AM returns
a boolean indicating whether the indexquals need to be rechecked, and make
that rechecking happen in nodeIndexscan.c (currently the only place where
it's expected to be needed; other callers of index_getnext are just erroring
out for now). For the moment, GIN and GIST have stub logic that just always
sets the recheck flag to TRUE --- I'm hoping to get Teodor to handle pushing
that control down to the opclass consistent() functions. The planner no
longer pays any attention to amopreqcheck, and that catalog column will go
away in due course.
during a bitmap index scan. This cannot affect the query results
(since we're just dumping the TIDs into a bitmap) but it might offer
some advantage in locality of access to the index. Per Greg Stark.
EXPLAIN-only operation was a little too short; it skipped initializing the
node's result tuple type, which may be needed depending on what's above the
indexscan node. Call ExecAssignResultTypeFromTL before exiting. (For good
luck I moved up the ExecAssignScanProjectionInfo call as well, so that
everything except indexscan-specific initialization will still be done.)
Per example from Grant Finnemore.
and/or create plans for hypothetical situations; in particular, investigate
plans that would be generated using hypothetical indexes. This is a
heavily-rewritten version of the hooks proposed by Gurjeet Singh for his
Index Advisor project. In this formulation, the index advisor can be
entirely a loadable module instead of requiring a significant part to be
in the core backend, and plans can be generated for hypothetical indexes
without requiring the creation and rolling-back of system catalog entries.
The index advisor patch as-submitted is not compatible with these hooks,
but it needs significant work anyway due to other 8.2-to-8.3 planner
changes. With these hooks in the core backend, development of the advisor
can proceed as a pgfoundry project.
ps_TupFromTlist in plan nodes that make use of it. This was being done
correctly in join nodes and Result nodes but not in any relation-scan nodes.
Bug would lead to bogus results if a set-returning function appeared in the
targetlist of a subquery that could be rescanned after partial execution,
for example a subquery within EXISTS(). Bug has been around forever :-(
... surprising it wasn't reported before.
cases. Operator classes now exist within "operator families". While most
families are equivalent to a single class, related classes can be grouped
into one family to represent the fact that they are semantically compatible.
Cross-type operators are now naturally adjunct parts of a family, without
having to wedge them into a particular opclass as we had done originally.
This commit restructures the catalogs and cleans up enough of the fallout so
that everything still works at least as well as before, but most of the work
needed to actually improve the planner's behavior will come later. Also,
there are not yet CREATE/DROP/ALTER OPERATOR FAMILY commands; the only way
to create a new family right now is to allow CREATE OPERATOR CLASS to make
one by default. I owe some more documentation work, too. But that can all
be done in smaller pieces once this infrastructure is in place.
(table or index) before trying to open its relcache entry. This fixes
race conditions in which someone else commits a change to the relation's
catalog entries while we are in process of doing relcache load. Problems
of that ilk have been reported sporadically for years, but it was not
really practical to fix until recently --- for instance, the recent
addition of WAL-log support for in-place updates helped.
Along the way, remove pg_am.amconcurrent: all AMs are now expected to support
concurrent update.
by creating a reference-count mechanism, similar to what we did a long time
ago for catcache entries. The back branches have an ugly solution involving
lots of extra copies, but this way is more efficient. Reference counting is
only applied to tupdescs that are actually in caches --- there seems no need
to use it for tupdescs that are generated in the executor, since they'll go
away during plan shutdown by virtue of being in the per-query memory context.
Neil Conway and Tom Lane
any use in the past many years, we'd have made some effort to include
them in all executor node types; but in fact they were only in
nodeAppend.c and nodeIndexscan.c, up until I copied nodeIndexscan.c's
occurrence into the new bitmap node types. Remove some other unused
macros in execdebug.h, too. Some day the whole header probably ought to
go away in favor of better-designed facilities.
bits indicating which optional capabilities can actually be exercised
at runtime. This will allow Sort and Material nodes, and perhaps later
other nodes, to avoid unnecessary overhead in common cases.
This commit just adds the infrastructure and arranges to pass the correct
flag values down to plan nodes; none of the actual optimizations are here
yet. I'm committing this separately in case anyone wants to measure the
added overhead. (It should be negligible.)
Simon Riggs and Tom Lane
if we already have a stronger lock due to the index's table being the
update target table of the query. Same optimization I applied earlier
at the table level. There doesn't seem to be much interest in the more
radical idea of not locking indexes at all, so do what we can ...
relation if it's already been locked by execMain.c as either a result
relation or a FOR UPDATE/SHARE relation. This avoids an extra trip to
the shared lock manager state. Per my suggestion yesterday.
qualification when the underlying operator is indexable and useOr is true.
That is, indexkey op ANY (ARRAY[...]) is effectively translated into an
OR combination of one indexscan for each array element. This only works
for bitmap index scans, of course, since regular indexscans no longer
support OR'ing of scans. There are still some loose ends to clean up
before changing 'x IN (list)' to translate as a ScalarArrayOpExpr;
for instance predtest.c ought to be taught about it. But this gets the
basic functionality in place.
a TupleTableSlot: instead of calling ExecClearTuple, inline the needed
operations, so that we can avoid redundant steps. In particular, when
the old and new tuples are both on the same disk page, avoid releasing
and re-acquiring the buffer pin --- this saves work in both the bufmgr
and ResourceOwner modules. To make this improvement actually useful,
partially revert a change I made on 2004-04-21 that caused SeqNext
et al to call ExecClearTuple before ExecStoreTuple. The motivation
for that, to avoid grabbing the BufMgrLock separately for releasing
the old buffer and grabbing the new one, no longer applies. My
profiling says that this saves about 5% of the CPU time for an
all-in-memory seqscan.
comment line where output as too long, and update typedefs for /lib
directory. Also fix case where identifiers were used as variable names
in the backend, but as typedefs in ecpg (favor the backend for
indenting).
Backpatch to 8.1.X.
which is neither needed by nor related to that header. Remove the bogus
inclusion and instead include the header in those C files that actually
need it. Also fix unnecessary inclusions and bad inclusion order in
tsearch2 files.
node, as this behavior is now better done as a bitmap OR indexscan.
This allows considerable simplification in nodeIndexscan.c itself as
well as several planner modules concerned with indexscan plan generation.
Also we can improve the sharing of code between regular and bitmap
indexscans, since they are now working with nigh-identical Plan nodes.
ExprContexts will be freed anyway when FreeExecutorState() is reached,
and letting that routine do the work is more efficient because it will
automatically free the ExprContexts in reverse creation order. The
existing coding was effectively freeing them in exactly the worst
possible order, resulting in O(N^2) behavior inside list_delete_ptr,
which becomes highly visible in cases with a few thousand plan nodes.
ExecFreeExprContext is now effectively a no-op and could be removed,
but I left it in place in case we ever want to put it back to use.
of tuples when passing data up through multiple plan nodes. A slot can now
hold either a normal "physical" HeapTuple, or a "virtual" tuple consisting
of Datum/isnull arrays. Upper plan levels can usually just copy the Datum
arrays, avoiding heap_formtuple() and possible subsequent nocachegetattr()
calls to extract the data again. This work extends Atsushi Ogawa's earlier
patch, which provided the key idea of adding Datum arrays to TupleTableSlots.
(I believe however that something like this was foreseen way back in Berkeley
days --- see the old comment on ExecProject.) A test case involving many
levels of join of fairly wide tables (about 80 columns altogether) showed
about 3x overall speedup, though simple queries will probably not be
helped very much.
I have also duplicated some code in heaptuple.c in order to provide versions
of heap_formtuple and friends that use "bool" arrays to indicate null
attributes, instead of the old convention of "char" arrays containing either
'n' or ' '. This provides a better match to the convention used by
ExecEvalExpr. While I have not made a concerted effort to get rid of uses
of the old routines, I think they should be deprecated and eventually removed.
Also performed an initial run through of upgrading our Copyright date to
extend to 2005 ... first run here was very simple ... change everything
where: grep 1996-2004 && the word 'Copyright' ... scanned through the
generated list with 'less' first, and after, to make sure that I only
picked up the right entries ...
returning a NULL pointer (some callers remembered to check the return
value, but some did not -- it is safer to just bail out).
Also, cleanup pgstat.c to use elog(ERROR) rather than elog(LOG) followed
by exit().
In the past, we used a 'Lispy' linked list implementation: a "list" was
merely a pointer to the head node of the list. The problem with that
design is that it makes lappend() and length() linear time. This patch
fixes that problem (and others) by maintaining a count of the list
length and a pointer to the tail node along with each head node pointer.
A "list" is now a pointer to a structure containing some meta-data
about the list; the head and tail pointers in that structure refer
to ListCell structures that maintain the actual linked list of nodes.
The function names of the list API have also been changed to, I hope,
be more logically consistent. By default, the old function names are
still available; they will be disabled-by-default once the rest of
the tree has been updated to use the new API names.
the next are handled by ReleaseAndReadBuffer rather than separate
ReleaseBuffer and ReadBuffer calls. This cuts the number of acquisitions
of the BufMgrLock by a factor of 2 (possibly more, if an indexscan happens
to pull successive rows from the same heap page). Unfortunately this
doesn't seem enough to get us out of the recently discussed context-switch
storm problem, but it's surely worth doing anyway.
Make btree index creation and initial validation of foreign-key constraints
use maintenance_work_mem rather than work_mem as their memory limit.
Add some code to guc.c to allow these variables to be referenced by their
old names in SHOW and SET commands, for backwards compatibility.
pointer type when it is not necessary to do so.
For future reference, casting NULL to a pointer type is only necessary
when (a) invoking a function AND either (b) the function has no prototype
OR (c) the function is a varargs function.
regular qpqual ('filter condition'), add special-purpose code to
nodeIndexscan.c to recheck them. This ends being almost no net addition
of code, because the removal of planner code balances out the extra
executor code, but it is significantly more efficient when a lossy
operator is involved in an OR indexscan. The old implementation had
to recheck the entire indexqual in such cases.
pghackers proposal of 8-Nov. All the existing cross-type comparison
operators (int2/int4/int8 and float4/float8) have appropriate support.
The original proposal of storing the right-hand-side datatype as part of
the primary key for pg_amop and pg_amproc got modified a bit in the event;
it is easier to store zero as the 'default' case and only store a nonzero
when the operator is actually cross-type. Along the way, remove the
long-since-defunct bigbox_ops operator class.
Remove the 'strategy map' code, which was a large amount of mechanism
that no longer had any use except reverse-mapping from procedure OID to
strategy number. Passing the strategy number to the index AM in the
first place is simpler and faster.
This is a preliminary step in planned support for cross-datatype index
operations. I'm committing it now since the ScanKeyEntryInitialize()
API change touches quite a lot of files, and I want to commit those
changes before the tree drifts under me.
now able to cope with assigning new relfilenode values to nailed-in-cache
indexes, so they can be reindexed using the fully crash-safe method. This
leaves only shared system indexes as special cases. Remove the 'index
deactivation' code, since it provides no useful protection in the shared-
index case. Require reindexing of shared indexes to be done in standalone
mode, but remove other restrictions on REINDEX. -P (IgnoreSystemIndexes)
now prevents using indexes for lookups, but does not disable index updates.
It is therefore safe to allow from PGOPTIONS. Upshot: reindexing system catalogs
can be done without a standalone backend for all cases except
shared catalogs.
handling many-way scans: instead of re-evaluating all prior indexscan
quals to see if a tuple has been fetched more than once, use a hash table
indexed by tuple CTID. But fall back to the old way if the hash table
grows to exceed SortMem.