When one side of the join has a NULL, we don't want to uselessly try
to match it against every remaining tuple of the other side. While
at it, rewrite the comparison machinery to avoid multiple evaluations
of the left and right input expressions and to use a btree comparator
where available, instead of double operator calls. Also revise the
state machine to eliminate redundant comparisons and hopefully make it
more readable too.
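A fragment sketching the comparator difference (not from the patch; it
assumes FmgrInfo entries lt_func, eq_func, cmp_func and already-evaluated
Datums ldatum, rdatum are set up):

    /* Double-operator style: up to two fmgr calls per key pair */
    int32   cmp;

    if (DatumGetBool(FunctionCall2(&lt_func, ldatum, rdatum)))
        cmp = -1;
    else if (DatumGetBool(FunctionCall2(&eq_func, ldatum, rdatum)))
        cmp = 0;
    else
        cmp = 1;

    /* Btree comparator style: one call returns <0, 0, or >0 directly */
    cmp = DatumGetInt32(FunctionCall2(&cmp_func, ldatum, rdatum));
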
which is neither needed by nor related to that header. Remove the bogus
inclusion and instead include the header in those C files that actually
need it. Also remove unnecessary inclusions and fix bad inclusion order in
tsearch2 files.
startup to end, rather than re-opening it in each MultiExecBitmapIndexScan
call. I had foolishly thought that opening/closing wouldn't be much
more expensive than a rescan call, but that was sheer brain fade.
This seems to fix about half of the performance lossage reported by
Sergey Koposov. I'm still not sure where the other half went.
to produce when running the executor. This is consistent with the internal
executor APIs (such as ExecutorRun), which also use a long for this purpose.
It also allows FETCH_ALL to be passed -- since FETCH_ALL is defined as
LONG_MAX, this wouldn't have worked on platforms where int and long are of
different sizes. Per report from Tzahi Fadida.
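A stand-alone illustration of the truncation hazard; the two runner
functions are hypothetical stand-ins for an int-based and a long-based
executor entry point:

    #include <limits.h>
    #include <stdio.h>

    static void run_with_int(int count)   { printf("int count:  %d\n", count); }
    static void run_with_long(long count) { printf("long count: %ld\n", count); }

    int main(void)
    {
        long fetch_all = LONG_MAX;      /* FETCH_ALL is defined as LONG_MAX */

        run_with_int((int) fetch_all);  /* truncated wherever int is narrower
                                         * than long: no longer means "all" */
        run_with_long(fetch_all);       /* preserved, matching ExecutorRun */
        return 0;
    }
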
only one argument. (Per recent discussion, the option to accept multiple
arguments is pretty useless for user-defined types, and would be a likely
source of security holes if it were used.) Simplify call sites of
output/send functions to not bother passing more than one argument.
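A minimal sketch of the single-argument convention for a hypothetical
pass-by-value type; note there is no second OID argument for the function
to consult:

    #include "postgres.h"
    #include "fmgr.h"

    PG_FUNCTION_INFO_V1(mytype_out);

    Datum
    mytype_out(PG_FUNCTION_ARGS)
    {
        int32   val = PG_GETARG_INT32(0);       /* the one and only argument */
        char   *result = (char *) palloc(16);

        snprintf(result, 16, "%d", val);
        PG_RETURN_CSTRING(result);
    }
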
to eliminate unnecessary deadlocks. This commit adds SELECT ... FOR SHARE
paralleling SELECT ... FOR UPDATE. The implementation uses a new SLRU
data structure (managed much like pg_subtrans) to represent multiple-
transaction-ID sets. When more than one transaction is holding a shared
lock on a particular row, we create a MultiXactId representing that set
of transactions and store its ID in the row's XMAX. This scheme allows
an effectively unlimited number of row locks, just as before, while
costing no extra overhead except when a shared lock actually
has to be shared. Still TODO: use the regular lock manager to control
the grant order when multiple backends are waiting for a row lock.
Alvaro Herrera and Tom Lane.
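A toy model of the idea, not the real SLRU code: two parallel arrays play
the roles of the pg_multixact "offsets" and "members" areas, mapping a
MultiXactId to the set of transaction IDs sharing the lock:

    #include <stdint.h>
    #include <stdio.h>

    typedef uint32_t TransactionId;
    typedef uint32_t MultiXactId;

    #define MAX_MULTI   16
    #define MAX_MEMBERS 64

    static uint32_t      offsets[MAX_MULTI + 2];   /* multi -> first member */
    static TransactionId members[MAX_MEMBERS];     /* flattened xid sets */
    static MultiXactId   next_multi = 1;
    static uint32_t      next_member = 0;

    /* Build a MultiXactId for a set of lockers; the returned ID is what
     * would be stored in the row's XMAX. */
    static MultiXactId
    multixact_create(const TransactionId *xids, int n)
    {
        MultiXactId multi = next_multi++;

        offsets[multi] = next_member;
        for (int i = 0; i < n; i++)
            members[next_member++] = xids[i];
        offsets[next_multi] = next_member;          /* end sentinel */
        return multi;
    }

    int main(void)
    {
        TransactionId xids[] = {100, 101, 102};
        MultiXactId   m = multixact_create(xids, 3);

        printf("multi %u holds:", (unsigned) m);
        for (uint32_t i = offsets[m]; i < offsets[m + 1]; i++)
            printf(" %u", (unsigned) members[i]);
        printf("\n");
        return 0;
    }
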
node, as this behavior is now better done as a bitmap OR indexscan.
This allows considerable simplification in nodeIndexscan.c itself as
well as several planner modules concerned with indexscan plan generation.
Also we can improve the sharing of code between regular and bitmap
indexscans, since they are now working with nigh-identical Plan nodes.
but just to open and close it during MultiExecBitmapIndexScan. This
avoids acquiring duplicate resources (eg, multiple locks on the same
relation) in a tree with many bitmap scans. Also, don't bother to
lock the parent heap at all here, since we must be underneath a
BitmapHeapScan node that will be holding a suitable lock.
ExprContexts will be freed anyway when FreeExecutorState() is reached,
and letting that routine do the work is more efficient because it will
automatically free the ExprContexts in reverse creation order. The
existing coding was effectively freeing them in exactly the worst
possible order, resulting in O(N^2) behavior inside list_delete_ptr,
which becomes highly visible in cases with a few thousand plan nodes.
ExecFreeExprContext is now effectively a no-op and could be removed,
but I left it in place in case we ever want to put it back to use.
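The effect is easy to reproduce outside the executor. In this toy (names
invented), contexts are prepended to a list as they are created, and
delete-by-pointer scans from the head, as list_delete_ptr does; freeing
newest-first touches one cell per deletion, while freeing oldest-first
rescans nearly the whole list every time:

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct Cell { int id; struct Cell *next; } Cell;

    static long cells_visited;      /* cells examined across all deletions */

    static Cell *
    delete_id(Cell *head, int id)
    {
        Cell **link = &head;

        while (*link != NULL)
        {
            cells_visited++;
            if ((*link)->id == id)
            {
                Cell *dead = *link;

                *link = dead->next;
                free(dead);
                break;
            }
            link = &(*link)->next;
        }
        return head;
    }

    static Cell *
    build(int n)
    {
        Cell *head = NULL;

        for (int i = 0; i < n; i++)
        {
            Cell *c = malloc(sizeof(Cell));

            c->id = i;
            c->next = head;             /* newest at the head, as with lcons */
            head = c;
        }
        return head;
    }

    int main(void)
    {
        enum { N = 2000 };
        Cell *head;

        head = build(N);
        cells_visited = 0;
        for (int i = N - 1; i >= 0; i--)    /* reverse creation order */
            head = delete_id(head, i);
        printf("reverse order:  %ld cells visited\n", cells_visited);  /* N */

        head = build(N);
        cells_visited = 0;
        for (int i = 0; i < N; i++)         /* creation order: worst case */
            head = delete_id(head, i);
        printf("creation order: %ld cells visited\n", cells_visited);  /* ~N*N/2 */
        return 0;
    }
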
but the code is basically working. Along the way, rewrite the entire
approach to processing OR index conditions, and make it work in join
cases for the first time ever. orindxpath.c is now basically obsolete,
but I left it in for the time being to allow easy comparison testing
against the old implementation.
scans, using in-memory tuple ID bitmaps as the intermediary. The planner
frontend (path creation and cost estimation) is not there yet, so none
of this code can be executed. I have tested it using some hacked planner
code that is far too ugly to see the light of day, however. Committing
now so that the bulk of the infrastructure changes go in before the tree
drifts under me.
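A toy sketch of the intermediary data structure (real TID bitmaps are
per-heap-page and can go lossy, which this ignores): tuple IDs become
bits, and ANDing/ORing index conditions becomes bitwise AND/OR:

    #include <stdint.h>
    #include <stdio.h>

    #define NWORDS 4                         /* covers 256 tuple IDs */
    typedef struct { uint64_t w[NWORDS]; } TidBitmap;

    static void bm_set(TidBitmap *b, int tid)
    {
        b->w[tid / 64] |= UINT64_C(1) << (tid % 64);
    }

    static TidBitmap bm_and(TidBitmap a, TidBitmap b)
    {
        for (int i = 0; i < NWORDS; i++) a.w[i] &= b.w[i];
        return a;
    }

    static TidBitmap bm_or(TidBitmap a, TidBitmap b)
    {
        for (int i = 0; i < NWORDS; i++) a.w[i] |= b.w[i];
        return a;
    }

    int main(void)
    {
        TidBitmap x = {{0}}, y = {{0}};

        bm_set(&x, 3); bm_set(&x, 70);      /* matches from index A */
        bm_set(&y, 3); bm_set(&y, 200);     /* matches from index B */

        TidBitmap both   = bm_and(x, y);    /* cond_a AND cond_b */
        TidBitmap either = bm_or(x, y);     /* cond_a OR  cond_b */

        printf("AND has tid 3?   %s\n", (both.w[0] >> 3) & 1 ? "yes" : "no");
        printf("OR  has tid 200? %s\n", (either.w[3] >> 8) & 1 ? "yes" : "no");
        return 0;
    }
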
return just a single tuple at a time. Currently the only such node
type is Hash, but I expect we will soon have indexscans that can return
tuple bitmaps. A side benefit is that EXPLAIN ANALYZE now shows the
correct tuple count for a Hash node.
indexes. Replace all heap_openr and index_openr calls by heap_open
and index_open. Remove runtime lookups of catalog OID numbers in
various places. Remove relcache's support for looking up system
catalogs by name. Bulky but mostly very boring patch ...
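The shape of the change at a typical call site (lock level illustrative):

    Relation    rel;

    /* before: runtime lookup of the catalog by name */
    rel = heap_openr("pg_class", AccessShareLock);

    /* after: the OID is a compiled-in macro */
    rel = heap_open(RelationRelationId, AccessShareLock);
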
indexes. Extend the macros in include/catalog/*.h to carry the info
about hand-assigned OIDs, and adjust the genbki script and bootstrap
code to make the relations actually get those OIDs. Remove the small
number of RelOid_pg_foo macros that we had in favor of a complete
set named like the catname.h and indexing.h macros. Next phase will
get rid of internal use of names for looking up catalogs and indexes;
but this completes the changes forcing an initdb, so it looks like a
good place to commit.
Along the way, I made the shared relations (pg_database etc) not be
'bootstrap' relations any more, so as to reduce the number of hardwired
entries and simplify changing those relations in future. I'm not
sure whether they ever really needed to be handled as bootstrap
relations, but it seems to work fine to not do so now.
ExecProcNode() with a NULL value, so the test couldn't do anything
for us except maybe mask bugs. Removing it probably doesn't save
anything much either, but then again this is a hot-spot routine.
few palloc's. I also chose to eliminate the restype and restypmod fields
entirely, since they are redundant with information stored in the node's
contained expression; re-examining the expression at need seems simpler
and more reliable than trying to keep restype/restypmod up to date.
initdb forced due to change in contents of stored rules.
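A fragment of the replacement idiom, assuming a TargetEntry *tle is in
scope: rather than trusting stored fields, recompute from the contained
expression:

    Oid     restype   = exprType((Node *) tle->expr);
    int32   restypmod = exprTypmod((Node *) tle->expr);
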
old comment in the code claimed that this was necessary. Since it is not
actually necessary any more, it is clearer to remove the comment and
just return NULL instead -- the return value of ExecHash() is not used.
change saves a great deal of space in pg_proc and its primary index,
and it eliminates the former requirement that INDEX_MAX_KEYS and
FUNC_MAX_ARGS have the same value. INDEX_MAX_KEYS is still embedded
in the on-disk representation (because it affects index tuple header
size), but FUNC_MAX_ARGS is not. I believe it would now be possible
to increase FUNC_MAX_ARGS at little cost, but haven't experimented yet.
There are still a lot of vestigial references to FUNC_MAX_ARGS, which
I will clean up in a separate pass. However, getting rid of it
altogether would require changing the FunctionCallInfoData struct,
and I'm not sure I want to buy into that.
executing a statement that fires triggers. Formerly this time was
included in "Total runtime" but not otherwise accounted for.
As a side benefit, we avoid re-opening relations when firing non-deferred
AFTER triggers, because the trigger code can re-use the main executor's
ResultRelInfo data structure.
convention for isnull flags. Also, remove the useless InsertIndexResult
return struct from index AM aminsert calls --- there is no reason for
the caller to know where in the index the tuple was inserted, and we
were wasting a palloc cycle per insert to deliver this uninteresting
value (plus nontrivial complexity in some AMs).
I forced initdb because of the change in the signature of the aminsert
routines, even though nothing really looks at those pg_proc entries...
of tuples when passing data up through multiple plan nodes. A slot can now
hold either a normal "physical" HeapTuple, or a "virtual" tuple consisting
of Datum/isnull arrays. Upper plan levels can usually just copy the Datum
arrays, avoiding heap_formtuple() and possible subsequent nocachegetattr()
calls to extract the data again. This work extends Atsushi Ogawa's earlier
patch, which provided the key idea of adding Datum arrays to TupleTableSlots.
(I believe however that something like this was foreseen way back in Berkeley
days --- see the old comment on ExecProject.) A test case involving many
levels of join of fairly wide tables (about 80 columns altogether) showed
about 3x overall speedup, though simple queries will probably not be
helped very much.
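A toy model of the two slot contents (names invented; the real fields
live in TupleTableSlot): passing a tuple up a plan level just copies the
Datum/isnull arrays instead of forming and re-deforming a physical tuple.

    #include <stdbool.h>
    #include <string.h>

    #define MAXCOLS 8
    typedef unsigned long Datum;              /* toy stand-in */
    typedef struct HeapTupleToy { int natts; } HeapTupleToy;

    typedef struct
    {
        HeapTupleToy *phys;                   /* physical tuple, or NULL */
        int           natts;
        Datum         values[MAXCOLS];        /* virtual form: columns */
        bool          isnull[MAXCOLS];
    } Slot;

    /* Pass-through "projection": copy arrays, never materialize a tuple. */
    static void
    pass_up(Slot *dst, const Slot *src)
    {
        dst->phys  = NULL;                    /* result is virtual */
        dst->natts = src->natts;
        memcpy(dst->values, src->values, sizeof(Datum) * src->natts);
        memcpy(dst->isnull, src->isnull, sizeof(bool) * src->natts);
    }
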
I have also duplicated some code in heaptuple.c in order to provide versions
of heap_formtuple and friends that use "bool" arrays to indicate null
attributes, instead of the old convention of "char" arrays containing either
'n' or ' '. This provides a better match to the convention used by
ExecEvalExpr. While I have not made a concerted effort to get rid of uses
of the old routines, I think they should be deprecated and eventually removed.
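Side by side, assuming a TupleDesc tupdesc and Datum values[2] are
already in scope:

    /* old convention: char array, ' ' = not null, 'n' = null */
    char        nulls[2] = {' ', 'n'};
    HeapTuple   t1 = heap_formtuple(tupdesc, values, nulls);

    /* new convention: bool array, matching ExecEvalExpr */
    bool        isnull[2] = {false, true};
    HeapTuple   t2 = heap_form_tuple(tupdesc, values, isnull);
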
a tuple are being accessed via ExecEvalVar and the attcacheoff shortcut
isn't usable (due to nulls and/or varlena columns). To do this, cache
Datums extracted from a tuple in the associated TupleTableSlot.
Also some code cleanup in and around the TupleTable handling.
Atsushi Ogawa with some kibitzing by Tom Lane.
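A simplified sketch of the lookup path (error checks omitted; field names
follow the eventual TupleTableSlot layout, and slot_deform_tuple stands
for an internal helper that walks the physical tuple from wherever the
previous call stopped, filling the arrays up through attnum):

    static Datum
    slot_getattr_sketch(TupleTableSlot *slot, int attnum, bool *isnull)
    {
        if (attnum > slot->tts_nvalid)          /* not yet extracted? */
            slot_deform_tuple(slot, attnum);    /* resumes; never re-walks
                                                 * columns 1..tts_nvalid */
        *isnull = slot->tts_isnull[attnum - 1];
        return slot->tts_values[attnum - 1];
    }
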
can tell whether it is being used as an aggregate or not. This allows
such a function to avoid re-pallocing a pass-by-reference transition
value; normally it would be unsafe for a function to scribble on an input,
but in the aggregate case it's safe to reuse the old transition value.
Make int8inc() do this. This yields a useful improvement in the speed of
COUNT(*), at least on narrow tables (it seems to be swamped by I/O when
the table rows are wide). Per a discussion in early December with
Neil Conway. I also fixed int_aggregate.c to check this, thereby
turning it into something approaching a supportable technique instead
of being a crude hack.
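A sketch modeled on int8inc(); the 64-bit overflow check is elided here,
and later releases wrap the node test in a helper:

    #include "postgres.h"
    #include "fmgr.h"
    #include "nodes/execnodes.h"

    Datum
    int8inc_sketch(PG_FUNCTION_ARGS)
    {
        if (fcinfo->context && IsA(fcinfo->context, AggState))
        {
            /* Executing as an aggregate transition function: the first
             * argument is our own transition value, so updating it in
             * place is safe and skips a palloc per input row. */
            int64  *arg = (int64 *) PG_GETARG_POINTER(0);

            *arg += 1;
            PG_RETURN_POINTER(arg);
        }
        else
        {
            /* Ordinary call: must not scribble on the input. */
            int64   arg = PG_GETARG_INT64(0);

            PG_RETURN_INT64(arg + 1);
        }
    }
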
Formerly, if such a clause contained no aggregate functions we mistakenly
treated it as equivalent to WHERE. Per spec it must cause the query to
be treated as a grouped query of a single group, the same as appearance
of aggregate functions would do. (For example, SELECT 1 FROM tab HAVING
1 < 2 must return a single row, not one row per row of tab.) Also, the
HAVING filter must execute after aggregate function computation even if
it itself contains no aggregate functions.
on-the-fly, and thereby avoid blowing out memory when the planner has
underestimated the hash table size. Hash join will now obey the
work_mem limit with some faithfulness. Per my recent proposal
(hash aggregate part isn't done yet though).
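A toy rendition of the on-the-fly splitting (spilling to temp files is
reduced to a counter, and the real code derives batch numbers from
different bits of the same hash rather than a plain modulo):

    #include <stdio.h>

    #define WORK_MEM_TUPLES 4          /* stand-in for the work_mem limit */

    static unsigned hash_key(long k) { return (unsigned) (k * 2654435761u); }

    int main(void)
    {
        long keys[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        long inmem[32];
        int  n_inmem = 0, nbatch = 1, curbatch = 0, spilled = 0;

        for (int i = 0; i < 10; i++)
        {
            if ((int) (hash_key(keys[i]) % nbatch) != curbatch)
            {
                spilled++;             /* belongs to a later batch's file */
                continue;
            }
            inmem[n_inmem++] = keys[i];

            if (n_inmem > WORK_MEM_TUPLES)     /* table too big: split */
            {
                int keep = 0;

                nbatch *= 2;
                for (int j = 0; j < n_inmem; j++)
                {
                    if ((int) (hash_key(inmem[j]) % nbatch) == curbatch)
                        inmem[keep++] = inmem[j];
                    else
                        spilled++;     /* evicted to its new batch's file */
                }
                n_inmem = keep;
            }
        }
        printf("nbatch=%d in-memory=%d spilled=%d\n", nbatch, n_inmem, spilled);
        return 0;
    }
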
that return tuples (such as EXPLAIN). Per gripe from Michael Fuhr.
Side effect: fix an old bug that unintentionally disabled backward scans
for all SPI-created cursors.
look at the actual aggregate transition datatypes and the actual overhead
needed by nodeAgg.c, instead of using pessimistic round numbers.
Per a discussion with Michael Tiemann.
Also performed an initial pass at updating our Copyright dates to extend
to 2005. This first pass was very simple: change every file where grep
found both '1996-2004' and the word 'Copyright'. I scanned the generated
list with 'less' before and after making the change, to be sure I only
picked up the right entries.
of an inheritance child table is binary-compatible with the rowtype of
its parent, invent an expression node type that does the conversion
correctly. Fixes the new bug exhibited by Kris Shannon as well as a
lot of old bugs that would only show up when using multiple inheritance
or after altering the parent table.
We don't really want to start a new SPI connection, just keep using the old
one; otherwise we have memory management problems as illustrated by
John Kennedy's bug report of today. This requires a bit of a hack to
ensure the SPI stack state is properly restored, but then again what we
were doing before was a hack too, strictly speaking. Add a regression
test to cover this case.
returning a NULL pointer (some callers remembered to check the return
value, but some did not -- it is safer to just bail out).
Also, clean up pgstat.c to use elog(ERROR) rather than elog(LOG) followed
by exit().
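The shape of that cleanup (message text hypothetical):

    /* before: two steps, exiting without normal error recovery */
    elog(LOG, "pgstat: internal failure");
    exit(1);

    /* after: one call; ERROR unwinds through the standard error path */
    elog(ERROR, "pgstat: internal failure");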