- full concurrency for insert/update/select/vacuum:
- select and vacuum never locks more than one page simultaneously
- select (gettuple) hasn't any lock across it's calls
- insert never locks more than two page simultaneously:
- during search of leaf to insert it locks only one page
simultaneously
- while walk upward to the root it locked only parent (may be
non-direct parent) and child. One of them X-lock, another may
be S- or X-lock
- 'vacuum full' locks index
- improve gistgetmulti
- simplify XLOG records
Fix bug in index_beginscan_internal: LockRelation may clean
rd_aminfo structure, so move GET_REL_PROCEDURE after LockRelation
nonconsecutive columns of a multicolumn index, as per discussion around
mid-May (pghackers thread "Best way to scan on-disk bitmaps"). This
turns out to require only minimal changes in btree, and so far as I can
see none at all in GiST. btcostestimate did need some work, but its
original assumption that index selectivity == heap selectivity was
quite bogus even before this.
routines in the index's relcache entry, instead of doing a fresh fmgr_info
on every index access. We were already doing this for the index's opclass
support functions; not sure why we didn't think to do it for the AM
functions too. This supersedes the former method of caching (only)
amgettuple in indexscan scan descriptors; it's an improvement because the
function lookup can be amortized across multiple statements instead of
being repeated for each statement. Even though lookup for builtin
functions is pretty cheap, this seems to drop a percent or two off some
simple benchmarks.
in an inconsistent state. (This is only latent because in reality
ExecSeqRestrPos is dead code at the moment ... but someday maybe it won't
be.) Add some comments about what the API for plan node mark/restore
actually is, because it's not immediately obvious.
indexes. Replace all heap_openr and index_openr calls by heap_open
and index_open. Remove runtime lookups of catalog OID numbers in
various places. Remove relcache's support for looking up system
catalogs by name. Bulky but mostly very boring patch ...
change saves a great deal of space in pg_proc and its primary index,
and it eliminates the former requirement that INDEX_MAX_KEYS and
FUNC_MAX_ARGS have the same value. INDEX_MAX_KEYS is still embedded
in the on-disk representation (because it affects index tuple header
size), but FUNC_MAX_ARGS is not. I believe it would now be possible
to increase FUNC_MAX_ARGS at little cost, but haven't experimented yet.
There are still a lot of vestigial references to FUNC_MAX_ARGS, which
I will clean up in a separate pass. However, getting rid of it
altogether would require changing the FunctionCallInfoData struct,
and I'm not sure I want to buy into that.
access: define new index access method functions 'amgetmulti' that can
fetch multiple TIDs per call. (The functions exist but are totally
untested as yet.) Since I was modifying pg_am anyway, remove the
no-longer-needed 'rel' parameter from amcostestimate functions, and
also remove the vestigial amowner column that was creating useless
work for Alvaro's shared-object-dependencies project.
Initdb forced due to changes in pg_am.
convention for isnull flags. Also, remove the useless InsertIndexResult
return struct from index AM aminsert calls --- there is no reason for
the caller to know where in the index the tuple was inserted, and we
were wasting a palloc cycle per insert to deliver this uninteresting
value (plus nontrivial complexity in some AMs).
I forced initdb because of the change in the signature of the aminsert
routines, even though nothing really looks at those pg_proc entries...
Also performed an initial run through of upgrading our Copyright date to
extend to 2005 ... first run here was very simple ... change everything
where: grep 1996-2004 && the word 'Copyright' ... scanned through the
generated list with 'less' first, and after, to make sure that I only
picked up the right entries ...
http://archives.postgresql.org/pgsql-hackers/2004-10/msg00464.php.
This fix is intended to be permanent: it moves the responsibility for
calling SetBufferCommitInfoNeedsSave() into the tqual.c routines,
eliminating the requirement for callers to test whether t_infomask changed.
Also, tighten validity checking on buffer IDs in bufmgr.c --- several
routines were paranoid about out-of-range shared buffer numbers but not
about out-of-range local ones, which seems a tad pointless.
now are supposed to take some kind of lock on an index whenever you
are going to access the index contents, rather than relying only on a
lock on the parent table.
the next are handled by ReleaseAndReadBuffer rather than separate
ReleaseBuffer and ReadBuffer calls. This cuts the number of acquisitions
of the BufMgrLock by a factor of 2 (possibly more, if an indexscan happens
to pull successive rows from the same heap page). Unfortunately this
doesn't seem enough to get us out of the recently discussed context-switch
storm problem, but it's surely worth doing anyway.
Remove the 'strategy map' code, which was a large amount of mechanism
that no longer had any use except reverse-mapping from procedure OID to
strategy number. Passing the strategy number to the index AM in the
first place is simpler and faster.
This is a preliminary step in planned support for cross-datatype index
operations. I'm committing it now since the ScanKeyEntryInitialize()
API change touches quite a lot of files, and I want to commit those
changes before the tree drifts under me.
now able to cope with assigning new relfilenode values to nailed-in-cache
indexes, so they can be reindexed using the fully crash-safe method. This
leaves only shared system indexes as special cases. Remove the 'index
deactivation' code, since it provides no useful protection in the shared-
index case. Require reindexing of shared indexes to be done in standalone
mode, but remove other restrictions on REINDEX. -P (IgnoreSystemIndexes)
now prevents using indexes for lookups, but does not disable index updates.
It is therefore safe to allow from PGOPTIONS. Upshot: reindexing system catalogs
can be done without a standalone backend for all cases except
shared catalogs.
NULL key pointer, indicating that the existing scan key should be reused.
This behavior isn't used yet but will be needed for my planned fix to
the keys_are_unique code.
now knows what to do upon hitting a dead page (in theory anyway, it's
untested...). Add a post-VACUUM-cleanup entry point for index AMs, to
provide a place for dead-page scavenging to happen.
Also, fix oversight that broke btpo_prev links in temporary indexes.
initdb forced due to additions in pg_am.
the index AM when we know we are fetching a unique row. However, this
logic did not consider the possibility that it would be asked to fetch
backwards. Also fix mark/restore to work correctly in this scenario.
transaction, so as to avoid returning them out of the index AM. Saves
repeated heap_fetch operations on frequently-updated rows. Also detect
queries on unique keys (equality to all columns of a unique index), and
don't bother continuing scan once we have found first match.
Killing is implemented in the btree and hash AMs, but not yet in rtree
or gist, because there isn't an equally convenient place to do it in
those AMs (the outer amgetnext routine can't do it without re-pinning
the index page).
Did some small cleanup on APIs of HeapTupleSatisfies, heap_fetch, and
index_insert to make this a little easier.
yesterday's proposal to pghackers. Also remove unnecessary parameters
to heap_beginscan, heap_rescan. I modified pg_proc.h to reflect the
new numbers of parameters for the AM interface routines, but did not
force an initdb because nothing actually looks at those fields.
an 'opclass owner' column in pg_opclass. Nothing is done with it at
present, but since there are plans to invent a CREATE OPERATOR CLASS
command soon, we'll probably want DROP OPERATOR CLASS too, which
suggests that a notion of ownership would be a good idea.
in schemas other than the system namespace; however, there's no search
path yet, and not all operations work yet on tables outside the system
namespace.
Improve 'pg_internal.init' relcache entry preload mechanism so that it is
safe to use for all system catalogs, and arrange to preload a realistic
set of system-catalog entries instead of only the three nailed-in-cache
indexes that were formerly loaded this way. Fix mechanism for deleting
out-of-date pg_internal.init files: this must be synchronized with transaction
commit, not just done at random times within transactions. Drive it off
relcache invalidation mechanism so that no special-case tests are needed.
Cache additional information in relcache entries for indexes (their pg_index
tuples and index-operator OIDs) to eliminate repeated lookups. Also cache
index opclass info at the per-opclass level to avoid repeated lookups during
relcache load.
Generalize 'systable scan' utilities originally developed by Hiroshi,
move them into genam.c, use in a number of places where there was formerly
ugly code for choosing either heap or index scan. In particular this allows
simplification of the logic that prevents infinite recursion between syscache
and relcache during startup: we can easily switch to heapscans in relcache.c
when and where needed to avoid recursion, so IndexScanOK becomes simpler and
does not need any expensive initialization.
Eliminate useless opening of a heapscan data structure while doing an indexscan
(this saves an mdnblocks call and thus at least one kernel call).
recreated since the start of our transaction, our first reference to it
errored out because we'd try to reuse our old relcache entry for it.
Do this by accepting SI inval messages just before relcache search in
heap_openr, so that dead relcache entries will be flushed before we
search. Also, break heap_open/openr into two pairs of routines,
relation_open(r) and heap_open(r). The relation_open routines make
no tests on relkind and so can be used to open anything that has a
pg_class entry. The heap_open routines are wrappers that add a relkind
test to preserve their established behavior. Use the relation_open
routines in several places that had various kluge solutions for opening
rels that might be either heap or index rels.
Also, remove the old 'heap stats' code that's been superseded by Jan's
stats collector, and clean up some inconsistencies in error reporting
between the different types of ALTER TABLE.
lookup info in the relcache for index access method support functions.
This makes a huge difference for dynamically loaded support functions,
and should save a few cycles even for built-in ones. Also tweak dfmgr.c
so that load_external_function is called only once, not twice, when
doing fmgr_info for a dynamically loaded function. All per performance
gripe from Teodor Sigaev, 5-Oct-01.
pgsql-hackers. pg_opclass now has a row for each opclass supported by each
index AM, not a row for each opclass name. This allows pg_opclass to show
directly whether an AM supports an opclass, and furthermore makes it possible
to store additional information about an opclass that might be AM-dependent.
pg_opclass and pg_amop now store "lossy" and "haskeytype" information that we
previously expected the user to remember to provide in CREATE INDEX commands.
Lossiness is no longer an index-level property, but is associated with the
use of a particular operator in a particular index opclass.
Along the way, IndexSupportInitialize now uses the syscaches to retrieve
pg_amop and pg_amproc entries. I find this reduces backend launch time by
about ten percent, at the cost of a couple more special cases in catcache.c's
IndexScanOK.
Initial work by Oleg Bartunov and Teodor Sigaev, further hacking by Tom Lane.
initdb forced.
per previous discussion on pghackers. Most of the duplicate code in
different AMs' ambuild routines has been moved out to a common routine
in index.c; this means that all index types now do the right things about
inserting recently-dead tuples, etc. (I also removed support for EXTEND
INDEX in the ambuild routines, since that's about to go away anyway, and
it cluttered the code a lot.) The retail indextuple deletion routines have
been replaced by a "bulk delete" routine in which the indexscan is inside
the access method. I haven't pushed this change as far as it should go yet,
but it should allow considerable simplification of the internal bookkeeping
for deletions. Also, add flag columns to pg_am to eliminate various
hardcoded tests on AM OIDs, and remove unused pg_am columns.
Fix rtree and gist index types to not attempt to store NULLs; before this,
gist usually crashed, while rtree managed not to crash but computed wacko
bounding boxes for NULL entries (which might have had something to do with
the performance problems we've heard about occasionally).
Add AtEOXact routines to hash, rtree, and gist, all of which have static
state that needs to be reset after an error. We discovered this need long
ago for btree, but missed the other guys.
Oh, one more thing: concurrent VACUUM is now the default.
appropriate pin-count manipulation, and instead use ReleaseAndReadBuffer.
Make use of the fact that the passed-in buffer (if there is one) must
be pinned to avoid grabbing the bufmgr spinlock when we are able to
return this same buffer. Eliminate unnecessary 'previous tuple' and
'next tuple' fields of HeapScanDesc and IndexScanDesc, thereby removing
a whole lot of bookkeeping from heap_getnext() and related routines.
report on old-style functions invoked by RI triggers. We had a number of
other places that were being sloppy about which memory context FmgrInfo
subsidiary data will be allocated in. Turns out none of them actually
cause a problem in 7.1, but this is for arcane reasons such as the fact
that old-style triggers aren't supported anyway. To avoid getting burnt
later, I've restructured the trigger support so that we don't keep trigger
FmgrInfo structs in relcache memory. Some other related cleanups too:
it's not really necessary to call fmgr_info at all while setting up
the index support info in relcache entries, because those ScanKeyEntry
structs are never used to invoke the functions. This should speed up
relcache initialization a tiny bit.
maintained for each cache entry. A cache entry will not be freed until
the matching ReleaseSysCache call has been executed. This eliminates
worries about cache entries getting dropped while still in use. See
my posting to pg-hackers of even date for more info.