postgres

mirror of https://github.com/postgres/postgres.git synced 2025-11-10 17:42:29 +03:00

Author	SHA1	Message	Date
Tom Lane	d961a56899	Avoid unnecessary palloc overhead in _bt_first(). The temporary scankeys arrays that it needs can never have more than INDEX_MAX_KEYS entries, so it's reasonable to just allocate them as fixed-size local arrays, and save the cost of palloc/pfree. Not a huge savings, but a cycle saved is a cycle earned ...	2005-06-19 22:41:00 +00:00
Tom Lane	c186c93148	Change the planner to allow indexscan qualification clauses to use nonconsecutive columns of a multicolumn index, as per discussion around mid-May (pghackers thread "Best way to scan on-disk bitmaps"). This turns out to require only minimal changes in btree, and so far as I can see none at all in GiST. btcostestimate did need some work, but its original assumption that index selectivity == heap selectivity was quite bogus even before this.	2005-06-13 23:14:49 +00:00
Tom Lane	ee7ac7b11e	Modify XLogInsert API to make callers specify whether pages to be backed up have the standard layout with unused space between pd_lower and pd_upper. When this is set, XLogInsert will omit the unused space without bothering to scan it to see if it's zero. That saves time in XLogInsert, and also allows reversion of my earlier patch to make PageRepairFragmentation et al explicitly re-zero freed space. Per suggestion by Heikki Linnakangas.	2005-06-06 20:22:58 +00:00
Tom Lane	4c8495a1f2	Remove the mostly-stubbed-out-anyway support routines for WAL UNDO. That code is never going to be used in the foreseeable future, and where it's more than a stub it's making the redo routines harder to read.	2005-06-06 17:01:25 +00:00
Tom Lane	21fda22ec4	Change CRCs in WAL records from 64bit to 32bit for performance reasons. Instead of a separate CRC on each backup block, include backup blocks in their parent WAL record's CRC; this is important to ensure that the backup block really goes with the WAL record, ie there was not a page tear right at the start of the backup block. Implement a simple form of compression of backup blocks: drop any run of zeroes starting at pd_lower, so as not to store the unused 'hole' that commonly exists in PG heap and index pages. Tweak PageRepairFragmentation and related routines to ensure they keep the unused space zeroed, so that the above compression method remains effective. All per recent discussions.	2005-06-02 05:55:29 +00:00
Neil Conway	3140437495	This patch refactors away some duplicated code in the index AM build methods: they all invoke UpdateStats() since they have computed the number of heap tuples, so I created a function in catalog/index.c that each AM now calls.	2005-05-11 06:24:55 +00:00
Tom Lane	30f540be43	Repair very-low-probability race condition between relation extension and VACUUM: in the interval between adding a new page to the relation and formatting it, it was possible for VACUUM to come along and decide it should format the page too. Though not harmful in itself, this would cause data loss if a third transaction were able to insert tuples into the vacuumed page before the original extender got control back.	2005-05-07 21:32:24 +00:00
Tom Lane	278bd0cc22	For some reason access/tupmacs.h has been #including utils/memutils.h, which is neither needed by nor related to that header. Remove the bogus inclusion and instead include the header in those C files that actually need it. Also fix unnecessary inclusions and bad inclusion order in tsearch2 files.	2005-05-06 17:24:55 +00:00
Tom Lane	3a694bb0a1	Restructure LOCKTAG as per discussions of a couple months ago. Essentially, we shoehorn in a lockable-object-type field by taking a byte away from the lockmethodid, which can surely fit in one byte instead of two. This allows less artificial definitions of all the other fields of LOCKTAG; we can get rid of the special pg_xactlock pseudo-relation, and also support locks on individual tuples and general database objects (including shared objects). None of those possibilities are actually exploited just yet, however. I removed pg_xactlock from pg_class, but did not force initdb for that change. At this point, relkind 's' (SPECIAL) is unused and could be removed entirely.	2005-04-29 22:28:24 +00:00
Tom Lane	70c9763d48	Convert oidvector and int2vector into variable-length arrays. This change saves a great deal of space in pg_proc and its primary index, and it eliminates the former requirement that INDEX_MAX_KEYS and FUNC_MAX_ARGS have the same value. INDEX_MAX_KEYS is still embedded in the on-disk representation (because it affects index tuple header size), but FUNC_MAX_ARGS is not. I believe it would now be possible to increase FUNC_MAX_ARGS at little cost, but haven't experimented yet. There are still a lot of vestigial references to FUNC_MAX_ARGS, which I will clean up in a separate pass. However, getting rid of it altogether would require changing the FunctionCallInfoData struct, and I'm not sure I want to buy into that.	2005-03-29 00:17:27 +00:00
Tom Lane	bf3dbb5881	First steps towards index scans with heap access decoupled from index access: define new index access method functions 'amgetmulti' that can fetch multiple TIDs per call. (The functions exist but are totally untested as yet.) Since I was modifying pg_am anyway, remove the no-longer-needed 'rel' parameter from amcostestimate functions, and also remove the vestigial amowner column that was creating useless work for Alvaro's shared-object-dependencies project. Initdb forced due to changes in pg_am.	2005-03-27 23:53:05 +00:00
Tom Lane	94e03330cb	Create a routine PageIndexMultiDelete() that replaces a loop around PageIndexTupleDelete() with a single pass of compactification --- logic mostly lifted from PageRepairFragmentation. I noticed while profiling that a VACUUM that's cleaning up a whole lot of deleted tuples would spend as much as a third of its CPU time in PageIndexTupleDelete; not too surprising considering the loop method was roughly O(N^2) in the number of tuples involved.	2005-03-22 06:17:03 +00:00
Tom Lane	ee4ddac137	Convert index-related tuple handling routines from char 'n'/' ' to bool convention for isnull flags. Also, remove the useless InsertIndexResult return struct from index AM aminsert calls --- there is no reason for the caller to know where in the index the tuple was inserted, and we were wasting a palloc cycle per insert to deliver this uninteresting value (plus nontrivial complexity in some AMs). I forced initdb because of the change in the signature of the aminsert routines, even though nothing really looks at those pg_proc entries...	2005-03-21 01:24:04 +00:00
Tom Lane	354049c709	Remove unnecessary calls of FlushRelationBuffers: there is no need to write out data that we are about to tell the filesystem to drop. smgr_internal_unlink already had a DropRelFileNodeBuffers call to get rid of dead buffers without a write after it's no longer possible to roll back the deleting transaction. Adding a similar call in smgrtruncate simplifies callers and makes the overall division of labor clearer. This patch removes the former behavior that VACUUM would write all dirty buffers of a relation unconditionally.	2005-03-20 22:00:54 +00:00
PostgreSQL Daemon	2ff501590b	Tag appropriate files for rc3 Also performed an initial run through of upgrading our Copyright date to extend to 2005 ... first run here was very simple ... change everything where: grep 1996-2004 && the word 'Copyright' ... scanned through the generated list with 'less' first, and after, to make sure that I only picked up the right entries ...	2004-12-31 22:04:05 +00:00
Tom Lane	c3d6c7d8f9	Calculation of keys_are_unique flag was wrong for cases involving redundant cross-datatype comparisons. Per example from Merlin Moncure.	2004-12-15 19:16:39 +00:00
Tom Lane	5374d097de	Change planner to use the current true disk file size as its estimate of a relation's number of blocks, rather than the possibly-obsolete value in pg_class.relpages. Scale the value in pg_class.reltuples correspondingly to arrive at a hopefully more accurate number of rows. When pg_class contains 0/0, estimate a tuple width from the column datatypes and divide that into current file size to estimate number of rows. This improved methodology allows us to jettison the ancient hacks that put bogus default values into pg_class when a table is first created. Also, per a suggestion from Simon, make VACUUM (but not VACUUM FULL or ANALYZE) adjust the value it puts into pg_class.reltuples to try to represent the mean tuple density instead of the minimal density that actually prevails just after VACUUM. These changes alter the plans selected for certain regression tests, so update the expected files accordingly. (I removed join_1.out because it's not clear if it still applies; we can add back any variant versions as they are shown to be needed.)	2004-12-01 19:00:56 +00:00
Neil Conway	5d1dd2bc55	Micro-optimization of markpos() and restrpos() in btree and hash indexes. Rather than using ReadBuffer() to increment the reference count on an already-pinned buffer, we should use IncrBufferRefCount() as it is faster and does not require acquiring the BufMgrLock.	2004-11-17 03:13:38 +00:00
Neil Conway	4d0f669f3c	Remove obsolete comment from btbuild() and hashbuild(): we no longer use a global variable to control building indexes.	2004-11-11 00:32:50 +00:00
Tom Lane	83cd2d8b0f	Make heap_fetch API more consistent by having the buffer remain pinned in all cases when keep_buf = true. This allows ANALYZE's inner loop to use heap_release_fetch, which saves multiple buffer lookups for the same page and avoids overestimation of cost by the vacuum cost mechanism.	2004-10-26 16:05:03 +00:00
Tom Lane	9ffc8ed58b	Repair possible failure to update hint bits back to disk, per http://archives.postgresql.org/pgsql-hackers/2004-10/msg00464.php. This fix is intended to be permanent: it moves the responsibility for calling SetBufferCommitInfoNeedsSave() into the tqual.c routines, eliminating the requirement for callers to test whether t_infomask changed. Also, tighten validity checking on buffer IDs in bufmgr.c --- several routines were paranoid about out-of-range shared buffer numbers but not about out-of-range local ones, which seems a tad pointless.	2004-10-15 22:40:29 +00:00
Neil Conway	0ed07d49d5	Code cleanup: don't bother casting the argument to pfree() to void * from another pointer type. Per C89, this is unnecessary, and it is common practice throughout the rest of the tree anyway.	2004-09-27 04:01:23 +00:00
Bruce Momjian	b6b71b85bc	Pgindent run for 8.0.	2004-08-29 05:07:03 +00:00
Bruce Momjian	da9a8649d8	Update copyright to 2004.	2004-08-29 04:13:13 +00:00
Tom Lane	1c72d0dec1	Fix relcache to account properly for subtransaction status of 'new' relcache entries. Also, change TransactionIdIsCurrentTransactionId() so that if consulted during transaction abort, it will not say that the aborted xact is still current. (It would be better to ensure that it's never called at all during abort, but I'm not sure we can easily guarantee that.) In combination, these fix a crash we have seen occasionally during parallel regression tests of 8.0.	2004-08-28 20:31:44 +00:00
Tom Lane	19cd31b068	Fix bug introduced into _bt_getstackbuf() on 2003-Feb-21: the initial value of 'start' could be past the end of the page, if the page was split by some concurrent inserting process since we visited it. In this situation the code could look at bogus entries and possibly find a match (since after all those entries still contain what they had before the split). This would lead to 'specified item offset is too large' followed by 'PANIC: failed to add item to the page', as reported by Joe Conway for scenarios involving heavy concurrent insertion activity.	2004-08-17 23:15:33 +00:00
Tom Lane	1a3de15a3a	Dept. of further reflection: I looked around to see if any other callers of XLogInsert had the same sort of checkpoint interlock problem as RecordTransactionCommit, and indeed I found some. Btree index build and ALTER TABLE SET TABLESPACE write data outside the friendly confines of the buffer manager, and therefore they have to take their own responsibility for checkpoint interlock. The easiest solution seems to be to force smgrimmedsync at the end of the index build or table copy, even when the operation is being WAL-logged. This is sufficient since the new index or table will be of interest to no one if we don't get as far as committing the current transaction.	2004-08-15 23:44:46 +00:00
Tom Lane	2042b3428d	Invent WAL timelines, as per recent discussion, to make point-in-time recovery more manageable. Also, undo recent change to add FILE_HEADER and WASTED_SPACE records to XLOG; instead make the XLOG page header variable-size with extra fields in the first page of an XLOG file. This should fix the boundary-case bugs observed by Mark Kirkwood. initdb forced due to change of XLOG representation.	2004-07-21 22:31:26 +00:00
Tom Lane	66ec2db728	XLOG file archiving and point-in-time recovery. There are still some loose ends and a glaring lack of documentation, but it basically works. Simon Riggs with some editorialization by Tom Lane.	2004-07-19 02:47:16 +00:00
Tom Lane	fe548629c5	Invent ResourceOwner mechanism as per my recent proposal, and use it to keep track of portal-related resources separately from transaction-related resources. This allows cursors to work in a somewhat sane fashion with nested transactions. For now, cursor behavior is non-subtransactional, that is a cursor's state does not roll back if you abort a subtransaction that fetched from the cursor. We might want to change that later.	2004-07-17 03:32:14 +00:00
Tom Lane	94d4d240bb	Rename XLOG_BTREE_NEWPAGE xlog record type into XLOG_HEAP_NEWPAGE, and shift support code into heapam.c accordingly. This is in service of soon-to-be-committed ALTER TABLE SET TABLESPACE code that will want to use this same record type for both heaps and indexes. Theoretically I should have forced initdb for this, but in practice there is no change in xlog contents because CVS tip will never really emit this record type anyhow...	2004-07-11 18:01:45 +00:00
Tom Lane	2467394ee1	Tablespaces. Alternate database locations are dead, long live tablespaces. There are various things left to do: contrib dbsize and oid2name modules need work, and so does the documentation. Also someone should think about COMMENT ON TABLESPACE and maybe RENAME TABLESPACE. Also initlocation is dead, it just doesn't know it yet. Gavin Sherry and Tom Lane.	2004-06-18 06:14:31 +00:00
Tom Lane	c3a153afed	Tweak palloc/repalloc to allow zero bytes to be requested, as per recent proposal. Eliminate several dozen now-unnecessary hacks to avoid palloc(0). (It's likely there are more that I didn't find.)	2004-06-05 19:48:09 +00:00
Tom Lane	2095206de1	Adjust btree index build to not use shared buffers, thereby avoiding the locking conflict against concurrent CHECKPOINT that was discussed a few weeks ago. Also, if not using WAL archiving (which is always true ATM but won't be if PITR makes it into this release), there's no need to WAL-log the index build process; it's sufficient to force-fsync the completed index before commit. This seems to gain about a factor of 2 in my tests, which is consistent with writing half as much data. I did not try it with WAL on a separate drive though --- probably the gain would be a lot less in that scenario.	2004-06-02 17:28:18 +00:00
Tom Lane	e674707968	Minor code rationalization: FlushRelationBuffers just returns void, rather than an error code, and does elog(ERROR) not elog(WARNING) when it detects a problem. All callers were simply elog(ERROR)'ing on failure return anyway, and I find it hard to envision a caller that would not, so we may as well simplify the callers and produce the more useful error message directly.	2004-05-31 19:24:05 +00:00
Neil Conway	72b6ad6313	Use the new List API function names throughout the backend, and disable the list compatibility API by default. While doing this, I decided to keep the llast() macro around and introduce llast_int() and llast_oid() variants.	2004-05-30 23:40:41 +00:00
Neil Conway	d0b4399d81	Reimplement the linked list data structure used throughout the backend. In the past, we used a 'Lispy' linked list implementation: a "list" was merely a pointer to the head node of the list. The problem with that design is that it makes lappend() and length() linear time. This patch fixes that problem (and others) by maintaining a count of the list length and a pointer to the tail node along with each head node pointer. A "list" is now a pointer to a structure containing some meta-data about the list; the head and tail pointers in that structure refer to ListCell structures that maintain the actual linked list of nodes. The function names of the list API have also been changed to, I hope, be more logically consistent. By default, the old function names are still available; they will be disabled-by-default once the rest of the tree has been updated to use the new API names.	2004-05-26 04:41:50 +00:00
Tom Lane	4af3421161	Get rid of rd_nblocks field in relcache entries. Turns out this was costing us lots more to maintain than it was worth. On shared tables it was of exactly zero benefit because we couldn't trust it to be up to date. On temp tables it sometimes saved an lseek, but not often enough to be worth getting excited about. And the real problem was that we forced an lseek on every relcache flush in order to update the field. So all in all it seems best to lose the complexity.	2004-05-08 19:09:25 +00:00
Tom Lane	37fa3b6c89	Tweak indexscan and seqscan code to arrange that steps from one page to the next are handled by ReleaseAndReadBuffer rather than separate ReleaseBuffer and ReadBuffer calls. This cuts the number of acquisitions of the BufMgrLock by a factor of 2 (possibly more, if an indexscan happens to pull successive rows from the same heap page). Unfortunately this doesn't seem enough to get us out of the recently discussed context-switch storm problem, but it's surely worth doing anyway.	2004-04-21 18:24:26 +00:00
Tom Lane	58f337a343	Centralize implementation of delay code by creating a pg_usleep() subroutine in src/port/pgsleep.c. Remove platform dependencies from miscadmin.h and put them in port.h where they belong. Extend recent vacuum cost-based-delay patch to apply to VACUUM FULL, ANALYZE, and non-btree index vacuuming. By the way, where is the documentation for the cost-based-delay patch?	2004-02-10 03:42:45 +00:00
Tom Lane	87bd956385	Restructure smgr API as per recent proposal. smgr no longer depends on the relcache, and so the notion of 'blind write' is gone. This should improve efficiency in bgwriter and background checkpoint processes. Internal restructuring in md.c to remove the not-very-useful array of MdfdVec objects --- might as well just use pointers. Also remove the long-dead 'persistent main memory' storage manager (mm.c), since it seems quite unlikely to ever get resurrected.	2004-02-10 01:55:27 +00:00
Jan Wieck	f425b605f4	Cost based vacuum delay feature. Jan	2004-02-06 19:36:18 +00:00
Tom Lane	391c3811a2	Rename SortMem and VacuumMem to work_mem and maintenance_work_mem. Make btree index creation and initial validation of foreign-key constraints use maintenance_work_mem rather than work_mem as their memory limit. Add some code to guc.c to allow these variables to be referenced by their old names in SHOW and SET commands, for backwards compatibility.	2004-02-03 17:34:04 +00:00
Neil Conway	192ad63bd7	More janitorial work: remove the explicit casting of NULL literals to a pointer type when it is not necessary to do so. For future reference, casting NULL to a pointer type is only necessary when (a) invoking a function AND either (b) the function has no prototype OR (c) the function is a varargs function.	2004-01-07 18:56:30 +00:00
Tom Lane	ef92b82dbb	Further cleanup in _bt_first: eliminate duplicate code paths.	2003-12-21 17:52:34 +00:00
Tom Lane	2a0caefeb5	Previous change exposed some opportunities for further simplification in _bt_first().	2003-12-21 03:00:04 +00:00
Tom Lane	569659ae16	Improve btree's initial-positioning-strategy code so that we never need to step more than one entry after descending the search tree to arrive at the correct place to start the scan. This can improve the behavior substantially when there are many entries equal to the chosen boundary value. Per suggestion from Dmitry Tkach, 14-Jul-03.	2003-12-21 01:23:06 +00:00
Neil Conway	fef0c8345a	I posted some bufmgr cleanup a few weeks ago, but it conflicted with some concurrent changes Jan was making to the bufmgr. Here's an updated version of the patch -- it should apply cleanly to CVS HEAD and passes the regression tests. This patch makes the following changes: - remove the UnlockAndReleaseBuffer() and UnlockAndWriteBuffer() macros, and replace uses of them with calls to the appropriate functions. - remove a bunch of #ifdef BMTRACE code: it is ugly & broken (i.e. it doesn't compile) - make BufferReplace() return a bool, not an int - cleanup some logic in bufmgr.c; should be functionality equivalent to the previous code, just cleaner now - remove the BM_PRIVATE flag as it is unused - improve a few comments, etc.	2003-12-14 00:34:47 +00:00
PostgreSQL Daemon	969685ad44	$Header: -> $PostgreSQL Changes ...	2003-11-29 19:52:15 +00:00
Tom Lane	fa5c8a055a	Cross-data-type comparisons are now indexable by btrees, pursuant to my pghackers proposal of 8-Nov. All the existing cross-type comparison operators (int2/int4/int8 and float4/float8) have appropriate support. The original proposal of storing the right-hand-side datatype as part of the primary key for pg_amop and pg_amproc got modified a bit in the event; it is easier to store zero as the 'default' case and only store a nonzero when the operator is actually cross-type. Along the way, remove the long-since-defunct bigbox_ops operator class.	2003-11-12 21:15:59 +00:00

297 Commits