postgres

mirror of https://github.com/postgres/postgres.git synced 2025-06-05 23:56:58 +03:00

Author	SHA1	Message	Date
Robert Haas	568d4138c6	Use an MVCC snapshot, rather than SnapshotNow, for catalog scans. SnapshotNow scans have the undesirable property that, in the face of concurrent updates, the scan can fail to see either the old or the new versions of the row. In many cases, we work around this by requiring DDL operations to hold AccessExclusiveLock on the object being modified; in some cases, the existing locking is inadequate and random failures occur as a result. This commit doesn't change anything related to locking, but will hopefully pave the way to allowing lock strength reductions in the future. The major issue has held us back from making this change in the past is that taking an MVCC snapshot is significantly more expensive than using a static special snapshot such as SnapshotNow. However, testing of various worst-case scenarios reveals that this problem is not severe except under fairly extreme workloads. To mitigate those problems, we avoid retaking the MVCC snapshot for each new scan; instead, we take a new snapshot only when invalidation messages have been processed. The catcache machinery already requires that invalidation messages be sent before releasing the related heavyweight lock; else other backends might rely on locally-cached data rather than scanning the catalog at all. Thus, making snapshot reuse dependent on the same guarantees shouldn't break anything that wasn't already subtly broken. Patch by me. Review by Michael Paquier and Andres Freund.	2013-07-02 09:47:01 -04:00
Bruce Momjian	9af4159fce	pgindent run for release 9.3 This is the first run of the Perl-based pgindent script. Also update pgindent instructions.	2013-05-29 16:58:43 -04:00
Tom Lane	1d6c72a55b	Move materialized views' is-populated status into their pg_class entries. Previously this state was represented by whether the view's disk file had zero or nonzero size, which is problematic for numerous reasons, since it's breaking a fundamental assumption about heap storage. This was done to allow unlogged matviews to revert to unpopulated status after a crash despite our lack of any ability to update catalog entries post-crash. However, this poses enough risk of future problems that it seems better to not support unlogged matviews until we can find another way. Accordingly, revert that choice as well as a number of existing kluges forced by it in favor of creating a pg_class.relispopulated flag column.	2013-05-06 13:27:22 -04:00
Kevin Grittner	52e6e33ab4	Create a distinction between a populated matview and a scannable one. The intent was that being populated would, long term, be just one of the conditions which could affect whether a matview was scannable; being populated should be necessary but not always sufficient to scan the relation. Since only CREATE and REFRESH currently determine the scannability, names and comments accidentally conflated these concepts, leading to confusion. Also add missing locking for the SQL function which allows a test for scannability, and fix a modularity violatiion. Per complaints from Tom Lane, although its not clear that these will satisfy his concerns. Hopefully this will at least better frame the discussion.	2013-04-09 13:02:49 -05:00
Tom Lane	1908abc4a3	Arrange to cache FdwRoutine structs in foreign tables' relcache entries. This saves several catalog lookups per reference. It's not all that exciting right now, because we'd managed to minimize the number of places that need to fetch the data; but the upcoming writable-foreign-tables patch needs this info in a lot more places.	2013-03-06 23:48:09 -05:00
Kevin Grittner	3bf3ab8c56	Add a materialized view relations. A materialized view has a rule just like a view and a heap and other physical properties like a table. The rule is only used to populate the table, references in queries refer to the materialized data. This is a minimal implementation, but should still be useful in many cases. Currently data is only populated "on demand" by the CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW statements. It is expected that future releases will add incremental updates with various timings, and that a more refined concept of defining what is "fresh" data will be developed. At some point it may even be possible to have queries use a materialized in place of references to underlying tables, but that requires the other above-mentioned features to be working first. Much of the documentation work by Robert Haas. Review by Noah Misch, Thom Brown, Robert Haas, Marko Tiikkaja Security review by KaiGai Kohei, with a decision on how best to implement sepgsql still pending.	2013-03-03 18:23:31 -06:00
Alvaro Herrera	a730183926	Move relpath() to libpgcommon This enables non-backend code, such as pg_xlogdump, to use it easily. The previous location, in src/backend/catalog/catalog.c, made that essentially impossible because that file depends on many backend-only facilities; so this needs to live separately.	2013-02-21 22:46:17 -03:00
Tom Lane	991f3e5ab3	Provide database object names as separate fields in error messages. This patch addresses the problem that applications currently have to extract object names from possibly-localized textual error messages, if they want to know for example which index caused a UNIQUE_VIOLATION failure. It adds new error message fields to the wire protocol, which can carry the name of a table, table column, data type, or constraint associated with the error. (Since the protocol spec has always instructed clients to ignore unrecognized field types, this should not create any compatibility problem.) Support for providing these new fields has been added to just a limited set of error reports (mainly, those in the "integrity constraint violation" SQLSTATE class), but we will doubtless add them to more calls in future. Pavel Stehule, reviewed and extensively revised by Peter Geoghegan, with additional hacking by Tom Lane.	2013-01-29 17:08:26 -05:00
Alvaro Herrera	0ac5ad5134	Improve concurrency of foreign key locking This patch introduces two additional lock modes for tuples: "SELECT FOR KEY SHARE" and "SELECT FOR NO KEY UPDATE". These don't block each other, in contrast with already existing "SELECT FOR SHARE" and "SELECT FOR UPDATE". UPDATE commands that do not modify the values stored in the columns that are part of the key of the tuple now grab a SELECT FOR NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently with tuple locks of the FOR KEY SHARE variety. Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this means the concurrency improvement applies to them, which is the whole point of this patch. The added tuple lock semantics require some rejiggering of the multixact module, so that the locking level that each transaction is holding can be stored alongside its Xid. Also, multixacts now need to persist across server restarts and crashes, because they can now represent not only tuple locks, but also tuple updates. This means we need more careful tracking of lifetime of pg_multixact SLRU files; since they now persist longer, we require more infrastructure to figure out when they can be removed. pg_upgrade also needs to be careful to copy pg_multixact files over from the old server to the new, or at least part of multixact.c state, depending on the versions of the old and new servers. Tuple time qualification rules (HeapTupleSatisfies routines) need to be careful not to consider tuples with the "is multi" infomask bit set as being only locked; they might need to look up MultiXact values (i.e. possibly do pg_multixact I/O) to find out the Xid that updated a tuple, whereas they previously were assured to only use information readily available from the tuple header. This is considered acceptable, because the extra I/O would involve cases that would previously cause some commands to block waiting for concurrent transactions to finish. Another important change is the fact that locking tuples that have previously been updated causes the future versions to be marked as locked, too; this is essential for correctness of foreign key checks. This causes additional WAL-logging, also (there was previously a single WAL record for a locked tuple; now there are as many as updated copies of the tuple there exist.) With all this in place, contention related to tuples being checked by foreign key rules should be much reduced. As a bonus, the old behavior that a subtransaction grabbing a stronger tuple lock than the parent (sub)transaction held on a given tuple and later aborting caused the weaker lock to be lost, has been fixed. Many new spec files were added for isolation tester framework, to ensure overall behavior is sane. There's probably room for several more tests. There were several reviewers of this patch; in particular, Noah Misch and Andres Freund spent considerable time in it. Original idea for the patch came from Simon Riggs, after a problem report by Joel Jacobson. Most code is from me, with contributions from Marti Raudsepp, Alexander Shulgin, Noah Misch and Andres Freund. This patch was discussed in several pgsql-hackers threads; the most important start at the following message-ids: AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com 1290721684-sup-3951@alvh.no-ip.org 1294953201-sup-2099@alvh.no-ip.org 1320343602-sup-2290@alvh.no-ip.org 1339690386-sup-8927@alvh.no-ip.org 4FE5FF020200002500048A3D@gw.wicourts.gov 4FEAB90A0200002500048B7D@gw.wicourts.gov	2013-01-23 12:04:59 -03:00
Tom Lane	d5b31cc32b	Fix an O(N^2) performance issue for sessions modifying many relations. AtEOXact_RelationCache() scanned the entire relation cache at the end of any transaction that created a new relation or assigned a new relfilenode. Thus, clients such as pg_restore had an O(N^2) performance problem that would start to be noticeable after creating 10000 or so tables. Since typically only a small number of relcache entries need any cleanup, we can fix this by keeping a small list of their OIDs and doing hash_searches for them. We fall back to the full-table scan if the list overflows. Ideally, the maximum list length would be set at the point where N hash_searches would cost just less than the full-table scan. Some quick experimentation says that point might be around 50-100; I (tgl) conservatively set MAX_EOXACT_LIST = 32. For the case that we're worried about here, which is short single-statement transactions, it's unlikely there would ever be more than about a dozen list entries anyway; so it's probably not worth being too tense about the value. We could avoid the hash_searches by instead keeping the target relcache entries linked into a list, but that would be noticeably more complicated and bug-prone because of the need to maintain such a list in the face of relcache entry drops. Since a relcache entry can only need such cleanup after a somewhat-heavyweight filesystem operation, trying to save a hash_search per cleanup doesn't seem very useful anyway --- it's the scan over all the not-needing-cleanup entries that we wish to avoid here. Jeff Janes, reviewed and tweaked a bit by Tom Lane	2013-01-20 13:45:10 -05:00
Bruce Momjian	bd61a623ac	Update copyrights for 2013 Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.	2013-01-01 17:15:01 -05:00
Simon Riggs	ae9aba69a8	Keep rd_newRelfilenodeSubid across overflow. Teach RelationCacheInvalidate() to keep rd_newRelfilenodeSubid across rel cache message overflows, so that behaviour is now fully deterministic. Noah Misch	2012-12-24 16:43:22 +00:00
Tom Lane	6919b7e329	Fix failure to ignore leftover temp tables after a server crash. During crash recovery, we remove disk files belonging to temporary tables, but the system catalog entries for such tables are intentionally not cleaned up right away. Instead, the first backend that uses a temp schema is expected to clean out any leftover objects therein. This approach requires that we be careful to ignore leftover temp tables (since any actual access attempt would fail), even if their BackendId matches our session, if we have not yet established use of the session's corresponding temp schema. That worked fine in the past, but was broken by commit debcec7dc31a992703911a9953e299c8d730c778 which incorrectly removed the rd_islocaltemp relcache flag. Put it back, and undo various changes that substituted tests like "rel->rd_backend == MyBackendId" for use of a state-aware flag. Per trouble report from Heikki Linnakangas. Back-patch to 9.1 where the erroneous change was made. In the back branches, be careful to add rd_islocaltemp in a spot in the struct that was alignment padding before, so as not to break existing add-on code.	2012-12-17 20:15:32 -05:00
Tom Lane	3c84046490	Fix assorted bugs in CREATE/DROP INDEX CONCURRENTLY. Commit 8cb53654dbdb4c386369eb988062d0bbb6de725e, which introduced DROP INDEX CONCURRENTLY, managed to break CREATE INDEX CONCURRENTLY via a poor choice of catalog state representation. The pg_index state for an index that's reached the final pre-drop stage was the same as the state for an index just created by CREATE INDEX CONCURRENTLY. This meant that the (necessary) change to make RelationGetIndexList ignore about-to-die indexes also made it ignore freshly-created indexes; which is catastrophic because the latter do need to be considered in HOT-safety decisions. Failure to do so leads to incorrect index entries and subsequently wrong results from queries depending on the concurrently-created index. To fix, add an additional boolean column "indislive" to pg_index, so that the freshly-created and about-to-die states can be distinguished. (This change obviously is only possible in HEAD. This patch will need to be back-patched, but in 9.2 we'll use a kluge consisting of overloading the formerly-impossible state of indisvalid = true and indisready = false.) In addition, change CREATE/DROP INDEX CONCURRENTLY so that the pg_index flag changes they make without exclusive lock on the index are made via heap_inplace_update() rather than a normal transactional update. The latter is not very safe because moving the pg_index tuple could result in concurrent SnapshotNow scans finding it twice or not at all, thus possibly resulting in index corruption. This is a pre-existing bug in CREATE INDEX CONCURRENTLY, which was copied into the DROP code. In addition, fix various places in the code that ought to check to make sure that the indexes they are manipulating are valid and/or ready as appropriate. These represent bugs that have existed since 8.2, since a failed CREATE INDEX CONCURRENTLY could leave a corrupt or invalid index behind, and we ought not try to do anything that might fail with such an index. Also fix RelationReloadIndexInfo to ensure it copies all the pg_index columns that are allowed to change after initial creation. Previously we could have been left with stale values of some fields in an index relcache entry. It's not clear whether this actually had any user-visible consequences, but it's at least a bug waiting to happen. In addition, do some code and docs review for DROP INDEX CONCURRENTLY; some cosmetic code cleanup but mostly addition and revision of comments. This will need to be back-patched, but in a noticeably different form, so I'm committing it to HEAD before working on the back-patch. Problem reported by Amit Kapila, diagnosis by Pavan Deolassee, fix by Tom Lane and Andres Freund.	2012-11-28 21:26:01 -05:00
Tom Lane	71e58dcfb9	Make equal() ignore CoercionForm fields for better planning with casts. This change ensures that the planner will see implicit and explicit casts as equivalent for all purposes, except in the minority of cases where there's actually a semantic difference (as reflected by having a 3-argument cast function). In particular, this fixes cases where the EquivalenceClass machinery failed to consider two references to a varchar column as equivalent if one was implicitly cast to text but the other was explicitly cast to text, as seen in bug #7598 from Vaclav Juza. We have had similar bugs before in other parts of the planner, so I think it's time to fix this problem at the core instead of continuing to band-aid around it. Remove set_coercionform_dontcare(), which represents the band-aid previously in use for allowing matching of index and constraint expressions with inconsistent cast labeling. (We can probably get rid of COERCE_DONTCARE altogether, but I don't think removing that enum value in back branches would be wise; it's possible there's third party code referring to it.) Back-patch to 9.2. We could go back further, and might want to once this has been tested more; but for the moment I won't risk destabilizing plan choices in long-since-stable branches.	2012-10-12 12:11:22 -04:00
Alvaro Herrera	c219d9b0a5	Split tuple struct defs from htup.h to htup_details.h This reduces unnecessary exposure of other headers through htup.h, which is very widely included by many files. I have chosen to move the function prototypes to the new file as well, because that means htup.h no longer needs to include tupdesc.h. In itself this doesn't have much effect in indirect inclusion of tupdesc.h throughout the tree, because it's also required by execnodes.h; but it's something to explore in the future, and it seemed best to do the htup.h change now while I'm busy with it.	2012-08-30 16:52:35 -04:00
Alvaro Herrera	45326c5a11	Split resowner.h This lets files that are mere users of ResourceOwner not automatically include the headers for stuff that is managed by the resowner mechanism.	2012-08-28 18:02:07 -04:00
Robert Haas	d2c86a1ccd	Remove RELKIND_UNCATALOGED. This may have been important at some point in the past, but it no longer does anything useful. Review by Tom Lane.	2012-06-14 09:47:30 -04:00
Bruce Momjian	927d61eeff	Run pgindent on 9.2 source tree in preparation for first 9.3 commit-fest.	2012-06-10 15:20:04 -04:00
Alvaro Herrera	09ff76fcdb	Recast "ONLY" column CHECK constraints as NO INHERIT The original syntax wasn't universally loved, and it didn't allow its usage in CREATE TABLE, only ALTER TABLE. It now works everywhere, and it also allows using ALTER TABLE ONLY to add an uninherited CHECK constraint, per discussion. The pg_constraint column has accordingly been renamed connoinherit. This commit partly reverts some of the changes in 61d81bd28dbec65a6b144e0cd3d0bfe25913c3ac, particularly some pg_dump and psql bits, because now pg_get_constraintdef includes the necessary NO INHERIT within the constraint definition. Author: Nikhil Sontakke Some tweaks by me	2012-04-20 23:56:57 -03:00
Simon Riggs	8cb53654db	Add DROP INDEX CONCURRENTLY [IF EXISTS], uses ShareUpdateExclusiveLock	2012-04-06 10:21:40 +01:00
Peter Eisentraut	8a3f745f16	Do not access indclass through Form_pg_index Normally, accessing variable-length members of catalog structures past the first one doesn't work at all. Here, it happened to work because indnatts was checked to be 1, and so the defined FormData_pg_index layout, using int2vector[1] and oidvector[1] for variable-length arrays, happened to match the actual memory layout. But it's a very fragile assumption, and it's not in a performance-critical path, so code it properly using heap_getattr() instead. bug analysis by Tom Lane	2012-01-27 20:08:34 +02:00
Bruce Momjian	e126958c2e	Update copyright notices for year 2012.	2012-01-01 18:01:58 -05:00
Robert Haas	0e4611c023	Add a security_barrier option for views. When a view is marked as a security barrier, it will not be pulled up into the containing query, and no quals will be pushed down into it, so that no function or operator chosen by the user can be applied to rows not exposed by the view. Views not configured with this option cannot provide robust row-level security, but will perform far better. Patch by KaiGai Kohei; original problem report by Heikki Linnakangas (in October 2009!). Review (in earlier versions) by Noah Misch and others. Design advice by Tom Lane and myself. Further review and cleanup by me.	2011-12-22 16:16:31 -05:00
Alvaro Herrera	61d81bd28d	Allow CHECK constraints to be declared ONLY This makes them enforceable only on the parent table, not on children tables. This is useful in various situations, per discussion involving people bitten by the restrictive behavior introduced in 8.4. Message-Id: 8762mp93iw.fsf@comcast.net CAFaPBrSMMpubkGf4zcRL_YL-AERUbYF_-ZNNYfb3CVwwEqc9TQ@mail.gmail.com Authors: Nikhil Sontakke, Alex Hunsaker Reviewed by Robert Haas and myself	2011-12-19 17:30:23 -03:00
Tom Lane	e6858e6657	Measure the number of all-visible pages for use in index-only scan costing. Add a column pg_class.relallvisible to remember the number of pages that were all-visible according to the visibility map as of the last VACUUM (or ANALYZE, or some other operations that update pg_class.relpages). Use relallvisible/relpages, instead of an arbitrary constant, to estimate how many heap page fetches can be avoided during an index-only scan. This is pretty primitive and will no doubt see refinements once we've acquired more field experience with the index-only scan mechanism, but it's way better than using a constant. Note: I had to adjust an underspecified query in the window.sql regression test, because it was changing answers when the plan changed to use an index-only scan. Some of the adjacent tests perhaps should be adjusted as well, but I didn't do that here.	2011-10-14 17:23:46 -04:00
Tom Lane	a2822fb933	Support index-only scans using the visibility map to avoid heap fetches. When a btree index contains all columns required by the query, and the visibility map shows that all tuples on a target heap page are visible-to-all, we don't need to fetch that heap page. This patch depends on the previous patches that made the visibility map reliable. There's a fair amount left to do here, notably trying to figure out a less chintzy way of estimating the cost of an index-only scan, but the core functionality seems ready to commit. Robert Haas and Ibrar Ahmed, with some previous work by Heikki Linnakangas.	2011-10-07 20:14:13 -04:00
Tom Lane	1609797c25	Clean up the #include mess a little. walsender.h should depend on xlog.h, not vice versa. (Actually, the inclusion was circular until a couple hours ago, which was even sillier; but Bruce broke it in the expedient rather than logically correct direction.) Because of that poor decision, plus blind application of pgrminclude, we had a situation where half the system was depending on xlog.h to include such unrelated stuff as array.h and guc.h. Clean up the header inclusion, and manually revert a lot of what pgrminclude had done so things build again. This episode reinforces my feeling that pgrminclude should not be run without adult supervision. Inclusion changes in header files in particular need to be reviewed with great care. More generally, it'd be good if we had a clearer notion of module layering to dictate which headers can sanely include which others ... but that's a big task for another day.	2011-09-04 01:13:16 -04:00
Bruce Momjian	6416a82a62	Remove unnecessary #include references, per pgrminclude script.	2011-09-01 10:04:27 -04:00
Tom Lane	5bba65de94	Fix a missed case in code for "moving average" estimate of reltuples. It is possible for VACUUM to scan no pages at all, if the visibility map shows that all pages are all-visible. In this situation VACUUM has no new information to report about the relation's tuple density, so it wasn't changing pg_class.reltuples ... but it updated pg_class.relpages anyway. That's wrong in general, since there is no evidence to justify changing the density ratio reltuples/relpages, but it's particularly bad if the previous state was relpages=reltuples=0, which means "unknown tuple density". We just replaced "unknown" with "zero". ANALYZE would eventually recover from this, but it could take a lot of repetitions of ANALYZE to do so if the relation size is much larger than the maximum number of pages ANALYZE will scan, because of the moving-average behavior introduced by commit b4b6923e03f4d29636a94f6f4cc2f5cf6298b8c8. The only known situation where we could have relpages=reltuples=0 and yet the visibility map asserts everything's visible is immediately following a pg_upgrade. It might be advisable for pg_upgrade to try to preserve the relpages/reltuples statistics; but in any case this code is wrong on its own terms, so fix it. Per report from Sergey Koposov. Back-patch to 8.4, where the visibility map was introduced, same as the previous change.	2011-08-30 14:51:38 -04:00
Tom Lane	f4d7f1adba	Fix incorrect order of operations during sinval reset processing. We have to be sure that we have revalidated each nailed-in-cache relcache entry before we try to use it to load data for some other relcache entry. The introduction of "mapped relations" in 9.0 broke this, because although we updated the state kept in relmapper.c early enough, we failed to propagate that information into relcache entries soon enough; in particular, we could try to fetch pg_class rows out of pg_class before we'd updated its relcache entry's rd_node.relNode value from the map. This bug accounts for Dave Gould's report of failures after "vacuum full pg_class", and I believe that there is risk for other system catalogs as well. The core part of the fix is to copy relmapper data into the relcache entries during "phase 1" in RelationCacheInvalidate(), before they'll be used in "phase 2". To try to future-proof the code against other similar bugs, I also rearranged the order in which nailed relations are visited during phase 2: now it's pg_class first, then pg_class_oid_index, then other nailed relations. This should ensure that RelationClearRelation can apply RelationReloadIndexInfo to all nailed indexes without risking use of not-yet-revalidated relcache entries. Back-patch to 9.0 where the relation mapper was introduced.	2011-08-16 14:38:20 -04:00
Tom Lane	2ada6779c5	Fix race condition in relcache init file invalidation. The previous code tried to synchronize by unlinking the init file twice, but that doesn't actually work: it leaves a window wherein a third process could read the already-stale init file but miss the SI messages that would tell it the data is stale. The result would be bizarre failures in catalog accesses, typically "could not read block 0 in file ..." later during startup. Instead, hold RelCacheInitLock across both the unlink and the sending of the SI messages. This is more straightforward, and might even be a bit faster since only one unlink call is needed. This has been wrong since it was put in (in 2002!), so back-patch to all supported releases.	2011-08-16 13:11:54 -04:00
Robert Haas	367bc426a1	Avoid index rebuild for no-rewrite ALTER TABLE .. ALTER TYPE. Noah Misch. Review and minor cosmetic changes by me.	2011-07-18 11:04:43 -04:00
Alvaro Herrera	897795240c	Enable CHECK constraints to be declared NOT VALID This means that they can initially be added to a large existing table without checking its initial contents, but new tuples must comply to them; a separate pass invoked by ALTER TABLE / VALIDATE can verify existing data and ensure it complies with the constraint, at which point it is marked validated and becomes a normal part of the table ecosystem. An non-validated CHECK constraint is ignored in the planner for constraint_exclusion purposes; when validated, cached plans are recomputed so that partitioning starts working right away. This patch also enables domains to have unvalidated CHECK constraints attached to them as well by way of ALTER DOMAIN / ADD CONSTRAINT / NOT VALID, which can later be validated with ALTER DOMAIN / VALIDATE CONSTRAINT. Thanks to Thom Brown, Dean Rasheed and Jaime Casanova for the various reviews, and Robert Hass for documentation wording improvement suggestions. This patch was sponsored by Enova Financial.	2011-06-30 11:24:31 -04:00
Peter Eisentraut	21f1e15aaf	Unify spelling of "canceled", "canceling", "cancellation" We had previously (af26857a2775e7ceb0916155e931008c2116632f) established the U.S. spellings as standard.	2011-06-29 09:28:46 +03:00
Bruce Momjian	bf50caf105	pgindent run before PG 9.1 beta 1.	2011-04-10 11:42:00 -04:00
Tom Lane	1192ba8b67	Avoid potential deadlock in InitCatCachePhase2(). Opening a catcache's index could require reading from that cache's own catalog, which of course would acquire AccessShareLock on the catalog. So the original coding here risks locking index before heap, which could deadlock against another backend trying to get exclusive locks in the normal order. Because InitCatCachePhase2 is only called when a backend has to start up without a relcache init file, the deadlock was seldom seen in the field. (And by the same token, there's no need to worry about any performance disadvantage; so not much point in trying to distinguish exactly which catalogs have the risk.) Bug report, diagnosis, and patch by Nikhil Sontakke. Additional commentary by me. Back-patch to all supported branches.	2011-03-22 13:00:48 -04:00
Peter Eisentraut	414c5a2ea6	Per-column collation support This adds collation support for columns and domains, a COLLATE clause to override it per expression, and B-tree index support. Peter Eisentraut reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch	2011-02-08 23:04:18 +02:00
Bruce Momjian	5d950e3b0c	Stamp copyrights for year 2011.	2011-01-01 13:18:15 -05:00
Robert Haas	53dbc27c62	Support unlogged tables. The contents of an unlogged table are WAL-logged; thus, they are not available on standby servers and are truncated whenever the database system enters recovery. Indexes on unlogged tables are also unlogged. Unlogged GiST indexes are not currently supported.	2010-12-29 06:48:53 -05:00
Robert Haas	5f7b58fad8	Generalize concept of temporary relations to "relation persistence". This commit replaces pg_class.relistemp with pg_class.relpersistence; and also modifies the RangeVar node type to carry relpersistence rather than istemp. It also removes removes rd_istemp from RelationData and instead performs the correct computation based on relpersistence. For clarity, we add three new macros: RelationNeedsWAL(), RelationUsesLocalBuffers(), and RelationUsesTempNamespace(), so that we can clarify the purpose of each check that previous depended on rd_istemp. This is intended as infrastructure for the upcoming unlogged tables patch, as well as for future possible work on global temporary tables.	2010-12-13 12:34:26 -05:00
Tom Lane	c0b5fac701	Simplify and speed up mapping of index opfamilies to pathkeys. Formerly we looked up the operators associated with each index (caching them in relcache) and then the planner looked up the btree opfamily containing such operators in order to build the btree-centric pathkey representation that describes the index's sort order. This is quite pointless for btree indexes: we might as well just use the index's opfamily information directly. That saves syscache lookup cycles during planning, and furthermore allows us to eliminate the relcache's caching of operators altogether, which may help in reducing backend startup time. I added code to plancat.c to perform the same type of double lookup on-the-fly if it's ever faced with a non-btree amcanorder index AM. If such a thing actually becomes interesting for production, we should replace that logic with some more-direct method for identifying the corresponding btree opfamily; but it's not worth spending effort on now. There is considerably more to do pursuant to my recent proposal to get rid of sort-operator-based representations of sort orderings, but this patch grabs some of the low-hanging fruit. I'll look at the remainder of that work after the current commitfest.	2010-11-29 12:30:43 -05:00
Tom Lane	511e902b51	Make TRUNCATE ... RESTART IDENTITY restart sequences transactionally. In the previous coding, we simply issued ALTER SEQUENCE RESTART commands, which do not roll back on error. This meant that an error between truncating and committing left the sequences out of sync with the table contents, with potentially bad consequences as were noted in a Warning on the TRUNCATE man page. To fix, create a new storage file (relfilenode) for a sequence that is to be reset due to RESTART IDENTITY. If the transaction aborts, we'll automatically revert to the old storage file. This acts just like a rewriting ALTER TABLE operation. A penalty is that we have to take exclusive lock on the sequence, but since we've already got exclusive lock on its owning table, that seems unlikely to be much of a problem. The interaction of this with usual nontransactional behaviors of sequence operations is a bit weird, but it's hard to see what would be completely consistent. Our choice is to discard cached-but-unissued sequence values both when the RESTART is executed, and at rollback if any; but to not touch the currval() state either time. In passing, move the sequence reset operations to happen before not after any AFTER TRUNCATE triggers are fired. The previous ordering was not logically sensible, but was forced by the need to minimize inconsistency if the triggers caused an error. Transactional rollback is a much better solution to that. Patch by Steve Singer, rather heavily adjusted by me.	2010-11-17 16:42:18 -05:00
Robert Haas	5ccbc3d802	Correct poor grammar in comment.	2010-11-14 23:10:45 -05:00
Magnus Hagander	9f2e211386	Remove cvs keywords from all files.	2010-09-20 22:08:53 +02:00
Tom Lane	9513918c6c	Fix up flushing of composite-type typcache entries to be driven directly by SI invalidation events, rather than indirectly through the relcache. In the previous coding, we had to flush a composite-type typcache entry whenever we discarded the corresponding relcache entry. This caused problems at least when testing with RELCACHE_FORCE_RELEASE, as shown in recent report from Jeff Davis, and might result in real-world problems given the kind of unexpected relcache flush that that test mechanism is intended to model. The new coding decouples relcache and typcache management, which is a good thing anyway from a structural perspective. The cost is that we have to search the typcache linearly to find entries that need to be flushed. There are a couple of ways we could avoid that, but at the moment it's not clear it's worth any extra trouble, because the typcache contains very few entries in typical operation. Back-patch to 8.2, the same as some other recent fixes in this general area. The patch could be carried back to 8.0 with some additional work, but given that it's only hypothetical whether we're fixing any problem observable in the field, it doesn't seem worth the work now.	2010-09-02 03:16:46 +00:00
Robert Haas	debcec7dc3	Include the backend ID in the relpath of temporary relations. This allows us to reliably remove all leftover temporary relation files on cluster startup without reference to system catalogs or WAL; therefore, we no longer include temporary relations in XLOG_XACT_COMMIT and XLOG_XACT_ABORT WAL records. Since these changes require including a backend ID in each SharedInvalSmgrMsg, the size of the SharedInvalidationMessage.id field has been reduced from two bytes to one, and the maximum number of connections has been reduced from INT_MAX / 4 to 2^23-1. It would be possible to remove these restrictions by increasing the size of SharedInvalidationMessage by 4 bytes, but right now that doesn't seem like a good trade-off. Review by Jaime Casanova and Tom Lane.	2010-08-13 20:10:54 +00:00
Bruce Momjian	239d769e7e	pgindent run for 9.0, second run	2010-07-06 19:19:02 +00:00
Tom Lane	ea46000a40	Arrange for client authentication to occur before we select a specific database to connect to. This is necessary for the walsender code to work properly (it was previously using an untenable assumption that template1 would always be available to connect to). This also gets rid of a small security shortcoming that was introduced in the original patch to eliminate the flat authentication files: before, you could find out whether or not the requested database existed even if you couldn't pass the authentication checks. The changes needed to support this are mainly just to treat pg_authid and pg_auth_members as nailed relations, so that we can read them without having to be able to locate real pg_class entries for them. This mechanism was already debugged for pg_database, but we hadn't recognized the value of applying it to those catalogs too. Since the current code doesn't have support for accessing toast tables before we've brought up all of the relcache, remove pg_authid's toast table to ensure that no one can store an out-of-line toasted value of rolpassword. The case seems quite unlikely to occur in practice, and was effectively unsupported anyway in the old "flatfiles" implementation. Update genbki.pl to actually implement the same rules as bootstrap.c does for not-nullability of catalog columns. The previous coding was a bit cheesy but worked all right for the previous set of bootstrap catalogs. It does not work for pg_authid, where rolvaliduntil needs to be nullable. Initdb forced due to minor catalog changes (mainly the toast table removal).	2010-04-20 23:48:47 +00:00
Tom Lane	73981cb451	Fix a problem introduced by my patch of 2010-01-12 that revised the way relcache reload works. In the patched code, a relcache entry in process of being rebuilt doesn't get unhooked from the relcache hash table; which means that if a cache flush occurs due to sinval queue overrun while we're rebuilding it, the entry could get blown away by RelationCacheInvalidate, resulting in crash or misbehavior. Fix by ensuring that an entry being rebuilt has positive refcount, so it won't be seen as a target for removal if a cache flush occurs. (This will mean that the entry gets rebuilt twice in such a scenario, but that's okay.) It appears that the problem can only arise within a transaction that has previously reassigned the relfilenode of a pre-existing table, via TRUNCATE or a similar operation. Per bug #5412 from Rusty Conover. Back-patch to 8.2, same as the patch that introduced the problem. I think that the failure can't actually occur in 8.2, since it lacks the rd_newRelfilenodeSubid optimization, but let's make it work like the later branches anyway. Patch by Heikki, slightly editorialized on by me.	2010-04-14 21:31:11 +00:00

... 2 3 4 5 6 ...

508 Commits