postgres

mirror of https://github.com/postgres/postgres.git synced 2025-12-19 17:02:53 +03:00

Author	SHA1	Message	Date
Tom Lane	39ebb64669	Fix subtransaction cleanup after an outer-subtransaction portal fails. Formerly, we treated only portals created in the current subtransaction as having failed during subtransaction abort. However, if the error occurred while running a portal created in an outer subtransaction (ie, a cursor declared before the last savepoint), that has to be considered broken too. To allow reliable detection of which ones those are, add a bookkeeping field to struct Portal that tracks the innermost subtransaction in which each portal has actually been executed. (Without this, we'd end up failing portals containing functions that had called the subtransaction, thereby breaking plpgsql exception blocks completely.) In addition, when we fail an outer-subtransaction Portal, transfer its resources into the subtransaction's resource owner, so that they're released early in cleanup of the subxact. This fixes a problem reported by Jim Nasby in which a function executed in an outer-subtransaction cursor could cause an Assert failure or crash by referencing a relation created within the inner subtransaction. The proximate cause of the Assert failure is that AtEOSubXact_RelationCache assumed it could blow away a relcache entry without first checking that the entry had zero refcount. That was a bad idea on its own terms, so add such a check there, and to the similar coding in AtEOXact_RelationCache. This provides an independent safety measure in case there are still ways to provoke the situation despite the Portal-level changes. This has been broken since subtransactions were invented, so back-patch to all supported branches. Tom Lane and Michael Paquier	2015-09-04 13:36:50 -04:00
Tom Lane	88fab18a4c	Fix the logic for putting relations into the relcache init file. Commit `f3b5565dd4` was a couple of bricks shy of a load; specifically, it missed putting pg_trigger_tgrelid_tgname_index into the relcache init file, because that index is not used by any syscache. However, we have historically nailed that index into cache for performance reasons. The upshot was that load_relcache_init_file always decided that the init file was busted and silently ignored it, resulting in a significant hit to backend startup speed. To fix, reinstantiate RelationIdIsInInitFile() as a wrapper around RelationSupportsSysCache(), which can know about additional relations that should be in the init file despite being unknown to syscache.c. Also install some guards against future mistakes of this type: make write_relcache_init_file Assert that all nailed relations get written to the init file, and make load_relcache_init_file emit a WARNING if it takes the "wrong number of nailed relations" exit path. Now that we remove the init files during postmaster startup, that case should never occur in the field, even if we are starting a minor-version update that added or removed rels from the nailed set. So the warning shouldn't ever be seen by end users, but it will show up in the regression tests if somebody breaks this logic. Back-patch to all supported branches, like the previous commit.	2015-06-25 14:39:05 -04:00
Tom Lane	3e69a73b98	Use a safer method for determining whether relcache init file is stale. When we invalidate the relcache entry for a system catalog or index, we must also delete the relcache "init file" if the init file contains a copy of that rel's entry. The old way of doing this relied on a specially maintained list of the OIDs of relations present in the init file: we made the list either when reading the file in, or when writing the file out. The problem is that when writing the file out, we included only rels present in our local relcache, which might have already suffered some deletions due to relcache inval events. In such cases we correctly decided not to overwrite the real init file with incomplete data --- but we still used the incomplete initFileRelationIds list for the rest of the current session. This could result in wrong decisions about whether the session's own actions require deletion of the init file, potentially allowing an init file created by some other concurrent session to be left around even though it's been made stale. Since we don't support changing the schema of a system catalog at runtime, the only likely scenario in which this would cause a problem in the field involves a "vacuum full" on a catalog concurrently with other activity, and even then it's far from easy to provoke. Remarkably, this has been broken since 2002 (in commit `7863404417`), but we had never seen a reproducible test case until recently. If it did happen in the field, the symptoms would probably involve unexpected "cache lookup failed" errors to begin with, then "could not open file" failures after the next checkpoint, as all accesses to the affected catalog stopped working. Recovery would require manually removing the stale "pg_internal.init" file. To fix, get rid of the initFileRelationIds list, and instead consult syscache.c's list of relations used in catalog caches to decide whether a relation is included in the init file. This should be a tad more efficient anyway, since we're replacing linear search of a list with ~100 entries with a binary search. It's a bit ugly that the init file contents are now so directly tied to the catalog caches, but in practice that won't make much difference. Back-patch to all supported branches.	2015-06-07 15:32:09 -04:00
Andres Freund	6cbadda25a	Improve relcache invalidation handling of currently invisible relations. The corner case where a relcache invalidation tried to rebuild the entry for a referenced relation but couldn't find it in the catalog wasn't correct. The code tried to RelationCacheDelete/RelationDestroyRelation the entry. That didn't work when assertions are enabled because the latter contains an assertion ensuring the refcount is zero. It's also more generally a bad idea, because by virtue of being referenced somebody might actually look at the entry, which is possible if the error is trapped and handled via a subtransaction abort. Instead just error out, without deleting the entry. As the entry is marked invalid, the worst that can happen is that the invalid (and at some point unused) entry lingers in the relcache. (Discussion: 22459.1418656530@sss.pgh.pa.us) There should be no way to hit this case before 9.4, where logical decoding introduced a bug that can hit it. But since the code for handling the corner case is there, it should do something halfway sane, so backpatch all the way back. The logical decoding bug will be handled in a separate commit.	2015-01-07 00:25:17 +01:00
Tom Lane	d47fff3d72	Explicitly support the case that a plancache's raw_parse_tree is NULL. This only happens if a client issues a Parse message with an empty query string, which is a bit odd; but since it is explicitly called out as legal by our FE/BE protocol spec, we'd probably better continue to allow it. Fix by adding tests everywhere that the raw_parse_tree field is passed to functions that don't or shouldn't accept NULL. Also make it clear in the relevant comments that NULL is an expected case. This reverts commits `a73c9dbab0` and `2e9650cbcf`, which fixed specific crash symptoms by hacking things at what now seems to be the wrong end, ie the callee functions. Making the callees allow NULL is superficially more robust, but it's not always true that there is a defensible thing for the callee to do in such cases. The caller has more context and is better able to decide what the empty-query case ought to do. Per followup discussion of bug #11335. Back-patch to 9.2. The code before that is sufficiently different that it would require development of a separate patch, which doesn't seem worthwhile for what is believed to be an essentially cosmetic change.	2014-11-12 15:58:47 -05:00
Bruce Momjian	0b44914c21	Remove tabs after spaces in C comments. This was not changed in HEAD, but will be done later as part of a pgindent run. Future pgindent runs will also do this. Report by Tom Lane. Backpatch through all supported branches, but not HEAD.	2014-05-06 11:26:27 -04:00
Tom Lane	005f583ba4	Account better for planning cost when choosing whether to use custom plans. The previous coding in plancache.c essentially used 10% of the estimated runtime as its cost estimate for planning. This can be pretty bogus, especially when the estimated runtime is very small, such as in a simple expression plan created by plpgsql, or a simple INSERT ... VALUES. While we don't have a really good handle on how planning time compares to runtime, it seems reasonable to use an estimate based on the number of relations referenced in the query, with a rather large multiplier. This patch uses 1000 * cpu_operator_cost * (nrelations + 1), so that even a trivial query will be charged 1000 * cpu_operator_cost for planning. This should address the problem reported by Marc Cousin and others that 9.2 and up prefer custom plans in cases where the planning time greatly exceeds what can be saved.	2013-08-24 15:14:24 -04:00
Tom Lane	fd59974f2d	Fix cache flush hazard in cache_record_field_properties(). We need to increment the refcount on the composite type's cached tuple descriptor while we do lookups of its column types. Otherwise a cache flush could occur and release the tuple descriptor before we're done with it. This fails reliably with -DCLOBBER_CACHE_ALWAYS, but the odds of a failure in a production build seem rather low (since the pfree'd descriptor typically wouldn't get scribbled on immediately). That may explain the lack of any previous reports. Buildfarm issue noted by Christian Ullrich. Back-patch to 9.1 where the bogus code was added.	2013-06-11 17:26:48 -04:00
Tom Lane	c37ec840cf	Fix longstanding race condition in plancache.c. When creating or manipulating a cached plan for a transaction control command (particularly ROLLBACK), we must not perform any catalog accesses, since we might be in an aborted transaction. However, plancache.c busily saved or examined the search_path for every cached plan. If we were unlucky enough to do this at a moment where the path's expansion into schema OIDs wasn't already cached, we'd do some catalog accesses; and with some more bad luck such as an ill-timed signal arrival, that could lead to crashes or Assert failures, as exhibited in bug #8095 from Nachiket Vaidya. Fortunately, there's no real need to consider the search path for such commands, so we can just skip the relevant steps when the subject statement is a TransactionStmt. This is somewhat related to bug #5269, though the failure happens during initial cached-plan creation rather than revalidation. This bug has been there since the plan cache was invented, so back-patch to all supported branches.	2013-04-20 16:59:27 -04:00
Tom Lane	666569f1fd	Fix error-checking typo in check_TSCurrentConfig(). The code failed to detect an out-of-memory failure. Xi Wang	2013-01-20 23:10:00 -05:00
Tom Lane	a17da19ed9	Invent a "one-shot" variant of CachedPlans for better performance. SPI_execute() and related functions create a CachedPlan, execute it once, and immediately discard it, so that the functionality offered by plancache.c is of no value in this code path. And performance measurements show that the extra data copying and invalidation checking done by plancache.c slows down simple queries by 10% or more compared to 9.1. However, enough of the SPI code is shared with functions that do need plan caching that it seems impractical to bypass plancache.c altogether. Instead, let's invent a variant version of cached plans that preserves 99% of the API but doesn't offer any of the actual functionality, nor the overhead. This puts SPI_execute() performance back on par with, or maybe even slightly better than, what it was before. This change should resolve recent complaints of performance degradation from Dong Ye, Pavel Stehule, and others. By avoiding data copying, this change also reduces the amount of memory needed to execute many-statement SPI_execute() strings, as for instance in a recent complaint from Tomas Vondra. An additional benefit of this change is that multi-statement SPI_execute() query strings are now processed fully serially, that is we complete execution of earlier statements before running parse analysis and planning on following ones. This eliminates a long-standing POLA violation, in that DDL that affects the behavior of a later statement will now behave as expected. Back-patch to 9.2, since this was a performance regression compared to 9.1. (In 9.2, place the added struct fields so as to avoid changing the offsets of existing fields.) Heikki Linnakangas and Tom Lane	2013-01-04 17:42:25 -05:00
Tom Lane	fe2ef429a1	Fix failure to ignore leftover temp tables after a server crash. During crash recovery, we remove disk files belonging to temporary tables, but the system catalog entries for such tables are intentionally not cleaned up right away. Instead, the first backend that uses a temp schema is expected to clean out any leftover objects therein. This approach requires that we be careful to ignore leftover temp tables (since any actual access attempt would fail), even if their BackendId matches our session, if we have not yet established use of the session's corresponding temp schema. That worked fine in the past, but was broken by commit `debcec7dc3` which incorrectly removed the rd_islocaltemp relcache flag. Put it back, and undo various changes that substituted tests like "rel->rd_backend == MyBackendId" for use of a state-aware flag. Per trouble report from Heikki Linnakangas. Back-patch to 9.1 where the erroneous change was made. In the back branches, be careful to add rd_islocaltemp in a spot in the struct that was alignment padding before, so as not to break existing add-on code.	2012-12-17 20:15:39 -05:00
Tom Lane	94c014b532	Fix assorted bugs in CREATE/DROP INDEX CONCURRENTLY. Commit `8cb53654db`, which introduced DROP INDEX CONCURRENTLY, managed to break CREATE INDEX CONCURRENTLY via a poor choice of catalog state representation. The pg_index state for an index that's reached the final pre-drop stage was the same as the state for an index just created by CREATE INDEX CONCURRENTLY. This meant that the (necessary) change to make RelationGetIndexList ignore about-to-die indexes also made it ignore freshly-created indexes; which is catastrophic because the latter do need to be considered in HOT-safety decisions. Failure to do so leads to incorrect index entries and subsequently wrong results from queries depending on the concurrently-created index. To fix, make the final state be indisvalid = true and indisready = false, which is otherwise nonsensical. This is pretty ugly but we can't add another column without forcing initdb, and it's too late for that in 9.2. (There's a cleaner fix in HEAD.) In addition, change CREATE/DROP INDEX CONCURRENTLY so that the pg_index flag changes they make without exclusive lock on the index are made via heap_inplace_update() rather than a normal transactional update. The latter is not very safe because moving the pg_index tuple could result in concurrent SnapshotNow scans finding it twice or not at all, thus possibly resulting in index corruption. This is a pre-existing bug in CREATE INDEX CONCURRENTLY, which was copied into the DROP code. In addition, fix various places in the code that ought to check to make sure that the indexes they are manipulating are valid and/or ready as appropriate. These represent bugs that have existed since 8.2, since a failed CREATE INDEX CONCURRENTLY could leave a corrupt or invalid index behind, and we ought not try to do anything that might fail with such an index. Also fix RelationReloadIndexInfo to ensure it copies all the pg_index columns that are allowed to change after initial creation. Previously we could have been left with stale values of some fields in an index relcache entry. It's not clear whether this actually had any user-visible consequences, but it's at least a bug waiting to happen. In addition, do some code and docs review for DROP INDEX CONCURRENTLY; some cosmetic code cleanup but mostly addition and revision of comments. Portions of this need to be back-patched even further, but I'll work on that separately. Problem reported by Amit Kapila, diagnosis by Pavan Deolasee, fix by Tom Lane and Andres Freund.	2012-11-29 10:37:13 -05:00
Tom Lane	0fbd44387d	Make equal() ignore CoercionForm fields for better planning with casts. This change ensures that the planner will see implicit and explicit casts as equivalent for all purposes, except in the minority of cases where there's actually a semantic difference (as reflected by having a 3-argument cast function). In particular, this fixes cases where the EquivalenceClass machinery failed to consider two references to a varchar column as equivalent if one was implicitly cast to text but the other was explicitly cast to text, as seen in bug #7598 from Vaclav Juza. We have had similar bugs before in other parts of the planner, so I think it's time to fix this problem at the core instead of continuing to band-aid around it. Remove set_coercionform_dontcare(), which represents the band-aid previously in use for allowing matching of index and constraint expressions with inconsistent cast labeling. (We can probably get rid of COERCE_DONTCARE altogether, but I don't think removing that enum value in back branches would be wise; it's possible there's third party code referring to it.) Back-patch to 9.2. We could go back further, and might want to once this has been tested more; but for the moment I won't risk destabilizing plan choices in long-since-stable branches.	2012-10-12 12:10:55 -04:00
Tom Lane	972e066638	Fix race condition in enum value comparisons. When (re)loading the typcache comparison cache for an enum type's values, use an up-to-date MVCC snapshot, not the transaction's existing snapshot. This avoids problems if we encounter an enum OID that was created since our transaction started. Per report from Andres Freund and diagnosis by Robert Haas. To ensure this is safe even if enum comparison manages to get invoked before we've set a transaction snapshot, tweak GetLatestSnapshot to redirect to GetTransactionSnapshot instead of throwing error when FirstSnapshotSet is false. The existing uses of GetLatestSnapshot (in ri_triggers.c) don't care since they couldn't be invoked except in a transaction that's already done some work --- but it seems just conceivable that this might not be true of enums, especially if we ever choose to use enums in system catalogs. Note that the comparable coding in enum_endpoint and enum_range_internal remains GetTransactionSnapshot; this is perhaps debatable, but if we changed it those functions would have to be marked volatile, which doesn't seem attractive. Back-patch to 9.1 where ALTER TYPE ADD VALUE was added.	2012-07-01 17:12:54 -04:00
Bruce Momjian	927d61eeff	Run pgindent on 9.2 source tree in preparation for first 9.3 commit-fest.	2012-06-10 15:20:04 -04:00
Alvaro Herrera	09ff76fcdb	Recast "ONLY" column CHECK constraints as NO INHERIT. The original syntax wasn't universally loved, and it didn't allow its usage in CREATE TABLE, only ALTER TABLE. It now works everywhere, and it also allows using ALTER TABLE ONLY to add an uninherited CHECK constraint, per discussion. The pg_constraint column has accordingly been renamed connoinherit. This commit partly reverts some of the changes in `61d81bd28d`, particularly some pg_dump and psql bits, because now pg_get_constraintdef includes the necessary NO INHERIT within the constraint definition. Author: Nikhil Sontakke. Some tweaks by me.	2012-04-20 23:56:57 -03:00
Simon Riggs	8cb53654db	Add DROP INDEX CONCURRENTLY [IF EXISTS], uses ShareUpdateExclusiveLock	2012-04-06 10:21:40 +01:00
Tom Lane	f70f095c90	Allow new relmapper entries when allow_system_table_mods is true. This restores the pre-9.0 situation that it's possible to add new indexes on pg_class and other mapped-but-not-shared catalogs, so long as you broke the glass and flipped the big red Dont-Touch-Me switch. As before, there are a lot of gotchas, and you'd have to be pretty desperate to try this on a production database; but there doesn't seem to be a reason for relmapper.c to be preventing such things all by itself. Per experimentation with a case suggested by Cody Cutrer.	2012-03-21 14:09:39 -04:00
Tom Lane	9dbf2b7d75	Restructure SELECT INTO's parsetree representation into CreateTableAsStmt. Making this operation look like a utility statement seems generally a good idea, and particularly so in light of the desire to provide command triggers for utility statements. The original choice of representing it as SELECT with an IntoClause appendage had metastasized into rather a lot of places, unfortunately, so that this patch is a great deal more complicated than one might at first expect. In particular, keeping EXPLAIN working for SELECT INTO and CREATE TABLE AS subcommands required restructuring some EXPLAIN-related APIs. Add-on code that calls ExplainOnePlan or ExplainOneUtility, or uses ExplainOneQuery_hook, will need adjustment. Also, the cases PREPARE ... SELECT INTO and CREATE RULE ... SELECT INTO, which formerly were accepted though undocumented, are no longer accepted. The PREPARE case can be replaced with use of CREATE TABLE AS EXECUTE. The CREATE RULE case doesn't seem to have much real-world use (since the rule would work only once before failing with "table already exists"), so we'll not bother with that one. Both SELECT INTO and CREATE TABLE AS still return a command tag of "SELECT nnnn". There was some discussion of returning "CREATE TABLE nnnn", but for the moment backwards compatibility wins the day. Andres Freund and Tom Lane	2012-03-19 21:38:12 -04:00
Tom Lane	d4bf3c9c94	Expose an API for calculating catcache hash values. Now that cache invalidation callbacks get only a hash value, and not a tuple TID (per commits `632ae6829f` and `b5282aa893`), the only way they can restrict what they invalidate is to know what the hash values mean. setrefs.c was doing this via a hard-wired assumption but that seems pretty grotty, and it'll only get worse as more cases come up. So let's expose a calculation function that takes the same parameters as SearchSysCache. Per complaint from Marko Kreen.	2012-03-07 14:51:13 -05:00
Robert Haas	cd30728fb2	Allow LEAKPROOF functions for better performance of security views. We don't normally allow quals to be pushed down into a view created with the security_barrier option, but functions without side effects are an exception: they're OK. This allows much better performance in common cases, such as when using an equality operator (that might even be indexable). There is an outstanding issue here with the CREATE FUNCTION / ALTER FUNCTION syntax: there's no way to use ALTER FUNCTION to unset the leakproof flag. But I'm committing this as-is so that it doesn't have to be rebased again; we can fix up the grammar in a future commit. KaiGai Kohei, with some wordsmithing by me.	2012-02-13 22:21:14 -05:00
Heikki Linnakangas	a578257040	Accept a non-existent value in "ALTER USER/DATABASE SET ..." command. When default_text_search_config, default_tablespace, or temp_tablespaces setting is set per-user or per-database, with an "ALTER USER/DATABASE SET ..." statement, don't throw an error if the text search configuration or tablespace does not exist. In case of text search configuration, even if it doesn't exist in the current database, it might exist in another database, where the setting is intended to have its effect. This behavior is now the same as search_path's. Tablespaces are cluster-wide, so the same argument doesn't hold for tablespaces, but there's a problem with pg_dumpall: it dumps "ALTER USER SET ..." statements before the "CREATE TABLESPACE" statements. Arguably that's pg_dumpall's fault - it should dump the statements in such an order that the tablespace is created first and then the "ALTER USER SET default_tablespace ..." statements after that - but it seems better to be consistent with search_path and default_text_search_config anyway. Besides, you could still create a dump that throws an error, by creating the tablespace, running "ALTER USER SET default_tablespace", then dropping the tablespace and running pg_dumpall on that. Backpatch to all supported versions.	2012-01-30 11:13:36 +02:00
Peter Eisentraut	8a3f745f16	Do not access indclass through Form_pg_index. Normally, accessing variable-length members of catalog structures past the first one doesn't work at all. Here, it happened to work because indnatts was checked to be 1, and so the defined FormData_pg_index layout, using int2vector[1] and oidvector[1] for variable-length arrays, happened to match the actual memory layout. But it's a very fragile assumption, and it's not in a performance-critical path, so code it properly using heap_getattr() instead. Bug analysis by Tom Lane.	2012-01-27 20:08:34 +02:00
Bruce Momjian	e126958c2e	Update copyright notices for year 2012.	2012-01-01 18:01:58 -05:00
Robert Haas	0e4611c023	Add a security_barrier option for views. When a view is marked as a security barrier, it will not be pulled up into the containing query, and no quals will be pushed down into it, so that no function or operator chosen by the user can be applied to rows not exposed by the view. Views not configured with this option cannot provide robust row-level security, but will perform far better. Patch by KaiGai Kohei; original problem report by Heikki Linnakangas (in October 2009!). Review (in earlier versions) by Noah Misch and others. Design advice by Tom Lane and myself. Further review and cleanup by me.	2011-12-22 16:16:31 -05:00
Alvaro Herrera	61d81bd28d	Allow CHECK constraints to be declared ONLY. This makes them enforceable only on the parent table, not on child tables. This is useful in various situations, per discussion involving people bitten by the restrictive behavior introduced in 8.4. Message-Id: 8762mp93iw.fsf@comcast.net, CAFaPBrSMMpubkGf4zcRL_YL-AERUbYF_-ZNNYfb3CVwwEqc9TQ@mail.gmail.com. Authors: Nikhil Sontakke, Alex Hunsaker. Reviewed by Robert Haas and myself.	2011-12-19 17:30:23 -03:00
Tom Lane	c6e3ac11b6	Create a "sort support" interface API for faster sorting. This patch creates an API whereby a btree index opclass can optionally provide non-SQL-callable support functions for sorting. In the initial patch, we only use this to provide a directly-callable comparator function, which can be invoked with a bit less overhead than the traditional SQL-callable comparator. While that should be of value in itself, the real reason for doing this is to provide a datatype-extensible framework for more aggressive optimizations, as in Peter Geoghegan's recent work. Robert Haas and Tom Lane	2011-12-07 00:19:39 -05:00
Tom Lane	65d9aedb1b	Fix getTypeIOParam to support type record[]. Since record[] uses array_in, it needs to have its element type passed as typioparam. In HEAD and 9.1, this fix essentially reverts commit `9bc933b212`, which was a hack that is no longer needed since domains don't set their typelem anymore. Before that, adjust the logic so that only domains are excluded from being treated like arrays, rather than assuming that only base types should be included. Add a regression test to demonstrate the need for this. Per report from Maxim Boguk. Back-patch to 8.4, where type record[] was added.	2011-12-01 12:44:16 -05:00
Tom Lane	b985d48779	Further code review for range types patch. Fix some bugs in coercion logic and pg_dump; more comment cleanup; minor cosmetic improvements.	2011-11-20 23:50:27 -05:00
Tom Lane	37ee4b75db	Restructure function-internal caching in the range type code. Move the responsibility for caching specialized information about range types into the type cache, so that the catalog lookups only have to occur once per session. Rearrange APIs a bit so that fn_extra caching is actually effective in the GiST support code. (Use of OidFunctionCallN is bad enough for performance in itself, but it also prevents the function from exploiting fn_extra caching.) The range I/O functions are still not very bright about caching repeated lookups, but that seems like material for a separate patch. Also, avoid unnecessary use of memcpy to fetch/store the range type OID and flags, and don't use the full range_deserialize machinery when all we need to see is the flags value. Also fix API error in range_gist_penalty --- it was failing to set *penalty for any case involving an empty range.	2011-11-15 13:05:45 -05:00
Heikki Linnakangas	4429f6a9e3	Support range data types. Selectivity estimation functions are missing for some range type operators, which is a TODO. Jeff Davis	2011-11-03 13:42:15 +02:00
Tom Lane	08e261cbc9	Fix race condition with toast table access from a stale syscache entry. If a tuple in a syscache contains an out-of-line toasted field, and we try to fetch that field shortly after some other transaction has committed an update or deletion of the tuple, there is a race condition: vacuum could come along and remove the toast tuples before we can fetch them. This leads to transient failures like "missing chunk number 0 for toast value NNNNN in pg_toast_2619", as seen in recent reports from Andrew Hammond and Tim Uckun. The design idea of syscache is that access to stale syscache entries should be prevented by relation-level locks, but that fails for at least two cases where toasted fields are possible: ANALYZE updates pg_statistic rows without locking out sessions that might want to plan queries on the same table, and CREATE OR REPLACE FUNCTION updates pg_proc rows without any meaningful lock at all. The least risky fix seems to be an idea that Heikki suggested when we were dealing with a related problem back in August: forcibly detoast any out-of-line fields before putting a tuple into syscache in the first place. This avoids the problem because at the time we fetch the parent tuple from the catalog, we should be holding an MVCC snapshot that will prevent removal of the toast tuples, even if the parent tuple is outdated immediately after we fetch it. (Note: I'm not convinced that this statement holds true at every instant where we could be fetching a syscache entry at all, but it does appear to hold true at the times where we could fetch an entry that could have a toasted field. We will need to be a bit wary of adding toast tables to low-level catalogs that don't have them already.) An additional benefit is that subsequent uses of the syscache entry should be faster, since they won't have to detoast the field. Back-patch to all supported versions. The problem is significantly harder to reproduce in pre-9.0 releases, because of their willingness to flush every entry in a syscache whenever the underlying catalog is vacuumed (cf CatalogCacheFlushRelation); but there is still a window for trouble.	2011-11-01 19:49:58 -04:00
Tom Lane	e6858e6657	Measure the number of all-visible pages for use in index-only scan costing. Add a column pg_class.relallvisible to remember the number of pages that were all-visible according to the visibility map as of the last VACUUM (or ANALYZE, or some other operations that update pg_class.relpages). Use relallvisible/relpages, instead of an arbitrary constant, to estimate how many heap page fetches can be avoided during an index-only scan. This is pretty primitive and will no doubt see refinements once we've acquired more field experience with the index-only scan mechanism, but it's way better than using a constant. Note: I had to adjust an underspecified query in the window.sql regression test, because it was changing answers when the plan changed to use an index-only scan. Some of the adjacent tests perhaps should be adjusted as well, but I didn't do that here.	2011-10-14 17:23:46 -04:00
Tom Lane	a2822fb933	Support index-only scans using the visibility map to avoid heap fetches. When a btree index contains all columns required by the query, and the visibility map shows that all tuples on a target heap page are visible-to-all, we don't need to fetch that heap page. This patch depends on the previous patches that made the visibility map reliable. There's a fair amount left to do here, notably trying to figure out a less chintzy way of estimating the cost of an index-only scan, but the core functionality seems ready to commit. Robert Haas and Ibrar Ahmed, with some previous work by Heikki Linnakangas.	2011-10-07 20:14:13 -04:00
Tom Lane	21fb95da46	Use a fresh copy of query_list when making a second plan in GetCachedPlan. The code path that tried a generic plan, didn't like it, and then made a custom plan was mistakenly passing the same copy of the query_list to the planner both times. This doesn't work too well for nontrivial queries, since the planner tends to scribble on its input. Diagnosis and fix by Yamamoto Takashi.	2011-09-26 12:44:17 -04:00
Tom Lane	d5aa7a9fe6	Avoid unnecessary snapshot-acquisitions in BuildCachedPlan. I had copied-and-pasted a claim that we couldn't reach this point when dealing with utility statements, but that was a leftover from when the caller was required to supply a plan to start with. We now will go through here at least once when handling a utility statement, so it seems worth a check to see whether a snapshot is actually needed. (Note that analyze_requires_snapshot is quite a cheap test.) Per suggestion from Yamamoto Takashi. I don't think I believe that this resolves his reported assertion failure; but it's worth changing anyway, just to save a cycle or two.	2011-09-25 17:34:20 -04:00
Tom Lane	c4ae968633	Fix Assert failure in new plancache code. The regression tests were failing with CLOBBER_CACHE_ALWAYS enabled, as reported by buildfarm member jaguar. There was an Assert in BuildCachedPlan that asserted that the CachedPlanSource hadn't been invalidated since we called RevalidateCachedQuery, which in theory can't happen because we are holding locks on all the relevant database objects. However, CLOBBER_CACHE_ALWAYS generates a false positive by making an invalidation happen anyway; and on reflection, that could also occur as a result of a badly-timed sinval reset due to queue overflow. We could just remove the Assert and forge ahead with the not-really-stale querytree, but it seems safer to do another RevalidateCachedQuery call just to make real sure everything's OK.	2011-09-17 01:47:33 -04:00
Tom Lane	e6faf910d7	Redesign the plancache mechanism for more flexibility and efficiency. Rewrite plancache.c so that a "cached plan" (which is rather a misnomer at this point) can support generation of custom, parameter-value-dependent plans, and can make an intelligent choice between using custom plans and the traditional generic-plan approach. The specific choice algorithm implemented here can probably be improved in future, but this commit is all about getting the mechanism in place, not the policy. In addition, restructure the API to greatly reduce the amount of extraneous data copying needed. The main compromise needed to make that possible was to split the initial creation of a CachedPlanSource into two steps. It's worth noting in particular that SPI_saveplan is now deprecated in favor of SPI_keepplan, which accomplishes the same end result with zero data copying, and no need to then spend even more cycles throwing away the original SPIPlan. The risk of long-term memory leaks while manipulating SPIPlans has also been greatly reduced. Most of this improvement is based on use of the recently-added MemoryContextSetParent primitive.	2011-09-16 00:43:52 -04:00
Tom Lane	db10f01baa	Improve comment about handling of temp tables in shared-inval code.	2011-09-06 17:06:54 -04:00
Bruce Momjian	f458c90bff	Add C comment about why we send cache invalidation messages for session-local objects.	2011-09-05 22:09:02 -04:00
Tom Lane	1609797c25	Clean up the #include mess a little. walsender.h should depend on xlog.h, not vice versa. (Actually, the inclusion was circular until a couple hours ago, which was even sillier; but Bruce broke it in the expedient rather than logically correct direction.) Because of that poor decision, plus blind application of pgrminclude, we had a situation where half the system was depending on xlog.h to include such unrelated stuff as array.h and guc.h. Clean up the header inclusion, and manually revert a lot of what pgrminclude had done so things build again. This episode reinforces my feeling that pgrminclude should not be run without adult supervision. Inclusion changes in header files in particular need to be reviewed with great care. More generally, it'd be good if we had a clearer notion of module layering to dictate which headers can sanely include which others ... but that's a big task for another day.	2011-09-04 01:13:16 -04:00
Bruce Momjian	6416a82a62	Remove unnecessary #include references, per pgrminclude script.	2011-09-01 10:04:27 -04:00
Tom Lane	5bba65de94	Fix a missed case in code for "moving average" estimate of reltuples. It is possible for VACUUM to scan no pages at all, if the visibility map shows that all pages are all-visible. In this situation VACUUM has no new information to report about the relation's tuple density, so it wasn't changing pg_class.reltuples ... but it updated pg_class.relpages anyway. That's wrong in general, since there is no evidence to justify changing the density ratio reltuples/relpages, but it's particularly bad if the previous state was relpages=reltuples=0, which means "unknown tuple density". We just replaced "unknown" with "zero". ANALYZE would eventually recover from this, but it could take a lot of repetitions of ANALYZE to do so if the relation size is much larger than the maximum number of pages ANALYZE will scan, because of the moving-average behavior introduced by commit `b4b6923e03`. The only known situation where we could have relpages=reltuples=0 and yet the visibility map asserts everything's visible is immediately following a pg_upgrade. It might be advisable for pg_upgrade to try to preserve the relpages/reltuples statistics; but in any case this code is wrong on its own terms, so fix it. Per report from Sergey Koposov. Back-patch to 8.4, where the visibility map was introduced, same as the previous change.	2011-08-30 14:51:38 -04:00
Tom Lane	b5282aa893	Revise sinval code to remove no-longer-used tuple TID from inval messages. This requires adjusting the API for syscache callback functions: they now get a hash value, not a TID, to identify the target tuple. Most of them weren't paying any attention to that argument anyway, but plancache did require a small amount of fixing. Also, improve performance a trifle by avoiding sending duplicate inval messages when a heap_update isn't changing the catcache lookup columns.	2011-08-16 19:27:46 -04:00
Tom Lane	632ae6829f	Forget about targeting catalog cache invalidations by tuple TID. The TID isn't stable enough: we might queue an sinval event before a VACUUM FULL, and then process it afterwards, when the target tuple no longer has the same TID. So we must invalidate entries on the basis of hash value only. The old coding can be shown to result in various bizarre, hard-to-reproduce errors in the presence of concurrent VACUUM FULLs on system catalogs, and could easily result in permanent catalog corruption, up to and including complete loss of tables. This commit is just a minimal fix that removes the unsafe comparison. We should remove transmission of the tuple TID from sinval messages altogether, and then arrange to suppress the extra message in the common case of a heap_update that doesn't change the key hashvalue. But that's going to be much more invasive, and will only produce a probably-marginal performance gain, so it doesn't seem like material for a back-patch. Back-patch to 9.0. Before that, VACUUM FULL refused to do any tuple moving if it found any INSERT_IN_PROGRESS or DELETE_IN_PROGRESS tuples (and CLUSTER would give up altogether), so there was no risk of moving a tuple that might be the subject of an unsent sinval message.	2011-08-16 15:26:22 -04:00
Tom Lane	f4d7f1adba	Fix incorrect order of operations during sinval reset processing. We have to be sure that we have revalidated each nailed-in-cache relcache entry before we try to use it to load data for some other relcache entry. The introduction of "mapped relations" in 9.0 broke this, because although we updated the state kept in relmapper.c early enough, we failed to propagate that information into relcache entries soon enough; in particular, we could try to fetch pg_class rows out of pg_class before we'd updated its relcache entry's rd_node.relNode value from the map. This bug accounts for Dave Gould's report of failures after "vacuum full pg_class", and I believe that there is risk for other system catalogs as well. The core part of the fix is to copy relmapper data into the relcache entries during "phase 1" in RelationCacheInvalidate(), before they'll be used in "phase 2". To try to future-proof the code against other similar bugs, I also rearranged the order in which nailed relations are visited during phase 2: now it's pg_class first, then pg_class_oid_index, then other nailed relations. This should ensure that RelationClearRelation can apply RelationReloadIndexInfo to all nailed indexes without risking use of not-yet-revalidated relcache entries. Back-patch to 9.0 where the relation mapper was introduced.	2011-08-16 14:38:20 -04:00
Tom Lane	2ada6779c5	Fix race condition in relcache init file invalidation. The previous code tried to synchronize by unlinking the init file twice, but that doesn't actually work: it leaves a window wherein a third process could read the already-stale init file but miss the SI messages that would tell it the data is stale. The result would be bizarre failures in catalog accesses, typically "could not read block 0 in file ..." later during startup. Instead, hold RelCacheInitLock across both the unlink and the sending of the SI messages. This is more straightforward, and might even be a bit faster since only one unlink call is needed. This has been wrong since it was put in (in 2002!), so back-patch to all supported releases.	2011-08-16 13:11:54 -04:00
Robert Haas	367bc426a1	Avoid index rebuild for no-rewrite ALTER TABLE .. ALTER TYPE. Noah Misch. Review and minor cosmetic changes by me.	2011-07-18 11:04:43 -04:00
Tom Lane	14f67192c2	Remove assumptions that not-equals operators cannot be in any opclass. get_op_btree_interpretation assumed this in order to save some duplication of code, but it's not true in general anymore because we added <> support to btree_gist. (We still assume it for btree opclasses, though.) Also, essentially the same logic was baked into predtest.c. Get rid of that duplication by generalizing get_op_btree_interpretation so that it can be used by predtest.c. Per bug report from Denis de Bernardy and investigation by Jeff Davis, though I didn't use Jeff's patch exactly as-is. Back-patch to 9.1; we do not support this usage before that.	2011-07-06 14:53:16 -04:00
