1
0
mirror of https://github.com/postgres/postgres.git synced 2025-07-15 19:21:59 +03:00
Commit Graph

713 Commits

Author SHA1 Message Date
834aa56ea1 Fix the logic for putting relations into the relcache init file.
Commit f3b5565dd4 was a couple of bricks shy
of a load; specifically, it missed putting pg_trigger_tgrelid_tgname_index
into the relcache init file, because that index is not used by any
syscache.  However, we have historically nailed that index into cache for
performance reasons.  The upshot was that load_relcache_init_file always
decided that the init file was busted and silently ignored it, resulting
in a significant hit to backend startup speed.

To fix, reinstantiate RelationIdIsInInitFile() as a wrapper around
RelationSupportsSysCache(), which can know about additional relations
that should be in the init file despite being unknown to syscache.c.

Also install some guards against future mistakes of this type: make
write_relcache_init_file Assert that all nailed relations get written to
the init file, and make load_relcache_init_file emit a WARNING if it takes
the "wrong number of nailed relations" exit path.  Now that we remove the
init files during postmaster startup, that case should never occur in the
field, even if we are starting a minor-version update that added or removed
rels from the nailed set.  So the warning shouldn't ever be seen by end
users, but it will show up in the regression tests if somebody breaks this
logic.

Back-patch to all supported branches, like the previous commit.
2015-06-25 14:39:05 -04:00
4f2458dd78 Use a safer method for determining whether relcache init file is stale.
When we invalidate the relcache entry for a system catalog or index, we
must also delete the relcache "init file" if the init file contains a copy
of that rel's entry.  The old way of doing this relied on a specially
maintained list of the OIDs of relations present in the init file: we made
the list either when reading the file in, or when writing the file out.
The problem is that when writing the file out, we included only rels
present in our local relcache, which might have already suffered some
deletions due to relcache inval events.  In such cases we correctly decided
not to overwrite the real init file with incomplete data --- but we still
used the incomplete initFileRelationIds list for the rest of the current
session.  This could result in wrong decisions about whether the session's
own actions require deletion of the init file, potentially allowing an init
file created by some other concurrent session to be left around even though
it's been made stale.

Since we don't support changing the schema of a system catalog at runtime,
the only likely scenario in which this would cause a problem in the field
involves a "vacuum full" on a catalog concurrently with other activity, and
even then it's far from easy to provoke.  Remarkably, this has been broken
since 2002 (in commit 7863404417), but we had
never seen a reproducible test case until recently.  If it did happen in
the field, the symptoms would probably involve unexpected "cache lookup
failed" errors to begin with, then "could not open file" failures after the
next checkpoint, as all accesses to the affected catalog stopped working.
Recovery would require manually removing the stale "pg_internal.init" file.

To fix, get rid of the initFileRelationIds list, and instead consult
syscache.c's list of relations used in catalog caches to decide whether a
relation is included in the init file.  This should be a tad more efficient
anyway, since we're replacing linear search of a list with ~100 entries
with a binary search.  It's a bit ugly that the init file contents are now
so directly tied to the catalog caches, but in practice that won't make
much difference.

Back-patch to all supported branches.
2015-06-07 15:32:09 -04:00
6fdcf886d5 Improve relcache invalidation handling of currently invisible relations.
The corner case where a relcache invalidation tried to rebuild the
entry for a referenced relation but couldn't find it in the catalog
wasn't correct.

The code tried to RelationCacheDelete/RelationDestroyRelation the
entry. That didn't work when assertions are enabled because the latter
contains an assertion ensuring the refcount is zero. It's also more
generally a bad idea, because by virtue of being referenced somebody
might actually look at the entry, which is possible if the error is
trapped and handled via a subtransaction abort.

Instead just error out, without deleting the entry. As the entry is
marked invalid, the worst that can happen is that the invalid (and at
some point unused) entry lingers in the relcache.

Discussion: 22459.1418656530@sss.pgh.pa.us

There should be no way to hit this case < 9.4 where logical decoding
introduced a bug that can hit this. But since the code for handling
the corner case is there it should do something halfway sane, so
backpatch all the the way back.  The logical decoding bug will be
handled in a separate commit.
2015-01-07 00:24:15 +01:00
c2b4ed19f6 Explicitly support the case that a plancache's raw_parse_tree is NULL.
This only happens if a client issues a Parse message with an empty query
string, which is a bit odd; but since it is explicitly called out as legal
by our FE/BE protocol spec, we'd probably better continue to allow it.

Fix by adding tests everywhere that the raw_parse_tree field is passed to
functions that don't or shouldn't accept NULL.  Also make it clear in the
relevant comments that NULL is an expected case.

This reverts commits a73c9dbab0 and
2e9650cbcf, which fixed specific crash
symptoms by hacking things at what now seems to be the wrong end, ie the
callee functions.  Making the callees allow NULL is superficially more
robust, but it's not always true that there is a defensible thing for the
callee to do in such cases.  The caller has more context and is better
able to decide what the empty-query case ought to do.

Per followup discussion of bug #11335.  Back-patch to 9.2.  The code
before that is sufficiently different that it would require development
of a separate patch, which doesn't seem worthwhile for what is believed
to be an essentially cosmetic change.
2014-11-12 15:58:44 -05:00
e7a9020939 Code review for recent changes in relcache.c.
rd_replidindex should be managed the same as rd_oidindex, and rd_keyattr
and rd_idattr should be managed like rd_indexattr.  Omissions in this area
meant that the bitmapsets computed for rd_keyattr and rd_idattr would be
leaked during any relcache flush, resulting in a slow but permanent leak in
CacheMemoryContext.  There was also a tiny probability of relcache entry
corruption if we ran out of memory at just the wrong point in
RelationGetIndexAttrBitmap.  Otherwise, the fields were not zeroed where
expected, which would not bother the code any AFAICS but could greatly
confuse anyone examining the relcache entry while debugging.

Also, create an API function RelationGetReplicaIndex rather than letting
non-relcache code be intimate with the mechanisms underlying caching of
that value (we won't even mention the memory leak there).

Also, fix a relcache flush hazard identified by Andres Freund:
RelationGetIndexAttrBitmap must not assume that rd_replidindex stays valid
across index_open.

The aspects of this involving rd_keyattr date back to 9.3, so back-patch
those changes.
2014-05-14 14:55:50 -04:00
04e15c69d2 Remove tabs after spaces in C comments
This was not changed in HEAD, but will be done later as part of a
pgindent run.  Future pgindent runs will also do this.

Report by Tom Lane

Backpatch through all supported branches, but not HEAD
2014-05-06 11:26:28 -04:00
a5f11e24a4 Account better for planning cost when choosing whether to use custom plans.
The previous coding in plancache.c essentially used 10% of the estimated
runtime as its cost estimate for planning.  This can be pretty bogus,
especially when the estimated runtime is very small, such as in a simple
expression plan created by plpgsql, or a simple INSERT ... VALUES.

While we don't have a really good handle on how planning time compares
to runtime, it seems reasonable to use an estimate based on the number of
relations referenced in the query, with a rather large multiplier.  This
patch uses 1000 * cpu_operator_cost * (nrelations + 1), so that even a
trivial query will be charged 1000 * cpu_operator_cost for planning.
This should address the problem reported by Marc Cousin and others that
9.2 and up prefer custom plans in cases where the planning time greatly
exceeds what can be saved.
2013-08-24 15:14:21 -04:00
e262755bfc Fix cache flush hazard in cache_record_field_properties().
We need to increment the refcount on the composite type's cached tuple
descriptor while we do lookups of its column types.  Otherwise a cache
flush could occur and release the tuple descriptor before we're done with
it.  This fails reliably with -DCLOBBER_CACHE_ALWAYS, but the odds of a
failure in a production build seem rather low (since the pfree'd descriptor
typically wouldn't get scribbled on immediately).  That may explain the
lack of any previous reports.  Buildfarm issue noted by Christian Ullrich.

Back-patch to 9.1 where the bogus code was added.
2013-06-11 17:26:42 -04:00
9af4159fce pgindent run for release 9.3
This is the first run of the Perl-based pgindent script.  Also update
pgindent instructions.
2013-05-29 16:58:43 -04:00
1d6c72a55b Move materialized views' is-populated status into their pg_class entries.
Previously this state was represented by whether the view's disk file had
zero or nonzero size, which is problematic for numerous reasons, since it's
breaking a fundamental assumption about heap storage.  This was done to
allow unlogged matviews to revert to unpopulated status after a crash
despite our lack of any ability to update catalog entries post-crash.
However, this poses enough risk of future problems that it seems better to
not support unlogged matviews until we can find another way.  Accordingly,
revert that choice as well as a number of existing kluges forced by it
in favor of creating a pg_class.relispopulated flag column.
2013-05-06 13:27:22 -04:00
ac63dca607 Fix longstanding race condition in plancache.c.
When creating or manipulating a cached plan for a transaction control
command (particularly ROLLBACK), we must not perform any catalog accesses,
since we might be in an aborted transaction.  However, plancache.c busily
saved or examined the search_path for every cached plan.  If we were
unlucky enough to do this at a moment where the path's expansion into
schema OIDs wasn't already cached, we'd do some catalog accesses; and with
some more bad luck such as an ill-timed signal arrival, that could lead to
crashes or Assert failures, as exhibited in bug #8095 from Nachiket Vaidya.
Fortunately, there's no real need to consider the search path for such
commands, so we can just skip the relevant steps when the subject statement
is a TransactionStmt.  This is somewhat related to bug #5269, though the
failure happens during initial cached-plan creation rather than
revalidation.

This bug has been there since the plan cache was invented, so back-patch
to all supported branches.
2013-04-20 17:00:23 -04:00
52e6e33ab4 Create a distinction between a populated matview and a scannable one.
The intent was that being populated would, long term, be just one
of the conditions which could affect whether a matview was
scannable; being populated should be necessary but not always
sufficient to scan the relation.  Since only CREATE and REFRESH
currently determine the scannability, names and comments
accidentally conflated these concepts, leading to confusion.

Also add missing locking for the SQL function which allows a
test for scannability, and fix a modularity violatiion.

Per complaints from Tom Lane, although its not clear that these
will satisfy his concerns.  Hopefully this will at least better
frame the discussion.
2013-04-09 13:02:49 -05:00
473ab40c8b Add sql_drop event for event triggers
This event takes place just before ddl_command_end, and is fired if and
only if at least one object has been dropped by the command.  (For
instance, DROP TABLE IF EXISTS of a table that does not in fact exist
will not lead to such a trigger firing).  Commands that drop multiple
objects (such as DROP SCHEMA or DROP OWNED BY) will cause a single event
to fire.  Some firings might be surprising, such as
ALTER TABLE DROP COLUMN.

The trigger is fired after the drop has taken place, because that has
been deemed the safest design, to avoid exposing possibly-inconsistent
internal state (system catalogs as well as current transaction) to the
user function code.  This means that careful tracking of object
identification is required during the object removal phase.

Like other currently existing events, there is support for tag
filtering.

To support the new event, add a new pg_event_trigger_dropped_objects()
set-returning function, which returns a set of rows comprising the
objects affected by the command.  This is to be used within the user
function code, and is mostly modelled after the recently introduced
pg_identify_object() function.

Catalog version bumped due to the new function.

Dimitri Fontaine and Álvaro Herrera
Review by Robert Haas, Tom Lane
2013-03-28 13:05:48 -03:00
1908abc4a3 Arrange to cache FdwRoutine structs in foreign tables' relcache entries.
This saves several catalog lookups per reference.  It's not all that
exciting right now, because we'd managed to minimize the number of places
that need to fetch the data; but the upcoming writable-foreign-tables patch
needs this info in a lot more places.
2013-03-06 23:48:09 -05:00
3bf3ab8c56 Add a materialized view relations.
A materialized view has a rule just like a view and a heap and
other physical properties like a table.  The rule is only used to
populate the table, references in queries refer to the
materialized data.

This is a minimal implementation, but should still be useful in
many cases.  Currently data is only populated "on demand" by the
CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW statements.
It is expected that future releases will add incremental updates
with various timings, and that a more refined concept of defining
what is "fresh" data will be developed.  At some point it may even
be possible to have queries use a materialized in place of
references to underlying tables, but that requires the other
above-mentioned features to be working first.

Much of the documentation work by Robert Haas.
Review by Noah Misch, Thom Brown, Robert Haas, Marko Tiikkaja
Security review by KaiGai Kohei, with a decision on how best to
implement sepgsql still pending.
2013-03-03 18:23:31 -06:00
a730183926 Move relpath() to libpgcommon
This enables non-backend code, such as pg_xlogdump, to use it easily.
The previous location, in src/backend/catalog/catalog.c, made that
essentially impossible because that file depends on many backend-only
facilities; so this needs to live separately.
2013-02-21 22:46:17 -03:00
991f3e5ab3 Provide database object names as separate fields in error messages.
This patch addresses the problem that applications currently have to
extract object names from possibly-localized textual error messages,
if they want to know for example which index caused a UNIQUE_VIOLATION
failure.  It adds new error message fields to the wire protocol, which
can carry the name of a table, table column, data type, or constraint
associated with the error.  (Since the protocol spec has always instructed
clients to ignore unrecognized field types, this should not create any
compatibility problem.)

Support for providing these new fields has been added to just a limited set
of error reports (mainly, those in the "integrity constraint violation"
SQLSTATE class), but we will doubtless add them to more calls in future.

Pavel Stehule, reviewed and extensively revised by Peter Geoghegan, with
additional hacking by Tom Lane.
2013-01-29 17:08:26 -05:00
0d5fbdc157 Change plan caching to honor, not resist, changes in search_path.
In the initial implementation of plan caching, we saved the active
search_path when a plan was first cached, then reinstalled that path
anytime we needed to reparse or replan.  The idea of that was to try to
reselect the same referenced objects, in somewhat the same way that views
continue to refer to the same objects in the face of schema or name
changes.  Of course, that analogy doesn't bear close inspection, since
holding the search_path fixed doesn't cope with object drops or renames.
Moreover sticking with the old path seems to create more surprises than
it avoids.  So instead of doing that, consider that the cached plan depends
on search_path, and force reparse/replan if the active search_path is
different than it was when we last saved the plan.

This gets us fairly close to having "transparency" of plan caching, in the
sense that the cached statement acts the same as if you'd just resubmitted
the original query text for another execution.  There are still some corner
cases where this fails though: a new object added in the search path
schema(s) might capture a reference in the query text, but we'd not realize
that and force a reparse.  We might try to fix that in the future, but for
the moment it looks too expensive and complicated.
2013-01-25 14:14:41 -05:00
0ac5ad5134 Improve concurrency of foreign key locking
This patch introduces two additional lock modes for tuples: "SELECT FOR
KEY SHARE" and "SELECT FOR NO KEY UPDATE".  These don't block each
other, in contrast with already existing "SELECT FOR SHARE" and "SELECT
FOR UPDATE".  UPDATE commands that do not modify the values stored in
the columns that are part of the key of the tuple now grab a SELECT FOR
NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently
with tuple locks of the FOR KEY SHARE variety.

Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this
means the concurrency improvement applies to them, which is the whole
point of this patch.

The added tuple lock semantics require some rejiggering of the multixact
module, so that the locking level that each transaction is holding can
be stored alongside its Xid.  Also, multixacts now need to persist
across server restarts and crashes, because they can now represent not
only tuple locks, but also tuple updates.  This means we need more
careful tracking of lifetime of pg_multixact SLRU files; since they now
persist longer, we require more infrastructure to figure out when they
can be removed.  pg_upgrade also needs to be careful to copy
pg_multixact files over from the old server to the new, or at least part
of multixact.c state, depending on the versions of the old and new
servers.

Tuple time qualification rules (HeapTupleSatisfies routines) need to be
careful not to consider tuples with the "is multi" infomask bit set as
being only locked; they might need to look up MultiXact values (i.e.
possibly do pg_multixact I/O) to find out the Xid that updated a tuple,
whereas they previously were assured to only use information readily
available from the tuple header.  This is considered acceptable, because
the extra I/O would involve cases that would previously cause some
commands to block waiting for concurrent transactions to finish.

Another important change is the fact that locking tuples that have
previously been updated causes the future versions to be marked as
locked, too; this is essential for correctness of foreign key checks.
This causes additional WAL-logging, also (there was previously a single
WAL record for a locked tuple; now there are as many as updated copies
of the tuple there exist.)

With all this in place, contention related to tuples being checked by
foreign key rules should be much reduced.

As a bonus, the old behavior that a subtransaction grabbing a stronger
tuple lock than the parent (sub)transaction held on a given tuple and
later aborting caused the weaker lock to be lost, has been fixed.

Many new spec files were added for isolation tester framework, to ensure
overall behavior is sane.  There's probably room for several more tests.

There were several reviewers of this patch; in particular, Noah Misch
and Andres Freund spent considerable time in it.  Original idea for the
patch came from Simon Riggs, after a problem report by Joel Jacobson.
Most code is from me, with contributions from Marti Raudsepp, Alexander
Shulgin, Noah Misch and Andres Freund.

This patch was discussed in several pgsql-hackers threads; the most
important start at the following message-ids:
	AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com
	1290721684-sup-3951@alvh.no-ip.org
	1294953201-sup-2099@alvh.no-ip.org
	1320343602-sup-2290@alvh.no-ip.org
	1339690386-sup-8927@alvh.no-ip.org
	4FE5FF020200002500048A3D@gw.wicourts.gov
	4FEAB90A0200002500048B7D@gw.wicourts.gov
2013-01-23 12:04:59 -03:00
841a5150c5 Add ddl_command_end support for event triggers.
Dimitri Fontaine, with slight changes by me
2013-01-21 18:00:24 -05:00
535e69a43f Fix error-checking typo in check_TSCurrentConfig().
The code failed to detect an out-of-memory failure.

Xi Wang
2013-01-20 23:09:35 -05:00
d5b31cc32b Fix an O(N^2) performance issue for sessions modifying many relations.
AtEOXact_RelationCache() scanned the entire relation cache at the end of
any transaction that created a new relation or assigned a new relfilenode.
Thus, clients such as pg_restore had an O(N^2) performance problem that
would start to be noticeable after creating 10000 or so tables.  Since
typically only a small number of relcache entries need any cleanup, we
can fix this by keeping a small list of their OIDs and doing hash_searches
for them.  We fall back to the full-table scan if the list overflows.

Ideally, the maximum list length would be set at the point where N
hash_searches would cost just less than the full-table scan.  Some quick
experimentation says that point might be around 50-100; I (tgl)
conservatively set MAX_EOXACT_LIST = 32.  For the case that we're worried
about here, which is short single-statement transactions, it's unlikely
there would ever be more than about a dozen list entries anyway; so it's
probably not worth being too tense about the value.

We could avoid the hash_searches by instead keeping the target relcache
entries linked into a list, but that would be noticeably more complicated
and bug-prone because of the need to maintain such a list in the face of
relcache entry drops.  Since a relcache entry can only need such cleanup
after a somewhat-heavyweight filesystem operation, trying to save a
hash_search per cleanup doesn't seem very useful anyway --- it's the scan
over all the not-needing-cleanup entries that we wish to avoid here.

Jeff Janes, reviewed and tweaked a bit by Tom Lane
2013-01-20 13:45:10 -05:00
94afbd5831 Invent a "one-shot" variant of CachedPlans for better performance.
SPI_execute() and related functions create a CachedPlan, execute it once,
and immediately discard it, so that the functionality offered by
plancache.c is of no value in this code path.  And performance measurements
show that the extra data copying and invalidation checking done by
plancache.c slows down simple queries by 10% or more compared to 9.1.
However, enough of the SPI code is shared with functions that do need plan
caching that it seems impractical to bypass plancache.c altogether.
Instead, let's invent a variant version of cached plans that preserves
99% of the API but doesn't offer any of the actual functionality, nor the
overhead.  This puts SPI_execute() performance back on par, or maybe even
slightly better, than it was before.  This change should resolve recent
complaints of performance degradation from Dong Ye, Pavel Stehule, and
others.

By avoiding data copying, this change also reduces the amount of memory
needed to execute many-statement SPI_execute() strings, as for instance in
a recent complaint from Tomas Vondra.

An additional benefit of this change is that multi-statement SPI_execute()
query strings are now processed fully serially, that is we complete
execution of earlier statements before running parse analysis and planning
on following ones.  This eliminates a long-standing POLA violation, in that
DDL that affects the behavior of a later statement will now behave as
expected.

Back-patch to 9.2, since this was a performance regression compared to 9.1.
(In 9.2, place the added struct fields so as to avoid changing the offsets
of existing fields.)

Heikki Linnakangas and Tom Lane
2013-01-04 17:42:19 -05:00
bd61a623ac Update copyrights for 2013
Fully update git head, and update back branches in ./COPYRIGHT and
legal.sgml files.
2013-01-01 17:15:01 -05:00
ae9aba69a8 Keep rd_newRelfilenodeSubid across overflow.
Teach RelationCacheInvalidate() to keep rd_newRelfilenodeSubid across rel cache
message overflows, so that behaviour is now fully deterministic.

Noah Misch
2012-12-24 16:43:22 +00:00
6919b7e329 Fix failure to ignore leftover temp tables after a server crash.
During crash recovery, we remove disk files belonging to temporary tables,
but the system catalog entries for such tables are intentionally not
cleaned up right away.  Instead, the first backend that uses a temp schema
is expected to clean out any leftover objects therein.  This approach
requires that we be careful to ignore leftover temp tables (since any
actual access attempt would fail), *even if their BackendId matches our
session*, if we have not yet established use of the session's corresponding
temp schema.  That worked fine in the past, but was broken by commit
debcec7dc3 which incorrectly removed the
rd_islocaltemp relcache flag.  Put it back, and undo various changes
that substituted tests like "rel->rd_backend == MyBackendId" for use
of a state-aware flag.  Per trouble report from Heikki Linnakangas.

Back-patch to 9.1 where the erroneous change was made.  In the back
branches, be careful to add rd_islocaltemp in a spot in the struct that
was alignment padding before, so as not to break existing add-on code.
2012-12-17 20:15:32 -05:00
3c84046490 Fix assorted bugs in CREATE/DROP INDEX CONCURRENTLY.
Commit 8cb53654db, which introduced DROP
INDEX CONCURRENTLY, managed to break CREATE INDEX CONCURRENTLY via a poor
choice of catalog state representation.  The pg_index state for an index
that's reached the final pre-drop stage was the same as the state for an
index just created by CREATE INDEX CONCURRENTLY.  This meant that the
(necessary) change to make RelationGetIndexList ignore about-to-die indexes
also made it ignore freshly-created indexes; which is catastrophic because
the latter do need to be considered in HOT-safety decisions.  Failure to
do so leads to incorrect index entries and subsequently wrong results from
queries depending on the concurrently-created index.

To fix, add an additional boolean column "indislive" to pg_index, so that
the freshly-created and about-to-die states can be distinguished.  (This
change obviously is only possible in HEAD.  This patch will need to be
back-patched, but in 9.2 we'll use a kluge consisting of overloading the
formerly-impossible state of indisvalid = true and indisready = false.)

In addition, change CREATE/DROP INDEX CONCURRENTLY so that the pg_index
flag changes they make without exclusive lock on the index are made via
heap_inplace_update() rather than a normal transactional update.  The
latter is not very safe because moving the pg_index tuple could result in
concurrent SnapshotNow scans finding it twice or not at all, thus possibly
resulting in index corruption.  This is a pre-existing bug in CREATE INDEX
CONCURRENTLY, which was copied into the DROP code.

In addition, fix various places in the code that ought to check to make
sure that the indexes they are manipulating are valid and/or ready as
appropriate.  These represent bugs that have existed since 8.2, since
a failed CREATE INDEX CONCURRENTLY could leave a corrupt or invalid
index behind, and we ought not try to do anything that might fail with
such an index.

Also fix RelationReloadIndexInfo to ensure it copies all the pg_index
columns that are allowed to change after initial creation.  Previously we
could have been left with stale values of some fields in an index relcache
entry.  It's not clear whether this actually had any user-visible
consequences, but it's at least a bug waiting to happen.

In addition, do some code and docs review for DROP INDEX CONCURRENTLY;
some cosmetic code cleanup but mostly addition and revision of comments.

This will need to be back-patched, but in a noticeably different form,
so I'm committing it to HEAD before working on the back-patch.

Problem reported by Amit Kapila, diagnosis by Pavan Deolassee,
fix by Tom Lane and Andres Freund.
2012-11-28 21:26:01 -05:00
1577b46b7c Split out rmgr rm_desc functions into their own files
This is necessary (but not sufficient) to have them compilable outside
of a backend environment.
2012-11-28 13:01:15 -03:00
1f67078ea3 Add OpenTransientFile, with automatic cleanup at end-of-xact.
Files opened with BasicOpenFile or PathNameOpenFile are not automatically
cleaned up on error. That puts unnecessary burden on callers that only want
to keep the file open for a short time. There is AllocateFile, but that
returns a buffered FILE * stream, which in many cases is not the nicest API
to work with. So add function called OpenTransientFile, which returns a
unbuffered fd that's cleaned up like the FILE* returned by AllocateFile().

This plugs a few rare fd leaks in error cases:

1. copy_file() - fixed by by using OpenTransientFile instead of BasicOpenFile
2. XLogFileInit() - fixed by adding close() calls to the error cases. Can't
   use OpenTransientFile here because the fd is supposed to persist over
   transaction boundaries.
3. lo_import/lo_export - fixed by using OpenTransientFile instead of
   PathNameOpenFile.

In addition to plugging those leaks, this replaces many BasicOpenFile() calls
with OpenTransientFile() that were not leaking, because the code meticulously
closed the file on error. That wasn't strictly necessary, but IMHO it's good
for robustness.

The same leaks exist in older versions, but given the rarity of the issues,
I'm not backpatching this. Not yet, anyway - it might be good to backpatch
later, after this mechanism has had some more testing in master branch.
2012-11-27 10:25:50 +02:00
002191a1a3 Further cleanup of catcache.c ilist changes.
Remove useless duplicate initialization of bucket headers, don't use a
dlist_mutable_iter in a performance-critical path that doesn't need it,
make some other cosmetic changes for consistency's sake.
2012-10-18 19:30:43 -04:00
dc5aeca168 Remove unnecessary "head" arguments from some dlist/slist functions.
dlist_delete, dlist_insert_after, dlist_insert_before, slist_insert_after
do not need access to the list header, and indeed insisting on that negates
one of the main advantages of a doubly-linked list.

In consequence, revert addition of "cache_bucket" field to CatCTup.
2012-10-18 19:04:20 -04:00
a66ee69add Embedded list interface
Provide a common implementation of embedded singly-linked and
doubly-linked lists.  "Embedded" in the sense that the nodes'
next/previous pointers exist within some larger struct; this design
choice reduces memory allocation overhead.

Most of the implementation uses inlineable functions (where supported),
for performance.

Some existing uses of both types of lists have been converted to the new
code, for demonstration purposes.  Other uses can (and probably will) be
converted in the future.  Since dllist.c is unused after this conversion,
it has been removed.

Author: Andres Freund
Some tweaks by me
Reviewed by Tom Lane, Peter Geoghegan
2012-10-17 11:31:20 -03:00
71e58dcfb9 Make equal() ignore CoercionForm fields for better planning with casts.
This change ensures that the planner will see implicit and explicit casts
as equivalent for all purposes, except in the minority of cases where
there's actually a semantic difference (as reflected by having a 3-argument
cast function).  In particular, this fixes cases where the EquivalenceClass
machinery failed to consider two references to a varchar column as
equivalent if one was implicitly cast to text but the other was explicitly
cast to text, as seen in bug #7598 from Vaclav Juza.  We have had similar
bugs before in other parts of the planner, so I think it's time to fix this
problem at the core instead of continuing to band-aid around it.

Remove set_coercionform_dontcare(), which represents the band-aid
previously in use for allowing matching of index and constraint expressions
with inconsistent cast labeling.  (We can probably get rid of
COERCE_DONTCARE altogether, but I don't think removing that enum value in
back branches would be wise; it's possible there's third party code
referring to it.)

Back-patch to 9.2.  We could go back further, and might want to once this
has been tested more; but for the moment I won't risk destabilizing plan
choices in long-since-stable branches.
2012-10-12 12:11:22 -04:00
c219d9b0a5 Split tuple struct defs from htup.h to htup_details.h
This reduces unnecessary exposure of other headers through htup.h, which
is very widely included by many files.

I have chosen to move the function prototypes to the new file as well,
because that means htup.h no longer needs to include tupdesc.h.  In
itself this doesn't have much effect in indirect inclusion of tupdesc.h
throughout the tree, because it's also required by execnodes.h; but it's
something to explore in the future, and it seemed best to do the htup.h
change now while I'm busy with it.
2012-08-30 16:52:35 -04:00
fda0594fc2 remove catcache.h from syscache.h
Instead, place a forward struct declaration for struct catclist in
syscache.h.  This reduces header proliferation somewhat.
2012-08-28 18:36:39 -04:00
45326c5a11 Split resowner.h
This lets files that are mere users of ResourceOwner not automatically
include the headers for stuff that is managed by the resowner mechanism.
2012-08-28 18:02:07 -04:00
21786db81f Fix cache flush hazard in event trigger cache.
Bug spotted by Jeff Davis using -DCLOBBER_CACHE_ALWAYS.
2012-08-08 16:38:37 -04:00
eea65943c6 Fix memory leaks in event trigger code.
Spotted by Jeff Davis.
2012-08-07 17:00:16 -04:00
3a0e4d36eb Make new event trigger facility actually do something.
Commit 3855968f32 added syntax, pg_dump,
psql support, and documentation, but the triggers didn't actually fire.
With this commit, they now do.  This is still a pretty basic facility
overall because event triggers do not get a whole lot of information
about what the user is trying to do unless you write them in C; and
there's still no option to fire them anywhere except at the very
beginning of the execution sequence, but it's better than nothing,
and a good building block for future work.

Along the way, add a regression test for ALTER LARGE OBJECT, since
testing of event triggers reveals that we haven't got one.

Dimitri Fontaine and Robert Haas
2012-07-20 11:39:01 -04:00
3855968f32 Syntax support and documentation for event triggers.
They don't actually do anything yet; that will get fixed in a
follow-on commit.  But this gets the basic infrastructure in place,
including CREATE/ALTER/DROP EVENT TRIGGER; support for COMMENT,
SECURITY LABEL, and ALTER EXTENSION .. ADD/DROP EVENT TRIGGER;
pg_dump and psql support; and documentation for the anticipated
initial feature set.

Dimitri Fontaine, with review and a bunch of additional hacking by me.
Thom Brown extensively reviewed earlier versions of this patch set,
but there's not a whole lot of that code left in this commit, as it
turns out.
2012-07-18 10:16:16 -04:00
9ad45c18b6 Fix race condition in enum value comparisons.
When (re) loading the typcache comparison cache for an enum type's values,
use an up-to-date MVCC snapshot, not the transaction's existing snapshot.
This avoids problems if we encounter an enum OID that was created since our
transaction started.  Per report from Andres Freund and diagnosis by Robert
Haas.

To ensure this is safe even if enum comparison manages to get invoked
before we've set a transaction snapshot, tweak GetLatestSnapshot to
redirect to GetTransactionSnapshot instead of throwing error when
FirstSnapshotSet is false.  The existing uses of GetLatestSnapshot (in
ri_triggers.c) don't care since they couldn't be invoked except in a
transaction that's already done some work --- but it seems just conceivable
that this might not be true of enums, especially if we ever choose to use
enums in system catalogs.

Note that the comparable coding in enum_endpoint and enum_range_internal
remains GetTransactionSnapshot; this is perhaps debatable, but if we
changed it those functions would have to be marked volatile, which doesn't
seem attractive.

Back-patch to 9.1 where ALTER TYPE ADD VALUE was added.
2012-07-01 17:12:49 -04:00
0ce4459a36 Increase MAX_SYSCACHE_CALLBACKS from 20 to 32.
By my count there are 18 callers of CacheRegisterSyscacheCallback in the
core code in HEAD, so we are potentially leaving as few as 2 slots for any
add-on code to use (though possibly not all these callers would actually
activate in any particular session).  That doesn't seem like a lot of
headroom, so let's pump it up a little.
2012-06-20 19:47:37 -04:00
d2c86a1ccd Remove RELKIND_UNCATALOGED.
This may have been important at some point in the past, but it no
longer does anything useful.

Review by Tom Lane.
2012-06-14 09:47:30 -04:00
927d61eeff Run pgindent on 9.2 source tree in preparation for first 9.3
commit-fest.
2012-06-10 15:20:04 -04:00
09ff76fcdb Recast "ONLY" column CHECK constraints as NO INHERIT
The original syntax wasn't universally loved, and it didn't allow its
usage in CREATE TABLE, only ALTER TABLE.  It now works everywhere, and
it also allows using ALTER TABLE ONLY to add an uninherited CHECK
constraint, per discussion.

The pg_constraint column has accordingly been renamed connoinherit.

This commit partly reverts some of the changes in
61d81bd28d, particularly some pg_dump and
psql bits, because now pg_get_constraintdef includes the necessary NO
INHERIT within the constraint definition.

Author: Nikhil Sontakke
Some tweaks by me
2012-04-20 23:56:57 -03:00
8cb53654db Add DROP INDEX CONCURRENTLY [IF EXISTS], uses ShareUpdateExclusiveLock 2012-04-06 10:21:40 +01:00
f70f095c90 Allow new relmapper entries when allow_system_table_mods is true.
This restores the pre-9.0 situation that it's possible to add new indexes
on pg_class and other mapped-but-not-shared catalogs, so long as you broke
the glass and flipped the big red Dont-Touch-Me switch.  As before, there
are a lot of gotchas, and you'd have to be pretty desperate to try this
on a production database; but there doesn't seem to be a reason for
relmapper.c to be preventing such things all by itself.  Per
experimentation with a case suggested by Cody Cutrer.
2012-03-21 14:09:39 -04:00
9dbf2b7d75 Restructure SELECT INTO's parsetree representation into CreateTableAsStmt.
Making this operation look like a utility statement seems generally a good
idea, and particularly so in light of the desire to provide command
triggers for utility statements.  The original choice of representing it as
SELECT with an IntoClause appendage had metastasized into rather a lot of
places, unfortunately, so that this patch is a great deal more complicated
than one might at first expect.

In particular, keeping EXPLAIN working for SELECT INTO and CREATE TABLE AS
subcommands required restructuring some EXPLAIN-related APIs.  Add-on code
that calls ExplainOnePlan or ExplainOneUtility, or uses
ExplainOneQuery_hook, will need adjustment.

Also, the cases PREPARE ... SELECT INTO and CREATE RULE ... SELECT INTO,
which formerly were accepted though undocumented, are no longer accepted.
The PREPARE case can be replaced with use of CREATE TABLE AS EXECUTE.
The CREATE RULE case doesn't seem to have much real-world use (since the
rule would work only once before failing with "table already exists"),
so we'll not bother with that one.

Both SELECT INTO and CREATE TABLE AS still return a command tag of
"SELECT nnnn".  There was some discussion of returning "CREATE TABLE nnnn",
but for the moment backwards compatibility wins the day.

Andres Freund and Tom Lane
2012-03-19 21:38:12 -04:00
d4bf3c9c94 Expose an API for calculating catcache hash values.
Now that cache invalidation callbacks get only a hash value, and not a
tuple TID (per commits 632ae6829f and
b5282aa893), the only way they can restrict
what they invalidate is to know what the hash values mean.  setrefs.c was
doing this via a hard-wired assumption but that seems pretty grotty, and
it'll only get worse as more cases come up.  So let's expose a calculation
function that takes the same parameters as SearchSysCache.  Per complaint
from Marko Kreen.
2012-03-07 14:51:13 -05:00
cd30728fb2 Allow LEAKPROOF functions for better performance of security views.
We don't normally allow quals to be pushed down into a view created
with the security_barrier option, but functions without side effects
are an exception: they're OK.  This allows much better performance in
common cases, such as when using an equality operator (that might
even be indexable).

There is an outstanding issue here with the CREATE FUNCTION / ALTER
FUNCTION syntax: there's no way to use ALTER FUNCTION to unset the
leakproof flag.  But I'm committing this as-is so that it doesn't
have to be rebased again; we can fix up the grammar in a future
commit.

KaiGai Kohei, with some wordsmithing by me.
2012-02-13 22:21:14 -05:00