Commit Graph

2199 Commits

1d033d3054 Remove tabs after spaces in C comments
This was not changed in HEAD, but will be done later as part of a
pgindent run.  Future pgindent runs will also do this.

Report by Tom Lane

Backpatch through all supported branches, but not HEAD
2014-05-06 11:26:25 -04:00
634056567a Fix tablespace creation WAL replay to work on Windows.
The code segment that removes the old symlink (if present) wasn't clued
into the fact that on Windows, symlinks are junction points which have
to be removed with rmdir().

Backpatch to 9.0, where the failing code was introduced.

MauMau, reviewed by Muhammad Asif Naeem and Amit Kapila
2014-04-04 23:09:49 -04:00
7aea1050eb Avoid transaction-commit race condition while receiving a NOTIFY message.
Use TransactionIdIsInProgress, then TransactionIdDidCommit, to distinguish
whether a NOTIFY message's originating transaction is in progress,
committed, or aborted.  The previous coding could accept a message from a
transaction that was still in-progress according to the PGPROC array;
if the client were fast enough at starting a new transaction, it might fail
to see table rows added/updated by the message-sending transaction.  Which
of course would usually be the point of receiving the message.  We noted
this type of race condition long ago in tqual.c, but async.c overlooked it.

The race condition probably cannot occur unless there are multiple NOTIFY
senders in action, since an individual backend doesn't send NOTIFY signals
until well after it's done committing.  But if two senders commit in close
succession, it's certainly possible that we could see the second sender's
message within the race condition window while responding to the signal
from the first one.

Per bug #9557 from Marko Tiikkaja.  This patch is slightly more invasive
than what he proposed, since it removes the now-redundant
TransactionIdDidAbort call.

Back-patch to 9.0, where the current NOTIFY implementation was introduced.
2014-03-13 12:03:07 -04:00
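
A minimal two-session SQL sketch of the window (table and channel names
hypothetical; the interleaving is timing-dependent, so this is not a
deterministic reproducer):

```sql
-- Assume: CREATE TABLE job_queue (id int);

-- Session A:
LISTEN jobs;

-- Session B (one of at least two NOTIFY senders committing close together):
BEGIN;
INSERT INTO job_queue VALUES (1);
NOTIFY jobs;
COMMIT;

-- Session A, starting a transaction immediately upon the notification:
BEGIN;
SELECT * FROM job_queue;  -- pre-fix: could miss B's row while B's commit
                          -- was still in progress per the PGPROC array
COMMIT;
```
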
43d4e965ed Avoid repeated name lookups during table and index DDL.
If the name lookups come to different conclusions due to concurrent
activity, we might perform some parts of the DDL on a different table
than other parts.  At least in the case of CREATE INDEX, this can be
used to cause the permissions checks to be performed against a
different table than the index creation, allowing for a privilege
escalation attack.

This changes the calling convention for DefineIndex, CreateTrigger,
transformIndexStmt, transformAlterTableStmt, CheckIndexCompatible
(in 9.2 and newer), and AlterTable (in 9.1 and older).  In addition,
CheckRelationOwnership is removed in 9.2 and newer and the calling
convention is changed in older branches.  A field has also been added
to the Constraint node (FkConstraint in 8.4).  Third-party code calling
these functions or using the Constraint node will require updating.

Report by Andres Freund.  Patch by Robert Haas and Andres Freund,
reviewed by Tom Lane.

Security: CVE-2014-0062
2014-02-17 09:33:37 -05:00
c0ac4c75ff Prevent privilege escalation in explicit calls to PL validators.
The primary role of PL validators is to be called implicitly during
CREATE FUNCTION, but they are also normal functions that a user can call
explicitly.  Add a permissions check to each validator to ensure that a
user cannot use explicit validator calls to achieve things he could not
otherwise achieve.  Back-patch to 8.4 (all supported versions).
Non-core procedural language extensions ought to make the same two-line
change to their own validators.

Andres Freund, reviewed by Tom Lane and Noah Misch.

Security: CVE-2014-0061
2014-02-17 09:33:37 -05:00
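
A sketch of what the check forbids, assuming plpgsql and a function the
caller has no rights over (function name hypothetical):

```sql
-- A language validator is itself an ordinary function:
SELECT plpgsql_validator('other_user_func(int)'::regprocedure);
-- Pre-fix this ran unconditionally; it now applies the same permission
-- checks an implicit call during CREATE FUNCTION would apply.
```
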
7890636979 Shore up ADMIN OPTION restrictions.
Granting a role without ADMIN OPTION is supposed to prevent the grantee
from adding or removing members from the granted role.  Issuing SET ROLE
before the GRANT bypassed that, because the role itself had an implicit
right to add or remove members.  Plug that hole by recognizing that
implicit right only when the session user matches the current role.
Additionally, do not recognize it during a security-restricted operation
or during execution of a SECURITY DEFINER function.  The restriction on
SECURITY DEFINER is not security-critical.  However, it seems best for a
user testing his own SECURITY DEFINER function to see the same behavior
others will see.  Back-patch to 8.4 (all supported versions).

The SQL standards do not conflate roles and users as PostgreSQL does;
only SQL roles have members, and only SQL users initiate sessions.  An
application using PostgreSQL users and roles as SQL users and roles will
never attempt to grant membership in the role that is the session user,
so the implicit right to add or remove members will never arise.

The security impact was mostly that a role member could revoke access
from others, contrary to the wishes of his own grantor.  Unapproved role
member additions are less notable, because the member can still largely
achieve that by creating a view or a SECURITY DEFINER function.

Reviewed by Andres Freund and Tom Lane.  Reported, independently, by
Jonas Sundman and Noah Misch.

Security: CVE-2014-0060
2014-02-17 09:33:37 -05:00
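
A sketch of the closed hole (role names hypothetical):

```sql
-- Setup, as superuser: alice is a plain member of staff, no ADMIN OPTION.
CREATE ROLE staff;
CREATE ROLE alice LOGIN;
CREATE ROLE mallory LOGIN;
GRANT staff TO alice;

-- As alice:
SET ROLE staff;          -- allowed: alice is a member of staff
GRANT staff TO mallory;  -- pre-fix: succeeded via the role's implicit right
                         -- to add or remove its own members
-- Post-fix, that implicit right is recognized only when the session user
-- matches the current role, so the GRANT fails.
```
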
d17a667e80 Fix unsafe references to errno within error messaging logic.
Various places were supposing that errno could be expected to hold still
within an ereport() nest or similar contexts.  This isn't true necessarily,
though in some cases it accidentally failed to fail depending on how the
compiler chanced to order the subexpressions.  This class of thinko
explains recent reports of odd failures on clang-built versions, typically
missing or inappropriate HINT fields in messages.

Problem identified by Christian Kruse, who also submitted the patch this
commit is based on.  (I fixed a few issues in his patch and found a couple
of additional places with the same disease.)

Back-patch as appropriate to all supported branches.
2014-01-29 20:04:11 -05:00
e70c428213 Allow SET TABLESPACE to database default
We've always allowed CREATE TABLE to create tables in the database's default
tablespace without checking for CREATE permissions on that tablespace.
Unfortunately, the original implementation of ALTER TABLE ... SET TABLESPACE
didn't pick up on that exception.

This changes ALTER TABLE ... SET TABLESPACE to allow the database's default
tablespace without checking for CREATE rights on that tablespace, just as
CREATE TABLE works today.  Users could always do this through a series of
commands (CREATE TABLE ... AS SELECT * FROM ...; DROP TABLE ...; etc), so
let's fix the oversight in SET TABLESPACE's original implementation.
2014-01-18 18:50:29 -05:00
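
In SQL terms (tablespace names hypothetical; pg_default standing in for the
database's default tablespace):

```sql
-- As an unprivileged table owner:
CREATE TABLE t (id int);                 -- default tablespace: no CREATE
                                         -- permission check, as always
ALTER TABLE t SET TABLESPACE ts_private; -- still requires CREATE on ts_private
ALTER TABLE t SET TABLESPACE pg_default; -- pre-fix: failed without CREATE on
                                         -- pg_default; now allowed
```
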
2d76d75d94 Fix compute_scalar_stats() for case that all values exceed WIDTH_THRESHOLD.
The standard typanalyze functions skip over values whose detoasted size
exceeds WIDTH_THRESHOLD (1024 bytes), so as to limit memory bloat during
ANALYZE.  However, we (I think I, actually :-() failed to consider the
possibility that *every* non-null value in a column is too wide.  While
compute_minimal_stats() seems to behave reasonably anyway in such a case,
compute_scalar_stats() just fell through and generated no pg_statistic
entry at all.  That's unnecessarily pessimistic: we can still produce
valid stanullfrac and stawidth values in such cases, since we do include
too-wide values in the average-width calculation.  Furthermore, since the
general assumption in this code is that too-wide values are probably all
distinct from each other, it seems reasonable to set stadistinct to -1
("all distinct").

Per complaint from Kadri Raudsepp.  This has been like this since roughly
neolithic times, so back-patch to all supported branches.
2014-01-11 13:42:05 -05:00
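
A sketch of the now-covered case, assuming a column whose every non-null
value detoasts to more than 1024 bytes:

```sql
CREATE TABLE wide (doc text);
INSERT INTO wide SELECT repeat('x', 2000) FROM generate_series(1, 100);
ANALYZE wide;

SELECT null_frac, avg_width, n_distinct
FROM pg_stats
WHERE tablename = 'wide' AND attname = 'doc';
-- Pre-fix: no row at all for a scalar-sortable type like text.
-- Post-fix: valid null_frac and avg_width, with n_distinct = -1.
```
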
0a63f7d977 Add HOLD/RESUME_INTERRUPTS in HandleCatchupInterrupt/HandleNotifyInterrupt.
This prevents a possible longjmp out of the signal handler if a timeout
or SIGINT occurs while something within the handler has transiently set
ImmediateInterruptOK.  For safety we must hold off the timeout or cancel
error until we're back in mainline, or at least till we reach the end of
the signal handler when ImmediateInterruptOK was true at entry.  This
syncs these functions with the logic now present in handle_sig_alarm.

AFAICT there is no live bug here in 9.0 and up, because I don't think we
currently can wait for any heavyweight lock inside these functions, and
there is no other code (except read-from-client) that will turn on
ImmediateInterruptOK.  However, that was not true pre-9.0: in older
branches ProcessIncomingNotify might block trying to lock pg_listener, and
then a SIGINT could lead to undesirable control flow.  It might be all
right anyway given the relatively narrow code ranges in which NOTIFY
interrupts are enabled, but for safety's sake I'm back-patching this.
2013-12-13 14:05:27 -05:00
ba63799598 Don't update relfrozenxid if any pages were skipped.
Vacuum recognizes that it can update relfrozenxid by checking whether it has
processed all pages of a relation. Unfortunately it performed that check
after truncating the dead pages at the end of the relation, and used the new
number of pages to decide whether all pages have been scanned. If the new
number of pages happened to be smaller or equal to the number of pages
scanned, it incorrectly decided that all pages were scanned.

This can lead to relfrozenxid being updated, even though some pages were
skipped that still contain old XIDs. That can lead to data loss due to xid
wraparounds with some rows suddenly missing. This likely has escaped notice
so far because it takes a large number (~2^31) of xids being used to see the
effect, while a full-table vacuum before that would fix the issue.

The incorrect logic was introduced by commit
b4b6923e03. Backpatch this fix down to 8.4,
like that commit.

Andres Freund, with some modifications by me.
2013-11-27 13:43:18 +02:00
d81cd0430f Fix some odd behaviors when using a SQL-style simple GMT offset timezone.
Formerly, when using a SQL-spec timezone setting with a fixed GMT offset
(called a "brute force" timezone in the code), the session_timezone
variable was not updated to match the nominal timezone; rather, all code
was expected to ignore session_timezone if HasCTZSet was true.  This is
of course obviously fragile, though a search of the code finds only
timeofday() failing to honor the rule.  A bigger problem was that
DetermineTimeZoneOffset() supposed that if its pg_tz parameter was
pointer-equal to session_timezone, then HasCTZSet should override the
parameter.  This would cause datetime input containing an explicit zone
name to be treated as referencing the brute-force zone instead, if the
zone name happened to match the session timezone that had prevailed
before installing the brute-force zone setting (as reported in bug #8572).
The same malady could affect AT TIME ZONE operators.

To fix, set up session_timezone so that it matches the brute-force zone
specification, which we can do using the POSIX timezone definition syntax
"<abbrev>offset", and get rid of the bogus lookaside check in
DetermineTimeZoneOffset().  Aside from fixing the erroneous behavior in
datetime parsing and AT TIME ZONE, this will cause the timeofday() function
to print its result in the user-requested time zone rather than some
previously-set zone.  It might also affect results in third-party
extensions, if there are any that make use of session_timezone without
considering HasCTZSet, but in all cases the new behavior should be saner
than before.

Back-patch to all supported branches.
2013-11-01 12:13:33 -04:00
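
A sketch of the reported misbehavior (zone and offset chosen so the
difference is visible):

```sql
SET TIME ZONE 'Europe/Helsinki';                 -- named zone prevails first
SET TIME ZONE INTERVAL '+05:30' HOUR TO MINUTE;  -- SQL-spec fixed GMT offset

-- Pre-fix, an explicit zone name matching the previously-set session zone
-- could be read as the fixed offset instead (bug #8572); AT TIME ZONE was
-- affected the same way:
SELECT '2013-11-01 12:00 Europe/Helsinki'::timestamptz;
SELECT timeofday();  -- post-fix: printed in the requested fixed-offset zone
```
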
ca4ac3f6c4 Fix spurious warning after vacuuming a page on a table with no indexes.
There is a rare race condition, when a transaction that inserted a tuple
aborts while vacuum is processing the page containing the inserted tuple.
Vacuum prunes the page first, which normally removes any dead tuples, but
if the inserting transaction aborts right after that, the loop after
pruning will see a dead tuple and remove it instead. That's OK, but if the
page is on a table with no indexes, and the page becomes completely empty
after removing the dead tuple (or tuples) on it, it will be immediately
marked as all-visible. That's OK, but the sanity check in vacuum would
throw a warning because it thinks that the page contains dead tuples and
was nevertheless marked as all-visible, even though it just vacuumed away
the dead tuples and so it doesn't actually contain any.

Spotted this while reading the code. It's difficult to hit the race
condition otherwise, but can be done by putting a breakpoint after the
heap_page_prune() call.

Backpatch all the way to 8.4, where this code first appeared.
2013-09-26 11:39:13 +03:00
6c1fec9330 Restore REINDEX constraint validation.
Refactoring as part of commit 8ceb245680
had the unintended effect of making REINDEX TABLE and REINDEX DATABASE
no longer validate constraints enforced by the indexes in question;
REINDEX INDEX still did so.  Indexes marked invalid remained so, and
constraint violations arising from data corruption went undetected.
Back-patch to 9.0, like the causative commit.
2013-07-30 20:00:31 -04:00
4228bde2e0 Only install a portal's ResourceOwner if it actually has one.
In most scenarios a portal without a ResourceOwner is dead and not subject
to any further execution, but a portal for a cursor WITH HOLD remains in
existence with no ResourceOwner after the creating transaction is over.
In this situation, if we attempt to "execute" the portal directly to fetch
data from it, we were setting CurrentResourceOwner to NULL, leading to a
segfault if the datatype output code did anything that required a resource
owner (such as trying to fetch system catalog entries that weren't already
cached).  The case appears to be impossible to provoke with stock libpq,
but psqlODBC at least is able to cause it when working with held cursors.

Simplest fix is to just skip the assignment to CurrentResourceOwner, so
that any resources used by the data output operations will be managed by
the transaction-level resource owner instead.  For consistency I changed
all the places that install a portal's resowner as current, even though
some of them are probably not reachable with a held cursor's portal.

Per report from Joshua Berry (with thanks to Hiroshi Inoue for developing
a self-contained test case).  Back-patch to all supported versions.
2013-06-13 13:11:45 -04:00
2751c7f739 Ensure ANALYZE phase is not skipped because of canceled truncate.
Patch b19e4250b4 attempted to
preserve existing behavior regarding statistics generation in the
case that a truncation attempt was canceled due to lock conflicts.
It failed to do this accurately in two regards: (1) autovacuum had
previously generated statistics if the truncate attempt failed to
initially get the lock rather than having started the attempt, and
(2) the VACUUM ANALYZE command had always generated statistics.

Both of these changes were unintended, and are reverted by this
patch.  On review, there seems to be consensus that the previous
failure to generate statistics when the truncate was terminated
was more an unfortunate consequence of how that effort was
previously terminated than a feature we want to keep; so this
patch generates statistics even when an autovacuum truncation
attempt terminates early.  Another unintended change which is kept
on the basis that it is an improvement is that when a VACUUM
command is truncating, it will use the new heuristic for avoiding
blocking other processes, rather than keeping an
AccessExclusiveLock on the table for however long the truncation
takes.

Per multiple reports, with some renaming per patch by Jeff Janes.

Backpatch to 9.0, where problem was created.
2013-04-29 13:06:49 -05:00
0dcff7560a Avoid deadlock between concurrent CREATE INDEX CONCURRENTLY commands.
There was a high probability of two or more concurrent C.I.C. commands
deadlocking just before completion, because each would wait for the others
to release their reference snapshots.  Fix by releasing the snapshot
before waiting for other snapshots to go away.

Per report from Paul Hinze.  Back-patch to all active branches.
2013-04-25 16:58:19 -04:00
a8d80461a2 Correct tense in log message 2013-02-23 23:33:30 -05:00
ae1525fdcb Fix bogus when-to-deregister-from-listener-array logic.
Since a backend adds itself to the global listener array during
Exec_ListenPreCommit, it's inappropriate for it to remove itself during
Exec_UnlistenCommit or Exec_UnlistenAllCommit --- that leads to failure
when committing a transaction that did UNLISTEN then LISTEN, since we end
up not registered though we should be.  (This leads to missing later
notifications, or to Assert failures in assert-enabled builds.)  Instead
deal with deregistering at the bottom of AtCommit_Notify, when we know the
final state of the listenChannels list.

Also, simplify the representation of registration status by replacing the
transient backendHasExecutedInitialListen flag with an amRegisteredListener
flag.

Per report from Greg Sabino Mullane.  Back-patch to 9.0, where the problem
was introduced during the LISTEN/NOTIFY rewrite.
2013-02-13 12:48:20 -05:00
275a86bed8 Fix typo in freeze_table_age implementation
The original code used freeze_min_age instead of freeze_table_age.  The
main consequence of this mistake is that lowering freeze_min_age would
cause full-table scans to occur much more frequently, which causes
serious issues because the number of writes required is much larger.
That feature (freeze_min_age) is supposed to affect only how soon tuples
are frozen; some pages should still be skipped due to the visibility
map.

Backpatch to 8.4, where the freeze_table_age feature was introduced.

Report and patch from Andres Freund
2013-02-01 12:11:10 -03:00
56d2975c3c Fix performance problems with autovacuum truncation in busy workloads.
In situations where there are over 8MB of empty pages at the end of
a table, the truncation work for trailing empty pages takes longer
than deadlock_timeout, and there is frequent access to the table by
processes other than autovacuum, there was a problem with the
autovacuum worker process being canceled by the deadlock checking
code. The truncation work done by autovacuum up to that point was
lost, and the attempt was retried by a later autovacuum worker. The
attempts could continue indefinitely without making progress,
consuming resources and blocking other processes for up to
deadlock_timeout each time.

This patch has the autovacuum worker checking whether it is
blocking any other process at 20ms intervals. If such a condition
develops, the autovacuum worker will persist the work it has done
so far, release its lock on the table, and sleep in 50ms intervals
for up to 5 seconds, hoping to be able to re-acquire the lock and
try again. If it is unable to get the lock in that time, it moves
on and a worker will try to continue later from the point this one
left off.

While this patch doesn't change the rules about when and what to
truncate, it does cause the truncation to occur sooner, with less
blocking, and with the consumption of fewer resources when there is
contention for the table's lock.

The only user-visible change other than improved performance is
that the table size during truncation may change incrementally
instead of just once.

Backpatched to 9.0 from initial master commit at
b19e4250b4 -- before that the
differences are too large to be clearly safe.

Jan Wieck
2013-01-23 13:40:19 -06:00
92c1f917b8 Protect against SnapshotNow race conditions in pg_tablespace scans.
Use of SnapshotNow is known to expose us to race conditions if the tuple(s)
being sought could be updated by concurrently-committing transactions.
CREATE DATABASE and DROP DATABASE are particularly exposed because they do
heavyweight filesystem operations during their scans of pg_tablespace,
so that the scans run for a very long time compared to most.  Furthermore,
the potential consequences of a missed or twice-visited row are nastier
than average:

* createdb() could fail with a bogus "file already exists" error, or
  silently fail to copy one or more tablespace's worth of files into the
  new database.

* remove_dbtablespaces() could miss one or more tablespaces, thus failing
  to free filesystem space for the dropped database.

* check_db_file_conflict() could likewise miss a tablespace, leading to an
  OID conflict that could result in data loss either immediately or in
  future operations.  (This seems of very low probability, though, since a
  duplicate database OID would be unlikely to start with.)

Hence, it seems worth fixing these three places to use MVCC snapshots, even
though this will someday be superseded by a generic solution to SnapshotNow
race conditions.

Back-patch to all active branches.

Stephen Frost and Tom Lane
2013-01-18 18:06:38 -05:00
c52e0e2afc Avoid holding vmbuffer pin after VACUUM.
During VACUUM, if we pause to perform a cycle of index cleanup we drop the
vmbuffer pin, so we should do the same thing when the heap scan completes.
This avoids holding the vmbuffer pin across the main index cleanup in
VACUUM, which could be minutes or hours longer than necessary for
correctness.

Bug report and suggested fix from Pavan Deolasee
2012-12-03 18:56:41 +00:00
f369815080 Add missing buffer lock acquisition in GetTupleForTrigger().
If we had not been holding buffer pin continuously since the tuple was
initially fetched by the UPDATE or DELETE query, it would be possible for
VACUUM or a page-prune operation to move the tuple while we're trying to
copy it.  This would result in a garbage "old" tuple value being passed to
an AFTER ROW UPDATE or AFTER ROW DELETE trigger.  The preconditions for
this are somewhat improbable, and the timing constraints are very tight;
so it's not so surprising that this hasn't been reported from the field,
even though the bug has been there a long time.

Problem found by Andres Freund.  Back-patch to all active branches.
2012-11-30 13:56:11 -05:00
1dbd02dc37 Fix assorted bugs in CREATE INDEX CONCURRENTLY.
This patch changes CREATE INDEX CONCURRENTLY so that the pg_index
flag changes it makes without exclusive lock on the index are made via
heap_inplace_update() rather than a normal transactional update.  The
latter is not very safe because moving the pg_index tuple could result in
concurrent SnapshotNow scans finding it twice or not at all, thus possibly
resulting in index corruption.

In addition, fix various places in the code that ought to check to make
sure that the indexes they are manipulating are valid and/or ready as
appropriate.  These represent bugs that have existed since 8.2, since
a failed CREATE INDEX CONCURRENTLY could leave a corrupt or invalid
index behind, and we ought not try to do anything that might fail with
such an index.

Also fix RelationReloadIndexInfo to ensure it copies all the pg_index
columns that are allowed to change after initial creation.  Previously we
could have been left with stale values of some fields in an index relcache
entry.  It's not clear whether this actually had any user-visible
consequences, but it's at least a bug waiting to happen.

This is a subset of a patch already applied in 9.2 and HEAD.  Back-patch
into all earlier supported branches.

Tom Lane and Andres Freund
2012-11-29 14:52:07 -05:00
8e344eacdc Fix handling of inherited check constraints in ALTER COLUMN TYPE.
This case got broken in 8.4 by the addition of an error check that
complains if ALTER TABLE ONLY is used on a table that has children.
We do use ONLY for this situation, but it's okay because the necessary
recursion occurs at a higher level.  So we need to have a separate
flag to suppress recursion without making the error check.

Reported and patched by Pavan Deolasee, with some editorial adjustments by
me.  Back-patch to 8.4, since this is a regression of functionality that
worked in earlier branches.
2012-11-05 13:36:31 -05:00
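
A sketch of the regressed case:

```sql
CREATE TABLE parent (a int CHECK (a > 0));
CREATE TABLE child () INHERITS (parent);

-- Re-installing the inherited constraint during the type change uses ONLY
-- at each level, with recursion handled above it; pre-fix the 8.4-era error
-- check tripped over that, post-fix a separate flag suppresses recursion
-- without raising the error:
ALTER TABLE parent ALTER COLUMN a TYPE bigint;
```
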
2d7045db0f Fix longstanding crash-safety bug with newly-created-or-reset sequences.
If a crash occurred immediately after the first nextval() call for a serial
column, WAL replay would restore the sequence to a state in which it
appeared that no nextval() had been done, thus allowing the first sequence
value to be returned again by the next nextval() call; as reported in
bug #6748 from Xiangming Mei.

More generally, the problem would occur if an ALTER SEQUENCE was executed
on a freshly created or reset sequence.  (The manifestation with serial
columns was introduced in 8.2 when we added an ALTER SEQUENCE OWNED BY step
to serial column creation.)  The cause is that sequence creation attempted
to save one WAL entry by writing out a WAL record that made it appear that
the first nextval() had already happened (viz, with is_called = true),
while marking the sequence's in-database state with log_cnt = 1 to show
that the first nextval() need not emit a WAL record.  However, ALTER
SEQUENCE would emit a new WAL entry reflecting the actual in-database state
(with is_called = false).  Then, nextval would allocate the first sequence
value and set is_called = true, but it would trust the log_cnt value and
not emit any WAL record.  A crash at this point would thus restore the
sequence to its post-ALTER state, causing the next nextval() call to return
the first sequence value again.

To fix, get rid of the idea of logging an is_called status different from
reality.  This means that the first nextval-driven WAL record will happen
at the first nextval call not the second, but the marginal cost of that is
pretty negligible.  In addition, make sure that ALTER SEQUENCE resets
log_cnt to zero in any case where it touches sequence parameters that
affect future nextval results.  This will result in some user-visible
changes in the contents of a sequence's log_cnt column, as reflected in the
patch's regression test changes; but no application should be depending on
that anyway, since it was already true that log_cnt changes rather
unpredictably depending on checkpoint timing.

In addition, make some basically-cosmetic improvements to get rid of
sequence.c's undesirable intimacy with page layout details.  It was always
really trying to WAL-log the contents of the sequence tuple, so we should
have it do that directly using a HeapTuple's t_data and t_len, rather than
backing into it with some magic assumptions about where the tuple would be
on the sequence's page.

Back-patch to all supported branches.
2012-07-25 17:40:53 -04:00
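
The scenario in SQL terms (serial columns were affected because their
creation runs ALTER SEQUENCE ... OWNED BY internally):

```sql
CREATE SEQUENCE s;
ALTER SEQUENCE s OWNED BY NONE;  -- any ALTER on the fresh sequence sufficed
SELECT nextval('s');             -- returns 1; pre-fix, emitted no WAL record
                                 -- because it trusted the stored log_cnt
-- (crash and WAL replay here)
SELECT nextval('s');             -- pre-fix: could return 1 a second time
```
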
afc7328457 Prevent CREATE TABLE LIKE/INHERITS from (mis) copying whole-row Vars.
If a CHECK constraint or index definition contained a whole-row Var (that
is, "table.*"), an attempt to copy that definition via CREATE TABLE LIKE or
table inheritance produced incorrect results: the copied Var still claimed
to have the rowtype of the source table, rather than the created table.

For the LIKE case, it seems reasonable to just throw error for this
situation, since the point of LIKE is that the new table is not permanently
coupled to the old, so there's no reason to assume its rowtype will stay
compatible.  In the inheritance case, we should ideally allow such
constraints, but doing so will require nontrivial refactoring of CREATE
TABLE processing (because we'd need to know the OID of the new table's
rowtype before we adjust inherited CHECK constraints).  In view of the lack
of previous complaints, that doesn't seem worth the risk in a back-patched
bug fix, so just make it throw error for the inheritance case as well.

Along the way, replace change_varattnos_of_a_node() with a more robust
function map_variable_attnos(), which is capable of being extended to
handle insertion of ConvertRowtypeExpr whenever we get around to fixing
the inheritance case nicely, and in the meantime it returns a failure
indication to the caller so that a helpful message with some context can be
thrown.  Also, this code will do the right thing with subselects (if we
ever allow them in CHECK or indexes), and it range-checks varattnos before
using them to index into the map array.

Per report from Sergey Konoplev.  Back-patch to all supported branches.
2012-06-30 16:44:09 -04:00
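
A sketch (the constraint form is what matters):

```sql
-- A CHECK constraint containing a whole-row Var ("table.*"):
CREATE TABLE src (a int, b int, CHECK ((src.*) IS NOT NULL));

-- Both copying paths now raise an error, instead of keeping a Var that
-- still claims src's rowtype inside the new table's constraint:
CREATE TABLE dst (LIKE src INCLUDING CONSTRAINTS);
CREATE TABLE child () INHERITS (src);
```
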
38a90526a1 Fix NOTIFY to cope with I/O problems, such as out-of-disk-space.
The LISTEN/NOTIFY subsystem got confused if SimpleLruZeroPage failed,
which would typically happen as a result of a write() failure while
attempting to dump a dirty pg_notify page out of memory.  Subsequently,
all attempts to send more NOTIFY messages would fail with messages like
"Could not read from file "pg_notify/nnnn" at offset nnnnn: Success".
Only restarting the server would clear this condition.  Per reports from
Kevin Grittner and Christoph Berg.

Back-patch to 9.0, where the problem was introduced during the
LISTEN/NOTIFY rewrite.
2012-06-29 00:51:49 -04:00
82992a4cd0 Fix DROP TABLESPACE to unlink symlink when directory is not there.
If the tablespace directory is missing entirely, we allow DROP TABLESPACE
to go through, on the grounds that it should be possible to clean up the
catalog entry in such a situation.  However, we forgot that the pg_tblspc
symlink might still be there.  We should try to remove the symlink too
(but not fail if it's no longer there), since not doing so can lead to
weird behavior subsequently, as per report from Michael Nolan.

There was some discussion of adding dependency links to prevent DROP
TABLESPACE when the catalogs still contain references to the tablespace.
That might be worth doing too, but it's an orthogonal question, and in
any case wouldn't be back-patchable.

Back-patch to 9.0, which is as far back as the logic looks like this.
We could possibly do something similar in 8.x, but given the lack of
reports I'm not sure it's worth the trouble, and anyway the case could
not arise in the form the logic is meant to cover (namely, a post-DROP
transaction rollback having resurrected the pg_tablespace entry after
some or all of the filesystem infrastructure is gone).
2012-05-13 18:07:02 -04:00
70e94d2dd7 Fix COPY FROM for null marker strings that correspond to invalid encoding.
The COPY documentation says "COPY FROM matches the input against the null
string before removing backslashes".  It is therefore reasonable to presume
that null markers like E'\\0' will work ... and they did, until someone put
the tests in the wrong order during microoptimization-driven rewrites.
Since then, we've been failing if the null marker is something that would
de-escape to an invalidly-encoded string.  Since null markers generally
need to be something that can't appear in the data, this represents a
nontrivial loss of functionality; it's surprising nobody noticed it earlier.

Per report from Jeff Davis.  Backpatch to 8.4 where this got broken.
2012-03-25 23:17:32 -04:00
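
A sketch using the WITH syntax of those branches; the marker de-escapes to a
NUL-containing string, which can never occur in valid text data:

```sql
CREATE TABLE t (a text, b text);
COPY t FROM STDIN WITH NULL AS E'\\0';
foo	\0
\.
-- The fields are tab-separated; per the documented order, "\0" is matched
-- against the null string before backslashes are removed, so t.b is NULL.
-- Pre-fix, this failed with an invalid-encoding error.
```
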
de323d534c Require execute permission on the trigger function for CREATE TRIGGER.
This check was overlooked when we added function execute permissions to the
system years ago.  For an ordinary trigger function it's not a big deal,
since trigger functions execute with the permissions of the table owner,
so they couldn't do anything the user issuing the CREATE TRIGGER couldn't
have done anyway.  However, if a trigger function is SECURITY DEFINER,
that is not the case.  The lack of checking would allow another user to
install it on his own table and then invoke it with, essentially, forged
input data; which the trigger function is unlikely to realize, so it might
do something undesirable, for instance insert false entries in an audit log
table.

Reported by Dinesh Kumar, patch by Robert Haas

Security: CVE-2012-0866
2012-02-23 15:39:07 -05:00
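
A sketch of the attack the check blocks (names hypothetical):

```sql
-- As alice: a SECURITY DEFINER trigger function meant for her own tables.
CREATE FUNCTION audit() RETURNS trigger
LANGUAGE plpgsql SECURITY DEFINER AS $$
BEGIN
    INSERT INTO audit_log VALUES (now(), TG_TABLE_NAME);
    RETURN NEW;
END $$;
REVOKE EXECUTE ON FUNCTION audit() FROM PUBLIC;

-- As mallory, on her own table: pre-fix this succeeded despite the REVOKE,
-- letting her drive the privileged function with forged input rows; it now
-- requires EXECUTE permission on audit().
CREATE TRIGGER forge AFTER INSERT ON mallorys_table
    FOR EACH ROW EXECUTE PROCEDURE audit();
```
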
31f26140b3 Remove inappropriate quotes
And adjust wording for consistency.
2012-02-23 12:51:33 +02:00
140766dff6 REASSIGN OWNED: Support foreign data wrappers and servers
This was overlooked when implementing those kinds of objects, in commit
cae565e503.

Per report from Pawel Casperek.
2012-02-22 17:32:23 -03:00
07e36415e5 Avoid throwing ERROR during WAL replay of DROP TABLESPACE.
Although we will not even issue an XLOG_TBLSPC_DROP WAL record unless
removal of the tablespace's directories succeeds, that does not guarantee
that the same operation will succeed during WAL replay.  Foreseeable
reasons for it to fail include temp files created in the tablespace by Hot
Standby backends, wrong directory permissions on a standby server, etc etc.
The original coding threw ERROR if replay failed to remove the directories,
but that is a serious overreaction.  Throwing an error aborts recovery,
and worse means that manual intervention will be needed to get the database
to start again, since otherwise the same error will recur on subsequent
attempts to replay the same WAL record.  And the consequence of failing to
remove the directories is only that some probably-small amount of disk
space is wasted, so it hardly seems justified to throw an error.
Accordingly, arrange to report such failures as LOG messages and keep going
when a failure occurs during replay.

Back-patch to 9.0 where Hot Standby was introduced.  In principle such
problems can occur in earlier releases, but Hot Standby increases the odds
of trouble significantly.  Given the lack of field reports of such issues,
I'm satisfied with patching back as far as the patch applies easily.
2012-02-06 14:44:10 -05:00
2b196f01ef Fix transient clobbering of shared buffers during WAL replay.
RestoreBkpBlocks was in the habit of zeroing and refilling the target
buffer; which was perfectly safe when the code was written, but is unsafe
during Hot Standby operation.  The reason is that we have coding rules
that allow backends to continue accessing a tuple in a heap relation while
holding only a pin on its buffer.  Such a backend could see transiently
zeroed data, if WAL replay had occasion to change other data on the page.
This has been shown to be the cause of bug #6425 from Duncan Rance (who
deserves kudos for developing a sufficiently-reproducible test case) as
well as Bridget Frey's re-report of bug #6200.  It most likely explains the
original report as well, though we don't yet have confirmation of that.

To fix, change the code so that only bytes that are supposed to change will
change, even transiently.  This actually saves cycles in RestoreBkpBlocks,
since it's not writing the same bytes twice.

Also fix seq_redo, which has the same disease, though it has to work a bit
harder to meet the requirement.

So far as I can tell, no other WAL replay routines have this type of bug.
In particular, the index-related replay routines, which would certainly be
broken if they had to meet the same standard, are not at risk because we
do not have coding rules that allow access to an index page when not
holding a buffer lock on it.

Back-patch to 9.0 where Hot Standby was added.
2012-02-05 15:49:46 -05:00
8bff6407ba Accept a non-existent value in "ALTER USER/DATABASE SET ..." command.
When default_text_search_config, default_tablespace, or temp_tablespaces
setting is set per-user or per-database, with an "ALTER USER/DATABASE SET
..." statement, don't throw an error if the text search configuration or
tablespace does not exist. In case of text search configuration, even if
it doesn't exist in the current database, it might exist in another
database, where the setting is intended to have its effect. This behavior
is now the same as search_path's.

Tablespaces are cluster-wide, so the same argument doesn't hold for
tablespaces, but there's a problem with pg_dumpall: it dumps "ALTER USER
SET ..." statements before the "CREATE TABLESPACE" statements. Arguably
that's pg_dumpall's fault - it should dump the statements in such an order
that the tablespace is created first and then the "ALTER USER SET
default_tablespace ..." statements after that - but it seems better to be
consistent with search_path and default_text_search_config anyway. Besides,
you could still create a dump that throws an error, by creating the
tablespace, running "ALTER USER SET default_tablespace", then dropping the
tablespace and running pg_dumpall on that.

Backpatch to all supported versions.
2012-01-30 11:39:26 +02:00
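
For example (names hypothetical):

```sql
-- Accepted now even if the objects don't exist in the current database;
-- the text search configuration might exist where the setting takes effect:
ALTER DATABASE appdb SET default_text_search_config = 'public.swedish';
ALTER USER alice SET default_tablespace = 'ts_created_later';
```
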
e5f97c5f81 Fix CLUSTER/VACUUM FULL for toast values owned by recently-updated rows.
In commit 7b0d0e9356, I made CLUSTER and
VACUUM FULL try to preserve toast value OIDs from the original toast table
to the new one.  However, if we have to copy both live and recently-dead
versions of a row that has a toasted column, those versions may well
reference the same toast value with the same OID.  The patch then led to
duplicate-key failures as we tried to insert the toast value twice with the
same OID.  (The previous behavior was not very desirable either, since it
would have silently inserted the same value twice with different OIDs.
That wastes space, but what's worse is that the toast values inserted for
already-dead heap rows would not be reclaimed by subsequent ordinary
VACUUMs, since they go into the new toast table marked live not deleted.)

To fix, check if the copied OID already exists in the new toast table, and
if so, assume that it stores the desired value.  This is reasonably safe
since the only case where we will copy an OID from a previous toast pointer
is when toast_insert_or_update was given that toast pointer and so we just
pulled the data from the old table; if we got two different values that way
then we have big problems anyway.  We do have to assume that no other
backend is inserting items into the new toast table concurrently, but
that's surely safe for CLUSTER and VACUUM FULL.

Per bug #6393 from Maxim Boguk.  Back-patch to 9.0, same as the previous
patch.
2012-01-12 16:40:24 -05:00
7443ab2b34 Update per-column ACLs, not only per-table ACL, when changing table owner.
We forgot to modify column ACLs, so privileges were still shown as having
been granted by the old owner.  This meant that neither the new owner nor
a superuser could revoke the now-untraceable-to-table-owner permissions.
Per bug #6350 from Marc Balmer.

This has been wrong since column ACLs were added, so back-patch to 8.4.
2011-12-21 18:23:24 -05:00
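
A sketch (names hypothetical):

```sql
-- Issued while old_owner owns the table; the grantor is recorded in the
-- column ACL:
GRANT SELECT (salary) ON emp TO clerk;
ALTER TABLE emp OWNER TO new_owner;

-- Pre-fix, the column ACL still named old_owner as grantor, so neither
-- new_owner nor a superuser could revoke it; post-fix this works:
REVOKE SELECT (salary) ON emp FROM clerk;
```
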
68b0997017 Change FK trigger creation order to better support self-referential FKs.
When a foreign-key constraint references another column of the same table,
row updates will queue both the PK's ON UPDATE action and the FK's CHECK
action in the same event.  The ON UPDATE action must execute first, else
the CHECK will check a non-final state of the row and possibly throw an
inappropriate error, as seen in bug #6268 from Roman Lytovchenko.

Now, the firing order of multiple triggers for the same event is determined
by the sort order of their pg_trigger.tgnames, and the auto-generated names
we use for FK triggers are "RI_ConstraintTrigger_NNNN" where NNNN is the
trigger OID.  So most of the time the firing order is the same as creation
order, and so rearranging the creation order fixes it.

This patch will fail to fix the problem if the OID counter wraps around or
adds a decimal digit (eg, from 99999 to 100000) while we are creating the
triggers for an FK constraint.  Given the small odds of that, and the low
usage of self-referential FKs, we'll live with that solution in the back
branches.  A better fix is to change the auto-generated names for FK
triggers, but it seems unwise to do that in stable branches because there
may be client code that depends on the naming convention.  We'll fix it
that way in HEAD in a separate patch.

Back-patch to all supported branches, since this bug has existed for a long
time.
2011-10-26 13:02:40 -04:00
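
A sketch of the ordering problem:

```sql
-- A self-referential FK; the PK's ON UPDATE CASCADE action must fire before
-- the FK's CHECK sees the row:
CREATE TABLE node (
    id     int PRIMARY KEY,
    parent int REFERENCES node (id) ON UPDATE CASCADE
);
INSERT INTO node VALUES (1, 1);

-- Pre-fix, the CHECK trigger could run first, see parent = 1 pointing at a
-- no-longer-existing key, and raise a spurious FK violation (bug #6268):
UPDATE node SET id = 2 WHERE id = 1;
```
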
ad1e8274eb Avoid possibly accessing off the end of memory in examine_attribute().
Since the last couple of columns of pg_type are often NULL,
sizeof(FormData_pg_type) can be an overestimate of the actual size of the
tuple data part.  Therefore memcpy'ing that much out of the catalog cache,
as analyze.c was doing, poses a small risk of copying past the end of
memory and incurring SIGSEGV.  No such crash has been identified in the
field, but we've certainly seen the equivalent happen in other code paths,
so patch this one all the way back.

Per valgrind testing by Noah Misch, though this is not his proposed patch.
I chose to use SearchSysCacheCopy1 rather than inventing special-purpose
infrastructure for copying only the minimal part of a pg_type tuple.
2011-09-06 14:37:48 -04:00
047f205f4e Fix a missed case in code for "moving average" estimate of reltuples.
It is possible for VACUUM to scan no pages at all, if the visibility map
shows that all pages are all-visible.  In this situation VACUUM has no new
information to report about the relation's tuple density, so it wasn't
changing pg_class.reltuples ... but it updated pg_class.relpages anyway.
That's wrong in general, since there is no evidence to justify changing the
density ratio reltuples/relpages, but it's particularly bad if the previous
state was relpages=reltuples=0, which means "unknown tuple density".
We just replaced "unknown" with "zero".  ANALYZE would eventually recover
from this, but it could take a lot of repetitions of ANALYZE to do so if
the relation size is much larger than the maximum number of pages ANALYZE
will scan, because of the moving-average behavior introduced by commit
b4b6923e03.

The only known situation where we could have relpages=reltuples=0 and yet
the visibility map asserts everything's visible is immediately following
a pg_upgrade.  It might be advisable for pg_upgrade to try to preserve the
relpages/reltuples statistics; but in any case this code is wrong on its
own terms, so fix it.  Per report from Sergey Koposov.

Back-patch to 8.4, where the visibility map was introduced, same as the
previous change.
2011-08-30 14:49:57 -04:00
52120ee834 Fix trigger WHEN conditions when both BEFORE and AFTER triggers exist.
Due to tuple-slot mismanagement, evaluation of WHEN conditions for AFTER
ROW UPDATE triggers could crash if there had been a BEFORE ROW trigger
fired for the same update.  Fix by not trying to overload the use of
estate->es_trig_tuple_slot.  Per report from Yoran Heling.

Back-patch to 9.0, when trigger WHEN conditions were introduced.
2011-08-21 18:16:08 -04:00
44b6d53b46 Preserve toast value OIDs in toast-swap-by-content for CLUSTER/VACUUM FULL.
This works around the problem that a catalog cache entry might contain a
toast pointer that we try to dereference just as a VACUUM FULL completes
on that catalog.  We will see the sinval message on the cache entry when
we acquire lock on the toast table, but by that point we've already told
tuptoaster.c "here's the pointer to fetch", so it's difficult from a code
structural standpoint to update the pointer before we use it.  Much less
painful to ensure that toast pointers are not invalidated in the first
place.  We have to add a bit of code to deal with the case that a value
that previously wasn't toasted becomes so; but that should be a
seldom-exercised corner case, so the inefficiency shouldn't be significant.

Back-patch to 9.0.  In prior versions, we didn't allow CLUSTER on system
catalogs, and VACUUM FULL didn't result in reassignment of toast OIDs, so
there was no problem.
2011-08-16 13:48:16 -04:00
5707f35559 Fix unsafe order of operations in foreign-table DDL commands.
When updating or deleting a system catalog tuple, it's necessary to acquire
RowExclusiveLock on the catalog before looking up the tuple; otherwise a
concurrent VACUUM FULL on the catalog might move the tuple to a different
TID before we can apply the update.  Coding patterns that find the tuple
via a table scan aren't at risk here, but when obtaining the tuple from a
catalog cache, correct ordering is important; and several routines in
foreigncmds.c got it wrong.  Noted while running the regression tests in
parallel with VACUUM FULL of assorted system catalogs.

For consistency I moved all the heap_open calls to the starts of their
functions, including a couple for which there was no actual bug.

Back-patch to 8.4 where foreigncmds.c was added.
2011-08-14 15:40:36 -04:00
789d3d4541 Fix EXPLAIN to handle gating Result nodes within inner-indexscan subplans.
It is possible for a NestLoop plan node to pass an OUTER Var into an
"inner indexscan" that is an Append construct (derived from an inheritance
tree or UNION ALL subquery).  The OUTER tuple is then passed down at
runtime to the leaf indexscan node(s) where it will actually be used.
EXPLAIN has to likewise pass the information about the nestloop's outer
subplan down through the Append node, else it will fail to print the
outer-reference Vars (with complaints like "bogus varno: 65001").

However, there was a case missed in all this: we could also have gating
Result nodes that were inserted into the appendrel plan tree to deal with
pseudoconstant qual conditions.  So EXPLAIN has to pass down the outer plan
node to a Result's subplan, too.  Per example from Jon Nelson.

The problem is gone in 9.1 because we replaced the nestloop outer-tuple
kluge with a Param-based data transfer mechanism.  Also, so far as I can
tell, the case can't happen before 8.4 because of restrictions on what
sorts of appendrel members could be pulled up into the parent query.
So this patch is only needed for 8.4 and 9.0.
2011-07-03 01:35:15 -04:00
ccdce73b21 Fix thinko in previous patch to always update pg_class.reltuples/relpages.
I mis-simplified the test where ANALYZE decided if it could get away
without doing anything: under the new regime, that's never allowed.  Per
bug #6068 from Jeff Janes.  Back-patch to 8.4, just like previous patch.
2011-06-19 14:01:01 -04:00
bc0550f994 Clean up after erroneous SELECT FOR UPDATE/SHARE on a sequence.
My previous commit disallowed this operation, but did nothing about
cleaning up the damage if one had already been done.  With the operation
disallowed, it's okay to just forcibly clear xmax in a sequence's tuple,
since any value seen there could not represent a live transaction's lock.
So, any sequence-specific operation will repair the problem automatically,
whether or not the user has already seen "could not access status of
transaction" failures.
2011-06-02 15:31:02 -04:00
73bd34c81e Fix VACUUM so that it always updates pg_class.reltuples/relpages.
When we added the ability for vacuum to skip heap pages by consulting the
visibility map, we made it just not update the reltuples/relpages
statistics if it skipped any pages.  But this could leave us with extremely
out-of-date stats for a table that contains any unchanging areas,
especially for TOAST tables which never get processed by ANALYZE.  In
particular this could result in autovacuum making poor decisions about when
to process the table, as in recent report from Florian Helmberger.  And in
general it's a bad idea to not update the stats at all.  Instead, use the
previous values of reltuples/relpages as an estimate of the tuple density
in unvisited pages.  This approach results in a "moving average" estimate
of reltuples, which should converge to the correct value over multiple
VACUUM and ANALYZE cycles even when individual measurements aren't very
good.

This new method for updating reltuples is used by both VACUUM and ANALYZE,
with the result that we no longer need the grotty interconnections that
caused ANALYZE to not update the stats depending on what had happened
in the parent VACUUM command.

Also, fix the logic for skipping all-visible pages during VACUUM so that it
looks ahead rather than behind to decide what to do, as per a suggestion
from Greg Stark.  This eliminates useless scanning of all-visible pages at
the start of the relation or just after a not-all-visible page.  In
particular, the first few pages of the relation will not be invariably
included in the scanned pages, which seems to help in not overweighting
them in the reltuples estimate.

Back-patch to 8.4, where the visibility map was introduced.
2011-05-30 17:07:07 -04:00
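
The maintained estimate is visible in pg_class (table name hypothetical):

```sql
VACUUM mytable;  -- may skip all-visible pages via the visibility map
SELECT relpages, reltuples,
       reltuples / GREATEST(relpages, 1) AS est_tuple_density
FROM pg_class
WHERE relname = 'mytable';
-- reltuples is now a moving-average estimate refreshed on every VACUUM or
-- ANALYZE, instead of going stale whenever pages are skipped.
```
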
722548e430 Preserve caller's memory context in ProcessCompletedNotifies().
This is necessary to avoid long-term memory leakage, because the main loop
in PostgresMain expects to be executing in MessageContext, and hence is a
bit sloppy about freeing stuff that is only needed for the duration of
processing the current client message.  The known case of an actual leak
is when encoding conversion has to be done on the incoming command string,
but there might be others.  Per report from Per-Olov Esgard.

Back-patch to 9.0, where the bug was introduced by the LISTEN/NOTIFY
rewrite.
2011-05-27 12:10:58 -04:00