an allegedly immutable index function. It was previously recognized that
we had to prevent such a function from executing SET/RESET ROLE/SESSION
AUTHORIZATION, or it could trivially obtain the privileges of the session
user. However, since there is in general no privilege checking for changes
of session-local state, it is also possible for such a function to change
settings in a way that might subvert later operations in the same session.
Examples include changing search_path to cause an unexpected function to
be called, or replacing an existing prepared statement with another one
that will execute a function of the attacker's choosing.
The present patch secures VACUUM, ANALYZE, and CREATE INDEX/REINDEX against
these threats, which are the same places previously deemed to need protection
against the SET ROLE issue. GUC changes are still allowed, since there are
many useful cases for that, but we prevent security problems by forcing a
rollback of any GUC change after completing the operation. Other cases are
handled by throwing an error if any change is attempted; these include temp
table creation, closing a cursor, and creating or deleting a prepared
statement. (In 7.4, the infrastructure to roll back GUC changes doesn't
exist, so we settle for rejecting changes of "search_path" in these contexts.)
Original report and patch by Gurjeet Singh, additional analysis by
Tom Lane.
Security: CVE-2009-4136
current transaction has any open references to the target relation or index
(implying it has an active query using the relation). Also back-patch the
8.2 fix that prohibits TRUNCATE and CLUSTER when there are pending
AFTER-trigger events. Per suggestion from Heikki.
varoattno along with varattno. This resulted in having Vars that were not
seen as equal(), causing inheritance of the "same" constraint from different
parent relations to fail. An example is
create table pp1 (f1 int check (f1>0));
create table cc1 (f2 text, f3 int) inherits (pp1);
create table cc2(f4 float) inherits(pp1,cc1);
Backpatch as far as 7.4. (The test case still fails in 7.4, for reasons
that I don't feel like investigating at the moment.)
This is a backpatch commit only. The fix will be applied in HEAD as part
of the upcoming pg_constraint patch.
checked to see if it's been initialized to all non-nulls. The implicit NOT
NULL constraint was not being checked during the ALTER (in fact, not even if
there was an explicit NOT NULL too), because ATExecAddColumn neglected to
set the flag needed to make the test happen. This has been broken since
the capability was first added, in 8.0.
Brendan Jurd, per a report from Kaloyan Iliev.
made query plan. Use of ALTER COLUMN TYPE creates a hazard for cached
query plans: they could contain Vars that claim a column has a different
type than it now has. Fix this by checking during plan startup that Vars
at relation scan level match the current relation tuple descriptor. Since
at that point we already have at least AccessShareLock, we can be sure the
column type will not change underneath us later in the query. However,
since a backend's locks do not conflict against itself, there is still a
hole for an attacker to exploit: he could try to execute ALTER COLUMN TYPE
while a query is in progress in the current backend. Seal that hole by
rejecting ALTER TABLE whenever the target relation is already open in
the current backend.
This is a significant security hole: not only can one trivially crash the
backend, but with appropriate misuse of pass-by-reference datatypes it is
possible to read out arbitrary locations in the server process's memory,
which could allow retrieving database content the user should not be able
to see. Our thanks to Jeff Trout for the initial report.
Security: CVE-2007-0556
a table. Otherwise a USING clause that yields NULL can leave the table
violating its constraint (possibly there are other cases too). Per report
from Alexander Pravking.
overly strong lock on pg_depend, and it wasn't closing the rel when done.
The latter bug was masked by the ResourceOwner code, which is something
that should be changed.
column with a default expression. In that situation, we need to rewrite
the heap relation. To evaluate the new default expression, we use
ExecEvalExpr(); however, this can allocate memory in the current memory
context, and ATRewriteTable() does not switch out of the active portal's
heap memory context. The end result is a rather large memory leak (on
the order of gigabytes for a reasonably sized table).
This patch changes ATRewriteTable() to switch to the per-tuple memory
context before beginning the per-tuple loop. It also removes an explicit
heap_freetuple() in the loop, since that is no longer needed.
In an unrelated change, I noticed the code was scanning through the
attributes of the new tuple descriptor for each tuple of the old table.
I changed this to use precomputation, which should slightly speed up
the loop.
Thanks to steve@deefs.net for reporting the leak.
is the minimum required fix. I want to look next at taking advantage of
it by simplifying the message semantics in the shared inval message queue,
but that part can be held over for 8.1 if it turns out too ugly.
Also performed an initial run through of upgrading our Copyright date to
extend to 2005 ... first run here was very simple ... change everything
where: grep 1996-2004 && the word 'Copyright' ... scanned through the
generated list with 'less' first, and after, to make sure that I only
picked up the right entries ...
more than 65K columns, or when the created table has more than 65K columns
due to adding inherited columns from parent relations. Fix a similar
crash when processing SELECT queries with more than 65K target list
entries. In all three cases we would eventually detect the error and
elog, but the check was being made too late.
clause implicitly whenever one is not given explicitly. Remove concept
of a schema having an associated tablespace, and simplify the rules for
selecting a default tablespace for a table or index. It's now just
(a) explicit TABLESPACE clause; (b) default_tablespace if that's not an
empty string; (c) database's default. This will allow pg_dump to use
SET commands instead of tablespace clauses to determine object locations
(but I didn't actually make it do so). All per recent discussions.
of HeapTupleSatisfiesItself() to trigger a hint-bit update on the tuple:
if the row was updated or deleted by a subtransaction of my own transaction
that was later rolled back. This cannot occur in pre-8.0 of course, so
the hint-bit patch applied a couple weeks ago is OK for existing releases.
But for 8.0 it seems we had better fix things so that RI_FKey_check can
pass the correct buffer number to HeapTupleSatisfiesItself. Accordingly,
add fields to the TriggerData struct to carry the buffer ID(s) for the
old and new tuple(s). There are other possible solutions but this one
seems cleanest; it will allow other AFTER-trigger functions to safely
do tqual.c calls if they want to. Put new fields at end of struct so
that there is no API breakage.
at the top level of the column's old default expression before adding
an implicit coercion to the new column type. This seems to satisfy the
principle of least surprise, as per discussion of bug #1290.
NO ACTION check is deferrable. This seems to be a closer approximation
to what the SQL spec says than what we were doing before, and it prevents
some anomalous behaviors that are possible now that triggers can fire
during the execution of PL functions.
Stephan Szabo.
as per recent discussions. Invent SubTransactionIds that are managed like
CommandIds (ie, counter is reset at start of each top transaction), and
use these instead of TransactionIds to keep track of subtransaction status
in those modules that need it. This means that a subtransaction does not
need an XID unless it actually inserts/modifies rows in the database.
Accordingly, don't assign it an XID nor take a lock on the XID until it
tries to do that. This saves a lot of overhead for subtransactions that
are only used for error recovery (eg plpgsql exceptions). Also, arrange
to release a subtransaction's XID lock as soon as the subtransaction
exits, in both the commit and abort cases. This avoids holding many
unique locks after a long series of subtransactions. The price is some
additional overhead in XactLockTableWait, but that seems acceptable.
Finally, restructure the state machine in xact.c to have a more orthogonal
set of states for subtransactions.
so that we close and flush the doomed relation's relcache entry before
we start to delete the underlying catalog rows, rather than afterwards.
For awhile yesterday I thought that an unexpected relcache entry rebuild
partway through this sequence might explain the infrequent parallel
regression failures we were chasing. It doesn't, mainly because there's
no CommandCounterIncrement in the sequence and so the deletions aren't
"really" done yet. But it sure seems like trouble waiting to happen.
of XLogInsert had the same sort of checkpoint interlock problem as
RecordTransactionCommit, and indeed I found some. Btree index build
and ALTER TABLE SET TABLESPACE write data outside the friendly confines
of the buffer manager, and therefore they have to take their own
responsibility for checkpoint interlock. The easiest solution seems to
be to force smgrimmedsync at the end of the index build or table copy,
even when the operation is being WAL-logged. This is sufficient since
the new index or table will be of interest to no one if we don't get
as far as committing the current transaction.
don't hold an open file reference to the original table at the end.
This is a good thing in any case, particularly so on Windows which
cannot drop the table file otherwise.
dependency was looking at wrong columns and so would always fail.
Was not exposed by regression tests because we are only testing cases
involving built-in (pinned) types and so no actual dependency entry
exists to be removed.
to the old owner with the new owner. This is not necessarily right, but
it's sure a lot more likely to be what the user wants than doing nothing.
Christopher Kings-Lynne, some rework by Tom Lane.
recovery more manageable. Also, undo recent change to add FILE_HEADER
and WASTED_SPACE records to XLOG; instead make the XLOG page header
variable-size with extra fields in the first page of an XLOG file.
This should fix the boundary-case bugs observed by Mark Kirkwood.
initdb forced due to change of XLOG representation.
force relcache rebuild for the other table as well as the column's
own table. Otherwise, already-cached foreign key triggers will stop
working. Per example from Alexander Pravking.
performance front, but with feature freeze upon us I think it's time to
drive a stake in the ground and say that this will be in 7.5.
Alvaro Herrera, with some help from Tom Lane.
aggregates, conversions, functions, operators, operator classes,
schemas, types, and tablespaces. Fold the existing implementations
of alter domain owner and alter database owner in with these.
Christopher Kings-Lynne
There are various things left to do: contrib dbsize and oid2name modules
need work, and so does the documentation. Also someone should think about
COMMENT ON TABLESPACE and maybe RENAME TABLESPACE. Also initlocation is
dead, it just doesn't know it yet.
Gavin Sherry and Tom Lane.
ALTER TABLE tab ADD COLUMN col SERIAL, but we forgot to install the
dependency between the column and the sequence, so the sequence
would not go away if you dropped the table later.
sequences, as per recent discussion. All these names are now of the
form table_column_type, with digits added if needed to make them unique.
Default constraint names are chosen to be unique across their whole schema,
not just within the parent object, so as to be more SQL-spec-compatible
and make the information schema views more useful.
Instead of prohibiting that, put code into ALTER TABLE to reject ALTERs
that would affect other tables' columns. Eventually we will probably
want to extend ALTER TABLE to actually do something useful here, but
in the meantime it seems wrong to forbid the feature completely just
because ALTER isn't fully baked.
loop over the fields instead of a loop around heap_getattr. This is
considerably faster (O(N) instead of O(N^2)) when there are nulls or
varlena fields, since those prevent use of attcacheoff. Replace loops
over heap_getattr with heap_deformtuple in situations where all or most
of the fields have to be fetched, such as printtup and tuptoaster.
Profiling done more than a year ago shows that this should be a nice
win for situations involving many-column tables.
In the past, we used a 'Lispy' linked list implementation: a "list" was
merely a pointer to the head node of the list. The problem with that
design is that it makes lappend() and length() linear time. This patch
fixes that problem (and others) by maintaining a count of the list
length and a pointer to the tail node along with each head node pointer.
A "list" is now a pointer to a structure containing some meta-data
about the list; the head and tail pointers in that structure refer
to ListCell structures that maintain the actual linked list of nodes.
The function names of the list API have also been changed to, I hope,
be more logically consistent. By default, the old function names are
still available; they will be disabled-by-default once the rest of
the tree has been updated to use the new API names.
in favor of using the REINDEX TABLE apparatus, which does the same thing
simpler and faster. Also, make TRUNCATE not use cluster.c at all, but
just assign a new relfilenode and REINDEX. This partially addresses
Hartmut Raschick's complaint from last December that 7.4's TRUNCATE is
an order of magnitude slower than prior releases. By getting rid of
a lot of unnecessary catalog updates, these changes buy back about a
factor of two (on my system). The remaining overhead seems associated
with creating and deleting storage files, which we may not be able to
do much about without abandoning transaction safety for TRUNCATE.
conversion of basic ASCII letters. Remove all uses of strcasecmp and
strncasecmp in favor of new functions pg_strcasecmp and pg_strncasecmp;
remove most but not all direct uses of toupper and tolower in favor of
pg_toupper and pg_tolower. These functions use the same notions of
case folding already developed for identifier case conversion. I left
the straight locale-based folding in place for situations where we are
just manipulating user data and not trying to match it to built-in
strings --- for example, the SQL upper() function is still locale
dependent. Perhaps this will prove not to be what's wanted, but at
the moment we can initdb and pass regression tests in Turkish locale.