1
0
mirror of https://github.com/postgres/postgres.git synced 2025-08-30 06:01:21 +03:00
Commit Graph

350 Commits

Author SHA1 Message Date
Peter Eisentraut
de973f6e75 Fix new warnings from GCC 7
This addresses the new warning types -Wformat-truncation
-Wformat-overflow that are part of -Wall, via -Wformat, in GCC 7.
2017-05-15 13:31:35 -04:00
Tom Lane
bcd7b47c28 Avoid returning stale attribute bitmaps in RelationGetIndexAttrBitmap().
The problem with the original coding here is that we might receive (and
clear) a relcache invalidation signal for the target relation down inside
one of the index_open calls we're doing.  Since the target is open, we
would not drop the relcache entry, just reset its rd_indexvalid and
rd_indexlist fields.  But RelationGetIndexAttrBitmap() kept going, and
would eventually cache and return potentially-obsolete attribute bitmaps.

The case where this matters is where the inval signal was from a CREATE
INDEX CONCURRENTLY telling us about a new index on a formerly-unindexed
column.  (In all other cases, the lock we hold on the target rel should
prevent any concurrent change in index state.)  Even just returning the
stale attribute bitmap is not such a problem, because it shouldn't matter
during the transaction in which we receive the signal.  What hurts is
caching the stale data, because it can survive into later transactions,
breaking CREATE INDEX CONCURRENTLY's expectation that later transactions
will not create new broken HOT chains.  The upshot is that there's a window
for building corrupted indexes during CREATE INDEX CONCURRENTLY.

This patch fixes the problem by rechecking that the set of index OIDs
is still the same at the end of RelationGetIndexAttrBitmap() as it was
at the start.  If not, we loop back and try again.  That's a little
more than is strictly necessary to fix the bug --- in principle, we
could return the stale data but not cache it --- but it seems like a
bad idea on general principles for relcache to return data it knows
is stale.

There might be more hazards of the same ilk, or there might be a better
way to fix this one, but this patch definitely improves matters and seems
unlikely to make anything worse.  So let's push it into today's releases
even as we continue to study the problem.

Pavan Deolasee and myself

Discussion: https://postgr.es/m/CABOikdM2MUq9cyZJi1KyLmmkCereyGp5JQ4fuwKoyKEde_mzkQ@mail.gmail.com
2017-02-06 13:20:25 -05:00
Tom Lane
39ebb64669 Fix subtransaction cleanup after an outer-subtransaction portal fails.
Formerly, we treated only portals created in the current subtransaction as
having failed during subtransaction abort.  However, if the error occurred
while running a portal created in an outer subtransaction (ie, a cursor
declared before the last savepoint), that has to be considered broken too.

To allow reliable detection of which ones those are, add a bookkeeping
field to struct Portal that tracks the innermost subtransaction in which
each portal has actually been executed.  (Without this, we'd end up
failing portals containing functions that had called the subtransaction,
thereby breaking plpgsql exception blocks completely.)

In addition, when we fail an outer-subtransaction Portal, transfer its
resources into the subtransaction's resource owner, so that they're
released early in cleanup of the subxact.  This fixes a problem reported by
Jim Nasby in which a function executed in an outer-subtransaction cursor
could cause an Assert failure or crash by referencing a relation created
within the inner subtransaction.

The proximate cause of the Assert failure is that AtEOSubXact_RelationCache
assumed it could blow away a relcache entry without first checking that the
entry had zero refcount.  That was a bad idea on its own terms, so add such
a check there, and to the similar coding in AtEOXact_RelationCache.  This
provides an independent safety measure in case there are still ways to
provoke the situation despite the Portal-level changes.

This has been broken since subtransactions were invented, so back-patch
to all supported branches.

Tom Lane and Michael Paquier
2015-09-04 13:36:50 -04:00
Tom Lane
88fab18a4c Fix the logic for putting relations into the relcache init file.
Commit f3b5565dd4 was a couple of bricks shy
of a load; specifically, it missed putting pg_trigger_tgrelid_tgname_index
into the relcache init file, because that index is not used by any
syscache.  However, we have historically nailed that index into cache for
performance reasons.  The upshot was that load_relcache_init_file always
decided that the init file was busted and silently ignored it, resulting
in a significant hit to backend startup speed.

To fix, reinstantiate RelationIdIsInInitFile() as a wrapper around
RelationSupportsSysCache(), which can know about additional relations
that should be in the init file despite being unknown to syscache.c.

Also install some guards against future mistakes of this type: make
write_relcache_init_file Assert that all nailed relations get written to
the init file, and make load_relcache_init_file emit a WARNING if it takes
the "wrong number of nailed relations" exit path.  Now that we remove the
init files during postmaster startup, that case should never occur in the
field, even if we are starting a minor-version update that added or removed
rels from the nailed set.  So the warning shouldn't ever be seen by end
users, but it will show up in the regression tests if somebody breaks this
logic.

Back-patch to all supported branches, like the previous commit.
2015-06-25 14:39:05 -04:00
Tom Lane
3e69a73b98 Use a safer method for determining whether relcache init file is stale.
When we invalidate the relcache entry for a system catalog or index, we
must also delete the relcache "init file" if the init file contains a copy
of that rel's entry.  The old way of doing this relied on a specially
maintained list of the OIDs of relations present in the init file: we made
the list either when reading the file in, or when writing the file out.
The problem is that when writing the file out, we included only rels
present in our local relcache, which might have already suffered some
deletions due to relcache inval events.  In such cases we correctly decided
not to overwrite the real init file with incomplete data --- but we still
used the incomplete initFileRelationIds list for the rest of the current
session.  This could result in wrong decisions about whether the session's
own actions require deletion of the init file, potentially allowing an init
file created by some other concurrent session to be left around even though
it's been made stale.

Since we don't support changing the schema of a system catalog at runtime,
the only likely scenario in which this would cause a problem in the field
involves a "vacuum full" on a catalog concurrently with other activity, and
even then it's far from easy to provoke.  Remarkably, this has been broken
since 2002 (in commit 7863404417), but we had
never seen a reproducible test case until recently.  If it did happen in
the field, the symptoms would probably involve unexpected "cache lookup
failed" errors to begin with, then "could not open file" failures after the
next checkpoint, as all accesses to the affected catalog stopped working.
Recovery would require manually removing the stale "pg_internal.init" file.

To fix, get rid of the initFileRelationIds list, and instead consult
syscache.c's list of relations used in catalog caches to decide whether a
relation is included in the init file.  This should be a tad more efficient
anyway, since we're replacing linear search of a list with ~100 entries
with a binary search.  It's a bit ugly that the init file contents are now
so directly tied to the catalog caches, but in practice that won't make
much difference.

Back-patch to all supported branches.
2015-06-07 15:32:09 -04:00
Andres Freund
6cbadda25a Improve relcache invalidation handling of currently invisible relations.
The corner case where a relcache invalidation tried to rebuild the
entry for a referenced relation but couldn't find it in the catalog
wasn't correct.

The code tried to RelationCacheDelete/RelationDestroyRelation the
entry. That didn't work when assertions are enabled because the latter
contains an assertion ensuring the refcount is zero. It's also more
generally a bad idea, because by virtue of being referenced somebody
might actually look at the entry, which is possible if the error is
trapped and handled via a subtransaction abort.

Instead just error out, without deleting the entry. As the entry is
marked invalid, the worst that can happen is that the invalid (and at
some point unused) entry lingers in the relcache.

Discussion: 22459.1418656530@sss.pgh.pa.us

There should be no way to hit this case < 9.4 where logical decoding
introduced a bug that can hit this. But since the code for handling
the corner case is there it should do something halfway sane, so
backpatch all the the way back.  The logical decoding bug will be
handled in a separate commit.
2015-01-07 00:25:17 +01:00
Bruce Momjian
0b44914c21 Remove tabs after spaces in C comments
This was not changed in HEAD, but will be done later as part of a
pgindent run.  Future pgindent runs will also do this.

Report by Tom Lane

Backpatch through all supported branches, but not HEAD
2014-05-06 11:26:27 -04:00
Tom Lane
fe2ef429a1 Fix failure to ignore leftover temp tables after a server crash.
During crash recovery, we remove disk files belonging to temporary tables,
but the system catalog entries for such tables are intentionally not
cleaned up right away.  Instead, the first backend that uses a temp schema
is expected to clean out any leftover objects therein.  This approach
requires that we be careful to ignore leftover temp tables (since any
actual access attempt would fail), *even if their BackendId matches our
session*, if we have not yet established use of the session's corresponding
temp schema.  That worked fine in the past, but was broken by commit
debcec7dc3 which incorrectly removed the
rd_islocaltemp relcache flag.  Put it back, and undo various changes
that substituted tests like "rel->rd_backend == MyBackendId" for use
of a state-aware flag.  Per trouble report from Heikki Linnakangas.

Back-patch to 9.1 where the erroneous change was made.  In the back
branches, be careful to add rd_islocaltemp in a spot in the struct that
was alignment padding before, so as not to break existing add-on code.
2012-12-17 20:15:39 -05:00
Tom Lane
94c014b532 Fix assorted bugs in CREATE/DROP INDEX CONCURRENTLY.
Commit 8cb53654db, which introduced DROP
INDEX CONCURRENTLY, managed to break CREATE INDEX CONCURRENTLY via a poor
choice of catalog state representation.  The pg_index state for an index
that's reached the final pre-drop stage was the same as the state for an
index just created by CREATE INDEX CONCURRENTLY.  This meant that the
(necessary) change to make RelationGetIndexList ignore about-to-die indexes
also made it ignore freshly-created indexes; which is catastrophic because
the latter do need to be considered in HOT-safety decisions.  Failure to
do so leads to incorrect index entries and subsequently wrong results from
queries depending on the concurrently-created index.

To fix, make the final state be indisvalid = true and indisready = false,
which is otherwise nonsensical.  This is pretty ugly but we can't add
another column without forcing initdb, and it's too late for that in 9.2.
(There's a cleaner fix in HEAD.)

In addition, change CREATE/DROP INDEX CONCURRENTLY so that the pg_index
flag changes they make without exclusive lock on the index are made via
heap_inplace_update() rather than a normal transactional update.  The
latter is not very safe because moving the pg_index tuple could result in
concurrent SnapshotNow scans finding it twice or not at all, thus possibly
resulting in index corruption.  This is a pre-existing bug in CREATE INDEX
CONCURRENTLY, which was copied into the DROP code.

In addition, fix various places in the code that ought to check to make
sure that the indexes they are manipulating are valid and/or ready as
appropriate.  These represent bugs that have existed since 8.2, since
a failed CREATE INDEX CONCURRENTLY could leave a corrupt or invalid
index behind, and we ought not try to do anything that might fail with
such an index.

Also fix RelationReloadIndexInfo to ensure it copies all the pg_index
columns that are allowed to change after initial creation.  Previously we
could have been left with stale values of some fields in an index relcache
entry.  It's not clear whether this actually had any user-visible
consequences, but it's at least a bug waiting to happen.

In addition, do some code and docs review for DROP INDEX CONCURRENTLY;
some cosmetic code cleanup but mostly addition and revision of comments.

Portions of this need to be back-patched even further, but I'll work
on that separately.

Problem reported by Amit Kapila, diagnosis by Pavan Deolasee,
fix by Tom Lane and Andres Freund.
2012-11-29 10:37:13 -05:00
Tom Lane
0fbd44387d Make equal() ignore CoercionForm fields for better planning with casts.
This change ensures that the planner will see implicit and explicit casts
as equivalent for all purposes, except in the minority of cases where
there's actually a semantic difference (as reflected by having a 3-argument
cast function).  In particular, this fixes cases where the EquivalenceClass
machinery failed to consider two references to a varchar column as
equivalent if one was implicitly cast to text but the other was explicitly
cast to text, as seen in bug #7598 from Vaclav Juza.  We have had similar
bugs before in other parts of the planner, so I think it's time to fix this
problem at the core instead of continuing to band-aid around it.

Remove set_coercionform_dontcare(), which represents the band-aid
previously in use for allowing matching of index and constraint expressions
with inconsistent cast labeling.  (We can probably get rid of
COERCE_DONTCARE altogether, but I don't think removing that enum value in
back branches would be wise; it's possible there's third party code
referring to it.)

Back-patch to 9.2.  We could go back further, and might want to once this
has been tested more; but for the moment I won't risk destabilizing plan
choices in long-since-stable branches.
2012-10-12 12:10:55 -04:00
Bruce Momjian
927d61eeff Run pgindent on 9.2 source tree in preparation for first 9.3
commit-fest.
2012-06-10 15:20:04 -04:00
Alvaro Herrera
09ff76fcdb Recast "ONLY" column CHECK constraints as NO INHERIT
The original syntax wasn't universally loved, and it didn't allow its
usage in CREATE TABLE, only ALTER TABLE.  It now works everywhere, and
it also allows using ALTER TABLE ONLY to add an uninherited CHECK
constraint, per discussion.

The pg_constraint column has accordingly been renamed connoinherit.

This commit partly reverts some of the changes in
61d81bd28d, particularly some pg_dump and
psql bits, because now pg_get_constraintdef includes the necessary NO
INHERIT within the constraint definition.

Author: Nikhil Sontakke
Some tweaks by me
2012-04-20 23:56:57 -03:00
Simon Riggs
8cb53654db Add DROP INDEX CONCURRENTLY [IF EXISTS], uses ShareUpdateExclusiveLock 2012-04-06 10:21:40 +01:00
Peter Eisentraut
8a3f745f16 Do not access indclass through Form_pg_index
Normally, accessing variable-length members of catalog structures past
the first one doesn't work at all.  Here, it happened to work because
indnatts was checked to be 1, and so the defined FormData_pg_index
layout, using int2vector[1] and oidvector[1] for variable-length
arrays, happened to match the actual memory layout.  But it's a very
fragile assumption, and it's not in a performance-critical path, so
code it properly using heap_getattr() instead.

bug analysis by Tom Lane
2012-01-27 20:08:34 +02:00
Bruce Momjian
e126958c2e Update copyright notices for year 2012. 2012-01-01 18:01:58 -05:00
Robert Haas
0e4611c023 Add a security_barrier option for views.
When a view is marked as a security barrier, it will not be pulled up
into the containing query, and no quals will be pushed down into it,
so that no function or operator chosen by the user can be applied to
rows not exposed by the view.  Views not configured with this
option cannot provide robust row-level security, but will perform far
better.

Patch by KaiGai Kohei; original problem report by Heikki Linnakangas
(in October 2009!).  Review (in earlier versions) by Noah Misch and
others.  Design advice by Tom Lane and myself.  Further review and
cleanup by me.
2011-12-22 16:16:31 -05:00
Alvaro Herrera
61d81bd28d Allow CHECK constraints to be declared ONLY
This makes them enforceable only on the parent table, not on children
tables.  This is useful in various situations, per discussion involving
people bitten by the restrictive behavior introduced in 8.4.

Message-Id:
8762mp93iw.fsf@comcast.net
CAFaPBrSMMpubkGf4zcRL_YL-AERUbYF_-ZNNYfb3CVwwEqc9TQ@mail.gmail.com

Authors: Nikhil Sontakke, Alex Hunsaker
Reviewed by Robert Haas and myself
2011-12-19 17:30:23 -03:00
Tom Lane
e6858e6657 Measure the number of all-visible pages for use in index-only scan costing.
Add a column pg_class.relallvisible to remember the number of pages that
were all-visible according to the visibility map as of the last VACUUM
(or ANALYZE, or some other operations that update pg_class.relpages).
Use relallvisible/relpages, instead of an arbitrary constant, to estimate
how many heap page fetches can be avoided during an index-only scan.

This is pretty primitive and will no doubt see refinements once we've
acquired more field experience with the index-only scan mechanism, but
it's way better than using a constant.

Note: I had to adjust an underspecified query in the window.sql regression
test, because it was changing answers when the plan changed to use an
index-only scan.  Some of the adjacent tests perhaps should be adjusted
as well, but I didn't do that here.
2011-10-14 17:23:46 -04:00
Tom Lane
a2822fb933 Support index-only scans using the visibility map to avoid heap fetches.
When a btree index contains all columns required by the query, and the
visibility map shows that all tuples on a target heap page are
visible-to-all, we don't need to fetch that heap page.  This patch depends
on the previous patches that made the visibility map reliable.

There's a fair amount left to do here, notably trying to figure out a less
chintzy way of estimating the cost of an index-only scan, but the core
functionality seems ready to commit.

Robert Haas and Ibrar Ahmed, with some previous work by Heikki Linnakangas.
2011-10-07 20:14:13 -04:00
Tom Lane
1609797c25 Clean up the #include mess a little.
walsender.h should depend on xlog.h, not vice versa.  (Actually, the
inclusion was circular until a couple hours ago, which was even sillier;
but Bruce broke it in the expedient rather than logically correct
direction.)  Because of that poor decision, plus blind application of
pgrminclude, we had a situation where half the system was depending on
xlog.h to include such unrelated stuff as array.h and guc.h.  Clean up
the header inclusion, and manually revert a lot of what pgrminclude had
done so things build again.

This episode reinforces my feeling that pgrminclude should not be run
without adult supervision.  Inclusion changes in header files in particular
need to be reviewed with great care.  More generally, it'd be good if we
had a clearer notion of module layering to dictate which headers can sanely
include which others ... but that's a big task for another day.
2011-09-04 01:13:16 -04:00
Bruce Momjian
6416a82a62 Remove unnecessary #include references, per pgrminclude script. 2011-09-01 10:04:27 -04:00
Tom Lane
5bba65de94 Fix a missed case in code for "moving average" estimate of reltuples.
It is possible for VACUUM to scan no pages at all, if the visibility map
shows that all pages are all-visible.  In this situation VACUUM has no new
information to report about the relation's tuple density, so it wasn't
changing pg_class.reltuples ... but it updated pg_class.relpages anyway.
That's wrong in general, since there is no evidence to justify changing the
density ratio reltuples/relpages, but it's particularly bad if the previous
state was relpages=reltuples=0, which means "unknown tuple density".
We just replaced "unknown" with "zero".  ANALYZE would eventually recover
from this, but it could take a lot of repetitions of ANALYZE to do so if
the relation size is much larger than the maximum number of pages ANALYZE
will scan, because of the moving-average behavior introduced by commit
b4b6923e03.

The only known situation where we could have relpages=reltuples=0 and yet
the visibility map asserts everything's visible is immediately following
a pg_upgrade.  It might be advisable for pg_upgrade to try to preserve the
relpages/reltuples statistics; but in any case this code is wrong on its
own terms, so fix it.  Per report from Sergey Koposov.

Back-patch to 8.4, where the visibility map was introduced, same as the
previous change.
2011-08-30 14:51:38 -04:00
Tom Lane
f4d7f1adba Fix incorrect order of operations during sinval reset processing.
We have to be sure that we have revalidated each nailed-in-cache relcache
entry before we try to use it to load data for some other relcache entry.
The introduction of "mapped relations" in 9.0 broke this, because although
we updated the state kept in relmapper.c early enough, we failed to
propagate that information into relcache entries soon enough; in
particular, we could try to fetch pg_class rows out of pg_class before
we'd updated its relcache entry's rd_node.relNode value from the map.

This bug accounts for Dave Gould's report of failures after "vacuum full
pg_class", and I believe that there is risk for other system catalogs
as well.

The core part of the fix is to copy relmapper data into the relcache
entries during "phase 1" in RelationCacheInvalidate(), before they'll be
used in "phase 2".  To try to future-proof the code against other similar
bugs, I also rearranged the order in which nailed relations are visited
during phase 2: now it's pg_class first, then pg_class_oid_index, then
other nailed relations.  This should ensure that RelationClearRelation can
apply RelationReloadIndexInfo to all nailed indexes without risking use
of not-yet-revalidated relcache entries.

Back-patch to 9.0 where the relation mapper was introduced.
2011-08-16 14:38:20 -04:00
Tom Lane
2ada6779c5 Fix race condition in relcache init file invalidation.
The previous code tried to synchronize by unlinking the init file twice,
but that doesn't actually work: it leaves a window wherein a third process
could read the already-stale init file but miss the SI messages that would
tell it the data is stale.  The result would be bizarre failures in catalog
accesses, typically "could not read block 0 in file ..." later during
startup.

Instead, hold RelCacheInitLock across both the unlink and the sending of
the SI messages.  This is more straightforward, and might even be a bit
faster since only one unlink call is needed.

This has been wrong since it was put in (in 2002!), so back-patch to all
supported releases.
2011-08-16 13:11:54 -04:00
Robert Haas
367bc426a1 Avoid index rebuild for no-rewrite ALTER TABLE .. ALTER TYPE.
Noah Misch.  Review and minor cosmetic changes by me.
2011-07-18 11:04:43 -04:00
Alvaro Herrera
897795240c Enable CHECK constraints to be declared NOT VALID
This means that they can initially be added to a large existing table
without checking its initial contents, but new tuples must comply to
them; a separate pass invoked by ALTER TABLE / VALIDATE can verify
existing data and ensure it complies with the constraint, at which point
it is marked validated and becomes a normal part of the table ecosystem.

An non-validated CHECK constraint is ignored in the planner for
constraint_exclusion purposes; when validated, cached plans are
recomputed so that partitioning starts working right away.

This patch also enables domains to have unvalidated CHECK constraints
attached to them as well by way of ALTER DOMAIN / ADD CONSTRAINT / NOT
VALID, which can later be validated with ALTER DOMAIN / VALIDATE
CONSTRAINT.

Thanks to Thom Brown, Dean Rasheed and Jaime Casanova for the various
reviews, and Robert Hass for documentation wording improvement
suggestions.

This patch was sponsored by Enova Financial.
2011-06-30 11:24:31 -04:00
Peter Eisentraut
21f1e15aaf Unify spelling of "canceled", "canceling", "cancellation"
We had previously (af26857a27)
established the U.S. spellings as standard.
2011-06-29 09:28:46 +03:00
Bruce Momjian
bf50caf105 pgindent run before PG 9.1 beta 1. 2011-04-10 11:42:00 -04:00
Tom Lane
1192ba8b67 Avoid potential deadlock in InitCatCachePhase2().
Opening a catcache's index could require reading from that cache's own
catalog, which of course would acquire AccessShareLock on the catalog.
So the original coding here risks locking index before heap, which could
deadlock against another backend trying to get exclusive locks in the
normal order.  Because InitCatCachePhase2 is only called when a backend
has to start up without a relcache init file, the deadlock was seldom seen
in the field.  (And by the same token, there's no need to worry about any
performance disadvantage; so not much point in trying to distinguish
exactly which catalogs have the risk.)

Bug report, diagnosis, and patch by Nikhil Sontakke.  Additional commentary
by me.  Back-patch to all supported branches.
2011-03-22 13:00:48 -04:00
Peter Eisentraut
414c5a2ea6 Per-column collation support
This adds collation support for columns and domains, a COLLATE clause
to override it per expression, and B-tree index support.

Peter Eisentraut
reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch
2011-02-08 23:04:18 +02:00
Bruce Momjian
5d950e3b0c Stamp copyrights for year 2011. 2011-01-01 13:18:15 -05:00
Robert Haas
53dbc27c62 Support unlogged tables.
The contents of an unlogged table are WAL-logged; thus, they are not
available on standby servers and are truncated whenever the database
system enters recovery.  Indexes on unlogged tables are also unlogged.
Unlogged GiST indexes are not currently supported.
2010-12-29 06:48:53 -05:00
Robert Haas
5f7b58fad8 Generalize concept of temporary relations to "relation persistence".
This commit replaces pg_class.relistemp with pg_class.relpersistence;
and also modifies the RangeVar node type to carry relpersistence rather
than istemp.  It also removes removes rd_istemp from RelationData and
instead performs the correct computation based on relpersistence.

For clarity, we add three new macros: RelationNeedsWAL(),
RelationUsesLocalBuffers(), and RelationUsesTempNamespace(), so that we
can clarify the purpose of each check that previous depended on
rd_istemp.

This is intended as infrastructure for the upcoming unlogged tables
patch, as well as for future possible work on global temporary tables.
2010-12-13 12:34:26 -05:00
Tom Lane
c0b5fac701 Simplify and speed up mapping of index opfamilies to pathkeys.
Formerly we looked up the operators associated with each index (caching
them in relcache) and then the planner looked up the btree opfamily
containing such operators in order to build the btree-centric pathkey
representation that describes the index's sort order.  This is quite
pointless for btree indexes: we might as well just use the index's opfamily
information directly.  That saves syscache lookup cycles during planning,
and furthermore allows us to eliminate the relcache's caching of operators
altogether, which may help in reducing backend startup time.

I added code to plancat.c to perform the same type of double lookup
on-the-fly if it's ever faced with a non-btree amcanorder index AM.
If such a thing actually becomes interesting for production, we should
replace that logic with some more-direct method for identifying the
corresponding btree opfamily; but it's not worth spending effort on now.

There is considerably more to do pursuant to my recent proposal to get rid
of sort-operator-based representations of sort orderings, but this patch
grabs some of the low-hanging fruit.  I'll look at the remainder of that
work after the current commitfest.
2010-11-29 12:30:43 -05:00
Tom Lane
511e902b51 Make TRUNCATE ... RESTART IDENTITY restart sequences transactionally.
In the previous coding, we simply issued ALTER SEQUENCE RESTART commands,
which do not roll back on error.  This meant that an error between
truncating and committing left the sequences out of sync with the table
contents, with potentially bad consequences as were noted in a Warning on
the TRUNCATE man page.

To fix, create a new storage file (relfilenode) for a sequence that is to
be reset due to RESTART IDENTITY.  If the transaction aborts, we'll
automatically revert to the old storage file.  This acts just like a
rewriting ALTER TABLE operation.  A penalty is that we have to take
exclusive lock on the sequence, but since we've already got exclusive lock
on its owning table, that seems unlikely to be much of a problem.

The interaction of this with usual nontransactional behaviors of sequence
operations is a bit weird, but it's hard to see what would be completely
consistent.  Our choice is to discard cached-but-unissued sequence values
both when the RESTART is executed, and at rollback if any; but to not touch
the currval() state either time.

In passing, move the sequence reset operations to happen before not after
any AFTER TRUNCATE triggers are fired.  The previous ordering was not
logically sensible, but was forced by the need to minimize inconsistency
if the triggers caused an error.  Transactional rollback is a much better
solution to that.

Patch by Steve Singer, rather heavily adjusted by me.
2010-11-17 16:42:18 -05:00
Robert Haas
5ccbc3d802 Correct poor grammar in comment. 2010-11-14 23:10:45 -05:00
Magnus Hagander
9f2e211386 Remove cvs keywords from all files. 2010-09-20 22:08:53 +02:00
Tom Lane
9513918c6c Fix up flushing of composite-type typcache entries to be driven directly by
SI invalidation events, rather than indirectly through the relcache.

In the previous coding, we had to flush a composite-type typcache entry
whenever we discarded the corresponding relcache entry.  This caused problems
at least when testing with RELCACHE_FORCE_RELEASE, as shown in recent report
from Jeff Davis, and might result in real-world problems given the kind of
unexpected relcache flush that that test mechanism is intended to model.

The new coding decouples relcache and typcache management, which is a good
thing anyway from a structural perspective.  The cost is that we have to
search the typcache linearly to find entries that need to be flushed.  There
are a couple of ways we could avoid that, but at the moment it's not clear
it's worth any extra trouble, because the typcache contains very few entries
in typical operation.

Back-patch to 8.2, the same as some other recent fixes in this general area.
The patch could be carried back to 8.0 with some additional work, but given
that it's only hypothetical whether we're fixing any problem observable in
the field, it doesn't seem worth the work now.
2010-09-02 03:16:46 +00:00
Robert Haas
debcec7dc3 Include the backend ID in the relpath of temporary relations.
This allows us to reliably remove all leftover temporary relation
files on cluster startup without reference to system catalogs or WAL;
therefore, we no longer include temporary relations in XLOG_XACT_COMMIT
and XLOG_XACT_ABORT WAL records.

Since these changes require including a backend ID in each
SharedInvalSmgrMsg, the size of the SharedInvalidationMessage.id
field has been reduced from two bytes to one, and the maximum number
of connections has been reduced from INT_MAX / 4 to 2^23-1.  It would
be possible to remove these restrictions by increasing the size of
SharedInvalidationMessage by 4 bytes, but right now that doesn't seem
like a good trade-off.

Review by Jaime Casanova and Tom Lane.
2010-08-13 20:10:54 +00:00
Bruce Momjian
239d769e7e pgindent run for 9.0, second run 2010-07-06 19:19:02 +00:00
Tom Lane
ea46000a40 Arrange for client authentication to occur before we select a specific
database to connect to. This is necessary for the walsender code to work
properly (it was previously using an untenable assumption that template1 would
always be available to connect to).  This also gets rid of a small security
shortcoming that was introduced in the original patch to eliminate the flat
authentication files: before, you could find out whether or not the requested
database existed even if you couldn't pass the authentication checks.

The changes needed to support this are mainly just to treat pg_authid and
pg_auth_members as nailed relations, so that we can read them without having
to be able to locate real pg_class entries for them.  This mechanism was
already debugged for pg_database, but we hadn't recognized the value of
applying it to those catalogs too.

Since the current code doesn't have support for accessing toast tables before
we've brought up all of the relcache, remove pg_authid's toast table to ensure
that no one can store an out-of-line toasted value of rolpassword.  The case
seems quite unlikely to occur in practice, and was effectively unsupported
anyway in the old "flatfiles" implementation.

Update genbki.pl to actually implement the same rules as bootstrap.c does for
not-nullability of catalog columns.  The previous coding was a bit cheesy but
worked all right for the previous set of bootstrap catalogs.  It does not work
for pg_authid, where rolvaliduntil needs to be nullable.

Initdb forced due to minor catalog changes (mainly the toast table removal).
2010-04-20 23:48:47 +00:00
Tom Lane
73981cb451 Fix a problem introduced by my patch of 2010-01-12 that revised the way
relcache reload works.  In the patched code, a relcache entry in process of
being rebuilt doesn't get unhooked from the relcache hash table; which means
that if a cache flush occurs due to sinval queue overrun while we're
rebuilding it, the entry could get blown away by RelationCacheInvalidate,
resulting in crash or misbehavior.  Fix by ensuring that an entry being
rebuilt has positive refcount, so it won't be seen as a target for removal
if a cache flush occurs.  (This will mean that the entry gets rebuilt twice
in such a scenario, but that's okay.)  It appears that the problem can only
arise within a transaction that has previously reassigned the relfilenode of
a pre-existing table, via TRUNCATE or a similar operation.  Per bug #5412
from Rusty Conover.

Back-patch to 8.2, same as the patch that introduced the problem.
I think that the failure can't actually occur in 8.2, since it lacks the
rd_newRelfilenodeSubid optimization, but let's make it work like the later
branches anyway.

Patch by Heikki, slightly editorialized on by me.
2010-04-14 21:31:11 +00:00
Bruce Momjian
65e806cba1 pgindent run for 9.0 2010-02-26 02:01:40 +00:00
Tom Lane
50a90fac40 Stamp HEAD as 9.0devel, and update various places that were referring to 8.5
(hope I got 'em all).  Per discussion, this release will be 9.0 not 8.5.
2010-02-17 04:19:41 +00:00
Robert Haas
e26c539e9f Wrap calls to SearchSysCache and related functions using macros.
The purpose of this change is to eliminate the need for every caller
of SearchSysCache, SearchSysCacheCopy, SearchSysCacheExists,
GetSysCacheOid, and SearchSysCacheList to know the maximum number
of allowable keys for a syscache entry (currently 4).  This will
make it far easier to increase the maximum number of keys in a
future release should we choose to do so, and it makes the code
shorter, too.

Design and review by Tom Lane.
2010-02-14 18:42:19 +00:00
Tom Lane
cbe9d6beb4 Fix up rickety handling of relation-truncation interlocks.
Move rd_targblock, rd_fsm_nblocks, and rd_vm_nblocks from relcache to the smgr
relation entries, so that they will get reset to InvalidBlockNumber whenever
an smgr-level flush happens.  Because we now send smgr invalidation messages
immediately (not at end of transaction) when a relation truncation occurs,
this ensures that other backends will reset their values before they next
access the relation.  We no longer need the unreliable assumption that a
VACUUM that's doing a truncation will hold its AccessExclusive lock until
commit --- in fact, we can intentionally release that lock as soon as we've
completed the truncation.  This patch therefore reverts (most of) Alvaro's
patch of 2009-11-10, as well as my marginal hacking on it yesterday.  We can
also get rid of assorted no-longer-needed relcache flushes, which are far more
expensive than an smgr flush because they kill a lot more state.

In passing this patch fixes smgr_redo's failure to perform visibility-map
truncation, and cleans up some rather dubious assumptions in freespace.c and
visibilitymap.c about when rd_fsm_nblocks and rd_vm_nblocks can be out of
date.
2010-02-09 21:43:30 +00:00
Tom Lane
9a75803b1a Remove CatalogCacheFlushRelation, and the reloidattr infrastructure that was
needed by nothing else.

The restructuring I just finished doing on cache management exposed to me how
silly this routine was.  Its function was to go into the catcache and blow
away all entries related to a given relation when there was a relcache flush
on that relation.  However, there is no point in removing a catcache entry
if the catalog row it represents is still valid --- and if it isn't valid,
there must have been a catcache entry flush on it, because that's triggered
directly by heap_update or heap_delete on the catalog row.  So this routine
accomplished nothing except to blow away valid cache entries that we'd very
likely be wanting in the near future to help reconstruct the relcache entry.
Dumb.

On top of which, it required a subtle and easy-to-get-wrong attribute in
syscache definitions, ie, the column containing the OID of the related
relation if any.  Removing that is a very useful maintenance simplification.
2010-02-08 05:53:55 +00:00
Tom Lane
b9b8831ad6 Create a "relation mapping" infrastructure to support changing the relfilenodes
of shared or nailed system catalogs.  This has two key benefits:

* The new CLUSTER-based VACUUM FULL can be applied safely to all catalogs.

* We no longer have to use an unsafe reindex-in-place approach for reindexing
  shared catalogs.

CLUSTER on nailed catalogs now works too, although I left it disabled on
shared catalogs because the resulting pg_index.indisclustered update would
only be visible in one database.

Since reindexing shared system catalogs is now fully transactional and
crash-safe, the former special cases in REINDEX behavior have been removed;
shared catalogs are treated the same as non-shared.

This commit does not do anything about the recently-discussed problem of
deadlocks between VACUUM FULL/CLUSTER on a system catalog and other
concurrent queries; will address that in a separate patch.  As a stopgap,
parallel_schedule has been tweaked to run vacuum.sql by itself, to avoid
such failures during the regression tests.
2010-02-07 20:48:13 +00:00
Tom Lane
9727c583fe Restructure CLUSTER/newstyle VACUUM FULL/ALTER TABLE support so that swapping
of old and new toast tables can be done either at the logical level (by
swapping the heaps' reltoastrelid links) or at the physical level (by swapping
the relfilenodes of the toast tables and their indexes).  This is necessary
infrastructure for upcoming changes to support CLUSTER/VAC FULL on shared
system catalogs, where we cannot change reltoastrelid.  The physical swap
saves a few catalog updates too.

We unfortunately have to keep the logical-level swap logic because in some
cases we will be adding or deleting a toast table, so there's no possibility
of a physical swap.  However, that only happens as a consequence of schema
changes in the table, which we do not need to support for system catalogs,
so such cases aren't an obstacle for that.

In passing, refactor the cluster support functions a little bit to eliminate
unnecessarily-duplicated code; and fix the problem that while CLUSTER had
been taught to rename the final toast table at need, ALTER TABLE had not.
2010-02-04 00:09:14 +00:00
Tom Lane
70a2b05a59 Assorted cleanups in preparation for using a map file to support altering
the relfilenode of currently-not-relocatable system catalogs.

1. Get rid of inval.c's dependency on relfilenode, by not having it emit
smgr invalidations as a result of relcache flushes.  Instead, smgr sinval
messages are sent directly from smgr.c when an actual relation delete or
truncate is done.  This makes considerably more structural sense and allows
elimination of a large number of useless smgr inval messages that were
formerly sent even in cases where nothing was changing at the
physical-relation level.  Note that this reintroduces the concept of
nontransactional inval messages, but that's okay --- because the messages
are sent by smgr.c, they will be sent in Hot Standby slaves, just from a
lower logical level than before.

2. Move setNewRelfilenode out of catalog/index.c, where it never logically
belonged, into relcache.c; which is a somewhat debatable choice as well but
better than before.  (I considered catalog/storage.c, but that seemed too
low level.)  Rename to RelationSetNewRelfilenode.

3. Cosmetic cleanups of some other relfilenode manipulations.
2010-02-03 01:14:17 +00:00