1
0
mirror of https://github.com/postgres/postgres.git synced 2025-11-16 15:02:33 +03:00
Commit Graph

1185 Commits

Author SHA1 Message Date
Heikki Linnakangas
9858a8c81c Rely on relcache invalidation to update the cached size of the FSM. 2008-11-26 17:08:58 +00:00
Heikki Linnakangas
3396000684 Rethink the way FSM truncation works. Instead of WAL-logging FSM
truncations in FSM code, call FreeSpaceMapTruncateRel from smgr_redo. To
make that cleaner from modularity point of view, move the WAL-logging one
level up to RelationTruncate, and move RelationTruncate and all the
related WAL-logging to new src/backend/catalog/storage.c file. Introduce
new RelationCreateStorage and RelationDropStorage functions that are used
instead of calling smgrcreate/smgrscheduleunlink directly. Move the
pending rel deletion stuff from smgrcreate/smgrscheduleunlink to the new
functions. This leaves smgr.c as a thin wrapper around md.c; all the
transactional stuff is now in storage.c.

This will make it easier to add new forks with similar truncation logic,
like the visibility map.
2008-11-19 10:34:52 +00:00
Heikki Linnakangas
f06b7604ca Fix oversight in previous error-reporting patch; mustn't pfree path string
before passing it to elog.
2008-11-14 11:09:50 +00:00
Tom Lane
cad3a26a95 Fix sloppy omission of now-required #include's. 2008-11-11 14:17:02 +00:00
Heikki Linnakangas
7e8b0b9ab1 Change error messages to print the physical path, like
"base/11517/3767_fsm", instead of symbolic names like "1663/11517/3767/1",
per Alvaro's suggestion. I didn't change the messages in the higher-level
index, heap and FSM routines, though, where the fork is implicit.
2008-11-11 13:19:16 +00:00
Tom Lane
6517f377d6 Implement ALTER DATABASE SET TABLESPACE to move a whole database (or at least
as much of it as lives in its default tablespace) to a new tablespace.

Guillaume Lelarge, with some help from Bernd Helmle and Tom Lane
2008-11-07 18:25:07 +00:00
Tom Lane
85e2cedf98 Improve bulk-insert performance by keeping the current target buffer pinned
(but not locked, as that would risk deadlocks).  Also, make it work in a small
ring of buffers to avoid having bulk inserts trash the whole buffer arena.

Robert Haas, after an idea of Simon Riggs'.
2008-11-06 20:51:15 +00:00
Tom Lane
b4eae023bb Clean up the messy semantics (not to mention inefficiency) of PageGetTempPage
by splitting it into three functions with better-defined behaviors.

Zdenek Kotala
2008-11-03 20:47:49 +00:00
Tom Lane
d7112cfa88 Remove the last vestiges of the MAKE_PTR/MAKE_OFFSET mechanism. We haven't
allowed different processes to have different addresses for the shmem segment
in quite a long time, but there were still a few places left that used the
old coding convention.  Clean them up to reduce confusion and improve the
compiler's ability to detect pointer type mismatches.

Kris Jurka
2008-11-02 21:24:52 +00:00
Tom Lane
902d1cb35f Remove all uses of the deprecated functions heap_formtuple, heap_modifytuple,
and heap_deformtuple in favor of the newer functions heap_form_tuple et al
(which do the same things but use bool control flags instead of arbitrary
char values).  Eliminate the former duplicate coding of these functions,
reducing the deprecated functions to mere wrappers around the newer ones.
We can't get rid of them entirely because add-on modules probably still
contain many instances of the old coding style.

Kris Jurka
2008-11-02 01:45:28 +00:00
Heikki Linnakangas
e9816533e3 Update FSM on WAL replay. This is a bit limited; the FSM is only updated
on non-full-page-image WAL records, and quite arbitrarily, only if there's
less than 20% free space on the page after the insert/update (not on HOT
updates, though). The 20% cutoff should avoid most of the overhead, when
replaying a bulk insertion, for example, while ensuring that pages that
are full are marked as full in the FSM.

This is mostly to avoid the nasty worst case scenario, where you replay
from a PITR archive, and the FSM information in the base backup is really
out of date. If there was a lot of pages that the outdated FSM claims to
have free space, but don't actually have any, the first unlucky inserter
after the recovery would traverse through all those pages, just to find
out that they're full. We didn't have this problem with the old FSM
implementation, because we simply threw the FSM information away on a
non-clean shutdown.
2008-10-31 19:40:27 +00:00
Heikki Linnakangas
19c8dc839b Unite ReadBufferWithFork, ReadBufferWithStrategy, and ZeroOrReadBuffer
functions into one ReadBufferExtended function, that takes the strategy
and mode as argument. There's three modes, RBM_NORMAL which is the default
used by plain ReadBuffer(), RBM_ZERO, which replaces ZeroOrReadBuffer, and
a new mode RBM_ZERO_ON_ERROR, which allows callers to read corrupt pages
without throwing an error. The FSM needs the new mode to recover from
corrupt pages, which could happend if we crash after extending an FSM file,
and the new page is "torn".

Add fork number to some error messages in bufmgr.c, that still lacked it.
2008-10-31 15:05:00 +00:00
Alvaro Herrera
089ae3bc9a Properly access a buffer's LSN using existing access macros instead of abusing
knowledge of page layout.

Stolen from Jonah Harris' CRC patch
2008-10-20 21:11:15 +00:00
Tom Lane
dd4c165bc3 Improve some of the comments in fsmpage.c. 2008-10-07 21:10:11 +00:00
Heikki Linnakangas
89f373bf5b Index FSMs needs to be vacuumed as well. Report by Jeff Davis. 2008-10-06 08:04:11 +00:00
Tom Lane
68827a7ada Suppress an uninitialized-variable warning (not all versions of gcc
complain here, but some do)
2008-10-01 14:59:23 +00:00
Heikki Linnakangas
f06ef2bede Fix WAL redo of FSM truncation. We can't call smgrtruncate() during WAL
replay, because it tries to XLogInsert().
2008-10-01 08:12:14 +00:00
Tom Lane
6ca1b1cd95 Fix compiler warning (unportable sprintf usage) 2008-09-30 14:15:58 +00:00
Heikki Linnakangas
15c121b3ed Rewrite the FSM. Instead of relying on a fixed-size shared memory segment, the
free space information is stored in a dedicated FSM relation fork, with each
relation (except for hash indexes; they don't use FSM).

This eliminates the max_fsm_relations and max_fsm_pages GUC options; remove any
trace of them from the backend, initdb, and documentation.

Rewrite contrib/pg_freespacemap to match the new FSM implementation. Also
introduce a new variant of the get_raw_page(regclass, int4, int4) function in
contrib/pageinspect that let's you to return pages from any relation fork, and
a new fsm_page_contents() function to inspect the new FSM pages.
2008-09-30 10:52:14 +00:00
Alvaro Herrera
5817d861e9 Optimize CleanupTempFiles by having a boolean flag that keeps track of whether
there are FD_XACT_TEMPORARY files to clean up at transaction end.

Per performance profiling results on AWeber's huge systems.

Patch by me after an idea suggested by Simon Riggs.
2008-09-19 04:57:10 +00:00
Tom Lane
35c2a3c3cf Allow ShowBufferUsage() to report the number of reads/writes that have
occurred to temporary files.  This replaces the unused
NDirectFileRead/NDirectFileWrite counters.

Itagaki Takahiro
2008-09-17 13:15:55 +00:00
Heikki Linnakangas
3f0e808c4a Introduce the concept of relation forks. An smgr relation can now consist
of multiple forks, and each fork can be created and grown separately.

The bulk of this patch is about changing the smgr API to include an extra
ForkNumber argument in every smgr function. Also, smgrscheduleunlink and
smgrdounlink no longer implicitly call smgrclose, because other forks might
still exist after unlinking one. The callers of those functions have been
modified to call smgrclose instead.

This patch in itself doesn't have any user-visible effect, but provides the
infrastructure needed for upcoming patches. The additional forks envisioned
are a rewritten FSM implementation that doesn't rely on a fixed-size shared
memory block, and a visibility map to allow skipping portions of a table in
VACUUM that have no dead tuples.
2008-08-11 11:05:11 +00:00
Tom Lane
d8b04d5fac In ReadOrZeroBuffer (and related entry points), don't bother to call
PageHeaderIsValid when we zero the buffer instead of reading the page in.
The actual performance improvement is probably marginal since this function
isn't very heavily used, but a cycle saved is a cycle earned.

Zdenek Kotala
2008-08-05 15:09:04 +00:00
Tom Lane
4abd7b49f1 Improve CREATE/DROP/RENAME DATABASE so that when failing because the source
or target database is being accessed by other users, it tells you whether
the "other users" are live sessions or uncommitted prepared transactions.
(Indeed, it tells you exactly how many of each, but that's mostly just
because it was easy to do so.)  This should help forestall the gotcha of
not realizing that a prepared transaction is what's blocking the command.
Per discussion.
2008-08-04 18:03:46 +00:00
Alvaro Herrera
e36e6b1cab Add a few more DTrace probes to the backend.
Robert Lor
2008-08-01 13:16:09 +00:00
Tom Lane
dc02a4814a Fix a race condition that I introduced into sinvaladt.c during the recent
rewrite.  When called from SIInsertDataEntries, SICleanupQueue releases
the write lock if it has to issue a kill() to signal some laggard backend.
That still seems like a good idea --- but it's possible that by the time
we get the lock back, there are no longer enough free message slots to
satisfy SIInsertDataEntries' requirement.  Must recheck, and repeat the
whole SICleanupQueue process if not.  Noted while reading code.
2008-07-18 14:45:48 +00:00
Tom Lane
6816577a78 Change the PageGetContents() macro to guarantee its result is maxalign'd,
thereby forestalling any problems with alignment of the data structure placed
there.  Since SizeOfPageHeaderData is maxalign'd anyway in 8.3 and HEAD, this
does not actually change anything right now, but it is foreseeable that the
header size will change again someday.  I had to fix a couple of places that
were assuming that the content offset is just SizeOfPageHeaderData rather than
MAXALIGN(SizeOfPageHeaderData).  Per discussion of Zdenek's page-macros patch.
2008-07-13 21:50:04 +00:00
Tom Lane
9d035f4254 Clean up the use of some page-header-access macros: principally, use
SizeOfPageHeaderData instead of sizeof(PageHeaderData) in places where that
makes the code clearer, and avoid casting between Page and PageHeader where
possible.  Zdenek Kotala, with some additional cleanup by Heikki Linnakangas.

I did not apply the parts of the proposed patch that would have resulted in
slightly changing the on-disk format of hash indexes; it seems to me that's
not a win as long as there's any chance of having in-place upgrade for 8.4.
2008-07-13 20:45:47 +00:00
Alvaro Herrera
110147653a Make sure we only try to free snapshots that have been passed through
CopySnapshot, per Neil Conway.  Also add a comment about the assumption in
GetSnapshotData that the argument is statically allocated.

Also, fix some more typos in comments in snapmgr.c.
2008-07-11 02:10:14 +00:00
Tom Lane
5b965bf08b Teach autovacuum how to determine whether a temp table belongs to a crashed
backend.  If so, send a LOG message to the postmaster log, and if the table
is beyond the vacuum-for-wraparound horizon, forcibly drop it.  Per recent
discussions.  Perhaps we ought to back-patch this, but it probably needs
to age a bit in HEAD first.
2008-07-01 02:09:34 +00:00
Tom Lane
dab421d2f0 Seems I was too optimistic in supposing that sinval's maxMsgNum could be
read and written without a lock.  The value itself is atomic, sure, but on
processors with weak memory ordering it's possible for a reader to see the
value change before it sees the associated message written into the buffer
array.  Fix by introducing a spinlock that's used just to read and write
maxMsgNum.  (We could do this with less overhead if we recognized a concept
of "memory access barrier"; is it worth introducing such a thing?  At the
moment probably not --- I can't measure any clear slowdown from adding the
spinlock, so this solution is probably fine.)  Per buildfarm results.
2008-06-20 00:24:53 +00:00
Tom Lane
fad153ec45 Rewrite the sinval messaging mechanism to reduce contention and avoid
unnecessary cache resets.  The major changes are:

* When the queue overflows, we only issue a cache reset to the specific
backend or backends that still haven't read the oldest message, rather
than resetting everyone as in the original coding.

* When we observe backend(s) falling well behind, we signal SIGUSR1
to only one backend, the one that is furthest behind and doesn't already
have a signal outstanding for it.  When it finishes catching up, it will
in turn signal SIGUSR1 to the next-furthest-back guy, if there is one that
is far enough behind to justify a signal.  The PMSIGNAL_WAKEN_CHILDREN
mechanism is removed.

* We don't attempt to clean out dead messages after every message-receipt
operation; rather, we do it on the insertion side, and only when the queue
fullness passes certain thresholds.

* Split SInvalLock into SInvalReadLock and SInvalWriteLock so that readers
don't block writers nor vice versa (except during the infrequent queue
cleanout operations).

* Transfer multiple sinval messages for each acquisition of a read or
write lock.
2008-06-19 21:32:56 +00:00
Alvaro Herrera
a3540b0f65 Improve our #include situation by moving pointer types away from the
corresponding struct definitions.  This allows other headers to avoid including
certain highly-loaded headers such as rel.h and relscan.h, instead using just
relcache.h, heapam.h or genam.h, which are more lightweight and thus cause less
unnecessary dependencies.
2008-06-19 00:46:06 +00:00
Tom Lane
86fdb32bd0 Remove freeBackends counter from the sinval shared memory area. We used to
use it to help enforce superuser_reserved_backends, but since 8.1 it's
just been dead weight.
2008-06-17 20:07:08 +00:00
Heikki Linnakangas
a213f1ee6c Refactor XLogOpenRelation() and XLogReadBuffer() in preparation for relation
forks. XLogOpenRelation() and the associated light-weight relation cache in
xlogutils.c is gone, and XLogReadBuffer() now takes a RelFileNode as argument,
instead of Relation.

For functions that still need a Relation struct during WAL replay, there's a
new function called CreateFakeRelcacheEntry() that returns a fake entry like
XLogOpenRelation() used to.
2008-06-12 09:12:31 +00:00
Neil Conway
8374246054 Further tweak for comment in CheckDeadLock(), per Tom. 2008-06-09 18:23:05 +00:00
Neil Conway
da80a4b97e Fix typo in comment. 2008-06-09 06:55:34 +00:00
Alvaro Herrera
cc87402d6e Move BufferGetPageSize and BufferGetPage from bufpage.h to bufmgr.h. It is
more logical that way, and also it reduces the amount of unnecessary includes
in bufpage.h, which is widely used.

Zdenek Kotala.

My previous patch to bufpage.h should also have credited him as author, but I
forgot (sorry about that).
2008-06-08 22:00:48 +00:00
Bruce Momjian
d82a1d582c This is the patch replace offnum++ by OffsetNumberNext, to be
consistent.  OffsetNumberNext() has some casting that makes it useful.

Fujii Masao
2008-05-13 15:44:08 +00:00
Alvaro Herrera
5da9da71c4 Improve snapshot manager by keeping explicit track of snapshots.
There are two ways to track a snapshot: there's the "registered" list, which
is used for arbitrary long-lived snapshots; and there's the "active stack",
which is used for the snapshot that is considered "active" at any time.
This also allows users of snapshots to stop worrying about snapshot memory
allocation and freeing, and about using PG_TRY blocks around ActiveSnapshot
assignment.  This is all done automatically now.

As a consequence, this allows us to reset MyProc->xmin when there are no
more snapshots registered in the current backend, reducing the impact that
long-running transactions have on VACUUM.
2008-05-12 20:02:02 +00:00
Alvaro Herrera
9084399782 Put back bufmgr.h in bufpage.h -- it is needed by some macros.
Remove #include bufmgr.h from (most?) source files which already include
bufpage.h.
2008-05-12 16:06:10 +00:00
Alvaro Herrera
f8c4d7db60 Restructure some header files a bit, in particular heapam.h, by removing some
unnecessary #include lines in it.  Also, move some tuple routine prototypes and
macros to htup.h, which allows removal of heapam.h inclusion from some .c
files.

For this to work, a new header file access/sysattr.h needed to be created,
initially containing attribute numbers of system columns, for pg_dump usage.

While at it, make contrib ltree, intarray and hstore header files more
consistent with our header style.
2008-05-12 00:00:54 +00:00
Tom Lane
3c6248a828 Remove the recently added USE_SEGMENTED_FILES option, and indeed remove all
support for a nonsegmented mode from md.c.  Per recent discussions, there
doesn't seem to be much value in a "never segment" option as opposed to
segmenting with a suitably large segment size.  So instead provide a
configure-time switch to set the desired segment size in units of gigabytes.
While at it, expose a configure switch for BLCKSZ as well.

Zdenek Kotala
2008-05-02 01:08:27 +00:00
Heikki Linnakangas
9cb91f90c9 Fix two race conditions between the pending unlink mechanism that was put in
place to prevent reusing relation OIDs before next checkpoint, and DROP
DATABASE. First, if a database was dropped, bgwriter would still try to unlink
the files that the rmtree() call by the DROP DATABASE command has already
deleted, or is just about to delete. Second, if a database is dropped, and
another database is created with the same OID, bgwriter would in the worst
case delete a relation in the new database that happened to get the same OID
as a dropped relation in the old database.

To fix these race conditions:
- make rmtree() ignore ENOENT errors. This fixes the 1st race condition.
- make ForgetDatabaseFsyncRequests forget unlink requests as well.
- force checkpoint on in dropdb on all platforms

Since ForgetDatabaseFsyncRequests() is asynchronous, the 2nd change isn't
enough on its own to fix the problem of dropping and creating a database with
same OID, but forcing a checkpoint on DROP DATABASE makes it sufficient.

Per Tom Lane's bug report and proposal. Backpatch to 8.3.
2008-04-18 06:48:38 +00:00
Tom Lane
d1cbd26ded Repair two places where SIGTERM exit could leave shared memory state
corrupted.  (Neither is very important if SIGTERM is used to shut down the
whole database cluster together, but there's a problem if someone tries to
SIGTERM individual backends.)  To do this, introduce new infrastructure
macros PG_ENSURE_ERROR_CLEANUP/PG_END_ENSURE_ERROR_CLEANUP that take care
of transiently pushing an on_shmem_exit cleanup hook.  Also use this method
for createdb cleanup --- that wasn't a shared-memory-corruption problem,
but SIGTERM abort of createdb could leave orphaned files lying around.

Backpatch as far as 8.2.  The shmem corruption cases don't exist in 8.1,
and the createdb usage doesn't seem important enough to risk backpatching
further.
2008-04-16 23:59:40 +00:00
Tom Lane
ec498cdcbb Create new routines systable_beginscan_ordered, systable_getnext_ordered,
systable_endscan_ordered that have API similar to systable_beginscan etc
(in particular, the passed-in scankeys have heap not index attnums),
but guarantee ordered output, unlike the existing functions.  For the moment
these are just very thin wrappers around index_beginscan/index_getnext/etc.
Someday they might need to get smarter; but for now this is just a code
refactoring exercise to reduce the number of direct callers of index_getnext,
in preparation for changing that function's API.

In passing, remove index_getnext_indexitem, which has been dead code for
quite some time, and will have even less use than that in the presence
of run-time-lossy indexes.
2008-04-12 23:14:21 +00:00
Alvaro Herrera
73b0300b2a Move the HTSU_Result enum definition into snapshot.h, to avoid including
tqual.h into heapam.h.  This makes all inclusion of tqual.h explicit.

I also sorted alphabetically the includes on some source files.
2008-03-26 21:10:39 +00:00
Alvaro Herrera
78f02ca1f5 Rename snapmgmt.c/h to snapmgr.c/h, for consistency with other files.
Per complaint from Tom Lane.
2008-03-26 18:48:59 +00:00
Alvaro Herrera
d43b085d57 Separate snapshot management code from tuple visibility code, create a
snapmgmt.c file for the former.  The header files have also been reorganized
in three parts: the most basic snapshot definitions are now in a new file
snapshot.h, and the also new snapmgmt.h keeps the definitions for snapmgmt.c.
tqual.h has been reduced to the bare minimum.

This patch is just a first step towards managing live snapshots within a
transaction; there is no functionality change.

Per my proposal to pgsql-patches on 20080318191940.GB27458@alvh.no-ip.org and
subsequent discussion.
2008-03-26 16:20:48 +00:00
Tom Lane
9b8e1eb375 Adjust the recent patch for reporting of deadlocked queries so that we report
query texts only to the server log.  This eliminates the issue of possible
leaking of security-sensitive data in other sessions' queries.  Since the
log is presumed secure, we can now log the queries of all sessions involved
in the deadlock, whether or not they belong to the same user as the one
reporting the failure.
2008-03-24 18:22:36 +00:00