includes a few new ones.
- Fixed compilation errors on OS X for probes that use typedefs
- Fixed a number of probes to pass ForkNumber per the relation forks
patch
- The new probes are those that were taken out from the previous
submitted patch and required simple fixes. Will submit the other probes
that may require more discussion in a separate patch.
Robert Lor
right child if it doesn't need to. This saves some miniscule number
of cycles, but the ulterior motive is to avoid an optimization bug
known to exist in SCO's C compiler (and perhaps others?)
replication patch needs a signal, but we've already used SIGUSR1 and
SIGUSR2 in normal backends. This patch allows reusing SIGUSR1 for that,
and for other purposes too if the need arises.
non-writable large objects need to have their snapshots registered on the
transaction resowner, not the current portal's, because it must persist until
the large object is closed (which the portal does not). Also, ensure that the
serializable snapshot is recorded by the transaction resource owner too, even
when a subtransaction has changed the current resource owner before
serializable is taken.
Per bug reports from Pavan Deolasee.
truncations in FSM code, call FreeSpaceMapTruncateRel from smgr_redo. To
make that cleaner from modularity point of view, move the WAL-logging one
level up to RelationTruncate, and move RelationTruncate and all the
related WAL-logging to new src/backend/catalog/storage.c file. Introduce
new RelationCreateStorage and RelationDropStorage functions that are used
instead of calling smgrcreate/smgrscheduleunlink directly. Move the
pending rel deletion stuff from smgrcreate/smgrscheduleunlink to the new
functions. This leaves smgr.c as a thin wrapper around md.c; all the
transactional stuff is now in storage.c.
This will make it easier to add new forks with similar truncation logic,
like the visibility map.
"base/11517/3767_fsm", instead of symbolic names like "1663/11517/3767/1",
per Alvaro's suggestion. I didn't change the messages in the higher-level
index, heap and FSM routines, though, where the fork is implicit.
(but not locked, as that would risk deadlocks). Also, make it work in a small
ring of buffers to avoid having bulk inserts trash the whole buffer arena.
Robert Haas, after an idea of Simon Riggs'.
allowed different processes to have different addresses for the shmem segment
in quite a long time, but there were still a few places left that used the
old coding convention. Clean them up to reduce confusion and improve the
compiler's ability to detect pointer type mismatches.
Kris Jurka
and heap_deformtuple in favor of the newer functions heap_form_tuple et al
(which do the same things but use bool control flags instead of arbitrary
char values). Eliminate the former duplicate coding of these functions,
reducing the deprecated functions to mere wrappers around the newer ones.
We can't get rid of them entirely because add-on modules probably still
contain many instances of the old coding style.
Kris Jurka
on non-full-page-image WAL records, and quite arbitrarily, only if there's
less than 20% free space on the page after the insert/update (not on HOT
updates, though). The 20% cutoff should avoid most of the overhead, when
replaying a bulk insertion, for example, while ensuring that pages that
are full are marked as full in the FSM.
This is mostly to avoid the nasty worst case scenario, where you replay
from a PITR archive, and the FSM information in the base backup is really
out of date. If there was a lot of pages that the outdated FSM claims to
have free space, but don't actually have any, the first unlucky inserter
after the recovery would traverse through all those pages, just to find
out that they're full. We didn't have this problem with the old FSM
implementation, because we simply threw the FSM information away on a
non-clean shutdown.
functions into one ReadBufferExtended function, that takes the strategy
and mode as argument. There's three modes, RBM_NORMAL which is the default
used by plain ReadBuffer(), RBM_ZERO, which replaces ZeroOrReadBuffer, and
a new mode RBM_ZERO_ON_ERROR, which allows callers to read corrupt pages
without throwing an error. The FSM needs the new mode to recover from
corrupt pages, which could happend if we crash after extending an FSM file,
and the new page is "torn".
Add fork number to some error messages in bufmgr.c, that still lacked it.
free space information is stored in a dedicated FSM relation fork, with each
relation (except for hash indexes; they don't use FSM).
This eliminates the max_fsm_relations and max_fsm_pages GUC options; remove any
trace of them from the backend, initdb, and documentation.
Rewrite contrib/pg_freespacemap to match the new FSM implementation. Also
introduce a new variant of the get_raw_page(regclass, int4, int4) function in
contrib/pageinspect that let's you to return pages from any relation fork, and
a new fsm_page_contents() function to inspect the new FSM pages.
there are FD_XACT_TEMPORARY files to clean up at transaction end.
Per performance profiling results on AWeber's huge systems.
Patch by me after an idea suggested by Simon Riggs.
of multiple forks, and each fork can be created and grown separately.
The bulk of this patch is about changing the smgr API to include an extra
ForkNumber argument in every smgr function. Also, smgrscheduleunlink and
smgrdounlink no longer implicitly call smgrclose, because other forks might
still exist after unlinking one. The callers of those functions have been
modified to call smgrclose instead.
This patch in itself doesn't have any user-visible effect, but provides the
infrastructure needed for upcoming patches. The additional forks envisioned
are a rewritten FSM implementation that doesn't rely on a fixed-size shared
memory block, and a visibility map to allow skipping portions of a table in
VACUUM that have no dead tuples.
PageHeaderIsValid when we zero the buffer instead of reading the page in.
The actual performance improvement is probably marginal since this function
isn't very heavily used, but a cycle saved is a cycle earned.
Zdenek Kotala
or target database is being accessed by other users, it tells you whether
the "other users" are live sessions or uncommitted prepared transactions.
(Indeed, it tells you exactly how many of each, but that's mostly just
because it was easy to do so.) This should help forestall the gotcha of
not realizing that a prepared transaction is what's blocking the command.
Per discussion.
rewrite. When called from SIInsertDataEntries, SICleanupQueue releases
the write lock if it has to issue a kill() to signal some laggard backend.
That still seems like a good idea --- but it's possible that by the time
we get the lock back, there are no longer enough free message slots to
satisfy SIInsertDataEntries' requirement. Must recheck, and repeat the
whole SICleanupQueue process if not. Noted while reading code.
thereby forestalling any problems with alignment of the data structure placed
there. Since SizeOfPageHeaderData is maxalign'd anyway in 8.3 and HEAD, this
does not actually change anything right now, but it is foreseeable that the
header size will change again someday. I had to fix a couple of places that
were assuming that the content offset is just SizeOfPageHeaderData rather than
MAXALIGN(SizeOfPageHeaderData). Per discussion of Zdenek's page-macros patch.
SizeOfPageHeaderData instead of sizeof(PageHeaderData) in places where that
makes the code clearer, and avoid casting between Page and PageHeader where
possible. Zdenek Kotala, with some additional cleanup by Heikki Linnakangas.
I did not apply the parts of the proposed patch that would have resulted in
slightly changing the on-disk format of hash indexes; it seems to me that's
not a win as long as there's any chance of having in-place upgrade for 8.4.
CopySnapshot, per Neil Conway. Also add a comment about the assumption in
GetSnapshotData that the argument is statically allocated.
Also, fix some more typos in comments in snapmgr.c.
backend. If so, send a LOG message to the postmaster log, and if the table
is beyond the vacuum-for-wraparound horizon, forcibly drop it. Per recent
discussions. Perhaps we ought to back-patch this, but it probably needs
to age a bit in HEAD first.
read and written without a lock. The value itself is atomic, sure, but on
processors with weak memory ordering it's possible for a reader to see the
value change before it sees the associated message written into the buffer
array. Fix by introducing a spinlock that's used just to read and write
maxMsgNum. (We could do this with less overhead if we recognized a concept
of "memory access barrier"; is it worth introducing such a thing? At the
moment probably not --- I can't measure any clear slowdown from adding the
spinlock, so this solution is probably fine.) Per buildfarm results.
unnecessary cache resets. The major changes are:
* When the queue overflows, we only issue a cache reset to the specific
backend or backends that still haven't read the oldest message, rather
than resetting everyone as in the original coding.
* When we observe backend(s) falling well behind, we signal SIGUSR1
to only one backend, the one that is furthest behind and doesn't already
have a signal outstanding for it. When it finishes catching up, it will
in turn signal SIGUSR1 to the next-furthest-back guy, if there is one that
is far enough behind to justify a signal. The PMSIGNAL_WAKEN_CHILDREN
mechanism is removed.
* We don't attempt to clean out dead messages after every message-receipt
operation; rather, we do it on the insertion side, and only when the queue
fullness passes certain thresholds.
* Split SInvalLock into SInvalReadLock and SInvalWriteLock so that readers
don't block writers nor vice versa (except during the infrequent queue
cleanout operations).
* Transfer multiple sinval messages for each acquisition of a read or
write lock.
corresponding struct definitions. This allows other headers to avoid including
certain highly-loaded headers such as rel.h and relscan.h, instead using just
relcache.h, heapam.h or genam.h, which are more lightweight and thus cause less
unnecessary dependencies.
forks. XLogOpenRelation() and the associated light-weight relation cache in
xlogutils.c is gone, and XLogReadBuffer() now takes a RelFileNode as argument,
instead of Relation.
For functions that still need a Relation struct during WAL replay, there's a
new function called CreateFakeRelcacheEntry() that returns a fake entry like
XLogOpenRelation() used to.
more logical that way, and also it reduces the amount of unnecessary includes
in bufpage.h, which is widely used.
Zdenek Kotala.
My previous patch to bufpage.h should also have credited him as author, but I
forgot (sorry about that).
There are two ways to track a snapshot: there's the "registered" list, which
is used for arbitrary long-lived snapshots; and there's the "active stack",
which is used for the snapshot that is considered "active" at any time.
This also allows users of snapshots to stop worrying about snapshot memory
allocation and freeing, and about using PG_TRY blocks around ActiveSnapshot
assignment. This is all done automatically now.
As a consequence, this allows us to reset MyProc->xmin when there are no
more snapshots registered in the current backend, reducing the impact that
long-running transactions have on VACUUM.