http://archives.postgresql.org/pgsql-hackers/2004-10/msg00464.php.
This fix is intended to be permanent: it moves the responsibility for
calling SetBufferCommitInfoNeedsSave() into the tqual.c routines,
eliminating the requirement for callers to test whether t_infomask changed.
Also, tighten validity checking on buffer IDs in bufmgr.c --- several
routines were paranoid about out-of-range shared buffer numbers but not
about out-of-range local ones, which seems a tad pointless.
keep track of portal-related resources separately from transaction-related
resources. This allows cursors to work in a somewhat sane fashion with
nested transactions. For now, cursor behavior is non-subtransactional,
that is a cursor's state does not roll back if you abort a subtransaction
that fetched from the cursor. We might want to change that later.
performance front, but with feature freeze upon us I think it's time to
drive a stake in the ground and say that this will be in 7.5.
Alvaro Herrera, with some help from Tom Lane.
There are various things left to do: contrib dbsize and oid2name modules
need work, and so does the documentation. Also someone should think about
COMMENT ON TABLESPACE and maybe RENAME TABLESPACE. Also initlocation is
dead, it just doesn't know it yet.
Gavin Sherry and Tom Lane.
not holding the buffer's cntx_lock or io_in_progress_lock. A recent
report from Litao Wu makes me wonder whether it is ever possible for
us to drop a buffer and forget to release its cntx_lock. The Assert
does not fire in the regression tests, but that proves little ...
temp tables, and avoid WAL-logging truncations of temp tables. Do issue
fsync on truncated files (not sure this is necessary but it seems like
a good idea).
rather than an error code, and does elog(ERROR) not elog(WARNING)
when it detects a problem. All callers were simply elog(ERROR)'ing on
failure return anyway, and I find it hard to envision a caller that would
not, so we may as well simplify the callers and produce the more useful
error message directly.
explicitly fsync'ing every (non-temp) file we have written since the
last checkpoint. In the vast majority of cases, the burden of the
fsyncs should fall on the bgwriter process not on backends. (To this
end, we assume that an fsync issued by the bgwriter will force out
blocks written to the same file by other processes using other file
descriptors. Anyone have a problem with that?) This makes the world
safe for WIN32, which ain't even got sync(2), and really makes the world
safe for Unixen as well, because sync(2) never had the semantics we need:
it offers no way to wait for the requested I/O to finish.
Along the way, fix a bug I recently introduced in xlog recovery:
file truncation replay failed to clear bufmgr buffers for the dropped
blocks, which could result in 'PANIC: heap_delete_redo: no block'
later on in xlog replay.
than being random pieces of other files. Give bgwriter responsibility
for all checkpoint activity (other than a post-recovery checkpoint);
so this child process absorbs the functionality of the former transient
checkpoint and shutdown subprocesses. While at it, create an actual
include file for postmaster.c, which for some reason never had its own
file before.
costing us lots more to maintain than it was worth. On shared tables
it was of exactly zero benefit because we couldn't trust it to be
up to date. On temp tables it sometimes saved an lseek, but not often
enough to be worth getting excited about. And the real problem was that
we forced an lseek on every relcache flush in order to update the field.
So all in all it seems best to lose the complexity.
of whether we have successfully read data into a buffer; this makes the
error behavior a bit more transparent (IMHO anyway), and also makes it
work correctly for local buffers which don't use Start/TerminateBufferIO.
Collapse three separate functions for writing a shared buffer into one.
This overlaps a bit with cleanups that Neil proposed awhile back, but
seems not to have committed yet.
of VACUUM cases so that VACUUM requests don't affect the ARC state at all,
avoid corner case where BufferSync would uselessly rewrite a buffer that
no longer contains the page that was to be flushed. Make some minor
other cleanups in and around the bufmgr as well, such as moving PinBuffer
and UnpinBuffer into bufmgr.c where they really belong.
for already empty buffers because their buffer tag was not cleard out
when the buffers have been invalidated before.
Also removed the misnamed BM_FREE bufhdr flag and replaced the checks,
which effectively ask if the buffer is unpinned, with checks against the
refcount field.
Jan
subroutine in src/port/pgsleep.c. Remove platform dependencies from
miscadmin.h and put them in port.h where they belong. Extend recent
vacuum cost-based-delay patch to apply to VACUUM FULL, ANALYZE, and
non-btree index vacuuming.
By the way, where is the documentation for the cost-based-delay patch?
the relcache, and so the notion of 'blind write' is gone. This should
improve efficiency in bgwriter and background checkpoint processes.
Internal restructuring in md.c to remove the not-very-useful array of
MdfdVec objects --- might as well just use pointers.
Also remove the long-dead 'persistent main memory' storage manager (mm.c),
since it seems quite unlikely to ever get resurrected.
done by the background writer between writing dirty blocks and
napping.
none (default) no action
sync bgwriter calls smgrsync() causing a sync(2)
A global sync() is only good on dedicated database servers, so
more flush methods should be added in the future.
Jan
pointer type when it is not necessary to do so.
For future reference, casting NULL to a pointer type is only necessary
when (a) invoking a function AND either (b) the function has no prototype
OR (c) the function is a varargs function.
some concurrent changes Jan was making to the bufmgr. Here's an
updated version of the patch -- it should apply cleanly to CVS
HEAD and passes the regression tests.
This patch makes the following changes:
- remove the UnlockAndReleaseBuffer() and UnlockAndWriteBuffer()
macros, and replace uses of them with calls to the appropriate
functions.
- remove a bunch of #ifdef BMTRACE code: it is ugly & broken
(i.e. it doesn't compile)
- make BufferReplace() return a bool, not an int
- cleanup some logic in bufmgr.c; should be functionality
equivalent to the previous code, just cleaner now
- remove the BM_PRIVATE flag as it is unused
- improve a few comments, etc.
This first part of the background writer does no syncing at all.
It's only purpose is to keep the LRU heads clean so that regular
backends seldom to never have to call write().
Jan
index pages: when _bt_getbuf asks the FSM for a free index page, it is
possible (and, in some cases, even moderately likely) that the answer
will be the same page that _bt_split is trying to split. _bt_getbuf
already knew that the returned page might not be free, but it wasn't
prepared for the possibility that even trying to lock the page could
be problematic. Fix by doing a conditional rather than unconditional
grab of the page lock.
detected during buffer dump to be labeled with the buffer location.
For example, if a page LSN is clobbered, we now produce something like
ERROR: XLogFlush: request 2C000000/8468EC8 is not satisfied --- flushed only
to 0/8468EF0
CONTEXT: writing block 0 of relation 428946/566240
whereas before there was no convenient way to find out which page had
been trashed.
page when it's read in, per pghackers discussion around 17-Feb. Add a
GUC variable zero_damaged_pages that causes the response to be a WARNING
followed by zeroing the page, rather than the normal ERROR; this is per
Hiroshi's suggestion that there needs to be a way to get at the data
in the rest of the table.
longer works -- IncrHeapAccessStat() didn't actually *do* anything
anymore, so no reason to keep it around AFAICS. I also fixed a
grammatical error in a comment.
Neil Conway
> There's no longer a separate call to heap_storage_create in that routine
> --- the right place to make the test is now in the storage_create
> boolean parameter being passed to heap_create. A simple change, but
> it passeth patch's understanding ...
Thanks.
Attached is a patch against cvs tip as of 8:30 PM PST or so. Turned out
that even after fixing the failed hunks, there was a new spot in
bufmgr.c which needed to be fixed (related to temp relations;
RelationUpdateNumberOfBlocks). But thankfully the regression test code
caught it :-)
Joe Conway