1
0
mirror of https://github.com/postgres/postgres.git synced 2025-11-13 16:22:44 +03:00
Commit Graph

338 Commits

Author SHA1 Message Date
Tom Lane
6d61cdec07 Clean up and document the API for XLogOpenRelation and XLogReadBuffer.
This commit doesn't make much functional change, but it does eliminate some
duplicated code --- for instance, PageIsNew tests are now done inside
XLogReadBuffer rather than by each caller.
The GIST xlog code still needs a lot of love, but I'll worry about that
separately.
2006-03-29 21:17:39 +00:00
Bruce Momjian
f2f5b05655 Update copyright for 2006. Update scripts. 2006-03-05 15:59:11 +00:00
Tom Lane
9a506a6257 Arrange to call AbsorbFsyncRequests every so often while performing a
checkpoint in the bgwriter.  This forestalls overflow of the fsync request
queue, which is not fatal but causes considerable performance degradation
when it occurs (because backends then have to do their own fsyncs).  Per
patch from Itagaki Takahiro, modified a little bit by me.
2006-03-03 00:02:02 +00:00
Tom Lane
304160c3e2 Fix ReadBuffer() to correctly handle the case where it's trying to extend
the relation but it finds a pre-existing valid buffer.  The buffer does not
correspond to any page known to the kernel, so we *must* do smgrextend to
ensure that the space becomes allocated.  The 7.x branches all do this
correctly, but the corner case got lost somewhere during 8.0 bufmgr rewrites.
(My fault no doubt :-( ... I think I assumed that such a buffer must be
not-BM_VALID, which is not so.)
2006-01-06 00:04:20 +00:00
Tom Lane
195f164228 Get rid of the SpinLockAcquire/SpinLockAcquire_NoHoldoff distinction
in favor of having just one set of macros that don't do HOLD/RESUME_INTERRUPTS
(hence, these correspond to the old SpinLockAcquire_NoHoldoff case).
Given our coding rules for spinlock use, there is no reason to allow
CHECK_FOR_INTERRUPTS to be done while holding a spinlock, and also there
is no situation where ImmediateInterruptOK will be true while holding a
spinlock.  Therefore doing HOLD/RESUME_INTERRUPTS while taking/releasing a
spinlock is just a waste of cycles.  Qingqing Zhou and Tom Lane.
2005-12-29 18:08:05 +00:00
Bruce Momjian
436a2956d8 Re-run pgindent, fixing a problem where comment lines after a blank
comment line where output as too long, and update typedefs for /lib
directory.  Also fix case where identifiers were used as variable names
in the backend, but as typedefs in ecpg (favor the backend for
indenting).

Backpatch to 8.1.X.
2005-11-22 18:17:34 +00:00
Tom Lane
c859308aba DropRelFileNodeBuffers failed to fix the state of the lookup hash table
that was added to localbuf.c in 8.1; therefore, applying it to a temp table
left corrupt lookup state in memory.  The only case where this had a
significant chance of causing problems was an ON COMMIT DELETE ROWS temp
table; the other possible paths left bogus state that was unlikely to
be used again.  Per report from Csaba Nagy.
2005-11-17 17:42:02 +00:00
Tom Lane
fbbe00242d Tweak buffer manager so that 'internal' accesses to a buffer do not
advance its usage_count.  This includes writes of dirty buffers triggered
by bgwriter, checkpoint, or FlushRelationBuffers, as well as various
corner cases that really ought not count as accesses to the page.
Should make for some marginal improvement in the quality of our decisions
about when to recycle buffers.  Per suggestion from ITAGAKI Takahiro.
2005-10-27 17:07:58 +00:00
Bruce Momjian
1dc3498251 Standard pgindent run for 8.1. 2005-10-15 02:49:52 +00:00
Tom Lane
07eeb9d109 Do all accesses to shared buffer headers through volatile-qualified
pointers, to ensure that compilers won't rearrange accesses to occur
while we're not holding the buffer header spinlock.  It's probably
not necessary to mark volatile in every single place in bufmgr.c,
but better safe than sorry.  Per trouble report from Kevin Grittner.
2005-10-12 16:45:14 +00:00
Tom Lane
0007490e09 Convert the arithmetic for shared memory size calculation from 'int'
to 'Size' (that is, size_t), and install overflow detection checks in it.
This allows us to remove the former arbitrary restrictions on NBuffers
etc.  It won't make any difference in a 32-bit machine, but in a 64-bit
machine you could theoretically have terabytes of shared buffers.
(How efficiently we could manage 'em remains to be seen.)  Similarly,
num_temp_buffers, work_mem, and maintenance_work_mem can be set above
2Gb on a 64-bit machine.  Original patch from Koichi Suzuki, additional
work by moi.
2005-08-20 23:26:37 +00:00
Bruce Momjian
27639809d2 Reverse out Assert addition. 2005-08-12 23:13:54 +00:00
Bruce Momjian
fab177e64f Improve documention on loading large data sets into plperl.
David Fetter
2005-08-12 21:42:53 +00:00
Tom Lane
3ae7e4a33b Remove BufferBlockPointers array in favor of a base + (bufnum) * BLCKSZ
computation.  On modern machines this is as fast if not faster, and we
don't have to clog the CPU's L2 cache with a tens-of-KB pointer array.
If we ever decide to adopt a more dynamic allocation method for shared
buffers, we'll probably have to revert this patch, but in the meantime
we might as well save a few bytes and nanoseconds.  Per Qingqing Zhou.
2005-08-12 05:05:51 +00:00
Tom Lane
15269b5955 Avoid useless loop overhead in AtEOXact routines when the backend is
compiled with USE_ASSERT_CHECKING but is running with assert_enabled false.
2005-08-08 19:44:22 +00:00
Tom Lane
7117cd3a77 Cause ShutdownPostgres to do a normal transaction abort during backend
exit, instead of trying to take shortcuts.  Introduce some additional
shutdown callback routines to eliminate kluges like having ProcKill
be responsible for shutting down the buffer manager.  Ensure that the
order of operations during shutdown is predictable and what you would
expect given the module layering.
2005-08-08 03:12:16 +00:00
Tom Lane
6eac4e69cf Tweak BgBufferSync() so that a persistent write error on a dirty buffer
doesn't block the bgwriter from making progress writing out other buffers.
This was a hard problem in the context of the ARC/2Q design, but it's
trivial in the context of clock sweep ... just advance the sweep counter
before we try to write not after.
2005-08-02 20:52:08 +00:00
Tom Lane
e92a88272e Modify hash_search() API to prevent future occurrences of the error
spotted by Qingqing Zhou.  The HASH_ENTER action now automatically
fails with elog(ERROR) on out-of-memory --- which incidentally lets
us eliminate duplicate error checks in quite a bunch of places.  If
you really need the old return-NULL-on-out-of-memory behavior, you
can ask for HASH_ENTER_NULL.  But there is now an Assert in that path
checking that you aren't hoping to get that behavior in a palloc-based
hash table.
Along the way, remove the old HASH_FIND_SAVE/HASH_REMOVE_SAVED actions,
which were not being used anywhere anymore, and were surely too ugly
and unsafe to want to see revived again.
2005-05-29 04:23:07 +00:00
Tom Lane
ee3b71f6bc Split the shared-memory array of PGPROC pointers out of the sinval
communication structure, and make it its own module with its own lock.
This should reduce contention at least a little, and it definitely makes
the code seem cleaner.  Per my recent proposal.
2005-05-19 21:35:48 +00:00
Tom Lane
354049c709 Remove unnecessary calls of FlushRelationBuffers: there is no need
to write out data that we are about to tell the filesystem to drop.
smgr_internal_unlink already had a DropRelFileNodeBuffers call to
get rid of dead buffers without a write after it's no longer possible
to roll back the deleting transaction.  Adding a similar call in
smgrtruncate simplifies callers and makes the overall division of
labor clearer.  This patch removes the former behavior that VACUUM
would write all dirty buffers of a relation unconditionally.
2005-03-20 22:00:54 +00:00
Tom Lane
91728fa26c Add temp_buffers GUC variable to allow users to determine the size
of the local buffer arena for temporary table access.
2005-03-19 23:27:11 +00:00
Tom Lane
d65522aeb6 Upgrade localbuf.c to use a hash table instead of linear search to
find already-allocated local buffers.  This is the last obstacle
in the way of setting NLocBuffer to something reasonably large.
2005-03-19 17:39:43 +00:00
Tom Lane
88164799ce Need to reset local buffer pin counts, not only shared buffer pins,
before we attempt any file deletions in ShutdownPostgres.  Per Tatsuo.
2005-03-18 16:16:09 +00:00
Tom Lane
cef01c3355 Avoid infinite loop in InvalidateBuffer if we ourselves are holding
a pin on the victim buffer.
2005-03-18 05:25:23 +00:00
Tom Lane
5d5087363d Replace the BufMgrLock with separate locks on the lookup hashtable and
the freelist, plus per-buffer spinlocks that protect access to individual
shared buffer headers.  This requires abandoning a global freelist (since
the freelist is a global contention point), which shoots down ARC and 2Q
as well as plain LRU management.  Adopt a clock sweep algorithm instead.
Preliminary results show substantial improvement in multi-backend situations.
2005-03-04 20:21:07 +00:00
Tom Lane
cc4f58f4cd Ensure that all details of the ARC algorithm are hidden within freelist.c.
This refactoring does not change any algorithms or data structures, just
remove visibility of the ARC datastructures from other source files.
2005-02-03 23:29:19 +00:00
Tom Lane
0ce4d56924 Phase 1 of fix for 'SMgrRelation hashtable corrupted' problem. This
is the minimum required fix.  I want to look next at taking advantage of
it by simplifying the message semantics in the shared inval message queue,
but that part can be held over for 8.1 if it turns out too ugly.
2005-01-10 20:02:24 +00:00
Tom Lane
c9d8edc906 Repair bufmgr deadlock problem reported by Michael Wildpaner. Must take
share lock on a buffer being written out before releasing BufMgrLock in
the BufferAlloc code path; if we do it later we might block on someone
who's re-pinned the buffer.  I believe this is only an issue for BufferAlloc
and not the other places that call FlushBuffer.  BufferSync must continue
to do it the old way since it may well be trying to write buffers that
other backends have pinned; but it should not be holding any conflicting
locks.  FlushRelationBuffers is okay since it's got exclusive lock at the
relation level.
2005-01-03 18:49:41 +00:00
PostgreSQL Daemon
2ff501590b Tag appropriate files for rc3
Also performed an initial run through of upgrading our Copyright date to
extend to 2005 ... first run here was very simple ... change everything
where: grep 1996-2004 && the word 'Copyright' ... scanned through the
generated list with 'less' first, and after, to make sure that I only
picked up the right entries ...
2004-12-31 22:04:05 +00:00
Neil Conway
4acc97d7e4 Assert that BufferIsPinned() in IncrBufferRefCount(), rather than using
a home-brewed combination of assertions that boiled down to the same
thing.
2004-11-24 02:56:17 +00:00
Tom Lane
4347cc2392 Allow background writing to be shut down by setting limit values to zero.
This does not disable the bgwriter process: it still has to wake up often
enough to collect fsync requests from backends in a timely fashion.  But
it responds to the recent gripe about not being able to prevent the disk
from being spun up constantly.
2004-10-17 22:01:51 +00:00
Tom Lane
fdd13f1568 Give the ResourceOwner mechanism full responsibility for releasing buffer
pins at end of transaction, and reduce AtEOXact_Buffers to an Assert
cross-check that this was done correctly.  When not USE_ASSERT_CHECKING,
AtEOXact_Buffers is a complete no-op.  This gets rid of an O(NBuffers)
bottleneck during transaction commit/abort, which recent testing has shown
becomes significant above a few tens of thousands of shared buffers.
2004-10-16 18:57:26 +00:00
Tom Lane
1c2de47746 Remove BufferLocks[] array in favor of a single pointer to the buffer
(if any) currently waited for by LockBufferForCleanup(), which is all
that we were using it for anymore.  Saves some space and eliminates
proportional-to-NBuffers slowdown in UnlockBuffers().
2004-10-16 18:05:07 +00:00
Tom Lane
9ffc8ed58b Repair possible failure to update hint bits back to disk, per
http://archives.postgresql.org/pgsql-hackers/2004-10/msg00464.php.
This fix is intended to be permanent: it moves the responsibility for
calling SetBufferCommitInfoNeedsSave() into the tqual.c routines,
eliminating the requirement for callers to test whether t_infomask changed.
Also, tighten validity checking on buffer IDs in bufmgr.c --- several
routines were paranoid about out-of-range shared buffer numbers but not
about out-of-range local ones, which seems a tad pointless.
2004-10-15 22:40:29 +00:00
Tom Lane
8f9f198603 Restructure subtransaction handling to reduce resource consumption,
as per recent discussions.  Invent SubTransactionIds that are managed like
CommandIds (ie, counter is reset at start of each top transaction), and
use these instead of TransactionIds to keep track of subtransaction status
in those modules that need it.  This means that a subtransaction does not
need an XID unless it actually inserts/modifies rows in the database.
Accordingly, don't assign it an XID nor take a lock on the XID until it
tries to do that.  This saves a lot of overhead for subtransactions that
are only used for error recovery (eg plpgsql exceptions).  Also, arrange
to release a subtransaction's XID lock as soon as the subtransaction
exits, in both the commit and abort cases.  This avoids holding many
unique locks after a long series of subtransactions.  The price is some
additional overhead in XactLockTableWait, but that seems acceptable.
Finally, restructure the state machine in xact.c to have a more orthogonal
set of states for subtransactions.
2004-09-16 16:58:44 +00:00
Tom Lane
eb917c1a21 I can't see any good reason for DropRelFileNodeBuffers to be issuing
FATAL when it detects a nonzero reference count.  Reduce to ERROR.
2004-09-06 17:31:32 +00:00
Tom Lane
a421b4e850 FlushRelationBuffers was also being a bit cavalier about whether the
relation is already opened by smgr.
2004-08-31 16:13:06 +00:00
Bruce Momjian
b6b71b85bc Pgindent run for 8.0. 2004-08-29 05:07:03 +00:00
Bruce Momjian
da9a8649d8 Update copyright to 2004. 2004-08-29 04:13:13 +00:00
Tom Lane
fe548629c5 Invent ResourceOwner mechanism as per my recent proposal, and use it to
keep track of portal-related resources separately from transaction-related
resources.  This allows cursors to work in a somewhat sane fashion with
nested transactions.  For now, cursor behavior is non-subtransactional,
that is a cursor's state does not roll back if you abort a subtransaction
that fetched from the cursor.  We might want to change that later.
2004-07-17 03:32:14 +00:00
Tom Lane
573a71a5da Nested transactions. There is still much left to do, especially on the
performance front, but with feature freeze upon us I think it's time to
drive a stake in the ground and say that this will be in 7.5.

Alvaro Herrera, with some help from Tom Lane.
2004-07-01 00:52:04 +00:00
Tom Lane
2467394ee1 Tablespaces. Alternate database locations are dead, long live tablespaces.
There are various things left to do: contrib dbsize and oid2name modules
need work, and so does the documentation.  Also someone should think about
COMMENT ON TABLESPACE and maybe RENAME TABLESPACE.  Also initlocation is
dead, it just doesn't know it yet.

Gavin Sherry and Tom Lane.
2004-06-18 06:14:31 +00:00
Tom Lane
bbf0ebadaf StrategyDirtyBufferList wasn't being careful to honor max_buffers limit.
Bug is only latent given that sole caller is passing NBuffers, but it
could bite someone in the rear someday.
2004-06-11 17:20:39 +00:00
Tom Lane
e6cba71503 Add some code to Assert that when we release pin on a buffer, we are
not holding the buffer's cntx_lock or io_in_progress_lock.  A recent
report from Litao Wu makes me wonder whether it is ever possible for
us to drop a buffer and forget to release its cntx_lock.  The Assert
does not fire in the regression tests, but that proves little ...
2004-06-11 16:43:24 +00:00
Tom Lane
921d749bd4 Adjust our timezone library to use pg_time_t (typedef'd as int64) in
place of time_t, as per prior discussion.  The behavior does not change
on machines without a 64-bit-int type, but on machines with one, which
is most, we are rid of the bizarre boundary behavior at the edges of
the 32-bit-time_t range (1901 and 2038).  The system will now treat
times over the full supported timestamp range as being in your local
time zone.  It may seem a little bizarre to consider that times in
4000 BC are PST or EST, but this is surely at least as reasonable as
propagating Gregorian calendar rules back that far.

I did not modify the format of the zic timezone database files, which
means that for the moment the system will not know about daylight-savings
periods outside the range 1901-2038.  Given the way the files are set up,
it's not a simple decision like 'widen to 64 bits'; we have to actually
think about the range of years that need to be supported.  We should
probably inquire what the plans of the upstream zic people are before
making any decisions of our own.
2004-06-03 02:08:07 +00:00
Tom Lane
91d20ff7aa Additional mop-up for sync-to-fsync changes: avoid issuing fsyncs for
temp tables, and avoid WAL-logging truncations of temp tables.  Do issue
fsync on truncated files (not sure this is necessary but it seems like
a good idea).
2004-05-31 20:31:33 +00:00
Tom Lane
e674707968 Minor code rationalization: FlushRelationBuffers just returns void,
rather than an error code, and does elog(ERROR) not elog(WARNING)
when it detects a problem.  All callers were simply elog(ERROR)'ing on
failure return anyway, and I find it hard to envision a caller that would
not, so we may as well simplify the callers and produce the more useful
error message directly.
2004-05-31 19:24:05 +00:00
Tom Lane
9b178555fc Per previous discussions, get rid of use of sync(2) in favor of
explicitly fsync'ing every (non-temp) file we have written since the
last checkpoint.  In the vast majority of cases, the burden of the
fsyncs should fall on the bgwriter process not on backends.  (To this
end, we assume that an fsync issued by the bgwriter will force out
blocks written to the same file by other processes using other file
descriptors.  Anyone have a problem with that?)  This makes the world
safe for WIN32, which ain't even got sync(2), and really makes the world
safe for Unixen as well, because sync(2) never had the semantics we need:
it offers no way to wait for the requested I/O to finish.

Along the way, fix a bug I recently introduced in xlog recovery:
file truncation replay failed to clear bufmgr buffers for the dropped
blocks, which could result in 'PANIC:  heap_delete_redo: no block'
later on in xlog replay.
2004-05-31 03:48:10 +00:00
Tom Lane
076a055acf Separate out bgwriter code into a logically separate module, rather
than being random pieces of other files.  Give bgwriter responsibility
for all checkpoint activity (other than a post-recovery checkpoint);
so this child process absorbs the functionality of the former transient
checkpoint and shutdown subprocesses.  While at it, create an actual
include file for postmaster.c, which for some reason never had its own
file before.
2004-05-29 22:48:23 +00:00
Tom Lane
1a321f26d8 Code review for EXEC_BACKEND changes. Reduce the number of #ifdefs by
about a third, make it work on non-Windows platforms again.  (But perhaps
I broke the WIN32 code, since I have no way to test that.)  Fold all the
paths that fork postmaster child processes to go through the single
routine SubPostmasterMain, which takes care of resurrecting the state that
would normally be inherited from the postmaster (including GUC variables).
Clean up some places where there's no particularly good reason for the
EXEC and non-EXEC cases to work differently.  Take care of one or two
FIXMEs that remained in the code.
2004-05-28 05:13:32 +00:00