postgres

mirror of https://github.com/postgres/postgres.git synced 2025-11-13 16:22:44 +03:00

Author	SHA1	Message	Date
Heikki Linnakangas	58bece7a60	Fix #ifdeffed debugging code to work with relation forks.	2008-11-27 07:38:01 +00:00
Heikki Linnakangas	3396000684	Rethink the way FSM truncation works. Instead of WAL-logging FSM truncations in FSM code, call FreeSpaceMapTruncateRel from smgr_redo. To make that cleaner from modularity point of view, move the WAL-logging one level up to RelationTruncate, and move RelationTruncate and all the related WAL-logging to new src/backend/catalog/storage.c file. Introduce new RelationCreateStorage and RelationDropStorage functions that are used instead of calling smgrcreate/smgrscheduleunlink directly. Move the pending rel deletion stuff from smgrcreate/smgrscheduleunlink to the new functions. This leaves smgr.c as a thin wrapper around md.c; all the transactional stuff is now in storage.c. This will make it easier to add new forks with similar truncation logic, like the visibility map.	2008-11-19 10:34:52 +00:00
Tom Lane	cad3a26a95	Fix sloppy omission of now-required #include's.	2008-11-11 14:17:02 +00:00
Heikki Linnakangas	7e8b0b9ab1	Change error messages to print the physical path, like "base/11517/3767_fsm", instead of symbolic names like "1663/11517/3767/1", per Alvaro's suggestion. I didn't change the messages in the higher-level index, heap and FSM routines, though, where the fork is implicit.	2008-11-11 13:19:16 +00:00
Tom Lane	85e2cedf98	Improve bulk-insert performance by keeping the current target buffer pinned (but not locked, as that would risk deadlocks). Also, make it work in a small ring of buffers to avoid having bulk inserts trash the whole buffer arena. Robert Haas, after an idea of Simon Riggs'.	2008-11-06 20:51:15 +00:00
Heikki Linnakangas	19c8dc839b	Unite ReadBufferWithFork, ReadBufferWithStrategy, and ZeroOrReadBuffer functions into one ReadBufferExtended function, that takes the strategy and mode as argument. There's three modes, RBM_NORMAL which is the default used by plain ReadBuffer(), RBM_ZERO, which replaces ZeroOrReadBuffer, and a new mode RBM_ZERO_ON_ERROR, which allows callers to read corrupt pages without throwing an error. The FSM needs the new mode to recover from corrupt pages, which could happend if we crash after extending an FSM file, and the new page is "torn". Add fork number to some error messages in bufmgr.c, that still lacked it.	2008-10-31 15:05:00 +00:00
Alvaro Herrera	089ae3bc9a	Properly access a buffer's LSN using existing access macros instead of abusing knowledge of page layout. Stolen from Jonah Harris' CRC patch	2008-10-20 21:11:15 +00:00
Tom Lane	35c2a3c3cf	Allow ShowBufferUsage() to report the number of reads/writes that have occurred to temporary files. This replaces the unused NDirectFileRead/NDirectFileWrite counters. Itagaki Takahiro	2008-09-17 13:15:55 +00:00
Heikki Linnakangas	3f0e808c4a	Introduce the concept of relation forks. An smgr relation can now consist of multiple forks, and each fork can be created and grown separately. The bulk of this patch is about changing the smgr API to include an extra ForkNumber argument in every smgr function. Also, smgrscheduleunlink and smgrdounlink no longer implicitly call smgrclose, because other forks might still exist after unlinking one. The callers of those functions have been modified to call smgrclose instead. This patch in itself doesn't have any user-visible effect, but provides the infrastructure needed for upcoming patches. The additional forks envisioned are a rewritten FSM implementation that doesn't rely on a fixed-size shared memory block, and a visibility map to allow skipping portions of a table in VACUUM that have no dead tuples.	2008-08-11 11:05:11 +00:00
Tom Lane	d8b04d5fac	In ReadOrZeroBuffer (and related entry points), don't bother to call PageHeaderIsValid when we zero the buffer instead of reading the page in. The actual performance improvement is probably marginal since this function isn't very heavily used, but a cycle saved is a cycle earned. Zdenek Kotala	2008-08-05 15:09:04 +00:00
Alvaro Herrera	e36e6b1cab	Add a few more DTrace probes to the backend. Robert Lor	2008-08-01 13:16:09 +00:00
Tom Lane	9d035f4254	Clean up the use of some page-header-access macros: principally, use SizeOfPageHeaderData instead of sizeof(PageHeaderData) in places where that makes the code clearer, and avoid casting between Page and PageHeader where possible. Zdenek Kotala, with some additional cleanup by Heikki Linnakangas. I did not apply the parts of the proposed patch that would have resulted in slightly changing the on-disk format of hash indexes; it seems to me that's not a win as long as there's any chance of having in-place upgrade for 8.4.	2008-07-13 20:45:47 +00:00
Alvaro Herrera	a3540b0f65	Improve our #include situation by moving pointer types away from the corresponding struct definitions. This allows other headers to avoid including certain highly-loaded headers such as rel.h and relscan.h, instead using just relcache.h, heapam.h or genam.h, which are more lightweight and thus cause less unnecessary dependencies.	2008-06-19 00:46:06 +00:00
Heikki Linnakangas	a213f1ee6c	Refactor XLogOpenRelation() and XLogReadBuffer() in preparation for relation forks. XLogOpenRelation() and the associated light-weight relation cache in xlogutils.c is gone, and XLogReadBuffer() now takes a RelFileNode as argument, instead of Relation. For functions that still need a Relation struct during WAL replay, there's a new function called CreateFakeRelcacheEntry() that returns a fake entry like XLogOpenRelation() used to.	2008-06-12 09:12:31 +00:00
Alvaro Herrera	cc87402d6e	Move BufferGetPageSize and BufferGetPage from bufpage.h to bufmgr.h. It is more logical that way, and also it reduces the amount of unnecessary includes in bufpage.h, which is widely used. Zdenek Kotala. My previous patch to bufpage.h should also have credited him as author, but I forgot (sorry about that).	2008-06-08 22:00:48 +00:00
Alvaro Herrera	9084399782	Put back bufmgr.h in bufpage.h -- it is needed by some macros. Remove #include bufmgr.h from (most?) source files which already include bufpage.h.	2008-05-12 16:06:10 +00:00
Alvaro Herrera	f8c4d7db60	Restructure some header files a bit, in particular heapam.h, by removing some unnecessary #include lines in it. Also, move some tuple routine prototypes and macros to htup.h, which allows removal of heapam.h inclusion from some .c files. For this to work, a new header file access/sysattr.h needed to be created, initially containing attribute numbers of system columns, for pg_dump usage. While at it, make contrib ltree, intarray and hstore header files more consistent with our header style.	2008-05-12 00:00:54 +00:00
Bruce Momjian	fca9fff41b	More README src cleanups.	2008-03-21 13:23:29 +00:00
Bruce Momjian	4e228447aa	Make source code READMEs more consistent. Add CVS tags to all README files.	2008-03-20 17:55:15 +00:00
Peter Eisentraut	0474dcb608	Refactor backend makefiles to remove lots of duplicate code	2008-02-19 10:30:09 +00:00
Bruce Momjian	9098ab9e32	Update copyrights in source tree to 2008.	2008-01-01 19:46:01 +00:00
Bruce Momjian	f6e8730d11	Re-run pgindent with updated list of typedefs. (Updated README should avoid this problem in the future.)	2007-11-15 22:25:18 +00:00
Bruce Momjian	fdf5a5efb7	pgindent run for 8.3.	2007-11-15 21:14:46 +00:00
Tom Lane	7a315a09dc	Dept. of second thoughts: fix loop in BgBufferSync so that the exit when bgwriter_lru_maxpages is exceeded leaves the loop variables in the expected state. In the original coding, we'd fail to advance next_to_clean, causing that buffer to be probably-uselessly rechecked next time, and also have an off-by-one idea of the number of buffers scanned.	2007-09-25 22:11:48 +00:00
Tom Lane	6f5c38dcd0	Just-in-time background writing strategy. This code avoids re-scanning buffers that cannot possibly need to be cleaned, and estimates how many buffers it should try to clean based on moving averages of recent allocation requests and density of reusable buffers. The patch also adds a couple more columns to pg_stat_bgwriter to help measure the effectiveness of the bgwriter. Greg Smith, building on his own work and ideas from several other people, in particular a much older patch from Itagaki Takahiro.	2007-09-25 20:03:38 +00:00
Tom Lane	282d2a03dd	HOT updates. When we update a tuple without changing any of its indexed columns, and the new version can be stored on the same heap page, we no longer generate extra index entries for the new version. Instead, index searches follow the HOT-chain links to ensure they find the correct tuple version. In addition, this patch introduces the ability to "prune" dead tuples on a per-page basis, without having to do a complete VACUUM pass to recover space. VACUUM is still needed to clean up dead index entries, however. Pavan Deolasee, with help from a bunch of other people.	2007-09-20 17:56:33 +00:00
Tom Lane	9fc25c0511	Improve logging of checkpoints. Patch by Greg Smith, worked over by Heikki and a little bit by me.	2007-06-30 19:12:02 +00:00
Tom Lane	867e2c91a0	Implement "distributed" checkpoints in which the checkpoint I/O is spread over a fairly long period of time, rather than being spat out in a burst. This happens only for background checkpoints carried out by the bgwriter; other cases, such as a shutdown checkpoint, are still done at full speed. Remove the "all buffers" scan in the bgwriter, and associated stats infrastructure, since this seems no longer very useful when the checkpoint itself is properly throttled. Original patch by Itagaki Takahiro, reworked by Heikki Linnakangas, and some minor API editorialization by me.	2007-06-28 00:02:40 +00:00
Tom Lane	de6a6383a7	Update obsolete comment: it's no longer the case that mdread() will allow reads beyond EOF, except by special coercion.	2007-06-18 00:47:20 +00:00
Tom Lane	a04a423599	Arrange for large sequential scans to synchronize with each other, so that when multiple backends are scanning the same relation concurrently, each page is (ideally) read only once. Jeff Davis, with review by Heikki and Tom.	2007-06-08 18:23:53 +00:00
Tom Lane	d526575f89	Make large sequential scans and VACUUMs work in a limited-size "ring" of buffers, rather than blowing out the whole shared-buffer arena. Aside from avoiding cache spoliation, this fixes the problem that VACUUM formerly tended to cause a WAL flush for every page it modified, because we had it hacked to use only a single buffer. Those flushes will now occur only once per ring-ful. The exact ring size, and the threshold for seqscans to switch into the ring usage pattern, remain under debate; but the infrastructure seems done. The key bit of infrastructure is a new optional BufferAccessStrategy object that can be passed to ReadBuffer operations; this replaces the former StrategyHintVacuum API. This patch also changes the buffer usage-count methodology a bit: we now advance usage_count when first pinning a buffer, rather than when last unpinning it. To preserve the behavior that a buffer's lifetime starts to decrease when it's released, the clock sweep code is modified to not decrement usage_count of pinned buffers. Work not done in this commit: teach GiST and GIN indexes to use the vacuum BufferAccessStrategy for vacuum-driven fetches. Original patch by Simon, reworked by Heikki and again by Tom.	2007-05-30 20:12:03 +00:00
Tom Lane	77947c51c0	Fix up pgstats counting of live and dead tuples to recognize that committed and aborted transactions have different effects; also teach it not to assume that prepared transactions are always committed. Along the way, simplify the pgstats API by tying counting directly to Relations; I cannot detect any redeeming social value in having stats pointers in HeapScanDesc and IndexScanDesc structures. And fix a few corner cases in which counts might be missed because the relation's pgstat_info pointer hadn't been set.	2007-05-27 03:50:39 +00:00
Tom Lane	63735ca815	Dept. of second thoughts: add comments cautioning against using ReadOrZeroBuffer to fetch pages from beyond physical EOF. This would usually work, but would cause problems for md.c if writes occurred beyond a segment boundary when the previous segment file hadn't been fully extended.	2007-05-02 23:34:48 +00:00
Tom Lane	8c3cc86e7b	During WAL recovery, when reading a page that we intend to overwrite completely from the WAL data, don't bother to physically read it; just have bufmgr.c return a zeroed-out buffer instead. This speeds recovery significantly, and also avoids unnecessary failures when a page-to-be-overwritten has corrupt page headers on disk. This replaces a former kluge that accomplished the latter by pretending zero_damaged_pages was always ON during WAL recovery; which was OK when the kluge was put in, but is unsafe when restoring a WAL log that was written with full_page_writes off. Heikki Linnakangas	2007-05-02 23:18:03 +00:00
Magnus Hagander	335feca441	Add some instrumentation to the bgwriter, through the stats collector. New view pg_stat_bgwriter, and the functions required to build it.	2007-03-30 18:34:56 +00:00
Bruce Momjian	8b4ff8b6a1	Wording cleanup for error messages. Also change can't -> cannot. Standard English uses "may", "can", and "might" in different ways: may - permission, "You may borrow my rake." can - ability, "I can lift that log." might - possibility, "It might rain today." Unfortunately, in conversational English, their use is often mixed, as in, "You may use this variable to do X", when in fact, "can" is a better choice. Similarly, "It may crash" is better stated, "It might crash".	2007-02-01 19:10:30 +00:00
Peter Eisentraut	2cc01004c6	Remove remains of old depend target.	2007-01-20 17:16:17 +00:00
Bruce Momjian	29dccf5fe0	Update CVS HEAD for 2007 copyright. Back branches are typically not back-stamped for this.	2007-01-05 22:20:05 +00:00
Tom Lane	72619f8191	Modify local buffer management to request memory for local buffers in blocks of increasing size, instead of one at a time. This reduces the memory management overhead when num_temp_buffers is large: in the previous coding we would actually waste 50% of the space used for temp buffers, because aset.c would round the individual requests up to 16K. Problem noted while studying a performance issue reported by Steven Flatt. Back-patch as far as 8.1 --- older versions used few enough local buffers that the issue isn't significant for them.	2006-12-27 22:31:54 +00:00
Tom Lane	954c1813ac	Remove an unnecessary HOLD_INTERRUPTS/RESUME_INTERRUPTS pair. This was required back when RESUME_INTERRUPTS could actually execute ProcessInterrupts, but that hasn't been true since 2001...	2006-10-22 20:34:54 +00:00
Bruce Momjian	f99a569a2e	pgindent run for 8.2.	2006-10-04 00:30:14 +00:00
Tom Lane	ffae5cc5a6	Add a check to prevent overwriting valid data if smgrnblocks() gives a wrong answer, as has been seen to occur with a buggy Linux kernel. Not really our bug, but it's a simple test in a seldom-used control path, so might as well have a defense.	2006-09-25 22:01:10 +00:00
Tom Lane	2e5e856f6b	Marginal cleanup in arrangements for ensuring StrategyHintVacuum is cleared after an error during VACUUM. We have a PG_TRY block anyway around the only call sites, so just reset it in the CATCH clause instead of having AtEOXact_Buffers blindly do it during xact end. I think the old code was actively wrong for the case of a failure during ANALYZE inside a subtransaction --- the flag wouldn't get cleared until main transaction end. Probably not worth back-patching though.	2006-09-17 22:16:22 +00:00
Tom Lane	b25dc481c8	Fix oversight in sizing of shared buffer lookup hashtable. Because BufferAlloc tries to insert a new mapping entry before deleting the old one for a buffer, we have a transient need for more than NBuffers entries --- one more in 8.1, and as many as NUM_BUFFER_PARTITIONS more in CVS HEAD. In theory this could lead to an "out of shared memory" failure if shmem had already been completely claimed by the time the extra entries were needed.	2006-07-23 18:34:45 +00:00
Tom Lane	10b9ca3d05	Split the buffer mapping table into multiple separately lockable partitions, as per discussion. Passes functionality checks, but I don't have any performance data yet.	2006-07-23 03:07:58 +00:00
Tom Lane	cd24163f6d	Fix another passel of include-file breakage. Kris Jurka, Tom Lane	2006-07-14 16:59:19 +00:00
Bruce Momjian	e0522505bd	Remove 576 references of include files that were not needed.	2006-07-14 14:52:27 +00:00
Tom Lane	8ff80c1bd3	Remove obsolete comment about VACUUM FULL: it takes buffer content locks now, and must do so to ensure bgwriter doesn't write a page that is in process of being compacted.	2006-06-08 14:58:33 +00:00
Tom Lane	0fcc3c2f1d	Repair a low-probability race condition identified by Qingqing Zhou. If a process abandons a wait in LockBufferForCleanup (in practice, only happens if someone cancels a VACUUM) just before someone else sends it a signal indicating the buffer is available, it was possible for the wakeup to remain in the process' semaphore, causing misbehavior next time the process waited for an lmgr lock. Rather than try to prevent the race condition directly, it seems best to make the lock manager robust against leftover wakeups, by having it repeat waiting on the semaphore if the lock has not actually been granted or denied yet.	2006-04-14 03:38:56 +00:00
Tom Lane	a8b8f4db23	Clean up WAL/buffer interactions as per my recent proposal. Get rid of the misleadingly-named WriteBuffer routine, and instead require routines that change buffer pages to call MarkBufferDirty (which does exactly what it says). We also require that they do so before calling XLogInsert; this takes care of the synchronization requirement documented in SyncOneBuffer. Note that because bufmgr takes the buffer content lock (in shared mode) while writing out any buffer, it doesn't matter whether MarkBufferDirty is executed before the buffer content change is complete, so long as the content change is completed before releasing exclusive lock on the buffer. So it's OK to set the dirtybit before we fill in the LSN. This eliminates the former kluge of needing to set the dirtybit in LockBuffer. Aside from making the code more transparent, we can also add some new debugging assertions, in particular that the caller of MarkBufferDirty must hold the buffer content lock, not merely a pin.	2006-03-31 23:32:07 +00:00

1 2 3 4 5 ...

338 Commits