postgres

mirror of https://github.com/postgres/postgres.git synced 2025-07-08 11:42:09 +03:00

Author	SHA1	Message	Date
Tom Lane	73b796a52c	Improve coding around the fsync request queue. In all branches back to 8.3, this patch fixes a questionable assumption in CompactCheckpointerRequestQueue/CompactBgwriterRequestQueue that there are no uninitialized pad bytes in the request queue structs. This would only cause trouble if (a) there were such pad bytes, which could happen in 8.4 and up if the compiler makes enum ForkNumber narrower than 32 bits, but otherwise would require not-currently-planned changes in the widths of other typedefs; and (b) the kernel has not uniformly initialized the contents of shared memory to zeroes. Still, it seems a tad risky, and we can easily remove any risk by pre-zeroing the request array for ourselves. In addition to that, we need to establish a coding rule that struct RelFileNode can't contain any padding bytes, since such structs are copied into the request array verbatim. (There are other places that are assuming this anyway, it turns out.) In 9.1 and up, the risk was a bit larger because we were also effectively assuming that struct RelFileNodeBackend contained no pad bytes, and with fields of different types in there, that would be much easier to break. However, there is no good reason to ever transmit fsync or delete requests for temp files to the bgwriter/checkpointer, so we can revert the request structs to plain RelFileNode, getting rid of the padding risk and saving some marginal number of bytes and cycles in fsync queue manipulation while we are at it. The savings might be more than marginal during deletion of a temp relation, because the old code transmitted an entirely useless but nonetheless expensive-to-process ForgetRelationFsync request to the background process, and also had the background process perform the file deletion even though that can safely be done immediately. In addition, make some cleanup of nearby comments and small improvements to the code in CompactCheckpointerRequestQueue/CompactBgwriterRequestQueue.	2012-07-17 16:56:54 -04:00
Bruce Momjian	927d61eeff	Run pgindent on 9.2 source tree in preparation for first 9.3 commit-fest.	2012-06-10 15:20:04 -04:00
Tom Lane	ece01aae47	Scan the buffer pool just once, not once per fork, during relation drop. This provides a speedup of about 4X when NBuffers is large enough. There is also a useful reduction in sinval traffic, since we only do CacheInvalidateSmgr() once not once per fork. Simon Riggs, reviewed and somewhat revised by Tom Lane	2012-06-07 17:43:11 -04:00
Tom Lane	e8d029a30b	Do unlocked prechecks in bufmgr.c loops that scan the whole buffer pool. DropRelFileNodeBuffers, DropDatabaseBuffers, FlushRelationBuffers, and FlushDatabaseBuffers have to scan the whole shared_buffers pool because we have no index structure that would find the target buffers any more efficiently than that. This gets expensive with large NBuffers. We can shave some cycles from these loops by prechecking to see if the current buffer is interesting before we acquire the buffer header lock. Ordinarily such a test would be unsafe, but in these cases it should be safe because we are already assuming that the caller holds a lock that prevents any new target pages from being loaded into the buffer pool concurrently. Therefore, no buffer tag should be changing to a value of interest, only away from a value of interest. So a false negative match is impossible, while a false positive is safe because we'll recheck after acquiring the buffer lock. Initial testing says that this speeds these loops by a factor of 2X to 3X on common Intel hardware. Patch for DropRelFileNodeBuffers by Jeff Janes (based on an idea of Heikki's); extended to the remaining sequential scans by Tom Lane	2012-06-07 16:46:26 -04:00
Tom Lane	6308ba05a7	Improve control logic for bgwriter hibernation mode. Commit `6d90eaaa89` added a hibernation mode to the bgwriter to reduce the server's idle-power consumption. However, its interaction with the detailed behavior of BgBufferSync's feedback control loop wasn't very well thought out. That control loop depends primarily on the rate of buffer allocation, not the rate of buffer dirtying, so the hibernation mode has to be designed to operate only when no new buffer allocations are happening. Also, the check for whether the system is effectively idle was not quite right and would fail to detect a constant low level of activity, thus allowing the bgwriter to go into hibernation mode in a way that would let the cycle time vary quite a bit, possibly further confusing the feedback loop. To fix, move the wakeup support from MarkBufferDirty and SetBufferCommitInfoNeedsSave into StrategyGetBuffer, and prevent the bgwriter from entering hibernation mode unless no buffer allocations have happened recently. In addition, fix the delaying logic to remove the problem of possibly not responding to signals promptly, which was basically caused by trying to use the process latch's is_set flag for multiple purposes. I can't prove it but I'm suspicious that that hack was responsible for the intermittent "postmaster does not shut down" failures we've been seeing in the buildfarm lately. In any case it did nothing to improve the readability or robustness of the code. In passing, express the hibernation sleep time as a multiplier on BgWriterDelay, not a constant. I'm not sure whether there's any value in exposing the longer sleep time as an independently configurable setting, but we can at least make it act like this for little extra code.	2012-05-09 23:37:10 -04:00
Tom Lane	1dd89eadcd	Rename I/O timing statistics columns to blk_read_time and blk_write_time. This seems more consistent with the pre-existing choices for names of other statistics columns. Rename assorted internal identifiers to match.	2012-04-29 18:13:33 -04:00
Tom Lane	309c64745e	Rename track_iotiming GUC to track_io_timing. This spelling seems significantly more readable to me.	2012-04-29 16:23:54 -04:00
Robert Haas	ab77b2da8b	Fix incorrect comment in SetBufferCommitInfoNeedsSave(). Noah Misch spotted the fact that the old comment is in fact incorrect, due to memory ordering hazards.	2012-04-18 10:55:40 -04:00
Robert Haas	644828908f	Expose track_iotiming data via the statistics collector. Ants Aasma's original patch to add timing information for buffer I/O requests exposed this data at the relation level, which was judged too costly. I've here exposed it at the database level instead.	2012-04-05 11:40:24 -04:00
Robert Haas	40b9b95769	New GUC, track_iotiming, to track I/O timings. Currently, the only way to see the numbers this gathers is via EXPLAIN (ANALYZE, BUFFERS), but the plan is to add visibility through the stats collector and pg_stat_statements in subsequent patches. Ants Aasma, reviewed by Greg Smith, with some further changes by me.	2012-03-27 14:55:02 -04:00
Robert Haas	2254367435	Make EXPLAIN (BUFFERS) track blocks dirtied, as well as those written. Also expose the new counters through pg_stat_statements. Patch by me. Review by Fujii Masao and Greg Smith.	2012-02-22 20:33:05 -05:00
Heikki Linnakangas	6d90eaaa89	Make bgwriter sleep longer when it has no work to do, to save electricity. To make it wake up promptly when activity starts again, backends nudge it by setting a latch in MarkBufferDirty(). The latch is kept set while bgwriter is active, so there is very little overhead from that when the system is busy. It is only armed before going into longer sleep. Peter Geoghegan, with some changes by me.	2012-01-26 18:39:13 +02:00
Robert Haas	7e4911b2ae	Fix variable confusion in BufferSync(). As noted by Heikki Linnakangas, the previous coding confused the "flags" variable with the "mask" variable. The affect of this appears to be that unlogged buffers would get written out at every checkpoint rather than only at shutdown time. Although that's arguably an acceptable failure mode, I'm back-patching this change, since it seems like a poor idea to rely on this happening to work.	2012-01-06 08:35:48 -05:00
Bruce Momjian	e126958c2e	Update copyright notices for year 2012.	2012-01-01 18:01:58 -05:00
Alvaro Herrera	9d3b502443	Improve logging of autovacuum I/O activity This adds some I/O stats to the logging of autovacuum (when the operation takes long enough that log_autovacuum_min_duration causes it to be logged), so that it is easier to tune. Notably, it adds buffer I/O counts (hits, misses, dirtied) and read and write rate. Authors: Greg Smith and Noah Misch	2011-11-25 16:34:32 -03:00
Tom Lane	40d35036bb	Avoid floating-point underflow while tracking buffer allocation rate. When the system is idle for awhile after activity, the "smoothed_alloc" state variable in BgBufferSync converges slowly to zero. With standard IEEE float arithmetic this results in several iterations with denormalized values, which causes kernel traps and annoying log messages on some poorly-designed platforms. There's no real need to track such small values of smoothed_alloc, so we can prevent the kernel traps by forcing it to zero as soon as it's too small to be interesting for our purposes. This issue is purely cosmetic, since the iterations don't happen fast enough for the kernel traps to pose any meaningful performance problem, but still it seems worth shutting up the log messages. The kernel log messages were previously reported by a number of people, but kudos to Greg Matthews for tracking down exactly where they were coming from.	2011-11-19 00:35:29 -05:00
Simon Riggs	806a2aee37	Split work of bgwriter between 2 processes: bgwriter and checkpointer. bgwriter is now a much less important process, responsible for page cleaning duties only. checkpointer is now responsible for checkpoints and so has a key role in shutdown. Later patches will correct doc references to the now old idea that bgwriter performs checkpoints. Has beneficial effect on performance at high write rates, but mainly refactoring to more easily allow changes for power reduction by simplifying previously tortuous code around required to allow page cleaning and checkpointing to time slice in the same process. Patch by me, Review by Dickson Guedes	2011-11-01 17:14:47 +00:00
Robert Haas	53f1ca59b5	Allow hint bits to be set sooner for temporary and unlogged tables. We need not wait until the commit record is durably on disk, because in the event of a crash the page we're updating with hint bits will be gone anyway. Per off-list report from Heikki Linnakangas, this can significantly degrade the performance of unlogged tables; I was able to show a 2x speedup from this patch on a pgbench run with scale factor 15. In practice, this will mostly help small, heavily updated tables, because on larger tables you're unlikely to run into the same row again before the commit record makes it out to disk.	2011-10-28 17:08:09 -04:00
Tom Lane	a7801b62f2	Move Timestamp/Interval typedefs and basic macros into datatype/timestamp.h. As per my recent proposal, this refactors things so that these typedefs and macros are available in a header that can be included in frontend-ish code. I also changed various headers that were undesirably including utils/timestamp.h to include datatype/timestamp.h instead. Unsurprisingly, this showed that half the system was getting utils/timestamp.h by way of xlog.h. No actual code changes here, just header refactoring.	2011-09-09 13:23:41 -04:00
Tom Lane	2e53bd5517	Fix incorrect initialization of ProcGlobal->startupBufferPinWaitBufId. It was initialized in the wrong place and to the wrong value. With bad luck this could result in incorrect query-cancellation failures in hot standby sessions, should a HS backend be holding pin on buffer number 1 while trying to acquire a lock.	2011-08-02 13:23:52 -04:00
Peter Eisentraut	21f1e15aaf	Unify spelling of "canceled", "canceling", "cancellation" We had previously (`af26857a27`) established the U.S. spellings as standard.	2011-06-29 09:28:46 +03:00
Peter Eisentraut	8a8fbe7e79	Capitalization fixes	2011-06-19 00:37:30 +03:00
Alvaro Herrera	fba105b109	Use "transient" files for blind writes, take 2 "Blind writes" are a mechanism to push buffers down to disk when evicting them; since they may belong to different databases than the one a backend is connected to, the backend does not necessarily have a relation to link them to, and thus no way to blow them away. We were keeping those files open indefinitely, which would cause a problem if the underlying table was deleted, because the operating system would not be able to reclaim the disk space used by those files. To fix, have bufmgr mark such files as transient to smgr; the lower layer is allowed to close the file descriptor when the current transaction ends. We must be careful to have any other access of the file to remove the transient markings, to prevent unnecessary expensive system calls when evicting buffers belonging to our own database (which files we're likely to require again soon.) This commit fixes a bug in the previous one, which neglected to cleanly handle the LRU ring that fd.c uses to manage open files, and caused an unacceptable failure just before beta2 and was thus reverted.	2011-06-10 13:43:02 -04:00
Alvaro Herrera	9261557eb1	Revert "Use "transient" files for blind writes" This reverts commit `54d9e8c6c1`, which caused a failure on the buildfarm. Not a good thing to have just before a beta release.	2011-06-09 16:41:44 -04:00
Alvaro Herrera	54d9e8c6c1	Use "transient" files for blind writes "Blind writes" are a mechanism to push buffers down to disk when evicting them; since they may belong to different databases than the one a backend is connected to, the backend does not necessarily have a relation to link them to, and thus no way to blow them away. We were keeping those files open indefinitely, which would cause a problem if the underlying table was deleted, because the operating system would not be able to reclaim the disk space used by those files. To fix, have bufmgr mark such files as transient to smgr; the lower layer is allowed to close the file descriptor when the current transaction ends. We must be careful to have any other access of the file to remove the transient markings, to prevent unnecessary expensive system calls when evicting buffers belonging to our own database (which files we're likely to require again soon.)	2011-06-09 16:25:49 -04:00
Bruce Momjian	bf50caf105	pgindent run before PG 9.1 beta 1.	2011-04-10 11:42:00 -04:00
Bruce Momjian	5d950e3b0c	Stamp copyrights for year 2011.	2011-01-01 13:18:15 -05:00
Robert Haas	53dbc27c62	Support unlogged tables. The contents of an unlogged table are WAL-logged; thus, they are not available on standby servers and are truncated whenever the database system enters recovery. Indexes on unlogged tables are also unlogged. Unlogged GiST indexes are not currently supported.	2010-12-29 06:48:53 -05:00
Robert Haas	5f7b58fad8	Generalize concept of temporary relations to "relation persistence". This commit replaces pg_class.relistemp with pg_class.relpersistence; and also modifies the RangeVar node type to carry relpersistence rather than istemp. It also removes removes rd_istemp from RelationData and instead performs the correct computation based on relpersistence. For clarity, we add three new macros: RelationNeedsWAL(), RelationUsesLocalBuffers(), and RelationUsesTempNamespace(), so that we can clarify the purpose of each check that previous depended on rd_istemp. This is intended as infrastructure for the upcoming unlogged tables patch, as well as for future possible work on global temporary tables.	2010-12-13 12:34:26 -05:00
Robert Haas	c2281ac87c	Remove belt-and-suspenders guards against buffer pin leaks. Forcibly releasing all leftover buffer pins should be unnecessary now that we have a robust ResourceOwner mechanism, and it significantly increases the cost of process shutdown. Instead, in an assert-enabled build, assert that no pins are held; in a non-assert-enabled build, do nothing.	2010-11-25 00:06:46 -05:00
Magnus Hagander	9f2e211386	Remove cvs keywords from all files.	2010-09-20 22:08:53 +02:00
Robert Haas	a481ff71af	Remove the isLocalBuf argument from ReadBuffer_common. Since an SMgrRelation now knows whether or not the underlying relation is temporary, there's no point in also passing that information via an additional argument.	2010-08-20 01:07:50 +00:00
Robert Haas	27f145a40e	Further dtrace adjustments for the backend-IDs-in-relpath patch. Update the documentation, and back out a few ill-considered changes whose folly I failed to realize for failure to read the documentation.	2010-08-14 02:22:10 +00:00
Robert Haas	105d4c5ffe	Fix assorted dtrace breakage caused by patch to include backend IDs in temp relpaths. Per buildfarm.	2010-08-13 22:54:17 +00:00
Robert Haas	debcec7dc3	Include the backend ID in the relpath of temporary relations. This allows us to reliably remove all leftover temporary relation files on cluster startup without reference to system catalogs or WAL; therefore, we no longer include temporary relations in XLOG_XACT_COMMIT and XLOG_XACT_ABORT WAL records. Since these changes require including a backend ID in each SharedInvalSmgrMsg, the size of the SharedInvalidationMessage.id field has been reduced from two bytes to one, and the maximum number of connections has been reduced from INT_MAX / 4 to 2^23-1. It would be possible to remove these restrictions by increasing the size of SharedInvalidationMessage by 4 bytes, but right now that doesn't seem like a good trade-off. Review by Jaime Casanova and Tom Lane.	2010-08-13 20:10:54 +00:00
Bruce Momjian	65e806cba1	pgindent run for 9.0	2010-02-26 02:01:40 +00:00
Simon Riggs	959ac58c04	In HS, Startup process sets SIGALRM when waiting for buffer pin. If woken by alarm we send SIGUSR1 to all backends requesting that they check to see if they are blocking Startup process. If so, they throw ERROR/FATAL as for other conflict resolutions. Deadlock stop gap removed. max_standby_delay = -1 option removed to prevent deadlock.	2010-01-23 16:37:12 +00:00
Bruce Momjian	0239800893	Update copyright for the year 2010.	2010-01-02 16:58:17 +00:00
Robert Haas	cddca5ec13	Add an EXPLAIN (BUFFERS) option to show buffer-usage statistics. This patch also removes buffer-usage statistics from the track_counts output, since this (or the global server statistics) is deemed to be a better interface to this information. Itagaki Takahiro, reviewed by Euler Taveira de Oliveira.	2009-12-15 04:57:48 +00:00
Bruce Momjian	d747140279	8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list provided by Andrew.	2009-06-11 14:49:15 +00:00
Tom Lane	1b2bb33a54	Add a comment documenting the question of whether PrefetchBuffer should try to protect an already-existing buffer from being evicted. This was left as an open issue when the posix_fadvise patch was committed. I'm not sure there's any evidence to justify more work in this area, but we should have some record about it in the source code.	2009-04-03 18:17:43 +00:00
Tom Lane	948d6ec90f	Modify the relcache to record the temp status of both local and nonlocal temp relations; this is no more expensive than before, now that we have pg_class.relistemp. Insert tests into bufmgr.c to prevent attempting to fetch pages from nonlocal temp relations. This provides a low-level defense against bugs-of-omission allowing temp pages to be loaded into shared buffers, as in the contrib/pgstattuple problem reported by Stuart Bishop. While at it, tweak a bunch of places to use new relcache tests (instead of expensive probes into pg_namespace) to detect local or nonlocal temp tables.	2009-03-31 22:12:48 +00:00
Tom Lane	471913a6a5	More fixes for 8.4 DTrace probes. Remove useless BUFFER_HIT/BUFFER_MISS probes --- the BUFFER_READ_DONE probe provides the same information and more besides. Expand the LOCK_WAIT_START/DONE probe arguments so that there's actually some chance of telling what is being waited for. Update and clean up the documentation.	2009-03-23 01:52:38 +00:00
Tom Lane	44023dc5f5	Add isExtend to the parameters of the buffer_read_start and buffer_read_done DTrace probes, so that ordinary reads can be distinguished from relation extension operations. Move buffer_read_start probe to before the smgrnblocks() call that's needed in the isExtend case, since really that step should be charged as part of the time needed for the extension operation. (This makes it slightly harder to match the read_start with the associated read_done, since now you can't match them on blockNumber, but it should still be possible since isExtend operations on the same relation can never be interleaved.) Per recent discussion. In passing, add the page identity (forkNum/blockNum) to the parameters of the buffer_flush_start/buffer_flush_done probes, which were unaccountably lacking the info.	2009-03-22 22:39:05 +00:00
Tom Lane	d287c9eff0	Restore previous ordering of BUFFER_FLUSH_START probe. I had wanted to make it include the time for the possible smgropen() call, but that results in a null pointer dereference :-(. An alternative solution would be to fetch the buffer tag instead of looking at *reln, but I'll just put it back as it was for the moment. BTW, this indicates that DTrace probes evaluate their arguments even when nominally inactive. What was that about "zero cost", again?	2009-03-13 17:46:21 +00:00
Tom Lane	e04810e8c4	Code review for dtrace probes added (so far) to 8.4. Adjust placement of some bufmgr probes, take out redundant and memory-leak-inducing path arguments to smgr__md__read__done and smgr__md__write__done, fix bogus attempt to recalculate space used in sort__done, clean up formatting in places where I'm not sure pgindent will do a nice job by itself.	2009-03-11 23:19:25 +00:00
Tom Lane	b7b8f0b609	Implement prefetching via posix_fadvise() for bitmap index scans. A new GUC variable effective_io_concurrency controls how many concurrent block prefetch requests will be issued. (The best way to handle this for plain index scans is still under debate, so that part is not applied yet --- tgl) Greg Stark	2009-01-12 05:10:45 +00:00
Bruce Momjian	511db38ace	Update copyright for 2009.	2009-01-01 17:24:05 +00:00
Bruce Momjian	5a90bc1fbe	The attached patch contains a couple of fixes in the existing probes and includes a few new ones. - Fixed compilation errors on OS X for probes that use typedefs - Fixed a number of probes to pass ForkNumber per the relation forks patch - The new probes are those that were taken out from the previous submitted patch and required simple fixes. Will submit the other probes that may require more discussion in a separate patch. Robert Lor	2008-12-17 01:39:04 +00:00
Heikki Linnakangas	3396000684	Rethink the way FSM truncation works. Instead of WAL-logging FSM truncations in FSM code, call FreeSpaceMapTruncateRel from smgr_redo. To make that cleaner from modularity point of view, move the WAL-logging one level up to RelationTruncate, and move RelationTruncate and all the related WAL-logging to new src/backend/catalog/storage.c file. Introduce new RelationCreateStorage and RelationDropStorage functions that are used instead of calling smgrcreate/smgrscheduleunlink directly. Move the pending rel deletion stuff from smgrcreate/smgrscheduleunlink to the new functions. This leaves smgr.c as a thin wrapper around md.c; all the transactional stuff is now in storage.c. This will make it easier to add new forks with similar truncation logic, like the visibility map.	2008-11-19 10:34:52 +00:00

... 2 3 4 5 6 ...

441 Commits