postgres

mirror of https://github.com/postgres/postgres.git synced 2025-11-15 03:41:20 +03:00

Author	SHA1	Message	Date
Robert Haas	4b373e42d1	Improve C comments in GetSnapshotData. Move discussion of why our algorithm for taking snapshots in recovery to a more appropriate location in the function, and delete incorrect mention of taking a lock.	2012-08-21 11:47:10 -04:00
Tom Lane	26b438694c	Only allow autovacuum to be auto-canceled by a directly blocked process. In the original coding of the autovacuum cancel feature, commit `acac68b2bc`, an autovacuum process was considered a target for cancellation if it was found to hard-block any process examined in the deadlock search. This patch tightens the test so that the autovacuum must directly hard-block the current process. This should make the behavior more predictable in general, and in particular it ensures that an autovacuum will not be canceled with less than deadlock_timeout grace period. In the old coding, it was possible for an autovacuum to be canceled almost instantly, given unfortunate timing of two or more other processes' lock attempts. This also justifies the logging methodology in the recent commit d7318d43d891bd63e82dcfc27948113ed7b1db80; without this restriction, that patch isn't providing enough information to see the connection of the canceling process to the autovacuum. Like that one, patch all the way back.	2012-07-26 14:29:22 -04:00
Robert Haas	d7318d43d8	Log a better message when canceling autovacuum. The old message was at DEBUG2, so typically it didn't show up in the log at all. As a result, in most cases where autovacuum was canceled, the only information that was logged was the table being vacuumed, with no indication as to what problem caused the cancel. Crank up the level to LOG and add some more details to assist with debugging. Back-patch all the way, per discussion on pgsql-hackers.	2012-07-26 09:19:03 -04:00
Tom Lane	2d46a57ddc	Improve copydir() code for the case that fsync is off. We should avoid calling sync_file_range or posix_fadvise in this case, since (a) we don't really care if the data gets synced, and might as well save the kernel calls; (b) at least on Linux we know that the kernel might block us until it's scheduled the write. Also, avoid making a useless second traversal of the directory tree if we're not actually going to call fsync(2) after all.	2012-07-21 20:10:29 -04:00
Tom Lane	be86e3dd5b	Rethink checkpointer's fsync-request table representation. Instead of having one hash table entry per relation/fork/segment, just have one per relation, and use bitmapsets to represent which specific segments need to be fsync'd. This eliminates the need to scan the whole hash table to implement FORGET_RELATION_FSYNC, which fixes the O(N^2) behavior recently demonstrated by Jeff Janes for cases involving lots of TRUNCATE or DROP TABLE operations during a single checkpoint cycle. Per an idea from Robert Haas. (FORGET_DATABASE_FSYNC still sucks, but since dropping a database is a pretty expensive operation anyway, we'll live with that.) In passing, improve the delayed-unlink code: remove the pass over the list in mdpreckpt, since it wasn't doing anything for us except supporting a useless Assert in mdpostckpt, and fix mdpostckpt so that it will absorb fsync requests every so often when clearing a large backlog of deletion requests.	2012-07-19 19:28:22 -04:00
Tom Lane	3072b7bade	Send only one FORGET_RELATION_FSYNC request when dropping a relation. We were sending one per fork, but a little bit of refactoring allows us to send just one request with forknum == InvalidForkNumber. This not only reduces pressure on the shared-memory request queue, but saves repeated traversals of the checkpointer's hash table.	2012-07-19 13:07:33 -04:00
Tom Lane	4a9c30a8a1	Fix management of pendingOpsTable in auxiliary processes. mdinit() was misusing IsBootstrapProcessingMode() to decide whether to create an fsync pending-operations table in the current process. This led to creating a table not only in the startup and checkpointer processes as intended, but also in the bgwriter process, not to mention other auxiliary processes such as walwriter and walreceiver. Creation of the table in the bgwriter is fatal, because it absorbs fsync requests that should have gone to the checkpointer; instead they just sit in bgwriter local memory and are never acted on. So writes performed by the bgwriter were not being fsync'd which could result in data loss after an OS crash. I think there is no live bug with respect to walwriter and walreceiver because those never perform any writes of shared buffers; but the potential is there for future breakage in those processes too. To fix, make AuxiliaryProcessMain() export the current process's AuxProcType as a global variable, and then make mdinit() test directly for the types of aux process that should have a pendingOpsTable. Having done that, we might as well also get rid of the random bool flags such as am_walreceiver that some of the aux processes had grown. (Note that we could not have fixed the bug by examining those variables in mdinit(), because it's called from BaseInit() which is run by AuxiliaryProcessMain() before entering any of the process-type-specific code.) Back-patch to 9.2, where the problem was introduced by the split-up of bgwriter and checkpointer processes. The bogus pendingOpsTable exists in walwriter and walreceiver processes in earlier branches, but absent any evidence that it causes actual problems there, I'll leave the older branches alone.	2012-07-18 15:28:10 -04:00
Tom Lane	73b796a52c	Improve coding around the fsync request queue. In all branches back to 8.3, this patch fixes a questionable assumption in CompactCheckpointerRequestQueue/CompactBgwriterRequestQueue that there are no uninitialized pad bytes in the request queue structs. This would only cause trouble if (a) there were such pad bytes, which could happen in 8.4 and up if the compiler makes enum ForkNumber narrower than 32 bits, but otherwise would require not-currently-planned changes in the widths of other typedefs; and (b) the kernel has not uniformly initialized the contents of shared memory to zeroes. Still, it seems a tad risky, and we can easily remove any risk by pre-zeroing the request array for ourselves. In addition to that, we need to establish a coding rule that struct RelFileNode can't contain any padding bytes, since such structs are copied into the request array verbatim. (There are other places that are assuming this anyway, it turns out.) In 9.1 and up, the risk was a bit larger because we were also effectively assuming that struct RelFileNodeBackend contained no pad bytes, and with fields of different types in there, that would be much easier to break. However, there is no good reason to ever transmit fsync or delete requests for temp files to the bgwriter/checkpointer, so we can revert the request structs to plain RelFileNode, getting rid of the padding risk and saving some marginal number of bytes and cycles in fsync queue manipulation while we are at it. The savings might be more than marginal during deletion of a temp relation, because the old code transmitted an entirely useless but nonetheless expensive-to-process ForgetRelationFsync request to the background process, and also had the background process perform the file deletion even though that can safely be done immediately. In addition, make some cleanup of nearby comments and small improvements to the code in CompactCheckpointerRequestQueue/CompactBgwriterRequestQueue.	2012-07-17 16:56:54 -04:00
Alvaro Herrera	f34c68f096	Introduce timeout handling framework Management of timeouts was getting a little cumbersome; what we originally had was more than enough back when we were only concerned about deadlocks and query cancel; however, when we added timeouts for standby processes, the code got considerably messier. Since there are plans to add more complex timeouts, this seems a good time to introduce a central timeout handling module. External modules register their timeout handlers during process initialization, and later enable and disable them as they see fit using a simple API; timeout.c is in charge of keeping track of which timeouts are in effect at any time, installing a common SIGALRM signal handler, and calling setitimer() as appropriate to ensure timely firing of external handlers. timeout.c additionally supports pluggable modules to add their own timeouts, though this capability isn't exercised anywhere yet. Additionally, as of this commit, walsender processes are aware of timeouts; we had a preexisting bug there that made those ignore SIGALRM, thus being subject to unhandled deadlocks, particularly during the authentication phase. This has already been fixed in back branches in commit `0bf8eb2a`, which see for more details. Main author: Zoltán Böszörményi Some review and cleanup by Álvaro Herrera Extensive reworking by Tom Lane	2012-07-16 22:55:33 -04:00
Peter Eisentraut	dd16f9480a	Remove unreachable code The Solaris Studio compiler warns about these instances, unlike more mainstream compilers such as gcc. But manual inspection showed that the code is clearly not reachable, and we hope no worthy compiler will complain about removing this code.	2012-07-16 22:15:03 +03:00
Tom Lane	b966dd6c42	Add fsync capability to initdb, and use sync_file_range() if available. Historically we have not worried about fsync'ing anything during initdb (in fact, initdb intentionally passes -F to each backend launch to prevent it from fsync'ing). But with filesystems getting more aggressive about caching data, that's not such a good plan anymore. Make initdb do a pass over the finished data directory tree to fsync everything. For testing purposes, the -N/--nosync flag can be used to restore the old behavior. Also, testing shows that on Linux, sync_file_range() is much faster than posix_fadvise() for hinting to the kernel that an fsync is coming, apparently because the latter blocks on a rather small request queue while the former doesn't. So use this function if available in initdb, and also in the backend's pg_flush_data() (where it currently will affect only the speed of CREATE DATABASE's cloning step). We will later make pg_regress invoke initdb with the --nosync flag to avoid slowing down cases such as "make check" in contrib. But let's not do so until we've shaken out any portability issues in this patch. Jeff Davis, reviewed by Andres Freund	2012-07-13 17:16:58 -04:00
Robert Haas	b79ab00144	When LWLOCK_STATS is defined, count spindelays. When LWLOCK_STATS is not defined, the only change is that SpinLockAcquire now returns the number of delays. Patch by me, review by Jeff Janes.	2012-06-26 16:06:07 -04:00
Alvaro Herrera	77ed0c6950	Tighten up includes in sinvaladt.h, twophase.h, proc.h Remove proc.h from sinvaladt.h and twophase.h; also replace xlog.h in proc.h with xlogdefs.h.	2012-06-25 18:40:40 -04:00
Heikki Linnakangas	0ab9d1c4b3	Replace XLogRecPtr struct with a 64-bit integer. This simplifies code that needs to do arithmetic on XLogRecPtrs. To avoid changing on-disk format of data pages, the LSN on data pages is still stored in the old format. That should keep pg_upgrade happy. However, we have XLogRecPtrs embedded in the control file, and in the structs that are sent over the replication protocol, so this changes breaks compatibility of pg_basebackup and server. I didn't do anything about this in this patch, per discussion on -hackers, the right thing to do would to be to change the replication protocol to be architecture-independent, so that you could use a newer version of pg_receivexlog, for example, against an older server version.	2012-06-24 19:19:45 +03:00
Heikki Linnakangas	eeb6f37d89	Add a small cache of locks owned by a resource owner in ResourceOwner. This speeds up reassigning locks to the parent owner, when the transaction holds a lot of locks, but only a few of them belong to the current resource owner. This is particularly helps pg_dump when dumping a large number of objects. The cache can hold up to 15 locks in each resource owner. After that, the cache is marked as overflowed, and we fall back to the old method of scanning the whole local lock table. The tradeoff here is that the cache has to be scanned whenever a lock is released, so if the cache is too large, lock release becomes more expensive. 15 seems enough to cover pg_dump, and doesn't have much impact on lock release. Jeff Janes, reviewed by Amit Kapila and Heikki Linnakangas.	2012-06-21 15:30:26 +03:00
Bruce Momjian	927d61eeff	Run pgindent on 9.2 source tree in preparation for first 9.3 commit-fest.	2012-06-10 15:20:04 -04:00
Simon Riggs	3725570539	Fix bug in early startup of Hot Standby with subtransactions. When HS startup is deferred because of overflowed subtransactions, ensure that we re-initialize KnownAssignedXids for when both existing and incoming snapshots have non-zero qualifying xids. Fixes bug #6661 reported by Valentine Gogichashvili. Analysis and fix by Andres Freund	2012-06-08 17:34:04 +01:00
Tom Lane	ece01aae47	Scan the buffer pool just once, not once per fork, during relation drop. This provides a speedup of about 4X when NBuffers is large enough. There is also a useful reduction in sinval traffic, since we only do CacheInvalidateSmgr() once not once per fork. Simon Riggs, reviewed and somewhat revised by Tom Lane	2012-06-07 17:43:11 -04:00
Tom Lane	e8d029a30b	Do unlocked prechecks in bufmgr.c loops that scan the whole buffer pool. DropRelFileNodeBuffers, DropDatabaseBuffers, FlushRelationBuffers, and FlushDatabaseBuffers have to scan the whole shared_buffers pool because we have no index structure that would find the target buffers any more efficiently than that. This gets expensive with large NBuffers. We can shave some cycles from these loops by prechecking to see if the current buffer is interesting before we acquire the buffer header lock. Ordinarily such a test would be unsafe, but in these cases it should be safe because we are already assuming that the caller holds a lock that prevents any new target pages from being loaded into the buffer pool concurrently. Therefore, no buffer tag should be changing to a value of interest, only away from a value of interest. So a false negative match is impossible, while a false positive is safe because we'll recheck after acquiring the buffer lock. Initial testing says that this speeds these loops by a factor of 2X to 3X on common Intel hardware. Patch for DropRelFileNodeBuffers by Jeff Janes (based on an idea of Heikki's); extended to the remaining sequential scans by Tom Lane	2012-06-07 16:46:26 -04:00
Robert Haas	07ab1383e3	Fix two more bugs in fast-path relation locking. First, the previous code failed to account for the fact that, during Hot Standby operation, the startup process takes AccessExclusiveLocks on relations without setting MyDatabaseId. This resulted in fast path strong lock counts failing to be incremented with the startup process took locks, which in turn allowed conflicting lock requests to succeed when they should not have. Report by Erik Rijkers, diagnosis by Heikki Linnakangas. Second, LockReleaseAll() failed to honor the allLocks and lockmethodid restrictions with respect to fast-path locks. It's not clear to me whether this produces any user-visible breakage at the moment, but it's certainly wrong. Rearrange order of operations in LockReleaseAll to fix. Noted by Tom Lane.	2012-05-30 16:17:46 -04:00
Robert Haas	219c024c64	Repair out-of-date information in src/backend/storage/buffer/README. In commit `d526575f89`, we changed things so that buffer usage counts are incremented when the buffer is pinned, rather than when it is unpinned, but the README file didn't get the memo. Report by Amit Kapila.	2012-05-22 09:32:09 -04:00
Heikki Linnakangas	9e4637bf89	Update comments that became out-of-date with the PGXACT struct. When the "hot" members of PGPROC were split off to separate PGXACT structs, many PGPROC fields referred to in comments were moved to PGXACT, but the comments were neglected in the commit. Mostly this is just a search/replace of PGPROC with PGXACT, but the way the dummy PGPROC entries are created for prepared transactions changed more, making some of the comments totally bogus. Noah Misch	2012-05-14 10:28:55 +03:00
Tom Lane	6308ba05a7	Improve control logic for bgwriter hibernation mode. Commit `6d90eaaa89` added a hibernation mode to the bgwriter to reduce the server's idle-power consumption. However, its interaction with the detailed behavior of BgBufferSync's feedback control loop wasn't very well thought out. That control loop depends primarily on the rate of buffer allocation, not the rate of buffer dirtying, so the hibernation mode has to be designed to operate only when no new buffer allocations are happening. Also, the check for whether the system is effectively idle was not quite right and would fail to detect a constant low level of activity, thus allowing the bgwriter to go into hibernation mode in a way that would let the cycle time vary quite a bit, possibly further confusing the feedback loop. To fix, move the wakeup support from MarkBufferDirty and SetBufferCommitInfoNeedsSave into StrategyGetBuffer, and prevent the bgwriter from entering hibernation mode unless no buffer allocations have happened recently. In addition, fix the delaying logic to remove the problem of possibly not responding to signals promptly, which was basically caused by trying to use the process latch's is_set flag for multiple purposes. I can't prove it but I'm suspicious that that hack was responsible for the intermittent "postmaster does not shut down" failures we've been seeing in the buildfarm lately. In any case it did nothing to improve the readability or robustness of the code. In passing, express the hibernation sleep time as a multiplier on BgWriterDelay, not a constant. I'm not sure whether there's any value in exposing the longer sleep time as an independently configurable setting, but we can at least make it act like this for little extra code.	2012-05-09 23:37:10 -04:00
Simon Riggs	8f28789bff	Rename BgWriterShmem/Request to CheckpointerShmem/Request	2012-05-09 14:23:45 +01:00
Tom Lane	5461564a9d	Reduce idle power consumption of walwriter and checkpointer processes. This patch modifies the walwriter process so that, when it has not found anything useful to do for many consecutive wakeup cycles, it extends its sleep time to reduce the server's idle power consumption. It reverts to normal as soon as it's done any successful flushes. It's still true that during any async commit, backends check for completed, unflushed pages of WAL and signal the walwriter if there are any; so that in practice the walwriter can get awakened and returned to normal operation sooner than the sleep time might suggest. Also, improve the checkpointer so that it uses a latch and a computed delay time to not wake up at all except when it has something to do, replacing a previous hardcoded 0.5 sec wakeup cycle. This also is primarily useful for reducing the server's power consumption when idle. In passing, get rid of the dedicated latch for signaling the walwriter in favor of using its procLatch, since that comports better with possible generic signal handlers using that latch. Also, fix a pre-existing bug with failure to save/restore errno in walwriter's signal handlers. Peter Geoghegan, somewhat simplified by Tom	2012-05-08 20:03:26 -04:00
Tom Lane	71b9549d05	Overdue code review for transaction-level advisory locks patch. Commit `62c7bd31c8` had assorted problems, most visibly that it broke PREPARE TRANSACTION in the presence of session-level advisory locks (which should be ignored by PREPARE), as per a recent complaint from Stephen Rees. More abstractly, the patch made the LockMethodData.transactional flag not merely useless but outright dangerous, because in point of fact that flag no longer tells you anything at all about whether a lock is held transactionally. This fix therefore removes that flag altogether. We now rely entirely on the convention already in use in lock.c that transactional lock holds must be owned by some ResourceOwner, while session holds are never so owned. Setting the locallock struct's owner link to NULL thus denotes a session hold, and there is no redundant marker for that. PREPARE TRANSACTION now works again when there are session-level advisory locks, and it is also able to transfer transactional advisory locks to the prepared transaction, but for implementation reasons it throws an error if we hold both types of lock on a single lockable object. Perhaps it will be worth improving that someday. Assorted other minor cleanup and documentation editing, as well. Back-patch to 9.1, except that in the 9.1 branch I did not remove the LockMethodData.transactional flag for fear of causing an ABI break for any external code that might be examining those structs.	2012-05-04 17:44:31 -04:00
Robert Haas	1b4998fd44	Further corrections from the department of redundancy department. Thom Brown	2012-05-02 11:11:25 -04:00
Heikki Linnakangas	f291ccd43e	Remove duplicate words in comments. Found these with grep -r "for for ".	2012-05-02 10:20:27 +03:00
Tom Lane	1dd89eadcd	Rename I/O timing statistics columns to blk_read_time and blk_write_time. This seems more consistent with the pre-existing choices for names of other statistics columns. Rename assorted internal identifiers to match.	2012-04-29 18:13:33 -04:00
Tom Lane	309c64745e	Rename track_iotiming GUC to track_io_timing. This spelling seems significantly more readable to me.	2012-04-29 16:23:54 -04:00
Robert Haas	5d4b60f2f2	Lots of doc corrections. Josh Kupershmidt	2012-04-23 22:43:09 -04:00
Robert Haas	4a6fab03f2	Finish rename of FastPathStrongLocks to FastPathStrongRelationLocks. Commit `8e5ac74c12` tried to do this renaming, but I relied on gcc to tell me where I needed to make changes, instead of grep. Noted by Jeff Davis.	2012-04-18 11:29:34 -04:00
Robert Haas	53c5b869b4	Tighten up error recovery for fast-path locking. The previous code could cause a backend crash after BEGIN; SAVEPOINT a; LOCK TABLE foo (interrupted by ^C or statement timeout); ROLLBACK TO SAVEPOINT a; LOCK TABLE foo, and might have leaked strong-lock counts in other situations. Report by Zoltán Böszörményi; patch review by Jeff Davis.	2012-04-18 11:17:30 -04:00
Robert Haas	ab77b2da8b	Fix incorrect comment in SetBufferCommitInfoNeedsSave(). Noah Misch spotted the fact that the old comment is in fact incorrect, due to memory ordering hazards.	2012-04-18 10:55:40 -04:00
Robert Haas	b736aef2ec	Publish checkpoint timing information to pg_stat_bgwriter. Greg Smith, Peter Geoghegan, and Robert Haas	2012-04-05 14:04:37 -04:00
Robert Haas	644828908f	Expose track_iotiming data via the statistics collector. Ants Aasma's original patch to add timing information for buffer I/O requests exposed this data at the relation level, which was judged too costly. I've here exposed it at the database level instead.	2012-04-05 11:40:24 -04:00
Heikki Linnakangas	5762a4d909	Inherit max_safe_fds to child processes in EXEC_BACKEND mode. Postmaster sets max_safe_fds by testing how many open file descriptors it can open, and that is normally inherited by all child processes at fork(). Not so on EXEC_BACKEND, ie. Windows, however. Because of that, we effectively ignored max_files_per_process on Windows, and always assumed a conservative default of 32 simultaneous open files. That could have an impact on performance, if you need to access a lot of different files in a query. After this patch, the value is passed to child processes by save/restore_backend_variables() among many other global variables. It has been like this forever, but given the lack of complaints about it, I'm not backpatching this.	2012-03-29 08:19:11 +03:00
Robert Haas	40b9b95769	New GUC, track_iotiming, to track I/O timings. Currently, the only way to see the numbers this gathers is via EXPLAIN (ANALYZE, BUFFERS), but the plan is to add visibility through the stats collector and pg_stat_statements in subsequent patches. Ants Aasma, reviewed by Greg Smith, with some further changes by me.	2012-03-27 14:55:02 -04:00
Peter Eisentraut	0e85abd658	Clean up compiler warnings from unused variables with asserts disabled For those variables only used when asserts are enabled, use a new macro PG_USED_FOR_ASSERTS_ONLY, which expands to __attribute__((unused)) when asserts are not enabled.	2012-03-21 23:33:10 +02:00
Robert Haas	07d1edb954	Extend object access hook framework to support arguments, and DROP. This allows loadable modules to get control at drop time, perhaps for the purpose of performing additional security checks or to log the event. The initial purpose of this code is to support sepgsql, but other applications should be possible as well. KaiGai Kohei, reviewed by me.	2012-03-09 14:34:56 -05:00
Heikki Linnakangas	d6a7271958	Correctly detect SSI conflicts of prepared transactions after crash. A prepared transaction can get new conflicts in and out after preparing, so we cannot rely on the in- and out-flags stored in the statefile at prepare- time. As a quick fix, make the conservative assumption that after a restart, all prepared transactions are considered to have both in- and out-conflicts. That can lead to unnecessary rollbacks after a crash, but that shouldn't be a big problem in practice; you don't want prepared transactions to hang around for a long time anyway. Dan Ports	2012-02-29 15:42:36 +02:00
Robert Haas	2254367435	Make EXPLAIN (BUFFERS) track blocks dirtied, as well as those written. Also expose the new counters through pg_stat_statements. Patch by me. Review by Fujii Masao and Greg Smith.	2012-02-22 20:33:05 -05:00
Heikki Linnakangas	1a01560cbb	Rename LWLockWaitUntilFree to LWLockAcquireOrWait. LWLockAcquireOrWait makes it more clear that the lock is acquired if it's free.	2012-02-08 09:17:13 +02:00
Heikki Linnakangas	15ad6f1510	When building with LWLOCK_STATS, initialize the stats in LWLockWaitUntilFree. If LWLockWaitUntilFree was called before the first LWLockAcquire call, you would either crash because of access to uninitialized array or account the acquisition incorrectly. LWLockConditionalAcquire doesn't have this problem because it doesn't update the lwlock stats. In practice, this never happens because there is no codepath where you would call LWLockWaitUntilfree before LWLockAcquire after a new process is launched. But that's just accidental, there's no guarantee that that's always going to be true in the future. Spotted by Jeff Janes.	2012-02-07 10:11:54 +02:00
Tom Lane	c6d76d7c82	Add locking around WAL-replay modification of shared-memory variables. Originally, most of this code assumed that no Postgres backends could be running concurrently with it, and so no locking could be needed. That assumption fails in Hot Standby. While it's still true that Hot Standby backends should never change values like nextXid, they can examine them, and consistency is important in some cases such as when computing a snapshot. Therefore, prudence requires that WAL replay code obtain the relevant locks when modifying such variables, even though it can examine them without taking a lock. We were following that coding rule in some places but not all. This commit applies the coding rule uniformly to all updates of ShmemVariableCache and MultiXactState fields; a search of the replay routines did not find any other cases that seemed to be at risk. In addition, this commit fixes a longstanding thinko in replay of NEXTOID and checkpoint records: we tried to advance nextOid only if it was behind the value in the WAL record, but the comparison would draw the wrong conclusion if OID wraparound had occurred since the previous value. Better to just unconditionally assign the new value, since OID assignment shouldn't be happening during replay anyway. The additional locking seems to be more in the nature of future-proofing than fixing any live bug, so I am not going to back-patch it. The NEXTOID fix will be back-patched separately.	2012-02-06 12:34:10 -05:00
Tom Lane	2af72cefea	Add missing Assert and fix inaccurate elog message in standby_redo(). All other WAL redo routines either call RestoreBkpBlocks() or Assert that they haven't been passed any backup blocks. Make this one do likewise. Also, fix incorrect routine name in its failure message.	2012-02-04 22:32:35 -05:00
Heikki Linnakangas	82d4b262d9	Fix bug in the new wait-until-lwlock-is-free mechanism. If there was a wait-until-free process in the head of the wait queue, followed by an exclusive locker, the exclusive locker was not be woken up as it should.	2012-01-31 00:09:30 +02:00
Heikki Linnakangas	9b38d46d9f	Make group commit more effective. When a backend needs to flush the WAL, and someone else is already flushing the WAL, wait until it releases the WALInsertLock and check if we still need to do the flush or if the other backend already did the work for us, before acquiring WALInsertLock. This helps group commit, because when the WAL flush finishes, all the backends that were waiting for it can be woken up in one go, and the can all concurrently observe that they're done, rather than waking them up one by one in a cascading fashion. This is based on a new LWLock function, LWLockWaitUntilFree(), which has peculiar semantics. If the lock is immediately free, it grabs the lock and returns true. If it's not free, it waits until it is released, but then returns false without grabbing the lock. This is used in XLogFlush(), so that when the lock is acquired, the backend flushes the WAL, but if it's not, the backend first checks the current flush location before retrying. Original patch and benchmarking by Peter Geoghegan and Simon Riggs, although this patch as committed ended up being very different from that.	2012-01-30 16:53:48 +02:00
Tom Lane	ad10853b30	Assorted comment fixes, mostly just typos, but some obsolete statements. YAMAMOTO Takashi	2012-01-29 19:23:56 -05:00
Magnus Hagander	672614cf21	Prevent logging "failed to stat file: success" for temp files This was broken in commit `bc3347484a`, the addition of statistics counters for temp files. Reported by Thom Brown	2012-01-28 10:03:26 +01:00

1 2 3 4 5 ...

1386 Commits