postgres

mirror of https://github.com/postgres/postgres.git synced 2025-11-29 23:43:17 +03:00

Author	SHA1	Message	Date
Kevin Grittner	a343e223a5	Revert no-op changes to BufferGetPage() The reverted changes were intended to force a choice of whether any newly-added BufferGetPage() calls needed to be accompanied by a test of the snapshot age, to support the "snapshot too old" feature. Such an accompanying test is needed in about 7% of the cases, where the page is being used as part of a scan rather than positioning for other purposes (such as DML or vacuuming). The additional effort required for back-patching, and the doubt whether the intended benefit would really be there, have indicated it is best just to rely on developers to do the right thing based on comments and existing usage, as we do with many other conventions. This change should have little or no effect on generated executable code. Motivated by the back-patching pain of Tom Lane and Robert Haas	2016-04-20 08:31:19 -05:00
Tom Lane	a0382e2d7e	Make partition-lock-release coding more transparent in BufferAlloc(). Coverity complained that oldPartitionLock was possibly dereferenced after having been set to NULL. That actually can't happen, because we'd only use it if (oldFlags & BM_TAG_VALID) is true. But nonetheless Coverity is justified in complaining, because at line 1275 we actually overwrite oldFlags, and then still expect its BM_TAG_VALID bit to be a safe guide to whether to release the oldPartitionLock. Thus, the code would be incorrect if someone else had changed the buffer's BM_TAG_VALID flag meanwhile. That should not happen, since we hold pin on the buffer throughout this sequence, but it's starting to look like a rather shaky chain of logic. And there's no need for such assumptions, because we can simply replace the (oldFlags & BM_TAG_VALID) tests with (oldPartitionLock != NULL), which has identical results and makes it plain to all comers that we don't dereference a null pointer. A small side benefit is that the range of liveness of oldFlags is greatly reduced, possibly allowing the compiler to save a register. This is just cleanup, not an actual bug fix, so there seems no need for a back-patch.	2016-04-18 18:05:56 -04:00
Tom Lane	4039c736eb	Adjust spin.c's spinlock emulation so that 0 is not a valid spinlock value. We've had repeated troubles over the years with failures to initialize spinlocks correctly; see `6b93fcd14` for a recent example. Most of the time, on most platforms, such oversights can escape notice because all-zeroes is the expected initial content of an slock_t variable. The only platform we have where the initialized state of an slock_t isn't zeroes is HPPA, and that's practically gone in the wild. To make it easier to catch such errors without needing one of those, adjust the --disable-spinlocks code so that zero is not a valid value for an slock_t for it. In passing, remove a bunch of unnecessary #include's from spin.c; commit `daa7527afc` removed all the intermodule coupling that made them necessary.	2016-04-16 19:53:38 -04:00
Tom Lane	6b85d4ba9b	Fix portability problem induced by commit `a6f6b7819`. pg_xlogdump includes bufmgr.h. With a compiler that emits code for static inline functions even when they're unreferenced, that leads to unresolved external references in the new static-inline version of BufferGetPage(). So hide it with #ifndef FRONTEND, as we've done for similar issues elsewhere. Per buildfarm member pademelon.	2016-04-15 10:44:28 -04:00
Andres Freund	4b74c6a40e	Make init_spin_delay() C89 compliant #2 . My previous attempt at doing so, in `80abbeba23`, was not sufficient. While that fixed the problem for bufmgr.c and lwlock.c , s_lock.c still has non-constant expressions in the struct initializer, because the file/line/function information comes from the caller of s_lock(). Give up on using a macro, and use a static inline instead. Discussion: 4369.1460435533@sss.pgh.pa.us	2016-04-14 19:26:13 -07:00
Andres Freund	80abbeba23	Make init_spin_delay() C89 compliant and change stuck spinlock reporting. The current definition of init_spin_delay (introduced recently in `48354581a`) wasn't C89 compliant. It's not legal to refer to refer to non-constant expressions, and the ptr argument was one. This, as reported by Tom, lead to a failure on buildfarm animal pademelon. The pointer, especially on system systems with ASLR, isn't super helpful anyway, though. So instead of making init_spin_delay into an inline function, make s_lock_stuck() report the function name in addition to file:line and change init_spin_delay() accordingly. While not a direct replacement, the function name is likely more useful anyway (line numbers are often hard to interpret in third party reports). This also fixes what file/line number is reported for waits via s_lock(). As PG_FUNCNAME_MACRO is now used outside of elog.h, move it to c.h. Reported-By: Tom Lane Discussion: 4369.1460435533@sss.pgh.pa.us	2016-04-13 17:00:53 -07:00
Andres Freund	6b93fcd149	Avoid atomic operation in MarkLocalBufferDirty(). The recent patch to make Pin/UnpinBuffer lockfree in the hot path (`48354581a`), accidentally used pg_atomic_fetch_or_u32() in MarkLocalBufferDirty(). Other code operating on local buffers was careful to only use pg_atomic_read/write_u32 which just read/write from memory; to avoid unnecessary overhead. On its own that'd just make MarkLocalBufferDirty() slightly less efficient, but in addition InitLocalBuffers() doesn't call pg_atomic_init_u32() - thus the spinlock fallback for the atomic operations isn't initialized. That in turn caused, as reported by Tom, buildfarm animal gaur to fail. As those errors are actually useful against this type of error, continue to omit - intentionally this time - initialization of the atomic variable. In addition, add an explicit note about only using pg_atomic_read/write on local buffers's state to BufferDesc's description. Reported-By: Tom Lane Discussion: 1881.1460431476@sss.pgh.pa.us	2016-04-13 15:28:29 -07:00
Tom Lane	95ef43c430	Widen amount-to-flush arguments of FileWriteback and callers. It's silly to define these counts as narrower than they might someday need to be. Also, I believe that the BLCKSZ * nflush calculation in mdwriteback was capable of overflowing an int.	2016-04-13 18:12:06 -04:00
Tom Lane	fa11a09fed	Fix assorted portability issues with using msync() for data flushing. Commit `428b1d6b29` introduced the use of msync() for flushing dirty data from the kernel's file buffers. Several portability issues were overlooked, though: * Not all implementations of mmap() think that nbytes == 0 means "map the whole file". To fix, use lseek() to find out the true length. Fix callers of pg_flush_data to be aware that nbytes == 0 may result in trashing the file's seek position. * Not all implementations of mmap() will accept partial-page mmap requests. To fix, round down the length request to whatever sysconf() says the page size is. (I think this is OK from a portability standpoint, because sysconf() is required by SUS v2, and we aren't trying to compile this part on Windows anyway. Buildfarm should let us know if not.) * On 32-bit machines, the file size might exceed the available free address space, or even exceed what will fit in size_t. Check for the latter explicitly to avoid passing a false request size to mmap(). If mmap fails, silently fall through to the next implementation method, rather than bleating to the postmaster log and giving up. * mmap'ing directories fails on some platforms, and even if it works, msync'ing the directory is quite unlikely to help, as for that matter are the other flush implementations. In pre_sync_fname(), just skip flush attempts on directories. In passing, copy-edit the comments a bit. Stas Kelvich and myself	2016-04-13 17:17:51 -04:00
Kevin Grittner	2201d801b0	Avoid extra locks in GetSnapshotData if old_snapshot_threshold < 0 On a big NUMA machine with 1000 connections in saturation load there was a performance regression due to spinlock contention, for acquiring values which were never used. Just fill with dummy values if we're not going to use them. This patch has not been benchmarked yet on a big NUMA machine, but it seems like a good idea on general principle, and it seemed to prevent an apparent 2.2% regression on a single-socket i7 box running 200 connections at saturation load.	2016-04-12 11:48:02 -05:00
Kevin Grittner	a6f6b78196	Use static inline function for BufferGetPage() I was initially concerned that the some of the hundreds of references to BufferGetPage() where the literal BGP_NO_SNAPSHOT_TEST were passed might not optimize as well as a macro, leading to some hard-to-find performance regressions in corner cases. Inspection of disassembled code has shown identical code at all inspected locations, and the size difference doesn't amount to even one byte per such call. So make it readable. Per gripes from Álvaro Herrera and Tom Lane	2016-04-11 16:47:50 -05:00
Andres Freund	008608b9d5	Avoid the use of a separate spinlock to protect a LWLock's wait queue. Previously we used a spinlock, in adition to the atomically manipulated ->state field, to protect the wait queue. But it's pretty simple to instead perform the locking using a flag in state. Due to `6150a1b0` BufferDescs, on platforms (like PPC) with > 1 byte spinlocks, increased their size above 64byte. As 64 bytes are the size we pad allocated BufferDescs to, this can increase false sharing; causing performance problems in turn. Together with the previous commit this reduces the size to <= 64 bytes on all common platforms. Author: Andres Freund Discussion: CAA4eK1+ZeB8PMwwktf+3bRS0Pt4Ux6Rs6Aom0uip8c6shJWmyg@mail.gmail.com 20160327121858.zrmrjegmji2ymnvr@alap3.anarazel.de	2016-04-10 20:12:32 -07:00
Andres Freund	48354581a4	Allow Pin/UnpinBuffer to operate in a lockfree manner. Pinning/Unpinning a buffer is a very frequent operation; especially in read-mostly cache resident workloads. Benchmarking shows that in various scenarios the spinlock protecting a buffer header's state becomes a significant bottleneck. The problem can be reproduced with pgbench -S on larger machines, but can be considerably worse for queries which touch the same buffers over and over at a high frequency (e.g. nested loops over a small inner table). To allow atomic operations to be used, cram BufferDesc's flags, usage_count, buf_hdr_lock, refcount into a single 32bit atomic variable; that allows to manipulate them together using 32bit compare-and-swap operations. This requires reducing MAX_BACKENDS to 2^18-1 (which could be lifted by using a 64bit field, but it's not a realistic configuration atm). As not all operations can easily implemented in a lockfree manner, implement the previous buf_hdr_lock via a flag bit in the atomic variable. That way we can continue to lock the header in places where it's needed, but can get away without acquiring it in the more frequent hot-paths. There's some additional operations which can be done without the lock, but aren't in this patch; but the most important places are covered. As bufmgr.c now essentially re-implements spinlocks, abstract the delay logic from s_lock.c into something more generic. It now has already two users, and more are coming up; there's a follupw patch for lwlock.c at least. This patch is based on a proof-of-concept written by me, which Alexander Korotkov made into a fully working patch; the committed version is again revised by me. Benchmarking and testing has, amongst others, been provided by Dilip Kumar, Alexander Korotkov, Robert Haas. On a large x86 system improvements for readonly pgbench, with a high client count, of a factor of 8 have been observed. Author: Alexander Korotkov and Andres Freund Discussion: 2400449.GjM57CE0Yg@dinodell	2016-04-10 20:12:32 -07:00
Kevin Grittner	848ef42bb8	Add the "snapshot too old" feature This feature is controlled by a new old_snapshot_threshold GUC. A value of -1 disables the feature, and that is the default. The value of 0 is just intended for testing. Above that it is the number of minutes a snapshot can reach before pruning and vacuum are allowed to remove dead tuples which the snapshot would otherwise protect. The xmin associated with a transaction ID does still protect dead tuples. A connection which is using an "old" snapshot does not get an error unless it accesses a page modified recently enough that it might not be able to produce accurate results. This is similar to the Oracle feature, and we use the same SQLSTATE and error message for compatibility.	2016-04-08 14:36:30 -05:00
Kevin Grittner	8b65cf4c5e	Modify BufferGetPage() to prepare for "snapshot too old" feature This patch is a no-op patch which is intended to reduce the chances of failures of omission once the functional part of the "snapshot too old" patch goes in. It adds parameters for snapshot, relation, and an enum to specify whether the snapshot age check needs to be done for the page at this point. This initial patch passes NULL for the first two new parameters and BGP_NO_SNAPSHOT_TEST for the third. The follow-on patch will change the places where the test needs to be made.	2016-04-08 14:30:10 -05:00
Robert Haas	719c84c1be	Extend relations multiple blocks at a time to improve scalability. Contention on the relation extension lock can become quite fierce when multiple processes are inserting data into the same relation at the same time at a high rate. Experimentation shows the extending the relation multiple blocks at a time improves scalability. Dilip Kumar, reviewed by Petr Jelinek, Amit Kapila, and me.	2016-04-08 02:04:46 -04:00
Simon Riggs	cac0e36682	Revert `bf08f2292f` Remove recent changes to logging XLOG_RUNNING_XACTS by request.	2016-04-06 14:03:46 +01:00
Robert Haas	09adc9a8c0	Align all shared memory allocations to cache line boundaries. Experimentation shows this only costs about 6kB, which seems well worth it given the major performance effects that can be caused by insufficient alignment, especially on larger systems. Discussion: 14166.1458924422@sss.pgh.pa.us	2016-04-05 15:47:49 -04:00
Simon Riggs	bf08f2292f	Avoid archiving XLOG_RUNNING_XACTS on idle server If archive_timeout > 0 we should avoid logging XLOG_RUNNING_XACTS if idle. Bug 13685 reported by Laurence Rowe, investigated in detail by Michael Paquier, though this is not his proposed fix. 20151016203031.3019.72930@wrigleys.postgresql.org Simple non-invasive patch to allow later backpatch to 9.4 and 9.5	2016-04-04 07:18:05 +01:00
Tom Lane	a1953f3a60	Make all the declarations of WaitEventSetWaitBlock be marked "inline". The inconsistency here triggered compiler warnings on some buildfarm members, and it's surely pretty pointless.	2016-04-02 13:55:44 -04:00
Noah Misch	4ad6f13500	Copyedit comments and documentation.	2016-04-01 21:53:10 -04:00
Robert Haas	bd0f206f55	Fix typo in comment. Thomas Munro	2016-03-28 20:55:15 -04:00
Andres Freund	9f7c527af3	Fix LWLockReportWaitEnd() parameter list to be (void). Previously it was an "old style" function declaration.	2016-03-27 22:53:31 +02:00
Andres Freund	1a7a43672b	Don't use !! but != 0/NULL to force boolean evaluation. I introduced several uses of !! to force bit arithmetic to be boolean, but per discussion the project prefers != 0/NULL. Discussion: CA+TgmoZP5KakLGP6B4vUjgMBUW0woq_dJYi0paOz-My0Hwt_vQ@mail.gmail.com	2016-03-27 18:10:19 +02:00
Andres Freund	98a64d0bd7	Introduce WaitEventSet API. Commit `ac1d794` ("Make idle backends exit if the postmaster dies.") introduced a regression on, at least, large linux systems. Constantly adding the same postmaster_alive_fds to the OSs internal datastructures for implementing poll/select can cause significant contention; leading to a performance regression of nearly 3x in one example. This can be avoided by using e.g. linux' epoll, which avoids having to add/remove file descriptors to the wait datastructures at a high rate. Unfortunately the current latch interface makes it hard to allocate any persistent per-backend resources. Replace, with a backward compatibility layer, WaitLatchOrSocket with a new WaitEventSet API. Users can allocate such a Set across multiple calls, and add more than one file-descriptor to wait on. The latter has been added because there's upcoming postgres features where that will be helpful. In addition to the previously existing poll(2), select(2), WaitForMultipleObjects() implementations also provide an epoll_wait(2) based implementation to address the aforementioned performance problem. Epoll is only available on linux, but that is the most likely OS for machines large enough (four sockets) to reproduce the problem. To actually address the aforementioned regression, create and use a long-lived WaitEventSet for FE/BE communication. There are additional places that would benefit from a long-lived set, but that's a task for another day. Thanks to Amit Kapila, who helped make the windows code I blindly wrote actually work. Reported-By: Dmitry Vasilyev Discussion: CAB-SwXZh44_2ybvS5Z67p_CDz=XFn4hNAD=CnMEF+QqkXwFrGg@mail.gmail.com 20160114143931.GG10941@awork2.anarazel.de	2016-03-21 12:22:54 +01:00
Andres Freund	72e2d21c12	Combine win32 and unix latch implementations. Previously latches for windows and unix had been implemented in different files. A later patch introduce an expanded wait infrastructure, keeping the implementation separate would introduce too much duplication. This basically just moves the functions, without too much change. The reason to keep this separate is that it allows blame to continue working a little less badly; and to make review a tiny bit easier. Discussion: 20160114143931.GG10941@awork2.anarazel.de	2016-03-21 11:03:26 +01:00
Robert Haas	c6dda1f48e	Add idle_in_transaction_session_timeout. Vik Fearing, reviewed by Stéphane Schildknecht and me, and revised slightly by me.	2016-03-16 11:30:45 -04:00
Robert Haas	3aff33aa68	Fix typos. Oskari Saarenmaa	2016-03-15 18:06:11 -04:00
Andres Freund	e01157500f	Include portability/mem.h into fd.c for MAP_FAILED. Buildfarm members gaur and pademelon are old enough not to know about MAP_FAILED; which is used in `428b1d6`. Include portability/mem.h to fix; as already done in a bunch of other places.	2016-03-12 12:16:48 -08:00
Robert Haas	481c76abf4	Fix a typo, and remove unnecessary pgstat_report_wait_end(). Per Amit Kapila.	2016-03-11 07:34:00 -05:00
Robert Haas	a414d96ad2	Simplify GetLockNameFromTagType. The old code is wrong, because it returns a pointer to an automatic variable. And it's also more clever than we really need to be considering that the case it's worrying about should never happen.	2016-03-10 21:37:22 -05:00
Andres Freund	c94f0c29ce	Blindly try to fix dtrace enabled builds, broken in `9cd00c45`. Reported-By: Peter Eisentraut Discussion: 56E2239E.1050607@gmx.net	2016-03-10 17:51:03 -08:00
Andres Freund	9cd00c457e	Checkpoint sorting and balancing. Up to now checkpoints were written in the order they're in the BufferDescriptors. That's nearly random in a lot of cases, which performs badly on rotating media, but even on SSDs it causes slowdowns. To avoid that, sort checkpoints before writing them out. We currently sort by tablespace, relfilenode, fork and block number. One of the major reasons that previously wasn't done, was fear of imbalance between tablespaces. To address that balance writes between tablespaces. The other prime concern was that the relatively large allocation to sort the buffers in might fail, preventing checkpoints from happening. Thus pre-allocate the required memory in shared memory, at server startup. This particularly makes it more efficient to have checkpoint flushing enabled, because that'll often result in a lot of writes that can be coalesced into one flush. Discussion: alpine.DEB.2.10.1506011320000.28433@sto Author: Fabien Coelho and Andres Freund	2016-03-10 17:05:09 -08:00
Andres Freund	428b1d6b29	Allow to trigger kernel writeback after a configurable number of writes. Currently writes to the main data files of postgres all go through the OS page cache. This means that some operating systems can end up collecting a large number of dirty buffers in their respective page caches. When these dirty buffers are flushed to storage rapidly, be it because of fsync(), timeouts, or dirty ratios, latency for other reads and writes can increase massively. This is the primary reason for regular massive stalls observed in real world scenarios and artificial benchmarks; on rotating disks stalls on the order of hundreds of seconds have been observed. On linux it is possible to control this by reducing the global dirty limits significantly, reducing the above problem. But global configuration is rather problematic because it'll affect other applications; also PostgreSQL itself doesn't always generally want this behavior, e.g. for temporary files it's undesirable. Several operating systems allow some control over the kernel page cache. Linux has sync_file_range(2), several posix systems have msync(2) and posix_fadvise(2). sync_file_range(2) is preferable because it requires no special setup, whereas msync() requires the to-be-flushed range to be mmap'ed. For the purpose of flushing dirty data posix_fadvise(2) is the worst alternative, as flushing dirty data is just a side-effect of POSIX_FADV_DONTNEED, which also removes the pages from the page cache. Thus the feature is enabled by default only on linux, but can be enabled on all systems that have any of the above APIs. While desirable and likely possible this patch does not contain an implementation for windows. With the infrastructure added, writes made via checkpointer, bgwriter and normal user backends can be flushed after a configurable number of writes. Each of these sources of writes controlled by a separate GUC, checkpointer_flush_after, bgwriter_flush_after and backend_flush_after respectively; they're separate because the number of flushes that are good are separate, and because the performance considerations of controlled flushing for each of these are different. A later patch will add checkpoint sorting - after that flushes from the ckeckpoint will almost always be desirable. Bgwriter flushes are most of the time going to be random, which are slow on lots of storage hardware. Flushing in backends works well if the storage and bgwriter can keep up, but if not it can have negative consequences. This patch is likely to have negative performance consequences without checkpoint sorting, but unfortunately so has sorting without flush control. Discussion: alpine.DEB.2.10.1506011320000.28433@sto Author: Fabien Coelho and Andres Freund	2016-03-10 17:04:34 -08:00
Simon Riggs	37c54863cf	Rework wait for AccessExclusiveLocks on Hot Standby Earlier version committed in 9.0 caused spurious waits in some cases. New infrastructure for lock waits in 9.3 used to correct and improve this. Jeff Janes based upon a proposal by Simon Riggs, who also reviewed Additional review comments from Amit Kapila	2016-03-10 19:26:24 +00:00
Robert Haas	53be0b1add	Provide much better wait information in pg_stat_activity. When a process is waiting for a heavyweight lock, we will now indicate the type of heavyweight lock for which it is waiting. Also, you can now see when a process is waiting for a lightweight lock - in which case we will indicate the individual lock name or the tranche, as appropriate - or for a buffer pin. Amit Kapila, Ildus Kurbangaliev, reviewed by me. Lots of helpful discussion and suggestions by many others, including Alexander Korotkov, Vladimir Borodin, and many others.	2016-03-10 12:44:09 -05:00
Andres Freund	606e0f9841	Introduce durable_rename() and durable_link_or_rename(). Renaming a file using rename(2) is not guaranteed to be durable in face of crashes; especially on filesystems like xfs and ext4 when mounted with data=writeback. To be certain that a rename() atomically replaces the previous file contents in the face of crashes and different filesystems, one has to fsync the old filename, rename the file, fsync the new filename, fsync the containing directory. This sequence is not generally adhered to currently; which exposes us to data loss risks. To avoid having to repeat this arduous sequence, introduce durable_rename(), which wraps all that. Also add durable_link_or_rename(). Several places use link() (with a fallback to rename()) to rename a file, trying to avoid replacing the target file out of paranoia. Some of those rename sequences need to be durable as well. There seems little reason extend several copies of the same logic, so centralize the link() callers. This commit does not yet make use of the new functions; they're used in a followup commit. Author: Michael Paquier, Andres Freund Discussion: 56583BDD.9060302@2ndquadrant.com Backpatch: All supported branches	2016-03-09 18:53:53 -08:00
Robert Haas	070140ee48	Add some functions to fd.c for the convenience of extensions. For example, if you want to perform an ioctl() on a file descriptor opened through the fd.c routines, there's no way to do that without being able to get at the underlying fd. KaiGai Kohei	2016-03-08 10:09:50 -05:00
Robert Haas	a892234f83	Change the format of the VM fork to add a second bit per page. The new bit indicates whether every tuple on the page is already frozen. It is cleared only when the all-visible bit is cleared, and it can be set only when we vacuum a page and find that every tuple on that page is both visible to every transaction and in no need of any future vacuuming. A future commit will use this new bit to optimize away full-table scans that would otherwise be triggered by XID wraparound considerations. A page which is merely all-visible must still be scanned in that case, but a page which is all-frozen need not be. This commit does not attempt that optimization, although that optimization is the goal here. It seems better to get the basic infrastructure in place first. Per discussion, it's very desirable for pg_upgrade to automatically migrate existing VM forks from the old format to the new format. That, too, will be handled in a follow-on patch. Masahiko Sawada, reviewed by Kyotaro Horiguchi, Fujii Masao, Amit Kapila, Simon Riggs, Andres Freund, and others, and substantially revised by me.	2016-03-01 21:49:41 -05:00
Tom Lane	52f5d578d6	Create a function to reliably identify which sessions block which others. This patch introduces "pg_blocking_pids(int) returns int[]", which returns the PIDs of any sessions that are blocking the session with the given PID. Historically people have obtained such information using a self-join on the pg_locks view, but it's unreasonably tedious to do it that way with any modicum of correctness, and the addition of parallel queries has pretty much broken that approach altogether. (Given some more columns in the view than there are today, you could imagine handling parallel-query cases with a 4-way join; but ugh.) The new function has the following behaviors that are painful or impossible to get right via pg_locks: 1. Correctly understands which lock modes block which other ones. 2. In soft-block situations (two processes both waiting for conflicting lock modes), only the one that's in front in the wait queue is reported to block the other. 3. In parallel-query cases, reports all sessions blocking any member of the given PID's lock group, and reports a session by naming its leader process's PID, which will be the pg_backend_pid() value visible to clients. The motivation for doing this right now is mostly to fix the isolation tests. Commit `38f8bdcac4` lobotomized isolationtester's is-it-waiting query by removing its ability to recognize nonconflicting lock modes, as a crude workaround for the inability to handle soft-block situations properly. But even without the lock mode tests, the old query was excessively slow, particularly in CLOBBER_CACHE_ALWAYS builds; some of our buildfarm animals fail the new deadlock-hard test because the deadlock timeout elapses before they can probe the waiting status of all eight sessions. Replacing the pg_locks self-join with use of pg_blocking_pids() is not only much more correct, but a lot faster: I measure it at about 9X faster in a typical dev build with Asserts, and 3X faster in CLOBBER_CACHE_ALWAYS builds. That should provide enough headroom for the slower CLOBBER_CACHE_ALWAYS animals to pass the test, without having to lengthen deadlock_timeout yet more and thus slow down the test for everyone else.	2016-02-22 14:31:43 -05:00
Tom Lane	73bf8715aa	Remove redundant PGPROC.lockGroupLeaderIdentifier field. We don't really need this field, because it's either zero or redundant with PGPROC.pid. The use of zero to mark "not a group leader" is not necessary since we can just as well test whether lockGroupLeader is NULL. This does not save very much, either as to code or data, but the simplification seems worthwhile anyway.	2016-02-22 11:20:35 -05:00
Andres Freund	ea56b06cf7	Fix wrong keysize in PrivateRefCountHash creation. In `4b4b680c3` I accidentally used sizeof(PrivateRefCountArray) instead of sizeof(PrivateRefCountEntry) when creating the refcount overflow hashtable. As the former is bigger than the latter, this luckily only resulted in a slightly increased memory usage when many buffers are pinned in a backend. Reported-By: Takashi Horikawa Discussion: 73FA3881462C614096F815F75628AFCD035A48C3@BPXM01GP.gisp.nec.co.jp Backpatch: 9.5, where thew new ref count infrastructure was introduced	2016-02-21 22:48:44 -08:00
Robert Haas	88aca5662d	Fix incorrect decision about which lock to take. Spotted by Tom Lane.	2016-02-21 17:06:41 +05:30
Robert Haas	d91a4a6c85	Cosmetic improvements to group locking. Reflow text in lock manager README so that it fits within 80 columns. Correct some mistakes. Expand the README to explain not only why group locking exists but also the data structures that support it. Improve comments related to group locking several files. Change the name of a macro argument for improved clarity. Most of these problems were reported by Tom Lane, but I found a few of them myself. Robert Haas and Tom Lane	2016-02-21 15:42:02 +05:30
Tom Lane	9b92e76f7b	Make GetLockStatusData's header comment resemble reality. The API spec for this function was changed completely (and for the better) by commit `3cba8999b3`, but it didn't bother with anything as mundane as updating the comments.	2016-02-13 15:42:31 -05:00
Robert Haas	63461a63f9	Make builtin lwlock tranche names consistent. Previously, we had a mix of styles. Amit Kapila	2016-02-12 08:07:11 -05:00
Robert Haas	c319991bca	Use separate lwlock tranches for buffer, lock, and predicate lock managers. This finishes the work - spread across many commits over the last several months - of putting each type of lock other than the named individual locks into a separate tranche. Amit Kapila	2016-02-11 14:07:33 -05:00
Robert Haas	a455878d99	Rename PGPROC fields related to group XID clearing again. Commit `0e141c0fbb` introduced a new facility to reduce ProcArrayLock contention by clearing several XIDs from the ProcArray under a single lock acquisition. The names initially chosen were deemed not to be very good choices, so commit `4aec49899e` renamed them. But now it seems like we still didn't get it right. A pending patch wants to add similar infrastructure for batching CLOG updates, so the names need to be clear enough to allow a new set of structure members with a related purpose. Amit Kapila	2016-02-11 08:55:24 -05:00
Tom Lane	c5e9b77127	Revert "Temporarily make pg_ctl and server shutdown a whole lot chattier." This reverts commit `3971f64843` and a couple of followon debugging commits; I think we've learned what we can from them.	2016-02-10 16:01:04 -05:00
Robert Haas	79a7ff0fe5	Code cleanup in the wake of recent LWLock refactoring. As of commit `c1772ad922`, there's no longer any way of requesting additional LWLocks in the main tranche, so we don't need NumLWLocks() or LWLockAssign() any more. Also, some of the allocation counters that we had previously aren't needed any more either. Amit Kapila	2016-02-10 09:58:09 -05:00

1 2 3 4 5 ...

1716 Commits