postgres

mirror of https://github.com/postgres/postgres.git synced 2025-07-07 00:36:50 +03:00

Author	SHA1	Message	Date
Heikki Linnakangas	214fa27e92	Initialize shared memory copy of ckptXidEpoch correctly when not in recovery. This bug was introduced by commit `20d98ab6e4`, so backpatch this to 9.0-9.2 like that one. This fixes bug #6710, reported by Tarvi Pillessaar	2012-06-29 19:23:04 +03:00
Tom Lane	00e5844592	Cope with smaller-than-normal BLCKSZ setting in SPGiST indexes on text. The original coding failed miserably for BLCKSZ of 4K or less, as reported by Josh Kupershmidt. With the present design for text indexes, a given inner tuple could have up to 256 labels (requiring either 3K or 4K bytes depending on MAXALIGN), which means that we can't positively guarantee no failures for smaller blocksizes. But we can at least make it behave sanely so long as there are few enough labels to fit on a page. Considering that btree is also more prone to "index tuple too large" failures when BLCKSZ is small, it's not clear that we should expend more work than this on this case.	2012-06-26 14:36:29 -04:00
Peter Eisentraut	0b847cba69	Improve reporting of permission errors for array types Because permissions are assigned to element types, not array types, complaining about permission denied on an array type would be misleading to users. So adjust the reporting to refer to the element type instead. In order not to duplicate the required logic in two dozen places, refactor the permission denied reporting for types a bit. pointed out by Yeb Havinga during the review of the type privilege feature	2012-06-15 23:03:50 +03:00
Tom Lane	b8b69d8990	Revert "Reduce checkpoints and WAL traffic on low activity database server" This reverts commit `18fb9d8d21`. Per discussion, it does not seem like a good idea to allow committed changes to go un-checkpointed indefinitely, as could happen in a low-traffic server; that makes us entirely reliant on the WAL stream with no redundancy that might aid data recovery in case of disk failure. This re-introduces the original problem of hot-standby setups generating a small continuing stream of WAL traffic even when idle, but there are other ways to address that without compromising crash recovery, so we'll revisit that issue in a future release cycle.	2012-06-13 18:48:44 -04:00
Bruce Momjian	927d61eeff	Run pgindent on 9.2 source tree in preparation for first 9.3 commit-fest.	2012-06-10 15:20:04 -04:00
Tom Lane	ece01aae47	Scan the buffer pool just once, not once per fork, during relation drop. This provides a speedup of about 4X when NBuffers is large enough. There is also a useful reduction in sinval traffic, since we only do CacheInvalidateSmgr() once not once per fork. Simon Riggs, reviewed and somewhat revised by Tom Lane	2012-06-07 17:43:11 -04:00
Simon Riggs	2c8a4e9be2	Wake WALSender to reduce data loss at failover for async commit. WALSender now woken up after each background flush by WALwriter, avoiding multi-second replication delay for an all-async commit workload. Replication delay reduced from 7s with default settings to 200ms and often much less, allowing significantly reduced data loss at failover. Andres Freund and Simon Riggs	2012-06-07 19:22:47 +01:00
Robert Haas	b50991eedb	Fix more crash-safe visibility map bugs, and improve comments. In lazy_scan_heap, we could issue bogus warnings about incorrect information in the visibility map, because we checked the visibility map bit before locking the heap page, creating a race condition. Fix by rechecking the visibility map bit before we complain. Rejigger some related logic so that we rely on the possibly-outdated all_visible_according_to_vm value as little as possible. In heap_multi_insert, it's not safe to clear the visibility map bit before beginning the critical section. The visibility map is not crash-safe unless we treat clearing the bit as a critical operation. Specifically, if the transaction were to error out after we set the bit and before entering the critical section, we could end up writing the heap page to disk (with the bit cleared) and crashing before the visibility map page made it to disk. That would be bad. heap_insert has this correct, but somehow the order of operations got rearranged when heap_multi_insert was added. Also, add some more comments to visibilitymap_test, lazy_scan_heap, and IndexOnlyNext, expounding on concurrency issues. Per extensive code review by Andres Freund, and further review by Tom Lane, who also made the original report about the bogus warnings.	2012-06-07 12:48:13 -04:00
Simon Riggs	d3abbbebe5	Avoid early reuse of btree pages, causing incorrect query results. When we allowed read-only transactions to skip assigning XIDs we introduced the possibility that a fully deleted btree page could be reused. This broke the index link sequence which could then lead to indexscans silently returning fewer rows than would have been correct. The actual incidence of silent errors from this is thought to be very low because of the exact workload required and locking pre-conditions. Fix is to remove pages only if index page opaque->btpo.xact precedes RecentGlobalXmin. Noah Misch, reviewed by Simon Riggs	2012-06-01 12:21:45 +01:00
Tom Lane	a04dc87db1	Improve comment for GetStableLatestTransactionId().	2012-05-31 11:20:02 -04:00
Simon Riggs	a2b516dab9	Only throw recovery conflicts when InHotStandby. Bug fix to recent patch to allow Index Only Scans on Hot Standby. Bug report from Jaime Casanova	2012-05-31 13:11:47 +01:00
Heikki Linnakangas	d1996ed5e8	Change the way parent pages are tracked during buffered GiST build. We used to mimic the way a stack is constructed when descending the tree during normal GiST inserts, but that was quite complicated during a buffered build. It was also wrong: in GiST, the left-to-right relationships on different levels might not match each other, so that when you know the parent of a child page, you won't necessarily find the parent of the page to the right of the child page by following the rightlinks at the parent level. This sometimes led to "could not re-find parent" errors while building a GiST index. We now use a simple hash table to track the parent of every internal page. Whenever a page is split, and downlinks are moved from one page to another, we update the hash table accordingly. This is also better for performance than the old method, as we never need to move right to re-find the parent page, which could take a significant amount of time for buffers that were created much earlier in the index build.	2012-05-30 12:05:57 +03:00
Heikki Linnakangas	be02b16826	Delete the temporary file used in buffered GiST build, after the build. There were two bugs here: We forgot to call gistFreeBuildBuffers() function at the end of build, and we passed interXact == true to BufFileCreateTemp, so the file wasn't automatically cleaned up at end-of-transaction either.	2012-05-30 12:05:57 +03:00
Heikki Linnakangas	4bc6fb57f7	Fix integer overflow bug in GiST buffering build calculations. The result of (maintenance_work_mem * 1024) / BLCKSZ doesn't fit in a signed 32-bit integer, if maintenance_work_mem >= 2GB. Use double instead. And while we're at it, write the calculations in an easier to understand form, with the intermediary steps written out and commented.	2012-05-29 22:27:42 +03:00
Tom Lane	2755abf386	Teach AbortOutOfAnyTransaction to clean up partially-started transactions. AbortOutOfAnyTransaction failed to do anything if the state it saw on entry corresponded to failing partway through StartTransaction. I fixed AbortCurrentTransaction to cope with that case way back in commit `60b2444cc3`, but evidently overlooked that AbortOutOfAnyTransaction should do likewise. Back-patch to all supported branches. It's not clear that this omission has any more-than-cosmetic consequences, but it's also not clear that it doesn't, so back-patching seems the least risky choice.	2012-05-28 23:57:06 -04:00
Tom Lane	532fe28dad	Prevent synchronized scanning when systable_beginscan chooses a heapscan. The only interesting-for-performance case wherein we force heapscan here is when we're rebuilding the relcache init file, and the only such case that is likely to be examining a catalog big enough to be syncscanned is RelationBuildTupleDesc. But the early-exit optimization in that code gets broken if we start the scan at a random place within the catalog, so that allowing syncscan is actually a big deoptimization if pg_attribute is large (at least for the normal case where the rows for core system catalogs have never been changed since initdb). Hence, prevent syncscan here. Per my testing pursuant to complaints from Jeff Frost and Greg Sabino Mullane, though neither of them seem to have actually hit this specific problem. Back-patch to 8.3, where syncscan was introduced.	2012-05-26 19:09:52 -04:00
Tom Lane	ed962fd712	Ensure that seqscans check for interrupts at least once per page. If a seqscan encounters many consecutive pages containing only dead tuples, it can remain in the loop in heapgettup for a long time, and there was no CHECK_FOR_INTERRUPTS anywhere in that loop. This meant there were real-world situations where a query would be effectively uncancelable for long stretches. Add a check placed to occur once per page, which should be enough to provide reasonable response time without adding any measurable overhead. Report and patch by Merlin Moncure (though I tweaked it a bit). Back-patch to all supported branches.	2012-05-22 19:42:05 -04:00
Heikki Linnakangas	1d27dcf578	Fix bug in gistRelocateBuildBuffersOnSplit(). When we create a temporary copy of the old node buffer, in stack, we mustn't leak that into any of the long-lived data structures. Before this patch, when we called gistPopItupFromNodeBuffer(), it got added to the array of "loaded buffers". After gistRelocateBuildBuffersOnSplit() exits, the pointer added to the loaded buffers array points to garbage. Often that goes unnotied, because when we go through the array of loaded buffers to unload them, buffers with a NULL pageBuffer are ignored, which can often happen by accident even if the pointer points to garbage. This patch fixes that by marking the temporary copy in stack explicitly as temporary, and refrain from adding buffers marked as temporary to the array of loaded buffers. While we're at it, initialize nodeBuffer->pageBlocknum to InvalidBlockNumber and improve comments a bit. This isn't strictly necessary, but makes debugging easier.	2012-05-18 19:38:32 +03:00
Heikki Linnakangas	6593c5b5dc	Fix bug in freespace calculation in heap_multi_insert(). If the amount of freespace on page was less than the amount reserved by fillfactor, the calculation would underflow. This fixes bug #6643 reported by Tomonari Katsumata.	2012-05-16 14:13:06 +03:00
Heikki Linnakangas	9e4637bf89	Update comments that became out-of-date with the PGXACT struct. When the "hot" members of PGPROC were split off to separate PGXACT structs, many PGPROC fields referred to in comments were moved to PGXACT, but the comments were neglected in the commit. Mostly this is just a search/replace of PGPROC with PGXACT, but the way the dummy PGPROC entries are created for prepared transactions changed more, making some of the comments totally bogus. Noah Misch	2012-05-14 10:28:55 +03:00
Simon Riggs	867540b49c	Ensure backwards compatibility for GetStableLatestTransactionId()	2012-05-12 13:26:10 +01:00
Peter Eisentraut	afe86a9e73	Fix obsolescent C declaration syntax gcc -Wextra/-Wold-style-declaration thinks that "inline" should go before the function return type.	2012-05-12 12:52:02 +03:00
Simon Riggs	b06679e012	Ensure age() returns a stable value rather than the latest value	2012-05-11 14:36:24 +01:00
Heikki Linnakangas	3652d72dd4	On GiST page split, release the locks on child pages before recursing up. When inserting the downlinks for a split gist page, we used hold the locks on the child pages until the insertion into the parent - and recursively its parent if it had to be split too - were all completed. Change that so that the locks on child pages are released after the insertion in the immediate parent is done, before recursing further up the tree. This reduces the number of lwlocks that are held simultaneously. Holding many locks is bad for concurrency, and in extreme cases you can even hit the limit of 100 simultaneously held lwlocks in a backend. If you're really unlucky, you can hit the limit while in a critical section, which brings down the whole system. This fixes bug #6629 reported by Tom Forbes. Backpatch to 9.1. The page splitting code was rewritten in 9.1, and the old code did not have this problem.	2012-05-11 12:35:28 +03:00
Tom Lane	acd4c7d58b	Fix an issue in recent walwriter hibernation patch. Users of asynchronous-commit mode expect there to be a guaranteed maximum delay before an async commit's WAL records get flushed to disk. The original version of the walwriter hibernation patch broke that. Add an extra shared-memory flag to allow async commits to kick the walwriter out of hibernation mode, without adding any noticeable overhead in cases where no action is needed.	2012-05-08 23:06:40 -04:00
Tom Lane	5461564a9d	Reduce idle power consumption of walwriter and checkpointer processes. This patch modifies the walwriter process so that, when it has not found anything useful to do for many consecutive wakeup cycles, it extends its sleep time to reduce the server's idle power consumption. It reverts to normal as soon as it's done any successful flushes. It's still true that during any async commit, backends check for completed, unflushed pages of WAL and signal the walwriter if there are any; so that in practice the walwriter can get awakened and returned to normal operation sooner than the sleep time might suggest. Also, improve the checkpointer so that it uses a latch and a computed delay time to not wake up at all except when it has something to do, replacing a previous hardcoded 0.5 sec wakeup cycle. This also is primarily useful for reducing the server's power consumption when idle. In passing, get rid of the dedicated latch for signaling the walwriter in favor of using its procLatch, since that comports better with possible generic signal handlers using that latch. Also, fix a pre-existing bug with failure to save/restore errno in walwriter's signal handlers. Peter Geoghegan, somewhat simplified by Tom	2012-05-08 20:03:26 -04:00
Robert Haas	0038110421	Avoid repeated CLOG access from heap_hot_search_buffer. At the time we check whether the tuple is dead to all running transactions, we've already verified that it isn't visible to our scan, setting hint bits if appropriate. So there's no need to recheck CLOG for the all-dead test we do just a moment later. So, add HeapTupleIsSurelyDead() to test the appropriate condition under the assumption that all relevant hit bits are already set. Review by Tom Lane.	2012-05-02 12:40:07 -04:00
Robert Haas	e01e66f808	More duplicate word removal.	2012-05-02 09:28:16 -04:00
Heikki Linnakangas	f291ccd43e	Remove duplicate words in comments. Found these with grep -r "for for ".	2012-05-02 10:20:27 +03:00
Tom Lane	809e7e21af	Converge all SQL-level statistics timing values to float8 milliseconds. This patch adjusts the core statistics views to match the decision already taken for pg_stat_statements, that values representing elapsed time should be represented as float8 and measured in milliseconds. By using float8, we are no longer tied to a specific maximum precision of timing data. (Internally, it's still microseconds, but we could now change that without needing changes at the SQL level.) The columns affected are pg_stat_bgwriter.checkpoint_write_time pg_stat_bgwriter.checkpoint_sync_time pg_stat_database.blk_read_time pg_stat_database.blk_write_time pg_stat_user_functions.total_time pg_stat_user_functions.self_time pg_stat_xact_user_functions.total_time pg_stat_xact_user_functions.self_time The first four of these are new in 9.2, so there is no compatibility issue from changing them. The others require a release note comment that they are now double precision (and can show a fractional part) rather than bigint as before; also their underlying statistics functions now match the column definitions, instead of returning bigint microseconds.	2012-04-30 14:03:33 -04:00
Robert Haas	0d2235a25b	Remove duplicate word in comment. Noted by Peter Geoghegan.	2012-04-30 13:14:46 -04:00
Robert Haas	3424bff90f	Prevent index-only scans from returning wrong answers under Hot Standby. The alternative of disallowing index-only scans in HS operation was discussed, but the consensus was that it was better to treat marking a page all-visible as a recovery conflict for snapshots that could still fail to see XIDs on that page. We may in the future try to soften this, so that we simply force index scans to do heap fetches in cases where this may be an issue, rather than throwing a hard conflict.	2012-04-26 20:00:21 -04:00
Tom Lane	9873001e6d	Another trivial comment-typo fix.	2012-04-25 14:28:58 -04:00
Robert Haas	5d4b60f2f2	Lots of doc corrections. Josh Kupershmidt	2012-04-23 22:43:09 -04:00
Peter Eisentraut	48658a1b81	Fix some typos Josh Kupershmidt	2012-04-22 19:23:47 +03:00
Robert Haas	53c5b869b4	Tighten up error recovery for fast-path locking. The previous code could cause a backend crash after BEGIN; SAVEPOINT a; LOCK TABLE foo (interrupted by ^C or statement timeout); ROLLBACK TO SAVEPOINT a; LOCK TABLE foo, and might have leaked strong-lock counts in other situations. Report by Zoltán Böszörményi; patch review by Jeff Davis.	2012-04-18 11:17:30 -04:00
Heikki Linnakangas	fe546f3da6	Don't wait for the commit record to be replicated if we wrote no WAL. When using synchronous replication, we waited for the commit record to be replicated, but if we our transaction didn't write any other WAL records, that's not required because we don't even flush the WAL locally to disk in that case. This lead to long waits when committing a transaction that only modified a temporary table. Bug spotted by Thom Brown.	2012-04-17 16:28:31 +03:00
Peter Eisentraut	a33fcd7e79	Fix typo Kyotaro HORIGUCHI	2012-04-16 15:36:40 +03:00
Robert Haas	3ae5133b1c	Teach SLRU code to avoid replacing I/O-busy pages. Patch by me; review by Tom Lane and others.	2012-04-08 23:05:55 -04:00
Tom Lane	0ab4db52c0	Fix misleading output from gin_desc(). XLOG_GIN_UPDATE_META_PAGE and XLOG_GIN_DELETE_LISTPAGE records were printed with a list link field labeled as "blkno", which was confusing, especially when the link was empty (InvalidBlockNumber). Print the metapage block number instead, since that's what's actually being updated. We could include the link values too as a separate field, but not clear it's worth the trouble. Back-patch to 8.4 where the dubious code was added.	2012-04-06 18:10:21 -04:00
Robert Haas	b736aef2ec	Publish checkpoint timing information to pg_stat_bgwriter. Greg Smith, Peter Geoghegan, and Robert Haas	2012-04-05 14:04:37 -04:00
Simon Riggs	68219aaf6b	Correct epoch of txid_current() when executed on a Hot Standby server. Initialise ckptXidEpoch from starting checkpoint and maintain the correct value as we roll forwards. This allows GetNextXidAndEpoch() to return the correct epoch when executed during recovery. Backpatch to 9.0 when the problem is first observable by a user. Bug report from Daniel Farina	2012-03-29 14:55:30 +01:00
Robert Haas	7386089d23	Code cleanup for heap_freeze_tuple. It used to be case that lazy vacuum could call this function with only a shared lock on the buffer, but neither lazy vacuum nor any other code path does that any more. Simplify the code accordingly and clean up some related, obsolete comments.	2012-03-26 11:03:06 -04:00
Peter Eisentraut	0e85abd658	Clean up compiler warnings from unused variables with asserts disabled For those variables only used when asserts are enabled, use a new macro PG_USED_FOR_ASSERTS_ONLY, which expands to __attribute__((unused)) when asserts are not enabled.	2012-03-21 23:33:10 +02:00
Peter Eisentraut	e684ab5e1e	Add additional safety check against invalid backup label file It was already checking for invalid data after "BACKUP FROM", but would possibly crash if "BACKUP FROM" was missing altogether. found by Coverity	2012-03-14 22:41:50 +02:00
Tom Lane	b4af1c25bb	Fix SPGiST vacuum algorithm to handle concurrent tuple motion properly. A leaf tuple that we need to delete could get moved as a consequence of an insertion happening concurrently with the VACUUM scan. If it moves from a page past the current scan point to a page before, we'll miss it, which is not acceptable. Hence, when we see a leaf-page REDIRECT that could have been made since our scan started, chase down the redirection pointer much as if we were doing a normal index search, and be sure to vacuum every page it leads to. This fixes the issue because, if the tuple was on page N at the instant we start our scan, we will surely find it as a consequence of chasing the redirect from page N, no matter how much it moves around in between. Problem noted by Takashi Yamamoto.	2012-03-12 16:10:28 -04:00
Tom Lane	c6a11b89e4	Teach SPGiST to store nulls and do whole-index scans. This patch fixes the other major compatibility-breaking limitation of SPGiST, that it didn't store anything for null values of the indexed column, and so could not support whole-index scans or "x IS NULL" tests. The approach is to create a wholly separate search tree for the null entries, and use fixed "allTheSame" insertion and search rules when processing this tree, instead of calling the index opclass methods. This way the opclass methods do not need to worry about dealing with nulls. Catversion bump is for pg_am updates as well as the change in on-disk format of SPGiST indexes; there are some tweaks in SPGiST WAL records as well. Heavily rewritten version of a patch by Oleg Bartunov and Teodor Sigaev. (The original also stored nulls separately, but it reused GIN code to do so; which required undesirable compromises in the on-disk format, and would likely lead to bugs due to the GIN code being required to work in two very different contexts.)	2012-03-11 16:29:59 -04:00
Tom Lane	03e56f798e	Restructure SPGiST opclass interface API to support whole-index scans. The original API definition was incapable of supporting whole-index scans because there was no way to invoke leaf-value reconstruction without checking any qual conditions. Also, it was inefficient for multiple-qual-condition scans because value reconstruction got done over again for each qual condition, and because other internal work in the consistent functions likewise had to be done for each qual. To fix these issues, pass the whole scankey array to the opclass consistent functions, instead of only letting them see one item at a time. (Essentially, the loop over scankey entries is now inside the consistent functions not outside them. This makes the consistent functions a bit more complicated, but not unreasonably so.) In itself this commit does nothing except save a few cycles in multiple-qual-condition index scans, since we can't support whole-index scans on SPGiST indexes until nulls are included in the index. However, I consider this a must-fix for 9.2 because once we release it will get very much harder to change the opclass API definition.	2012-03-10 18:36:49 -05:00
Heikki Linnakangas	342baf4ce6	Update outdated comment. HeapTupleHeader.t_natts field doesn't exist anymore. Kevin Grittner	2012-03-09 08:07:56 +02:00
Heikki Linnakangas	d93f209f48	Silence warning about unused variable, when building without assertions.	2012-03-08 11:10:02 +02:00

1 2 3 4 5 ...

1994 Commits