postgres

mirror of https://github.com/postgres/postgres.git synced 2025-11-10 17:42:29 +03:00

Author	SHA1	Message	Date
Bruce Momjian	a6fd7b7a5f	Post-PG 10 beta1 pgindent run perltidy run not included.	2017-05-17 16:31:56 -04:00
Andres Freund	b58c433ef9	Fix duplicated words in comment. Reported-By: Peter Geoghegan Discussion: https://postgr.es/m/CAH2-Wzn3rY2N0gTWndaApD113T+O8L6oz8cm7_F3P8y4awdoOg@mail.gmail.com Backpatch: no, only present in master	2017-05-06 17:03:45 -07:00
Andres Freund	fa117ee403	Allow avoiding tuple copy within tuplesort_gettupleslot(). Add a "copy" argument to make it optional to receive a copy of caller tuple that is safe to use following a subsequent manipulating of tuplesort's state. This is a performance optimization. Most existing tuplesort_gettupleslot() callers are made to opt out of copying. Existing callers that happen to rely on the validity of tuple memory beyond subsequent manipulations of the tuplesort request their own copy. This brings tuplesort_gettupleslot() in line with tuplestore_gettupleslot(). In the future, a "copy" tuplesort_getdatum() argument may be added, that similarly allows callers to opt out of receiving their own copy of tuple. In passing, clarify assumptions that callers of other tuplesort fetch routines may make about tuple memory validity, per gripe from Tom Lane. Author: Peter Geoghegan Discussion: CAM3SWZQWZZ_N=DmmL7tKy_OUjGH_5mN=N=A6h7kHyyDvEhg2DA@mail.gmail.com	2017-04-06 14:48:59 -07:00
Robert Haas	ea69a0dead	Expand hash indexes more gradually. Since hash indexes typically have very few overflow pages, adding a new splitpoint essentially doubles the on-disk size of the index, which can lead to large and abrupt increases in disk usage (and perhaps long delays on occasion). To mitigate this problem to some degree, divide larger splitpoints into four equal phases. This means that, for example, instead of growing from 4GB to 8GB all at once, a hash index will now grow from 4GB to 5GB to 6GB to 7GB to 8GB, which is perhaps still not as smooth as we'd like but certainly an improvement. This changes the on-disk format of the metapage, so bump HASH_VERSION from 2 to 3. This will force a REINDEX of all existing hash indexes, but that's probably a good idea anyway. First, hash indexes from pre-10 versions of PostgreSQL could easily be corrupted, and we don't want to confuse corruption carried over from an older release with any corruption caused despite the new write-ahead logging in v10. Second, it will let us remove some backward-compatibility code added by commit `293e24e507`. Mithun Cy, reviewed by Amit Kapila, Jesper Pedersen and me. Regression test outputs updated by me. Discussion: http://postgr.es/m/CAD__OuhG6F1gQLCgMQNnMNgoCvOLQZz9zKYJQNYvYmmJoM42gA@mail.gmail.com Discussion: http://postgr.es/m/CA+TgmoYty0jCf-pa+m+vYUJ716+AxM7nv_syvyanyf5O-L_i2A@mail.gmail.com	2017-04-03 23:46:33 -04:00
Tom Lane	4e46c97fde	Fix NULL pointer dereference in tuplesort.c. Oversight in commit `e94568ecc`. This could cause a crash when an external datum tuplesort of a pass-by-value type required multiple passes. Per report from Mithun Cy. Peter Geoghegan Discussion: https://postgr.es/m/CAD__OujuhfWFULGFSt1fyHqUb8N-XafjJhudwt88V0Qs2o84qg@mail.gmail.com	2017-01-16 13:53:40 -05:00
Bruce Momjian	1d25779284	Update copyright via script for 2017	2017-01-03 13:48:53 -05:00
Heikki Linnakangas	01ec25631f	Simplify tape block format. No more indirect blocks. The blocks form a linked list instead. This saves some memory, because we don't need to have a buffer in memory to hold the indirect block (or blocks). To reflect that, TAPE_BUFFER_OVERHEAD is reduced from 3 to 1 buffer, which allows using more memory for building the initial runs. Reviewed by Peter Geoghegan and Robert Haas. Discussion: https://www.postgresql.org/message-id/34678beb-938e-646e-db9f-a7def5c44ada%40iki.fi	2016-12-22 18:45:00 +02:00
Robert Haas	3856cf9607	Remove should_free arguments to tuplesort routines. Since commit `e94568ecc1`, the answer is always "false", and we do not need to complicate the API by arranging to return a constant value. Peter Geoghegan Discussion: http://postgr.es/m/CAM3SWZQWZZ_N=DmmL7tKy_OUjGH_5mN=N=A6h7kHyyDvEhg2DA@mail.gmail.com	2016-12-12 15:57:35 -05:00
Heikki Linnakangas	64bc26f90d	Fix thinko in safeguard for negative availMem. Also, use pass read_buffer_size * numInputTapes rather than just availMem to USEMEM, to be neat. Peter Geoghegan.	2016-12-08 23:05:21 +02:00
Heikki Linnakangas	f7d54f4f7d	Fix accounting of memory needed for merge heap. We allegedly allocated all remaining memory for the read buffers of the sort tapes, but we allocated the merge heap only after that. That means that the allocation of the merge heap was guaranteed to go over the memory limit. Fix by allocating the merge heap first. This makes little difference in practice, because the merge heap is tiny, but let's tidy. While we're at it, add a safeguard for the case that we are already over the limit when allocating the read buffers. That shouldn't happen, but better safe than sorry. The memory accounting error was reported off-list by Peter Geoghegan.	2016-12-08 10:15:57 +02:00
Robert Haas	fc19c1801b	Limit the number of number of tapes used for a sort to 501. Gigantic numbers of tapes don't work out well. Original patch by Peter Geoghegan; comments entirely rewritten by me.	2016-11-15 10:30:33 -05:00
Heikki Linnakangas	d8589946dd	Fix use-after-free around DISTINCT transition function calls. Have tuplesort_gettupleslot() copy the contents of its current table slot as needed. This is based on an approach taken by tuplestore_gettupleslot(). In the future, tuplesort_gettupleslot() may also be taught to avoid copying the tuple where caller can determine that that is safe (the tuplestore_gettupleslot() interface already offers this option to callers). Patch by Peter Geoghegan. Fixes bug #14344, reported by Regina Obe. Report: <20160929035538.20224.39628@wrigleys.postgresql.org> Backpatch-through: 9.6	2016-10-17 12:13:16 +03:00
Heikki Linnakangas	b75f467b6e	Simplify the code for logical tape read buffers. Pass the buffer size as argument to LogicalTapeRewindForRead, rather than setting it earlier with the separate LogicTapeAssignReadBufferSize call. This way, the buffer size is set closer to where it's actually used, which makes the code easier to understand. This makes the calculation for how much memory to use for the buffers less precise. We now use the same amount of memory for every tape, rounded down to the nearest BLCKSZ boundary, instead of using one more block for some tapes, to get the total up to exact amount of memory available. That should be OK, merging isn't too sensitive to the exact amount of memory used. Reviewed by Peter Geoghegan Discussion: <0f607c4b-df23-353e-bf56-c0389d28495f@iki.fi>	2016-10-12 12:05:45 +03:00
Heikki Linnakangas	d4fca5e6c7	Fix another outdated comment. Preloading is done by logtape.c now.	2016-10-04 19:16:00 +03:00
Heikki Linnakangas	c86c2d9d57	Update comment. mergepreread()/mergeprereadone() don't exist anymore, the function that does roughly the same is now called mergereadnext().	2016-10-04 09:47:54 +03:00
Heikki Linnakangas	e94568ecc1	Change the way pre-reading in external sort's merge phase works. Don't pre-read tuples into SortTuple slots during merge. Instead, use the memory for larger read buffers in logtape.c. We're doing the same number of READTUP() calls either way, but managing the pre-read SortTuple slots is much more complicated. Also, the on-tape representation is more compact than SortTuples, so we can fit more pre-read tuples into the same amount of memory this way. And we have better cache-locality, when we use just a small number of SortTuple slots. Now that we only hold one tuple from each tape in the SortTuple slots, we can greatly simplify the "batch memory" management. We now maintain a small set of fixed-sized slots, to hold the tuples, and fall back to palloc() for larger tuples. We use this method during all merge phases, not just the final merge, and also when randomAccess is requested, and also in the TSS_SORTEDONTAPE case. In other words, it's used whenever we do an external sort. Reviewed by Peter Geoghegan and Claudio Freire. Discussion: <CAM3SWZTpaORV=yQGVCG8Q4axcZ3MvF-05xe39ZvORdU9JcD6hQ@mail.gmail.com>	2016-10-03 13:37:49 +03:00
Heikki Linnakangas	c99dd5bfed	Fix and clarify comments on replacement selection. These were modified by the patch to only use replacement selection for the first run in an external sort.	2016-09-15 11:51:43 +03:00
Heikki Linnakangas	24598337c8	Implement binary heap replace-top operation in a smarter way. In external sort's merge phase, we maintain a binary heap holding the next tuple from each input tape. On each step, the topmost tuple is returned, and replaced with the next tuple from the same tape. We were doing the replacement by deleting the top node in one operation, and inserting the next tuple after that. However, you can do a "replace-top" operation more efficiently, in one "sift-up". A deletion will always walk the heap from top to bottom, but in a replacement, we can stop as soon as we find the right place for the new tuple. This is particularly helpful, if the tapes are not in completely random order, so that the next tuple from a tape is likely to land near the top of the heap. Peter Geoghegan, reviewed by Claudio Freire, with some editing by me. Discussion: <CAM3SWZRhBhiknTF_=NjDSnNZ11hx=U_SEYwbc5vd=x7M4mMiCw@mail.gmail.com>	2016-09-11 16:27:27 +03:00
Tom Lane	f032722f86	Guard against possible memory allocation botch in batchmemtuples(). Negative availMemLessRefund would be problematic. It's not entirely clear whether the case can be hit in the code as it stands, but this seems like good future-proofing in any case. While we're at it, insist that the value be not merely positive but not tiny, so as to avoid doing a lot of repalloc work for little gain. Peter Geoghegan Discussion: <CAM3SWZRVkuUB68DbAkgw=532gW0f+fofKueAMsY7hVYi68MuYQ@mail.gmail.com>	2016-09-06 15:50:31 -04:00
Tom Lane	ea268cdc9a	Add macros to make AllocSetContextCreate() calls simpler and safer. I found that half a dozen (nearly 5%) of our AllocSetContextCreate calls had typos in the context-sizing parameters. While none of these led to especially significant problems, they did create minor inefficiencies, and it's now clear that expecting people to copy-and-paste those calls accurately is not a great idea. Let's reduce the risk of future errors by introducing single macros that encapsulate the common use-cases. Three such macros are enough to cover all but two special-purpose contexts; those two calls can be left as-is, I think. While this patch doesn't in itself improve matters for third-party extensions, it doesn't break anything for them either, and they can gradually adopt the simplified notation over time. In passing, change TopMemoryContext to use the default allocation parameters. Formerly it could only be extended 8K at a time. That was probably reasonable when this code was written; but nowadays we create many more contexts than we did then, so that it's not unusual to have a couple hundred K in TopMemoryContext, even without considering various dubious code that sticks other things there. There seems no good reason not to let it use growing blocks like most other contexts. Back-patch to 9.6, mostly because that's still close enough to HEAD that it's easy to do so, and keeping the branches in sync can be expected to avoid some future back-patching pain. The bugs fixed by these changes don't seem to be significant enough to justify fixing them further back. Discussion: <21072.1472321324@sss.pgh.pa.us>	2016-08-27 17:50:38 -04:00
Robert Haas	008c4135cc	Fix possible sorting error when aborting use of abbreviated keys. Due to an error in the abbreviated key abort logic, the most recently processed SortTuple could be incorrectly marked NULL, resulting in an incorrect final sort order. In the worst case, this could result in a corrupt btree index, which would need to be rebuild using REINDEX. However, abbrevation doesn't abort very often, not all data types use it, and only one tuple would end up in the wrong place, so the practical impact of this mistake may be somewhat limited. Report and patch by Peter Geoghegan.	2016-08-22 15:22:11 -04:00
Tom Lane	b5bce6c1ec	Final pgindent + perltidy run for 9.6.	2016-08-15 13:42:51 -04:00
Bruce Momjian	6eb5b05d22	C comment: fix typo Author: Amit Langote	2016-08-03 10:32:32 -04:00
Robert Haas	1b0fc85077	Properly adjust pointers when tuples are moved during CLUSTER. Otherwise, when we abandon incremental memory accounting and use batch allocation for the final merge pass, we might crash. This has been broken since `0011c0091e`. Peter Geoghegan, tested by Noah Misch	2016-07-07 13:47:16 -04:00
Robert Haas	b22934dc03	Fix a prototype which is inconsistent with the function definition. Peter Geoghegan	2016-07-07 13:46:51 -04:00
Robert Haas	4bc424b968	pgindent run for 9.6	2016-06-09 18:02:36 -04:00
Teodor Sigaev	8b99edefca	Revert CREATE INDEX ... INCLUDING ... It's not ready yet, revert two commits `690c543550` - unstable test output `386e3d7609` - patch itself	2016-04-08 21:52:13 +03:00
Teodor Sigaev	386e3d7609	CREATE INDEX ... INCLUDING (column[, ...]) Now indexes (but only B-tree for now) can contain "extra" column(s) which doesn't participate in index structure, they are just stored in leaf tuples. It allows to use index only scan by using single index instead of two or more indexes. Author: Anastasia Lubennikova with minor editorializing by me Reviewers: David Rowley, Peter Geoghegan, Jeff Janes	2016-04-08 19:45:59 +03:00
Robert Haas	b0b64f6505	Attempt to fix breakage due to declaration following code. Per Tom Lane and the buildfarm.	2016-04-08 10:52:56 -04:00
Robert Haas	0711803775	Use quicksort, not replacement selection, for external sorting. We still use replacement selection for the first run of the sort only and only when the number of tuples is relatively small. Otherwise, the first run, and subsequent runs in all cases, are produced using quicksort. This tends to be faster except perhaps for very small amounts of working memory. Peter Geoghegan, reviewed by Tomas Vondra, Jeff Janes, Mithun Cy, Greg Stark, and me.	2016-04-08 02:36:26 -04:00
Robert Haas	08a6d36dcb	Use INT64_FORMAT instead of %ld for int64. Commit `0011c0091e` introduced this mistake. Patch by me. Reported by Andres Freund, who also reviewed the patch.	2016-03-18 14:54:09 -04:00
Robert Haas	2d8a1e22b1	Various minor corrections of and improvements to comments. Aleksander Alekseev	2016-03-18 09:38:59 -04:00
Robert Haas	c27033ff7c	Update tuplesort.c comments for memory mangement improvements. I'm committing these changes separately so that it's clear what is Peter's original work versus what I changed. This is a followup to commit `0011c0091e`, and these changes are all by me.	2016-03-17 16:11:14 -04:00
Robert Haas	0011c0091e	Improve memory management for external sorts. Introduce a new memory context which stores tuple data, and reset it at the end of each merge pass; this helps avoid memory fragmentation and, consequently, overallocation. Also, for the final merge patch, eliminate memory context chunk header overhead entirely by allocating all of the memory used for buffering tuples during the merge in a single chunk. Since this modestly increases the number of tuples we can store, grow the memtuples array a bit so that we're less likely to run short of slots there. Peter Geoghegan. Review and testing of patches in this series by Jeff Janes, Greg Stark, Mithun Cy, and me.	2016-03-17 16:10:41 -04:00
Robert Haas	f1f5ec1efa	Reuse abbreviated keys in ordered [set] aggregates. When processing ordered aggregates following a sort that could make use of the abbreviated key optimization, only call the equality operator to compare successive pairs of tuples when their abbreviated keys were not equal. Peter Geoghegan, reviewd by Andreas Karlsson and by me.	2016-02-17 15:40:00 +05:30
Tom Lane	65c5fcd353	Restructure index access method API to hide most of it at the C level. This patch reduces pg_am to just two columns, a name and a handler function. All the data formerly obtained from pg_am is now provided in a C struct returned by the handler function. This is similar to the designs we've adopted for FDWs and tablesample methods. There are multiple advantages. For one, the index AM's support functions are now simple C functions, making them faster to call and much less error-prone, since the C compiler can now check function signatures. For another, this will make it far more practical to define index access methods in installable extensions. A disadvantage is that SQL-level code can no longer see attributes of index AMs; in particular, some of the crosschecks in the opr_sanity regression test are no longer possible from SQL. We've addressed that by adding a facility for the index AM to perform such checks instead. (Much more could be done in that line, but for now we're content if the amvalidate functions more or less replace what opr_sanity used to do.) We might also want to expose some sort of reporting functionality, but this patch doesn't do that. Alexander Korotkov, reviewed by Petr Jelínek, and rather heavily editorialized on by me.	2016-01-17 19:36:59 -05:00
Bruce Momjian	ee94300446	Update copyright for 2016 Backpatch certain files through 9.1	2016-01-02 13:33:40 -05:00
Robert Haas	0ba3f3bc65	Comment improvements for abbreviated keys. Peter Geoghegan and Robert Haas	2015-12-22 13:57:18 -05:00
Robert Haas	ee44cb7566	Improve comments about abbreviation abort. Peter Geoghegan	2015-11-03 14:11:49 -05:00
Tom Lane	8ea3e7a75c	Fix bogus "out of memory" reports in tuplestore.c. The tuplesort/tuplestore memory management logic assumed that the chunk allocation overhead for its memtuples array could not increase when increasing the array size. This is and always was true for tuplesort, but we (I, I think) blindly copied that logic into tuplestore.c without noticing that the assumption failed to hold for the much smaller array elements used by tuplestore. Given rather small work_mem, this could result in an improper complaint about "unexpected out-of-memory situation", as reported by Brent DeSpain in bug #13530. The easiest way to fix this is just to increase tuplestore's initial array size so that the assumption holds. Rather than relying on magic constants, though, let's export a #define from aset.c that represents the safe allocation threshold, and make tuplestore's calculation depend on that. Do the same in tuplesort.c to keep the logic looking parallel, even though tuplesort.c isn't actually at risk at present. This will keep us from breaking it if we ever muck with the allocation parameters in aset.c. Back-patch to all supported versions. The error message doesn't occur pre-9.3, not so much because the problem can't happen as because the pre-9.3 tuplestore code neglected to check for it. (The chance of trouble is a great deal larger as of 9.3, though, due to changes in the array-size-increasing strategy.) However, allowing LACKMEM() to become true unexpectedly could still result in less-than-desirable behavior, so let's patch it all the way back.	2015-08-04 18:18:46 -04:00
Robert Haas	a6a2357820	Update comment to match behavior of latest code. Peter Geoghegan	2015-08-04 11:45:29 -04:00
Bruce Momjian	807b9e0dff	pgindent run for 9.5	2015-05-23 21:35:49 -04:00
Robert Haas	61f68e0bed	Fix comment. Commit `78efd5c1ed` overlooked this. Report by Peter Geoghegan.	2015-05-13 15:27:41 -04:00
Robert Haas	78efd5c1ed	Extend abbreviated key infrastructure to datum tuplesorts. Andrew Gierth, reviewed by Peter Geoghegan and by me.	2015-05-13 14:36:26 -04:00
Robert Haas	2720e96a9b	Fix handling of sortKeys field in Tuplesortstate. Commit `5cefbf5a6c` introduced an assumption that this field would always be non-NULL when doing a merge pass, but that's not true. Without this fix, you can crash the server by building a hash index that is sufficiently large relative to maintenance_work_mem, or by triggering a large datum sort. Commit `5ea86e6e65` changed the comments for that field to say that it would be set in all cases except for the hash index case, but that wasn't (and still isn't) true. The datum-sort failure was spotted by Tomas Vondra; initial analysis of that failure was by Peter Geoghegan. The remaining issues were spotted by me during review of the surrounding code, and the patch is all my fault.	2015-03-09 10:35:41 -04:00
Stephen Frost	804b6b6db4	Fix column-privilege leak in error-message paths While building error messages to return to the user, BuildIndexValueDescription, ExecBuildSlotValueDescription and ri_ReportViolation would happily include the entire key or entire row in the result returned to the user, even if the user didn't have access to view all of the columns being included. Instead, include only those columns which the user is providing or which the user has select rights on. If the user does not have any rights to view the table or any of the columns involved then no detail is provided and a NULL value is returned from BuildIndexValueDescription and ExecBuildSlotValueDescription. Note that, for key cases, the user must have access to all of the columns for the key to be shown; a partial key will not be returned. Further, in master only, do not return any data for cases where row security is enabled on the relation and row security should be applied for the user. This required a bit of refactoring and moving of things around related to RLS- note the addition of utils/misc/rls.c. Back-patch all the way, as column-level privileges are now in all supported versions. This has been assigned CVE-2014-8161, but since the issue and the patch have already been publicized on pgsql-hackers, there's no point in trying to hide this commit.	2015-01-28 12:31:30 -05:00
Robert Haas	5cefbf5a6c	Don't use abbreviated keys for the final merge pass. When we write tuples out to disk and read them back in, the abbreviated keys become non-abbreviated, because the readtup routines don't know anything about abbreviation. But without this fix, the rest of the code still thinks the abbreviation-aware compartor should be used, so chaos ensues. Report by Andrew Gierth; patch by Peter Geoghegan.	2015-01-23 11:58:31 -05:00
Robert Haas	4ea51cdfe8	Use abbreviated keys for faster sorting of text datums. This commit extends the SortSupport infrastructure to allow operator classes the option to provide abbreviated representations of Datums; in the case of text, we abbreviate by taking the first few characters of the strxfrm() blob. If the abbreviated comparison is insufficent to resolve the comparison, we fall back on the normal comparator. This can be much faster than the old way of doing sorting if the first few bytes of the string are usually sufficient to resolve the comparison. There is the potential for a performance regression if all of the strings to be sorted are identical for the first 8+ characters and differ only in later positions; therefore, the SortSupport machinery now provides an infrastructure to abort the use of abbreviation if it appears that abbreviation is producing comparatively few distinct keys. HyperLogLog, a streaming cardinality estimator, is included in this commit and used to make that determination for text. Peter Geoghegan, reviewed by me.	2015-01-19 15:28:27 -05:00
Bruce Momjian	4baaf863ec	Update copyright for 2015 Backpatch certain files through 9.0	2015-01-06 11:43:47 -05:00
Robert Haas	5ea86e6e65	Use the sortsupport infrastructure in more cases. This removes some fmgr overhead from cases such as btree index builds. Peter Geoghegan, reviewed by Andreas Karlsson and me.	2014-11-07 15:50:55 -05:00

1 2 3 4

177 Commits