postgres

mirror of https://github.com/postgres/postgres.git synced 2025-11-22 12:22:45 +03:00

Author	SHA1	Message	Date
Tom Lane	26a7b48e10	Eliminate some repetitive coding in tuplesort.c. Use a macro LogicalTapeReadExact() to encapsulate the error check when we want to read an exact number of bytes from a "tape". Per a suggestion of Takahiro Itagaki.	2010-10-07 20:32:21 -04:00
Tom Lane	3ba11d3df2	Teach CLUSTER to use seqscan-and-sort when it's faster than indexscan. ... or at least, when the planner's cost estimates say it will be faster. Leonardo Francalanci, reviewed by Itagaki Takahiro and Tom Lane	2010-10-07 20:00:28 -04:00
Magnus Hagander	9f2e211386	Remove cvs keywords from all files.	2010-09-20 22:08:53 +02:00
Bruce Momjian	65e806cba1	pgindent run for 9.0	2010-02-26 02:01:40 +00:00
Bruce Momjian	0239800893	Update copyright for the year 2010.	2010-01-02 16:58:17 +00:00
Heikki Linnakangas	84d723b6ce	Previous fix for temporary file management broke returning a set from PL/pgSQL function within an exception handler. Make sure we use the right resource owner when we create the tuplestore to hold returned tuples. Simplify tuplestore API so that the caller doesn't need to be in the right memory context when calling tuplestore_put* functions. tuplestore.c automatically switches to the memory context used when the tuplestore was created. Tuplesort was already modified like this earlier. This patch also removes the now useless MemoryContextSwitch calls from callers. Report by Aleksei on pgsql-bugs on Dec 22 2009. Backpatch to 8.1, like the previous patch that broke this.	2009-12-29 17:40:59 +00:00
Tom Lane	9bd27b7c9e	Extend EXPLAIN to support output in XML or JSON format. There are probably still some adjustments to be made in the details of the output, but this gets the basic structure in place. Robert Haas	2009-08-10 05:46:50 +00:00
Tom Lane	527f0ae3fa	Department of second thoughts: let's show the exact key during unique index build failures, too. Refactor a bit more since that error message isn't spelled the same.	2009-08-01 20:59:17 +00:00
Bruce Momjian	d747140279	8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list provided by Andrew.	2009-06-11 14:49:15 +00:00
Tom Lane	25bf7f8b9b	Fix possible failures when a tuplestore switches from in-memory to on-disk mode while callers hold pointers to in-memory tuples. I reported this for the case of nodeWindowAgg's primary scan tuple, but inspection of the code shows that all of the calls in nodeWindowAgg and nodeCtescan are at risk. For the moment, fix it with a rather brute-force approach of copying whenever one of the at-risk callers requests a tuple. Later we might think of some sort of reference-count approach to reduce tuple copying.	2009-03-27 18:30:21 +00:00
Tom Lane	e04810e8c4	Code review for dtrace probes added (so far) to 8.4. Adjust placement of some bufmgr probes, take out redundant and memory-leak-inducing path arguments to smgr__md__read__done and smgr__md__write__done, fix bogus attempt to recalculate space used in sort__done, clean up formatting in places where I'm not sure pgindent will do a nice job by itself.	2009-03-11 23:19:25 +00:00
Bruce Momjian	511db38ace	Update copyright for 2009.	2009-01-01 17:24:05 +00:00
Tom Lane	95b07bc7f5	Support window functions a la SQL:2008. Hitoshi Harada, with some kibitzing from Heikki and Tom.	2008-12-28 18:54:01 +00:00
Tom Lane	38e9348282	Make a couple of small changes to the tuplestore API, for the benefit of the upcoming window-functions patch. First, tuplestore_trim is now an exported function that must be explicitly invoked by callers at appropriate times, rather than something that tuplestore tries to do behind the scenes. Second, a read pointer that is marked as allowing backward scan no longer prevents truncation. This means that a read pointer marked as having BACKWARD but not REWIND capability can only safely read backwards as far as the oldest other read pointer. (The expected use pattern for this involves having another read pointer that serves as the truncation fencepost.)	2008-12-27 17:39:00 +00:00
Tom Lane	d26bf23f34	Arrange to squeeze out the MINIMAL_TUPLE_PADDING in the tuple representation written to temp files by tuplesort.c and tuplestore.c. This saves 2 bytes per row for 32-bit machines, and 6 bytes per row for 64-bit machines, which seems worth the slight additional uglification of the tuple read/write routines.	2008-10-28 15:51:03 +00:00
Tom Lane	34f89cb4af	Fix oversight in recent patch to support multiple read positions in tuplestore: in READFILE state tuplestore_select_read_pointer must save the current file seek position in the read pointer being deactivated.	2008-10-07 00:05:55 +00:00
Tom Lane	44d5be0e53	Implement SQL-standard WITH clauses, including WITH RECURSIVE. There are some unimplemented aspects: recursive queries must use UNION ALL (should allow UNION too), and we don't have SEARCH or CYCLE clauses. These might or might not get done for 8.4, but even without them it's a pretty useful feature. There are also a couple of small loose ends and definitional quibbles, which I'll send a memo about to pgsql-hackers shortly. But let's land the patch now so we can get on with other development. Yoshiyuki Asaba, with lots of help from Tatsuo Ishii and Tom Lane	2008-10-04 21:56:55 +00:00
Tom Lane	dad4cb6258	Improve tuplestore.c to support multiple concurrent read positions. This facility replaces the former mark/restore support but is otherwise upward-compatible with previous uses. It's expected to be needed for single evaluation of CTEs and also for window functions, so I'm committing it separately instead of waiting for either one of those patches to be finished. Per discussion with Greg Stark and Hitoshi Harada. Note: I removed nodeFunctionscan's mark/restore support, instead of bothering to update it for this change, because it was dead code anyway.	2008-10-01 19:51:50 +00:00
Tom Lane	4adc2f72a4	Change hash indexes to store only the hash code rather than the whole indexed value. This means that hash index lookups are always lossy and have to be rechecked when the heap is visited; however, the gain in index compactness outweighs this when the indexed values are wide. Also, we only need to perform datatype comparisons when the hash codes match exactly, rather than for every entry in the hash bucket; so it could also win for datatypes that have expensive comparison functions. A small additional win is gained by keeping hash index pages sorted by hash code and using binary search to reduce the number of index tuples we have to look at. Xiao Meng This commit also incorporates Zdenek Kotala's patch to isolate hash metapages and hash bitmaps a bit better from the page header datastructures.	2008-09-15 18:43:41 +00:00
Alvaro Herrera	e36e6b1cab	Add a few more DTrace probes to the backend. Robert Lor	2008-08-01 13:16:09 +00:00
Alvaro Herrera	a3540b0f65	Improve our #include situation by moving pointer types away from the corresponding struct definitions. This allows other headers to avoid including certain highly-loaded headers such as rel.h and relscan.h, instead using just relcache.h, heapam.h or genam.h, which are more lightweight and thus cause less unnecessary dependencies.	2008-06-19 00:46:06 +00:00
Alvaro Herrera	f8c4d7db60	Restructure some header files a bit, in particular heapam.h, by removing some unnecessary #include lines in it. Also, move some tuple routine prototypes and macros to htup.h, which allows removal of heapam.h inclusion from some .c files. For this to work, a new header file access/sysattr.h needed to be created, initially containing attribute numbers of system columns, for pg_dump usage. While at it, make contrib ltree, intarray and hstore header files more consistent with our header style.	2008-05-12 00:00:54 +00:00
Neil Conway	1d812a98b4	Add a new tuplestore API function, tuplestore_putvalues(). This is identical to tuplestore_puttuple(), except it operates on arrays of Datums + nulls rather than a fully-formed HeapTuple. In several places that use the tuplestore API, this means we can avoid creating a HeapTuple altogether, saving a copy.	2008-03-25 19:26:54 +00:00
Tom Lane	0c5962c054	Grab some low-hanging fruit in the new hash index build code. oprofile shows that a nontrivial amount of time is being spent in repeated calls to index_getprocinfo, which really only needs to be called once. So do that, and inline _hash_datum2hashkey to make it work.	2008-03-17 03:45:36 +00:00
Tom Lane	787eba734b	When creating a large hash index, pre-sort the index entries by estimated bucket number, so as to ensure locality of access to the index during the insertion step. Without this, building an index significantly larger than available RAM takes a very long time because of thrashing. On the other hand, sorting is just useless overhead when the index does fit in RAM. We choose to sort when the initial index size exceeds effective_cache_size. This is a revised version of work by Tom Raney and Shreya Bhargava.	2008-03-16 23:15:08 +00:00
Tom Lane	f0828b2fc3	Provide a build-time option to store large relations as single files, rather than dividing them into 1GB segments as has been our longtime practice. This requires working support for large files in the operating system; at least for the time being, it won't be the default. Zdenek Kotala	2008-03-10 20:06:27 +00:00
Peter Eisentraut	0474dcb608	Refactor backend makefiles to remove lots of duplicate code	2008-02-19 10:30:09 +00:00
Bruce Momjian	9098ab9e32	Update copyrights in source tree to 2008.	2008-01-01 19:46:01 +00:00
Bruce Momjian	fdf5a5efb7	pgindent run for 8.3.	2007-11-15 21:14:46 +00:00
Tom Lane	2aae35d049	Mention the index name in 'could not create unique index' errors, per suggestion from Rene Gollent.	2007-10-29 21:31:28 +00:00
Tom Lane	d2825e1c85	Since sort_bounded_heap makes state changes that should be made regardless of the number of tuples involved, it's incorrect to skip it when memtupcount = 1; the number of cycles saved is minuscule anyway. An alternative solution would be to pull the state changes out to the call site in tuplesort_performsort, but keeping them near the corresponding changes in make_bounded_heap seems marginally cleaner. Noticed by Greg Stark.	2007-09-01 18:47:39 +00:00
Neil Conway	494d6f809e	Fix a memory leak in tuplestore_end(). Unlikely to be significant during normal operation, but tuplestore_end() ought to do what it claims to do.	2007-08-02 17:48:52 +00:00
Tom Lane	24ee8af573	Rework temp_tablespaces patch so that temp tablespaces are assigned separately for each temp file, rather than once per sort or hashjoin; this allows spreading the data of a large sort or join across multiple tablespaces. (I remain dubious that this will make any difference in practice, but certain people insisted.) Arrange to cache the results of parsing the GUC variable instead of recomputing from scratch on every demand, and push usage of the cache down to the bottommost fd.c level.	2007-06-07 19:19:57 +00:00
Tom Lane	acfce502ba	Create a GUC parameter temp_tablespaces that allows selection of the tablespace(s) in which to store temp tables and temporary files. This is a list to allow spreading the load across multiple tablespaces (a random list element is chosen each time a temp object is to be created). Temp files are not stored in per-database pgsql_tmp/ directories anymore, but per-tablespace directories. Jaime Casanova and Albert Cervera, with review by Bernd Helmle and Tom Lane.	2007-06-03 17:08:34 +00:00
Tom Lane	2415ad9831	Teach tuplestore.c to throw away data before the "mark" point when the caller is using mark/restore but not rewind or backward-scan capability. Insert a materialize plan node between a mergejoin and its inner child if the inner child is a sort that is expected to spill to disk. The materialize shields the sort from the need to do mark/restore and thereby allows it to perform its final merge pass on-the-fly; while the materialize itself is normally cheap since it won't spill to disk unless the number of tuples with equal key values exceeds work_mem. Greg Stark, with some kibitzing from Tom Lane.	2007-05-21 17:57:35 +00:00
Tom Lane	d2a4a4069f	Add a line to the EXPLAIN ANALYZE output for a Sort node, showing the actual sort strategy and amount of space used. By popular demand.	2007-05-04 21:29:53 +00:00
Tom Lane	d26559dbf3	Teach tuplesort.c about "top N" sorting, in which only the first N tuples need be returned. We keep a heap of the current best N tuples and sift-up new tuples into it as we scan the input. For M input tuples this means only about Mlog(N) comparisons instead of Mlog(M), not to mention a lot less workspace when N is small --- avoiding spill-to-disk for large M is actually the most attractive thing about it. Patch includes planner and executor support for invoking this facility in ORDER BY ... LIMIT queries. Greg Stark, with some editorialization by moi.	2007-05-04 01:13:45 +00:00
Peter Eisentraut	2cc01004c6	Remove remains of old depend target.	2007-01-20 17:16:17 +00:00
Tom Lane	a191a169d6	Change the planner-to-executor API so that the planner tells the executor which comparison operators to use for plan nodes involving tuple comparison (Agg, Group, Unique, SetOp). Formerly the executor looked up the default equality operator for the datatype, which was really pretty shaky, since it's possible that the data being fed to the node is sorted according to some nondefault operator class that could have an incompatible idea of equality. The planner knows what it has sorted by and therefore can provide the right equality operator to use. Also, this change moves a couple of catalog lookups out of the executor and into the planner, which should help startup time for pre-planned queries by some small amount. Modify the planner to remove some other cavalier assumptions about always being able to use the default operators. Also add "nulls first/last" info to the Plan node for a mergejoin --- neither the executor nor the planner can cope yet, but at least the API is in place.	2007-01-10 18:06:05 +00:00
Tom Lane	4431758229	Support ORDER BY ... NULLS FIRST/LAST, and add ASC/DESC/NULLS FIRST/NULLS LAST per-column options for btree indexes. The planner's support for this is still pretty rudimentary; it does not yet know how to plan mergejoins with nondefault ordering options. The documentation is pretty rudimentary, too. I'll work on improving that stuff later. Note incompatible change from prior behavior: ORDER BY ... USING will now be rejected if the operator is not a less-than or greater-than member of some btree opclass. This prevents less-than-sane behavior if an operator that doesn't actually define a proper sort ordering is selected.	2007-01-09 02:14:16 +00:00
Bruce Momjian	29dccf5fe0	Update CVS HEAD for 2007 copyright. Back branches are typically not back-stamped for this.	2007-01-05 22:20:05 +00:00
Tom Lane	a78fcfb512	Restructure operator classes to allow improved handling of cross-data-type cases. Operator classes now exist within "operator families". While most families are equivalent to a single class, related classes can be grouped into one family to represent the fact that they are semantically compatible. Cross-type operators are now naturally adjunct parts of a family, without having to wedge them into a particular opclass as we had done originally. This commit restructures the catalogs and cleans up enough of the fallout so that everything still works at least as well as before, but most of the work needed to actually improve the planner's behavior will come later. Also, there are not yet CREATE/DROP/ALTER OPERATOR FAMILY commands; the only way to create a new family right now is to allow CREATE OPERATOR CLASS to make one by default. I owe some more documentation work, too. But that can all be done in smaller pieces once this infrastructure is in place.	2006-12-23 00:43:13 +00:00
Bruce Momjian	f99a569a2e	pgindent run for 8.2.	2006-10-04 00:30:14 +00:00
Tom Lane	6edd2b4a91	Switch over to using our own qsort() all the time, as has been proposed repeatedly. Now that we don't have to worry about memory leaks from glibc's qsort, we can safely put CHECK_FOR_INTERRUPTS into the tuplesort comparators, as was requested a couple months ago. Also, get rid of non-reentrancy and an extra level of function call in tuplesort.c by providing a variant qsort_arg() API that passes an extra void * argument through to the comparison routine. (We might want to use that in other places too, I didn't look yet.)	2006-10-03 22:18:23 +00:00
Bruce Momjian	e0522505bd	Remove 576 references of include files that were not needed.	2006-07-14 14:52:27 +00:00
Tom Lane	cdd5178c69	Extend the MinimalTuple concept to tuplesort.c, thereby reducing the per-tuple space overhead for sorts in memory. I chose to replace the previous patch that tried to write out the bare minimum amount of data when sorting on disk; instead, just dump the MinimalTuples as-is. This wastes 3 to 10 bytes per tuple depending on architecture and null-bitmap length, but the simplification in the writetup/readtup routines seems worth it.	2006-06-27 16:53:02 +00:00
Tom Lane	3f50ba27cf	Create infrastructure for 'MinimalTuple' representation of in-memory tuples with less header overhead than a regular HeapTuple, per my recent proposal. Teach TupleTableSlot code how to deal with these. As proof of concept, change tuplestore.c to store MinimalTuples instead of HeapTuples. Future patches will expand the concept to other places where it is useful.	2006-06-27 02:51:40 +00:00
Tom Lane	7f52e0c50e	Tweak writetup_heap/readtup_heap to avoid storing the tuple identity and transaction visibility fields of tuples being sorted. These are always uninteresting in a tuple being sorted (if the fields were actually selected, they'd have been pulled out into user columns beforehand). This saves about 24 bytes per row being sorted, which is a useful savings for any but the widest of sort rows. Per recent discussion.	2006-05-23 21:37:59 +00:00
Tom Lane	c65ab0bfa9	Recent changes in memory management in tuplesort.c had a problem: the case where we run low on array slots before we run low on memory is much more probable than I had thought, and so it's important to treat each tape fairly in that case. To fix this, track per-tape slot allocations just like we track per-tape space allocation. Also, in the FINALMERGE code path avoid scanning all the input tapes when we really only need to read from one. This should fix poor behavior with very large work_mem as exhibited by Stefan Kaltenbrunner. I didn't do anything about putting an upper bound on the number of tapes, but maybe we should still consider that.	2006-03-10 23:19:00 +00:00
Tom Lane	c8cd76de28	Tweak trace_sort code to show the merge order (number of active input tapes) for each merge step. This will give us some idea of how effective the merge distribution algorithm is.	2006-03-08 16:59:03 +00:00

... 4 5 6 7 8 ...

440 Commits