postgres

mirror of https://github.com/postgres/postgres.git synced 2025-06-30 21:42:05 +03:00

Author	SHA1	Message	Date
Tom Lane	164899db1c	Revert thinko introduced into prefix_selectivity() by my recent patch: make_greater_string needs the < procedure not the >= one. Spotted by Peter.	2008-03-17 17:13:54 +00:00
Tom Lane	f4230d2937	Change patternsel() so that instead of switching from a pure pattern-examination heuristic method to purely histogram-driven selectivity at histogram size 100, we compute both estimates and use a weighted average. The weight put on the heuristic estimate decreases linearly with histogram size, dropping to zero for 100 or more histogram entries. Likewise in ltreeparentsel(). After a patch by Greg Stark, though I reorganized the logic a bit to give the caller of histogram_selectivity() more control.	2008-03-09 00:32:09 +00:00
Tom Lane	422495d0da	Modify prefix_selectivity() so that it will never estimate the selectivity of the generated range condition var >= 'foo' AND var < 'fop' as being less than what eqsel() would estimate for var = 'foo'. This is intuitively reasonable and it gets rid of the need for some entirely ad-hoc coding we formerly used to reject bogus estimates. The basic problem here is that if the prefix is more than a few characters long, the two boundary values are too close together to be distinguishable by comparison to the column histogram, resulting in a selectivity estimate of zero, which is often not very sane. Change motivated by an example from Peter Eisentraut. Arguably this is a bug fix, but I'll refrain from back-patching it for the moment.	2008-03-08 22:41:38 +00:00
Bruce Momjian	9098ab9e32	Update copyrights in source tree to 2008.	2008-01-01 19:46:01 +00:00
Tom Lane	9fd8843647	Fix mergejoin cost estimation so that we consider the statistical ranges of the two join variables at both ends: not only trailing rows that need not be scanned because there cannot be a match on the other side, but initial rows that will be scanned without possibly having a match. This allows a more realistic estimate of startup cost to be made, per recent pgsql-performance discussion. In passing, fix a couple of bugs that had crept into mergejoinscansel: it was not quite up to speed for the task of estimating descending-order scans, which is a new requirement in 8.3.	2007-12-08 21:05:11 +00:00
Bruce Momjian	f6e8730d11	Re-run pgindent with updated list of typedefs. (Updated README should avoid this problem in the future.)	2007-11-15 22:25:18 +00:00
Bruce Momjian	fdf5a5efb7	pgindent run for 8.3.	2007-11-15 21:14:46 +00:00
Tom Lane	a96fa85025	Second pass at improving LIKE/regex estimation in non-C locales. It turns out that it's actually quite likely that a string that is an extension of the given prefix will sort as larger than the "greater" string our previous code created. To provide some defense against that, do the comparisons against a modified string instead of just the bare prefix. We tack on "Z", "z", "y", or "9", whichever is seen as largest in the current locale. Testing suggests that this is sufficient at least for cases involving ASCII data.	2007-11-09 20:10:02 +00:00
Tom Lane	2de946be6a	Improve the performance of LIKE/regex estimation in non-C locales, by making make_greater_string() try harder to generate a string that's actually greater than its input string. Before we just assumed that making a string that was memcmp-greater was enough, but it is easy to generate examples where this is not so when the locale is not C. Instead, loop until the relevant comparison function agrees that the generated string is greater than the input. Unfortunately this is probably not enough to guarantee that the generated string is greater than all extensions of the input, so we cannot relax the restriction to C locale for the LIKE/regex index optimization. But it should at least improve the odds of getting a useful selectivity estimate in prefix_selectivity(). Per example from Guillaume Smet. Backpatch to 8.1, mainly because that's what the complainant is using...	2007-11-07 22:37:24 +00:00
Tom Lane	9542287123	Fix patternsel() and callers to do the right thing for NOT LIKE and the other negated-match operators. patternsel had been using the supplied operator as though it were a positive-match operator, and thus obtaining a wrong result, which was even more wrong after the caller subtracted it from 1. Seems cleanest to give patternsel an explicit "negate" argument so that it knows what's going on. Also install the same factorization scheme for pattern join selectivity estimators; even though they are just stubs at the moment, this may keep someone from making the same type of mistake when they get filled out. Per report from Greg Mullane. Backpatch to 8.2 --- previous releases do not show the problem because patternsel() doesn't actually use the operator directly.	2007-11-07 21:00:37 +00:00
Tom Lane	0ee5a39862	Apply a band-aid fix for the problem that 8.2 and up completely misestimate the number of rows likely to be produced by a query such as SELECT * FROM t1 LEFT JOIN t2 USING (key) WHERE t2.key IS NULL; What this is doing is selecting for t1 rows with no match in t2, and thus it may produce a significant number of rows even if the t2.key table column contains no nulls at all. 8.2 thinks the table column's null fraction is relevant and thus may estimate no rows out, which results in terrible plans if there are more joins above this one. A proper fix for this will involve passing much more information about the context of a clause to the selectivity estimator functions than we ever have. There's no time left to write such a patch for 8.3, and it wouldn't be back-patchable into 8.2 anyway. Instead, put in an ad-hoc test to defeat the normal table-stats-based estimation when an IS NULL test is evaluated at an outer join, and just use a constant estimate instead --- I went with 0.5 for lack of a better idea. This won't catch every case but it will catch the typical ways of writing such queries, and it seems unlikely to make things worse for other queries.	2007-08-31 23:35:22 +00:00
Tom Lane	140d4ebcb4	Tsearch2 functionality migrates to core. The bulk of this work is by Oleg Bartunov and Teodor Sigaev, but I did a lot of editorializing, so anything that's broken is probably my fault. Documentation is nonexistent as yet, but let's land the patch so we can get some portability testing done.	2007-08-21 01:11:32 +00:00
Magnus Hagander	343a9a27a9	Check return code from strxfrm on Windows since it has a non-standard way of indicating errors, so we don't try to allocate INT_MAX bytes to store a result in.	2007-05-05 17:05:48 +00:00
Tom Lane	afcf09dd90	Some further performance tweaks for planning large inheritance trees that are mostly excluded by constraints: do the CE test a bit earlier to save some adjust_appendrel_attrs() work on excluded children, and arrange to use array indexing rather than rt_fetch() to fetch RTEs in the main body of the planner. The latter is something I'd wanted to do for awhile anyway, but seeing list_nth_cell() as 35% of the runtime gets one's attention.	2007-04-21 21:01:45 +00:00
Tom Lane	f02a82b6ad	Make 'col IS NULL' clauses be indexable conditions. Teodor Sigaev, with some kibitzing from Tom Lane.	2007-04-06 22:33:43 +00:00
Tom Lane	bf94076348	Fix array coercion expressions to ensure that the correct volatility is seen by code inspecting the expression. The best way to do this seems to be to drop the original representation as a function invocation, and instead make a special expression node type that represents applying the element-type coercion function to each array element. In this way the element function is exposed and will be checked for volatility. Per report from Guillaume Smet.	2007-03-27 23:21:12 +00:00
Tom Lane	54d20024c1	Fix some problems with selectivity estimation for partial indexes. First, genericcostestimate() was being way too liberal about including partial-index conditions in its selectivity estimate, resulting in substantial underestimates for situations such as an indexqual "x = 42" used with an index on x "WHERE x >= 40 AND x < 50". While the code is intentionally set up to favor selecting partial indexes when available, this was too much... Second, choose_bitmap_and() was likewise easily fooled by cases of this type, since it would similarly think that the partial index had selectivity independent of the indexqual. Fixed by using predicate_implied_by() rather than simple equality checks to determine redundancy. This is a good deal more expensive but I don't see much alternative. At least the extra cost is only paid when there's actually a partial index under consideration. Per report from Jeff Davis. I'm not going to risk back-patching this, though.	2007-03-21 22:18:12 +00:00
Tom Lane	0f4ff460c4	Fix up the remaining places where the expression node structure would lose available information about the typmod of an expression; namely, Const, ArrayRef, ArrayExpr, and EXPR and ARRAY SubLinks. In the ArrayExpr and SubLink cases it wasn't really the data structure's fault, but exprTypmod() being lazy. This seems like a good idea in view of the expected increase in typmod usage from Teodor's work to allow user-defined types to have typmods. In particular this responds to the concerns we had about eliminating the special-purpose hack that exprTypmod() used to have for BPCHAR Consts. We can now tell whether or not such a Const has been cast to a specific length, and report or display properly if so. initdb forced due to changes in stored rules.	2007-03-17 00:11:05 +00:00
Tom Lane	234a02b2a8	Replace direct assignments to VARATT_SIZEP(x) with SET_VARSIZE(x, len). Get rid of VARATT_SIZE and VARATT_DATA, which were simply redundant with VARSIZE and VARDATA, and as a consequence almost no code was using the longer names. Rename the length fields of struct varlena and various derived structures to catch anyplace that was accessing them directly; and clean up various places so caught. In itself this patch doesn't change any behavior at all, but it is necessary infrastructure if we hope to play any games with the representation of varlena headers. Greg Stark and Tom Lane	2007-02-27 23:48:10 +00:00
Tom Lane	eab6b8b27e	Turn the rangetable used by the executor into a flat list, and avoid storing useless substructure for its RangeTblEntry nodes. (I chose to keep using the same struct node type and just zero out the link fields for unneeded info, rather than making a separate ExecRangeTblEntry type --- it seemed too fragile to have two different rangetable representations.) Along the way, put subplans into a list in the toplevel PlannedStmt node, and have SubPlan nodes refer to them by list index instead of direct pointers. Vadim wanted to do that years ago, but I never understood what he was on about until now. It makes things a whole lot more robust, because we can stop worrying about duplicate processing of subplans during expression tree traversals. That's been a constant source of bugs, and it's finally gone. There are some consequent simplifications yet to be made, like not using a separate EState for subplans in the executor, but I'll tackle that later.	2007-02-22 22:00:26 +00:00
Tom Lane	7c5e5439d2	Get rid of some old and crufty global variables in the planner. When this code was last gone over, there wasn't really any alternative to globals because we didn't have the PlannerInfo struct being passed all through the planner code. Now that we do, we can restructure things to avoid non-reentrancy. I'm fooling with this because otherwise I'd have had to add another global variable for the planned compact range table list.	2007-02-19 07:03:34 +00:00
Teodor Sigaev	61f621b506	Revert gincostestimate changes.	2007-01-31 16:54:51 +00:00
Teodor Sigaev	d4c6da1527	Allow GIN's extractQuery method to signal that nothing can satisfy the query. In this case extractQuery should returns -1 as nentries. This changes prototype of extractQuery method to use int32* instead of uint32* for nentries argument. Based on that gincostestimate may see two corner cases: nothing will be found or seqscan should be used. Per proposal at http://archives.postgresql.org/pgsql-hackers/2007-01/msg01581.php PS tsearch_core patch should be sightly modified to support changes, but I'm waiting a verdict about reviewing of tsearch_core patch.	2007-01-31 15:09:45 +00:00
Tom Lane	a053437d9e	Dept of second thoughts: the IQ of estimate_array_length() needs to be kept on par with that of scalararraysel(), else estimates that should track might not. Hence teach it about binary-compatible cases, too.	2007-01-28 02:53:34 +00:00
Tom Lane	af18f6ad85	Fix scalararraysel() to cope with binary-compatible cases, such as text[] versus varchar[]. This oversight probably explains Ryan Holmes' recent complaint --- he was getting a generic selectivity estimate instead of anything intelligent.	2007-01-28 01:37:38 +00:00
Tom Lane	4f06c688c7	Put back planner's ability to cache the results of mergejoinscansel(), which I had removed in the first cut of the EquivalenceClass rewrite to simplify that patch a little. But it's still important --- in a four-way join problem mergejoinscansel() was eating about 40% of the planning time according to gprof. Also, improve the EquivalenceClass code to re-use join RestrictInfos rather than generating fresh ones for each join considered. This saves some memory space but more importantly improves the effectiveness of caching planning info in RestrictInfos.	2007-01-22 20:00:40 +00:00
Tom Lane	f41803bb39	Refactor planner's pathkeys data structure to create a separate, explicit representation of equivalence classes of variables. This is an extensive rewrite, but it brings a number of benefits: * planner no longer fails in the presence of "incomplete" operator families that don't offer operators for every possible combination of datatypes. * avoid generating and then discarding redundant equality clauses. * remove bogus assumption that derived equalities always use operators named "=". * mergejoins can work with a variety of sort orders (e.g., descending) now, instead of tying each mergejoinable operator to exactly one sort order. * better recognition of redundant sort columns. * can make use of equalities appearing underneath an outer join.	2007-01-20 20:45:41 +00:00
Tom Lane	4431758229	Support ORDER BY ... NULLS FIRST/LAST, and add ASC/DESC/NULLS FIRST/NULLS LAST per-column options for btree indexes. The planner's support for this is still pretty rudimentary; it does not yet know how to plan mergejoins with nondefault ordering options. The documentation is pretty rudimentary, too. I'll work on improving that stuff later. Note incompatible change from prior behavior: ORDER BY ... USING will now be rejected if the operator is not a less-than or greater-than member of some btree opclass. This prevents less-than-sane behavior if an operator that doesn't actually define a proper sort ordering is selected.	2007-01-09 02:14:16 +00:00
Bruce Momjian	29dccf5fe0	Update CVS HEAD for 2007 copyright. Back branches are typically not back-stamped for this.	2007-01-05 22:20:05 +00:00
Tom Lane	d6061d2f31	Fix regex_fixed_prefix() to cope reasonably well with regex patterns of the form '^(foo)$'. Before, these could never be optimized into indexscans. The recent changes to make psql and pg_dump generate such patterns (for \d commands and -t and related switches, respectively) therefore represented a big performance hit for people with large pg_class catalogs, as seen in recent gripe from Erik Jones. While at it, be more paranoid about case-sensitivity checking in multibyte encodings, and fix some other corner cases in which a regex might be interpreted too liberally.	2007-01-03 22:39:26 +00:00
Tom Lane	a78fcfb512	Restructure operator classes to allow improved handling of cross-data-type cases. Operator classes now exist within "operator families". While most families are equivalent to a single class, related classes can be grouped into one family to represent the fact that they are semantically compatible. Cross-type operators are now naturally adjunct parts of a family, without having to wedge them into a particular opclass as we had done originally. This commit restructures the catalogs and cleans up enough of the fallout so that everything still works at least as well as before, but most of the work needed to actually improve the planner's behavior will come later. Also, there are not yet CREATE/DROP/ALTER OPERATOR FAMILY commands; the only way to create a new family right now is to allow CREATE OPERATOR CLASS to make one by default. I owe some more documentation work, too. But that can all be done in smaller pieces once this infrastructure is in place.	2006-12-23 00:43:13 +00:00
Tom Lane	281f40187f	Fix some planner bugs exposed by reports from Arjen van der Meijden. These are all in new-in-8.2 logic associated with indexability of ScalarArrayOpExpr (IN-clauses) or amortization of indexscan costs across repeated indexscans on the inside of a nestloop. In particular: Fix some logic errors in the estimation for multiple scans induced by a ScalarArrayOpExpr indexqual. Include a small cost component in bitmap index scans to reflect the costs of manipulating the bitmap itself; this is mainly to prevent a bitmap scan from appearing to have the same cost as a plain indexscan for fetching a single tuple. Also add a per-index-scan-startup CPU cost component; while prior releases were clearly too pessimistic about the cost of repeated indexscans, the original 8.2 coding allowed the cost of an indexscan to effectively go to zero if repeated often enough, which is overly optimistic. Pay some attention to index correlation when estimating costs for a nestloop inner indexscan: this is significant when the plan fetches multiple heap tuples per iteration, since high correlation means those tuples are probably on the same or adjacent heap pages.	2006-12-15 18:42:26 +00:00
Bruce Momjian	f99a569a2e	pgindent run for 8.2.	2006-10-04 00:30:14 +00:00
Tom Lane	bfd1ffa948	Change patternsel (LIKE/regex selectivity estimation) so that if there is a large enough histogram, it will use the number of matches in the histogram to derive a selectivity estimate, rather than the admittedly pretty bogus heuristics involving examining the pattern contents. I set 'large enough' at 100, but perhaps we should change that later. Also apply the same technique in contrib/ltree's <@ and @> estimator. Per discussion with Stefan Kaltenbrunner and Matteo Beccati.	2006-09-20 19:50:21 +00:00
Tom Lane	b74c543685	Improve usage of effective_cache_size parameter by assuming that all the tables in the query compete for cache space, not just the one we are currently costing an indexscan for. This seems more realistic, and it definitely will help in examples recently exhibited by Stefan Kaltenbrunner. To get the total size of all the tables involved, we must tweak the handling of 'append relations' a bit --- formerly we looked up information about the child tables on-the-fly during set_append_rel_pathlist, but it needs to be done before we start doing any cost estimation, so push it into the add_base_rels_to_query scan.	2006-09-19 22:49:53 +00:00
Bruce Momjian	9a7483714f	Work around bug in strxfmt() but in MS VS2005. William ZHANG	2006-07-26 17:17:28 +00:00
Tom Lane	8dcaea7be0	Add a fudge factor to genericcostestimate() to prevent the planner from thinking that indexes of different sizes are equally attractive. Per gripe from Jim Nasby. (I remain unconvinced that there's such a problem in existing releases, but CVS HEAD definitely has got a problem because of its new count-only-leaf-pages approach to indexscan costing.)	2006-07-24 01:19:48 +00:00
Bruce Momjian	e0522505bd	Remove 576 references of include files that were not needed.	2006-07-14 14:52:27 +00:00
Tom Lane	08ccdf020e	Fix oversight in planning for multiple indexscans driven by ScalarArrayOpExpr index quals: we were estimating the right total number of rows returned, but treating the index-access part of the cost as if a single scan were fetching that many consecutive index tuples. Actually we should treat it as a multiple indexscan, and if there are enough of 'em the Mackert-Lohman discount should kick in.	2006-07-01 22:07:23 +00:00
Tom Lane	8a30cc2127	Make the planner estimate costs for nestloop inner indexscans on the basis that the Mackert-Lohmann formula applies across all the repetitions of the nestloop, not just each scan independently. We use the M-L formula to estimate the number of pages fetched from the index as well as from the table; that isn't what it was designed for, but it seems reasonably applicable anyway. This makes large numbers of repetitions look much cheaper than before, which accords with many reports we've received of overestimation of the cost of a nestloop. Also, change the index access cost model to charge random_page_cost per index leaf page touched, while explicitly not counting anything for access to metapage or upper tree pages. This may all need tweaking after we get some field experience, but in simple tests it seems to be giving saner results than before. The main thing is to get the infrastructure in place to let cost_index() and amcostestimate functions take repeated scans into account at all. Per my recent proposal. Note: this patch changes pg_proc.h, but I did not force initdb because the changes are basically cosmetic --- the system does not look into pg_proc to decide how to call an index amcostestimate function, and there's no way to call such a function from SQL at all.	2006-06-06 17:59:58 +00:00
Tom Lane	eed6c9ed7e	Add a GUC parameter seq_page_cost, and use that everywhere we formerly assumed that a sequential page fetch has cost 1.0. This patch doesn't in itself change the system's behavior at all, but it opens the door to people adopting other units of measurement for EXPLAIN costs. Also, if we ever decide it's worth inventing per-tablespace access cost settings, this change provides a workable intellectual framework for that.	2006-06-05 02:49:58 +00:00
Teodor Sigaev	8a3631f8d8	GIN: Generalized Inverted iNdex. text[], int4[], Tsearch2 support for GIN.	2006-05-02 11:28:56 +00:00
Tom Lane	427c6b5b98	Avoid assuming that statistics for a parent relation reflect the properties of the union of its child relations as well. This might have been a good idea when it was originally coded, but it's a fatally bad idea when inheritance is being used for partitioning. It's better to have no stats at all than completely misleading stats. Per report from Mark Liberman. The bug arguably exists all the way back, but I've only patched HEAD and 8.1 because we weren't particularly trying to support partitioning before 8.1. Eventually we ought to look at deriving union statistics instead of just punting, but for now the drop kick looks good.	2006-05-02 04:34:18 +00:00
Tom Lane	0f0a33099c	Generalize mcv_selectivity() to support both VAR OP CONST and CONST OP VAR cases. This was not needed in the existing uses within selfuncs.c, but if we're gonna export it for general use, the extra generality seems helpful. Motivated by looking at ltree example.	2006-04-27 17:52:40 +00:00
Tom Lane	a3c1a11fc1	If we're going to expose VariableStatData for contrib modules to use, then we should export a reasonable set of the supporting routines too.	2006-04-27 00:46:59 +00:00
Bruce Momjian	59d61409cd	Move ltree parentsel() selectivity function into /contrib/ltree.	2006-04-26 22:33:36 +00:00
Bruce Momjian	b3e4aefcfb	Enhanced containment selectivity function for /contrib/ltree Matteo Beccati	2006-04-26 18:28:34 +00:00
Tom Lane	efe222268f	Eliminate some no-longer-needed workarounds for palloc's old behavior of rejecting palloc(0). Also, tweak like_selectivity() to avoid assuming the presented pattern is nonempty; although that assumption is valid, it doesn't really help much, and the new coding is more correct anyway since it properly handles redundant wildcards. In combination these changes should eliminate a Coverity warning noted by Martijn.	2006-04-20 17:50:18 +00:00
Bruce Momjian	f2f5b05655	Update copyright for 2006. Update scripts.	2006-03-05 15:59:11 +00:00
Tom Lane	3a0a16cb7e	Allow row comparisons to be used as indexscan qualifications. This completes the project to upgrade our handling of row comparisons.	2006-01-25 20:29:24 +00:00

1 2 3 4 5

246 Commits