Since collation is effectively an argument, not a property of the function,
FmgrInfo is really the wrong place for it; and this becomes critical in
cases where a cached FmgrInfo is used for varying purposes that might need
different collation settings. Fix by passing it in FunctionCallInfoData
instead. In particular this allows a clean fix for bug #5970 (record_cmp
not working). This requires touching a bit more code than the original
method, but nobody ever thought that collations would not be an invasive
patch...
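For illustration, a minimal sketch of the new convention (helper names from
the 9.1-era fmgr API; bttextcmp stands in for any collation-sensitive
function):

    #include "postgres.h"
    #include "fmgr.h"
    #include "utils/builtins.h"

    /*
     * The collation now travels with each call (fcinfo.fncollation), not
     * with the cached FmgrInfo, so a single FmgrInfo can serve calls that
     * need different collations.
     */
    static int32
    compare_texts(Datum a, Datum b, Oid collation)
    {
        return DatumGetInt32(DirectFunctionCall2Coll(bttextcmp, collation,
                                                     a, b));
    }

Inside a collation-aware function, PG_GET_COLLATION() likewise now reads the
per-call field rather than anything in FmgrInfo.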
Improve the planner's fallback estimates for semi and anti joins when no
statistics are available. In particular, if we don't have real ndistinct
estimates for both sides,
fall back to assuming that half of the left-hand rows have join partners.
This is what was done in 8.2 and 8.3 (cf nulltestsel() in those versions).
It's pretty stupid but it won't lead us to think that an antijoin produces
no rows out, as seen in a recent example from Uwe Schroeder.
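Sketched as arithmetic (discounting outer-side NULLs, which can never have
partners, is my assumption about the details):

    #include "postgres.h"

    /*
     * Fallback semi/anti join selectivity when trustworthy ndistinct
     * estimates aren't available: assume half of the (non-null)
     * left-hand rows have join partners.
     */
    static double
    semijoin_fallback_selec(double outer_nullfrac)
    {
        return 0.5 * (1.0 - outer_nullfrac);
    }

An antijoin's output fraction is then one minus this, which is safely
nonzero.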
This warning is new in gcc 4.6 and part of -Wall. This patch cleans
up most of the noise, but there are still some warnings that are
trickier to remove.
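Since the subject line doesn't name it, presumably the diagnostic is
-Wunused-but-set-variable, gcc 4.6's main new addition to -Wall; a minimal
reproducer:

    /* compile with: gcc -Wall -c warn.c  (gcc >= 4.6) */
    extern int do_something(void);

    void
    example(void)
    {
        int rc;

        rc = do_something();
        /* 'rc' is assigned but never read, so gcc 4.6 reports:
         * warning: variable 'rc' set but not used
         * [-Wunused-but-set-variable]
         */
    }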
This is necessary, not optional, now that ILIKE and regexes are
collation-aware --- else we might derive a wrong comparison constant for
index-optimized pattern matches.
The previous coding worked only if ltproc->fn_collation was always either
DEFAULT_COLLATION_OID or a C-compatible locale. While that's true at the
moment, it wasn't documented (and in fact wasn't true when this code was
committed...). But it only takes a couple more lines to make its internal
caching behavior locale-aware, so let's do that.
In nearly all cases, the caller already knows the correct collation, and
in a number of places, the value the caller has handy is more correct than
the default for the type would be. (In particular, this patch makes it
significantly less likely that eval_const_expressions will result in
changing the exposed collation of an expression.) So an internal lookup
is both expensive and wrong.
All expression nodes now have an explicit output-collation field, unless
they are known to only return a noncollatable data type (such as boolean
or record). Also, nodes that can invoke collation-aware functions store
a separate field that is the collation value to pass to the function.
This avoids confusion that arises when a function has collatable inputs
and noncollatable output type, or vice versa.
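A simplified sketch of what this looks like on a function-call node (field
names follow primnodes.h; unrelated fields omitted):

    #include "postgres.h"
    #include "nodes/primnodes.h"

    /*
     * Simplified from FuncExpr: two separate collation fields, because
     * a function can have collatable inputs and a noncollatable result,
     * or vice versa.
     */
    typedef struct FuncExprSketch
    {
        Expr        xpr;
        Oid         funcid;         /* OID of the function to call */
        Oid         funcresulttype; /* type OID of the result */
        Oid         funccollid;     /* output: collation of result, or 0 */
        Oid         inputcollid;    /* input: collation to pass to function */
        List       *args;           /* argument expressions */
    } FuncExprSketch;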
Also, replace the parser's on-the-fly collation assignment method with
a post-pass over the completed expression tree. This allows us to use
a more complex (and hopefully more nearly spec-compliant) assignment
rule without paying for it in extra storage in every expression node.
Fix assorted bugs in the planner's handling of collations by making
collation one of the defining properties of an EquivalenceClass and
by converting CollateExprs into discardable RelabelType nodes during
expression preprocessing.
While this will give wrong answers when estimating selectivity for a
comparison operator that's using a non-default collation, the estimation
error probably won't be large; and anyway the former approach created
estimation errors of its own by trying to use a histogram that might have
been computed with some other collation. So we'll adopt this simplified
approach for now and perhaps improve it sometime in the future.
This patch incorporates changes from Andres Freund to make sure that
selfuncs.c passes a valid collation OID to any datatype-specific function
it calls, in case that function wants collation information. Said OID will
now always be DEFAULT_COLLATION_OID, but at least we won't get errors.
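The shape of those calls, sketched (FunctionCall2Coll is the 9.1 fmgr entry
point; opproc stands for whatever comparison FmgrInfo selfuncs.c has cached):

    #include "postgres.h"
    #include "fmgr.h"
    #include "catalog/pg_collation.h"

    /* Always hand the datatype-specific function a valid collation OID. */
    static bool
    values_are_ordered(FmgrInfo *opproc, Datum lo, Datum hi)
    {
        return DatumGetBool(FunctionCall2Coll(opproc, DEFAULT_COLLATION_OID,
                                              lo, hi));
    }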
That function was supposing that indexoid == 0 for a hypothetical index,
but that is not likely to be true in any non-toy implementation of an index
adviser, since assigning a fake OID is the only way to know at EXPLAIN time
which hypothetical index got selected. Fix by adding a flag to
IndexOptInfo to mark hypothetical indexes. Back-patch to 9.0 where
get_actual_variable_range() was added.
Gurjeet Singh
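A hedged sketch of the test this enables (the flag lives in IndexOptInfo,
per nodes/relation.h of that era):

    #include "postgres.h"
    #include "nodes/relation.h"

    /*
     * Callers such as get_actual_variable_range() must skip hypothetical
     * indexes: there is no physical index to scan, even though indexoid
     * is a (fake) nonzero OID supplied by the index adviser.
     */
    static bool
    index_is_probeable(IndexOptInfo *index)
    {
        return !index->hypothetical;
    }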
This adds collation support for columns and domains, a COLLATE clause
to override it per expression, and B-tree index support.
Peter Eisentraut
reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch
The only reason this wasn't crashing while testing the core anyarray
operators was that it was disabled for those cases because of passing the
wrong type information to get_opfamily_proc :-(. So fix that too, and make
it insist on finding the support proc --- in hindsight, silently doing
nothing is not as sane a coping mechanism as all that.
The "date" type supports a wider range of dates than int64 timestamps do.
However, there is pre-int64-timestamp code in the planner that assumes that
all date values can be converted to timestamp with impunity. Fortunately,
what we really need out of the conversion is always a double (float8)
value; so even when the date is out of timestamp's range it's possible to
produce a sane answer. All we need is a code path that doesn't try to
force the result into int64. Per trouble report from David Rericha.
Back-patch to all supported versions. Although this is surely a corner
case, there's not much point in advertising a date range wider than
timestamp's if we will choke on such values in unexpected places.
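A sketch of the overflow-proof path (constant per the timestamp headers; the
real helper lives with the datetime code):

    #include "postgres.h"
    #include "utils/date.h"
    #include "utils/timestamp.h"

    /*
     * Convert a date straight to the double the planner wants. Doing
     * the arithmetic in double sidesteps the int64 range check, so even
     * a date outside timestamp's range yields a sane, monotonic value.
     */
    static double
    date_to_scalar(DateADT dateVal)
    {
        return (double) dateVal * (double) USECS_PER_DAY;
    }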
This is a heavily revised version of builtin_knngist_core-0.9. The
ordering operators are no longer mixed in with actual quals, which would
have confused not only humans but significant parts of the planner.
Instead, ordering operators are carried separately throughout planning and
execution.
Since the API for ambeginscan and amrescan functions had to be changed
anyway, this commit takes the opportunity to rationalize that a bit.
RelationGetIndexScan no longer forces a premature index_rescan call;
instead, callers of index_beginscan must call index_rescan too. Aside from
making the AM-side initialization logic a bit less peculiar, this has the
advantage that we do not make a useless extra amrescan call when there are
runtime key values. AMs formerly could not assume that the key values
passed to amrescan were actually valid; now they can.
Teodor Sigaev and Tom Lane
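The revised calling convention, sketched with the post-patch signatures:

    #include "postgres.h"
    #include "access/genam.h"
    #include "access/relscan.h"
    #include "utils/rel.h"
    #include "utils/snapshot.h"

    /*
     * index_beginscan() no longer implies an initial rescan; the caller
     * supplies the (now guaranteed-valid) keys through an explicit
     * index_rescan() before fetching anything.
     */
    static IndexScanDesc
    start_scan(Relation heap, Relation index, Snapshot snapshot,
               ScanKey keys, int nkeys, ScanKey orderbys, int norderbys)
    {
        IndexScanDesc scan = index_beginscan(heap, index, snapshot,
                                             nkeys, norderbys);

        index_rescan(scan, keys, nkeys, orderbys, norderbys);
        return scan;
    }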
Formerly we looked up the operators associated with each index (caching
them in relcache) and then the planner looked up the btree opfamily
containing such operators in order to build the btree-centric pathkey
representation that describes the index's sort order. This is quite
pointless for btree indexes: we might as well just use the index's opfamily
information directly. That saves syscache lookup cycles during planning,
and furthermore allows us to eliminate the relcache's caching of operators
altogether, which may help in reducing backend startup time.
I added code to plancat.c to perform the same type of double lookup
on-the-fly if it's ever faced with a non-btree amcanorder index AM.
If such a thing actually becomes interesting for production, we should
replace that logic with some more-direct method for identifying the
corresponding btree opfamily; but it's not worth spending effort on now.
There is considerably more to do pursuant to my recent proposal to get rid
of sort-operator-based representations of sort orderings, but this patch
grabs some of the low-hanging fruit. I'll look at the remainder of that
work after the current commitfest.
This patch eliminates various bizarre behaviors caused by sloppy thinking
about the difference between a domain type and its underlying array type.
In particular, the operation of updating one element of such an array
has to be considered as yielding a value of the underlying array type,
*not* a value of the domain, because there's no assurance that the
domain's CHECK constraints are still satisfied. If we're intending to
store the result back into a domain column, we have to re-cast to the
domain type so that constraints are re-checked.
For similar reasons, such a domain can't be blindly matched to an ANYARRAY
polymorphic parameter, because the polymorphic function is likely to apply
array-ish operations that could invalidate the domain constraints. For the
moment, we just forbid such matching. We might later wish to insert an
automatic downcast to the underlying array type, but such a change should
also change matching of domains to ANYELEMENT for consistency.
To ensure that all such logic is rechecked, this patch removes the original
hack of setting a domain's pg_type.typelem field to match its base type;
the typelem will always be zero instead. In those places where it's really
okay to look through the domain type with no other logic changes, use the
newly added get_base_element_type function in place of get_element_type.
catversion bumped due to change in pg_type contents.
Per bug #5717 from Richard Huxton and subsequent discussion.
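In code terms (both lookups are in lsyscache.c; the second is new with this
patch):

    #include "postgres.h"
    #include "utils/lsyscache.h"

    /*
     * With domains no longer copying typelem from their base type,
     * get_element_type() returns InvalidOid for a domain over an array.
     * Call sites where looking through the domain really is okay switch
     * to get_base_element_type(), which smashes domains to base first.
     */
    static Oid
    element_type_of(Oid typid, bool ok_to_look_through_domain)
    {
        if (ok_to_look_through_domain)
            return get_base_element_type(typid);
        return get_element_type(typid); /* InvalidOid for domain-over-array */
    }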
The better estimate requires more statistics than we previously stored:
in particular, counts of "entry" versus "data" pages within the index,
as well as knowledge of the number of distinct key values. We collect
this information during initial index build and update it during VACUUM,
storing the info in new fields on the index metapage. No initdb is
required because these fields will read as zeroes in a pre-existing
index, and the new gincostestimate code is written to behave (reasonably)
sanely if they are zeroes.
Teodor Sigaev, reviewed by Jan Urbanski, Tom Lane, and Itagaki Takahiro.
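Roughly the statistics involved, as a simplified sketch (field names follow
the GIN headers; the real fields live in the metapage struct):

    #include "postgres.h"
    #include "storage/block.h"

    /*
     * Simplified sketch of the new metapage statistics. A pre-existing
     * index reads these as zeroes, and gincostestimate copes with that.
     */
    typedef struct GinIndexStatsSketch
    {
        BlockNumber nEntryPages;    /* pages in the entry tree */
        BlockNumber nDataPages;     /* pages in posting trees/lists */
        int64       nEntries;       /* number of distinct key values */
    } GinIndexStatsSketch;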
The purpose of this change is to eliminate the need for every caller
of SearchSysCache, SearchSysCacheCopy, SearchSysCacheExists,
GetSysCacheOid, and SearchSysCacheList to know the maximum number
of allowable keys for a syscache entry (currently 4). This will
make it far easier to increase the maximum number of keys in a
future release should we choose to do so, and it makes the code
shorter, too.
Design and review by Tom Lane.
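The before/after shape of a one-key lookup:

    #include "postgres.h"
    #include "utils/syscache.h"

    /*
     * Before: every caller padded the key list to the maximum of four:
     *     tuple = SearchSysCache(PROCOID, ObjectIdGetDatum(funcid),
     *                            0, 0, 0);
     * After: the wrapper's name says how many keys are actually supplied.
     */
    static HeapTuple
    lookup_function(Oid funcid)
    {
        return SearchSysCache1(PROCOID, ObjectIdGetDatum(funcid));
    }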
This patch only supports seq_page_cost and random_page_cost as parameters,
but it provides the infrastructure to scalably support many more.
In particular, we may want to add support for effective_io_concurrency,
but I'm leaving that as future work for now.
Thanks to Tom Lane for design help and Alvaro Herrera for the review.
"column < constant", and the comparison value is in the first or last
histogram bin or outside the histogram entirely, try to fetch the actual
column min or max value using an index scan (if there is an index on the
column). If successful, replace the lower or upper histogram bound with
that value before carrying on with the estimate. This limits the
estimation error caused by moving min/max values when the comparison
value is close to the min or max. Per a complaint from Josh Berkus.
It is tempting to consider using this mechanism for mergejoinscansel as well,
but that would inject index fetches into main-line join estimation, not just
endpoint cases. I'm refraining from that until we can get a better handle
on the costs of doing this type of lookup.
Make "x IS NOT NULL" clauses be indexable conditions. This turns out to be
just a minor extension of the previous patch that made "x IS NULL"
indexable, because we can treat the IS NOT NULL condition as if it were
"x < NULL" or "x > NULL" (depending on the index's NULLS FIRST/LAST option),
just like IS NULL is treated like "x = NULL". Aside from any possible
usefulness in its own right, this is an important improvement for
index-optimized MAX/MIN aggregates: it is now reliably possible to get
a column's min or max value cheaply, even when there are a lot of nulls
cluttering the interesting end of the index.
Create the infrastructure for optimizer statistics on whole inheritance
trees, and teach ANALYZE to compute such stats for tables that have
subclasses.
Per my proposal of yesterday.
autovacuum still needs to be taught about running ANALYZE on parent tables
when their subclasses change, but the feature is useful even without that.
Remove the regex_flavor GUC, so that regular expressions are always
"advanced" style by default. Per discussion, there seems to be hardly
anything that
really relies on being able to change the regex flavor, so the ability to
select it via embedded options ought to be enough for any stragglers.
Also, if we didn't remove the GUC, we'd really be morally obligated to
mark the regex functions non-immutable, which'd possibly create performance
issues.
Both hex format and the traditional "escape" format are automatically
handled on input. The output format is selected by the new GUC variable
bytea_output.
As committed, bytea_output defaults to HEX, which is an *incompatible
change*. We will keep it this way for awhile for testing purposes, but
should consider whether to switch to the more backwards-compatible
default of ESCAPE before 8.5 is released.
Peter Eisentraut
Fix estimate_num_groups() to not fail when it encounters a PlaceHolderVar,
per report from Stefan Kaltenbrunner. The most reasonable behavior (at
least for the near
term) seems to be to ignore the PlaceHolderVar and examine its argument
instead. In support of this, change the API of pull_var_clause() to allow
callers to request recursion into PlaceHolderVars. Currently
estimate_num_groups() is the only customer for that behavior, but where
there's one there may be others.
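Sketch of the extended call (the spelling of the behavior flag follows the
optimizer headers of that era and is an assumption on my part):

    #include "postgres.h"
    #include "optimizer/var.h"

    /*
     * estimate_num_groups() now asks pull_var_clause() to recurse into
     * PlaceHolderVars and return the Vars inside their arguments, rather
     * than choking on the PlaceHolderVar itself.
     */
    static List *
    grouping_vars(Node *groupexpr)
    {
        return pull_var_clause(groupexpr, PVC_RECURSE_PLACEHOLDERS);
    }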
Teach the planner to treat a partial unique index as proving that a variable
is unique for a particular query, if the index predicate is satisfied. This
requires a bit of reordering of operations so that we check the predicates
before doing any selectivity estimates, but shouldn't really cause any
noticeable slowdown. Per a comment from Michal Politowski.
In the absence of any MCV statistics (for instance because one of the
inputs is unique or nearly so), make eqjoinsel() clamp the ndistinct estimates
to be not more than the estimated number of rows coming from the input
relations. This allows the estimate to change in response to the selectivity
of restriction conditions on the inputs.
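The clamp itself is one line; as a sketch:

    #include "postgres.h"

    /*
     * Never credit a join input with more distinct values than the
     * number of rows estimated to arrive from it after restrictions.
     */
    static double
    clamp_ndistinct(double ndistinct, double input_rows)
    {
        return Min(ndistinct, input_rows);
    }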
This is a pretty narrow patch and maybe we should be more aggressive about
similarly clamping ndistinct in other cases; but I'm worried about
double-counting the effects of the restriction conditions. However, it seems
to help for the case exhibited by Grzegorz Jaskiewicz (antijoin against a
small subset of a relation), so let's try this for awhile.
Add a concept of "placeholder" variables to the planner. These are nodes
that represent some expression that we desire to compute below the top level
of the plan, and then let that value "bubble up" as though it were a plain
Var (ie, a column value).
The immediate application is to allow sub-selects to be flattened even when
they are below an outer join and have non-nullable output expressions.
Formerly we couldn't flatten because such an expression wouldn't properly
go to NULL when evaluated above the outer join. Now, we wrap it in a
PlaceHolderVar and arrange for the actual evaluation to occur below the outer
join. When the resulting Var bubbles up through the join, it will be set to
NULL if necessary, yielding the correct results. This fixes a planner
limitation that's existed since 7.1.
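A simplified sketch of the node (field names per the planner headers; other
bookkeeping omitted):

    #include "postgres.h"
    #include "nodes/primnodes.h"
    #include "nodes/relation.h"

    /*
     * The wrapper forces phexpr to be evaluated below the outer join;
     * the result then bubbles up like a Var and is nulled by the join
     * when necessary.
     */
    typedef struct PlaceHolderVarSketch
    {
        Expr        xpr;
        Expr       *phexpr;     /* the represented expression */
        Relids      phrels;     /* base relids syntactically within expr */
        Index       phid;       /* ID, unique within the query */
        Index       phlevelsup; /* > 0 if it belongs to an outer query */
    } PlaceHolderVarSketch;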
In future we might want to use this mechanism to re-introduce some form of
Hellerstein's "expensive functions" optimization, ie place the evaluation of
an expensive function at the most suitable point in the plan tree.
Arrange for get_index_stats_hook to be applied to expression indexes, not to
plain relations. The original coding
in btcostestimate conflated the two cases, but it's not hard to use
get_relation_stats_hook instead when we're looking to the underlying relation.
Move exprType(), exprTypmod(), expression_tree_walker(), and related routines
into nodes/nodeFuncs, so as to reduce wanton cross-subsystem #includes inside
the backend. There's probably more that should be done along this line,
but this is a start anyway.
Improve the planner's estimation of join sizes and selectivities for semi
and anti joins. To do this, pass the SpecialJoinInfo struct for the current
join as an additional optional argument to operator join selectivity
estimation functions. This allows the estimator to tell not only what kind
of join is being formed, but which variable is on which side of the join;
a requirement long recognized but not dealt with till now. This also leaves
the door open for future improvements in the estimators, such as accounting
for the null-insertion effects of lower outer joins. I didn't do anything
about that in the current patch but the information is in principle deducible
from what's passed.
The patch also clarifies the definition of join selectivity for semi/anti
joins: it's the fraction of the left input that has (at least one) match
in the right input. This allows getting rid of some very fuzzy thinking
that I had committed in the original 7.4-era IN-optimization patch.
There's probably room to estimate this better than the present patch does,
but at least we know what to estimate.
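The post-patch calling sequence for a join estimator, sketched in the
selfuncs.c style (argument layout assumed from the description above):

    #include "postgres.h"
    #include "fmgr.h"

    /*
     * Extended join-estimator convention: the fifth argument carries the
     * SpecialJoinInfo, so the estimator knows both the kind of join and
     * which input each variable belongs to.
     */
    Datum
    sketch_joinsel(PG_FUNCTION_ARGS)
    {
        /* PlannerInfo     *root     = (PlannerInfo *) PG_GETARG_POINTER(0); */
        /* Oid              operator = PG_GETARG_OID(1);                     */
        /* List            *args     = (List *) PG_GETARG_POINTER(2);        */
        /* JoinType         jointype = (JoinType) PG_GETARG_INT16(3);        */
        /* SpecialJoinInfo *sjinfo   =
         *     (SpecialJoinInfo *) PG_GETARG_POINTER(4); */

        PG_RETURN_FLOAT8(0.5);  /* placeholder estimate only */
    }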
Since I had to touch CREATE OPERATOR anyway to allow a variant signature
for join estimator functions, I took the opportunity to add a couple of
additional checks that were missing, per my recent message to -hackers:
* Check that estimator functions return float8;
* Require execute permission at the time of CREATE OPERATOR on the
operator's function as well as the estimator functions;
* Require ownership of any pre-existing operator that's modified by
the command.
I also moved the lookup of the functions out of OperatorCreate() and
into operatorcmds.c, since that seemed more consistent with most of
the other catalog object creation processes, eg CREATE TYPE.
Implement SEMI and ANTI joins in the planner and executor. (Semijoins replace
the old JOIN_IN code, but antijoins are new functionality.) Teach the planner
to convert appropriate EXISTS and NOT EXISTS subqueries into semi and anti
joins respectively. Also, LEFT JOINs with suitable upper-level IS NULL
filters are recognized as being anti joins. Unify the InClauseInfo and
OuterJoinInfo infrastructure into "SpecialJoinInfo". With that change,
it becomes possible to associate a SpecialJoinInfo with every join attempt,
which permits some cleanup of join selectivity estimation. That needs to be
taken much further than this patch does, but the next step is to change the
API for oprjoin selectivity functions, which seems like material for a
separate patch. So for the moment the output size estimates for semi and
especially anti joins are quite bogus.
Teach estimate_num_groups() that expressions yielding boolean
results always contribute two groups, regardless of the expression contents.
This is very substantially more accurate than the regular heuristic for
certain boolean tests like "col IS NULL". Per gripe from Sam Mason.
Back-patch to all supported releases, since the behavior of
estimate_num_groups() hasn't changed all that much since 7.4.
Restructure some header files a bit, in particular heapam.h, by removing some
unnecessary #include lines in it. Also, move some tuple routine prototypes and
macros to htup.h, which allows removal of heapam.h inclusion from some .c
files.
For this to work, a new header file access/sysattr.h needed to be created,
initially containing attribute numbers of system columns, for pg_dump usage.
While at it, make contrib ltree, intarray and hstore header files more
consistent with our header style.
Since createplan.c no longer cares whether index operators are lossy, there's
no particular need to do get_op_opfamily_properties() while building an
indexscan plan. Postpone that lookup until executor start. This simplifies
createplan.c a lot more than it complicates nodeIndexscan.c, and makes things
more uniform since we already had to do it that way for RowCompare
expressions. Should be a bit faster too, at least for plans that aren't
re-used many times, since we avoid palloc'ing and perhaps copying the
intermediate list data structure.
Simplify and standardize conversions between TEXT datums and ordinary C
strings. This patch introduces four support functions cstring_to_text,
cstring_to_text_with_len, text_to_cstring, and text_to_cstring_buffer, and
two macros CStringGetTextDatum and TextDatumGetCString. A number of
existing macros that provided variants on these themes were removed.
Most of the places that need to make such conversions now require just one
function or macro call, in place of the multiple notational layers that used
to be needed. There are no longer any direct calls of textout or textin,
and we got most of the places that were using handmade conversions via
memcpy (there may be a few still lurking, though).
This commit doesn't make any serious effort to eliminate transient memory
leaks caused by detoasting toasted text objects before they reach
text_to_cstring. We changed PG_GETARG_TEXT_P to PG_GETARG_TEXT_PP in a few
places where it was easy, but much more could be done.
Brendan Jurd and Tom Lane
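The new one-call idioms, sketched:

    #include "postgres.h"
    #include "utils/builtins.h"

    /* Each direction is now a single call, with no textin/textout. */
    static void
    conversion_examples(void)
    {
        text   *t  = cstring_to_text("hello");      /* C string -> text */
        char   *s  = text_to_cstring(t);            /* text -> C string */
        Datum   d  = CStringGetTextDatum("hello");  /* C string -> Datum */
        char   *s2 = TextDatumGetCString(d);        /* Datum -> C string */

        (void) s;
        (void) s2;
    }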
In patternsel(), instead of switching abruptly from the pattern-examination
heuristic method to purely histogram-driven selectivity at histogram size
100, compute both estimates and use a weighted average.
The weight put on the heuristic estimate decreases linearly with histogram
size, dropping to zero for 100 or more histogram entries.
Likewise in ltreeparentsel(). After a patch by Greg Stark, though I
reorganized the logic a bit to give the caller of histogram_selectivity()
more control.
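As arithmetic, a sketch of the blend described above:

    #include "postgres.h"

    /*
     * The heuristic's weight falls off linearly with histogram size and
     * reaches zero at 100 entries, replacing the old hard cutover.
     */
    static double
    blended_pattern_selec(double heur_sel, double hist_sel, int hist_size)
    {
        double  heur_weight =
            (hist_size < 100) ? (double) (100 - hist_size) / 100.0 : 0.0;

        return heur_weight * heur_sel + (1.0 - heur_weight) * hist_sel;
    }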
Fix prefix_selectivity() so that it will never estimate the selectivity
of the generated range condition var >= 'foo' AND var < 'fop' as being less
than what eqsel() would estimate for var = 'foo'. This is intuitively
reasonable and it gets rid of the need for some entirely ad-hoc coding we
formerly used to reject bogus estimates. The basic problem here is that
if the prefix is more than a few characters long, the two boundary values
are too close together to be distinguishable by comparison to the column
histogram, resulting in a selectivity estimate of zero, which is often
not very sane. Change motivated by an example from Peter Eisentraut.
Arguably this is a bug fix, but I'll refrain from back-patching it
for the moment.
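The guard is just a lower bound; sketched:

    #include "postgres.h"

    /*
     * Since var = 'foo' implies the range condition, the range
     * selectivity can never validly be smaller than the equality
     * selectivity; clamp it rather than trust a histogram too coarse
     * to separate 'foo' from 'fop'.
     */
    static double
    prefix_selec_with_floor(double range_sel, double eq_sel)
    {
        return Max(range_sel, eq_sel);
    }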