If there's anyone out there who's actually using datatype-defined
default values, this will be an incompatible change in behavior ...
but the old behavior was so broken that I doubt anyone was using it.
pgsql-hackers. pg_opclass now has a row for each opclass supported by each
index AM, not a row for each opclass name. This allows pg_opclass to show
directly whether an AM supports an opclass, and furthermore makes it possible
to store additional information about an opclass that might be AM-dependent.
pg_opclass and pg_amop now store "lossy" and "haskeytype" information that we
previously expected the user to remember to provide in CREATE INDEX commands.
Lossiness is no longer an index-level property, but is associated with the
use of a particular operator in a particular index opclass.
Along the way, IndexSupportInitialize now uses the syscaches to retrieve
pg_amop and pg_amproc entries. I find this reduces backend launch time by
about ten percent, at the cost of a couple more special cases in catcache.c's
IndexScanOK.
Initial work by Oleg Bartunov and Teodor Sigaev, further hacking by Tom Lane.
initdb forced.
a tad sloppy about generating the targetlist for some nodes, by generating
a tlist entry that claimed to be a constant when the value wasn't actually
constant. This caused setrefs.c to do the wrong thing later on.
clauses are equal(), before trying to match them up using btree opclass
inference rules. This allows it to recognize many simple cases involving
non-btree operations, for example 'x IS NULL'. Clean up code a little.
more care with resjunk tlist entries than it was doing. The original
coding ignored resjunk entries entirely, but a resjunk entry that is
in either the distinctClause or sortClause lists indicates that DISTINCT
ON was used. It's important for ruleutils.c to get this right, else we
may dump views using DISTINCT ON incorrectly.
has a DISTINCT ON clause, per bug report from Anthony Wood. While at it,
improve the DISTINCT-ON-clause recognizer routine to not be fooled by out-
of-order DISTINCT lists.
Note: I didn't force an initdb, figuring that one today was enough.
However, there is a new function in pg_proc.h, and pg_dump won't be
able to dump partial indexes until you add that function.
per previous discussion on pghackers. Most of the duplicate code in
different AMs' ambuild routines has been moved out to a common routine
in index.c; this means that all index types now do the right things about
inserting recently-dead tuples, etc. (I also removed support for EXTEND
INDEX in the ambuild routines, since that's about to go away anyway, and
it cluttered the code a lot.) The retail indextuple deletion routines have
been replaced by a "bulk delete" routine in which the indexscan is inside
the access method. I haven't pushed this change as far as it should go yet,
but it should allow considerable simplification of the internal bookkeeping
for deletions. Also, add flag columns to pg_am to eliminate various
hardcoded tests on AM OIDs, and remove unused pg_am columns.
Fix rtree and gist index types to not attempt to store NULLs; before this,
gist usually crashed, while rtree managed not to crash but computed wacko
bounding boxes for NULL entries (which might have had something to do with
the performance problems we've heard about occasionally).
Add AtEOXact routines to hash, rtree, and gist, all of which have static
state that needs to be reset after an error. We discovered this need long
ago for btree, but missed the other guys.
Oh, one more thing: concurrent VACUUM is now the default.
IS TRUE, etc, with some degree of verisimilitude. Split out
selectivity support functions from builtins.h into a new header
file selfuncs.h, so as to reduce the number of header files builtins.h
must depend on. Fix a few missing inclusions exposed thereby.
From Joe Conway, with some kibitzing from Tom Lane.
tests to return the correct results per SQL9x when given NULL inputs.
Reimplement these tests as well as IS [NOT] NULL to have their own
expression node types, instead of depending on special functions.
From Joe Conway, with a little help from Tom Lane.
should be computed from total number of distinct values in whole
relation, not # distinct values we expect to have after restriction
clauses are applied.
WHERE (a = 1 or a = 2) and b = 42
and an index on (a,b), include the clause b = 42 in the indexquals
generated for each arm of the OR clause. Essentially this is an index-
driven conversion from CNF to DNF. Implementation is a bit klugy, but
better than not exploiting the extra quals at all ...
of costsize.c routines to pass Query root, so that costsize can figure
more things out by itself and not be so dependent on its callers to tell
it everything it needs to know. Use selectivity of hash or merge clause
to estimate number of tuples processed internally in these joins
(this is more useful than it would've been before, since eqjoinsel is
somewhat more accurate than before).
create_index_paths are not immediately discarded, but are available for
subsequent planner work. This allows avoiding redundant syscache lookups
in several places. Change interface to operator selectivity estimation
procedures to allow faster and more flexible estimation.
Initdb forced due to change of pg_proc entries for selectivity functions!
collected by ANALYZE. Also, add some modest amount of intelligence to
guesses that are used for varlena columns in the absence of any ANALYZE
statistics. The 'width' reported by EXPLAIN is finally something less
than totally bogus for varlena columns ... and, in consequence, hashjoin
estimating should be a little better ...
a separate statement (though it can still be invoked as part of VACUUM, too).
pg_statistic redesigned to be more flexible about what statistics are
stored. ANALYZE now collects a list of several of the most common values,
not just one, plus a histogram (not just the min and max values). Random
sampling is used to make the process reasonably fast even on very large
tables. The number of values and histogram bins collected is now
user-settable via an ALTER TABLE command.
There is more still to do; the new stats are not being used everywhere
they could be in the planner. But the remaining changes for this project
should be localized, and the behavior is already better than before.
A not-very-related change is that sorting now makes use of btree comparison
routines if it can find one, rather than invoking '<' twice.
join. This is needed to avoid improper evaluation of expressions that
should be nulled out, as in Victor Wagner's bug report of 4/27/01.
Pretty ugly solution, but no time to do anything better for 7.1.1.
to specific base or join RelOptInfo nodes during planning. This preserves
the more-intuitive behavior of 7.0.* --- if you write an expensive clause
(such as a sub-select) last, it should get evaluated last. Someday we
ought to try to have some intelligence about the order of evaluation of
WHERE clauses, but for now we should not override what the user wrote.
join clauses. The mergejoin executor wants all the join clauses to appear
as merge quals, not as extra joinquals, for these kinds of joins. But the
planner would consider plans in which partially-sorted input paths were
used, leading to only some of the join clauses becoming merge quals.
This is fine for inner/left joins, not fine for right/full joins.
inheritance query: make duplicate copies of subplans in adjust_inherited_attrs.
When we redesign querytrees we really gotta do something about this
issue of whether querytrees are read-only and can share substructure
or not.
1. If there is exactly one pg_operator entry of the right name and oprkind,
oper() and related routines would return that entry whether its input type
had anything to do with the request or not. This is just premature
optimization: we shouldn't return the single candidate until after we verify
that it really is a valid candidate, ie, is at least coercion-compatible
with the given types.
2. oper() and related routines only promise a coercion-compatible result.
Unfortunately, there were quite a few callers that assumed the returned
operator is binary-compatible with the given datatype; they would proceed
to call it without making any datatype coercions. These callers include
sorting, grouping, aggregation, and VACUUM ANALYZE. In general I think
it is appropriate for these callers to require an exact or binary-compatible
match, so I've added a new routine compatible_oper() that only succeeds if
it can find an operator that doesn't require any run-time conversions.
Callers now call oper() or compatible_oper() depending on whether they are
prepared to deal with type conversion or not.
The upshot of these bugs is revealed by the following silliness in PL/Tcl's
selftest: it creates an operator @< on int4, and then tries to use it to
sort a char(N) column. The system would let it do that :-( (and evidently
has done so since 6.3 :-( :-(). The result in this case was just a silly
sort order, but the reverse combination would've provoked coredump from
trying to dereference integers. With this fix you get more reasonable
behavior:
pltcl_test=# select * from T_pkey1 order by key1, key2 using @<;
ERROR: Unable to identify an operator '@<' for types 'bpchar' and 'bpchar'
You will have to retype this query using an explicit cast
try to push restrictions on the view down into the view subquery,
so that they can become indexscan quals or what-have-you rather than
being applied at the top level of the subquery. 7.0 and before were
able to do this, though in a much klugier way, and I'd hate to have
anyone complaining that 7.1 is stupider than 7.0 ...
as both a GROUP BY item and an output expression, the top-level Group
node should just copy up the evaluated expression value from its input,
rather than re-evaluating the expression. Aside from any performance
benefit this might offer, this avoids a crash when there is a sub-SELECT
in said expression.
comparison does not consider paths different when they differ only in
uninteresting aspects of sort order. (We had a special case of this
consideration for indexscans already, but generalize it to apply to
ordered join paths too.) Be stricter about what is a canonical pathkey
to allow faster pathkey comparison. Cache canonical pathkeys and
dispersion stats for left and right sides of a RestrictInfo's clause,
to avoid repeated computation. Total speedup will depend on number of
tables in a query, but I see about 4x speedup of planning phase for
a sample seven-table query.