Deduce equality constraints that are implied by transitivity of mergejoinable qual clauses, and add them to the query quals. For example, WHERE a = b AND b = c will cause us to add AND a = c. This is necessary to ensure that it's safe to use these variables as interchangeable sort keys, which is something 7.0 knows how to do. Should provide a useful improvement in planning ability, too.
2025-07-17 06:41:09 +03:00 · 2000-07-24 03:11:01 +00:00
parent c39c198bc3
commit cd9f0ca545
6 changed files with 398 additions and 158 deletions
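
The deduction described in the commit message can be sketched outside the planner. The following is a minimal illustration, not the PostgreSQL implementation: variables are assumed to be small integers, each stated equality is a pair, and `implied_equalities` is a hypothetical helper name. It groups variables into equivalence sets by simple relabeling (comparable in spirit to the list merging in add_equijoined_keys) and then reports every pair that shares a set but was not stated.

```c
#include <assert.h>

#define MAX_VARS 16				/* arbitrary limit for this sketch */

/*
 * Hypothetical helper, not planner code.
 * pairs[i] = {left, right} is a stated equality between variable ids.
 * Writes each implied-but-unstated pair (smaller id first) into out[]
 * and returns how many pairs were written (at most maxout).
 */
int
implied_equalities(int pairs[][2], int npairs, int nvars,
				   int out[][2], int maxout)
{
	int			set_of[MAX_VARS];
	int			nout = 0;

	assert(nvars >= 0 && nvars <= MAX_VARS);

	/* Initially every variable is alone in its own set */
	for (int v = 0; v < nvars; v++)
		set_of[v] = v;

	/* Merge the two sets named by each stated equality */
	for (int i = 0; i < npairs; i++)
	{
		int			keep = set_of[pairs[i][0]];
		int			drop = set_of[pairs[i][1]];

		for (int v = 0; v < nvars; v++)
			if (set_of[v] == drop)
				set_of[v] = keep;
	}

	/* Emit every same-set pair that was not stated explicitly */
	for (int a = 0; a < nvars; a++)
	{
		for (int b = a + 1; b < nvars; b++)
		{
			int			stated = 0;

			if (set_of[a] != set_of[b])
				continue;
			for (int i = 0; i < npairs; i++)
				if ((pairs[i][0] == a && pairs[i][1] == b) ||
					(pairs[i][0] == b && pairs[i][1] == a))
					stated = 1;
			if (!stated && nout < maxout)
			{
				out[nout][0] = a;
				out[nout][1] = b;
				nout++;
			}
		}
	}
	return nout;
}
```

With a, b, c numbered 0, 1, 2 and the stated pairs {0,1} and {1,2}, the only pair reported is {0,2}, which corresponds to the added clause a = c.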
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@ -7,7 +7,7 @@ actual output plan, the /path code generates all possible ways to join the
 tables, and /prep handles special cases like inheritance.  /util is utility
 stuff.  /geqo is the separate "genetic optimization" planner --- it does
 a semi-random search through the join tree space, rather than exhaustively
-considering all possible join trees.  (But each join considered by geqo
+considering all possible join trees.  (But each join considered by /geqo
 is given to /path to create paths for, so we consider all possible
 implementation paths for each specific join even in GEQO mode.)
@ -40,7 +40,7 @@ the WHERE clause "tab1.col1 = tab2.col1" generates a JoinInfo for tab1
 listing tab2 as an unjoined relation, and also one for tab2 showing tab1
 as an unjoined relation.
-If we have only a single base relation in the query, we are done here.
+If we have only a single base relation in the query, we are done now.
 Otherwise we have to figure out how to join the base relations into a
 single join relation.
@ -225,5 +225,185 @@ way, the next level up will have the maximum freedom to build mergejoins
 without sorting, since it can pick from any of the paths retained for its
 inputs.
-See path/pathkeys.c for an explanation of the PathKeys data structure that
+
-represents what is known about the sort order of a particular Path.
+PathKeys
 --------
 The PathKeys data structure represents what is known about the sort order
 of a particular Path.
 Path.pathkeys is a List of Lists of PathKeyItem nodes that represent
 the sort order of the result generated by the Path.  The n'th sublist
 represents the n'th sort key of the result.
 In single/base relation RelOptInfo's, the Paths represent various ways
 of scanning the relation and the resulting ordering of the tuples.
 Sequential scan Paths have NIL pathkeys, indicating no known ordering.
 Index scans have Path.pathkeys that represent the chosen index's ordering,
 if any.  A single-key index would create a pathkey with a single sublist,
 e.g. ( (tab1.indexkey1/sortop1) ).  A multi-key index generates a sublist
 per key, e.g. ( (tab1.indexkey1/sortop1) (tab1.indexkey2/sortop2) ) which
 shows major sort by indexkey1 (ordering by sortop1) and minor sort by
 indexkey2 with sortop2.
 Note that a multi-pass indexscan (OR clause scan) has NIL pathkeys since
 we can say nothing about the overall order of its result.  Also, an
 indexscan on an unordered type of index generates NIL pathkeys.  However,
 we can always create a pathkey by doing an explicit sort.  The pathkeys
 for a Sort plan's output just represent the sort key fields and the
 ordering operators used.
 Things get more interesting when we consider joins.  Suppose we do a
 mergejoin between A and B using the mergeclause A.X = B.Y.  The output
 of the mergejoin is sorted by X --- but it is also sorted by Y.  We
 represent this fact by listing both keys in a single pathkey sublist:
 ( (A.X/xsortop B.Y/ysortop) ).  This pathkey asserts that the major
 sort order of the Path can be taken to be *either* A.X or B.Y.
 They are equal, so they are both primary sort keys.  By doing this,
 we allow future joins to use either var as a pre-sorted key, so upper
 Mergejoins may be able to avoid having to re-sort the Path.  This is
 why pathkeys is a List of Lists.
 We keep a sortop associated with each PathKeyItem because cross-data-type
 mergejoins are possible; for example int4 = int8 is mergejoinable.
 In this case we need to remember that the left var is ordered by int4lt
 while the right var is ordered by int8lt.  So the different members of
 each sublist could have different sortops.
 Note that while the order of the top list is meaningful (primary vs.
 secondary sort key), the order of each sublist is arbitrary.  Each sublist
 should be regarded as a set of equivalent keys, with no significance
 to the list order.
 With a little further thought, it becomes apparent that pathkeys for
 joins need not only come from mergejoins.  For example, if we do a
 nestloop join between outer relation A and inner relation B, then any
 pathkeys relevant to A are still valid for the join result: we have
 not altered the order of the tuples from A.  Even more interesting,
 if there was a mergeclause (more formally, an "equijoin clause") A.X=B.Y,
 and A.X was a pathkey for the outer relation A, then we can assert that
 B.Y is a pathkey for the join result; X was ordered before and still is,
 and the joined values of Y are equal to the joined values of X, so Y
 must now be ordered too.  This is true even though we used neither an
 explicit sort nor a mergejoin on Y.
 More generally, whenever we have an equijoin clause A.X = B.Y and a
 pathkey A.X, we can add B.Y to that pathkey if B is part of the joined
 relation the pathkey is for, *no matter how we formed the join*.  It works
 as long as the clause has been applied at some point while forming the
 join relation.  (In the current implementation, we always apply qual
 clauses as soon as possible, ie, as far down in the plan tree as possible.
 So we can always make this deduction.  If we postponed filtering by qual
 clauses then we'd not be able to assume pathkey equivalence until after
 the equality check(s) had been applied.)
 In short, then: when producing the pathkeys for a merge or nestloop join,
 we can keep all of the keys of the outer path, since the ordering of the
 outer path will be preserved in the result.  Furthermore, we can add to
 each pathkey sublist any inner vars that are equijoined to any of the
 outer vars in the sublist; this works regardless of whether we are
 implementing the join using that equijoin clause as a mergeclause,
 or merely enforcing the clause after-the-fact as a qpqual filter.
 Although Hashjoins also work only with equijoin operators, it is *not*
 safe to consider the output of a Hashjoin to be sorted in any particular
 order --- not even the outer path's order.  This is true because the
 executor might have to split the join into multiple batches.  Therefore
 a Hashjoin is always given NIL pathkeys.  (Also, we need to use only
 mergejoinable operators when deducing which inner vars are now sorted,
 because a mergejoin operator tells us which left- and right-datatype
 sortops can be considered equivalent, whereas a hashjoin operator
 doesn't imply anything about sort order.)
 Pathkeys are also useful to represent an ordering that we wish to achieve,
 since they are easily compared to the pathkeys of a potential candidate
 path.  So, SortClause lists are turned into pathkeys lists for use inside
 the optimizer.
 OK, now for how it *really* works:
 We did implement pathkeys just as described above, and found that the
 planner spent a huge amount of time comparing pathkeys, because the
 representation of pathkeys as unordered lists made it expensive to decide
 whether two were equal or not.  So, we've modified the representation
 as described next.
 If we scan the WHERE clause for equijoin clauses (mergejoinable clauses)
 during planner startup, we can construct lists of equivalent pathkey items
 for the query.  There could be more than two items per equivalence set;
 for example, WHERE A.X = B.Y AND B.Y = C.Z AND D.R = E.S creates the
 equivalence sets { A.X B.Y C.Z } and { D.R E.S } (plus associated sortops).
 Any pathkey item that belongs to an equivalence set implies that all the
 other items in its set apply to the relation too, or at least all the ones
 that are for fields present in the relation.  (Some of the items in the
 set might be for as-yet-unjoined relations.)  Furthermore, any multi-item
 pathkey sublist that appears at any stage of planning the query *must* be
 a subset of one or another of these equivalence sets; there's no way we'd
 have put two items in the same pathkey sublist unless they were equijoined
 in WHERE.
 Now suppose that we allow a pathkey sublist to contain pathkey items for
 vars that are not yet part of the pathkey's relation.  This introduces
 no logical difficulty, because such items can easily be seen to be
 irrelevant; we just mandate that they be ignored.  But having allowed
 this, we can declare (by fiat) that any multiple-item pathkey sublist
 must be "equal()" to the appropriate equivalence set.  In effect,
 whenever we make a pathkey sublist that mentions any var appearing in an
 equivalence set, we instantly add all the other vars equivalenced to it,
 whether they appear yet in the pathkey's relation or not.  And we also
 mandate that the pathkey sublist appear in the same order as the
 equivalence set it comes from.  (In practice, we simply return a pointer
 to the relevant equivalence set without building any new sublist at all.
 Each equivalence set becomes a "canonical pathkey" for all its members.)
 This makes comparing pathkeys very simple and fast, and saves a lot of
 work and memory space for pathkey construction as well.
 Note that pathkey sublists having just one item still exist, and are
 not expected to be equal() to any equivalence set.  This occurs when
 we describe a sort order that involves a var that's not mentioned in
 any equijoin clause of the WHERE.  We could add singleton sets containing
 such vars to the query's list of equivalence sets, but there's little
 point in doing so.
 By the way, it's OK and even useful for us to build equivalence sets
 that mention multiple vars from the same relation.  For example, if
 we have WHERE A.X = A.Y and we are scanning A using an index on X,
 we can legitimately conclude that the path is sorted by Y as well;
 and this could be handy if Y is the variable used in other join clauses
 or ORDER BY.  So, any WHERE clause with a mergejoinable operator can
 contribute to an equivalence set, even if it's not a join clause.
 As sketched so far, equijoin operators allow us to conclude that
 A.X = B.Y and B.Y = C.Z together imply A.X = C.Z, even when different
 datatypes are involved.  What is not immediately obvious is that to use
 the "canonical pathkey" representation, we *must* make this deduction.
 An example (from a real bug in Postgres 7.0) is a mergejoin for a query
 like
 	SELECT * FROM t1, t2 WHERE t1.f2 = t2.f3 AND t1.f1 = t2.f3;
 The canonical-pathkey mechanism is able to deduce that t1.f1 = t1.f2
 (ie, both appear in the same canonical pathkey set).  If we sort t1
 and then apply a mergejoin, we *must* filter the t1 tuples using the
 implied qualification f1 = f2, because otherwise the output of the sort
 will be ordered by f1 or f2 (whichever we sort on) but not both.  The
 merge will then fail since (depending on which qual clause it applies
 first) it's expecting either ORDER BY f1,f2 or ORDER BY f2,f1, but the
 actual output of the sort has neither of these orderings.  The best fix
 for this is to generate all the implied equality constraints for each
 equijoin set and add these clauses to the query's qualification list.
 In other words, we *explicitly* deduce f1 = f2 and add this to the WHERE
 clause.  The constraint will be applied as a qpqual to the output of the
 scan on t1, resulting in sort output that is indeed ordered by both vars.
 This approach provides more information to the selectivity estimation
 code than it would otherwise have, and reduces the number of tuples
 processed in join stages, so it's a win to make these deductions even
 if we weren't forced to.
 Yet another implication of all this is that mergejoinable operators
 must form closed equivalence sets.  For example, if "int2 = int4"
 and "int4 = int8" are both marked mergejoinable, then there had better
 be a mergejoinable "int2 = int8" operator as well.  Otherwise, when
 we're given WHERE int2var = int4var AND int4var = int8var, we'll fail
 while trying to create a representation of the implied clause
 int2var = int8var.
 -- bjm & tgl
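
The canonical-pathkey idea above can be illustrated with a small sketch. This is not the planner's data structure: `EquivSet`, `canonical_set` and `same_sort_key` are hypothetical names, variables are plain strings, and sortops are left out. It only shows why resolving each var to its equivalence set turns "are these two sort keys equivalent?" into a pointer comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one equivalence set of the query */
typedef struct EquivSet
{
	const char *members[4];		/* var names; unused slots are NULL */
	int			nmembers;
} EquivSet;

/* Return the canonical set containing var, or NULL if it is in none */
const EquivSet *
canonical_set(const EquivSet *sets, int nsets, const char *var)
{
	assert(nsets >= 0);
	for (int s = 0; s < nsets; s++)
		for (int m = 0; m < sets[s].nmembers; m++)
			if (strcmp(sets[s].members[m], var) == 0)
				return &sets[s];
	return NULL;
}

/*
 * Two vars are interchangeable sort keys iff they resolve to the same
 * canonical set.  A var that appears in no set behaves like a
 * single-item sublist and matches only itself.
 */
int
same_sort_key(const EquivSet *sets, int nsets,
			  const char *v1, const char *v2)
{
	const EquivSet *s1 = canonical_set(sets, nsets, v1);
	const EquivSet *s2 = canonical_set(sets, nsets, v2);

	if (s1 == NULL || s2 == NULL)
		return strcmp(v1, v2) == 0;
	return s1 == s2;			/* pointer comparison, no list walk */
}
```

Using the sets from the text, { A.X B.Y C.Z } and { D.R E.S }, A.X and C.Z compare as the same sort key while A.X and D.R do not.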
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@ -3,12 +3,15 @@
 * pathkeys.c
 *	  Utilities for matching and building path keys
 *
 * See src/backend/optimizer/README for a great deal of information about
 * the nature and use of path keys.
 *
 *
 * Portions Copyright (c) 1996-2000, PostgreSQL, Inc
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 *
 * IDENTIFICATION
- *	  $Header: /cvsroot/pgsql/src/backend/optimizer/path/pathkeys.c,v 1.22 2000/05/30 00:49:47 momjian Exp $
+ *	  $Header: /cvsroot/pgsql/src/backend/optimizer/path/pathkeys.c,v 1.23 2000/07/24 03:10:56 tgl Exp $
 *
 *-------------------------------------------------------------------------
 */
@ -18,156 +21,17 @@
 #include "optimizer/clauses.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/planmain.h"
 #include "optimizer/tlist.h"
 #include "parser/parsetree.h"
 #include "parser/parse_func.h"
 #include "utils/lsyscache.h"
 static PathKeyItem *makePathKeyItem(Node *key, Oid sortop);
 static List *make_canonical_pathkey(Query *root, PathKeyItem *item);
 static Var *find_indexkey_var(Query *root, RelOptInfo *rel,
-				  AttrNumber varattno);
+							  AttrNumber varattno);
 /*--------------------
 *	Explanation of Path.pathkeys
 *
 *	Path.pathkeys is a List of Lists of PathKeyItem nodes that represent
 *	the sort order of the result generated by the Path.  The n'th sublist
 *	represents the n'th sort key of the result.
 *
 *	In single/base relation RelOptInfo's, the Paths represent various ways
 *	of scanning the relation and the resulting ordering of the tuples.
 *	Sequential scan Paths have NIL pathkeys, indicating no known ordering.
 *	Index scans have Path.pathkeys that represent the chosen index's ordering,
 *	if any.  A single-key index would create a pathkey with a single sublist,
 *	e.g. ( (tab1.indexkey1/sortop1) ).	A multi-key index generates a sublist
 *	per key, e.g. ( (tab1.indexkey1/sortop1) (tab1.indexkey2/sortop2) ) which
 *	shows major sort by indexkey1 (ordering by sortop1) and minor sort by
 *	indexkey2 with sortop2.
 *
 *	Note that a multi-pass indexscan (OR clause scan) has NIL pathkeys since
 *	we can say nothing about the overall order of its result.  Also, an
 *	indexscan on an unordered type of index generates NIL pathkeys.  However,
 *	we can always create a pathkey by doing an explicit sort.  The pathkeys
 *	for a sort plan's output just represent the sort key fields and the
 *	ordering operators used.
 *
 *	Things get more interesting when we consider joins.  Suppose we do a
 *	mergejoin between A and B using the mergeclause A.X = B.Y.	The output
 *	of the mergejoin is sorted by X --- but it is also sorted by Y.  We
 *	represent this fact by listing both keys in a single pathkey sublist:
 *	( (A.X/xsortop B.Y/ysortop) ).	This pathkey asserts that the major
 *	sort order of the Path can be taken to be *either* A.X or B.Y.
 *	They are equal, so they are both primary sort keys.  By doing this,
 *	we allow future joins to use either var as a pre-sorted key, so upper
 *	Mergejoins may be able to avoid having to re-sort the Path.  This is
 *	why pathkeys is a List of Lists.
 *
 *	We keep a sortop associated with each PathKeyItem because cross-data-type
 *	mergejoins are possible; for example int4 = int8 is mergejoinable.
 *	In this case we need to remember that the left var is ordered by int4lt
 *	while the right var is ordered by int8lt.  So the different members of
 *	each sublist could have different sortops.
 *
 *	Note that while the order of the top list is meaningful (primary vs.
 *	secondary sort key), the order of each sublist is arbitrary.  Each sublist
 *	should be regarded as a set of equivalent keys, with no significance
 *	to the list order.
 *
 *	With a little further thought, it becomes apparent that pathkeys for
 *	joins need not only come from mergejoins.  For example, if we do a
 *	nestloop join between outer relation A and inner relation B, then any
 *	pathkeys relevant to A are still valid for the join result: we have
 *	not altered the order of the tuples from A.  Even more interesting,
 *	if there was a mergeclause (more formally, an "equijoin clause") A.X=B.Y,
 *	and A.X was a pathkey for the outer relation A, then we can assert that
 *	B.Y is a pathkey for the join result; X was ordered before and still is,
 *	and the joined values of Y are equal to the joined values of X, so Y
 *	must now be ordered too.  This is true even though we used no mergejoin.
 *
 *	More generally, whenever we have an equijoin clause A.X = B.Y and a
 *	pathkey A.X, we can add B.Y to that pathkey if B is part of the joined
 *	relation the pathkey is for, *no matter how we formed the join*.
 *
 *	In short, then: when producing the pathkeys for a merge or nestloop join,
 *	we can keep all of the keys of the outer path, since the ordering of the
 *	outer path will be preserved in the result.  Furthermore, we can add to
 *	each pathkey sublist any inner vars that are equijoined to any of the
 *	outer vars in the sublist; this works regardless of whether we are
 *	implementing the join using that equijoin clause as a mergeclause,
 *	or merely enforcing the clause after-the-fact as a qpqual filter.
 *
 *	Although Hashjoins also work only with equijoin operators, it is *not*
 *	safe to consider the output of a Hashjoin to be sorted in any particular
 *	order --- not even the outer path's order.  This is true because the
 *	executor might have to split the join into multiple batches.  Therefore
 *	a Hashjoin is always given NIL pathkeys.  (Also, we need to use only
 *	mergejoinable operators when deducing which inner vars are now sorted,
 *	because a mergejoin operator tells us which left- and right-datatype
 *	sortops can be considered equivalent, whereas a hashjoin operator
 *	doesn't imply anything about sort order.)
 *
 *	Pathkeys are also useful to represent an ordering that we wish to achieve,
 *	since they are easily compared to the pathkeys of a potential candidate
 *	path.  So, SortClause lists are turned into pathkeys lists for use inside
 *	the optimizer.
 *
 *	OK, now for how it *really* works:
 *
 *	We did implement pathkeys just as described above, and found that the
 *	planner spent a huge amount of time comparing pathkeys, because the
 *	representation of pathkeys as unordered lists made it expensive to decide
 *	whether two were equal or not.	So, we've modified the representation
 *	as described next.
 *
 *	If we scan the WHERE clause for equijoin clauses (mergejoinable clauses)
 *	during planner startup, we can construct lists of equivalent pathkey items
 *	for the query.	There could be more than two items per equivalence set;
 *	for example, WHERE A.X = B.Y AND B.Y = C.Z AND D.R = E.S creates the
 *	equivalence sets { A.X B.Y C.Z } and { D.R E.S } (plus associated sortops).
 *	Any pathkey item that belongs to an equivalence set implies that all the
 *	other items in its set apply to the relation too, or at least all the ones
 *	that are for fields present in the relation.  (Some of the items in the
 *	set might be for as-yet-unjoined relations.)  Furthermore, any multi-item
 *	pathkey sublist that appears at any stage of planning the query *must* be
 *	a subset of one or another of these equivalence sets; there's no way we'd
 *	have put two items in the same pathkey sublist unless they were equijoined
 *	in WHERE.
 *
 *	Now suppose that we allow a pathkey sublist to contain pathkey items for
 *	vars that are not yet part of the pathkey's relation.  This introduces
 *	no logical difficulty, because such items can easily be seen to be
 *	irrelevant; we just mandate that they be ignored.  But having allowed
 *	this, we can declare (by fiat) that any multiple-item pathkey sublist
 *	must be equal() to the appropriate equivalence set.  In effect, whenever
 *	we make a pathkey sublist that mentions any var appearing in an
 *	equivalence set, we instantly add all the other vars equivalenced to it,
 *	whether they appear yet in the pathkey's relation or not.  And we also
 *	mandate that the pathkey sublist appear in the same order as the
 *	equivalence set it comes from.	(In practice, we simply return a pointer
 *	to the relevant equivalence set without building any new sublist at all.)
 *	This makes comparing pathkeys very simple and fast, and saves a lot of
 *	work and memory space for pathkey construction as well.
 *
 *	Note that pathkey sublists having just one item still exist, and are
 *	not expected to be equal() to any equivalence set.	This occurs when
 *	we describe a sort order that involves a var that's not mentioned in
 *	any equijoin clause of the WHERE.  We could add singleton sets containing
 *	such vars to the query's list of equivalence sets, but there's little
 *	point in doing so.
 *
 *	By the way, it's OK and even useful for us to build equivalence sets
 *	that mention multiple vars from the same relation.	For example, if
 *	we have WHERE A.X = A.Y and we are scanning A using an index on X,
 *	we can legitimately conclude that the path is sorted by Y as well;
 *	and this could be handy if Y is the variable used in other join clauses
 *	or ORDER BY.  So, any WHERE clause with a mergejoinable operator can
 *	contribute to an equivalence set, even if it's not a join clause.
 *
 *	-- bjm & tgl
 *--------------------
 */
 /*
@ -225,35 +89,107 @@ add_equijoined_keys(Query *root, RestrictInfo *restrictinfo)
 	 * into our new set. When done, we add the new set to the front of
 	 * equi_key_list.
 	 *
 	 * It may well be that the two items we're given are already known to
 	 * be equijoin-equivalent, in which case we don't need to change our
 	 * data structure.  If we find both of them in the same equivalence
 	 * set to start with, we can quit immediately.
 	 *
 	 * This is a standard UNION-FIND problem, for which there exist better
 	 * data structures than simple lists.  If this code ever proves to be
 	 * a bottleneck then it could be sped up --- but for now, simple is
 	 * beautiful.
 	 */
-	newset = lcons(item1, lcons(item2, NIL));
+	newset = NIL;
 	foreach(cursetlink, root->equi_key_list)
 	{
 		List	   *curset = lfirst(cursetlink);
 		bool		item1here = member(item1, curset);
 		bool		item2here = member(item2, curset);
-		if (member(item1, curset) || member(item2, curset))
+		if (item1here || item2here)
 		{
 			/* If find both in same equivalence set, no need to do any more */
 			if (item1here && item2here)
 			{
 				/* Better not have seen only one in an earlier set... */
 				Assert(newset == NIL);
 				return;
 			}
 			/* Build the new set only when we know we must */
 			if (newset == NIL)
 				newset = lcons(item1, lcons(item2, NIL));
 			/* Found a set to merge into our new set */
 			newset = LispUnion(newset, curset);
 			/*
 			 * Remove old set from equi_key_list.  NOTE this does not
-			 * change lnext(cursetlink), so the outer foreach doesn't
+			 * change lnext(cursetlink), so the foreach loop doesn't break.
-			 * break.
 			 */
 			root->equi_key_list = lremove(curset, root->equi_key_list);
 			freeList(curset);	/* might as well recycle old cons cells */
 		}
 	}
 	/* Build the new set only when we know we must */
 	if (newset == NIL)
 		newset = lcons(item1, lcons(item2, NIL));
 	root->equi_key_list = lcons(newset, root->equi_key_list);
 }
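
The comment in add_equijoined_keys notes that this is a UNION-FIND problem with better data structures available than simple lists. For comparison only, here is a minimal sketch of such a structure. It is not part of the commit; `UnionFind`, `uf_init`, `uf_find` and `uf_union` are hypothetical names, and items are assumed to be numbered 0..n-1 rather than being PathKeyItem nodes. It uses path halving and omits union by rank for brevity.

```c
#include <assert.h>

#define UF_MAX 64				/* arbitrary capacity for this sketch */

typedef struct UnionFind
{
	int			parent[UF_MAX]; /* parent[i] == i marks a set root */
	int			n;
} UnionFind;

/* Start with n singleton sets */
void
uf_init(UnionFind *uf, int n)
{
	assert(n >= 0 && n <= UF_MAX);
	uf->n = n;
	for (int i = 0; i < n; i++)
		uf->parent[i] = i;
}

/* Find the root of x's set, shortening the path as we go */
int
uf_find(UnionFind *uf, int x)
{
	assert(x >= 0 && x < uf->n);
	while (uf->parent[x] != x)
	{
		uf->parent[x] = uf->parent[uf->parent[x]];	/* path halving */
		x = uf->parent[x];
	}
	return x;
}

/* Merge the sets of a and b; return 0 if they were already together */
int
uf_union(UnionFind *uf, int a, int b)
{
	int			ra = uf_find(uf, a);
	int			rb = uf_find(uf, b);

	if (ra == rb)
		return 0;				/* same case as "found both in one set" */
	uf->parent[rb] = ra;
	return 1;
}
```

A zero return from `uf_union` plays the role of the early `return` taken when both items are already in the same equivalence set.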
 /*
 * generate_implied_equalities
 *	  Scan the completed equi_key_list for the query, and generate explicit
 *	  qualifications (WHERE clauses) for all the pairwise equalities not
 *	  already mentioned in the quals.  This is useful because the additional
 *	  clauses help the selectivity-estimation code, and in fact it's
 *	  *necessary* to ensure that sort keys we think are equivalent really
 *	  are (see src/backend/optimizer/README for more info).
 *
 * This routine just walks the equi_key_list to find all pairwise equalities.
 * We call process_implied_equality (in plan/initsplan.c) to determine whether
 * each is already known and add it to the proper restrictinfo list if not.
 */
 void
 generate_implied_equalities(Query *root)
 {
 	List	   *cursetlink;
 	foreach(cursetlink, root->equi_key_list)
 	{
 		List	   *curset = lfirst(cursetlink);
 		List	   *ptr1;
 		/*
 		 * A set containing only two items cannot imply any equalities
 		 * beyond the one that created the set, so we can skip it.
 		 */
 		if (length(curset) < 3)
 			continue;
 		/*
 		 * Match each item in the set with all that appear after it
 		 * (it's sufficient to generate A=B, need not process B=A too).
 		 */
 		foreach(ptr1, curset)
 		{
 			PathKeyItem *item1 = (PathKeyItem *) lfirst(ptr1);
 			List	   *ptr2;
 			foreach(ptr2, lnext(ptr1))
 			{
 				PathKeyItem *item2 = (PathKeyItem *) lfirst(ptr2);
 				process_implied_equality(root, item1->key, item2->key,
 										 item1->sortop, item2->sortop);
 			}
 		}
 	}
 }
 /*
 * make_canonical_pathkey
 *	  Given a PathKeyItem, find the equi_key_list subset it is a member of,
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@ -8,13 +8,14 @@
 *
 *
 * IDENTIFICATION
- *	  $Header: /cvsroot/pgsql/src/backend/optimizer/plan/initsplan.c,v 1.46 2000/04/12 17:15:21 momjian Exp $
+ *	  $Header: /cvsroot/pgsql/src/backend/optimizer/plan/initsplan.c,v 1.47 2000/07/24 03:11:01 tgl Exp $
 *
 *-------------------------------------------------------------------------
 */
 #include <sys/types.h>
 #include "postgres.h"
 #include "catalog/pg_operator.h"
 #include "catalog/pg_type.h"
 #include "nodes/makefuncs.h"
 #include "optimizer/clauses.h"
@ -25,6 +26,9 @@
 #include "optimizer/planmain.h"
 #include "optimizer/tlist.h"
 #include "optimizer/var.h"
 #include "parser/parse_expr.h"
 #include "parser/parse_oper.h"
 #include "parser/parse_type.h"
 #include "utils/lsyscache.h"
@ -122,6 +126,7 @@ add_missing_rels_to_query(Query *root)
 	}
 }
 /*****************************************************************************
 *
 *	  QUALIFICATIONS
@ -129,7 +134,6 @@ add_missing_rels_to_query(Query *root)
 *****************************************************************************/
 /*
 * add_restrict_and_join_to_rels
 *	  Fill RestrictInfo and JoinInfo lists of relation entries for all
@ -280,6 +284,113 @@ add_join_info_to_rels(Query *root, RestrictInfo *restrictinfo,
 	}
 }
 /*
 * process_implied_equality
 *	  Check to see whether we already have a restrictinfo item that says
 *	  item1 = item2, and create one if not.  This is a consequence of
 *	  transitivity of mergejoin equality: if we have mergejoinable
 *	  clauses A = B and B = C, we can deduce A = C (where = is an
 *	  appropriate mergejoinable operator).
 */
 void
 process_implied_equality(Query *root, Node *item1, Node *item2,
 						 Oid sortop1, Oid sortop2)
 {
 	Index		irel1;
 	Index		irel2;
 	RelOptInfo *rel1;
 	List	   *restrictlist;
 	List	   *itm;
 	Oid			ltype,
 				rtype;
 	Operator	eq_operator;
 	Form_pg_operator pgopform;
 	Expr	   *clause;
 	/*
 	 * Currently, since check_mergejoinable only accepts Var = Var clauses,
 	 * we should only see Var nodes here.  Would have to work a little
 	 * harder to locate the right rel(s) if more-general mergejoin clauses
 	 * were accepted.
 	 */
 	Assert(IsA(item1, Var));
 	irel1 = ((Var *) item1)->varno;
 	Assert(IsA(item2, Var));
 	irel2 = ((Var *) item2)->varno;
 	/*
 	 * If both vars belong to same rel, we need to look at that rel's
 	 * baserestrictinfo list.  If different rels, each will have a
 	 * joininfo node for the other, and we can scan either list.
 	 */
 	rel1 = get_base_rel(root, irel1);
 	if (irel1 == irel2)
 		restrictlist = rel1->baserestrictinfo;
 	else
 	{
 		JoinInfo   *joininfo = find_joininfo_node(rel1,
 												  lconsi(irel2, NIL));
 		restrictlist = joininfo->jinfo_restrictinfo;
 	}
 	/*
 	 * Scan to see if equality is already known.
 	 */
 	foreach(itm, restrictlist)
 	{
 		RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(itm);
 		Node	   *left,
 				   *right;
 		if (restrictinfo->mergejoinoperator == InvalidOid)
 			continue;			/* ignore non-mergejoinable clauses */
 		/* We now know the restrictinfo clause is a binary opclause */
 		left = (Node *) get_leftop(restrictinfo->clause);
 		right = (Node *) get_rightop(restrictinfo->clause);
 		if ((equal(item1, left) && equal(item2, right)) ||
 			(equal(item2, left) && equal(item1, right)))
 			return;				/* found a matching clause */
 	}
 	/*
 	 * This equality is new information, so construct a clause
 	 * representing it to add to the query data structures.
 	 */
 	ltype = exprType(item1);
 	rtype = exprType(item2);
 	eq_operator = oper("=", ltype, rtype, true);
 	if (!HeapTupleIsValid(eq_operator))
 	{
 		/*
 		 * Would it be safe to just not add the equality to the query if
 		 * we have no suitable equality operator for the combination of
 		 * datatypes?  NO, because sortkey selection may screw up anyway.
 		 */
 		elog(ERROR, "Unable to identify an equality operator for types '%s' and '%s'",
 			 typeidTypeName(ltype), typeidTypeName(rtype));
 	}
 	pgopform = (Form_pg_operator) GETSTRUCT(eq_operator);
 	/*
 	 * Let's just make sure this appears to be a compatible operator.
 	 */
 	if (pgopform->oprlsortop != sortop1 ||
 		pgopform->oprrsortop != sortop2 ||
 		pgopform->oprresult != BOOLOID)
 		elog(ERROR, "Equality operator for types '%s' and '%s' should be mergejoinable, but isn't",
 			 typeidTypeName(ltype), typeidTypeName(rtype));
 	clause = makeNode(Expr);
 	clause->typeOid = BOOLOID;
 	clause->opType = OP_EXPR;
 	clause->oper = (Node *) makeOper(oprid(eq_operator), /* opno */
 									 InvalidOid, /* opid */
 									 BOOLOID, /* operator result type */
 									 0,
 									 NULL);
 	clause->args = lcons(item1, lcons(item2, NIL));
 	add_restrict_and_join_to_rel(root, (Node *) clause);
 }
 /*****************************************************************************
 *
 *	 CHECKS FOR MERGEJOINABLE AND HASHJOINABLE CLAUSES
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@ -14,7 +14,7 @@
 *
 *
 * IDENTIFICATION
- *	  $Header: /cvsroot/pgsql/src/backend/optimizer/plan/planmain.c,v 1.55 2000/04/12 17:15:22 momjian Exp $
+ *	  $Header: /cvsroot/pgsql/src/backend/optimizer/plan/planmain.c,v 1.56 2000/07/24 03:11:01 tgl Exp $
 *
 *-------------------------------------------------------------------------
 */
@ -184,7 +184,7 @@ subplanner(Query *root,
 	 * base_rel_list as relation references are found (e.g., in the
 	 * qualification, the targetlist, etc.).  Restrict and join clauses
 	 * are added to appropriate lists belonging to the mentioned
-	 * relations, and we also build lists of equijoined keys for pathkey
+	 * relations.  We also build lists of equijoined keys for pathkey
 	 * construction.
 	 */
 	root->base_rel_list = NIL;
@ -193,8 +193,18 @@ subplanner(Query *root,
 	make_var_only_tlist(root, flat_tlist);
 	add_restrict_and_join_to_rels(root, qual);
 	/*
 	 * Make sure we have RelOptInfo nodes for all relations used.
 	 */
 	add_missing_rels_to_query(root);
 	/*
 	 * Use the completed lists of equijoined keys to deduce any implied
 	 * but unstated equalities (for example, A=B and B=C imply A=C).
 	 */
 	generate_implied_equalities(root);
 	/*
 	 * We should now have all the pathkey equivalence sets built, so it's
 	 * now possible to convert the requested query_pathkeys to canonical
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@ -8,7 +8,7 @@
 * Portions Copyright (c) 1996-2000, PostgreSQL, Inc
 * Portions Copyright (c) 1994, Regents of the University of California
 *
- * $Id: paths.h,v 1.45 2000/05/31 00:28:38 petere Exp $
+ * $Id: paths.h,v 1.46 2000/07/24 03:10:54 tgl Exp $
 *
 *-------------------------------------------------------------------------
 */
@ -90,6 +90,7 @@ typedef enum
 } PathKeysComparison;
 extern void add_equijoined_keys(Query *root, RestrictInfo *restrictinfo);
 extern void generate_implied_equalities(Query *root);
 extern List *canonicalize_pathkeys(Query *root, List *pathkeys);
 extern PathKeysComparison compare_pathkeys(List *keys1, List *keys2);
 extern bool pathkeys_contained_in(List *keys1, List *keys2);
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@ -7,7 +7,7 @@
 * Portions Copyright (c) 1996-2000, PostgreSQL, Inc
 * Portions Copyright (c) 1994, Regents of the University of California
 *
- * $Id: planmain.h,v 1.42 2000/06/18 22:44:33 tgl Exp $
+ * $Id: planmain.h,v 1.43 2000/07/24 03:10:54 tgl Exp $
 *
 *-------------------------------------------------------------------------
 */
@ -43,6 +43,8 @@ extern Result *make_result(List *tlist, Node *resconstantqual, Plan *subplan);
 extern void make_var_only_tlist(Query *root, List *tlist);
 extern void add_restrict_and_join_to_rels(Query *root, List *clauses);
 extern void add_missing_rels_to_query(Query *root);
 extern void process_implied_equality(Query *root, Node *item1, Node *item2,
 									 Oid sortop1, Oid sortop2);
 /*
 * prototypes for plan/setrefs.c