mirror of https://github.com/postgres/postgres.git synced 2025-10-28 11:55:03 +03:00

Revise generation of hashjoin paths: generate one path per
hashjoinable clause, not one path for a randomly-chosen element of each
set of clauses with the same join operator.  That is, if you wrote
   SELECT ... WHERE t1.f1 = t2.f2 and t1.f3 = t2.f4,
and both '=' ops were the same opcode (say, all four fields are int4),
then the system would either consider hashing on f1=f2 or on f3=f4,
but it would *not* consider both possibilities.  Boo hiss.
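
The difference between the two enumeration strategies can be sketched in C as follows. This is an illustration only, not the planner code touched by this commit: the JoinClause struct, its fields, and both function names are hypothetical, and the "old" variant simply keeps the first clause seen for each operator where the message describes the choice as random.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified join clause of the form outer_col = inner_col. */
typedef struct
{
    const char *outer_col;
    const char *inner_col;
    unsigned    op_id;          /* stand-in for the join operator's identifier */
    int         hashjoinable;   /* nonzero if the operator can be hashed */
} JoinClause;

/*
 * Old behaviour as described above: keep one clause per distinct operator.
 * Writes the indexes of the chosen clauses into out[] and returns the count.
 */
size_t
candidates_per_operator(const JoinClause *clauses, size_t n, size_t *out)
{
    size_t      count = 0;

    assert(clauses != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
    {
        int         seen = 0;

        if (!clauses[i].hashjoinable)
            continue;
        /* Skip this clause if an earlier pick already uses its operator. */
        for (size_t j = 0; j < count; j++)
        {
            if (clauses[out[j]].op_id == clauses[i].op_id)
            {
                seen = 1;
                break;
            }
        }
        if (!seen)
            out[count++] = i;
    }
    return count;
}

/*
 * New behaviour as described above: every hashjoinable clause becomes
 * its own candidate, regardless of which operator it uses.
 */
size_t
candidates_per_clause(const JoinClause *clauses, size_t n, size_t *out)
{
    size_t      count = 0;

    assert(clauses != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
    {
        if (clauses[i].hashjoinable)
            out[count++] = i;
    }
    return count;
}
```

For the query above, with both clauses sharing one operator, the first function yields a single candidate and the second yields two, so hashing on f1=f2 and hashing on f3=f4 can each be costed.
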
Also, revise estimation of hashjoin costs to include a penalty when the
inner join var has a high disbursion --- ie, the most common value is
pretty common.  This tends to lead to badly skewed hash bucket occupancy
and way more comparisons than you'd expect on average.
I imagine that the cost calculation still needs tweaking, but at least
it generates a more reasonable plan than before on George Young's example.
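
To see why a very common inner value hurts, here is a rough C sketch of the skew effect. It is not the cost formula introduced by this commit: the function names and the simple model (all copies of the most common value land in one bucket, everything else is spread evenly) are assumptions made for illustration.

```c
#include <assert.h>

/* Average bucket occupancy if inner tuples were spread evenly. */
double
avg_bucket_size(double inner_tuples, double nbuckets)
{
    assert(nbuckets > 0.0);
    return inner_tuples / nbuckets;
}

/*
 * Lower bound on the occupancy of the bucket that receives the most
 * common value; mcv_fraction is the share of inner tuples carrying it.
 */
double
mcv_bucket_size(double inner_tuples, double mcv_fraction)
{
    assert(mcv_fraction >= 0.0 && mcv_fraction <= 1.0);
    return inner_tuples * mcv_fraction;
}

/*
 * Hypothetical penalty factor: how much longer the chain for the most
 * common value is than the average chain. Never less than 1.0.
 */
double
skew_factor(double inner_tuples, double nbuckets, double mcv_fraction)
{
    double      avg = avg_bucket_size(inner_tuples, nbuckets);
    double      mcv = mcv_bucket_size(inner_tuples, mcv_fraction);

    if (avg <= 0.0 || mcv <= avg)
        return 1.0;
    return mcv / avg;
}
```

With 10,000 inner tuples, 100 buckets, and a most common value covering 25% of the rows, the average chain holds 100 tuples while the chain for that value holds at least 2,500, so a probe for it does 25 times the comparisons the uniform estimate predicts.
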
Tom Lane
1999-08-06 04:00:17 +00:00
parent b7883d7e3a
commit e1fad50a5d
5 changed files with 199 additions and 116 deletions

@@ -6,7 +6,7 @@
*
* Copyright (c) 1994, Regents of the University of California
*
- * $Id: pathnode.h,v 1.19 1999/07/30 04:07:22 tgl Exp $
+ * $Id: pathnode.h,v 1.20 1999/08/06 04:00:13 tgl Exp $
*
*-------------------------------------------------------------------------
*/
@@ -33,9 +33,9 @@ extern MergePath *create_mergejoin_path(RelOptInfo *joinrel, int outersize,
List *mergeclauses, List *outersortkeys, List *innersortkeys);
extern HashPath *create_hashjoin_path(RelOptInfo *joinrel, int outersize,
- int innersize, int outerwidth, int innerwidth, Path *outer_path,
- Path *inner_path, List *pathkeys, Oid operator, List *hashclauses,
- List *outerkeys, List *innerkeys);
+ int innersize, int outerwidth, int innerwidth, Path *outer_path,
+ Path *inner_path, List *pathkeys, Oid operator, List *hashclauses,
+ List *outerkeys, List *innerkeys, Cost innerdisbursion);
/*
* prototypes for rel.c