
Revise hash join code so that we can increase the number of batches

on-the-fly, and thereby avoid blowing out memory when the planner has
underestimated the hash table size.  Hash join will now obey the
work_mem limit with some faithfulness.  Per my recent proposal
(hash aggregate part isn't done yet though).
Tom Lane
2005-03-06 22:15:05 +00:00
parent 31b6d840f6
commit 849074f9ae
9 changed files with 654 additions and 450 deletions
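
For context, here is a minimal sketch of the batch-splitting idea the commit message describes: when the in-memory hash table grows past its work_mem budget, the number of batches is doubled and any stored tuple whose hash now maps to a later batch is evicted to a temp file. Everything below (Tuple, HashTable, spill_to_batch_file, the bucket/batch split of the hash value) is a simplified illustration under assumed structures, not the actual executor code in nodeHash.c:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

/* Illustrative stand-ins for the executor's real structures. */
typedef struct Tuple
{
	uint32_t	hashval;		/* join-key hash, computed once */
	struct Tuple *next;			/* chain link within a bucket */
} Tuple;

typedef struct HashTable
{
	int		nbuckets;			/* buckets per batch (fixed) */
	int		nbatch;				/* number of batches; a power of 2 */
	int		curbatch;			/* batch currently held in memory */
	size_t	spaceUsed;			/* bytes of tuples now in memory */
	size_t	spaceAllowed;		/* the work_mem budget, in bytes */
	Tuple **buckets;			/* in-memory bucket chains */
} HashTable;

/* Bucket = low-order hash bits, batch = higher bits: doubling nbatch can
 * move a tuple to a later batch but never changes its bucket number. */
static int
tuple_batch(HashTable *ht, uint32_t hashval)
{
	return (int) ((hashval / (uint32_t) ht->nbuckets) % (uint32_t) ht->nbatch);
}

/* Stub: the real executor appends the tuple to a per-batch temp file. */
static void
spill_to_batch_file(int batchno, Tuple *tup)
{
	printf("tuple %u spilled to batch %d\n", tup->hashval, batchno);
	free(tup);
}

/* Double nbatch and evict tuples that now map to later batches; since
 * nbatch stays a power of 2, about half the tuples are written out. */
static void
increase_num_batches(HashTable *ht)
{
	ht->nbatch *= 2;
	for (int i = 0; i < ht->nbuckets; i++)
	{
		Tuple  *tup = ht->buckets[i],
			   *keep = NULL;

		while (tup != NULL)
		{
			Tuple  *next = tup->next;
			int		batchno = tuple_batch(ht, tup->hashval);

			if (batchno == ht->curbatch)
			{
				tup->next = keep;		/* still ours: keep in memory */
				keep = tup;
			}
			else
			{
				ht->spaceUsed -= sizeof(Tuple);
				spill_to_batch_file(batchno, tup);
			}
			tup = next;
		}
		ht->buckets[i] = keep;
	}
}

/* Insert one build tuple, growing nbatch on the fly if we pass the limit. */
static void
hash_insert(HashTable *ht, uint32_t hashval)
{
	Tuple  *tup = malloc(sizeof(Tuple));
	int		bucketno = (int) (hashval % (uint32_t) ht->nbuckets);

	tup->hashval = hashval;
	if (tuple_batch(ht, hashval) != ht->curbatch)
	{
		spill_to_batch_file(tuple_batch(ht, hashval), tup);
		return;
	}
	tup->next = ht->buckets[bucketno];
	ht->buckets[bucketno] = tup;
	ht->spaceUsed += sizeof(Tuple);
	while (ht->spaceUsed > ht->spaceAllowed && ht->nbatch < (1 << 20))
		increase_num_batches(ht);
}

int
main(void)
{
	HashTable	ht = {8, 1, 0, 0, 6 * sizeof(Tuple), calloc(8, sizeof(Tuple *))};

	for (uint32_t k = 1; k <= 40; k++)
		hash_insert(&ht, k * 2654435761u);	/* scrambled keys */
	printf("final nbatch = %d\n", ht.nbatch);
	return 0;
}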

src/backend/utils/adt/selfuncs.c

@@ -15,7 +15,7 @@
  *
  *
  * IDENTIFICATION
- * $PostgreSQL: pgsql/src/backend/utils/adt/selfuncs.c,v 1.171 2005/02/01 23:07:58 tgl Exp $
+ * $PostgreSQL: pgsql/src/backend/utils/adt/selfuncs.c,v 1.172 2005/03/06 22:15:04 tgl Exp $
  *
  *-------------------------------------------------------------------------
  */
@@ -2154,7 +2154,7 @@ estimate_num_groups(Query *root, List *groupExprs, double input_rows)
  * inner rel is well-dispersed (or the alternatives seem much worse).
  */
 Selectivity
-estimate_hash_bucketsize(Query *root, Node *hashkey, int nbuckets)
+estimate_hash_bucketsize(Query *root, Node *hashkey, double nbuckets)
 {
 	VariableStatData vardata;
 	double		estfract,
@@ -2212,8 +2212,8 @@ estimate_hash_bucketsize(Query *root, Node *hashkey, int nbuckets)
  * the number of buckets is less than the expected number of distinct
  * values; otherwise it is 1/ndistinct.
  */
-	if (ndistinct > (double) nbuckets)
-		estfract = 1.0 / (double) nbuckets;
+	if (ndistinct > nbuckets)
+		estfract = 1.0 / nbuckets;
 	else
 		estfract = 1.0 / ndistinct;
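
Read concretely, the changed lines say: if there are more distinct values than buckets, each bucket holds about 1/nbuckets of the rows; otherwise a bucket holds about 1/ndistinct. The nbuckets parameter becomes a double presumably because, with batches now multiplying on the fly, the caller's total virtual bucket count (buckets times batches) is computed in floating point and can exceed integer range. A tiny standalone check of the rule, with made-up inputs rather than planner-derived statistics:

#include <stdio.h>

/* The rule from the hunk above, as a standalone function. */
static double
bucket_fraction(double ndistinct, double nbuckets)
{
	if (ndistinct > nbuckets)
		return 1.0 / nbuckets;	/* buckets are the limiting factor */
	return 1.0 / ndistinct;		/* distinct values are the limit */
}

int
main(void)
{
	printf("%f\n", bucket_fraction(500.0, 1024.0));		/* 0.002000 */
	printf("%f\n", bucket_fraction(5000.0, 1024.0));	/* 0.000977 */
	return 0;
}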