mirror of https://github.com/postgres/postgres.git synced 2025-11-19 13:42:17 +03:00

Improve planner's handling of SetOp plans.

Remove the code for inserting flag columns in the inputs of a SetOp.
That was the only reason why there would be resjunk columns in a
set-operations plan tree, so we can get rid of some code that
supported that, too.

Get rid of choose_hashed_setop() in favor of building Paths for
the hashed and sorted alternatives, and letting them fight it out
within add_path().

Remove set_operation_ordered_results_useful(), which was giving wrong
answers due to examining the wrong ancestor node: we need to examine
the immediate SetOperationStmt parent not the topmost node.  Instead
make each caller of recurse_set_operations() pass down the relevant
parent node.  (This thinko seems to have led only to wasted planning
cycles and possibly-inferior plans, not wrong query answers.  Perhaps
we should back-patch it, but I'm not doing so right now.)

Teach generate_nonunion_paths() to consider pre-sorted inputs for
sorted SetOps, rather than always generating a Sort node.

Patch by me; thanks to Richard Guo and David Rowley for review.

Discussion: https://postgr.es/m/1850138.1731549611@sss.pgh.pa.us
Tom Lane
2024-12-19 17:02:25 -05:00
parent 2762792952
commit 8d96f57d5c
8 changed files with 365 additions and 332 deletions

@@ -3681,17 +3681,70 @@ create_setop_path(PlannerInfo *root,
 	pathnode->numGroups = numGroups;
 
 	/*
-	 * Charge one cpu_operator_cost per comparison per input tuple. We assume
-	 * all columns get compared at most of the tuples.
-	 *
-	 * XXX all wrong for hashing
+	 * Compute cost estimates. As things stand, we end up with the same total
+	 * cost in this node for sort and hash methods, but different startup
+	 * costs. This could be refined perhaps, but it'll do for now.
 	 */
 	pathnode->path.disabled_nodes =
 		leftpath->disabled_nodes + rightpath->disabled_nodes;
-	pathnode->path.startup_cost =
-		leftpath->startup_cost + rightpath->startup_cost;
-	pathnode->path.total_cost = leftpath->total_cost + rightpath->total_cost +
-		cpu_operator_cost * (leftpath->rows + rightpath->rows) * list_length(groupList);
+	if (strategy == SETOP_SORTED)
+	{
+		/*
+		 * In sorted mode, we can emit output incrementally. Charge one
+		 * cpu_operator_cost per comparison per input tuple. Like cost_group,
+		 * we assume all columns get compared at most of the tuples.
+		 */
+		pathnode->path.startup_cost =
+			leftpath->startup_cost + rightpath->startup_cost;
+		pathnode->path.total_cost =
+			leftpath->total_cost + rightpath->total_cost +
+			cpu_operator_cost * (leftpath->rows + rightpath->rows) * list_length(groupList);
+
+		/*
+		 * Also charge a small amount per extracted tuple. Like cost_sort,
+		 * charge only operator cost not cpu_tuple_cost, since SetOp does no
+		 * qual-checking or projection.
+		 */
+		pathnode->path.total_cost += cpu_operator_cost * outputRows;
+	}
+	else
+	{
+		Size		hashentrysize;
+
+		/*
+		 * In hashed mode, we must read all the input before we can emit
+		 * anything. Also charge comparison costs to represent the cost of
+		 * hash table lookups.
+		 */
+		pathnode->path.startup_cost =
+			leftpath->total_cost + rightpath->total_cost +
+			cpu_operator_cost * (leftpath->rows + rightpath->rows) * list_length(groupList);
+		pathnode->path.total_cost = pathnode->path.startup_cost;
+
+		/*
+		 * Also charge a small amount per extracted tuple. Like cost_sort,
+		 * charge only operator cost not cpu_tuple_cost, since SetOp does no
+		 * qual-checking or projection.
+		 */
+		pathnode->path.total_cost += cpu_operator_cost * outputRows;
+
+		/*
+		 * Mark the path as disabled if enable_hashagg is off. While this
+		 * isn't exactly a HashAgg node, it seems close enough to justify
+		 * letting that switch control it.
+		 */
+		if (!enable_hashagg)
+			pathnode->path.disabled_nodes++;
+
+		/*
+		 * Also disable if it doesn't look like the hashtable will fit into
+		 * hash_mem.
+		 */
+		hashentrysize = MAXALIGN(leftpath->pathtarget->width) +
+			MAXALIGN(SizeofMinimalTupleHeader);
+		if (hashentrysize * numGroups > get_hash_memory_limit())
+			pathnode->path.disabled_nodes++;
+	}
 	pathnode->path.rows = outputRows;
 
 	return pathnode;
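
For readers who want to try the numbers, the costing in this hunk can be approximated outside the planner. The following is only a sketch under stated assumptions, not the PostgreSQL implementation: `SketchPath`, `SketchStrategy`, `sketch_setop_cost()` and the `SKETCH_*` macros are hypothetical names, the tuple header size and 8-byte alignment are assumed values, and the hash memory limit is passed in as a plain parameter rather than obtained from get_hash_memory_limit(). The only value taken from PostgreSQL is the default cpu_operator_cost of 0.0025.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the planner's Path; not the real struct. */
typedef struct SketchPath
{
	double		startup_cost;
	double		total_cost;
	double		rows;
	int			width;			/* estimated row width in bytes */
	int			disabled_nodes;
} SketchPath;

typedef enum
{
	SKETCH_SORTED,
	SKETCH_HASHED
} SketchStrategy;

/* PostgreSQL's default cpu_operator_cost. */
#define SKETCH_CPU_OPERATOR_COST 0.0025
/* Assumption: aligned minimal-tuple header size, for illustration only. */
#define SKETCH_TUPLE_HEADER ((size_t) 16)
/* Assumption: 8-byte maximum alignment. */
#define SKETCH_MAXALIGN(x) (((size_t) (x) + 7) & ~(size_t) 7)

static SketchPath
sketch_setop_cost(SketchStrategy strategy,
				  const SketchPath *left, const SketchPath *right,
				  int num_cols, double num_groups, double output_rows,
				  bool enable_hashagg, size_t hash_mem_limit)
{
	SketchPath	result;
	double		compare_cost;

	assert(num_cols > 0);

	/* One operator charge per grouping column per input tuple. */
	compare_cost = SKETCH_CPU_OPERATOR_COST *
		(left->rows + right->rows) * num_cols;

	result.disabled_nodes = left->disabled_nodes + right->disabled_nodes;
	result.rows = output_rows;
	result.width = left->width;

	if (strategy == SKETCH_SORTED)
	{
		/* Sorted mode can emit rows incrementally: low startup cost. */
		result.startup_cost = left->startup_cost + right->startup_cost;
		result.total_cost = left->total_cost + right->total_cost +
			compare_cost;
	}
	else
	{
		size_t		entry_size = SKETCH_MAXALIGN(left->width) +
			SKETCH_TUPLE_HEADER;

		/* Hashed mode must read all input before emitting anything. */
		result.startup_cost = left->total_cost + right->total_cost +
			compare_cost;
		result.total_cost = result.startup_cost;

		/* Count one "disabled" mark per reason, as the hunk does. */
		if (!enable_hashagg)
			result.disabled_nodes++;
		if ((double) entry_size * num_groups > (double) hash_mem_limit)
			result.disabled_nodes++;
	}

	/* Both modes charge a small amount per output tuple. */
	result.total_cost += SKETCH_CPU_OPERATOR_COST * output_rows;
	return result;
}
```

Feeding the same two inputs through both strategies should give equal total costs and different startup costs, which is the behaviour the new comment in the hunk describes. The hashed variant additionally picks up one disabled-node count for each of the two conditions (enable_hashagg off, estimated table larger than the limit).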