1
0
mirror of https://github.com/postgres/postgres.git synced 2025-11-25 12:03:53 +03:00

Revert "Optimize order of GROUP BY keys".

This reverts commit db0d67db24 and
several follow-on fixes.  The idea of making a cost-based choice
of the order of the sorting columns is not fundamentally unsound,
but it requires cost information and data statistics that we don't
really have.  For example, relying on procost to distinguish the
relative costs of different sort comparators is pretty pointless
so long as most such comparator functions are labeled with cost 1.0.
Moreover, estimating the number of comparisons done by Quicksort
requires more than just an estimate of the number of distinct values
in the input: you also need some idea of the sizes of the larger
groups, if you want an estimate that's good to better than a factor of
three or so.  That's data that's often unknown or not very reliable.
Worse, to arrive at estimates of the number of calls made to the
lower-order-column comparison functions, the code needs to make
estimates of the numbers of distinct values of multiple columns,
which are necessarily even less trustworthy than per-column stats.
Even if all the inputs are perfectly reliable, the cost algorithm
as-implemented cannot offer useful information about how to order
sorting columns beyond the point at which the average group size
is estimated to drop to 1.

Close inspection of the code added by db0d67db2 shows that there
are also multiple small bugs.  These could have been fixed, but
there's not much point if we don't trust the estimates to be
accurate in-principle.

Finally, the changes in cost_sort's behavior made for very large
changes (often a factor of 2 or so) in the cost estimates for all
sorting operations, not only those for multi-column GROUP BY.
That naturally changes plan choices in many situations, and there's
precious little evidence to show that the changes are for the better.
Given the above doubts about whether the new estimates are really
trustworthy, it's hard to summon much confidence that these changes
are better on the average.

Since we're hard up against the release deadline for v15, let's
revert these changes for now.  We can always try again later.

Note: in v15, I left T_PathKeyInfo in place in nodes.h even though
it's unreferenced.  Removing it would be an ABI break, and it seems
a bit late in the release cycle for that.

Discussion: https://postgr.es/m/TYAPR01MB586665EB5FB2C3807E893941F5579@TYAPR01MB5866.jpnprd01.prod.outlook.com
This commit is contained in:
Tom Lane
2022-10-03 10:56:16 -04:00
parent f60eb3f282
commit f4c7c410ee
24 changed files with 459 additions and 1943 deletions

View File

@@ -115,9 +115,7 @@ extern void cost_incremental_sort(Path *path,
Cost input_startup_cost, Cost input_total_cost,
double input_tuples, int width, Cost comparison_cost, int sort_mem,
double limit_tuples);
extern Cost cost_sort_estimate(PlannerInfo *root, List *pathkeys,
int nPresortedKeys, double tuples);
extern void cost_append(AppendPath *apath, PlannerInfo *root);
extern void cost_append(AppendPath *apath);
extern void cost_merge_append(Path *path, PlannerInfo *root,
List *pathkeys, int n_streams,
Cost input_startup_cost, Cost input_total_cost,

View File

@@ -24,7 +24,6 @@ extern PGDLLIMPORT bool enable_geqo;
extern PGDLLIMPORT int geqo_threshold;
extern PGDLLIMPORT int min_parallel_table_scan_size;
extern PGDLLIMPORT int min_parallel_index_scan_size;
extern PGDLLIMPORT bool enable_group_by_reordering;
/* Hook for plugins to get control in set_rel_pathlist() */
typedef void (*set_rel_pathlist_hook_type) (PlannerInfo *root,
@@ -203,13 +202,6 @@ typedef enum
extern PathKeysComparison compare_pathkeys(List *keys1, List *keys2);
extern bool pathkeys_contained_in(List *keys1, List *keys2);
extern bool pathkeys_count_contained_in(List *keys1, List *keys2, int *n_common);
extern int group_keys_reorder_by_pathkeys(List *pathkeys,
List **group_pathkeys,
List **group_clauses);
extern List *get_useful_group_keys_orderings(PlannerInfo *root, double nrows,
List *path_pathkeys,
List *group_pathkeys, List *group_clauses,
List *aggregate_pathkeys);
extern Path *get_cheapest_path_for_pathkeys(List *paths, List *pathkeys,
Relids required_outer,
CostSelector cost_criterion,