mirror of https://github.com/postgres/postgres.git synced 2025-06-25 01:02:05 +03:00

Improve planner's handling of set-returning functions in grouping columns.

Improve query_is_distinct_for() to accept SRFs in the targetlist when
we can prove distinctness from a DISTINCT clause.  In that case the
de-duplication will surely happen after SRF expansion, so the proof
still works.  Continue to punt in the case where we'd try to prove
distinctness from GROUP BY (or, in the future, source relations).
To do that, we'd have to determine whether the SRFs were in the
grouping columns or elsewhere in the tlist, and it still doesn't
seem worth the trouble.  But this trivial change allows us to
recognize that "SELECT DISTINCT unnest(foo) FROM ..." produces
unique-ified output, which seems worth having.
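The decision rule above can be made concrete with a toy model. `QuerySketch` and `sketch_query_is_distinct()` are hypothetical names, not PostgreSQL APIs. The model also ignores the column-matching and operator checks that the real query_is_distinct_for() performs. It only captures when SRFs in the target list block the proof: DISTINCT is trusted even with SRFs, and GROUP BY is trusted only without them.

```c
#include <stdbool.h>

/*
 * Hypothetical, heavily simplified stand-in for a planner Query.  The
 * DISTINCT / GROUP BY flags assume the clause covers exactly the columns
 * whose uniqueness we want to prove.
 */
typedef struct QuerySketch
{
	bool		hasTargetSRFs;	/* set-returning functions in the tlist */
	bool		hasDistinct;	/* DISTINCT over the columns of interest */
	bool		hasGroupBy;		/* GROUP BY over the columns of interest */
} QuerySketch;

/*
 * Toy version of the rule: DISTINCT de-duplicates after SRF expansion, so
 * it proves uniqueness even when SRFs are present.  A GROUP BY proof is
 * only trusted when there are no tlist SRFs, because an SRF outside the
 * grouping columns could multiply rows after grouping.
 */
static bool
sketch_query_is_distinct(const QuerySketch *q)
{
	if (q->hasDistinct)
		return true;
	if (q->hasGroupBy && !q->hasTargetSRFs)
		return true;
	return false;
}
```

With `hasTargetSRFs` and `hasDistinct` both set, as for "SELECT DISTINCT unnest(foo) FROM ...", the sketch returns true. The same SRF in a GROUP BY-only query returns false, which is the case where the planner still punts.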

Also, fix estimate_num_groups() to consider the possibility of SRFs in
the grouping columns.  Its failure to do so was masked before v10 because
grouping_planner() scaled up plan rowcount estimates by the estimated SRF
multiplier after performing grouping.  That doesn't happen anymore, which
is more correct, but it means we need an adjustment in the estimate for
the number of groups.  Failure to do this leads to an underestimate for
the number of output rows of subqueries like "SELECT DISTINCT unnest(foo)"
compared to what 9.6 and earlier estimated, thus breaking plan choices
in some cases.

Per report from Dmitry Shalashov.  Back-patch to v10 to avoid degraded
plan choices compared to previous releases.

Discussion: https://postgr.es/m/CAKPeCUGAeHgoh5O=SvcQxREVkoX7UdeJUMj1F5=aBNvoTa+O8w@mail.gmail.com
Tom Lane
2017-11-25 11:48:09 -05:00
parent b10967eddf
commit df3a66e282
2 changed files with 41 additions and 14 deletions


@@ -3361,6 +3361,7 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 					List **pgset)
 {
 	List *varinfos = NIL;
+	double srf_multiplier = 1.0;
 	double numdistinct;
 	ListCell *l;
 	int i;
@@ -3394,6 +3395,7 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 	foreach(l, groupExprs)
 	{
 		Node *groupexpr = (Node *) lfirst(l);
+		double this_srf_multiplier;
 		VariableStatData vardata;
 		List *varshere;
 		ListCell *l2;
@@ -3402,6 +3404,21 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 		if (pgset && !list_member_int(*pgset, i++))
 			continue;
 
+		/*
+		 * Set-returning functions in grouping columns are a bit problematic.
+		 * The code below will effectively ignore their SRF nature and come up
+		 * with a numdistinct estimate as though they were scalar functions.
+		 * We compensate by scaling up the end result by the largest SRF
+		 * rowcount estimate. (This will be an overestimate if the SRF
+		 * produces multiple copies of any output value, but it seems best to
+		 * assume the SRF's outputs are distinct. In any case, it's probably
+		 * pointless to worry too much about this without much better
+		 * estimates for SRF output rowcounts than we have today.)
+		 */
+		this_srf_multiplier = expression_returns_set_rows(groupexpr);
+		if (srf_multiplier < this_srf_multiplier)
+			srf_multiplier = this_srf_multiplier;
+
 		/* Short-circuit for expressions returning boolean */
 		if (exprType(groupexpr) == BOOLOID)
 		{
@@ -3467,9 +3484,15 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 	 */
 	if (varinfos == NIL)
 	{
+		/* Apply SRF multiplier as we would do in the long path */
+		numdistinct *= srf_multiplier;
+		/* Round off */
+		numdistinct = ceil(numdistinct);
 		/* Guard against out-of-range answers */
 		if (numdistinct > input_rows)
 			numdistinct = input_rows;
+		if (numdistinct < 1.0)
+			numdistinct = 1.0;
 		return numdistinct;
 	}
@@ -3638,6 +3661,10 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 		varinfos = newvarinfos;
 	} while (varinfos != NIL);
 
+	/* Now we can account for the effects of any SRFs */
+	numdistinct *= srf_multiplier;
+
+	/* Round off */
 	numdistinct = ceil(numdistinct);
 
 	/* Guard against out-of-range answers */
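The caveat in the new comment can be checked with a toy count. The comment says that scaling by the SRF rowcount estimate overestimates when an SRF emits repeated values. The sketch below is hypothetical: `toy_unnest_distinct()` stores each input row's array as a fixed-width slice of a flat buffer, expands it the way unnest() would, and counts distinct outputs by brute force.

```c
#include <stdbool.h>

/*
 * Toy model of SELECT DISTINCT unnest(foo): `rows` holds nrows arrays of
 * `width` ints each, stored back to back.  Returns how many distinct values
 * the expansion produces (O(n^2) scan, fine for tiny inputs).
 */
static int
toy_unnest_distinct(const int *rows, int nrows, int width)
{
	int			total = nrows * width;
	int			distinct = 0;

	for (int i = 0; i < total; i++)
	{
		bool		seen = false;

		/* Has this value already appeared earlier in the expansion? */
		for (int j = 0; j < i; j++)
		{
			if (rows[j] == rows[i])
			{
				seen = true;
				break;
			}
		}
		if (!seen)
			distinct++;
	}
	return distinct;
}
```

Take the three distinct arrays {1,1}, {2,2} and {3,3} with an SRF estimate of 2 rows each. The scaled estimate is 3 × 2 = 6 groups, but the expansion has only 3 distinct values. With {1,2}, {3,4} and {5,6} the scaled estimate of 6 is exact.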