Mirror of https://github.com/postgres/postgres.git, synced 2025-11-21 00:42:43 +03:00
Collect and use element-frequency statistics for arrays.

This patch improves selectivity estimation for the array <@, &&, and @> (containment and overlaps) operators. It enables collection of statistics about individual array element values by ANALYZE, and introduces operator-specific estimators that use these stats. In addition, ScalarArrayOpExpr constructs of the forms "const = ANY/ALL (array_column)" and "const <> ANY/ALL (array_column)" are estimated by treating them as variants of the containment operators.

Since we still collect scalar-style stats about the array values as a whole, the pg_stats view is expanded to show both these stats and the array-style stats in separate columns. This creates an incompatible change in how stats for tsvector columns are displayed in pg_stats: the stats about lexemes are now displayed in the array-related columns instead of the original scalar-related columns.

There are a few loose ends here, notably that it'd be nice to be able to suppress either the scalar-style stats or the array-element stats for columns for which they're not useful. But the patch is in good enough shape to commit for wider testing.

Alexander Korotkov, reviewed by Noah Misch and Nathan Boley
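
As a rough illustration of how per-element frequencies can drive the containment and overlap estimates described in the message above, here is a minimal sketch in plain C. It is not the patch's estimator code: the `ElemFreq` type, the `sketch_*` function names, the `default_freq` fallback, and the assumption that elements occur independently of one another are all hypothetical simplifications made for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one "most common element" statistics entry. */
typedef struct ElemFreq
{
	const char *elem;		/* element value */
	double		freq;		/* fraction of rows whose array contains it */
} ElemFreq;

/*
 * Look up the frequency of one element.  Elements that are not tracked in
 * the statistics get default_freq (an assumption made for this sketch).
 */
static double
sketch_elem_frequency(const ElemFreq *stats, size_t nstats,
					  const char *elem, double default_freq)
{
	for (size_t i = 0; i < nstats; i++)
	{
		if (strcmp(stats[i].elem, elem) == 0)
			return stats[i].freq;
	}
	return default_freq;
}

/*
 * Estimate "column @> query": every query element must be present.
 * Under the independence assumption this is the product of frequencies.
 */
double
sketch_contains_selec(const ElemFreq *stats, size_t nstats,
					  const char **query, size_t nquery,
					  double default_freq)
{
	double		selec = 1.0;

	for (size_t i = 0; i < nquery; i++)
		selec *= sketch_elem_frequency(stats, nstats, query[i], default_freq);
	return selec;
}

/*
 * Estimate "column && query": at least one query element must be present.
 * Computed as one minus the probability that none of them is present.
 */
double
sketch_overlap_selec(const ElemFreq *stats, size_t nstats,
					 const char **query, size_t nquery,
					 double default_freq)
{
	double		none = 1.0;

	for (size_t i = 0; i < nquery; i++)
		none *= 1.0 - sketch_elem_frequency(stats, nstats, query[i], default_freq);
	return 1.0 - none;
}
```

With a single-element query, `sketch_contains_selec` reduces to that element's frequency. In this simplified model, that is the `const = ANY (array_column)` case the message says is treated as a variant of containment.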
@@ -220,6 +220,10 @@ mcelem_tsquery_selec(TSQuery query, Datum *mcelem, int nmcelem,
 	/*
 	 * There should be two more Numbers than Values, because the last two
 	 * cells are taken for minimal and maximal frequency. Punt if not.
+	 *
+	 * (Note: the MCELEM statistics slot definition allows for a third extra
+	 * number containing the frequency of nulls, but we're not expecting that
+	 * to appear for a tsvector column.)
 	 */
 	if (nnumbers != nmcelem + 2)
 		return tsquery_opr_selec_no_stats(query);
@@ -377,6 +377,11 @@ compute_tsvector_stats(VacAttrStats *stats,
 			 * able to find out the minimal and maximal frequency without
 			 * going through all the values. We keep those two extra
 			 * frequencies in two extra cells in mcelem_freqs.
+			 *
+			 * (Note: the MCELEM statistics slot definition allows for a third
+			 * extra number containing the frequency of nulls, but we don't
+			 * create that for a tsvector column, since null elements aren't
+			 * possible.)
 			 */
 			mcelem_values = (Datum *) palloc(num_mcelem * sizeof(Datum));
 			mcelem_freqs = (float4 *) palloc((num_mcelem + 2) * sizeof(float4));
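
The two hunks above rely on the same layout: one frequency per element value, followed by two extra cells holding the minimal and maximal frequency. The standalone sketch below shows a writer and a reader for that layout. It is illustrative only: the `sketch_*` function names are hypothetical, plain `float` and `malloc` stand in for the backend's `float4` and `palloc`, and the optional third extra cell (the frequency of nulls) is deliberately not handled, matching the tsvector case.

```c
#include <stdbool.h>
#include <stdlib.h>

/*
 * Writer side: copy nmcelem per-element frequencies into a new array with
 * two extra trailing cells, minimal frequency first and maximal second.
 * Returns NULL on bad input or allocation failure; the caller frees the
 * result.
 */
float *
sketch_build_freqs(const float *freqs, int nmcelem, int *nnumbers)
{
	float	   *numbers;
	float		minfreq;
	float		maxfreq;

	if (freqs == NULL || nmcelem <= 0)
		return NULL;

	numbers = malloc((size_t) (nmcelem + 2) * sizeof(float));
	if (numbers == NULL)
		return NULL;

	minfreq = maxfreq = freqs[0];
	for (int i = 0; i < nmcelem; i++)
	{
		numbers[i] = freqs[i];
		if (freqs[i] < minfreq)
			minfreq = freqs[i];
		if (freqs[i] > maxfreq)
			maxfreq = freqs[i];
	}

	/* The last two cells carry the extremes, so readers need not rescan. */
	numbers[nmcelem] = minfreq;
	numbers[nmcelem + 1] = maxfreq;
	*nnumbers = nmcelem + 2;
	return numbers;
}

/*
 * Reader side: accept only the "values + 2" layout and fetch the extremes
 * from the trailing cells.  Returns false (the "punt" case) otherwise.
 */
bool
sketch_get_min_max(const float *numbers, int nnumbers, int nmcelem,
				   float *minfreq, float *maxfreq)
{
	if (numbers == NULL || nnumbers != nmcelem + 2)
		return false;

	*minfreq = numbers[nnumbers - 2];
	*maxfreq = numbers[nnumbers - 1];
	return true;
}
```

A reader that also had to accept the null-frequency cell would need to allow `nmcelem + 3` numbers as well. This sketch rejects that shape, just as the check in `mcelem_tsquery_selec` punts on anything other than `nmcelem + 2`.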