1
0
mirror of https://github.com/postgres/postgres.git synced 2025-08-28 18:48:04 +03:00

Edit SGML documentation related to extended statistics.

Use the "statistics object" terminology uniformly here too.  Assorted
copy-editing.  Put new catalogs.sgml sections into alphabetical order.
This commit is contained in:
Tom Lane
2017-05-14 19:15:52 -04:00
parent e84c019598
commit 93ece9cc88
4 changed files with 397 additions and 334 deletions

View File

@@ -79,11 +79,13 @@ CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="PARAMETER">statistics_na
<term><replaceable class="PARAMETER">statistic_type</replaceable></term>
<listitem>
<para>
A statistic type to be computed in this statistics object. Currently
supported types are <literal>ndistinct</literal>, which enables
n-distinct coefficient tracking,
and <literal>dependencies</literal>, which enables functional
dependencies.
A statistic type to be computed in this statistics object.
Currently supported types are
<literal>ndistinct</literal>, which enables n-distinct statistics, and
<literal>dependencies</literal>, which enables functional
dependency statistics.
For more information, see <xref linkend="planner-stats-extended">
and <xref linkend="multivariate-statistics-examples">.
</para>
</listitem>
</varlistentry>
@@ -92,7 +94,8 @@ CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="PARAMETER">statistics_na
<term><replaceable class="PARAMETER">column_name</replaceable></term>
<listitem>
<para>
The name of a table column to be included in the statistics object.
The name of a table column to be covered by the computed statistics.
At least two column names must be given.
</para>
</listitem>
</varlistentry>
@@ -114,7 +117,9 @@ CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="PARAMETER">statistics_na
<title>Notes</title>
<para>
You must be the owner of a table to create or change statistics on it.
You must be the owner of a table to create a statistics object
reading it. Once created, however, the ownership of the statistics
object is independent of the underlying table(s).
</para>
</refsect1>
@@ -124,8 +129,8 @@ CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="PARAMETER">statistics_na
<para>
Create table <structname>t1</> with two functionally dependent columns, i.e.
knowledge of a value in the first column is sufficient for determining the
value in the other column. Then functional dependencies are built on those
columns:
value in the other column. Then functional dependency statistics are built
on those columns:
<programlisting>
CREATE TABLE t1 (
@@ -136,21 +141,25 @@ CREATE TABLE t1 (
INSERT INTO t1 SELECT i/100, i/500
FROM generate_series(1,1000000) s(i);
ANALYZE t1;
-- the number of matching rows will be drastically underestimated:
EXPLAIN ANALYZE SELECT * FROM t1 WHERE (a = 1) AND (b = 0);
CREATE STATISTICS s1 (dependencies) ON a, b FROM t1;
ANALYZE t1;
-- valid combination of values
-- now the rowcount estimate is more accurate:
EXPLAIN ANALYZE SELECT * FROM t1 WHERE (a = 1) AND (b = 0);
-- invalid combination of values
EXPLAIN ANALYZE SELECT * FROM t1 WHERE (a = 1) AND (b = 1);
</programlisting>
Without functional-dependency statistics, the planner would make the
same estimate of the number of matching rows for these two queries.
With such statistics, it is able to tell that one case has matches
and the other does not.
Without functional-dependency statistics, the planner would assume
that the two <literal>WHERE</> conditions are independent, and would
multiply their selectivities together to arrive at a much-too-small
rowcount estimate.
With such statistics, the planner recognizes that the <literal>WHERE</>
conditions are redundant and does not underestimate the rowcount.
</para>
</refsect1>