1
0
mirror of https://github.com/postgres/postgres.git synced 2025-07-28 23:42:10 +03:00

Implement multivariate n-distinct coefficients

Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns.  Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too.  All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.

This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table.  This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve.  A new
special pseudo-type pg_ndistinct is used.

(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)

Author: Tomas Vondra.  Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
    Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
    https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
This commit is contained in:
Alvaro Herrera
2017-03-24 14:06:10 -03:00
parent f120b614e0
commit 7b504eb282
77 changed files with 3599 additions and 63 deletions

View File

@ -295,6 +295,11 @@
<entry>planner statistics</entry>
</row>
<row>
<entry><link linkend="catalog-pg-statistic-ext"><structname>pg_statistic_ext</structname></link></entry>
<entry>extended planner statistics</entry>
</row>
<row>
<entry><link linkend="catalog-pg-subscription"><structname>pg_subscription</structname></link></entry>
<entry>logical replication subscriptions</entry>
@ -4247,6 +4252,98 @@
</table>
</sect1>
<sect1 id="catalog-pg-statistic-ext">
<title><structname>pg_statistic_ext</structname></title>
<indexterm zone="catalog-pg-statistic-ext">
<primary>pg_statistic_ext</primary>
</indexterm>
<para>
The catalog <structname>pg_statistic_ext</structname>
holds extended planner statistics.
</para>
<table>
<title><structname>pg_statistic_ext</> Columns</title>
<tgroup cols="4">
<thead>
<row>
<entry>Name</entry>
<entry>Type</entry>
<entry>References</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><structfield>starelid</structfield></entry>
<entry><type>oid</type></entry>
<entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
<entry>The table that the described columns belongs to</entry>
</row>
<row>
<entry><structfield>staname</structfield></entry>
<entry><type>name</type></entry>
<entry></entry>
<entry>Name of the statistic.</entry>
</row>
<row>
<entry><structfield>stanamespace</structfield></entry>
<entry><type>oid</type></entry>
<entry><literal><link linkend="catalog-pg-namespace"><structname>pg_namespace</structname></link>.oid</literal></entry>
<entry>
The OID of the namespace that contains this statistic
</entry>
</row>
<row>
<entry><structfield>staowner</structfield></entry>
<entry><type>oid</type></entry>
<entry><literal><link linkend="catalog-pg-authid"><structname>pg_authid</structname></link>.oid</literal></entry>
<entry>Owner of the statistic</entry>
</row>
<row>
<entry><structfield>staenabled</structfield></entry>
<entry><type>char[]</type></entry>
<entry></entry>
<entry>
An array with the modes of the enabled statistic types, encoded as
<literal>d</literal> for ndistinct coefficients.
</entry>
</row>
<row>
<entry><structfield>stakeys</structfield></entry>
<entry><type>int2vector</type></entry>
<entry><literal><link linkend="catalog-pg-attribute"><structname>pg_attribute</structname></link>.attnum</literal></entry>
<entry>
This is an array of values that indicate which table columns this
statistic covers. For example a value of <literal>1 3</literal> would
mean that the first and the third table columns make up the statistic key.
</entry>
</row>
<row>
<entry><structfield>standistinct</structfield></entry>
<entry><type>pg_ndistinct</type></entry>
<entry></entry>
<entry>
N-distinct coefficients, serialized as <structname>pg_ndistinct</> type.
</entry>
</row>
</tbody>
</tgroup>
</table>
</sect1>
<sect1 id="catalog-pg-namespace">
<title><structname>pg_namespace</structname></title>

View File

@ -16720,6 +16720,10 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);
<primary>pg_get_serial_sequence</primary>
</indexterm>
<indexterm>
<primary>pg_get_statisticsextdef</primary>
</indexterm>
<indexterm>
<primary>pg_get_triggerdef</primary>
</indexterm>
@ -16889,6 +16893,11 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);
<entry>get name of the sequence that a <type>serial</type>, <type>smallserial</type> or <type>bigserial</type> column
uses</entry>
</row>
<row>
<entry><literal><function>pg_get_statisticsextdef(<parameter>statext_oid</parameter>)</function></literal></entry>
<entry><type>text</type></entry>
<entry>get <command>CREATE STATISTICS</> command for extended statistics objects</entry>
</row>
<row>
<entry><function>pg_get_triggerdef</function>(<parameter>trigger_oid</parameter>)</entry>
<entry><type>text</type></entry>
@ -17034,19 +17043,20 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);
<para>
<function>pg_get_constraintdef</function>,
<function>pg_get_indexdef</function>, <function>pg_get_ruledef</function>,
and <function>pg_get_triggerdef</function>, respectively reconstruct the
creating command for a constraint, index, rule, or trigger. (Note that this
is a decompiled reconstruction, not the original text of the command.)
<function>pg_get_expr</function> decompiles the internal form of an
individual expression, such as the default value for a column. It can be
useful when examining the contents of system catalogs. If the expression
might contain Vars, specify the OID of the relation they refer to as the
second parameter; if no Vars are expected, zero is sufficient.
<function>pg_get_viewdef</function> reconstructs the <command>SELECT</>
query that defines a view. Most of these functions come in two variants,
one of which can optionally <quote>pretty-print</> the result. The
pretty-printed format is more readable, but the default format is more
likely to be interpreted the same way by future versions of
<function>pg_get_statisticsextdef</function>, and
<function>pg_get_triggerdef</function>, respectively reconstruct the
creating command for a constraint, index, rule, extended statistics object,
or trigger. (Note that this is a decompiled reconstruction, not the
original text of the command.) <function>pg_get_expr</function> decompiles
the internal form of an individual expression, such as the default value
for a column. It can be useful when examining the contents of system
catalogs. If the expression might contain Vars, specify the OID of the
relation they refer to as the second parameter; if no Vars are expected,
zero is sufficient. <function>pg_get_viewdef</function> reconstructs the
<command>SELECT</> query that defines a view. Most of these functions come
in two variants, one of which can optionally <quote>pretty-print</> the
result. The pretty-printed format is more readable, but the default format
is more likely to be interpreted the same way by future versions of
<productname>PostgreSQL</>; avoid using pretty-printed output for dump
purposes. Passing <literal>false</> for the pretty-print parameter yields
the same result as the variant that does not have the parameter at all.

View File

@ -34,6 +34,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY alterSequence SYSTEM "alter_sequence.sgml">
<!ENTITY alterSubscription SYSTEM "alter_subscription.sgml">
<!ENTITY alterSystem SYSTEM "alter_system.sgml">
<!ENTITY alterStatistics SYSTEM "alter_statistics.sgml">
<!ENTITY alterTable SYSTEM "alter_table.sgml">
<!ENTITY alterTableSpace SYSTEM "alter_tablespace.sgml">
<!ENTITY alterTSConfig SYSTEM "alter_tsconfig.sgml">
@ -80,6 +81,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY createSchema SYSTEM "create_schema.sgml">
<!ENTITY createSequence SYSTEM "create_sequence.sgml">
<!ENTITY createServer SYSTEM "create_server.sgml">
<!ENTITY createStatistics SYSTEM "create_statistics.sgml">
<!ENTITY createSubscription SYSTEM "create_subscription.sgml">
<!ENTITY createTable SYSTEM "create_table.sgml">
<!ENTITY createTableAs SYSTEM "create_table_as.sgml">
@ -126,6 +128,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY dropSchema SYSTEM "drop_schema.sgml">
<!ENTITY dropSequence SYSTEM "drop_sequence.sgml">
<!ENTITY dropServer SYSTEM "drop_server.sgml">
<!ENTITY dropStatistics SYSTEM "drop_statistics.sgml">
<!ENTITY dropSubscription SYSTEM "drop_subscription.sgml">
<!ENTITY dropTable SYSTEM "drop_table.sgml">
<!ENTITY dropTableSpace SYSTEM "drop_tablespace.sgml">

View File

@ -0,0 +1,115 @@
<!--
doc/src/sgml/ref/alter_statistics.sgml
PostgreSQL documentation
-->
<refentry id="SQL-ALTERSTATISTICS">
<indexterm zone="sql-alterstatistics">
<primary>ALTER STATISTICS</primary>
</indexterm>
<refmeta>
<refentrytitle>ALTER STATISTICS</refentrytitle>
<manvolnum>7</manvolnum>
<refmiscinfo>SQL - Language Statements</refmiscinfo>
</refmeta>
<refnamediv>
<refname>ALTER STATISTICS</refname>
<refpurpose>
change the definition of a extended statistics
</refpurpose>
</refnamediv>
<refsynopsisdiv>
<synopsis>
ALTER STATISTICS <replaceable class="parameter">name</replaceable> OWNER TO { <replaceable class="PARAMETER">new_owner</replaceable> | CURRENT_USER | SESSION_USER }
ALTER STATISTICS <replaceable class="parameter">name</replaceable> RENAME TO <replaceable class="parameter">new_name</replaceable>
ALTER STATISTICS <replaceable class="parameter">name</replaceable> SET SCHEMA <replaceable class="parameter">new_schema</replaceable>
</synopsis>
</refsynopsisdiv>
<refsect1>
<title>Description</title>
<para>
<command>ALTER STATISTICS</command> changes the parameters of an existing
extended statistics. Any parameters not specifically set in the
<command>ALTER STATISTICS</command> command retain their prior settings.
</para>
<para>
You must own the statistics to use <command>ALTER STATISTICS</>.
To change a statistics' schema, you must also have <literal>CREATE</>
privilege on the new schema.
To alter the owner, you must also be a direct or indirect member of the new
owning role, and that role must have <literal>CREATE</literal> privilege on
the statistics' schema. (These restrictions enforce that altering the owner
doesn't do anything you couldn't do by dropping and recreating the statistics.
However, a superuser can alter ownership of any statistics anyway.)
</para>
</refsect1>
<refsect1>
<title>Parameters</title>
<para>
<variablelist>
<varlistentry>
<term><replaceable class="parameter">name</replaceable></term>
<listitem>
<para>
The name (optionally schema-qualified) of the statistics to be altered.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><replaceable class="PARAMETER">new_owner</replaceable></term>
<listitem>
<para>
The user name of the new owner of the statistics.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><replaceable class="parameter">new_name</replaceable></term>
<listitem>
<para>
The new name for the statistics.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><replaceable class="parameter">new_schema</replaceable></term>
<listitem>
<para>
The new schema for the statistics.
</para>
</listitem>
</varlistentry>
</variablelist>
</para>
</refsect1>
<refsect1>
<title>Compatibility</title>
<para>
There's no <command>ALTER STATISTICS</command> command in the SQL standard.
</para>
</refsect1>
<refsect1>
<title>See Also</title>
<simplelist type="inline">
<member><xref linkend="sql-createstatistics"></member>
<member><xref linkend="sql-dropstatistics"></member>
</simplelist>
</refsect1>
</refentry>

View File

@ -119,9 +119,12 @@ ALTER TABLE [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable>
<para>
This form drops a column from a table. Indexes and
table constraints involving the column will be automatically
dropped as well. You will need to say <literal>CASCADE</> if
anything outside the table depends on the column, for example,
foreign key references or views.
dropped as well.
Multivariate statistics referencing the dropped column will also be
removed if the removal of the column would cause the statistics to
contain data for only a single column.
You will need to say <literal>CASCADE</> if anything outside the table
depends on the column, for example, foreign key references or views.
If <literal>IF EXISTS</literal> is specified and the column
does not exist, no error is thrown. In this case a notice
is issued instead.

View File

@ -51,6 +51,7 @@ COMMENT ON
SCHEMA <replaceable class="PARAMETER">object_name</replaceable> |
SEQUENCE <replaceable class="PARAMETER">object_name</replaceable> |
SERVER <replaceable class="PARAMETER">object_name</replaceable> |
STATISTICS <replaceable class="PARAMETER">object_name</replaceable> |
TABLE <replaceable class="PARAMETER">object_name</replaceable> |
TABLESPACE <replaceable class="PARAMETER">object_name</replaceable> |
TEXT SEARCH CONFIGURATION <replaceable class="PARAMETER">object_name</replaceable> |
@ -125,8 +126,8 @@ COMMENT ON
The name of the object to be commented. Names of tables,
aggregates, collations, conversions, domains, foreign tables, functions,
indexes, operators, operator classes, operator families, sequences,
text search objects, types, and views can be schema-qualified.
When commenting on a column,
statistics, text search objects, types, and views can be
schema-qualified. When commenting on a column,
<replaceable class="parameter">relation_name</replaceable> must refer
to a table, view, composite type, or foreign table.
</para>
@ -327,6 +328,7 @@ COMMENT ON RULE my_rule ON my_table IS 'Logs updates of employee records';
COMMENT ON SCHEMA my_schema IS 'Departmental data';
COMMENT ON SEQUENCE my_sequence IS 'Used to generate primary keys';
COMMENT ON SERVER myserver IS 'my foreign server';
COMMENT ON STATISTICS my_statistics IS 'Improves planner row estimations';
COMMENT ON TABLE my_schema.my_table IS 'Employee Information';
COMMENT ON TABLESPACE my_tablespace IS 'Tablespace for indexes';
COMMENT ON TEXT SEARCH CONFIGURATION my_config IS 'Special word filtering';

View File

@ -0,0 +1,155 @@
<!--
doc/src/sgml/ref/create_statistics.sgml
PostgreSQL documentation
-->
<refentry id="SQL-CREATESTATISTICS">
<indexterm zone="sql-createstatistics">
<primary>CREATE STATISTICS</primary>
</indexterm>
<refmeta>
<refentrytitle>CREATE STATISTICS</refentrytitle>
<manvolnum>7</manvolnum>
<refmiscinfo>SQL - Language Statements</refmiscinfo>
</refmeta>
<refnamediv>
<refname>CREATE STATISTICS</refname>
<refpurpose>define extended statistics</refpurpose>
</refnamediv>
<refsynopsisdiv>
<synopsis>
CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="PARAMETER">statistics_name</replaceable> ON (
<replaceable class="PARAMETER">column_name</replaceable>, <replaceable class="PARAMETER">column_name</replaceable> [, ...])
FROM <replaceable class="PARAMETER">table_name</replaceable>
</synopsis>
</refsynopsisdiv>
<refsect1 id="SQL-CREATESTATISTICS-description">
<title>Description</title>
<para>
<command>CREATE STATISTICS</command> will create a new extended statistics
object on the specified table.
The statistics will be created in the current database and
will be owned by the user issuing the command.
</para>
<para>
If a schema name is given (for example, <literal>CREATE STATISTICS
myschema.mystat ...</>) then the statistics is created in the specified
schema. Otherwise it is created in the current schema. The name of
the statistics must be distinct from the name of any other statistics in the
same schema.
</para>
</refsect1>
<refsect1>
<title>Parameters</title>
<variablelist>
<varlistentry>
<term><literal>IF NOT EXISTS</></term>
<listitem>
<para>
Do not throw an error if a statistics with the same name already exists.
A notice is issued in this case. Note that only the name of the
statistics object is considered here. The definition of the statistics is
not considered.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><replaceable class="PARAMETER">statistics_name</replaceable></term>
<listitem>
<para>
The name (optionally schema-qualified) of the statistics to be created.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><replaceable class="PARAMETER">column_name</replaceable></term>
<listitem>
<para>
The name of a column to be included in the statistics.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><replaceable class="PARAMETER">table_name</replaceable></term>
<listitem>
<para>
The name (optionally schema-qualified) of the table the statistics should
be created on.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>Notes</title>
<para>
You must be the owner of a table to create or change statistics on it.
</para>
</refsect1>
<refsect1 id="SQL-CREATESTATISTICS-examples">
<title>Examples</title>
<para>
Create table <structname>t1</> with two functionally dependent columns, i.e.
knowledge of a value in the first column is sufficient for determining the
value in the other column. Then functional dependencies are built on those
columns:
<programlisting>
CREATE TABLE t1 (
a int,
b int
);
INSERT INTO t1 SELECT i/100, i/500
FROM generate_series(1,1000000) s(i);
CREATE STATISTICS s1 ON (a, b) FROM t1;
ANALYZE t1;
-- valid combination of values
EXPLAIN ANALYZE SELECT * FROM t1 WHERE (a = 1) AND (b = 0);
-- invalid combination of values
EXPLAIN ANALYZE SELECT * FROM t1 WHERE (a = 1) AND (b = 1);
</programlisting>
</para>
</refsect1>
<refsect1>
<title>Compatibility</title>
<para>
There's no <command>CREATE STATISTICS</command> command in the SQL standard.
</para>
</refsect1>
<refsect1>
<title>See Also</title>
<simplelist type="inline">
<member><xref linkend="sql-alterstatistics"></member>
<member><xref linkend="sql-dropstatistics"></member>
</simplelist>
</refsect1>
</refentry>

View File

@ -0,0 +1,98 @@
<!--
doc/src/sgml/ref/drop_statistics.sgml
PostgreSQL documentation
-->
<refentry id="SQL-DROPSTATISTICS">
<indexterm zone="sql-dropstatistics">
<primary>DROP STATISTICS</primary>
</indexterm>
<refmeta>
<refentrytitle>DROP STATISTICS</refentrytitle>
<manvolnum>7</manvolnum>
<refmiscinfo>SQL - Language Statements</refmiscinfo>
</refmeta>
<refnamediv>
<refname>DROP STATISTICS</refname>
<refpurpose>remove extended statistics</refpurpose>
</refnamediv>
<refsynopsisdiv>
<synopsis>
DROP STATISTICS [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> [, ...]
</synopsis>
</refsynopsisdiv>
<refsect1>
<title>Description</title>
<para>
<command>DROP STATISTICS</command> removes statistics from the database.
Only the statistics owner, the schema owner, and superuser can drop a
statistics.
</para>
</refsect1>
<refsect1>
<title>Parameters</title>
<variablelist>
<varlistentry>
<term><literal>IF EXISTS</literal></term>
<listitem>
<para>
Do not throw an error if the statistics do not exist. A notice is
issued in this case.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><replaceable class="PARAMETER">name</replaceable></term>
<listitem>
<para>
The name (optionally schema-qualified) of the statistics to drop.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>Examples</title>
<para>
To destroy two statistics objects on different schemas, without failing
if they don't exist:
<programlisting>
DROP STATISTICS IF EXISTS
accounting.users_uid_creation,
public.grants_user_role;
</programlisting>
</para>
</refsect1>
<refsect1>
<title>Compatibility</title>
<para>
There's no <command>DROP STATISTICS</command> command in the SQL standard.
</para>
</refsect1>
<refsect1>
<title>See Also</title>
<simplelist type="inline">
<member><xref linkend="sql-alterstatistics"></member>
<member><xref linkend="sql-createstatistics"></member>
</simplelist>
</refsect1>
</refentry>

View File

@ -60,6 +60,7 @@
&alterSchema;
&alterSequence;
&alterServer;
&alterStatistics;
&alterSubscription;
&alterSystem;
&alterTable;
@ -108,6 +109,7 @@
&createSchema;
&createSequence;
&createServer;
&createStatistics;
&createSubscription;
&createTable;
&createTableAs;
@ -154,6 +156,7 @@
&dropSchema;
&dropSequence;
&dropServer;
&dropStatistics;
&dropSubscription;
&dropTable;
&dropTableSpace;