1
0
mirror of https://github.com/postgres/postgres.git synced 2025-08-28 18:48:04 +03:00
Commit Graph

23 Commits

Author SHA1 Message Date
David Rowley
c2d35bb88e Doc: mention that extended stats aren't used for joins
Statistics defined by the CREATE STATISTICS command are only used to
assist with the selectivity estimations of base relations, never for
joins.  Here we mention this fact in the notes section of the CREATE
STATISTICS command.

Discussion: https://postgr.es/m/CAApHDvrMuVgDOrmg_EtFDZ=AOovq6EsJNnHH1ddyZ8EqL4yzMw@mail.gmail.com
Backpatch-through: 11
2023-06-22 12:45:50 +12:00
Bruce Momjian
97fe6d2210 doc: in create statistics docs, mention analyze for parent info
Discussion: https://postgr.es/m/Yv1Bw8J+1pYfHiRl@momjian.us

Backpatch-through: 10
2022-08-31 23:11:46 -04:00
Dean Rasheed
624aa2a13b Make the name optional in CREATE STATISTICS.
This allows users to omit the statistics name in a CREATE STATISTICS
command, letting the system auto-generate a sensible, unique name,
putting the statistics object in the same schema as the table.

Simon Riggs, reviewed by Matthias van de Meent.

Discussion: https://postgr.es/m/CANbhV-FGD2d_C3zFTfT2aRfX_TaPSgOeKES58RLZx5XzQp5NhA@mail.gmail.com
2022-07-21 19:23:13 +01:00
David Rowley
04539e73fa Use the correct article for abbreviations
We've accumulated quite a mix of instances of "an SQL" and "a SQL" in the
documents.  It would be good to be a bit more consistent with these.

The most recent version of the SQL standard I looked at seems to prefer
"an SQL".  That seems like a good lead to follow, so here we change all
instances of "a SQL" to become "an SQL".  Most instances correctly use
"an SQL" already, so it also makes sense to use the dominant variation in
order to minimise churn.

Additionally, there were some other abbreviations that needed to be
adjusted. FSM, SSPI, SRF and a few others.  Also fix some pronounceable,
abbreviations to use "a" instead of "an".  For example, "a SASL" instead
of "an SASL".

Here I've only adjusted the documents and error messages.  Many others
still exist in source code comments.  Translator hint comments seem to be
the biggest culprit.  It currently does not seem worth the churn to change
these.

Discussion: https://postgr.es/m/CAApHDvpML27UqFXnrYO1MJddsKVMQoiZisPvsAGhKE_tsKXquw%40mail.gmail.com
2021-06-11 13:38:04 +12:00
Tomas Vondra
a4d75c86bf Extended statistics on expressions
Allow defining extended statistics on expressions, not just just on
simple column references.  With this commit, expressions are supported
by all existing extended statistics kinds, improving the same types of
estimates. A simple example may look like this:

  CREATE TABLE t (a int);
  CREATE STATISTICS s ON mod(a,10), mod(a,20) FROM t;
  ANALYZE t;

The collected statistics are useful e.g. to estimate queries with those
expressions in WHERE or GROUP BY clauses:

  SELECT * FROM t WHERE mod(a,10) = 0 AND mod(a,20) = 0;

  SELECT 1 FROM t GROUP BY mod(a,10), mod(a,20);

This introduces new internal statistics kind 'e' (expressions) which is
built automatically when the statistics object definition includes any
expressions. This represents single-expression statistics, as if there
was an expression index (but without the index maintenance overhead).
The statistics is stored in pg_statistics_ext_data as an array of
composite types, which is possible thanks to 79f6a942bd.

CREATE STATISTICS allows building statistics on a single expression, in
which case in which case it's not possible to specify statistics kinds.

A new system view pg_stats_ext_exprs can be used to display expression
statistics, similarly to pg_stats and pg_stats_ext views.

ALTER TABLE ... ALTER COLUMN ... TYPE now treats indexes the same way it
treats indexes, i.e. it drops and recreates the statistics. This means
all statistics are reset, and we no longer try to preserve at least the
functional dependencies. This should not be a major issue in practice,
as the functional dependencies actually rely on per-column statistics,
which were always reset anyway.

Author: Tomas Vondra
Reviewed-by: Justin Pryzby, Dean Rasheed, Zhihong Yu
Discussion: https://postgr.es/m/ad7891d2-e90c-b446-9fe2-7419143847d7%40enterprisedb.com
2021-03-27 00:01:11 +01:00
Bruce Momjian
953c64e0f6 doc: add commas after 'i.e.' and 'e.g.'
This follows the American format,
https://jakubmarian.com/comma-after-i-e-and-e-g/. There is no intention
of requiring this format for future text, but making existing text
consistent every few years makes sense.

Discussion: https://postgr.es/m/20200825183619.GA22369@momjian.us

Backpatch-through: 9.5
2020-08-31 18:33:37 -04:00
Bruce Momjian
7e0fb165dd docs: adjust multi-column most-common-value statistics
This commit adds a mention that the order of columns specified during
multi-column most-common-value statistics is insignificant, and tries to
simplify examples.

Discussion: https://postgr.es/m/20190828162238.GA8360@momjian.us

Backpatch-through: 12
2019-09-30 13:44:22 -04:00
Tomas Vondra
7f44efa10b Fix incorrect CREATE STATISTICS example in docs
The example was incorrectly using parantheses around the list of columns, so
just drop them.

Reported-By: Robert Haas
Discussion: https://postgr.es/m/CA%2BTgmoZZEMAqWMAfvLHZnK57SoxOutgvE-ALO94WsRA7zZ7wyQ%40mail.gmail.com
2019-06-16 01:20:42 +02:00
Tomas Vondra
7300a69950 Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.

Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.

Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
Peter Eisentraut
f7481d2c3c Documentation spell checking and markup improvements 2018-06-29 21:26:41 +02:00
Peter Eisentraut
3c49c6facb Convert documentation to DocBook XML
Since some preparation work had already been done, the only source
changes left were changing empty-element tags like <xref linkend="foo">
to <xref linkend="foo"/>, and changing the DOCTYPE.

The source files are still named *.sgml, but they are actually XML files
now.  Renaming could be considered later.

In the build system, the intermediate step to convert from SGML to XML
is removed.  Everything is build straight from the source files again.
The OpenSP (or the old SP) package is no longer needed.

The documentation toolchain instructions are updated and are much
simpler now.

Peter Eisentraut, Alexander Lakhin, Jürgen Purtz
2017-11-23 09:44:28 -05:00
Peter Eisentraut
1ff01b3902 Convert SGML IDs to lower case
IDs in SGML are case insensitive, and we have accumulated a mix of upper
and lower case IDs, including different variants of the same ID.  In
XML, these will be case sensitive, so we need to fix up those
differences.  Going to all lower case seems most straightforward, and
the current build process already makes all anchors and lower case
anyway during the SGML->XML conversion, so this doesn't create any
difference in the output right now.  A future XML-only build process
would, however, maintain any mixed case ID spellings in the output, so
that is another reason to clean this up beforehand.

Author: Alexander Lakhin <exclusion@gmail.com>
2017-10-20 19:26:10 -04:00
Peter Eisentraut
c29c578908 Don't use SGML empty tags
For DocBook XML compatibility, don't use SGML empty tags (</>) anymore,
replace by the full tag name.  Add a warning option to catch future
occurrences.

Alexander Lakhin, Jürgen Purtz
2017-10-17 15:10:33 -04:00
Peter Eisentraut
44b3230e82 Use lower-case SGML attribute values
for DocBook XML compatibility
2017-10-10 10:15:57 -04:00
Peter Eisentraut
821fb8cdbf Message style fixes 2017-09-11 11:21:27 -04:00
Peter Eisentraut
bbaf9e8f84 Documentation spell checking and markup improvements 2017-06-18 14:02:12 -04:00
Alvaro Herrera
eb67e2e35a extended stats: Clarify behavior of omitting stat type clause
Pointed out by Jeff Janes
Discussion: https://postgr.es/m/CAMkU=1zGhK-nW10RAXhokcT3MM=YBg=j5LkG9RMDwmu3i0H0Og@mail.gmail.com
2017-05-25 13:22:25 -04:00
Tom Lane
93ece9cc88 Edit SGML documentation related to extended statistics.
Use the "statistics object" terminology uniformly here too.  Assorted
copy-editing.  Put new catalogs.sgml sections into alphabetical order.
2017-05-14 19:15:52 -04:00
Alvaro Herrera
bc085205c8 Change CREATE STATISTICS syntax
Previously, we had the WITH clause in the middle of the command, where
you'd specify both generic options as well as statistic types.  Few
people liked this, so this commit changes it to remove the WITH keyword
from that clause and makes it accept statistic types only.  (We
currently don't have any generic options, but if we invent in the
future, we will gain a new WITH clause, probably at the end of the
command).

Also, the column list is now specified without parens, which makes the
whole command look more similar to a SELECT command.  This change will
let us expand the command to supporting expressions (not just columns
names) as well as multiple tables and their join conditions.

Tom added lots of code comments and fixed some parts of the CREATE
STATISTICS reference page, too; more changes in this area are
forthcoming.  He also fixed a potential problem in the alter_generic
regression test, reducing verbosity on a cascaded drop to avoid
dependency on message ordering, as we do in other tests.

Tom also closed a security bug: we documented that table ownership was
required in order to create a statistics object on it, but didn't
actually implement it.

Implement tab-completion for statistics objects.  This can stand some
more improvement.

Authors: Alvaro Herrera, with lots of cleanup by Tom Lane
Discussion: https://postgr.es/m/20170420212426.ltvgyhnefvhixm6i@alvherre.pgsql
2017-05-12 14:59:35 -03:00
Alvaro Herrera
8c5cdb7f4f Tighten up relation kind checks for extended statistics
We were accepting creation of extended statistics only for regular
tables, but they can usefully be created for foreign tables, partitioned
tables, and materialized views, too.  Allow those cases.

While at it, make sure all the rejected cases throw a consistent error
message, and add regression tests for the whole thing.

Author: David Rowley, Álvaro Herrera
Discussion: https://postgr.es/m/CAKJS1f-BmGo410bh5RSPZUvOO0LhmHL2NYmdrC_Jm8pk_FfyCA@mail.gmail.com
2017-04-17 17:55:55 -03:00
Peter Eisentraut
f0e44021df doc: Add some markup 2017-04-07 22:45:39 -04:00
Simon Riggs
2686ee1b7c Collect and use multi-column dependency stats
Follow on patch in the multi-variate statistics patch series.

CREATE STATISTICS s1 WITH (dependencies) ON (a, b) FROM t;
ANALYZE;
will collect dependency stats on (a, b) and then use the measured
dependency in subsequent query planning.

Commit 7b504eb282 added
CREATE STATISTICS with n-distinct coefficients. These are now
specified using the mutually exclusive option WITH (ndistinct).

Author: Tomas Vondra, David Rowley
Reviewed-by: Kyotaro HORIGUCHI, Álvaro Herrera, Dean Rasheed, Robert Haas
and many other comments and contributions
Discussion: https://postgr.es/m/56f40b20-c464-fad2-ff39-06b668fac47c@2ndquadrant.com
2017-04-05 18:00:42 -04:00
Alvaro Herrera
7b504eb282 Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns.  Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too.  All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.

This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table.  This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve.  A new
special pseudo-type pg_ndistinct is used.

(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)

Author: Tomas Vondra.  Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
    Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
    https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
2017-03-24 14:06:10 -03:00