1
0
mirror of https://github.com/postgres/postgres.git synced 2025-07-30 11:03:19 +03:00

Support ORDER BY within aggregate function calls, at long last providing a

non-kluge method for controlling the order in which values are fed to an
aggregate function.  At the same time eliminate the old implementation
restriction that DISTINCT was only supported for single-argument aggregates.

Possibly release-notable behavioral change: formerly, agg(DISTINCT x)
dropped null values of x unconditionally.  Now, it does so only if the
agg transition function is strict; otherwise nulls are treated as DISTINCT
normally would, ie, you get one copy.

Andrew Gierth, reviewed by Hitoshi Harada
This commit is contained in:
Tom Lane
2009-12-15 17:57:48 +00:00
parent 6a6efb9640
commit 34d26872ed
31 changed files with 1184 additions and 248 deletions

View File

@ -1,4 +1,4 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/func.sgml,v 1.492 2009/11/24 19:21:15 petere Exp $ -->
<!-- $PostgreSQL: pgsql/doc/src/sgml/func.sgml,v 1.493 2009/12/15 17:57:46 tgl Exp $ -->
<chapter id="functions">
<title>Functions and Operators</title>
@ -8440,7 +8440,8 @@ SELECT xmlroot(xmlparse(document '<?xml version="1.1"?><content>abc</content>'),
The function <function>xmlagg</function> is, unlike the other
functions described here, an aggregate function. It concatenates the
input values to the aggregate function call,
like <function>xmlconcat</function> does.
much like <function>xmlconcat</function> does, except that concatenation
occurs across rows rather than across expressions in a single row.
See <xref linkend="functions-aggregate"> for additional information
about aggregate functions.
</para>
@ -8459,8 +8460,22 @@ SELECT xmlagg(x) FROM test;
</para>
<para>
To determine the order of the concatenation, something like the
following approach can be used:
To determine the order of the concatenation, an <literal>ORDER BY</>
clause may be added to the aggregate call as described in
<xref linkend="syntax-aggregates">. For example:
<screen><![CDATA[
SELECT xmlagg(x ORDER BY y DESC) FROM test;
xmlagg
----------------------
<bar/><foo>abc</foo>
]]></screen>
</para>
<para>
The following non-standard approach used to be recommended
in previous versions, and may still be useful in specific
cases:
<screen><![CDATA[
SELECT xmlagg(x) FROM (SELECT * FROM test ORDER BY y DESC) AS tab;
@ -8468,9 +8483,6 @@ SELECT xmlagg(x) FROM (SELECT * FROM test ORDER BY y DESC) AS tab;
----------------------
<bar/><foo>abc</foo>
]]></screen>
Again, see <xref linkend="functions-aggregate"> for additional
information.
</para>
</sect3>
@ -9887,20 +9899,19 @@ SELECT count(*) FROM sometable;
The aggregate functions <function>array_agg</function>
and <function>xmlagg</function>, as well as similar user-defined
aggregate functions, produce meaningfully different result values
depending on the order of the input values. In the current
implementation, the order of the input is in principle unspecified.
Supplying the input values from a sorted subquery
will usually work, however. For example:
depending on the order of the input values. This ordering is
unspecified by default, but can be controlled by writing an
<literal>ORDER BY</> clause within the aggregate call, as shown in
<xref linkend="syntax-aggregates">.
Alternatively, supplying the input values from a sorted subquery
will usually work. For example:
<screen><![CDATA[
SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
]]></screen>
But this syntax is not allowed in the SQL standard, and is
not portable to other database systems. A future version of
<productname>PostgreSQL</> might provide an additional feature to control
the order in a better-defined way (<literal>xmlagg(expr ORDER BY expr, expr,
...)</literal>).
not portable to other database systems.
</para>
<para>

View File

@ -1,4 +1,4 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/syntax.sgml,v 1.138 2009/11/05 23:24:22 tgl Exp $ -->
<!-- $PostgreSQL: pgsql/doc/src/sgml/syntax.sgml,v 1.139 2009/12/15 17:57:46 tgl Exp $ -->
<chapter id="sql-syntax">
<title>SQL Syntax</title>
@ -1525,17 +1525,19 @@ sqrt(2)
syntax of an aggregate expression is one of the following:
<synopsis>
<replaceable>aggregate_name</replaceable> (<replaceable>expression</replaceable> [ , ... ] )
<replaceable>aggregate_name</replaceable> (ALL <replaceable>expression</replaceable> [ , ... ] )
<replaceable>aggregate_name</replaceable> (DISTINCT <replaceable>expression</replaceable>)
<replaceable>aggregate_name</replaceable> (<replaceable>expression</replaceable> [ , ... ] [ <replaceable>order_by_clause</replaceable> ] )
<replaceable>aggregate_name</replaceable> (ALL <replaceable>expression</replaceable> [ , ... ] [ <replaceable>order_by_clause</replaceable> ] )
<replaceable>aggregate_name</replaceable> (DISTINCT <replaceable>expression</replaceable> [ , ... ] [ <replaceable>order_by_clause</replaceable> ] )
<replaceable>aggregate_name</replaceable> ( * )
</synopsis>
where <replaceable>aggregate_name</replaceable> is a previously
defined aggregate (possibly qualified with a schema name), and
defined aggregate (possibly qualified with a schema name),
<replaceable>expression</replaceable> is
any value expression that does not itself contain an aggregate
expression or a window function call.
expression or a window function call, and
<replaceable>order_by_clause</replaceable> is a optional
<literal>ORDER BY</> clause as described below.
</para>
<para>
@ -1545,8 +1547,9 @@ sqrt(2)
whether to ignore null values or not &mdash; but all the standard ones do.)
The second form is the same as the first, since
<literal>ALL</literal> is the default. The third form invokes the
aggregate for all distinct non-null values of the expressions found
in the input rows. The last form invokes the aggregate once for
aggregate for all distinct values of the expressions found
in the input rows (ignoring nulls if the function chooses to do so).
The last form invokes the aggregate once for
each input row regardless of null or non-null values; since no
particular input value is specified, it is generally only useful
for the <function>count(*)</function> aggregate function.
@ -1560,6 +1563,40 @@ sqrt(2)
distinct non-null values of <literal>f1</literal>.
</para>
<para>
Ordinarily, the input rows are fed to the aggregate function in an
unspecified order. In many cases this does not matter; for example,
<function>min</> produces the same result no matter what order it
receives the inputs in. However, some aggregate functions
(such as <function>array_agg</> and <function>xmlagg</>) produce
results that depend on the ordering of the input rows. When using
such an aggregate, the optional <replaceable>order_by_clause</> can be
used to specify the desired ordering. The <replaceable>order_by_clause</>
has the same syntax as for a query-level <literal>ORDER BY</> clause, as
described in <xref linkend="queries-order">, except that its expressions
are always just expressions and cannot be output-column names or numbers.
For example:
<programlisting>
SELECT array_agg(a ORDER BY b DESC) FROM table;
</programlisting>
</para>
<para>
If <literal>DISTINCT</> is specified in addition to an
<replaceable>order_by_clause</>, then all the <literal>ORDER BY</>
expressions must match regular arguments of the aggregate; that is,
you cannot sort on an expression that is not included in the
<literal>DISTINCT</> list.
</para>
<note>
<para>
The ability to specify both <literal>DISTINCT</> and <literal>ORDER BY</>
in an aggregate function is a <productname>PostgreSQL</> extension.
</para>
</note>
<para>
The predefined aggregate functions are described in <xref
linkend="functions-aggregate">. Other aggregate functions can be added
@ -1588,13 +1625,6 @@ sqrt(2)
appearing only in the result list or <literal>HAVING</> clause
applies with respect to the query level that the aggregate belongs to.
</para>
<note>
<para>
<productname>PostgreSQL</productname> currently does not support
<literal>DISTINCT</> with more than one input expression.
</para>
</note>
</sect2>
<sect2 id="syntax-window-functions">
@ -1697,7 +1727,8 @@ ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
<literal>count(*) OVER (PARTITION BY x ORDER BY y)</>.
<literal>*</> is customarily not used for non-aggregate window functions.
Aggregate window functions, unlike normal aggregate functions, do not
allow <literal>DISTINCT</> to be used within the function argument list.
allow <literal>DISTINCT</> or <literal>ORDER BY</> to be used within the
function argument list.
</para>
<para>