Support ORDER BY within aggregate function calls, at long last providing a

non-kluge method for controlling the order in which values are fed to an aggregate function. At the same time eliminate the old implementation restriction that DISTINCT was only supported for single-argument aggregates. Possibly release-notable behavioral change: formerly, agg(DISTINCT x) dropped null values of x unconditionally. Now, it does so only if the agg transition function is strict; otherwise nulls are treated as DISTINCT normally would, ie, you get one copy. Andrew Gierth, reviewed by Hitoshi Harada
2025-07-30 11:03:19 +03:00 · 2009-12-15 17:57:48 +00:00
parent 6a6efb9640
commit 34d26872ed
31 changed files with 1184 additions and 248 deletions
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/func.sgml,v 1.492 2009/11/24 19:21:15 petere Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/func.sgml,v 1.493 2009/12/15 17:57:46 tgl Exp $ -->

 <chapter id="functions">
  <title>Functions and Operators</title>
@ -8440,7 +8440,8 @@ SELECT xmlroot(xmlparse(document '<?xml version="1.1"?><content>abc</content>'),
     The function <function>xmlagg</function> is, unlike the other
     functions described here, an aggregate function.  It concatenates the
     input values to the aggregate function call,
-     like <function>xmlconcat</function> does.
+     much like <function>xmlconcat</function> does, except that concatenation
+     occurs across rows rather than across expressions in a single row.
     See <xref linkend="functions-aggregate"> for additional information
     about aggregate functions.
    </para>
@ -8459,8 +8460,22 @@ SELECT xmlagg(x) FROM test;
    </para>

    <para>
-     To determine the order of the concatenation, something like the
-     following approach can be used:
+     To determine the order of the concatenation, an <literal>ORDER BY</>
+     clause may be added to the aggregate call as described in
+     <xref linkend="syntax-aggregates">. For example:
+
+<screen><![CDATA[
+SELECT xmlagg(x ORDER BY y DESC) FROM test;
+        xmlagg
+----------------------
+ <bar/><foo>abc</foo>
+]]></screen>
+    </para>
+
+    <para>
+     The following non-standard approach used to be recommended
+     in previous versions, and may still be useful in specific
+     cases:

 <screen><![CDATA[
 SELECT xmlagg(x) FROM (SELECT * FROM test ORDER BY y DESC) AS tab;
@ -8468,9 +8483,6 @@ SELECT xmlagg(x) FROM (SELECT * FROM test ORDER BY y DESC) AS tab;
 ----------------------
 <bar/><foo>abc</foo>
 ]]></screen>
-
-     Again, see <xref linkend="functions-aggregate"> for additional
-     information.
    </para>
   </sect3>

@ -9887,20 +9899,19 @@ SELECT count(*) FROM sometable;
   The aggregate functions <function>array_agg</function>
   and <function>xmlagg</function>, as well as similar user-defined
   aggregate functions, produce meaningfully different result values
-   depending on the order of the input values.  In the current
-   implementation, the order of the input is in principle unspecified.
-   Supplying the input values from a sorted subquery
-   will usually work, however.  For example:
+   depending on the order of the input values.  This ordering is
+   unspecified by default, but can be controlled by writing an
+   <literal>ORDER BY</> clause within the aggregate call, as shown in
+   <xref linkend="syntax-aggregates">.
+   Alternatively, supplying the input values from a sorted subquery
+   will usually work.  For example:

 <screen><![CDATA[
 SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
 ]]></screen>

   But this syntax is not allowed in the SQL standard, and is
-   not portable to other database systems.  A future version of
-   <productname>PostgreSQL</> might provide an additional feature to control
-   the order in a better-defined way (<literal>xmlagg(expr ORDER BY expr, expr,
-   ...)</literal>).
+   not portable to other database systems.
  </para>

  <para>
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/syntax.sgml,v 1.138 2009/11/05 23:24:22 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/syntax.sgml,v 1.139 2009/12/15 17:57:46 tgl Exp $ -->

 <chapter id="sql-syntax">
 <title>SQL Syntax</title>
@ -1525,17 +1525,19 @@ sqrt(2)
    syntax of an aggregate expression is one of the following:

 <synopsis>
-<replaceable>aggregate_name</replaceable> (<replaceable>expression</replaceable> [ , ... ] )
-<replaceable>aggregate_name</replaceable> (ALL <replaceable>expression</replaceable> [ , ... ] )
-<replaceable>aggregate_name</replaceable> (DISTINCT <replaceable>expression</replaceable>)
+<replaceable>aggregate_name</replaceable> (<replaceable>expression</replaceable> [ , ... ] [ <replaceable>order_by_clause</replaceable> ] )
+<replaceable>aggregate_name</replaceable> (ALL <replaceable>expression</replaceable> [ , ... ] [ <replaceable>order_by_clause</replaceable> ] )
+<replaceable>aggregate_name</replaceable> (DISTINCT <replaceable>expression</replaceable> [ , ... ] [ <replaceable>order_by_clause</replaceable> ] )
 <replaceable>aggregate_name</replaceable> ( * )
 </synopsis>

    where <replaceable>aggregate_name</replaceable> is a previously
-    defined aggregate (possibly qualified with a schema name), and
+    defined aggregate (possibly qualified with a schema name),
    <replaceable>expression</replaceable> is
    any value expression that does not itself contain an aggregate
-    expression or a window function call.
+    expression or a window function call, and
+    <replaceable>order_by_clause</replaceable> is a optional
+    <literal>ORDER BY</> clause as described below.
   </para>

   <para>
@ -1545,8 +1547,9 @@ sqrt(2)
    whether to ignore null values or not &mdash; but all the standard ones do.)
    The second form is the same as the first, since
    <literal>ALL</literal> is the default.  The third form invokes the
-    aggregate for all distinct non-null values of the expressions found
-    in the input rows.  The last form invokes the aggregate once for
+    aggregate for all distinct values of the expressions found
+    in the input rows (ignoring nulls if the function chooses to do so).
+    The last form invokes the aggregate once for
    each input row regardless of null or non-null values; since no
    particular input value is specified, it is generally only useful
    for the <function>count(*)</function> aggregate function.
@ -1560,6 +1563,40 @@ sqrt(2)
    distinct non-null values of <literal>f1</literal>.
   </para>

+   <para>
+    Ordinarily, the input rows are fed to the aggregate function in an
+    unspecified order.  In many cases this does not matter; for example,
+    <function>min</> produces the same result no matter what order it
+    receives the inputs in.  However, some aggregate functions
+    (such as <function>array_agg</> and <function>xmlagg</>) produce
+    results that depend on the ordering of the input rows.  When using
+    such an aggregate, the optional <replaceable>order_by_clause</> can be
+    used to specify the desired ordering.  The <replaceable>order_by_clause</>
+    has the same syntax as for a query-level <literal>ORDER BY</> clause, as
+    described in <xref linkend="queries-order">, except that its expressions
+    are always just expressions and cannot be output-column names or numbers.
+    For example:
+
+<programlisting>
+SELECT array_agg(a ORDER BY b DESC) FROM table;
+</programlisting>
+   </para>
+
+   <para>
+    If <literal>DISTINCT</> is specified in addition to an
+    <replaceable>order_by_clause</>, then all the <literal>ORDER BY</>
+    expressions must match regular arguments of the aggregate; that is,
+    you cannot sort on an expression that is not included in the
+    <literal>DISTINCT</> list.
+   </para>
+
+   <note>
+    <para>
+     The ability to specify both <literal>DISTINCT</> and <literal>ORDER BY</>
+     in an aggregate function is a <productname>PostgreSQL</> extension.
+    </para>
+   </note>
+
   <para>
    The predefined aggregate functions are described in <xref
    linkend="functions-aggregate">.  Other aggregate functions can be added
@ -1588,13 +1625,6 @@ sqrt(2)
    appearing only in the result list or <literal>HAVING</> clause
    applies with respect to the query level that the aggregate belongs to.
   </para>
-
-   <note>
-    <para>
-     <productname>PostgreSQL</productname> currently does not support
-     <literal>DISTINCT</> with more than one input expression.
-    </para>
-   </note>
  </sect2>

  <sect2 id="syntax-window-functions">
@ -1697,7 +1727,8 @@ ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    <literal>count(*) OVER (PARTITION BY x ORDER BY y)</>.
    <literal>*</> is customarily not used for non-aggregate window functions.
    Aggregate window functions, unlike normal aggregate functions, do not
-    allow <literal>DISTINCT</> to be used within the function argument list.
+    allow <literal>DISTINCT</> or <literal>ORDER BY</> to be used within the
+    function argument list.
   </para>

   <para>