1
0
mirror of https://github.com/postgres/postgres.git synced 2025-08-30 06:01:21 +03:00

Support ordered-set (WITHIN GROUP) aggregates.

This patch introduces generic support for ordered-set and hypothetical-set
aggregate functions, as well as implementations of the instances defined in
SQL:2008 (percentile_cont(), percentile_disc(), rank(), dense_rank(),
percent_rank(), cume_dist()).  We also added mode() though it is not in the
spec, as well as versions of percentile_cont() and percentile_disc() that
can compute multiple percentile values in one pass over the data.

Unlike the original submission, this patch puts full control of the sorting
process in the hands of the aggregate's support functions.  To allow the
support functions to find out how they're supposed to sort, a new API
function AggGetAggref() is added to nodeAgg.c.  This allows retrieval of
the aggregate call's Aggref node, which may have other uses beyond the
immediate need.  There is also support for ordered-set aggregates to
install cleanup callback functions, so that they can be sure that
infrastructure such as tuplesort objects gets cleaned up.

In passing, make some fixes in the recently-added support for variadic
aggregates, and make some editorial adjustments in the recent FILTER
additions for aggregates.  Also, simplify use of IsBinaryCoercible() by
allowing it to succeed whenever the target type is ANY or ANYELEMENT.
It was inconsistent that it dealt with other polymorphic target types
but not these.

Atri Sharma and Andrew Gierth; reviewed by Pavel Stehule and Vik Fearing,
and rather heavily editorialized upon by Tom Lane
This commit is contained in:
Tom Lane
2013-12-23 16:11:35 -05:00
parent 37484ad2aa
commit 8d65da1f01
64 changed files with 4686 additions and 755 deletions

View File

@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
CREATE AGGREGATE <replaceable class="parameter">name</replaceable> ( [ <replaceable class="parameter">argmode</replaceable> ] [ <replaceable class="parameter">arg_name</replaceable> ] <replaceable class="parameter">arg_data_type</replaceable> [ , ... ] ) (
CREATE AGGREGATE <replaceable class="parameter">name</replaceable> ( [ <replaceable class="parameter">argmode</replaceable> ] [ <replaceable class="parameter">argname</replaceable> ] <replaceable class="parameter">arg_data_type</replaceable> [ , ... ] ) (
SFUNC = <replaceable class="PARAMETER">sfunc</replaceable>,
STYPE = <replaceable class="PARAMETER">state_data_type</replaceable>
[ , SSPACE = <replaceable class="PARAMETER">state_data_size</replaceable> ]
@@ -30,6 +30,16 @@ CREATE AGGREGATE <replaceable class="parameter">name</replaceable> ( [ <replacea
[ , SORTOP = <replaceable class="PARAMETER">sort_operator</replaceable> ]
)
CREATE AGGREGATE <replaceable class="parameter">name</replaceable> ( [ [ <replaceable class="parameter">argmode</replaceable> ] [ <replaceable class="parameter">argname</replaceable> ] <replaceable class="parameter">arg_data_type</replaceable> [ , ... ] ]
ORDER BY [ <replaceable class="parameter">argmode</replaceable> ] [ <replaceable class="parameter">argname</replaceable> ] <replaceable class="parameter">arg_data_type</replaceable> [ , ... ] ) (
SFUNC = <replaceable class="PARAMETER">sfunc</replaceable>,
STYPE = <replaceable class="PARAMETER">state_data_type</replaceable>
[ , SSPACE = <replaceable class="PARAMETER">state_data_size</replaceable> ]
[ , FINALFUNC = <replaceable class="PARAMETER">ffunc</replaceable> ]
[ , INITCOND = <replaceable class="PARAMETER">initial_condition</replaceable> ]
[ , HYPOTHETICAL ]
)
<phrase>or the old syntax</phrase>
CREATE AGGREGATE <replaceable class="PARAMETER">name</replaceable> (
@@ -69,6 +79,8 @@ CREATE AGGREGATE <replaceable class="PARAMETER">name</replaceable> (
name and input data type(s) of an aggregate must also be distinct from
the name and input data type(s) of every ordinary function in the same
schema.
This behavior is identical to overloading of ordinary function names
(see <xref linkend="sql-createfunction">).
</para>
<para>
@@ -128,7 +140,7 @@ CREATE AGGREGATE <replaceable class="PARAMETER">name</replaceable> (
<para>
If the state transition function is not strict, then it will be called
unconditionally at each input row, and must deal with null inputs
and null transition values for itself. This allows the aggregate
and null state values for itself. This allows the aggregate
author to have full control over the aggregate's handling of null values.
</para>
@@ -142,6 +154,22 @@ CREATE AGGREGATE <replaceable class="PARAMETER">name</replaceable> (
input rows.
</para>
<para>
The syntax with <literal>ORDER BY</literal> in the parameter list creates
a special type of aggregate called an <firstterm>ordered-set
aggregate</firstterm>; or if <literal>HYPOTHETICAL</> is specified, then
a <firstterm>hypothetical-set aggregate</firstterm> is created. These
aggregates operate over groups of sorted values in order-dependent ways,
so that specification of an input sort order is an essential part of a
call. Also, they can have <firstterm>direct</> arguments, which are
arguments that are evaluated only once per aggregation rather than once
per input row. Hypothetical-set aggregates are a subclass of ordered-set
aggregates in which some of the direct arguments are required to match,
in number and datatypes, the aggregated argument columns. This allows
the values of those direct arguments to be added to the collection of
aggregate-input rows as an additional <quote>hypothetical</> row.
</para>
<para>
Aggregates that behave like <function>MIN</> or <function>MAX</> can
sometimes be optimized by looking into an index instead of scanning every
@@ -202,7 +230,7 @@ SELECT col FROM tab ORDER BY col USING sortop LIMIT 1;
</varlistentry>
<varlistentry>
<term><replaceable class="parameter">arg_name</replaceable></term>
<term><replaceable class="parameter">argname</replaceable></term>
<listitem>
<para>
@@ -234,6 +262,7 @@ SELECT col FROM tab ORDER BY col USING sortop LIMIT 1;
only one input parameter. To define a zero-argument aggregate function
with this syntax, specify the <literal>basetype</> as
<literal>"ANY"</> (not <literal>*</>).
Ordered-set aggregates cannot be defined with the old syntax.
</para>
</listitem>
</varlistentry>
@@ -243,7 +272,7 @@ SELECT col FROM tab ORDER BY col USING sortop LIMIT 1;
<listitem>
<para>
The name of the state transition function to be called for each
input row. For an <replaceable class="PARAMETER">N</>-argument
input row. For a normal <replaceable class="PARAMETER">N</>-argument
aggregate function, the <replaceable class="PARAMETER">sfunc</>
must take <replaceable class="PARAMETER">N</>+1 arguments,
the first being of type <replaceable
@@ -254,6 +283,13 @@ SELECT col FROM tab ORDER BY col USING sortop LIMIT 1;
takes the current state value and the current input data value(s),
and returns the next state value.
</para>
<para>
For ordered-set (including hypothetical-set) aggregates, the state
transition function receives only the current state value and the
aggregated arguments, not the direct arguments. Otherwise it is the
same.
</para>
</listitem>
</varlistentry>
@@ -287,7 +323,8 @@ SELECT col FROM tab ORDER BY col USING sortop LIMIT 1;
<listitem>
<para>
The name of the final function called to compute the aggregate's
result after all input rows have been traversed. The function
result after all input rows have been traversed.
For a normal aggregate, this function
must take a single argument of type <replaceable
class="PARAMETER">state_data_type</replaceable>. The return
data type of the aggregate is defined as the return type of this
@@ -296,6 +333,17 @@ SELECT col FROM tab ORDER BY col USING sortop LIMIT 1;
aggregate's result, and the return type is <replaceable
class="PARAMETER">state_data_type</replaceable>.
</para>
<para>
For ordered-set (including hypothetical-set) aggregates, the
final function receives not only the final state value,
but also the values of all the direct arguments, followed by
null values corresponding to each aggregated argument.
(The reason for including the aggregated arguments in the function
signature is that this may be necessary to allow correct resolution
of the aggregate result type, when a polymorphic aggregate is
being defined.)
</para>
</listitem>
</varlistentry>
@@ -319,7 +367,22 @@ SELECT col FROM tab ORDER BY col USING sortop LIMIT 1;
<function>MAX</>-like aggregate.
This is just an operator name (possibly schema-qualified).
The operator is assumed to have the same input data types as
the aggregate (which must be a single-argument aggregate).
the aggregate (which must be a single-argument normal aggregate).
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><literal>HYPOTHETICAL</literal></term>
<listitem>
<para>
For ordered-set aggregates only, this flag specifies that the aggregate
arguments are to be processed according to the requirements for
hypothetical-set aggregates: that is, the last few direct arguments must
match the data types of the aggregated (<literal>WITHIN GROUP</>)
arguments. The <literal>HYPOTHETICAL</literal> flag has no effect on
run-time behavior, only on parse-time resolution of the data types and
collations of the aggregate's arguments.
</para>
</listitem>
</varlistentry>
@@ -331,6 +394,29 @@ SELECT col FROM tab ORDER BY col USING sortop LIMIT 1;
</para>
</refsect1>
<refsect1>
<title>Notes</title>
<para>
The syntax for ordered-set aggregates allows <literal>VARIADIC</>
to be specified for both the last direct parameter and the last
aggregated (<literal>WITHIN GROUP</>) parameter. However, the
current implementation restricts use of <literal>VARIADIC</>
in two ways. First, ordered-set aggregates can only use
<literal>VARIADIC "any"</>, not other variadic array types.
Second, if the last direct parameter is <literal>VARIADIC "any"</>,
then there can be only one aggregated parameter and it must also
be <literal>VARIADIC "any"</>. (In the representation used in the
system catalogs, these two parameters are merged into a single
<literal>VARIADIC "any"</> item, since <structname>pg_proc</> cannot
represent functions with more than one <literal>VARIADIC</> parameter.)
If the aggregate is a hypothetical-set aggregate, the direct arguments
that match the <literal>VARIADIC "any"</> parameter are the hypothetical
ones; any preceding parameters represent additional direct arguments
that are not constrained to match the aggregated arguments.
</para>
</refsect1>
<refsect1>
<title>Examples</title>