1
0
mirror of https://github.com/postgres/postgres.git synced 2025-07-31 22:04:40 +03:00

Allow polymorphic aggregates to have non-polymorphic state data types.

Before 9.4, such an aggregate couldn't be declared, because its final
function would have to have polymorphic result type but no polymorphic
argument, which CREATE FUNCTION would quite properly reject.  The
ordered-set-aggregate patch found a workaround: allow the final function
to be declared as accepting additional dummy arguments that have types
matching the aggregate's regular input arguments.  However, we failed
to notice that this problem applies just as much to regular aggregates,
despite the fact that we had a built-in regular aggregate array_agg()
that was known to be undeclarable in SQL because its final function
had an illegal signature.  So what we should have done, and what this
patch does, is to decouple the extra-dummy-arguments behavior from
ordered-set aggregates and make it generally available for all aggregate
declarations.  We have to put this into 9.4 rather than waiting till
later because it slightly alters the rules for declaring ordered-set
aggregates.

The patch turned out a bit bigger than I'd hoped because it proved
necessary to record the extra-arguments option in a new pg_aggregate
column.  I'd thought we could just look at the final function's pronargs
at runtime, but that didn't work well for variadic final functions.
It's probably just as well though, because it simplifies life for pg_dump
to record the option explicitly.

While at it, fix array_agg() to have a valid final-function signature,
and add an opr_sanity test to notice future deviations from polymorphic
consistency.  I also marked the percentile_cont() aggregates as not
needing extra arguments, since they don't.
This commit is contained in:
Tom Lane
2014-04-23 19:17:31 -04:00
parent 125ba2945a
commit f0fedfe82c
17 changed files with 509 additions and 298 deletions

View File

@ -404,6 +404,18 @@
<entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
<entry>Final function for moving-aggregate mode (zero if none)</entry>
</row>
<row>
<entry><structfield>aggfinalextra</structfield></entry>
<entry><type>bool</type></entry>
<entry></entry>
<entry>True to pass extra dummy arguments to aggfinalfn</entry>
</row>
<row>
<entry><structfield>aggmfinalextra</structfield></entry>
<entry><type>bool</type></entry>
<entry></entry>
<entry>True to pass extra dummy arguments to aggmfinalfn</entry>
</row>
<row>
<entry><structfield>aggsortop</structfield></entry>
<entry><type>oid</type></entry>

View File

@ -26,12 +26,14 @@ CREATE AGGREGATE <replaceable class="parameter">name</replaceable> ( [ <replacea
STYPE = <replaceable class="PARAMETER">state_data_type</replaceable>
[ , SSPACE = <replaceable class="PARAMETER">state_data_size</replaceable> ]
[ , FINALFUNC = <replaceable class="PARAMETER">ffunc</replaceable> ]
[ , FINALFUNC_EXTRA ]
[ , INITCOND = <replaceable class="PARAMETER">initial_condition</replaceable> ]
[ , MSFUNC = <replaceable class="PARAMETER">msfunc</replaceable> ]
[ , MINVFUNC = <replaceable class="PARAMETER">minvfunc</replaceable> ]
[ , MSTYPE = <replaceable class="PARAMETER">mstate_data_type</replaceable> ]
[ , MSSPACE = <replaceable class="PARAMETER">mstate_data_size</replaceable> ]
[ , MFINALFUNC = <replaceable class="PARAMETER">mffunc</replaceable> ]
[ , MFINALFUNC_EXTRA ]
[ , MINITCOND = <replaceable class="PARAMETER">minitial_condition</replaceable> ]
[ , SORTOP = <replaceable class="PARAMETER">sort_operator</replaceable> ]
)
@ -42,6 +44,7 @@ CREATE AGGREGATE <replaceable class="parameter">name</replaceable> ( [ [ <replac
STYPE = <replaceable class="PARAMETER">state_data_type</replaceable>
[ , SSPACE = <replaceable class="PARAMETER">state_data_size</replaceable> ]
[ , FINALFUNC = <replaceable class="PARAMETER">ffunc</replaceable> ]
[ , FINALFUNC_EXTRA ]
[ , INITCOND = <replaceable class="PARAMETER">initial_condition</replaceable> ]
[ , HYPOTHETICAL ]
)
@ -54,12 +57,14 @@ CREATE AGGREGATE <replaceable class="PARAMETER">name</replaceable> (
STYPE = <replaceable class="PARAMETER">state_data_type</replaceable>
[ , SSPACE = <replaceable class="PARAMETER">state_data_size</replaceable> ]
[ , FINALFUNC = <replaceable class="PARAMETER">ffunc</replaceable> ]
[ , FINALFUNC_EXTRA ]
[ , INITCOND = <replaceable class="PARAMETER">initial_condition</replaceable> ]
[ , MSFUNC = <replaceable class="PARAMETER">sfunc</replaceable> ]
[ , MINVFUNC = <replaceable class="PARAMETER">invfunc</replaceable> ]
[ , MSTYPE = <replaceable class="PARAMETER">state_data_type</replaceable> ]
[ , MSSPACE = <replaceable class="PARAMETER">state_data_size</replaceable> ]
[ , MFINALFUNC = <replaceable class="PARAMETER">ffunc</replaceable> ]
[ , MFINALFUNC_EXTRA ]
[ , MINITCOND = <replaceable class="PARAMETER">initial_condition</replaceable> ]
[ , SORTOP = <replaceable class="PARAMETER">sort_operator</replaceable> ]
)
@ -166,12 +171,25 @@ CREATE AGGREGATE <replaceable class="PARAMETER">name</replaceable> (
input rows.
</para>
<para>
Sometimes it is useful to declare the final function as taking not just
the state value, but extra parameters corresponding to the aggregate's
input values. The main reason for doing this is if the final function
is polymorphic and the state value's data type would be inadequate to
pin down the result type. These extra parameters are always passed as
NULL (and so the final function must not be strict when
the <literal>FINALFUNC_EXTRA</> option is used), but nonetheless they
are valid parameters. The final function could for example make use
of <function>get_fn_expr_argtype</> to identify the actual argument type
in the current call.
</para>
<para>
An aggregate can optionally support <firstterm>moving-aggregate mode</>,
as described in <xref linkend="xaggr-moving-aggregates">. This requires
specifying the <literal>MSFUNC</>, <literal>MINVFUNC</>,
and <literal>MSTYPE</> parameters, and optionally
the <literal>MSPACE</>, <literal>MFINALFUNC</>,
the <literal>MSPACE</>, <literal>MFINALFUNC</>, <literal>MFINALFUNC_EXTRA</>,
and <literal>MINITCOND</> parameters. Except for <literal>MINVFUNC</>,
these parameters work like the corresponding simple-aggregate parameters
without <literal>M</>; they define a separate implementation of the
@ -361,12 +379,16 @@ SELECT col FROM tab ORDER BY col USING sortop LIMIT 1;
<para>
For ordered-set (including hypothetical-set) aggregates, the
final function receives not only the final state value,
but also the values of all the direct arguments, followed by
null values corresponding to each aggregated argument.
(The reason for including the aggregated arguments in the function
signature is that this may be necessary to allow correct resolution
of the aggregate result type, when a polymorphic aggregate is
being defined.)
but also the values of all the direct arguments.
</para>
<para>
If <literal>FINALFUNC_EXTRA</> is specified, then in addition to the
final state value and any direct arguments, the final function
receives extra NULL values corresponding to the aggregate's regular
(aggregated) arguments. This is mainly useful to allow correct
resolution of the aggregate result type when a polymorphic aggregate
is being defined.
</para>
</listitem>
</varlistentry>
@ -438,9 +460,11 @@ SELECT col FROM tab ORDER BY col USING sortop LIMIT 1;
The name of the final function called to compute the aggregate's
result after all input rows have been traversed, when using
moving-aggregate mode. This works the same as <replaceable>ffunc</>,
except that its input type is <replaceable>mstate_data_type</>.
except that its first argument's type
is <replaceable>mstate_data_type</> and extra dummy arguments are
specified by writing <literal>MFINALFUNC_EXTRA</>.
The aggregate result type determined by <replaceable>mffunc</>
and <replaceable>mstate_data_type</> must match that determined by the
or <replaceable>mstate_data_type</> must match that determined by the
aggregate's regular implementation.
</para>
</listitem>
@ -494,6 +518,13 @@ SELECT col FROM tab ORDER BY col USING sortop LIMIT 1;
<refsect1>
<title>Notes</title>
<para>
In parameters that specify support function names, you can write
a schema name if needed, for example <literal>SFUNC = public.sum</>.
Do not write argument types there, however &mdash; the argument types
of the support functions are determined from other parameters.
</para>
<para>
If an aggregate supports moving-aggregate mode, it will improve
calculation efficiency when the aggregate is used as a window function

View File

@ -127,7 +127,7 @@ CREATE AGGREGATE avg (float8)
<function>float8_accum</> requires a three-element array, not just
two elements, because it accumulates the sum of squares as well as
the sum and count of the inputs. This is so that it can be used for
some other aggregates besides <function>avg</>.
some other aggregates as well as <function>avg</>.
</para>
</note>
@ -182,7 +182,7 @@ CREATE AGGREGATE avg (float8)
The inverse transition function is passed the current state value and the
aggregate input value(s) for the earliest row included in the current
state. It must reconstruct what the state value would have been if the
given input value had never been aggregated, but only the rows following
given input row had never been aggregated, but only the rows following
it. This sometimes requires that the forward transition function keep
more state than is needed for plain aggregation mode. Therefore, the
moving-aggregate mode uses a completely separate implementation from the
@ -339,6 +339,47 @@ SELECT attrelid::regclass, array_accum(atttypid::regtype)
</programlisting>
</para>
<para>
Ordinarily, an aggregate function with a polymorphic result type has a
polymorphic state type, as in the above example. This is necessary
because otherwise the final function cannot be declared sensibly: it
would need to have a polymorphic result type but no polymorphic argument
type, which <command>CREATE FUNCTION</> will reject on the grounds that
the result type cannot be deduced from a call. But sometimes it is
inconvenient to use a polymorphic state type. The most common case is
where the aggregate support functions are to be written in C and the
state type should be declared as <type>internal</> because there is
no SQL-level equivalent for it. To address this case, it is possible to
declare the final function as taking extra <quote>dummy</> arguments
that match the input arguments of the aggregate. Such dummy arguments
are always passed as NULLs since no specific value is available when the
final function is called. Their only use is to allow a polymorphic
final function's result type to be connected to the aggregate's input
type(s). For example, the definition of the built-in
aggregate <function>array_agg</> is equivalent to
<programlisting>
CREATE FUNCTION array_agg_transfn(internal, anyelement)
RETURNS internal ...;
CREATE FUNCTION array_agg_finalfn(internal, anyelement)
RETURNS anyarray ...;
CREATE AGGREGATE array_agg (anyelement)
(
sfunc = array_agg_transfn,
stype = internal,
finalfunc = array_agg_finalfn,
finalfunc_extra
);
</programlisting>
Here, the <literal>finalfunc_extra</> option specifies that the final
function receives, in addition to the state value, extra dummy
argument(s) corresponding to the aggregate's input argument(s).
The extra <type>anyelement</> argument allows the declaration
of <function>array_agg_finalfn</> to be valid.
</para>
<para>
An aggregate function can be made to accept a varying number of arguments
by declaring its last argument as a <literal>VARIADIC</> array, in much
@ -401,15 +442,23 @@ SELECT myaggregate(a, b, c ORDER BY a) FROM ...
definition of <function>percentile_disc</> is equivalent to:
<programlisting>
CREATE FUNCTION ordered_set_transition(internal, anyelement)
RETURNS internal ...;
CREATE FUNCTION percentile_disc_final(internal, float8, anyelement)
RETURNS anyelement ...;
CREATE AGGREGATE percentile_disc (float8 ORDER BY anyelement)
(
sfunc = ordered_set_transition,
stype = internal,
finalfunc = percentile_disc_final
finalfunc = percentile_disc_final,
finalfunc_extra
);
</programlisting>
which could be used to obtain a median household income like this:
This aggregate takes a <type>float8</> direct argument (the percentile
fraction) and an aggregated input that can be of any sortable data type.
It could be used to obtain a median household income like this:
<programlisting>
SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY income) FROM households;
@ -447,25 +496,12 @@ SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY income) FROM households;
same definition as for normal aggregates, but note that the direct
arguments (if any) are not provided. The final function receives
the last state value, the values of the direct arguments if any,
and null values corresponding to the aggregated input(s). While the
null values seem useless at first sight, they are important because
they make it possible to include the data types of the aggregated
input(s) in the final function's signature, which may be necessary
to resolve the output type of a polymorphic aggregate. For example,
the built-in <function>mode()</> ordered-set aggregate takes a
single aggregated column of any sortable data type and returns a
value of that same type. This is possible because the final function
is declared as <literal>mode_final(internal, anyelement) returns
anyelement</>, with the <type>anyelement</> parameter corresponding
to the dummy null argument that represents the aggregated column.
The actual data is conveyed in the <type>internal</>-type state
value, but type resolution needs a parse-time indication of what the
result data type will be, and the dummy argument provides that.
In the example of <function>percentile_disc</>, the support functions
are respectively declared as
<literal>ordered_set_transition(internal, "any") returns internal</>
and <literal>percentile_disc_final(internal, float8, anyelement)
returns anyelement</>.
and (if <literal>finalfunc_extra</> is specified) NULL values
corresponding to the aggregated input(s). As with normal
aggregates, <literal>finalfunc_extra</> is only really useful if the
aggregate is polymorphic; then the extra dummy argument(s) are needed
to connect the final function's result type to the aggregate's input
type(s).
</para>
<para>