mirror of https://github.com/postgres/postgres.git synced 2025-07-28 23:42:10 +03:00

Explicitly track whether aggregate final functions modify transition state.

Up to now, there have been hard-wired assumptions that normal aggregates'
final functions never modify their transition states, while ordered-set
aggregates' final functions always do.  This has always been a bit
limiting, and in particular it's getting in the way of improving the
built-in ordered-set aggregates to allow merging of transition states.
Therefore, let's introduce catalog and CREATE AGGREGATE infrastructure
that lets the finalfn's behavior be declared explicitly.

There are now three possibilities for the finalfn behavior: it's purely
read-only, it trashes the transition state irrecoverably, or it changes
the state in such a way that no more transfn calls are possible but the
state can still be passed to other, compatible finalfns.  There are no
examples of this third case today, but we'll shortly make the built-in
ordered-set aggregates (OSAs) act like that.

This change allows user-defined aggregates to explicitly disclaim support
for use as window functions, and/or to prevent transition state merging,
if their implementations cannot handle that.  While it was previously
possible to handle the window case with a run-time error check, there was
not any way to prevent transition state merging, which in retrospect is
something commit 804163bc2 should have provided for.  But better late
than never.

In passing, split out pg_aggregate.c's extern function declarations into
a new header file pg_aggregate_fn.h, similarly to what we've done for
some other catalog headers, so that pg_aggregate.h itself can be safe
for frontend files to include.  This lets pg_dump use the symbolic
names for relevant constants.

Discussion: https://postgr.es/m/4834.1507849699@sss.pgh.pa.us
Tom Lane
2017-10-14 15:21:39 -04:00
parent 5f340cb30c
commit 4de2d4fba3
16 changed files with 555 additions and 251 deletions


@ -486,6 +486,26 @@
<entry></entry>
<entry>True to pass extra dummy arguments to <structfield>aggmfinalfn</structfield></entry>
</row>
<row>
<entry><structfield>aggfinalmodify</structfield></entry>
<entry><type>char</type></entry>
<entry></entry>
<entry>Whether <structfield>aggfinalfn</structfield> modifies the
transition state value:
<literal>r</literal> if it is read-only,
<literal>s</literal> if the <structfield>aggtransfn</structfield>
cannot be applied after the <structfield>aggfinalfn</structfield>, or
<literal>w</literal> if it writes on the value
</entry>
</row>
<row>
<entry><structfield>aggmfinalmodify</structfield></entry>
<entry><type>char</type></entry>
<entry></entry>
<entry>Like <structfield>aggfinalmodify</structfield>, but for
the <structfield>aggmfinalfn</structfield>
</entry>
</row>
<row>
<entry><structfield>aggsortop</structfield></entry>
<entry><type>oid</type></entry>


@ -27,6 +27,7 @@ CREATE AGGREGATE <replaceable class="parameter">name</replaceable> ( [ <replacea
[ , SSPACE = <replaceable class="parameter">state_data_size</replaceable> ]
[ , FINALFUNC = <replaceable class="parameter">ffunc</replaceable> ]
[ , FINALFUNC_EXTRA ]
[ , FINALFUNC_MODIFY = { READ_ONLY | SHARABLE | READ_WRITE } ]
[ , COMBINEFUNC = <replaceable class="parameter">combinefunc</replaceable> ]
[ , SERIALFUNC = <replaceable class="parameter">serialfunc</replaceable> ]
[ , DESERIALFUNC = <replaceable class="parameter">deserialfunc</replaceable> ]
@ -37,6 +38,7 @@ CREATE AGGREGATE <replaceable class="parameter">name</replaceable> ( [ <replacea
[ , MSSPACE = <replaceable class="parameter">mstate_data_size</replaceable> ]
[ , MFINALFUNC = <replaceable class="parameter">mffunc</replaceable> ]
[ , MFINALFUNC_EXTRA ]
[ , MFINALFUNC_MODIFY = { READ_ONLY | SHARABLE | READ_WRITE } ]
[ , MINITCOND = <replaceable class="parameter">minitial_condition</replaceable> ]
[ , SORTOP = <replaceable class="parameter">sort_operator</replaceable> ]
[ , PARALLEL = { SAFE | RESTRICTED | UNSAFE } ]
@ -49,6 +51,7 @@ CREATE AGGREGATE <replaceable class="parameter">name</replaceable> ( [ [ <replac
[ , SSPACE = <replaceable class="parameter">state_data_size</replaceable> ]
[ , FINALFUNC = <replaceable class="parameter">ffunc</replaceable> ]
[ , FINALFUNC_EXTRA ]
[ , FINALFUNC_MODIFY = { READ_ONLY | SHARABLE | READ_WRITE } ]
[ , INITCOND = <replaceable class="parameter">initial_condition</replaceable> ]
[ , PARALLEL = { SAFE | RESTRICTED | UNSAFE } ]
[ , HYPOTHETICAL ]
@ -63,6 +66,7 @@ CREATE AGGREGATE <replaceable class="parameter">name</replaceable> (
[ , SSPACE = <replaceable class="parameter">state_data_size</replaceable> ]
[ , FINALFUNC = <replaceable class="parameter">ffunc</replaceable> ]
[ , FINALFUNC_EXTRA ]
[ , FINALFUNC_MODIFY = { READ_ONLY | SHARABLE | READ_WRITE } ]
[ , COMBINEFUNC = <replaceable class="parameter">combinefunc</replaceable> ]
[ , SERIALFUNC = <replaceable class="parameter">serialfunc</replaceable> ]
[ , DESERIALFUNC = <replaceable class="parameter">deserialfunc</replaceable> ]
@ -73,6 +77,7 @@ CREATE AGGREGATE <replaceable class="parameter">name</replaceable> (
[ , MSSPACE = <replaceable class="parameter">mstate_data_size</replaceable> ]
[ , MFINALFUNC = <replaceable class="parameter">mffunc</replaceable> ]
[ , MFINALFUNC_EXTRA ]
[ , MFINALFUNC_MODIFY = { READ_ONLY | SHARABLE | READ_WRITE } ]
[ , MINITCOND = <replaceable class="parameter">minitial_condition</replaceable> ]
[ , SORTOP = <replaceable class="parameter">sort_operator</replaceable> ]
)
@ -197,7 +202,8 @@ CREATE AGGREGATE <replaceable class="parameter">name</replaceable> (
as described in <xref linkend="xaggr-moving-aggregates">. This requires
specifying the <literal>MSFUNC</>, <literal>MINVFUNC</>,
and <literal>MSTYPE</> parameters, and optionally
the <literal>MSSPACE</>, <literal>MFINALFUNC</>, <literal>MFINALFUNC_EXTRA</>,
the <literal>MSSPACE</>, <literal>MFINALFUNC</>,
<literal>MFINALFUNC_EXTRA</>, <literal>MFINALFUNC_MODIFY</>,
and <literal>MINITCOND</> parameters. Except for <literal>MINVFUNC</>,
these parameters work like the corresponding simple-aggregate parameters
without <literal>M</>; they define a separate implementation of the
@ -412,6 +418,21 @@ SELECT col FROM tab ORDER BY col USING sortop LIMIT 1;
</listitem>
</varlistentry>
<varlistentry>
<term><literal>FINALFUNC_MODIFY</> = { <literal>READ_ONLY</> | <literal>SHARABLE</> | <literal>READ_WRITE</> }</term>
<listitem>
<para>
This option specifies whether the final function is a pure function
that does not modify its arguments. <literal>READ_ONLY</> indicates
that it does not modify them; the other two values indicate that it may change the
transition state value. See <xref linkend="sql-createaggregate-notes"
endterm="sql-createaggregate-notes-title"> below for more detail. The
default is <literal>READ_ONLY</>, except for ordered-set aggregates,
for which the default is <literal>READ_WRITE</>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><replaceable class="parameter">combinefunc</replaceable></term>
<listitem>
@ -563,6 +584,16 @@ SELECT col FROM tab ORDER BY col USING sortop LIMIT 1;
</listitem>
</varlistentry>
<varlistentry>
<term><literal>MFINALFUNC_MODIFY</> = { <literal>READ_ONLY</> | <literal>SHARABLE</> | <literal>READ_WRITE</> }</term>
<listitem>
<para>
This option is like <literal>FINALFUNC_MODIFY</>, but it describes
the behavior of the moving-aggregate final function.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><replaceable class="parameter">minitial_condition</replaceable></term>
<listitem>
@ -587,12 +618,12 @@ SELECT col FROM tab ORDER BY col USING sortop LIMIT 1;
</varlistentry>
<varlistentry>
<term><literal>PARALLEL</literal></term>
<term><literal>PARALLEL =</> { <literal>SAFE</> | <literal>RESTRICTED</> | <literal>UNSAFE</> }</term>
<listitem>
<para>
The meanings of <literal>PARALLEL SAFE</>, <literal>PARALLEL
RESTRICTED</>, and <literal>PARALLEL UNSAFE</> are the same as
for <xref linkend="sql-createfunction">. An aggregate will not be
in <xref linkend="sql-createfunction">. An aggregate will not be
considered for parallelization if it is marked <literal>PARALLEL
UNSAFE</> (which is the default!) or <literal>PARALLEL RESTRICTED</>.
Note that the parallel-safety markings of the aggregate's support
@ -624,8 +655,8 @@ SELECT col FROM tab ORDER BY col USING sortop LIMIT 1;
</para>
</refsect1>
<refsect1>
<title>Notes</title>
<refsect1 id="sql-createaggregate-notes">
<title id="sql-createaggregate-notes-title">Notes</title>
<para>
In parameters that specify support function names, you can write
@ -634,6 +665,34 @@ SELECT col FROM tab ORDER BY col USING sortop LIMIT 1;
of the support functions are determined from other parameters.
</para>
<para>
Ordinarily, Postgres functions are expected to be true functions that
do not modify their input values. However, an aggregate transition
function, <emphasis>when used in the context of an aggregate</>,
is allowed to cheat and modify its transition-state argument in place.
This can provide substantial performance benefits compared to making
a fresh copy of the transition state each time.
</para>
<para>
Likewise, while an aggregate final function is normally expected not to
modify its input values, sometimes it is impractical to avoid modifying
the transition-state argument. Such behavior must be declared using
the <literal>FINALFUNC_MODIFY</> parameter. The <literal>READ_WRITE</>
value indicates that the final function modifies the transition state in
unspecified ways. This value prevents use of the aggregate as a window
function, and it also prevents merging of transition states for aggregate
calls that share the same input values and transition functions.
The <literal>SHARABLE</> value indicates that the transition function
cannot be applied after the final function, but multiple final-function
calls can be performed on the ending transition state value. This value
prevents use of the aggregate as a window function, but it allows merging
of transition states. (That is, the optimization of interest here is not
applying the same final function repeatedly, but applying different final
functions to the same ending transition state value. This is allowed as
long as none of the final functions are marked <literal>READ_WRITE</>.)
</para>
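<para>
The rules in the preceding paragraph can be summarized in a small,
self-contained C sketch. It is an illustration only, not
<productname>PostgreSQL</productname> source code: the enum and function
names are hypothetical, and only the letters <literal>r</>,
<literal>s</>, and <literal>w</> are taken from the catalog codes
documented for <structfield>aggfinalmodify</structfield>.
</para>

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical names for the three FINALFUNC_MODIFY settings.  The
 * character values mirror the documented aggfinalmodify codes.
 */
typedef enum FinalModify
{
    FINALMODIFY_READ_ONLY = 'r',   /* final function leaves the state untouched */
    FINALMODIFY_SHARABLE = 's',    /* no more transition calls, but other
                                    * final functions may still read the state */
    FINALMODIFY_READ_WRITE = 'w'   /* state is modified in unspecified ways */
} FinalModify;

/* Reject anything that is not one of the three documented codes. */
static bool
final_modify_is_valid(char code)
{
    return code == 'r' || code == 's' || code == 'w';
}

/*
 * Window-function use keeps feeding rows to the transition function after
 * a result has been produced, so only a read-only final function qualifies.
 */
static bool
usable_as_window_function(FinalModify mode)
{
    assert(final_modify_is_valid((char) mode));
    return mode == FINALMODIFY_READ_ONLY;
}

/*
 * Two aggregate calls with the same inputs and transition function may
 * share one transition state as long as neither final function is
 * READ_WRITE.
 */
static bool
can_merge_transition_states(FinalModify a, FinalModify b)
{
    assert(final_modify_is_valid((char) a));
    assert(final_modify_is_valid((char) b));
    return a != FINALMODIFY_READ_WRITE && b != FINALMODIFY_READ_WRITE;
}
```

<para>
Under these assumptions, a <literal>SHARABLE</> aggregate can share its
state with a <literal>READ_ONLY</> or another <literal>SHARABLE</> one,
but it still cannot be used as a window function.
</para>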
<para>
If an aggregate supports moving-aggregate mode, it will improve
calculation efficiency when the aggregate is used as a window function
@ -671,7 +730,8 @@ SELECT col FROM tab ORDER BY col USING sortop LIMIT 1;
Note that whether or not the aggregate supports moving-aggregate
mode, <productname>PostgreSQL</productname> can handle a moving frame
end without recalculation; this is done by continuing to add new values
to the aggregate's state. It is assumed that the final function does
to the aggregate's state. This is why use of an aggregate as a window
function requires that the final function be read-only: it must
not damage the aggregate's state value, so that the aggregation can be
continued even after an aggregate result value has been obtained for
one set of frame boundaries.


@ -487,6 +487,13 @@ SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY income) FROM households;
C, since their state values aren't definable as any SQL data type.
(In the above example, notice that the state value is declared as
type <type>internal</> &mdash; this is typical.)
Also, because the final function performs the sort, it is not possible
to continue adding input rows by executing the transition function again
later. This means the final function is not <literal>READ_ONLY</>;
it must be declared in <xref linkend="sql-createaggregate">
as <literal>READ_WRITE</>, or as <literal>SHARABLE</> if it's
possible for additional final-function calls to make use of the
already-sorted state.
</para>
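<para>
The following self-contained C sketch models what <literal>SHARABLE</>
behavior could look like. It is a toy model under stated assumptions, not
the implementation of the built-in ordered-set aggregates: the
<literal>OsaState</> struct, the fixed-size array, and all function names
are hypothetical. The first final-function call sorts the collected values;
after that, no more rows can be added, but further final functions can
reuse the sorted state.
</para>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define OSA_MAX_ROWS 64             /* arbitrary limit for this sketch */

/* Hypothetical transition state: collected inputs plus a "sorted" flag. */
typedef struct OsaState
{
    double      values[OSA_MAX_ROWS];
    size_t      count;
    bool        sorted;             /* set by the first final-function call */
} OsaState;

static int
osa_cmp_double(const void *a, const void *b)
{
    double      x = *(const double *) a;
    double      y = *(const double *) b;

    return (x > y) - (x < y);
}

/*
 * Transition step: append one input value.  Returns false if the state
 * has already been sorted by a final function, or if it is full.
 */
static bool
osa_add(OsaState *st, double v)
{
    if (st->sorted || st->count >= OSA_MAX_ROWS)
        return false;
    st->values[st->count++] = v;
    return true;
}

/* Sort at most once; later final-function calls reuse the sorted array. */
static void
osa_sort_once(OsaState *st)
{
    if (!st->sorted)
    {
        qsort(st->values, st->count, sizeof(double), osa_cmp_double);
        st->sorted = true;
    }
}

/* Two "compatible" final functions that share the same ending state. */
static double
osa_min(OsaState *st)
{
    assert(st->count > 0);
    osa_sort_once(st);
    return st->values[0];
}

static double
osa_max(OsaState *st)
{
    assert(st->count > 0);
    osa_sort_once(st);
    return st->values[st->count - 1];
}
```

<para>
In this model, calling <literal>osa_min</> and then <literal>osa_max</>
on the same state is fine, while a later <literal>osa_add</> is refused.
</para>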
<para>
@ -622,16 +629,15 @@ SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY income) FROM households;
<programlisting>
if (AggCheckCallContext(fcinfo, NULL))
</programlisting>
One reason for checking this is that when it is true for a transition
function, the first input
One reason for checking this is that when it is true, the first input
must be a temporary state value and can therefore safely be modified
in-place rather than allocating a new copy.
See <function>int8inc()</> for an example.
(This is the <emphasis>only</>
case where it is safe for a function to modify a pass-by-reference input.
In particular, final functions for normal aggregates must not
modify their inputs in any case, because in some cases they will be
re-executed on the same final state value.)
(While aggregate transition functions are always allowed to modify
the transition value in-place, aggregate final functions are generally
discouraged from doing so; if they do so, the behavior must be declared
when creating the aggregate. See <xref linkend="sql-createaggregate">
for more detail.)
</para>
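<para>
A minimal, self-contained C sketch of the in-place pattern follows. It is
not the code of <function>int8inc()</>: the function name is hypothetical,
the boolean parameter stands in for the result of
<literal>AggCheckCallContext(fcinfo, NULL)</>, and
<function>malloc</> stands in for the server's memory allocator.
</para>

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical counting transition function.
 *
 * called_as_aggregate: assumed to carry the answer that a real function
 * would obtain from AggCheckCallContext(fcinfo, NULL).
 *
 * When called as part of an aggregate, the state is a temporary value
 * and is updated in place.  Otherwise the input must not be touched, so
 * a fresh copy is allocated (the caller frees it).  Returns NULL if the
 * allocation fails.
 */
static int64_t *
count_transfn(int64_t *state, bool called_as_aggregate)
{
    if (called_as_aggregate)
    {
        (*state)++;                 /* safe: nobody else sees this value */
        return state;
    }
    else
    {
        int64_t    *result = malloc(sizeof(int64_t));

        if (result == NULL)
            return NULL;
        *result = *state + 1;       /* input left unmodified */
        return result;
    }
}
```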
<para>