1
0
mirror of https://github.com/postgres/postgres.git synced 2025-06-16 06:01:02 +03:00

Allow aggregates to provide estimates of their transition state data size.

Formerly the planner had a hard-wired rule of thumb for guessing the amount
of space consumed by an aggregate function's transition state data.  This
estimate is critical to deciding whether it's OK to use hash aggregation,
and in many situations the built-in estimate isn't very good.  This patch
adds a column to pg_aggregate wherein a per-aggregate estimate can be
provided, overriding the planner's default, and infrastructure for setting
the column via CREATE AGGREGATE.

It may be that additional smarts will be required in future, perhaps even
a per-aggregate estimation function.  But this is already a step forward.

This is extracted from a larger patch to improve the performance of numeric
and int8 aggregates.  I (tgl) thought it was worth reviewing and committing
this infrastructure separately.  In this commit, all built-in aggregates
are given aggtransspace = 0, so no behavior should change.

Hadi Moshayedi, reviewed by Pavel Stehule and Tomas Vondra
This commit is contained in:
Tom Lane
2013-11-16 16:03:40 -05:00
parent 55c3d86a2a
commit 6cb86143e8
14 changed files with 245 additions and 147 deletions

View File

@ -372,6 +372,13 @@
<entry><literal><link linkend="catalog-pg-type"><structname>pg_type</structname></link>.oid</literal></entry>
<entry>Data type of the aggregate function's internal transition (state) data</entry>
</row>
<row>
<entry><structfield>aggtransspace</structfield></entry>
<entry><type>int4</type></entry>
<entry></entry>
<entry>Approximate average size (in bytes) of the transition state
data, or zero to use a default estimate</entry>
</row>
<row>
<entry><structfield>agginitval</structfield></entry>
<entry><type>text</type></entry>

View File

@ -24,6 +24,7 @@ PostgreSQL documentation
CREATE AGGREGATE <replaceable class="parameter">name</replaceable> ( [ <replaceable class="parameter">argmode</replaceable> ] [ <replaceable class="parameter">arg_name</replaceable> ] <replaceable class="parameter">arg_data_type</replaceable> [ , ... ] ) (
SFUNC = <replaceable class="PARAMETER">sfunc</replaceable>,
STYPE = <replaceable class="PARAMETER">state_data_type</replaceable>
[ , SSPACE = <replaceable class="PARAMETER">state_data_size</replaceable> ]
[ , FINALFUNC = <replaceable class="PARAMETER">ffunc</replaceable> ]
[ , INITCOND = <replaceable class="PARAMETER">initial_condition</replaceable> ]
[ , SORTOP = <replaceable class="PARAMETER">sort_operator</replaceable> ]
@ -35,6 +36,7 @@ CREATE AGGREGATE <replaceable class="PARAMETER">name</replaceable> (
BASETYPE = <replaceable class="PARAMETER">base_type</replaceable>,
SFUNC = <replaceable class="PARAMETER">sfunc</replaceable>,
STYPE = <replaceable class="PARAMETER">state_data_type</replaceable>
[ , SSPACE = <replaceable class="PARAMETER">state_data_size</replaceable> ]
[ , FINALFUNC = <replaceable class="PARAMETER">ffunc</replaceable> ]
[ , INITCOND = <replaceable class="PARAMETER">initial_condition</replaceable> ]
[ , SORTOP = <replaceable class="PARAMETER">sort_operator</replaceable> ]
@ -264,6 +266,22 @@ SELECT col FROM tab ORDER BY col USING sortop LIMIT 1;
</listitem>
</varlistentry>
<varlistentry>
<term><replaceable class="PARAMETER">state_data_size</replaceable></term>
<listitem>
<para>
The approximate average size (in bytes) of the aggregate's state value.
If this parameter is omitted or is zero, a default estimate is used
based on the <replaceable>state_data_type</>.
The planner uses this value to estimate the memory required for a
grouped aggregate query. The planner will consider using hash
aggregation for such a query only if the hash table is estimated to fit
in <xref linkend="guc-work-mem">; therefore, large values of this
parameter discourage use of hash aggregation.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><replaceable class="PARAMETER">ffunc</replaceable></term>
<listitem>