1
0
mirror of https://github.com/postgres/postgres.git synced 2025-07-28 23:42:10 +03:00

Extend CTE patch to support recursive UNION (ie, without ALL). The

implementation uses an in-memory hash table, so it will poop out for very
large recursive results ... but the performance characteristics of a
sort-based implementation would be pretty unpleasant too.
This commit is contained in:
Tom Lane
2008-10-07 19:27:04 +00:00
parent 059349be0c
commit 0d115dde82
13 changed files with 345 additions and 74 deletions

View File

@ -1,4 +1,4 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/queries.sgml,v 1.46 2008/10/04 21:56:52 tgl Exp $ -->
<!-- $PostgreSQL: pgsql/doc/src/sgml/queries.sgml,v 1.47 2008/10/07 19:27:03 tgl Exp $ -->
<chapter id="queries">
<title>Queries</title>
@ -1519,7 +1519,8 @@ SELECT sum(n) FROM t;
</programlisting>
The general form of a recursive <literal>WITH</> query is always a
<firstterm>non-recursive term</>, then <literal>UNION ALL</>, then a
<firstterm>non-recursive term</>, then <literal>UNION</> (or
<literal>UNION ALL</>), then a
<firstterm>recursive term</>, where only the recursive term can contain
a reference to the query's own output. Such a query is executed as
follows:
@ -1530,9 +1531,10 @@ SELECT sum(n) FROM t;
<step performance="required">
<para>
Evaluate the non-recursive term. Include all its output rows in the
result of the recursive query, and also place them in a temporary
<firstterm>working table</>.
Evaluate the non-recursive term. For <literal>UNION</> (but not
<literal>UNION ALL</>), discard duplicate rows. Include all remaining
rows in the result of the recursive query, and also place them in a
temporary <firstterm>working table</>.
</para>
</step>
@ -1544,9 +1546,11 @@ SELECT sum(n) FROM t;
<step performance="required">
<para>
Evaluate the recursive term, substituting the current contents of
the working table for the recursive self-reference. Include all its
output rows in the result of the recursive query, and also place them
in a temporary <firstterm>intermediate table</>.
the working table for the recursive self-reference.
For <literal>UNION</> (but not <literal>UNION ALL</>), discard
duplicate rows and rows that duplicate any previous result row.
Include all remaining rows in the result of the recursive query, and
also place them in a temporary <firstterm>intermediate table</>.
</para>
</step>
@ -1598,10 +1602,13 @@ GROUP BY sub_part
<para>
When working with recursive queries it is important to be sure that
the recursive part of the query will eventually return no tuples,
or else the query will loop indefinitely. A useful trick for
development purposes is to place a <literal>LIMIT</> in the parent
query. For example, this query would loop forever without the
<literal>LIMIT</>:
or else the query will loop indefinitely. Sometimes, using
<literal>UNION</> instead of <literal>UNION ALL</> can accomplish this
by discarding rows that duplicate previous output rows; this catches
cycles that would otherwise repeat. A useful trick for testing queries
when you are not certain if they might loop is to place a <literal>LIMIT</>
in the parent query. For example, this query would loop forever without
the <literal>LIMIT</>:
<programlisting>
WITH RECURSIVE t(n) AS (
@ -1614,7 +1621,7 @@ SELECT n FROM t LIMIT 100;
This works because <productname>PostgreSQL</productname>'s implementation
evaluates only as many rows of a <literal>WITH</> query as are actually
demanded by the parent query. Using this trick in production is not
fetched by the parent query. Using this trick in production is not
recommended, because other systems might work differently.
</para>

View File

@ -1,5 +1,5 @@
<!--
$PostgreSQL: pgsql/doc/src/sgml/ref/select.sgml,v 1.105 2008/10/04 21:56:52 tgl Exp $
$PostgreSQL: pgsql/doc/src/sgml/ref/select.sgml,v 1.106 2008/10/07 19:27:04 tgl Exp $
PostgreSQL documentation
-->
@ -202,10 +202,10 @@ and <replaceable class="parameter">with_query</replaceable> is:
subquery to reference itself by name. Such a subquery must have
the form
<synopsis>
<replaceable class="parameter">non_recursive_term</replaceable> UNION ALL <replaceable class="parameter">recursive_term</replaceable>
<replaceable class="parameter">non_recursive_term</replaceable> UNION [ ALL ] <replaceable class="parameter">recursive_term</replaceable>
</synopsis>
where the recursive self-reference must appear on the right-hand
side of <literal>UNION ALL</>. Only one recursive self-reference
side of the <literal>UNION</>. Only one recursive self-reference
is permitted per query.
</para>
@ -1234,7 +1234,7 @@ SELECT distance, employee_name FROM employee_recursive;
</programlisting>
Notice the typical form of recursive queries:
an initial condition, followed by <literal>UNION ALL</literal>,
an initial condition, followed by <literal>UNION</literal>,
followed by the recursive part of the query. Be sure that the
recursive part of the query will eventually return no tuples, or
else the query will loop indefinitely. (See <xref linkend="queries-with">