1
0
mirror of https://github.com/postgres/postgres.git synced 2025-07-30 11:03:19 +03:00

SEARCH and CYCLE clauses

This adds the SQL standard feature that adds the SEARCH and CYCLE
clauses to recursive queries to be able to do produce breadth- or
depth-first search orders and detect cycles.  These clauses can be
rewritten into queries using existing syntax, and that is what this
patch does in the rewriter.

Reviewed-by: Vik Fearing <vik@postgresfriends.org>
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/db80ceee-6f97-9b4a-8ee8-3ba0c58e5be2@2ndquadrant.com
This commit is contained in:
Peter Eisentraut
2021-02-01 13:54:59 +01:00
parent bb513b364b
commit 3696a600e2
28 changed files with 2301 additions and 33 deletions

View File

@ -2218,6 +2218,39 @@ SELECT * FROM search_tree <emphasis>ORDER BY depth</emphasis>;
in any case.
</para>
</tip>
<para>
There is built-in syntax to compute a depth- or breadth-first sort column.
For example:
<programlisting>
WITH RECURSIVE search_tree(id, link, data) AS (
SELECT t.id, t.link, t.data
FROM tree t
UNION ALL
SELECT t.id, t.link, t.data
FROM tree t, search_tree st
WHERE t.id = st.link
) <emphasis>SEARCH DEPTH FIRST BY id SET ordercol</emphasis>
SELECT * FROM search_tree ORDER BY ordercol;
WITH RECURSIVE search_tree(id, link, data) AS (
SELECT t.id, t.link, t.data
FROM tree t
UNION ALL
SELECT t.id, t.link, t.data
FROM tree t, search_tree st
WHERE t.id = st.link
) <emphasis>SEARCH BREADTH FIRST BY id SET ordercol</emphasis>
SELECT * FROM search_tree ORDER BY ordercol;
</programlisting>
This syntax is internally expanded to something similar to the above
hand-written forms. The <literal>SEARCH</literal> clause specifies whether
depth- or breadth first search is wanted, the list of columns to track for
sorting, and a column name that will contain the result data that can be
used for sorting. That column will implicitly be added to the output rows
of the CTE.
</para>
</sect3>
<sect3 id="queries-with-cycle">
@ -2305,10 +2338,39 @@ SELECT * FROM search_graph;
</para>
</tip>
<para>
There is built-in syntax to simplify cycle detection. The above query can
also be written like this:
<programlisting>
WITH RECURSIVE search_graph(id, link, data, depth) AS (
SELECT g.id, g.link, g.data, 1
FROM graph g
UNION ALL
SELECT g.id, g.link, g.data, sg.depth + 1
FROM graph g, search_graph sg
WHERE g.id = sg.link
) <emphasis>CYCLE id SET is_cycle TO true DEFAULT false USING path</emphasis>
SELECT * FROM search_graph;
</programlisting>
and it will be internally rewritten to the above form. The
<literal>CYCLE</literal> clause specifies first the list of columns to
track for cycle detection, then a column name that will show whether a
cycle has been detected, then two values to use in that column for the yes
and no cases, and finally the name of another column that will track the
path. The cycle and path columns will implicitly be added to the output
rows of the CTE.
</para>
<tip>
<para>
The cycle path column is computed in the same way as the depth-first
ordering column show in the previous section.
ordering column show in the previous section. A query can have both a
<literal>SEARCH</literal> and a <literal>CYCLE</literal> clause, but a
depth-first search specification and a cycle detection specification would
create redundant computations, so it's more efficient to just use the
<literal>CYCLE</literal> clause and order by the path column. If
breadth-first ordering is wanted, then specifying both
<literal>SEARCH</literal> and <literal>CYCLE</literal> can be useful.
</para>
</tip>

View File

@ -73,6 +73,8 @@ SELECT [ ALL | DISTINCT [ ON ( <replaceable class="parameter">expression</replac
<phrase>and <replaceable class="parameter">with_query</replaceable> is:</phrase>
<replaceable class="parameter">with_query_name</replaceable> [ ( <replaceable class="parameter">column_name</replaceable> [, ...] ) ] AS [ [ NOT ] MATERIALIZED ] ( <replaceable class="parameter">select</replaceable> | <replaceable class="parameter">values</replaceable> | <replaceable class="parameter">insert</replaceable> | <replaceable class="parameter">update</replaceable> | <replaceable class="parameter">delete</replaceable> )
[ SEARCH { BREADTH | DEPTH } FIRST BY <replaceable>column_name</replaceable> [, ...] SET <replaceable>search_seq_col_name</replaceable> ]
[ CYCLE <replaceable>column_name</replaceable> [, ...] SET <replaceable>cycle_mark_col_name</replaceable> TO <replaceable>cycle_mark_value</replaceable> DEFAULT <replaceable>cycle_mark_default</replaceable> USING <replaceable>cycle_path_col_name</replaceable> ]
TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ]
</synopsis>
@ -276,6 +278,48 @@ TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ]
queries that do not use recursion or forward references.
</para>
<para>
The optional <literal>SEARCH</literal> clause computes a <firstterm>search
sequence column</firstterm> that can be used for ordering the results of a
recursive query in either breadth-first or depth-first order. The
supplied column name list specifies the row key that is to be used for
keeping track of visited rows. A column named
<replaceable>search_seq_col_name</replaceable> will be added to the result
column list of the <literal>WITH</literal> query. This column can be
ordered by in the outer query to achieve the respective ordering. See
<xref linkend="queries-with-search"/> for examples.
</para>
<para>
The optional <literal>CYCLE</literal> clause is used to detect cycles in
recursive queries. The supplied column name list specifies the row key
that is to be used for keeping track of visited rows. A column named
<replaceable>cycle_mark_col_name</replaceable> will be added to the result
column list of the <literal>WITH</literal> query. This column will be set
to <replaceable>cycle_mark_value</replaceable> when a cycle has been
detected, else to <replaceable>cycle_mark_default</replaceable>.
Furthermore, processing of the recursive union will stop when a cycle has
been detected. <replaceable>cycle_mark_value</replaceable> and
<replaceable>cycle_mark_default</replaceable> must be constants and they
must be coercible to a common data type, and the data type must have an
inequality operator. (The SQL standard requires that they be character
strings, but PostgreSQL does not require that.) Furthermore, a column
named <replaceable>cycle_path_col_name</replaceable> will be added to the
result column list of the <literal>WITH</literal> query. This column is
used internally for tracking visited rows. See <xref
linkend="queries-with-cycle"/> for examples.
</para>
<para>
Both the <literal>SEARCH</literal> and the <literal>CYCLE</literal> clause
are only valid for recursive <literal>WITH</literal> queries. The
<replaceable>with_query</replaceable> must be a <literal>UNION</literal>
(or <literal>UNION ALL</literal>) of two <literal>SELECT</literal> (or
equivalent) commands (no nested <literal>UNION</literal>s). If both
clauses are used, the column added by the <literal>SEARCH</literal> clause
appears before the columns added by the <literal>CYCLE</literal> clause.
</para>
<para>
The primary query and the <literal>WITH</literal> queries are all
(notionally) executed at the same time. This implies that the effects of