1
0
mirror of https://github.com/postgres/postgres.git synced 2025-06-25 01:02:05 +03:00

Implement comparison of generic records (composite types), and invent a

pseudo-type record[] to represent arrays of possibly-anonymous composite
types.  Since composite datums carry their own type identification, no
extra knowledge is needed at the array level.

The main reason for doing this right now is that it is necessary to support
the general case of detection of cycles in recursive queries: if you need to
compare more than one column to detect a cycle, you need to compare a ROW()
to an array built from ROW()s, at least if you want to do it as the spec
suggests.  Add some documentation and regression tests concerning the cycle
detection issue.
This commit is contained in:
Tom Lane
2008-10-13 16:25:20 +00:00
parent d6dfa1e6c6
commit e3b0117459
18 changed files with 809 additions and 22 deletions

View File

@ -1,4 +1,4 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/func.sgml,v 1.448 2008/10/03 07:33:08 heikki Exp $ -->
<!-- $PostgreSQL: pgsql/doc/src/sgml/func.sgml,v 1.449 2008/10/13 16:25:19 tgl Exp $ -->
<chapter id="functions">
<title>Functions and Operators</title>
@ -10667,6 +10667,20 @@ AND
be either true or false, never null.
</para>
<note>
<para>
The SQL specification requires row-wise comparison to return NULL if the
result depends on comparing two NULL values or a NULL and a non-NULL.
<productname>PostgreSQL</productname> does this only when comparing the
results of two row constructors or comparing a row constructor to the
output of a subquery (as in <xref linkend="functions-subquery">).
In other contexts where two composite-type values are compared, two
NULL field values are considered equal, and a NULL is considered larger
than a non-NULL. This is necessary in order to have consistent sorting
and indexing behavior for composite types.
</para>
</note>
</sect2>
</sect1>

View File

@ -1,4 +1,4 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/queries.sgml,v 1.47 2008/10/07 19:27:03 tgl Exp $ -->
<!-- $PostgreSQL: pgsql/doc/src/sgml/queries.sgml,v 1.48 2008/10/13 16:25:19 tgl Exp $ -->
<chapter id="queries">
<title>Queries</title>
@ -1604,8 +1604,85 @@ GROUP BY sub_part
the recursive part of the query will eventually return no tuples,
or else the query will loop indefinitely. Sometimes, using
<literal>UNION</> instead of <literal>UNION ALL</> can accomplish this
by discarding rows that duplicate previous output rows; this catches
cycles that would otherwise repeat. A useful trick for testing queries
by discarding rows that duplicate previous output rows. However, often a
cycle does not involve output rows that are completely duplicate: it may be
necessary to check just one or a few fields to see if the same point has
been reached before. The standard method for handling such situations is
to compute an array of the already-visited values. For example, consider
the following query that searches a table <structname>graph</> using a
<structfield>link</> field:
<programlisting>
WITH RECURSIVE search_graph(id, link, data, depth) AS (
SELECT g.id, g.link, g.data, 1
FROM graph g
UNION ALL
SELECT g.id, g.link, g.data, sg.depth + 1
FROM graph g, search_graph sg
WHERE g.id = sg.link
)
SELECT * FROM search_graph;
</programlisting>
This query will loop if the <structfield>link</> relationships contain
cycles. Because we require a <quote>depth</> output, just changing
<literal>UNION ALL</> to <literal>UNION</> would not eliminate the looping.
Instead we need to recognize whether we have reached the same row again
while following a particular path of links. We add two columns
<structfield>path</> and <structfield>cycle</> to the loop-prone query:
<programlisting>
WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
SELECT g.id, g.link, g.data, 1,
ARRAY[g.id],
false
FROM graph g
UNION ALL
SELECT g.id, g.link, g.data, sg.depth + 1,
path || ARRAY[g.id],
g.id = ANY(path)
FROM graph g, search_graph sg
WHERE g.id = sg.link AND NOT cycle
)
SELECT * FROM search_graph;
</programlisting>
Aside from preventing cycles, the array value is often useful in its own
right as representing the <quote>path</> taken to reach any particular row.
</para>
<para>
In the general case where more than one field needs to be checked to
recognize a cycle, use an array of rows. For example, if we needed to
compare fields <structfield>f1</> and <structfield>f2</>:
<programlisting>
WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
SELECT g.id, g.link, g.data, 1,
ARRAY[ROW(g.f1, g.f2)],
false
FROM graph g
UNION ALL
SELECT g.id, g.link, g.data, sg.depth + 1,
path || ARRAY[ROW(g.f1, g.f2)],
ROW(g.f1, g.f2) = ANY(path)
FROM graph g, search_graph sg
WHERE g.id = sg.link AND NOT cycle
)
SELECT * FROM search_graph;
</programlisting>
</para>
<tip>
<para>
Omit the <literal>ROW()</> syntax in the common case where only one field
needs to be checked to recognize a cycle. This allows a simple array
rather than a composite-type array to be used, gaining efficiency.
</para>
</tip>
<para>
A helpful trick for testing queries
when you are not certain if they might loop is to place a <literal>LIMIT</>
in the parent query. For example, this query would loop forever without
the <literal>LIMIT</>: