Move targetlist SRF handling from expression evaluation to new executor node.

Evaluation of set returning functions (SRFs) in the targetlist (like
SELECT generate_series(1,5)) so far was done in the expression
evaluation (i.e. ExecEvalExpr()) and projection (i.e.
ExecProject/ExecTargetList) code.

This meant that most executor nodes performing projection, and most
expression evaluation functions, had to deal with the possibility that
an evaluated expression could return a set of return values.

That's bad because it leads to repeated code in a lot of places. It
also, and that's my (Andres's) motivation, made it a lot harder to
implement a more efficient way of doing expression evaluation.

To fix this, introduce a new executor node (ProjectSet) that can
evaluate targetlists containing one or more SRFs. To avoid the
complexity of the old way of handling nested expressions returning
sets (e.g. having to pass up ExprDoneCond, and dealing with arguments
to functions returning sets etc.), those SRFs can only be at the top
level of the node's targetlist. The planner makes sure (via
split_pathtarget_at_srfs()) that SRF evaluation is only necessary in
ProjectSet nodes and that SRFs are only present at the top level of
the node's targetlist. If there are nested SRFs the planner creates
multiple stacked ProjectSet nodes. The ProjectSet nodes always get
input from an underlying node.

We also discussed and prototyped evaluating targetlist SRFs using
ROWS FROM(), but that turned out to be more complicated than we'd
hoped.

While moving SRF evaluation to ProjectSet would allow us to retain the
old "least common multiple" behavior when multiple SRFs are present in
one targetlist (i.e. continue returning rows until all SRFs are at the
end of their input at the same time), we decided to instead only
return rows until all SRFs are exhausted, returning NULL for already
exhausted ones. We deemed the previous behavior to be too confusing,
unexpected and actually not particularly useful.

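To make the difference between the two behaviors concrete, here is a
minimal sketch in Python. It is only a model of the row semantics
described above, not the executor code: the names old_rows and new_rows
are hypothetical, each SRF is assumed to be a plain list of its results,
and None stands in for SQL NULL.

```python
from itertools import zip_longest
from math import lcm  # available in Python 3.9 and later


def old_rows(*srfs):
    """Model of the old behavior: restart each SRF when it runs out and
    keep going until all of them finish at the same time, which yields
    lcm(len(srf1), len(srf2), ...) rows."""
    n = lcm(*(len(s) for s in srfs))
    return [tuple(s[i % len(s)] for s in srfs) for i in range(n)]


def new_rows(*srfs):
    """Model of the new behavior: run the SRFs in lockstep until the
    longest one is exhausted, substituting None (NULL) for SRFs that
    have already run out."""
    return list(zip_longest(*srfs, fillvalue=None))


# Roughly SELECT generate_series(1,2), generate_series(1,3)
a, b = [1, 2], [1, 2, 3]
print(old_rows(a, b))  # six rows, one per step up to lcm(2, 3)
print(new_rows(a, b))  # three rows, the last one padded with None
```

In this model the row count changes from the least common multiple of
the individual result counts to their maximum, which is the change in
behavior the paragraph above describes.
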
As a side effect, the previously prohibited case of multiple set
returning arguments to a function is now allowed. Not because it's
particularly desirable, but because it ends up working and there seems
to be no argument for adding code to prohibit it.

Currently the behavior for COALESCE and CASE containing SRFs has
changed, returning multiple rows from the expression, even when the
SRF containing "arm" of the expression is not evaluated. That's
because the SRFs are evaluated in a separate ProjectSet node. As
that's quite confusing, we're likely to instead prohibit SRFs in those
places. But that's still being discussed, and the code would reside in
places not touched here, so that's a task for later.

There's a lot of, now superfluous, code dealing with set return
expressions around. But as the changes to get rid of those are verbose
and largely boring, it seems better for readability to keep the
cleanup as a separate commit.

Author: Tom Lane and Andres Freund
Discussion: https://postgr.es/m/20160822214023.aaxz5l4igypowyri@alap3.anarazel.de
2025-10-16 17:07:43 +03:00 · 2017-01-18 12:46:50 -08:00
parent e37360d5df
commit 69f4b9c85f
35 changed files with 1186 additions and 274 deletions
--- a/doc/src/sgml/xfunc.sgml
+++ b/doc/src/sgml/xfunc.sgml
@@ -962,12 +962,11 @@ SELECT name, child FROM nodes, LATERAL listchildren(name) AS child;
    </para>

    <para>
-     Currently, functions returning sets can also be called in the select list
+     Functions returning sets can also be called in the select list
     of a query.  For each row that the query
-     generates by itself, the function returning set is invoked, and an output
-     row is generated for each element of the function's result set. Note,
-     however, that this capability is deprecated and might be removed in future
-     releases. The previous example could also be done with queries like
+     generates by itself, the set-returning function is invoked, and an output
+     row is generated for each element of the function's result set.
+     The previous example could also be done with queries like
     these:

 <screen>
@@ -998,6 +997,33 @@ SELECT name, listchildren(name) FROM nodes;
     the <literal>LATERAL</> syntax.
    </para>

+    <para>
+     If there is more than one set-returning function in the same select
+     list, the behavior is similar to what you get from putting the functions
+     into a single <literal>LATERAL ROWS FROM( ... )</> <literal>FROM</>-clause
+     item.  For each row from the underlying query, there is an output row
+     using the first result from each function, then an output row using the
+     second result, and so on.  If some of the set-returning functions
+     produce fewer outputs than others, null values are substituted for the
+     missing data, so that the total number of rows emitted for one
+     underlying row is the same as for the set-returning function that
+     produced the most outputs.
+    </para>
+
+    <para>
+     Set-returning functions can be nested in a select list, although that is
+     not allowed in <literal>FROM</>-clause items.  In such cases, each level
+     of nesting is treated separately, as though it were
+     another <literal>LATERAL ROWS FROM( ... )</> item.  For example, in
+<programlisting>
+SELECT srf1(srf2(x), srf3(y)), srf4(srf5(z)) FROM ...
+</programlisting>
+     the set-returning functions <function>srf2</>, <function>srf3</>,
+     and <function>srf5</> would be run in lockstep for each row of the
+     underlying query, and then <function>srf1</> and <function>srf4</> would
+     be applied in lockstep to each row produced by the lower functions.
+    </para>
+
    <note>
     <para>
      If a function's last command is <command>INSERT</>, <command>UPDATE</>,
@@ -1012,14 +1038,14 @@ SELECT name, listchildren(name) FROM nodes;

    <note>
     <para>
-      The key problem with using set-returning functions in the select list,
-      rather than the <literal>FROM</> clause, is that putting more than one
-      set-returning function in the same select list does not behave very
-      sensibly.  (What you actually get if you do so is a number of output
-      rows equal to the least common multiple of the numbers of rows produced
-      by each set-returning function.)  The <literal>LATERAL</> syntax
-      produces less surprising results when calling multiple set-returning
-      functions, and should usually be used instead.
+      Before <productname>PostgreSQL</> 10, putting more than one
+      set-returning function in the same select list did not behave very
+      sensibly unless they always produced equal numbers of rows.  Otherwise,
+      what you got was a number of output rows equal to the least common
+      multiple of the numbers of rows produced by the set-returning
+      functions.  Furthermore, nested set-returning functions did not work at
+      all.  Use of the <literal>LATERAL</> syntax is recommended when writing
+      queries that need to work in older <productname>PostgreSQL</> versions.
     </para>
    </note>
   </sect2>