1
0
mirror of https://github.com/postgres/postgres.git synced 2025-07-30 11:03:19 +03:00

Improve EXPLAIN's display of SubPlan nodes and output parameters.

Historically we've printed SubPlan expression nodes as "(SubPlan N)",
which is pretty uninformative.  Trying to reproduce the original SQL
for the subquery is still as impractical as before, and would be
mighty verbose as well.  However, we can still do better than that.
Displaying the "testexpr" when present, and adding a keyword to
indicate the SubLinkType, goes a long way toward showing what's
really going on.

In addition, this patch gets rid of EXPLAIN's use of "$n" to represent
subplan and initplan output Params.  Instead we now print "(SubPlan
N).colX" or "(InitPlan N).colX" to represent the X'th output column
of that subplan.  This eliminates confusion with the use of "$n" to
represent PARAM_EXTERN Params, and it's useful for the first part of
this change because it eliminates needing some other indication of
which subplan is referenced by a SubPlan that has a testexpr.

In passing, this adds simple regression test coverage of the
ROWCOMPARE_SUBLINK code paths, which were entirely unburdened
by testing before.

Tom Lane and Dean Rasheed, reviewed by Aleksander Alekseev.
Thanks to Chantal Keller for raising the question of whether
this area couldn't be improved.

Discussion: https://postgr.es/m/2838538.1705692747@sss.pgh.pa.us
This commit is contained in:
Tom Lane
2024-03-19 18:19:24 -04:00
parent cc4826dd5e
commit fd0398fcb0
24 changed files with 793 additions and 464 deletions

View File

@ -573,8 +573,106 @@ WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2;
which shows that the planner thinks that sorting <literal>onek</literal> by
index-scanning is about 12% more expensive than sequential-scan-and-sort.
Of course, the next question is whether it's right about that.
We can investigate that using <command>EXPLAIN ANALYZE</command>, as discussed
below.
We can investigate that using <command>EXPLAIN ANALYZE</command>, as
discussed <link linkend="using-explain-analyze">below</link>.
</para>
<para>
<indexterm>
<primary>subplan</primary>
</indexterm>
Some query plans involve <firstterm>subplans</firstterm>, which arise
from sub-<literal>SELECT</literal>s in the original query. Such
queries can sometimes be transformed into ordinary join plans, but
when they cannot be, we get plans like:
<screen>
EXPLAIN VERBOSE SELECT unique1
FROM tenk1 t
WHERE t.ten &lt; ALL (SELECT o.ten FROM onek o WHERE o.four = t.four);
QUERY PLAN
-------------------------------------------------------------------&zwsp;------
Seq Scan on public.tenk1 t (cost=0.00..586095.00 rows=5000 width=4)
Output: t.unique1
Filter: (ALL (t.ten &lt; (SubPlan 1).col1))
SubPlan 1
-&gt; Seq Scan on public.onek o (cost=0.00..116.50 rows=250 width=4)
Output: o.ten
Filter: (o.four = t.four)
</screen>
This rather artificial example serves to illustrate a couple of
points: values from the outer plan level can be passed down into a
subplan (here, <literal>t.four</literal> is passed down) and the
results of the sub-select are available to the outer plan. Those
result values are shown by <command>EXPLAIN</command> with notations
like
<literal>(<replaceable>subplan_name</replaceable>).col<replaceable>N</replaceable></literal>,
which refers to the <replaceable>N</replaceable>'th output column of
the sub-<literal>SELECT</literal>.
</para>
<para>
<indexterm>
<primary>subplan</primary>
<secondary>hashed</secondary>
</indexterm>
In the example above, the <literal>ALL</literal> operator runs the
subplan again for each row of the outer query (which accounts for the
high estimated cost). Some queries can use a <firstterm>hashed
subplan</firstterm> to avoid that:
<screen>
EXPLAIN SELECT *
FROM tenk1 t
WHERE t.unique1 NOT IN (SELECT o.unique1 FROM onek o);
QUERY PLAN
-------------------------------------------------------------------&zwsp;-------------------------
Seq Scan on tenk1 t (cost=61.77..531.77 rows=5000 width=244)
Filter: (NOT (ANY (unique1 = (hashed SubPlan 1).col1)))
SubPlan 1
-&gt; Index Only Scan using onek_unique1 on onek o (cost=0.28..59.27 rows=1000 width=4)
(4 rows)
</screen>
Here, the subplan is run a single time and its output is loaded into
an in-memory hash table, which is then probed by the
outer <literal>ANY</literal> operator. This requires that the
sub-<literal>SELECT</literal> not reference any variables of the outer
query, and that the <literal>ANY</literal>'s comparison operator be
amenable to hashing.
</para>
<para>
<indexterm>
<primary>initplan</primary>
</indexterm>
If, in addition to not referencing any variables of the outer query,
the sub-<literal>SELECT</literal> cannot return more than one row,
it may instead be implemented as an <firstterm>initplan</firstterm>:
<screen>
EXPLAIN VERBOSE SELECT unique1
FROM tenk1 t1 WHERE t1.ten = (SELECT (random() * 10)::integer);
QUERY PLAN
------------------------------------------------------------&zwsp;--------
Seq Scan on public.tenk1 t1 (cost=0.02..470.02 rows=1000 width=4)
Output: t1.unique1
Filter: (t1.ten = (InitPlan 1).col1)
InitPlan 1
-&gt; Result (cost=0.00..0.02 rows=1 width=4)
Output: ((random() * '10'::double precision))::integer
</screen>
An initplan is run only once per execution of the outer plan, and its
results are saved for re-use in later rows of the outer plan. So in
this example <literal>random()</literal> is evaluated only once and
all the values of <literal>t1.ten</literal> are compared to the same
randomly-chosen integer. That's quite different from what would
happen without the sub-<literal>SELECT</literal> construct.
</para>
</sect2>