1
0
mirror of https://github.com/postgres/postgres.git synced 2025-07-30 11:03:19 +03:00

Teach planner how to rearrange join order for some classes of OUTER JOIN.

Per my recent proposal.  I ended up basing the implementation on the
existing mechanism for enforcing valid join orders of IN joins --- the
rules for valid outer-join orders are somewhat similar.
This commit is contained in:
Tom Lane
2005-12-20 02:30:36 +00:00
parent 1a6aaaa6c4
commit e3b9852728
23 changed files with 969 additions and 722 deletions

View File

@ -1,5 +1,5 @@
<!--
$PostgreSQL: pgsql/doc/src/sgml/config.sgml,v 1.38 2005/12/09 15:51:13 petere Exp $
$PostgreSQL: pgsql/doc/src/sgml/config.sgml,v 1.39 2005/12/20 02:30:35 tgl Exp $
-->
<chapter Id="runtime-config">
<title>Server Configuration</title>
@ -2028,6 +2028,7 @@ SELECT * FROM parent WHERE key = 2400;
this many items. Smaller values reduce planning time but may
yield inferior query plans. The default is 8. It is usually
wise to keep this less than <xref linkend="guc-geqo-threshold">.
For more information see <xref linkend="explicit-joins">.
</para>
</listitem>
</varlistentry>
@ -2039,48 +2040,24 @@ SELECT * FROM parent WHERE key = 2400;
</indexterm>
<listitem>
<para>
The planner will rewrite explicit inner <literal>JOIN</>
constructs into lists of <literal>FROM</> items whenever a
list of no more than this many items in total would
result. Prior to <productname>PostgreSQL</> 7.4, joins
specified via the <literal>JOIN</literal> construct would
never be reordered by the query planner. The query planner has
subsequently been improved so that inner joins written in this
form can be reordered; this configuration parameter controls
the extent to which this reordering is performed.
<note>
<para>
At present, the order of outer joins specified via the
<literal>JOIN</> construct is never adjusted by the query
planner; therefore, <varname>join_collapse_limit</> has no
effect on this behavior. The planner may be improved to
reorder some classes of outer joins in a future release of
<productname>PostgreSQL</productname>.
</para>
</note>
The planner will rewrite explicit <literal>JOIN</>
constructs (except <literal>FULL JOIN</>s) into lists of
<literal>FROM</> items whenever a list of no more than this many items
would result. Smaller values reduce planning time but may
yield inferior query plans.
</para>
<para>
By default, this variable is set the same as
<varname>from_collapse_limit</varname>, which is appropriate
for most uses. Setting it to 1 prevents any reordering of
inner <literal>JOIN</>s. Thus, the explicit join order
explicit <literal>JOIN</>s. Thus, the explicit join order
specified in the query will be the actual order in which the
relations are joined. The query planner does not always choose
the optimal join order; advanced users may elect to
temporarily set this variable to 1, and then specify the join
order they desire explicitly. Another consequence of setting
this variable to 1 is that the query planner will behave more
like the <productname>PostgreSQL</productname> 7.3 query
planner, which some users might find useful for backward
compatibility reasons.
</para>
<para>
Setting this variable to a value between 1 and
<varname>from_collapse_limit</varname> might be useful to
trade off planning time against the quality of the chosen plan
(higher values produce better plans).
order they desire explicitly.
For more information see <xref linkend="explicit-joins">.
</para>
</listitem>
</varlistentry>

View File

@ -1,5 +1,5 @@
<!--
$PostgreSQL: pgsql/doc/src/sgml/perform.sgml,v 1.54 2005/11/04 23:14:00 petere Exp $
$PostgreSQL: pgsql/doc/src/sgml/perform.sgml,v 1.55 2005/12/20 02:30:35 tgl Exp $
-->
<chapter id="performance-tips">
@ -627,7 +627,7 @@ SELECT * FROM a, b, c WHERE a.id = b.id AND b.ref = c.id;
</para>
<para>
When the query involves outer joins, the planner has much less freedom
When the query involves outer joins, the planner has less freedom
than it does for plain (inner) joins. For example, consider
<programlisting>
SELECT * FROM a LEFT JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);
@ -637,16 +637,30 @@ SELECT * FROM a LEFT JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);
emitted for each row of A that has no matching row in the join of B and C.
Therefore the planner has no choice of join order here: it must join
B to C and then join A to that result. Accordingly, this query takes
less time to plan than the previous query.
less time to plan than the previous query. In other cases, the planner
may be able to determine that more than one join order is safe.
For example, given
<programlisting>
SELECT * FROM a LEFT JOIN b ON (a.bid = b.id) LEFT JOIN c ON (a.cid = c.id);
</programlisting>
it is valid to join A to either B or C first. Currently, only
<literal>FULL JOIN</> completely constrains the join order. Most
practical cases involving <literal>LEFT JOIN</> or <literal>RIGHT JOIN</>
can be rearranged to some extent.
</para>
<para>
Explicit inner join syntax (<literal>INNER JOIN</>, <literal>CROSS
JOIN</>, or unadorned <literal>JOIN</>) is semantically the same as
listing the input relations in <literal>FROM</>, so it does not need to
constrain the join order. But it is possible to instruct the
<productname>PostgreSQL</productname> query planner to treat
explicit inner <literal>JOIN</>s as constraining the join order anyway.
listing the input relations in <literal>FROM</>, so it does not
constrain the join order.
</para>
<para>
Even though most kinds of <literal>JOIN</> don't completely constrain
the join order, it is possible to instruct the
<productname>PostgreSQL</productname> query planner to treat all
<literal>JOIN</> clauses as constraining the join order anyway.
For example, these three queries are logically equivalent:
<programlisting>
SELECT * FROM a, b, c WHERE a.id = b.id AND b.ref = c.id;
@ -660,7 +674,8 @@ SELECT * FROM a JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);
</para>
<para>
To force the planner to follow the <literal>JOIN</> order for inner joins,
To force the planner to follow the join order laid out by explicit
<literal>JOIN</>s,
set the <xref linkend="guc-join-collapse-limit"> run-time parameter to 1.
(Other possible values are discussed below.)
</para>
@ -697,9 +712,9 @@ FROM x, y,
WHERE somethingelse;
</programlisting>
This situation might arise from use of a view that contains a join;
the view's <literal>SELECT</> rule will be inserted in place of the view reference,
yielding a query much like the above. Normally, the planner will try
to collapse the subquery into the parent, yielding
the view's <literal>SELECT</> rule will be inserted in place of the view
reference, yielding a query much like the above. Normally, the planner
will try to collapse the subquery into the parent, yielding
<programlisting>
SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
</programlisting>
@ -722,12 +737,12 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
linkend="guc-join-collapse-limit">
are similarly named because they do almost the same thing: one controls
when the planner will <quote>flatten out</> subselects, and the
other controls when it will flatten out explicit inner joins. Typically
other controls when it will flatten out explicit joins. Typically
you would either set <varname>join_collapse_limit</> equal to
<varname>from_collapse_limit</> (so that explicit joins and subselects
act similarly) or set <varname>join_collapse_limit</> to 1 (if you want
to control join order with explicit joins). But you might set them
differently if you are trying to fine-tune the trade off between planning
differently if you are trying to fine-tune the trade-off between planning
time and run time.
</para>
</sect1>