<!--
$PostgreSQL: pgsql/doc/src/sgml/perform.sgml,v 1.51 2005/03/25 21:57:57 tgl Exp $
-->

 <chapter id="performance-tips">
  <title>Performance Tips</title>

  <indexterm zone="performance-tips">
   <primary>performance</primary>
  </indexterm>

  <para>
   Query performance can be affected by many things.  Some of these can
   be manipulated by the user, while others are fundamental to the underlying
   design of the system.  This chapter provides some hints about understanding
   and tuning <productname>PostgreSQL</productname> performance.
  </para>

 <sect1 id="using-explain">
  <title>Using <command>EXPLAIN</command></title>

  <indexterm zone="using-explain">
   <primary>EXPLAIN</primary>
  </indexterm>

  <indexterm zone="using-explain">
   <primary>query plan</primary>
  </indexterm>

  <para>
   <productname>PostgreSQL</productname> devises a <firstterm>query
   plan</firstterm> for each query it is given.  Choosing the right
   plan to match the query structure and the properties of the data
   is absolutely critical for good performance.  You can use the
   <xref linkend="sql-explain" endterm="sql-explain-title"> command
   to see what query plan the system creates for any query.
   Plan-reading is an art that deserves an extensive tutorial, which
   this is not; but here is some basic information.
  </para>

  <para>
   The numbers that are currently quoted by <command>EXPLAIN</command> are:

   <itemizedlist>
    <listitem>
     <para>
      Estimated start-up cost (time expended before the output scan can
      start, e.g., time to do the sorting in a sort node)
     </para>
    </listitem>

    <listitem>
     <para>
      Estimated total cost (if all rows were to be retrieved, which they
      may not be: a query with a <literal>LIMIT</> clause will stop short
      of paying the total cost, for example)
     </para>
    </listitem>

    <listitem>
     <para>
      Estimated number of rows output by this plan node (again, only if
      executed to completion)
     </para>
    </listitem>

    <listitem>
     <para>
      Estimated average width (in bytes) of rows output by this plan
      node
     </para>
    </listitem>
   </itemizedlist>
  </para>

  <para>
   The costs are measured in units of disk page fetches.  (CPU effort
   estimates are converted into disk-page units using some
   fairly arbitrary fudge factors.  If you want to experiment with these
   factors, see the list of run-time configuration parameters in
   <xref linkend="runtime-config-query-constants">.)
  </para>

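  <para>
   For example, one of these factors can be inspected and, for
   experimentation, changed for the current session only (the value shown
   here is purely illustrative):

<programlisting>
SHOW cpu_tuple_cost;
SET cpu_tuple_cost = 0.02;
</programlisting>
  </para>
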
  <para>
   It's important to note that the cost of an upper-level node includes
   the cost of all its child nodes.  It's also important to realize that
   the cost only reflects things that the planner/optimizer cares about.
   In particular, the cost does not consider the time spent transmitting
   result rows to the frontend, which could be a pretty dominant
   factor in the true elapsed time; but the planner ignores it because
   it cannot change it by altering the plan.  (Every correct plan will
   output the same row set, we trust.)
  </para>

  <para>
   Rows output is a little tricky because it is <emphasis>not</emphasis> the
   number of rows processed/scanned by the query; it is usually less,
   reflecting the estimated selectivity of any <literal>WHERE</>-clause
   conditions that are being applied at this node.  Ideally the top-level
   rows estimate will approximate the number of rows actually returned,
   updated, or deleted by the query.
  </para>

  <para>
   Here are some examples (using the regression test database after a
   <command>VACUUM ANALYZE</>, and 7.3 development sources):

<programlisting>
EXPLAIN SELECT * FROM tenk1;

                         QUERY PLAN
-------------------------------------------------------------
 Seq Scan on tenk1  (cost=0.00..333.00 rows=10000 width=148)
</programlisting>
  </para>

  <para>
   This is about as straightforward as it gets.  If you do

<programlisting>
SELECT relpages, reltuples FROM pg_class WHERE relname = 'tenk1';
</programlisting>

   you will find out that <classname>tenk1</classname> has 233 disk
   pages and 10000 rows.  So the cost is estimated at 233 page
   reads, defined as costing 1.0 apiece, plus 10000 * <xref
   linkend="guc-cpu-tuple-cost"> which is
   currently 0.01 (try <command>SHOW cpu_tuple_cost</command>).
   That works out to 233 + (10000 * 0.01) = 333, which is exactly the
   total cost shown for the sequential scan above.
  </para>

  <para>
   Now let's modify the query to add a <literal>WHERE</> condition:

<programlisting>
EXPLAIN SELECT * FROM tenk1 WHERE unique1 &lt; 1000;

                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on tenk1  (cost=0.00..358.00 rows=1033 width=148)
   Filter: (unique1 &lt; 1000)
</programlisting>

   The estimate of output rows has gone down because of the
   <literal>WHERE</> clause.
   However, the scan will still have to visit all 10000 rows, so the cost
   hasn't decreased; in fact it has gone up a bit to reflect the extra CPU
   time spent checking the <literal>WHERE</> condition.
  </para>

  <para>
   The actual number of rows this query would select is 1000, but the
   estimate is only approximate.  If you try to duplicate this experiment,
   you will probably get a slightly different estimate; moreover, it will
   change after each <command>ANALYZE</command> command, because the
   statistics produced by <command>ANALYZE</command> are taken from a
   randomized sample of the table.
  </para>

  <para>
   Modify the query to restrict the condition even more:

<programlisting>
EXPLAIN SELECT * FROM tenk1 WHERE unique1 &lt; 50;

                                   QUERY PLAN
-------------------------------------------------------------------------------
 Index Scan using tenk1_unique1 on tenk1  (cost=0.00..179.33 rows=49 width=148)
   Index Cond: (unique1 &lt; 50)
</programlisting>

   and you will see that if we make the <literal>WHERE</> condition selective
   enough, the planner will
   eventually decide that an index scan is cheaper than a sequential scan.
   This plan will only have to visit 50 rows because of the index,
   so it wins despite the fact that each individual fetch is more expensive
   than reading a whole disk page sequentially.
  </para>

  <para>
   Add another condition to the <literal>WHERE</> clause:

<programlisting>
EXPLAIN SELECT * FROM tenk1 WHERE unique1 &lt; 50 AND stringu1 = 'xxx';

                                   QUERY PLAN
-------------------------------------------------------------------------------
 Index Scan using tenk1_unique1 on tenk1  (cost=0.00..179.45 rows=1 width=148)
   Index Cond: (unique1 &lt; 50)
   Filter: (stringu1 = 'xxx'::name)
</programlisting>

   The added condition <literal>stringu1 = 'xxx'</literal> reduces the
   output-rows estimate, but not the cost, because we still have to visit
   the same set of rows.  Notice that the <literal>stringu1</> clause
   cannot be applied as an index condition (since this index is only on
   the <literal>unique1</> column).  Instead it is applied as a filter on
   the rows retrieved by the index.  Thus the cost has actually gone up
   a little bit to reflect this extra checking.
  </para>

  <para>
   Let's try joining two tables, using the columns we have been discussing:

<programlisting>
EXPLAIN SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 &lt; 50 AND t1.unique2 = t2.unique2;

                                  QUERY PLAN
----------------------------------------------------------------------------
 Nested Loop  (cost=0.00..327.02 rows=49 width=296)
   ->  Index Scan using tenk1_unique1 on tenk1 t1  (cost=0.00..179.33 rows=49 width=148)
         Index Cond: (unique1 &lt; 50)
   ->  Index Scan using tenk2_unique2 on tenk2 t2  (cost=0.00..3.01 rows=1 width=148)
         Index Cond: ("outer".unique2 = t2.unique2)
</programlisting>
  </para>

  <para>
   In this nested-loop join, the outer scan is the same index scan we had
   in the example before last, and so its cost and row count are the same
   because we are applying the <literal>WHERE</> clause
   <literal>unique1 &lt; 50</literal> at that node.
   The <literal>t1.unique2 = t2.unique2</literal> clause is not relevant yet,
   so it doesn't affect the row count of the outer scan.  For the inner scan,
   the <literal>unique2</> value of the current outer-scan row is plugged
   into the inner index scan to produce an index condition like
   <literal>t2.unique2 = <replaceable>constant</replaceable></literal>.  So we get the
   same inner-scan plan and costs that we'd get from, say, <literal>EXPLAIN SELECT
   * FROM tenk2 WHERE unique2 = 42</literal>.  The costs of the loop node are then set
   on the basis of the cost of the outer scan, plus one repetition of the
   inner scan for each outer row (49 * 3.01, here), plus a little CPU
   time for join processing.
  </para>

  <para>
   In this example the join's output row count is the same as the product
   of the two scans' row counts, but that's not true in general, because
   in general you can have <literal>WHERE</> clauses that mention both tables
   and so can only be applied at the join point, not to either input scan.
   For example, if we added
   <literal>WHERE ... AND t1.hundred &lt; t2.hundred</literal>,
   that would decrease the output row count of the join node, but not change
   either input scan.
  </para>

  <para>
   One way to look at variant plans is to force the planner to disregard
   whatever strategy it thought was the winner, using the enable/disable
   flags for each plan type.  (This is a crude tool, but useful.  See
   also <xref linkend="explicit-joins">.)

<programlisting>
SET enable_nestloop = off;
EXPLAIN SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 &lt; 50 AND t1.unique2 = t2.unique2;

                                 QUERY PLAN
--------------------------------------------------------------------------
 Hash Join  (cost=179.45..563.06 rows=49 width=296)
   Hash Cond: ("outer".unique2 = "inner".unique2)
   ->  Seq Scan on tenk2 t2  (cost=0.00..333.00 rows=10000 width=148)
   ->  Hash  (cost=179.33..179.33 rows=49 width=148)
         ->  Index Scan using tenk1_unique1 on tenk1 t1  (cost=0.00..179.33 rows=49 width=148)
               Index Cond: (unique1 &lt; 50)
</programlisting>

   This plan proposes to extract the 50 interesting rows of <classname>tenk1</classname>
   using the same index scan as before, stash them into an in-memory hash table,
   and then do a sequential scan of <classname>tenk2</classname>, probing into the hash table
   for possible matches of <literal>t1.unique2 = t2.unique2</literal> at each <classname>tenk2</classname> row.
   The cost to read <classname>tenk1</classname> and set up the hash table is entirely start-up
   cost for the hash join, since we won't get any rows out until we can
   start reading <classname>tenk2</classname>.  The total time estimate for the join also
   includes a hefty charge for the CPU time to probe the hash table
   10000 times.  Note, however, that we are <emphasis>not</emphasis> charging 10000 times 179.33;
   the hash table setup is only done once in this plan type.
  </para>

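  <para>
   These settings are session-local, but when you have finished such an
   experiment, remember to restore the default, or the planner will keep
   avoiding nested loops for the rest of the session:

<programlisting>
SET enable_nestloop = on;
</programlisting>
  </para>
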
  <para>
   It is possible to check on the accuracy of the planner's estimated costs
   by using <command>EXPLAIN ANALYZE</>.  This command actually executes the query,
   and then displays the true run time accumulated within each plan node
   along with the same estimated costs that a plain <command>EXPLAIN</command> shows.
   For example, we might get a result like this:

<screen>
EXPLAIN ANALYZE SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 &lt; 50 AND t1.unique2 = t2.unique2;

                                   QUERY PLAN
-------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..327.02 rows=49 width=296) (actual time=1.181..29.822 rows=50 loops=1)
   ->  Index Scan using tenk1_unique1 on tenk1 t1  (cost=0.00..179.33 rows=49 width=148) (actual time=0.630..8.917 rows=50 loops=1)
         Index Cond: (unique1 &lt; 50)
   ->  Index Scan using tenk2_unique2 on tenk2 t2  (cost=0.00..3.01 rows=1 width=148) (actual time=0.295..0.324 rows=1 loops=50)
         Index Cond: ("outer".unique2 = t2.unique2)
 Total runtime: 31.604 ms
</screen>

   Note that the <quote>actual time</quote> values are in milliseconds of
   real time, whereas the <quote>cost</quote> estimates are expressed in
   arbitrary units of disk fetches; so they are unlikely to match up.
   The thing to pay attention to is the ratios.
  </para>

  <para>
   In some query plans, it is possible for a subplan node to be executed more
   than once.  For example, the inner index scan is executed once per outer
   row in the above nested-loop plan.  In such cases, the
   <quote>loops</quote> value reports the
   total number of executions of the node, and the actual time and rows
   values shown are averages per-execution.  This is done to make the numbers
   comparable with the way that the cost estimates are shown.  Multiply by
   the <quote>loops</quote> value to get the total time actually spent in
   the node.  In the example above, the inner index scan took about
   0.324 * 50 = 16.2 milliseconds in total.
  </para>

  <para>
   The <literal>Total runtime</literal> shown by <command>EXPLAIN
   ANALYZE</command> includes executor start-up and shut-down time, as well
   as time spent processing the result rows.  It does not include parsing,
   rewriting, or planning time.  For a <command>SELECT</> query, the total
   run time will normally be just a little larger than the total time
   reported for the top-level plan node.  For <command>INSERT</>,
   <command>UPDATE</>, and <command>DELETE</> commands, the total run time
   may be considerably larger, because it includes the time spent processing
   the result rows.  In these commands, the time for the top plan node
   essentially is the time spent computing the new rows and/or locating the
   old ones, but it doesn't include the time spent making the changes.
   Time spent firing triggers, if any, is also outside the top plan node,
   and is shown separately for each trigger.
  </para>

  <para>
   It is worth noting that <command>EXPLAIN</> results should not be extrapolated
   to situations other than the one you are actually testing; for example,
   results on a toy-sized table can't be assumed to apply to large tables.
   The planner's cost estimates are not linear and so it may well choose
   a different plan for a larger or smaller table.  An extreme example
   is that on a table that only occupies one disk page, you'll nearly
   always get a sequential scan plan whether indexes are available or not.
   The planner realizes that it's going to take one disk page read to
   process the table in any case, so there's no value in expending additional
   page reads to look at an index.
  </para>
 </sect1>

 <sect1 id="planner-stats">
  <title>Statistics Used by the Planner</title>

  <indexterm zone="planner-stats">
   <primary>statistics</primary>
   <secondary>of the planner</secondary>
  </indexterm>

  <para>
   As we saw in the previous section, the query planner needs to estimate
   the number of rows retrieved by a query in order to make good choices
   of query plans.  This section provides a quick look at the statistics
   that the system uses for these estimates.
  </para>

  <para>
   One component of the statistics is the total number of entries in
   each table and index, as well as the number of disk blocks occupied
   by each table and index.  This information is kept in the table
   <link linkend="catalog-pg-class"><structname>pg_class</structname></link>, in
   the columns <structfield>reltuples</structfield> and
   <structfield>relpages</structfield>.  We can look at it with
   queries similar to this one:

<screen>
SELECT relname, relkind, reltuples, relpages FROM pg_class WHERE relname LIKE 'tenk1%';

    relname    | relkind | reltuples | relpages
---------------+---------+-----------+----------
 tenk1         | r       |     10000 |      233
 tenk1_hundred | i       |     10000 |       30
 tenk1_unique1 | i       |     10000 |       30
 tenk1_unique2 | i       |     10000 |       30
(4 rows)
</screen>

   Here we can see that <structname>tenk1</structname> contains 10000
   rows, as do its indexes, but the indexes are (unsurprisingly) much
   smaller than the table.
  </para>

  <para>
   For efficiency reasons, <structfield>reltuples</structfield>
   and <structfield>relpages</structfield> are not updated on-the-fly,
   and so they usually contain somewhat out-of-date values.
   They are updated by <command>VACUUM</>, <command>ANALYZE</>, and a
   few DDL commands such as <command>CREATE INDEX</>.  A stand-alone
   <command>ANALYZE</>, that is, one not part of <command>VACUUM</>,
   generates an approximate <structfield>reltuples</structfield> value
   since it does not read every row of the table.  The planner
   will scale the values it finds in <structname>pg_class</structname>
   to match the current physical table size, thus obtaining a closer
   approximation.
  </para>

  <indexterm>
   <primary>pg_statistic</primary>
  </indexterm>

  <para>
   Most queries retrieve only a fraction of the rows in a table, due
   to having <literal>WHERE</> clauses that restrict the rows to be
   examined.  The planner thus needs to make an estimate of the
   <firstterm>selectivity</> of <literal>WHERE</> clauses, that is,
   the fraction of rows that match each condition in the
   <literal>WHERE</> clause.  The information used for this task is
   stored in the <link linkend="catalog-pg-statistic"><structname>pg_statistic</structname></link>
   system catalog.  Entries in <structname>pg_statistic</structname>
   are updated by the <command>ANALYZE</> and <command>VACUUM
   ANALYZE</> commands, and are always approximate even when freshly
   updated.
  </para>

  <indexterm>
   <primary>pg_stats</primary>
  </indexterm>

  <para>
   Rather than look at <structname>pg_statistic</structname> directly,
   it's better to look at its view <structname>pg_stats</structname>
   when examining the statistics manually.  <structname>pg_stats</structname>
   is designed to be more easily readable.  Furthermore,
   <structname>pg_stats</structname> is readable by all, whereas
   <structname>pg_statistic</structname> is only readable by a superuser.
   (This prevents unprivileged users from learning something about
   the contents of other people's tables from the statistics.  The
   <structname>pg_stats</structname> view is restricted to show only
   rows about tables that the current user can read.)
   For example, we might do:

<screen>
SELECT attname, n_distinct, most_common_vals FROM pg_stats WHERE tablename = 'road';

 attname | n_distinct |                                   most_common_vals
---------+------------+----------------------------------------------------------------------------------------------------------------------------------
 name    |  -0.467008 | {"I- 580 Ramp","I- 880 Ramp","Sp Railroad ","I- 580 ","I- 680 Ramp","I- 80 Ramp","14th St ","5th St ","Mission Blvd","I- 880 "}
 thepath |         20 | {"[(-122.089,37.71),(-122.0886,37.711)]"}
(2 rows)
</screen>
  </para>

  <para>
   <structname>pg_stats</structname> is described in detail in
   <xref linkend="view-pg-stats">.
  </para>

  <para>
   The amount of information stored in <structname>pg_statistic</structname>,
   in particular the maximum number of entries in the
   <structfield>most_common_vals</> and <structfield>histogram_bounds</>
   arrays for each column, can be set on a
   column-by-column basis using the <command>ALTER TABLE SET STATISTICS</>
   command, or globally by setting the
   <xref linkend="guc-default-statistics-target"> configuration variable.
   The default limit is presently 10 entries.  Raising the limit
   may allow more accurate planner estimates to be made, particularly for
   columns with irregular data distributions, at the price of consuming
   more space in <structname>pg_statistic</structname> and slightly more
   time to compute the estimates.  Conversely, a lower limit may be
   appropriate for columns with simple data distributions.
  </para>

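  <para>
   For example, the limit can be raised for a single column, and fresh
   statistics gathered, like this (using the <structname>road</> table
   from the example above):

<programlisting>
ALTER TABLE road ALTER COLUMN name SET STATISTICS 100;
ANALYZE road;
</programlisting>
  </para>
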
 </sect1>

 <sect1 id="explicit-joins">
  <title>Controlling the Planner with Explicit <literal>JOIN</> Clauses</title>

  <indexterm zone="explicit-joins">
   <primary>join</primary>
   <secondary>controlling the order</secondary>
  </indexterm>

  <para>
   It is possible
   to control the query planner to some extent by using the explicit <literal>JOIN</>
   syntax.  To see why this matters, we first need some background.
  </para>

  <para>
   In a simple join query, such as
<programlisting>
SELECT * FROM a, b, c WHERE a.id = b.id AND b.ref = c.id;
</programlisting>
   the planner is free to join the given tables in any order.  For
   example, it could generate a query plan that joins A to B, using
   the <literal>WHERE</> condition <literal>a.id = b.id</>, and then
   joins C to this joined table, using the other <literal>WHERE</>
   condition.  Or it could join B to C and then join A to that result.
   Or it could join A to C and then join them with B, but that
   would be inefficient, since the full Cartesian product of A and C
   would have to be formed, there being no applicable condition in the
   <literal>WHERE</> clause to allow optimization of the join.  (All
   joins in the <productname>PostgreSQL</productname> executor happen
   between two input tables, so it's necessary to build up the result
   in one or another of these fashions.)  The important point is that
   these different join possibilities give semantically equivalent
   results but may have hugely different execution costs.  Therefore,
   the planner will explore all of them to try to find the most
   efficient query plan.
  </para>

  <para>
   When a query only involves two or three tables, there aren't many join
   orders to worry about.  But the number of possible join orders grows
   exponentially as the number of tables expands.  Beyond ten or so input
   tables it's no longer practical to do an exhaustive search of all the
   possibilities, and even for six or seven tables planning may take an
   annoyingly long time.  When there are too many input tables, the
   <productname>PostgreSQL</productname> planner will switch from exhaustive
   search to a <firstterm>genetic</firstterm> probabilistic search
   through a limited number of possibilities.  (The switch-over threshold is
   set by the <xref linkend="guc-geqo-threshold"> run-time
   parameter.)
   The genetic search takes less time, but it won't
   necessarily find the best possible plan.
  </para>

  <para>
   When the query involves outer joins, the planner has much less freedom
   than it does for plain (inner) joins.  For example, consider
<programlisting>
SELECT * FROM a LEFT JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);
</programlisting>
   Although this query's restrictions are superficially similar to the
   previous example, the semantics are different because a row must be
   emitted for each row of A that has no matching row in the join of B and C.
   Therefore the planner has no choice of join order here: it must join
   B to C and then join A to that result.  Accordingly, this query takes
   less time to plan than the previous query.
  </para>

  <para>
   Explicit inner join syntax (<literal>INNER JOIN</>, <literal>CROSS
   JOIN</>, or unadorned <literal>JOIN</>) is semantically the same as
   listing the input relations in <literal>FROM</>, so it does not
   constrain the join order.  But it is possible to instruct the
   <productname>PostgreSQL</productname> query planner to treat
   explicit inner <literal>JOIN</>s as constraining the join order anyway.
   For example, these three queries are logically equivalent:
<programlisting>
SELECT * FROM a, b, c WHERE a.id = b.id AND b.ref = c.id;
SELECT * FROM a CROSS JOIN b CROSS JOIN c WHERE a.id = b.id AND b.ref = c.id;
SELECT * FROM a JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);
</programlisting>
   But if we tell the planner to honor the <literal>JOIN</> order,
   the second and third take less time to plan than the first.  This effect
   is not worth worrying about for only three tables, but it can be a
   lifesaver with many tables.
  </para>

  <para>
   To force the planner to follow the <literal>JOIN</> order for inner joins,
   set the <xref linkend="guc-join-collapse-limit"> run-time parameter to 1.
   (Other possible values are discussed below.)
  </para>

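  <para>
   For example, to make the planner honor the <literal>JOIN</> order in
   the current session:

<programlisting>
SET join_collapse_limit = 1;
SELECT * FROM a JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);
</programlisting>

   With this setting, the query is forced to join B to C first, exactly
   as its <literal>JOIN</> nesting specifies.
  </para>
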
  <para>
   You do not need to constrain the join order completely in order to
   cut search time, because it's OK to use <literal>JOIN</> operators
   within items of a plain <literal>FROM</> list.  For example, consider
<programlisting>
SELECT * FROM a CROSS JOIN b, c, d, e WHERE ...;
</programlisting>
   With <varname>join_collapse_limit</> = 1, this
   forces the planner to join A to B before joining them to other tables,
   but doesn't constrain its choices otherwise.  In this example, the
   number of possible join orders is reduced by a factor of 5.
  </para>

  <para>
   Constraining the planner's search in this way is a useful technique
   both for reducing planning time and for directing the planner to a
   good query plan.  If the planner chooses a bad join order by default,
   you can force it to choose a better order via <literal>JOIN</> syntax
   &mdash; assuming that you know of a better order, that is.  Experimentation
   is recommended.
  </para>

  <para>
   A closely related issue that affects planning time is collapsing of
   subqueries into their parent query.  For example, consider
<programlisting>
SELECT *
FROM x, y,
    (SELECT * FROM a, b, c WHERE something) AS ss
WHERE somethingelse;
</programlisting>
   This situation might arise from use of a view that contains a join;
   the view's <literal>SELECT</> rule will be inserted in place of the view
   reference, yielding a query much like the above.  Normally, the planner
   will try to collapse the subquery into the parent, yielding
<programlisting>
SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
</programlisting>
   This usually results in a better plan than planning the subquery
   separately.  (For example, the outer <literal>WHERE</> conditions might
   be such that joining X to A first eliminates many rows of A, thus
   avoiding the need to form the full logical output of the subquery.)
   But at the same time, we have increased the planning time; here, we have
   a five-way join problem replacing two separate three-way join problems.
   Because of the exponential growth of the number of possibilities, this
   makes a big difference.  The planner tries to avoid getting stuck in
   huge join search problems by not collapsing a subquery if the parent
   query would end up containing more than <varname>from_collapse_limit</>
   <literal>FROM</> items.  You can trade off planning time against quality
   of plan by adjusting this run-time parameter up or down.
  </para>

  <para>
   <xref linkend="guc-from-collapse-limit"> and <xref
   linkend="guc-join-collapse-limit">
   are similarly named because they do almost the same thing: one controls
   when the planner will <quote>flatten out</> subselects, and the
   other controls when it will flatten out explicit inner joins.  Typically
   you would either set <varname>join_collapse_limit</> equal to
   <varname>from_collapse_limit</> (so that explicit joins and subselects
   act similarly) or set <varname>join_collapse_limit</> to 1 (if you want
   to control join order with explicit joins).  But you might set them
   differently if you are trying to fine-tune the trade-off between planning
   time and run time.
  </para>

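  <para>
   For instance, to make explicit joins and subselects act alike while
   allowing somewhat larger search problems than the defaults (the value
   12 here is only illustrative):

<programlisting>
SET from_collapse_limit = 12;
SET join_collapse_limit = 12;
</programlisting>
  </para>
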
</sect1>

 <sect1 id="populate">
  <title>Populating a Database</title>

  <para>
   One may need to insert a large amount of data when first populating
   a database.  This section contains some suggestions on how to make
   this process as efficient as possible.
  </para>

  <sect2 id="disable-autocommit">
   <title>Disable Autocommit</title>

   <indexterm>
    <primary>autocommit</primary>
    <secondary>bulk-loading data</secondary>
   </indexterm>

   <para>
    Turn off autocommit and just do one commit at the end.  (In plain
    SQL, this means issuing <command>BEGIN</command> at the start and
    <command>COMMIT</command> at the end.  Some client libraries may
    do this behind your back, in which case you need to make sure the
    library does it when you want it done.)  If you allow each
    insertion to be committed separately,
    <productname>PostgreSQL</productname> is doing a lot of work for
    each row that is added.  An additional benefit of doing all
    insertions in one transaction is that if the insertion of one row
    were to fail, then all rows inserted up to that point would be
    rolled back, so you won't be stuck with partially loaded data.
   </para>

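   <para>
    In plain SQL the pattern looks like this (<structname>mytable</> is a
    hypothetical table used only for illustration):

<programlisting>
BEGIN;
INSERT INTO mytable VALUES (1, 'one');
INSERT INTO mytable VALUES (2, 'two');
-- ... many more insertions ...
COMMIT;
</programlisting>
   </para>
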
  </sect2>

  <sect2 id="populate-copy-from">
   <title>Use <command>COPY</command></title>

   <para>
    Use <xref linkend="sql-copy" endterm="sql-copy-title"> to load
    all the rows in one command, instead of using a series of
    <command>INSERT</command> commands.  The <command>COPY</command>
    command is optimized for loading large numbers of rows; it is less
    flexible than <command>INSERT</command>, but incurs significantly
    less overhead for large data loads.  Since <command>COPY</command>
    is a single command, there is no need to disable autocommit if you
    use this method to populate a table.
   </para>

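   <para>
    For example (the table name and server-side file path are
    hypothetical; reading a server-side file requires superuser
    privileges):

<programlisting>
COPY mytable FROM '/path/to/data.txt';
</programlisting>
   </para>
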
   <para>
    If you cannot use <command>COPY</command>, it may help to use <xref
    linkend="sql-prepare" endterm="sql-prepare-title"> to create a
    prepared <command>INSERT</command> statement, and then use
    <command>EXECUTE</command> as many times as required.  This avoids
    some of the overhead of repeatedly parsing and planning
    <command>INSERT</command>.
   </para>

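   <para>
    A minimal sketch of this pattern (again with a hypothetical
    two-column table):

<programlisting>
PREPARE myinsert (integer, text) AS
    INSERT INTO mytable VALUES ($1, $2);
EXECUTE myinsert(1, 'one');
EXECUTE myinsert(2, 'two');
</programlisting>
   </para>
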
   <para>
    Note that loading a large number of rows using
    <command>COPY</command> is almost always faster than using
    <command>INSERT</command>, even if <command>PREPARE</> is used and
    multiple insertions are batched into a single transaction.
   </para>
  </sect2>

  <sect2 id="populate-rm-indexes">
   <title>Remove Indexes</title>

   <para>
    If you are loading a freshly created table, the fastest way is to
    create the table, bulk load the table's data using
    <command>COPY</command>, then create any indexes needed for the
    table.  Creating an index on pre-existing data is quicker than
    updating it incrementally as each row is loaded.
   </para>

   <para>
    If you are augmenting an existing table, you can drop the index,
    load the table, and then recreate the index.  Of course, the
    database performance for other users may be adversely affected
    during the time that the index is missing.  One should also think
    twice before dropping unique indexes, since the error checking
    afforded by the unique constraint will be lost while the index is
    missing.
   </para>

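   <para>
    In outline (the index and column names are hypothetical):

<programlisting>
DROP INDEX mytable_col_idx;
-- bulk load the data here, for example with COPY
CREATE INDEX mytable_col_idx ON mytable (col);
</programlisting>
   </para>
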
  </sect2>

  <sect2 id="populate-work-mem">
   <title>Increase <varname>maintenance_work_mem</varname></title>

   <para>
    Temporarily increasing the <xref linkend="guc-maintenance-work-mem">
    configuration variable when loading large amounts of data can
    lead to improved performance.  This is because when a B-tree index
    is created from scratch, the existing content of the table needs
    to be sorted.  Allowing the merge sort to use more memory
    means that fewer merge passes will be required.  A larger setting for
    <varname>maintenance_work_mem</varname> may also speed up validation
    of foreign-key constraints.
   </para>

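   <para>
    For example, to allow roughly 512 MB for maintenance operations in
    the current session (the parameter is measured in kilobytes, and the
    value shown is only illustrative):

<programlisting>
SET maintenance_work_mem = 524288;
</programlisting>
   </para>
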
  </sect2>

  <sect2 id="populate-checkpoint-segments">
   <title>Increase <varname>checkpoint_segments</varname></title>

   <para>
    Temporarily increasing the <xref
    linkend="guc-checkpoint-segments"> configuration variable can also
    make large data loads faster.  This is because loading a large
    amount of data into <productname>PostgreSQL</productname> can
    cause checkpoints to occur more often than the normal checkpoint
    frequency (specified by the <varname>checkpoint_timeout</varname>
    configuration variable).  Whenever a checkpoint occurs, all dirty
    pages must be flushed to disk.  By increasing
    <varname>checkpoint_segments</varname> temporarily during bulk
    data loads, the number of checkpoints that are required can be
    reduced.
   </para>

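   <para>
    Unlike the parameters above, <varname>checkpoint_segments</varname>
    cannot be changed with <command>SET</command>; raise it in
    <filename>postgresql.conf</filename> and reload the server
    configuration.  The value here is only illustrative:

<programlisting>
checkpoint_segments = 30
</programlisting>
   </para>
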
  </sect2>

  <sect2 id="populate-analyze">
   <title>Run <command>ANALYZE</command> Afterwards</title>

   <para>
    Whenever you have significantly altered the distribution of data
    within a table, running <xref linkend="sql-analyze"
    endterm="sql-analyze-title"> is strongly recommended.  This
    includes bulk loading large amounts of data into the table.  Running
    <command>ANALYZE</command> (or <command>VACUUM ANALYZE</command>)
    ensures that the planner has up-to-date statistics about the
    table.  With no statistics or obsolete statistics, the planner might
    make poor decisions during query planning, leading to poor
    performance on queries that use such tables.
   </para>

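   <para>
    For example, after a bulk load into a table
    (<structname>mytable</> is again a placeholder name):

<programlisting>
ANALYZE mytable;
</programlisting>
   </para>
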
  </sect2>
 </sect1>
 </chapter>

<!-- Keep this comment at the end of the file
Local variables:
mode:sgml
sgml-omittag:nil
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
sgml-parent-document:nil
sgml-default-dtd-file:"./reference.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:("/usr/lib/sgml/catalog")
sgml-local-ecat-files:nil
End:
-->