1
0
mirror of https://github.com/postgres/postgres.git synced 2025-07-28 23:42:10 +03:00

R-tree is dead ... long live GiST.

This commit is contained in:
Tom Lane
2005-11-07 17:36:47 +00:00
parent 645adf5de8
commit 2a8d3d83ef
46 changed files with 213 additions and 2954 deletions

View File

@ -1,5 +1,5 @@
<!--
$PostgreSQL: pgsql/doc/src/sgml/backup.sgml,v 2.75 2005/11/04 23:13:59 petere Exp $
$PostgreSQL: pgsql/doc/src/sgml/backup.sgml,v 2.76 2005/11/07 17:36:44 tgl Exp $
-->
<chapter id="backup">
<title>Backup and Restore</title>
@ -1129,8 +1129,8 @@ restore_command = 'copy /mnt/server/archivedir/%f "%p"' # Windows
<itemizedlist>
<listitem>
<para>
Operations on hash and R-tree indexes are
not presently WAL-logged, so replay will not update these index types.
Operations on hash indexes are
not presently WAL-logged, so replay will not update these indexes.
The recommended workaround is to manually <command>REINDEX</> each
such index after completing a recovery operation.
</para>

View File

@ -1,5 +1,5 @@
<!--
$PostgreSQL: pgsql/doc/src/sgml/geqo.sgml,v 1.33 2005/10/25 13:38:09 momjian Exp $
$PostgreSQL: pgsql/doc/src/sgml/geqo.sgml,v 1.34 2005/11/07 17:36:44 tgl Exp $
Genetic Optimizer
-->
@ -51,8 +51,8 @@ Genetic Optimizer
caused by the support of a variety of <firstterm>join
methods</firstterm> (e.g., nested loop, hash join, merge join in
<productname>PostgreSQL</productname>) to process individual joins
and a diversity of <firstterm>indexes</firstterm> (e.g., R-tree,
B-tree, hash in <productname>PostgreSQL</productname>) as access
and a diversity of <firstterm>indexes</firstterm> (e.g.,
B-tree, hash, GiST in <productname>PostgreSQL</productname>) as access
paths for relations.
</para>

View File

@ -1,25 +1,22 @@
<!--
$PostgreSQL: pgsql/doc/src/sgml/gist.sgml,v 1.24 2005/11/04 23:14:00 petere Exp $
$PostgreSQL: pgsql/doc/src/sgml/gist.sgml,v 1.25 2005/11/07 17:36:44 tgl Exp $
-->
<chapter id="GiST">
<title>GiST Indexes</title>
<sect1 id="gist-intro">
<title>Introduction</title>
<para>
<indexterm>
<primary>index</primary>
<secondary>GiST</secondary>
</indexterm>
<indexterm>
<primary>GiST</primary>
<see>index</see>
</indexterm>
<sect1 id="gist-intro">
<title>Introduction</title>
<para>
<acronym>GiST</acronym> stands for Generalized Search Tree. It is a
balanced, tree-structured access method, that acts as a base template in
which to implement arbitrary indexing schemes. B+-trees, R-trees and many
which to implement arbitrary indexing schemes. B-trees, R-trees and many
other indexing schemes can be implemented in <acronym>GiST</acronym>.
</para>
@ -60,17 +57,17 @@ $PostgreSQL: pgsql/doc/src/sgml/gist.sgml,v 1.24 2005/11/04 23:14:00 petere Exp
<para>
This extensibility should not be confused with the extensibility of the
other standard search trees in terms of the data they can handle. For
example, <productname>PostgreSQL</productname> supports extensible B+-trees
and R-trees. That means that you can use
<productname>PostgreSQL</productname> to build a B+-tree or R-tree over any
data type you want. But B+-trees only support range predicates
example, <productname>PostgreSQL</productname> supports extensible B-trees
and hash indexes. That means that you can use
<productname>PostgreSQL</productname> to build a B-tree or hash over any
data type you want. But B-trees only support range predicates
(<literal>&lt;</literal>, <literal>=</literal>, <literal>&gt;</literal>),
and R-trees only support n-D range queries (contains, contained, equals).
and hash indexes only support equality queries.
</para>
<para>
So if you index, say, an image collection with a
<productname>PostgreSQL</productname> B+-tree, you can only issue queries
<productname>PostgreSQL</productname> B-tree, you can only issue queries
such as <quote>is imagex equal to imagey</quote>, <quote>is imagex less
than imagey</quote> and <quote>is imagex greater than imagey</quote>?
Depending on how you define <quote>equals</quote>, <quote>less than</quote>
@ -84,7 +81,7 @@ $PostgreSQL: pgsql/doc/src/sgml/gist.sgml,v 1.24 2005/11/04 23:14:00 petere Exp
All it takes to get a <acronym>GiST</acronym> access method up and running
is to implement seven user-defined methods, which define the behavior of
keys in the tree. Of course these methods have to be pretty fancy to
support fancy queries, but for all the standard queries (B+-trees,
support fancy queries, but for all the standard queries (B-trees,
R-trees, etc.) they're relatively straightforward. In short,
<acronym>GiST</acronym> combines extensibility along with generality, code
reuse, and a clean interface.

View File

@ -1,4 +1,4 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/indices.sgml,v 1.54 2005/11/04 23:14:00 petere Exp $ -->
<!-- $PostgreSQL: pgsql/doc/src/sgml/indices.sgml,v 1.55 2005/11/07 17:36:44 tgl Exp $ -->
<chapter id="indexes">
<title id="indexes-title">Indexes</title>
@ -104,7 +104,7 @@ CREATE INDEX test1_id_index ON test1 (id);
<para>
<productname>PostgreSQL</productname> provides several index types:
B-tree, R-tree, Hash, and GiST. Each index type uses a different
B-tree, Hash, and GiST. Each index type uses a different
algorithm that is best suited to different types of queries.
By default, the <command>CREATE INDEX</command> command will create a
B-tree index, which fits the most common situations.
@ -155,20 +155,51 @@ CREATE INDEX test1_id_index ON test1 (id);
<para>
<indexterm>
<primary>index</primary>
<secondary>R-tree</secondary>
<secondary>hash</secondary>
</indexterm>
<indexterm>
<primary>R-tree</primary>
<primary>hash</primary>
<see>index</see>
</indexterm>
R-tree indexes are suited for queries on two-dimensional spatial data.
To create an R-tree index, use a command of the form
Hash indexes can only handle simple equality comparisons.
The query planner will consider using a hash index whenever an
indexed column is involved in a comparison using the
<literal>=</literal> operator. The following command is used to
create a hash index:
<synopsis>
CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable> USING rtree (<replaceable>column</replaceable>);
CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable> USING hash (<replaceable>column</replaceable>);
</synopsis>
The <productname>PostgreSQL</productname> query planner will
consider using an R-tree index whenever an indexed column is
involved in a comparison using one of these operators:
</para>
<note>
<para>
Testing has shown <productname>PostgreSQL</productname>'s hash
indexes to perform no better than B-tree indexes, and the
index size and build time for hash indexes is much worse.
Furthermore, hash index operations are not presently WAL-logged,
so hash indexes may need to be rebuilt with <command>REINDEX</>
after a database crash.
For these reasons, hash index use is presently discouraged.
</para>
</note>
<para>
<indexterm>
<primary>index</primary>
<secondary>GiST</secondary>
</indexterm>
<indexterm>
<primary>GiST</primary>
<see>index</see>
</indexterm>
GiST indexes are not a single kind of index, but rather an infrastructure
within which many different indexing strategies can be implemented.
Accordingly, the particular operators with which a GiST index can be
used vary depending on the indexing strategy (the <firstterm>operator
class</>). As an example, the standard distribution of
<productname>PostgreSQL</productname> includes GiST operator classes
for several two-dimensional geometric data types, which support indexed
queries using these operators:
<simplelist>
<member><literal>&lt;&lt;</literal></member>
@ -187,64 +218,10 @@ CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable>
(See <xref linkend="functions-geometry"> for the meaning of
these operators.)
</para>
<para>
<indexterm>
<primary>index</primary>
<secondary>hash</secondary>
</indexterm>
<indexterm>
<primary>hash</primary>
<see>index</see>
</indexterm>
Hash indexes can only handle simple equality comparisons.
The query planner will consider using a hash index whenever an
indexed column is involved in a comparison using the
<literal>=</literal> operator. The following command is used to
create a hash index:
<synopsis>
CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable> USING hash (<replaceable>column</replaceable>);
</synopsis>
</para>
<para>
GiST indexes are not a single kind of index, but rather an infrastructure
within which many different indexing strategies can be implemented.
Accordingly, the particular operators with which a GiST index can be
used vary depending on the indexing strategy (the <firstterm>operator
class</>). The standard distribution of
<productname>PostgreSQL</productname> includes GiST operator classes
equivalent to the R-tree operator classes, and many other GiST operator
Many other GiST operator
classes are available in the <literal>contrib</> collection or as separate
projects. For more information see <xref linkend="GiST">.
</para>
<note>
<para>
Testing has shown <productname>PostgreSQL</productname>'s hash
indexes to perform no better than B-tree indexes, and the
index size and build time for hash indexes is much worse.
Furthermore, hash index operations are not presently WAL-logged,
so hash indexes may need to be rebuilt with <command>REINDEX</>
after a database crash.
For these reasons, hash index use is presently discouraged.
</para>
<para>
Similarly, R-tree indexes do not seem to have any performance
advantages compared to the equivalent operations of GiST indexes.
Like hash indexes, they are not WAL-logged and may need
reindexing after a database crash.
</para>
<para>
While the problems with hash indexes may be fixed eventually,
it is likely that the R-tree index type will be retired in a future
release. Users are encouraged to migrate applications that use R-tree
indexes to GiST indexes.
</para>
</note>
</sect1>

View File

@ -1,5 +1,5 @@
<!--
$PostgreSQL: pgsql/doc/src/sgml/mvcc.sgml,v 2.52 2005/10/21 01:41:28 tgl Exp $
$PostgreSQL: pgsql/doc/src/sgml/mvcc.sgml,v 2.53 2005/11/07 17:36:44 tgl Exp $
-->
<chapter id="mvcc">
@ -991,18 +991,6 @@ UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 22222;
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
R-tree indexes
</term>
<listitem>
<para>
Share/exclusive index-level locks are used for read/write access.
Locks are released after the entire command is done.
</para>
</listitem>
</varlistentry>
</variablelist>
</para>
@ -1012,8 +1000,7 @@ UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 22222;
indexes, they are the recommended index type for concurrent
applications that need to index scalar data. When dealing with
non-scalar data, B-trees are not useful, and GiST indexes should
be used instead. R-tree indexes are deprecated and are likely
to disappear entirely in a future release.
be used instead.
</para>
</sect1>
</chapter>

View File

@ -1,5 +1,5 @@
<!--
$PostgreSQL: pgsql/doc/src/sgml/ref/create_index.sgml,v 1.51 2005/01/04 00:39:53 tgl Exp $
$PostgreSQL: pgsql/doc/src/sgml/ref/create_index.sgml,v 1.52 2005/11/07 17:36:44 tgl Exp $
PostgreSQL documentation
-->
@ -34,7 +34,7 @@ CREATE [ UNIQUE ] INDEX <replaceable class="parameter">name</replaceable> ON <re
<command>CREATE INDEX</command> constructs an index <replaceable
class="parameter">index_name</replaceable> on the specified table.
Indexes are primarily used to enhance database performance (though
inappropriate use will result in slower performance).
inappropriate use can result in slower performance).
</para>
<para>
@ -55,11 +55,7 @@ CREATE [ UNIQUE ] INDEX <replaceable class="parameter">name</replaceable> ON <re
<para>
<productname>PostgreSQL</productname> provides the index methods
B-tree, R-tree, hash, and GiST. The B-tree index method is an
implementation of Lehman-Yao high-concurrency B-trees. The R-tree
index method implements standard R-trees using Guttman's quadratic
split algorithm. The hash index method is an implementation of
Litwin's linear hashing. Users can also define their own index
B-tree, hash, and GiST. Users can also define their own index
methods, but that is fairly complicated.
</para>
@ -137,9 +133,9 @@ CREATE [ UNIQUE ] INDEX <replaceable class="parameter">name</replaceable> ON <re
<term><replaceable class="parameter">method</replaceable></term>
<listitem>
<para>
The name of the method to be used for the index. Choices are
The name of the index method to be used. Choices are
<literal>btree</literal>, <literal>hash</literal>,
<literal>rtree</literal>, and <literal>gist</literal>. The
and <literal>gist</literal>. The
default method is <literal>btree</literal>.
</para>
</listitem>
@ -243,6 +239,15 @@ CREATE [ UNIQUE ] INDEX <replaceable class="parameter">name</replaceable> ON <re
The best way to use indexes in such cases is to create a partial index
using an <literal>IS NULL</> predicate.
</para>
<para>
Prior releases of <productname>PostgreSQL</productname> also had an
R-tree index method. This method has been removed because
it had no significant advantages over the GiST method.
If <literal>USING rtree</> is specified, <command>CREATE INDEX</>
will interpret it as <literal>USING gist</>, to simplify conversion
of old databases to GiST.
</para>
</refsect1>
<refsect1>
@ -270,13 +275,13 @@ CREATE INDEX code_idx ON films(code) TABLESPACE indexspace;
Is this example correct?
</comment>
<para>
To create a R-tree index on a point attribute so that we
To create a GiST index on a point attribute so that we
can efficiently use box operators on the result of the
conversion function:
</para>
<programlisting>
CREATE INDEX pointloc
ON points USING RTREE (point2box(location) box_ops);
ON points USING GIST (point2box(location) box_ops);
SELECT * FROM points
WHERE point2box(points.pointloc) = boxes.box;
</programlisting>

View File

@ -1,5 +1,5 @@
<!--
$PostgreSQL: pgsql/doc/src/sgml/xindex.sgml,v 1.41 2005/07/19 01:27:59 neilc Exp $
$PostgreSQL: pgsql/doc/src/sgml/xindex.sgml,v 1.42 2005/11/07 17:36:44 tgl Exp $
-->
<sect1 id="xindex">
@ -170,8 +170,12 @@ $PostgreSQL: pgsql/doc/src/sgml/xindex.sgml,v 1.41 2005/07/19 01:27:59 neilc Exp
</table>
<para>
R-tree indexes express relationships in two-dimensional space.
They use twelve strategies, shown in
GiST indexes are even more flexible: they do not have a fixed set of
strategies at all. Instead, the <quote>consistency</> support routine
of each particular GiST operator class interprets the strategy numbers
however it likes. As an example, several of the built-in GiST index
operator classes index two-dimensional geometric objects, providing
the <quote>R-tree</> strategies shown in
<xref linkend="xindex-rtree-strat-table">. Four of these are true
two-dimensional tests (overlaps, same, contains, contained by);
four of them consider only the X direction; and the other four
@ -179,7 +183,7 @@ $PostgreSQL: pgsql/doc/src/sgml/xindex.sgml,v 1.41 2005/07/19 01:27:59 neilc Exp
</para>
<table tocentry="1" id="xindex-rtree-strat-table">
<title>R-tree Strategies</title>
<title>GiST Two-Dimensional <quote>R-tree</> Strategies</title>
<tgroup cols="2">
<thead>
<row>
@ -240,13 +244,6 @@ $PostgreSQL: pgsql/doc/src/sgml/xindex.sgml,v 1.41 2005/07/19 01:27:59 neilc Exp
</tgroup>
</table>
<para>
GiST indexes are even more flexible: they do not have a fixed set of
strategies at all. Instead, the <quote>consistency</> support routine
of each particular GiST operator class interprets the strategy numbers
however it likes.
</para>
<para>
Note that all strategy operators return Boolean values. In
practice, all operators defined as index method strategies must
@ -274,9 +271,8 @@ $PostgreSQL: pgsql/doc/src/sgml/xindex.sgml,v 1.41 2005/07/19 01:27:59 neilc Exp
additional support routines in order to work. For example, the B-tree
index method must be able to compare two keys and determine whether one
is greater than, equal to, or less than the other. Similarly, the
R-tree index method must be able to compute
intersections, unions, and sizes of rectangles. These
operations do not correspond to operators used in qualifications in
hash index method must be able to compute hash codes for key values.
These operations do not correspond to operators used in qualifications in
SQL commands; they are administrative routines used by
the index methods, internally.
</para>
@ -339,37 +335,6 @@ $PostgreSQL: pgsql/doc/src/sgml/xindex.sgml,v 1.41 2005/07/19 01:27:59 neilc Exp
</tgroup>
</table>
<para>
R-tree indexes require three support functions,
shown in <xref linkend="xindex-rtree-support-table">.
</para>
<table tocentry="1" id="xindex-rtree-support-table">
<title>R-tree Support Functions</title>
<tgroup cols="2">
<thead>
<row>
<entry>Function</entry>
<entry>Support Number</entry>
</row>
</thead>
<tbody>
<row>
<entry>union</entry>
<entry>1</entry>
</row>
<row>
<entry>intersection</entry>
<entry>2</entry>
</row>
<row>
<entry>size</entry>
<entry>3</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
GiST indexes require seven support functions,
shown in <xref linkend="xindex-gist-support-table">.
@ -746,7 +711,7 @@ SELECT * FROM table WHERE integer_column &lt; 4;
</programlisting>
can be satisfied exactly by a B-tree index on the integer column.
But there are cases where an index is useful as an inexact guide to
the matching rows. For example, if an R-tree index stores only
the matching rows. For example, if a GiST index stores only
bounding boxes for objects, then it cannot exactly satisfy a <literal>WHERE</>
condition that tests overlap between nonrectangular objects such as
polygons. Yet we could use the index to find objects whose bounding