R-tree is dead ... long live GiST.

2025-07-28 23:42:10 +03:00 · 2005-11-07 17:36:47 +00:00
parent 645adf5de8
commit 2a8d3d83ef
46 changed files with 213 additions and 2954 deletions
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/backup.sgml,v 2.75 2005/11/04 23:13:59 petere Exp $
+$PostgreSQL: pgsql/doc/src/sgml/backup.sgml,v 2.76 2005/11/07 17:36:44 tgl Exp $
 -->
 <chapter id="backup">
 <title>Backup and Restore</title>
@ -1129,8 +1129,8 @@ restore_command = 'copy /mnt/server/archivedir/%f "%p"'  # Windows
  <itemizedlist>
   <listitem>
    <para>
-     Operations on hash and R-tree indexes are
-     not presently WAL-logged, so replay will not update these index types.
+     Operations on hash indexes are
+     not presently WAL-logged, so replay will not update these indexes.
     The recommended workaround is to manually <command>REINDEX</> each
     such index after completing a recovery operation.
    </para>
--- a/doc/src/sgml/geqo.sgml
+++ b/doc/src/sgml/geqo.sgml
@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/geqo.sgml,v 1.33 2005/10/25 13:38:09 momjian Exp $
+$PostgreSQL: pgsql/doc/src/sgml/geqo.sgml,v 1.34 2005/11/07 17:36:44 tgl Exp $
 Genetic Optimizer
 -->

@ -51,8 +51,8 @@ Genetic Optimizer
    caused by the support of a variety of <firstterm>join
    methods</firstterm> (e.g., nested loop, hash join, merge join in
    <productname>PostgreSQL</productname>) to process individual joins
-    and a diversity of <firstterm>indexes</firstterm> (e.g., R-tree,
-    B-tree, hash in <productname>PostgreSQL</productname>) as access
+    and a diversity of <firstterm>indexes</firstterm> (e.g.,
+    B-tree, hash, GiST in <productname>PostgreSQL</productname>) as access
    paths for relations.
   </para>

--- a/doc/src/sgml/gist.sgml
+++ b/doc/src/sgml/gist.sgml
@ -1,25 +1,22 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/gist.sgml,v 1.24 2005/11/04 23:14:00 petere Exp $
+$PostgreSQL: pgsql/doc/src/sgml/gist.sgml,v 1.25 2005/11/07 17:36:44 tgl Exp $
 -->

 <chapter id="GiST">
 <title>GiST Indexes</title>

-<sect1 id="gist-intro">
- <title>Introduction</title>
-
- <para>
   <indexterm>
    <primary>index</primary>
    <secondary>GiST</secondary>
   </indexterm>
-   <indexterm>
-    <primary>GiST</primary>
-    <see>index</see>
-   </indexterm>
+
+<sect1 id="gist-intro">
+ <title>Introduction</title>
+
+ <para>
   <acronym>GiST</acronym> stands for Generalized Search Tree.  It is a
   balanced, tree-structured access method, that acts as a base template in
-   which to implement arbitrary indexing schemes. B+-trees, R-trees and many
+   which to implement arbitrary indexing schemes. B-trees, R-trees and many
   other indexing schemes can be implemented in <acronym>GiST</acronym>.
 </para>

@ -60,17 +57,17 @@ $PostgreSQL: pgsql/doc/src/sgml/gist.sgml,v 1.24 2005/11/04 23:14:00 petere Exp
 <para>
   This extensibility should not be confused with the extensibility of the
   other standard search trees in terms of the data they can handle.  For
-   example, <productname>PostgreSQL</productname> supports extensible B+-trees
-   and R-trees. That means that you can use
-   <productname>PostgreSQL</productname> to build a B+-tree or R-tree over any
-   data type you want. But B+-trees only support range predicates
+   example, <productname>PostgreSQL</productname> supports extensible B-trees
+   and hash indexes. That means that you can use
+   <productname>PostgreSQL</productname> to build a B-tree or hash over any
+   data type you want. But B-trees only support range predicates
   (<literal>&lt;</literal>, <literal>=</literal>, <literal>&gt;</literal>),
-   and R-trees only support n-D range queries (contains, contained, equals).
+   and hash indexes only support equality queries.
 </para>
 
 <para>
   So if you index, say, an image collection with a
-   <productname>PostgreSQL</productname> B+-tree, you can only issue queries
+   <productname>PostgreSQL</productname> B-tree, you can only issue queries
   such as <quote>is imagex equal to imagey</quote>, <quote>is imagex less
   than imagey</quote> and <quote>is imagex greater than imagey</quote>?
   Depending on how you define <quote>equals</quote>, <quote>less than</quote>
@ -84,7 +81,7 @@ $PostgreSQL: pgsql/doc/src/sgml/gist.sgml,v 1.24 2005/11/04 23:14:00 petere Exp
   All it takes to get a <acronym>GiST</acronym> access method up and running
   is to implement seven user-defined methods, which define the behavior of
   keys in the tree. Of course these methods have to be pretty fancy to
-   support fancy queries, but for all the standard queries (B+-trees,
+   support fancy queries, but for all the standard queries (B-trees,
   R-trees, etc.) they're relatively straightforward. In short,
   <acronym>GiST</acronym> combines extensibility along with generality, code
   reuse, and a clean interface.
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/indices.sgml,v 1.54 2005/11/04 23:14:00 petere Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/indices.sgml,v 1.55 2005/11/07 17:36:44 tgl Exp $ -->

 <chapter id="indexes">
 <title id="indexes-title">Indexes</title>
@ -104,7 +104,7 @@ CREATE INDEX test1_id_index ON test1 (id);

  <para>
   <productname>PostgreSQL</productname> provides several index types:
-   B-tree, R-tree, Hash, and GiST.  Each index type uses a different
+   B-tree, Hash, and GiST.  Each index type uses a different
   algorithm that is best suited to different types of queries.
   By default, the <command>CREATE INDEX</command> command will create a
   B-tree index, which fits the most common situations.
@ -155,20 +155,51 @@ CREATE INDEX test1_id_index ON test1 (id);
  <para>
   <indexterm>
    <primary>index</primary>
-    <secondary>R-tree</secondary>
+    <secondary>hash</secondary>
   </indexterm>
   <indexterm>
-    <primary>R-tree</primary>
+    <primary>hash</primary>
    <see>index</see>
   </indexterm>
-   R-tree indexes are suited for queries on two-dimensional spatial data.
-   To create an R-tree index, use a command of the form
+   Hash indexes can only handle simple equality comparisons.
+   The query planner will consider using a hash index whenever an
+   indexed column is involved in a comparison using the
+   <literal>=</literal> operator.  The following command is used to
+   create a hash index:
 <synopsis>
-CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable> USING rtree (<replaceable>column</replaceable>);
+CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable> USING hash (<replaceable>column</replaceable>);
 </synopsis>
-   The <productname>PostgreSQL</productname> query planner will
-   consider using an R-tree index whenever an indexed column is
-   involved in a comparison using one of these operators:
+  </para>
+
+  <note>
+   <para>
+    Testing has shown <productname>PostgreSQL</productname>'s hash
+    indexes to perform no better than B-tree indexes, and the
+    index size and build time for hash indexes is much worse.
+    Furthermore, hash index operations are not presently WAL-logged,
+    so hash indexes may need to be rebuilt with <command>REINDEX</>
+    after a database crash.
+    For these reasons, hash index use is presently discouraged.
+   </para>
+  </note>  
+
+  <para>
+   <indexterm>
+    <primary>index</primary>
+    <secondary>GiST</secondary>
+   </indexterm>
+   <indexterm>
+    <primary>GiST</primary>
+    <see>index</see>
+   </indexterm>
+   GiST indexes are not a single kind of index, but rather an infrastructure
+   within which many different indexing strategies can be implemented.
+   Accordingly, the particular operators with which a GiST index can be
+   used vary depending on the indexing strategy (the <firstterm>operator
+   class</>).  As an example, the standard distribution of
+   <productname>PostgreSQL</productname> includes GiST operator classes
+   for several two-dimensional geometric data types, which support indexed
+   queries using these operators:

   <simplelist>
    <member><literal>&lt;&lt;</literal></member>
@ -187,64 +218,10 @@ CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable>

   (See <xref linkend="functions-geometry"> for the meaning of
   these operators.)
-  </para>
-
-  <para>
-   <indexterm>
-    <primary>index</primary>
-    <secondary>hash</secondary>
-   </indexterm>
-   <indexterm>
-    <primary>hash</primary>
-    <see>index</see>
-   </indexterm>
-   Hash indexes can only handle simple equality comparisons.
-   The query planner will consider using a hash index whenever an
-   indexed column is involved in a comparison using the
-   <literal>=</literal> operator.  The following command is used to
-   create a hash index:
-<synopsis>
-CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable> USING hash (<replaceable>column</replaceable>);
-</synopsis>
-  </para>
-
-  <para>
-   GiST indexes are not a single kind of index, but rather an infrastructure
-   within which many different indexing strategies can be implemented.
-   Accordingly, the particular operators with which a GiST index can be
-   used vary depending on the indexing strategy (the <firstterm>operator
-   class</>).  The standard distribution of
-   <productname>PostgreSQL</productname> includes GiST operator classes
-   equivalent to the R-tree operator classes, and many other GiST operator
+   Many other GiST operator
   classes are available in the <literal>contrib</> collection or as separate
   projects.  For more information see <xref linkend="GiST">.
  </para>
-
-  <note>
-   <para>
-    Testing has shown <productname>PostgreSQL</productname>'s hash
-    indexes to perform no better than B-tree indexes, and the
-    index size and build time for hash indexes is much worse.
-    Furthermore, hash index operations are not presently WAL-logged,
-    so hash indexes may need to be rebuilt with <command>REINDEX</>
-    after a database crash.
-    For these reasons, hash index use is presently discouraged.
-   </para>
-
-   <para>
-    Similarly, R-tree indexes do not seem to have any performance
-    advantages compared to the equivalent operations of GiST indexes.
-    Like hash indexes, they are not WAL-logged and may need
-    reindexing after a database crash.
-   </para>
-
-   <para>
-    While the problems with hash indexes may be fixed eventually,
-    it is likely that the R-tree index type will be retired in a future
-    release.  Users are encouraged to migrate applications that use R-tree
-    indexes to GiST indexes.
-   </para>
-  </note>  
 </sect1>


--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/mvcc.sgml,v 2.52 2005/10/21 01:41:28 tgl Exp $
+$PostgreSQL: pgsql/doc/src/sgml/mvcc.sgml,v 2.53 2005/11/07 17:36:44 tgl Exp $
 -->

 <chapter id="mvcc">
@ -991,18 +991,6 @@ UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 22222;
       </para>
      </listitem>
     </varlistentry>
-
-     <varlistentry>
-      <term>
-       R-tree indexes
-      </term>
-      <listitem>
-       <para>
-        Share/exclusive index-level locks are used for read/write access.
-        Locks are released after the entire command is done.
-       </para>
-      </listitem>
-     </varlistentry>
    </variablelist>
   </para>

@ -1012,8 +1000,7 @@ UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 22222;
    indexes, they are the recommended index type for concurrent
    applications that need to index scalar data. When dealing with
    non-scalar data, B-trees are not useful, and GiST indexes should
-    be used instead.  R-tree indexes are deprecated and are likely
-    to disappear entirely in a future release.
+    be used instead.
   </para>
  </sect1>
 </chapter>
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/ref/create_index.sgml,v 1.51 2005/01/04 00:39:53 tgl Exp $
+$PostgreSQL: pgsql/doc/src/sgml/ref/create_index.sgml,v 1.52 2005/11/07 17:36:44 tgl Exp $
 PostgreSQL documentation
 -->

@ -34,7 +34,7 @@ CREATE [ UNIQUE ] INDEX <replaceable class="parameter">name</replaceable> ON <re
   <command>CREATE INDEX</command> constructs an index <replaceable
   class="parameter">index_name</replaceable> on the specified table.
   Indexes are primarily used to enhance database performance (though
-   inappropriate use will result in slower performance).
+   inappropriate use can result in slower performance).
  </para>

  <para>
@ -55,11 +55,7 @@ CREATE [ UNIQUE ] INDEX <replaceable class="parameter">name</replaceable> ON <re

  <para>
   <productname>PostgreSQL</productname> provides the index methods
-   B-tree, R-tree, hash, and GiST. The B-tree index method is an
-   implementation of Lehman-Yao high-concurrency B-trees. The R-tree
-   index method implements standard R-trees using Guttman's quadratic
-   split algorithm. The hash index method is an implementation of
-   Litwin's linear hashing.  Users can also define their own index
+   B-tree, hash, and GiST.  Users can also define their own index
   methods, but that is fairly complicated.
  </para>

@ -137,9 +133,9 @@ CREATE [ UNIQUE ] INDEX <replaceable class="parameter">name</replaceable> ON <re
      <term><replaceable class="parameter">method</replaceable></term>
      <listitem>
       <para>
-        The name of the method to be used for the index.  Choices are
+        The name of the index method to be used.  Choices are
        <literal>btree</literal>, <literal>hash</literal>,
-        <literal>rtree</literal>, and <literal>gist</literal>.  The
+        and <literal>gist</literal>.  The
        default method is <literal>btree</literal>.
       </para>
      </listitem>
@ -243,6 +239,15 @@ CREATE [ UNIQUE ] INDEX <replaceable class="parameter">name</replaceable> ON <re
   The best way to use indexes in such cases is to create a partial index
   using an <literal>IS NULL</> predicate.
  </para>
+
+  <para>
+   Prior releases of <productname>PostgreSQL</productname> also had an
+   R-tree index method.  This method has been removed because
+   it had no significant advantages over the GiST method.
+   If <literal>USING rtree</> is specified, <command>CREATE INDEX</>
+   will interpret it as <literal>USING gist</>, to simplify conversion
+   of old databases to GiST.
+  </para>
 </refsect1>

 <refsect1>
@ -270,13 +275,13 @@ CREATE INDEX code_idx ON films(code) TABLESPACE indexspace;
 Is this example correct?
 </comment>
  <para>
-   To create a R-tree index on a point attribute so that we
+   To create a GiST index on a point attribute so that we
   can efficiently use box operators on the result of the
   conversion function:
  </para>
  <programlisting>
 CREATE INDEX pointloc
-    ON points USING RTREE (point2box(location) box_ops);
+    ON points USING GIST (point2box(location) box_ops);
 SELECT * FROM points
    WHERE point2box(points.pointloc) = boxes.box;
  </programlisting>
--- a/doc/src/sgml/xindex.sgml
+++ b/doc/src/sgml/xindex.sgml
@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/xindex.sgml,v 1.41 2005/07/19 01:27:59 neilc Exp $
+$PostgreSQL: pgsql/doc/src/sgml/xindex.sgml,v 1.42 2005/11/07 17:36:44 tgl Exp $
 -->

 <sect1 id="xindex">
@ -170,8 +170,12 @@ $PostgreSQL: pgsql/doc/src/sgml/xindex.sgml,v 1.41 2005/07/19 01:27:59 neilc Exp
   </table>

  <para>
-   R-tree indexes express relationships in two-dimensional space.
-   They use twelve strategies, shown in
+   GiST indexes are even more flexible: they do not have a fixed set of
+   strategies at all.  Instead, the <quote>consistency</> support routine
+   of each particular GiST operator class interprets the strategy numbers
+   however it likes.  As an example, several of the built-in GiST index
+   operator classes index two-dimensional geometric objects, providing
+   the <quote>R-tree</> strategies shown in
   <xref linkend="xindex-rtree-strat-table">.  Four of these are true
   two-dimensional tests (overlaps, same, contains, contained by);
   four of them consider only the X direction; and the other four
@ -179,7 +183,7 @@ $PostgreSQL: pgsql/doc/src/sgml/xindex.sgml,v 1.41 2005/07/19 01:27:59 neilc Exp
  </para>

   <table tocentry="1" id="xindex-rtree-strat-table">
-    <title>R-tree Strategies</title>
+    <title>GiST Two-Dimensional <quote>R-tree</> Strategies</title>
    <tgroup cols="2">
     <thead>
      <row>
@ -240,13 +244,6 @@ $PostgreSQL: pgsql/doc/src/sgml/xindex.sgml,v 1.41 2005/07/19 01:27:59 neilc Exp
    </tgroup>
   </table>

-  <para>
-   GiST indexes are even more flexible: they do not have a fixed set of
-   strategies at all.  Instead, the <quote>consistency</> support routine
-   of each particular GiST operator class interprets the strategy numbers
-   however it likes.
-  </para>
-
  <para>
   Note that all strategy operators return Boolean values.  In
   practice, all operators defined as index method strategies must
@ -274,9 +271,8 @@ $PostgreSQL: pgsql/doc/src/sgml/xindex.sgml,v 1.41 2005/07/19 01:27:59 neilc Exp
   additional support routines in order to work. For example, the B-tree
   index method must be able to compare two keys and determine whether one
   is greater than, equal to, or less than the other.  Similarly, the
-   R-tree index method must be able to compute
-   intersections,  unions, and sizes of rectangles.  These
-   operations do not correspond to operators used in qualifications in
+   hash index method must be able to compute hash codes for key values.
+   These operations do not correspond to operators used in qualifications in
   SQL commands;  they are administrative routines used by
   the index methods, internally.
  </para>
@ -339,37 +335,6 @@ $PostgreSQL: pgsql/doc/src/sgml/xindex.sgml,v 1.41 2005/07/19 01:27:59 neilc Exp
    </tgroup>
   </table>

-  <para>
-   R-tree indexes require three support functions,
-   shown in <xref linkend="xindex-rtree-support-table">.
-  </para>
-
-   <table tocentry="1" id="xindex-rtree-support-table">
-    <title>R-tree Support Functions</title>
-    <tgroup cols="2">
-     <thead>
-      <row>
-       <entry>Function</entry>
-       <entry>Support Number</entry>
-      </row>
-     </thead>
-     <tbody>
-      <row>
-       <entry>union</entry>
-       <entry>1</entry>
-      </row>
-      <row>
-       <entry>intersection</entry>
-       <entry>2</entry>
-      </row>
-      <row>
-       <entry>size</entry>
-       <entry>3</entry>
-      </row>
-     </tbody>
-    </tgroup>
-   </table>
-
  <para>
   GiST indexes require seven support functions,
   shown in <xref linkend="xindex-gist-support-table">.
@ -746,7 +711,7 @@ SELECT * FROM table WHERE integer_column &lt; 4;
 </programlisting>
   can be satisfied exactly by a B-tree index on the integer column.
   But there are cases where an index is useful as an inexact guide to
-   the matching rows.  For example, if an R-tree index stores only
+   the matching rows.  For example, if a GiST index stores only
   bounding boxes for objects, then it cannot exactly satisfy a <literal>WHERE</>
   condition that tests overlap between nonrectangular objects such as
   polygons.  Yet we could use the index to find objects whose bounding