mirror of
https://github.com/postgres/postgres.git
synced 2025-04-29 13:56:47 +03:00
GIN documentation and slightly improving GiST docs.
Thanks to Christopher Kings-Lynne <chris.kingslynne@gmail.com> for initial version and Jeff Davis <pgsql@j-davis.com> for inspection
This commit is contained in:
parent
4eef745fb1
commit
0ca9907ce4
@ -1,4 +1,4 @@
|
||||
<!-- $PostgreSQL: pgsql/doc/src/sgml/config.sgml,v 1.85 2006/09/08 15:55:52 tgl Exp $ -->
|
||||
<!-- $PostgreSQL: pgsql/doc/src/sgml/config.sgml,v 1.86 2006/09/14 11:16:27 teodor Exp $ -->
|
||||
|
||||
<chapter Id="runtime-config">
|
||||
<title>Server Configuration</title>
|
||||
@ -2172,7 +2172,20 @@ SELECT * FROM parent WHERE key = 2400;
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
|
||||
<varlistentry id="guc-gin-fuzzy-search-limit" xreflabel="gin_fuzzy_search_limit">
|
||||
<term><varname>gin_fuzzy_search_limit</varname> (<type>integer</type>)</term>
|
||||
<indexterm>
|
||||
<primary><varname>gin_fuzzy_search_limit</> configuration parameter</primary>
|
||||
</indexterm>
|
||||
<listitem>
|
||||
<para>
|
||||
Soft upper limit on the size of the set returned by a GIN index. For more
information see <xref linkend="gin-tips">.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
</variablelist>
|
||||
</sect2>
|
||||
</sect1>
|
||||
|
@ -1,4 +1,4 @@
|
||||
<!-- $PostgreSQL: pgsql/doc/src/sgml/filelist.sgml,v 1.46 2006/09/05 03:09:56 momjian Exp $ -->
|
||||
<!-- $PostgreSQL: pgsql/doc/src/sgml/filelist.sgml,v 1.47 2006/09/14 11:16:27 teodor Exp $ -->
|
||||
|
||||
<!entity history SYSTEM "history.sgml">
|
||||
<!entity info SYSTEM "info.sgml">
|
||||
@ -78,6 +78,7 @@
|
||||
<!entity catalogs SYSTEM "catalogs.sgml">
|
||||
<!entity geqo SYSTEM "geqo.sgml">
|
||||
<!entity gist SYSTEM "gist.sgml">
|
||||
<!entity gin SYSTEM "gin.sgml">
|
||||
<!entity planstats SYSTEM "planstats.sgml">
|
||||
<!entity indexam SYSTEM "indexam.sgml">
|
||||
<!entity nls SYSTEM "nls.sgml">
|
||||
|
@ -1,4 +1,4 @@
|
||||
<!-- $PostgreSQL: pgsql/doc/src/sgml/geqo.sgml,v 1.36 2006/03/10 19:10:48 momjian Exp $ -->
|
||||
<!-- $PostgreSQL: pgsql/doc/src/sgml/geqo.sgml,v 1.37 2006/09/14 11:16:27 teodor Exp $ -->
|
||||
|
||||
<chapter id="geqo">
|
||||
<chapterinfo>
|
||||
@ -49,8 +49,8 @@
|
||||
methods</firstterm> (e.g., nested loop, hash join, merge join in
|
||||
<productname>PostgreSQL</productname>) to process individual joins
|
||||
and a diversity of <firstterm>indexes</firstterm> (e.g.,
|
||||
B-tree, hash, GiST in <productname>PostgreSQL</productname>) as access
|
||||
paths for relations.
|
||||
B-tree, hash, GiST and GIN in <productname>PostgreSQL</productname>) as
|
||||
access paths for relations.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
|
231
doc/src/sgml/gin.sgml
Normal file
@ -0,0 +1,231 @@
|
||||
<!-- $PostgreSQL: pgsql/doc/src/sgml/gin.sgml,v 2.1 2006/09/14 11:16:27 teodor Exp $ -->
|
||||
|
||||
<chapter id="GIN">
|
||||
<title>GIN Indexes</title>
|
||||
|
||||
<indexterm>
|
||||
<primary>index</primary>
|
||||
<secondary>GIN</secondary>
|
||||
</indexterm>
|
||||
|
||||
<sect1 id="gin-intro">
|
||||
<title>Introduction</title>
|
||||
|
||||
<para>
|
||||
<acronym>GIN</acronym> stands for Generalized Inverted Index. It is
|
||||
an index structure storing a set of (key, posting list) pairs, where
|
||||
a posting list is the set of rows in which the key occurs. A single
row may contain many keys.
|
||||
</para>
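<para>
The (key, posting list) structure described above can be illustrated with a
toy in-memory sketch. This is not how <acronym>GIN</acronym> is implemented;
the type and function names (ToyInvertedIndex, toy_insert, toy_lookup) and the
fixed capacities are assumptions made only for this example.
</para>

```c
#include <assert.h>
#include <stddef.h>

#define MAX_KEYS 16   /* illustrative capacity, not a GIN limit */
#define MAX_ROWS 16   /* illustrative capacity, not a GIN limit */

/* One (key, posting list) pair: a key and the rows that contain it. */
typedef struct {
    int    key;
    int    rows[MAX_ROWS];
    size_t nrows;
} PostingEntry;

typedef struct {
    PostingEntry entries[MAX_KEYS];
    size_t       nentries;
} ToyInvertedIndex;

/* Find the entry for a key, creating an empty one if it does not exist. */
PostingEntry *toy_entry(ToyInvertedIndex *idx, int key)
{
    for (size_t i = 0; i < idx->nentries; i++)
        if (idx->entries[i].key == key)
            return &idx->entries[i];
    assert(idx->nentries < MAX_KEYS);
    PostingEntry *e = &idx->entries[idx->nentries++];
    e->key = key;
    e->nrows = 0;
    return e;
}

/* Index one row: the row id is appended to the posting list of each key. */
void toy_insert(ToyInvertedIndex *idx, int row, const int *keys, size_t nkeys)
{
    for (size_t i = 0; i < nkeys; i++) {
        PostingEntry *e = toy_entry(idx, keys[i]);
        /* The row is already listed if the key repeats within this row. */
        if (e->nrows > 0 && e->rows[e->nrows - 1] == row)
            continue;
        assert(e->nrows < MAX_ROWS);
        e->rows[e->nrows++] = row;
    }
}

/* Return the posting list for a key, or NULL if the key was never indexed. */
const PostingEntry *toy_lookup(const ToyInvertedIndex *idx, int key)
{
    for (size_t i = 0; i < idx->nentries; i++)
        if (idx->entries[i].key == key)
            return &idx->entries[i];
    return NULL;
}
```

<para>
A row with keys {20, 30, 20} adds its row id once to the posting list of 20 and
once to the posting list of 30, which is why one table row can touch many index
entries.
</para>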
|
||||
|
||||
<para>
|
||||
It is generalized in the sense that a <acronym>GIN</acronym> index
|
||||
does not need to be aware of the operation that it accelerates.
|
||||
Instead, it uses custom strategies defined for particular data types.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
One advantage of <acronym>GIN</acronym> is that it allows the development
|
||||
of custom data types with the appropriate access methods, by
|
||||
an expert in the domain of the data type, rather than a database expert.
|
||||
This is much the same advantage as using <acronym>GiST</acronym>.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The <acronym>GIN</acronym>
|
||||
implementation in <productname>PostgreSQL</productname> is primarily
|
||||
maintained by Teodor Sigaev and Oleg Bartunov, and there is more
|
||||
information on their
|
||||
<ulink url="http://www.sai.msu.su/~megera/oddmuse/index.cgi/Gin">website</ulink>.
|
||||
</para>
|
||||
|
||||
</sect1>
|
||||
|
||||
<sect1 id="gin-extensibility">
|
||||
<title>Extensibility</title>
|
||||
|
||||
<para>
|
||||
The <acronym>GIN</acronym> interface has a high level of abstraction,
|
||||
requiring the access method implementer to only implement the semantics of
|
||||
the data type being accessed. The <acronym>GIN</acronym> layer itself
|
||||
takes care of concurrency, logging and searching the tree structure.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
All it takes to get a <acronym>GIN</acronym> access method working
|
||||
is to implement four user-defined methods, which define the behavior of
|
||||
keys in the tree. In short, <acronym>GIN</acronym> combines extensibility
with generality, code reuse, and a clean interface.
|
||||
</para>
|
||||
|
||||
</sect1>
|
||||
|
||||
<sect1 id="gin-implementation">
|
||||
<title>Implementation</title>
|
||||
|
||||
<para>
|
||||
Internally, <acronym>GIN</acronym> consists of a B-tree index constructed
|
||||
over keys, where each key is an element of the indexed value
|
||||
(an element of an array, for example) and where each tuple in a leaf page is
|
||||
either a pointer to a B-tree over heap pointers (PT, posting tree), or a
|
||||
list of heap pointers (PL, posting list) if the tuple is small enough.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
There are four methods that an index operator class for
|
||||
<acronym>GIN</acronym> must provide (prototypes are in pseudocode):
|
||||
</para>
|
||||
|
||||
<variablelist>
|
||||
<varlistentry>
|
||||
<term>int compare( Datum a, Datum b )</term>
|
||||
<listitem>
|
||||
<para>
|
||||
Compares keys (not indexed values!) and returns an integer less than
|
||||
zero, zero, or greater than zero, indicating whether the first key is
|
||||
less than, equal to, or greater than the second.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term>Datum* extractValue(Datum inputValue, uint32 *nkeys)</term>
|
||||
<listitem>
|
||||
<para>
|
||||
Returns an array of keys of the value to be indexed. The number of
returned keys must be stored in nkeys.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term>Datum* extractQuery(Datum query, uint32 *nkeys,
|
||||
StrategyNumber n)</term>
|
||||
<listitem>
|
||||
<para>
|
||||
Returns an array of keys of the query to be executed. n contains the
strategy number of the operation (see <xref linkend="xindex-strategies">).
Depending on n, query may be of a different type.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term>bool consistent( bool check[], StrategyNumber n, Datum query)</term>
|
||||
<listitem>
|
||||
<para>
|
||||
Returns TRUE if the indexed value satisfies the query qualifier with
strategy n (or may satisfy it, if the operator is marked RECHECK in the
operator class). Each element of the check array is TRUE if the
corresponding query key is present in the indexed value: if
(check[i] == TRUE), the i-th key of the query is present in the indexed value.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
</variablelist>
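<para>
As a rough illustration of the first two methods, the following sketch treats
an indexed value as a plain C array of integers. It deliberately avoids the
real Datum-based calling convention; the names toy_compare and
toy_extract_value are hypothetical, and sorting and removing duplicates inside
the extraction step is a choice made for this example only.
</para>

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* compare: orders two keys (not whole indexed values). */
int toy_compare(int32_t a, int32_t b)
{
    return (a > b) - (a < b);
}

/* Adapter so that qsort() can use toy_compare. */
static int cmp_qsort(const void *pa, const void *pb)
{
    return toy_compare(*(const int32_t *) pa, *(const int32_t *) pb);
}

/*
 * extractValue: returns a malloc'd array holding the distinct keys of one
 * indexed value and stores their count in *nkeys. The caller frees the
 * result. Returns NULL (with *nkeys == 0) for an empty value or on
 * allocation failure.
 */
int32_t *toy_extract_value(const int32_t *value, uint32_t len, uint32_t *nkeys)
{
    assert(nkeys != NULL);
    *nkeys = 0;
    if (len == 0)
        return NULL;

    int32_t *keys = malloc(len * sizeof *keys);
    if (keys == NULL)
        return NULL;
    for (uint32_t i = 0; i < len; i++)
        keys[i] = value[i];

    qsort(keys, len, sizeof *keys, cmp_qsort);

    /* Keep one copy of each key; equal keys are adjacent after sorting. */
    uint32_t n = 1;
    for (uint32_t i = 1; i < len; i++)
        if (toy_compare(keys[i], keys[n - 1]) != 0)
            keys[n++] = keys[i];

    *nkeys = n;
    return keys;
}
```

<para>
An extractQuery sketch for array operators could reuse the same extraction,
since the query is also an array in that case.
</para>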
|
||||
|
||||
</sect1>
|
||||
|
||||
<sect1 id="gin-tips">
|
||||
<title>GIN tips and tricks</title>
|
||||
|
||||
<variablelist>
|
||||
<varlistentry>
|
||||
<term>Create vs insert</term>
|
||||
<listitem>
|
||||
<para>
|
||||
In most cases, insertion into a <acronym>GIN</acronym> index is slow because
many GIN keys may be inserted for each table row. So, when loading data
in bulk, it may be useful to drop the index and recreate it
after the data is loaded into the table.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term>gin_fuzzy_search_limit</term>
|
||||
<listitem>
|
||||
<para>
|
||||
The primary goal of developing <acronym>GIN</acronym> indexes was
to support highly scalable full-text search in
<productname>PostgreSQL</productname>, and there are often situations when
|
||||
a full-text search returns a very large set of results. Since reading
|
||||
tuples from the disk and sorting them could take a lot of time, this is
|
||||
unacceptable for production. (Note that the index search itself is very
|
||||
fast.)
|
||||
</para>
|
||||
<para>
|
||||
Such queries usually contain very frequent words, so the results are not
|
||||
very helpful. To facilitate execution of such queries
|
||||
<acronym>GIN</acronym> has a configurable soft upper limit on the size
|
||||
of the returned set, determined by the
|
||||
<varname>gin_fuzzy_search_limit</varname> GUC variable. It is set to 0 by
|
||||
default (no limit).
|
||||
</para>
|
||||
<para>
|
||||
If a non-zero search limit is set, then the returned set is a subset of
|
||||
the whole result set, chosen at random.
|
||||
</para>
|
||||
<para>
|
||||
"Soft" means that the actual number of returned results could slightly
|
||||
differ from the specified limit, depending on the query and the quality
|
||||
of the system's random number generator.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
</variablelist>
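<para>
One simple way to obtain a soft limit of the kind described for
<varname>gin_fuzzy_search_limit</varname> is to keep each result independently
with probability limit / total. The sketch below shows that idea only; it is
not taken from the <acronym>GIN</acronym> source, and the function name
toy_fuzzy_limit and its use of rand() are assumptions made for illustration.
</para>

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Copy a random subset of results[0..n) into out and return its size.
 * Each item is kept independently with probability limit / n, so the
 * expected size is limit but the actual size varies (a "soft" limit).
 * limit == 0 means no limit; out must have room for n items.
 */
size_t toy_fuzzy_limit(const int *results, size_t n, size_t limit, int *out)
{
    assert(results != NULL && out != NULL);

    double keep_prob = (limit == 0 || limit >= n)
        ? 1.0
        : (double) limit / (double) n;
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        /* r is uniform in [0, 1), so keep_prob == 1.0 keeps every item. */
        double r = (double) rand() / ((double) RAND_MAX + 1.0);
        if (r < keep_prob)
            out[kept++] = results[i];
    }
    return kept;
}
```

<para>
Because every item is tested separately, the number of kept items fluctuates
around the limit, and the fluctuation depends on the random number generator,
which matches the <quote>soft</quote> behavior described above.
</para>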
|
||||
|
||||
</sect1>
|
||||
|
||||
<sect1 id="gin-limit">
|
||||
<title>Limitations</title>
|
||||
|
||||
<para>
|
||||
<acronym>GIN</acronym> doesn't support full index scans because they would be
extremely inefficient: since there are usually many keys per value,
each heap pointer would be returned several times.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
When extractQuery returns zero keys, <acronym>GIN</acronym> will
emit an error. The meaning of an empty query differs between operator
classes and strategies (for example, any array contains the empty array,
but no array overlaps the empty array), so <acronym>GIN</acronym> can't
choose a reasonable answer.
|
||||
</para>
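<para>
The difference between the two operators on an empty query can be checked with
a small standalone sketch. The helper names toy_contains and toy_overlaps are
hypothetical and operate on plain C arrays, not on
<productname>PostgreSQL</productname> array values.
</para>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if x occurs in a[0..na). */
static bool has_elem(const int *a, size_t na, int x)
{
    for (size_t i = 0; i < na; i++)
        if (a[i] == x)
            return true;
    return false;
}

/* "a contains b": every element of b also occurs in a. */
bool toy_contains(const int *a, size_t na, const int *b, size_t nb)
{
    assert((a != NULL || na == 0) && (b != NULL || nb == 0));
    for (size_t i = 0; i < nb; i++)
        if (!has_elem(a, na, b[i]))
            return false;
    return true;    /* vacuously true when b is empty */
}

/* "a overlaps b": a and b share at least one element. */
bool toy_overlaps(const int *a, size_t na, const int *b, size_t nb)
{
    assert((a != NULL || na == 0) && (b != NULL || nb == 0));
    for (size_t i = 0; i < nb; i++)
        if (has_elem(a, na, b[i]))
            return true;
    return false;   /* always false when b is empty */
}
```

<para>
With an empty second argument, toy_contains is true for every array while
toy_overlaps is false for every array, so a query with zero keys has no single
answer that suits both strategies.
</para>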
|
||||
|
||||
<para>
|
||||
<acronym>GIN</acronym> searches keys only by equality matching. This may
|
||||
be improved in the future.
|
||||
</para>
|
||||
</sect1>
|
||||
<sect1 id="gin-examples">
|
||||
<title>Examples</title>
|
||||
|
||||
<para>
|
||||
The <productname>PostgreSQL</productname> source distribution includes
|
||||
<acronym>GIN</acronym> classes for one-dimensional arrays of all internal
|
||||
types. The following
|
||||
<filename>contrib</> modules also contain <acronym>GIN</acronym>
|
||||
operator classes:
|
||||
</para>
|
||||
|
||||
<variablelist>
|
||||
<varlistentry>
|
||||
<term>intarray</term>
|
||||
<listitem>
|
||||
<para>Enhanced support for int4[]</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term>tsearch2</term>
|
||||
<listitem>
|
||||
<para>Support for inverted text indexing. This is much faster for very
|
||||
large, mostly-static sets of documents.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
</variablelist>
</sect1>
|
||||
|
||||
</chapter>
|
@ -1,4 +1,4 @@
|
||||
<!-- $PostgreSQL: pgsql/doc/src/sgml/indices.sgml,v 1.61 2006/09/13 23:42:26 tgl Exp $ -->
|
||||
<!-- $PostgreSQL: pgsql/doc/src/sgml/indices.sgml,v 1.62 2006/09/14 11:16:27 teodor Exp $ -->
|
||||
|
||||
<chapter id="indexes">
|
||||
<title id="indexes-title">Indexes</title>
|
||||
@ -116,7 +116,7 @@ CREATE INDEX test1_id_index ON test1 (id);
|
||||
|
||||
<para>
|
||||
<productname>PostgreSQL</productname> provides several index types:
|
||||
B-tree, Hash, and GiST. Each index type uses a different
|
||||
B-tree, Hash, GIN and GiST. Each index type uses a different
|
||||
algorithm that is best suited to different types of queries.
|
||||
By default, the <command>CREATE INDEX</command> command will create a
|
||||
B-tree index, which fits the most common situations.
|
||||
@ -238,6 +238,37 @@ CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable>
|
||||
classes are available in the <literal>contrib</> collection or as separate
|
||||
projects. For more information see <xref linkend="GiST">.
|
||||
</para>
|
||||
<para>
|
||||
<indexterm>
|
||||
<primary>index</primary>
|
||||
<secondary>GIN</secondary>
|
||||
</indexterm>
|
||||
<indexterm>
|
||||
<primary>GIN</primary>
|
||||
<see>index</see>
|
||||
</indexterm>
|
||||
GIN is an inverted index, and it is usable for values that have more
than one key, such as arrays. Like GiST, GIN can support
many different user-defined indexing strategies, and the particular
operators with which a GIN index can be used vary depending on the
indexing strategy.
|
||||
As an example, the standard distribution of
|
||||
<productname>PostgreSQL</productname> includes GIN operator classes
|
||||
for one-dimensional arrays, which support indexed
|
||||
queries using these operators:
|
||||
|
||||
<simplelist>
|
||||
<member><literal><@</literal></member>
|
||||
<member><literal>@></literal></member>
|
||||
<member><literal>=</literal></member>
|
||||
<member><literal>&&</literal></member>
|
||||
</simplelist>
|
||||
|
||||
(See <xref linkend="functions-array"> for the meaning of
|
||||
these operators.)
|
||||
Other GIN operator classes are available in the <literal>contrib</>
tsearch2 and intarray modules. For more information see <xref linkend="GIN">.
|
||||
</para>
|
||||
</sect1>
|
||||
|
||||
|
||||
|
@ -1,4 +1,4 @@
|
||||
<!-- $PostgreSQL: pgsql/doc/src/sgml/mvcc.sgml,v 2.58 2006/09/03 01:59:09 momjian Exp $ -->
|
||||
<!-- $PostgreSQL: pgsql/doc/src/sgml/mvcc.sgml,v 2.59 2006/09/14 11:16:27 teodor Exp $ -->
|
||||
|
||||
<chapter id="mvcc">
|
||||
<title>Concurrency Control</title>
|
||||
@ -987,6 +987,20 @@ UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 22222;
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term>
|
||||
<acronym>GIN</acronym> indexes
|
||||
</term>
|
||||
<listitem>
|
||||
<para>
|
||||
Short-term share/exclusive page-level locks are used for
|
||||
read/write access. Locks are released immediately after each
|
||||
index row is fetched or inserted. However, note that a GIN index
usually requires several inserts for each table row.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
</variablelist>
|
||||
</para>
|
||||
|
||||
@ -995,7 +1009,7 @@ UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 22222;
|
||||
applications; since they also have more features than hash
|
||||
indexes, they are the recommended index type for concurrent
|
||||
applications that need to index scalar data. When dealing with
|
||||
non-scalar data, B-trees are not useful, and GiST indexes should
|
||||
non-scalar data, B-trees are not useful, and GiST or GIN indexes should
|
||||
be used instead.
|
||||
</para>
|
||||
</sect1>
|
||||
|
@ -1,5 +1,5 @@
|
||||
<!--
|
||||
$PostgreSQL: pgsql/doc/src/sgml/ref/create_opclass.sgml,v 1.15 2006/09/10 17:36:52 tgl Exp $
|
||||
$PostgreSQL: pgsql/doc/src/sgml/ref/create_opclass.sgml,v 1.16 2006/09/14 11:16:27 teodor Exp $
|
||||
PostgreSQL documentation
|
||||
-->
|
||||
|
||||
@ -192,7 +192,7 @@ CREATE OPERATOR CLASS <replaceable class="parameter">name</replaceable> [ DEFAUL
|
||||
<para>
|
||||
The data type actually stored in the index. Normally this is
|
||||
the same as the column data type, but some index methods
|
||||
(only GiST at this writing) allow it to be different. The
|
||||
(GIN and GiST for now) allow it to be different. The
|
||||
<literal>STORAGE</> clause must be omitted unless the index
|
||||
method allows a different type to be used.
|
||||
</para>
|
||||
|
@ -1,4 +1,4 @@
|
||||
<!-- $PostgreSQL: pgsql/doc/src/sgml/xindex.sgml,v 1.45 2006/09/05 03:09:56 momjian Exp $ -->
|
||||
<!-- $PostgreSQL: pgsql/doc/src/sgml/xindex.sgml,v 1.46 2006/09/14 11:16:27 teodor Exp $ -->
|
||||
|
||||
<sect1 id="xindex">
|
||||
<title>Interfacing Extensions To Indexes</title>
|
||||
@ -242,6 +242,44 @@
|
||||
</tgroup>
|
||||
</table>
|
||||
|
||||
<para>
|
||||
GIN indexes are similar to GiST in flexibility: they don't have a fixed set
of strategies. Instead, the <quote>consistency</> support routine
interprets the strategy numbers according to the operator class
definition. As an example, the strategies of the operator class over arrays
are shown in <xref linkend="xindex-gin-array-strat-table">.
|
||||
</para>
|
||||
|
||||
<table tocentry="1" id="xindex-gin-array-strat-table">
|
||||
<title>GIN Array Strategies</title>
|
||||
<tgroup cols="2">
|
||||
<thead>
|
||||
<row>
|
||||
<entry>Operation</entry>
|
||||
<entry>Strategy Number</entry>
|
||||
</row>
|
||||
</thead>
|
||||
<tbody>
|
||||
<row>
|
||||
<entry>overlap</entry>
|
||||
<entry>1</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>contains</entry>
|
||||
<entry>2</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>is contained by</entry>
|
||||
<entry>3</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>equal</entry>
|
||||
<entry>4</entry>
|
||||
</row>
|
||||
</tbody>
|
||||
</tgroup>
|
||||
</table>
|
||||
|
||||
<para>
|
||||
Note that all strategy operators return Boolean values. In
|
||||
practice, all operators defined as index method strategies must
|
||||
@ -349,37 +387,84 @@
|
||||
</thead>
|
||||
<tbody>
|
||||
<row>
|
||||
<entry>consistent</entry>
|
||||
<entry>consistent - determine whether key satisfies the
|
||||
query qualifier</entry>
|
||||
<entry>1</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>union</entry>
|
||||
<entry>union - compute union of a set of given keys</entry>
|
||||
<entry>2</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>compress</entry>
|
||||
<entry>compress - compute a compressed representation of a key or value
|
||||
to be indexed</entry>
|
||||
<entry>3</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>decompress</entry>
|
||||
<entry>decompress - compute a decompressed representation of a
|
||||
compressed key </entry>
|
||||
<entry>4</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>penalty</entry>
|
||||
<entry>penalty - compute penalty for inserting new key into subtree
|
||||
with given subtree's key</entry>
|
||||
<entry>5</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>picksplit</entry>
|
||||
<entry>picksplit - determine which entries of a page are to be moved
|
||||
to the new page and compute the union keys for resulting pages </entry>
|
||||
<entry>6</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>equal</entry>
|
||||
<entry>equal - compare two keys and return true if they are equal
|
||||
</entry>
|
||||
<entry>7</entry>
|
||||
</row>
|
||||
</tbody>
|
||||
</tgroup>
|
||||
</table>
|
||||
|
||||
<para>
|
||||
GIN indexes require four support functions,
|
||||
shown in <xref linkend="xindex-gin-support-table">.
|
||||
</para>
|
||||
|
||||
<table tocentry="1" id="xindex-gin-support-table">
|
||||
<title>GIN Support Functions</title>
|
||||
<tgroup cols="2">
|
||||
<thead>
|
||||
<row>
|
||||
<entry>Function</entry>
|
||||
<entry>Support Number</entry>
|
||||
</row>
|
||||
</thead>
|
||||
<tbody>
|
||||
<row>
|
||||
<entry>
|
||||
compare - Compare two keys and return an integer less than zero, zero, or
|
||||
greater than zero, indicating whether the first key is less than, equal to,
|
||||
or greater than the second.
|
||||
</entry>
|
||||
<entry>1</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>extractValue - extract keys from value to be indexed</entry>
|
||||
<entry>2</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>extractQuery - extract keys from query</entry>
|
||||
<entry>3</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry>consistent - determine whether the value matches the
query</entry>
|
||||
<entry>4</entry>
|
||||
</row>
|
||||
|
||||
</tbody>
|
||||
</tgroup>
|
||||
</table>
|
||||
|
||||
<para>
|
||||
Unlike strategy operators, support functions return whichever data
|
||||
type the particular index method expects; for example in the case
|
||||
|