Mirror of https://github.com/postgres/postgres.git (synced 2025-07-30 11:03:19 +03:00)
Add external documentation for KNNGIST.
@@ -78,7 +78,7 @@
 
 <para>
 All it takes to get a <acronym>GiST</acronym> access method up and running
-is to implement seven user-defined methods, which define the behavior of
+is to implement several user-defined methods, which define the behavior of
 keys in the tree. Of course these methods have to be pretty fancy to
 support fancy queries, but for all the standard queries (B-trees,
 R-trees, etc.) they're relatively straightforward. In short,
@@ -93,12 +93,13 @@
 
 <para>
 There are seven methods that an index operator class for
-<acronym>GiST</acronym> must provide. Correctness of the index is ensured
+<acronym>GiST</acronym> must provide, and an eighth that is optional.
+Correctness of the index is ensured
 by proper implementation of the <function>same</>, <function>consistent</>
 and <function>union</> methods, while efficiency (size and speed) of the
 index will depend on the <function>penalty</> and <function>picksplit</>
 methods.
-The remaining two methods are <function>compress</> and
+The remaining two basic methods are <function>compress</> and
 <function>decompress</>, which allow an index to have internal tree data of
 a different type than the data it indexes. The leaves are to be of the
 indexed data type, while the other tree nodes can be of any C struct (but
@@ -106,6 +107,9 @@
 see about <literal>varlena</> for variable sized data). If the tree's
 internal data type exists at the SQL level, the <literal>STORAGE</> option
 of the <command>CREATE OPERATOR CLASS</> command can be used.
+The optional eighth method is <function>distance</>, which is needed
+if the operator class wishes to support ordered scans (nearest-neighbor
+searches).
 </para>
 
 <variablelist>
@@ -567,6 +571,73 @@ my_same(PG_FUNCTION_ARGS)
 </listitem>
 </varlistentry>
 
+<varlistentry>
+<term><function>distance</></term>
+<listitem>
+<para>
+Given an index entry <literal>p</> and a query value <literal>q</>,
+this function determines the index entry's
+<quote>distance</> from the query value. This function must be
+supplied if the operator class contains any ordering operators.
+A query using the ordering operator will be implemented by returning
+index entries with the smallest <quote>distance</> values first,
+so the results must be consistent with the operator's semantics.
+For a leaf index entry the result just represents the distance to
+the index entry; for an internal tree node, the result must be the
+smallest distance that any child entry could have.
+</para>
+
+<para>
+The <acronym>SQL</> declaration of the function must look like this:
+
+<programlisting>
+CREATE OR REPLACE FUNCTION my_distance(internal, data_type, smallint, oid)
+RETURNS float8
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+</programlisting>
+
+And the matching code in the C module could then follow this skeleton:
+
+<programlisting>
+Datum my_distance(PG_FUNCTION_ARGS);
+PG_FUNCTION_INFO_V1(my_distance);
+
+Datum
+my_distance(PG_FUNCTION_ARGS)
+{
+GISTENTRY *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
+data_type *query = PG_GETARG_DATA_TYPE_P(1);
+StrategyNumber strategy = (StrategyNumber) PG_GETARG_UINT16(2);
+/* Oid subtype = PG_GETARG_OID(3); */
+data_type *key = DatumGetDataType(entry->key);
+double retval;
+
+/*
+* determine return value as a function of strategy, key and query.
+*/
+
+PG_RETURN_FLOAT8(retval);
+}
+</programlisting>
+
+The arguments to the <function>distance</> function are identical to
+the arguments of the <function>consistent</> function, except that no
+recheck flag is used. The distance to a leaf index entry must always
+be determined exactly, since there is no way to re-order the tuples
+once they are returned. Some approximation is allowed when determining
+the distance to an internal tree node, so long as the result is never
+greater than any child's actual distance. Thus, for example, distance
+to a bounding box is usually sufficient in geometric applications. The
+result value can be any finite <type>float8</> value. (Infinity and
+minus infinity are used internally to handle cases such as nulls, so it
+is not recommended that <function>distance</> functions return these
+values.)
+</para>
+
+</listitem>
+</varlistentry>
+
 </variablelist>
 
 </sect1>
|
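The leaf/internal-node rule in the `distance` documentation can be checked outside PostgreSQL with a small model. The C sketch below is not written against the fmgr or GiST API (no `GISTENTRY` or `PG_GETARG_*`); it assumes a hypothetical point type whose internal keys are bounding boxes, and the names `Point2`, `Box2`, `leaf_dist2`, `internal_dist2` and `bounding_box` are invented for illustration. It returns squared Euclidean distances, which sort in the same order as the true distances, so no math library is needed.

```c
#include <assert.h>

/* Hypothetical key types, not PostgreSQL's own point and box structs. */
typedef struct { double x, y; } Point2;
typedef struct { double xlo, ylo, xhi, yhi; } Box2;

static double sq(double v) { return v * v; }

/* Leaf entry: the (squared) distance must be exact. */
static double leaf_dist2(Point2 key, Point2 query)
{
    return sq(key.x - query.x) + sq(key.y - query.y);
}

/* Gap between a coordinate and an interval along one axis (0 if inside). */
static double axis_gap(double v, double lo, double hi)
{
    if (v < lo)
        return lo - v;
    if (v > hi)
        return v - hi;
    return 0.0;
}

/*
 * Internal node: squared distance from the query to the node's bounding
 * box. Every child lies inside the box, so the result is never greater
 * than any child's actual distance, which is what the text requires.
 */
static double internal_dist2(Box2 box, Point2 query)
{
    assert(box.xlo <= box.xhi && box.ylo <= box.yhi);
    return sq(axis_gap(query.x, box.xlo, box.xhi)) +
           sq(axis_gap(query.y, box.ylo, box.yhi));
}

/* Bounding box of n points, i.e. the key an internal node would hold. */
static Box2 bounding_box(const Point2 *pts, int n)
{
    assert(n > 0);
    Box2 b = { pts[0].x, pts[0].y, pts[0].x, pts[0].y };
    for (int i = 1; i < n; i++) {
        if (pts[i].x < b.xlo) b.xlo = pts[i].x;
        if (pts[i].x > b.xhi) b.xhi = pts[i].x;
        if (pts[i].y < b.ylo) b.ylo = pts[i].y;
        if (pts[i].y > b.yhi) b.yhi = pts[i].y;
    }
    return b;
}
```

The box distance is the kind of approximation the text permits: it can be smaller than the distance to the closest child (it is 0 for a query inside the box), but never larger, so no child can be returned out of order.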
@@ -505,11 +505,31 @@ amrestrpos (IndexScanDesc scan);
 
 <para>
 Some access methods return index entries in a well-defined order, others
-do not. If entries are returned in sorted order, the access method should
-set <structname>pg_am</>.<structfield>amcanorder</> true to indicate that
-it supports ordered scans.
-All such access methods must use btree-compatible strategy numbers for
-their equality and ordering operators.
+do not. There are actually two different ways that an access method can
+support sorted output:
+
+<itemizedlist>
+<listitem>
+<para>
+Access methods that always return entries in the natural ordering
+of their data (such as btree) should set
+<structname>pg_am</>.<structfield>amcanorder</> to true.
+Currently, such access methods must use btree-compatible strategy
+numbers for their equality and ordering operators.
+</para>
+</listitem>
+<listitem>
+<para>
+Access methods that support ordering operators should set
+<structname>pg_am</>.<structfield>amcanorderbyop</> to true.
+This indicates that the index is capable of returning entries in
+an order satisfying <literal>ORDER BY</> <replaceable>index_key</>
+<replaceable>operator</> <replaceable>constant</>. Scan modifiers
+of that form can be passed to <function>amrescan</> as described
+previously.
+</para>
+</listitem>
+</itemizedlist>
 </para>
 
 <para>
@@ -521,7 +541,7 @@ amrestrpos (IndexScanDesc scan);
 the normal front-to-back direction, so <function>amgettuple</> must return
 the last matching tuple in the index, rather than the first one as it
 normally would. (This will only occur for access
-methods that advertise they support ordered scans.) After the
+methods that set <structfield>amcanorder</> to true.) After the
 first call, <function>amgettuple</> must be prepared to advance the scan in
 either direction from the most recently returned entry. (But if
 <structname>pg_am</>.<structfield>amcanbackward</> is false, all subsequent
@@ -563,7 +583,8 @@ amrestrpos (IndexScanDesc scan);
 tuples at once and marking or restoring scan positions isn't
 supported. Secondly, the tuples are returned in a bitmap which doesn't
 have any specific ordering, which is why <function>amgetbitmap</> doesn't
-take a <literal>direction</> argument. Finally, <function>amgetbitmap</>
+take a <literal>direction</> argument. (Ordering operators will never be
+supplied for such a scan, either.) Finally, <function>amgetbitmap</>
 does not guarantee any locking of the returned tuples, with implications
 spelled out in <xref linkend="index-locking">.
 </para>
|
@@ -167,6 +167,11 @@ CREATE INDEX test1_id_index ON test1 (id);
 upper/lower case conversion.
 </para>
 
+<para>
+B-tree indexes can also be used to retrieve data in sorted order.
+This is not always faster than a simple scan and sort, but it is
+often helpful.
+</para>
 
 <para>
 <indexterm>
@@ -236,6 +241,18 @@ CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable>
 classes are available in the <literal>contrib</> collection or as separate
 projects. For more information see <xref linkend="GiST">.
 </para>
+
+<para>
+GiST indexes are also capable of optimizing <quote>nearest-neighbor</>
+searches, such as
+<programlisting><![CDATA[
+SELECT * FROM places ORDER BY location <-> point '(101,456)' LIMIT 10;
+]]>
+</programlisting>
+which finds the ten places closest to a given target point. The ability
+to do this is again dependent on the particular operator class being used.
+</para>
+
 <para>
 <indexterm>
 <primary>index</primary>
|
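The nearest-neighbor query shown above has a simple reference meaning: whatever plan is chosen, the rows must come out in the order produced by scoring every row and sorting on the distance. A minimal C sketch of that brute-force reference follows, assuming `<->` between two points is their Euclidean distance; `Place` and `k_nearest` are hypothetical names, and squared distances are compared because they give the same order.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical table row: an id plus a 2-D location. */
typedef struct { int id; double x, y; } Place;

/* A row's position paired with its squared distance to the query point. */
typedef struct { int row; double d2; } Ranked;

static int cmp_ranked(const void *a, const void *b)
{
    double da = ((const Ranked *) a)->d2;
    double db = ((const Ranked *) b)->d2;
    return (da > db) - (da < db);
}

/*
 * Brute-force reference for
 *   SELECT id FROM places ORDER BY location <-> point '(qx,qy)' LIMIT k;
 * Score every row, sort by distance, keep the first k ids.
 * Returns the number of ids written to out, or -1 if allocation fails.
 */
static int k_nearest(const Place *rows, int n, double qx, double qy,
                     int k, int *out)
{
    assert(k <= 0 || out != NULL);
    if (n <= 0 || k <= 0)
        return 0;
    Ranked *r = malloc((size_t) n * sizeof *r);
    if (r == NULL)
        return -1;
    for (int i = 0; i < n; i++) {
        double dx = rows[i].x - qx, dy = rows[i].y - qy;
        r[i].row = i;
        r[i].d2 = dx * dx + dy * dy;   /* squared: same order as the distance */
    }
    qsort(r, (size_t) n, sizeof *r, cmp_ranked);
    int m = k < n ? k : n;
    for (int i = 0; i < m; i++)
        out[i] = rows[r[i].row].id;
    free(r);
    return m;
}
```

An ordered GiST scan has to reproduce this output while visiting only the parts of the tree that can contain the k closest rows, instead of scoring and sorting the whole table.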
@@ -361,59 +361,74 @@
 </table>
 
 <para>
-GiST indexes require seven support functions,
+GiST indexes require seven support functions, with an optional eighth, as
 shown in <xref linkend="xindex-gist-support-table">.
 </para>
 
 <table tocentry="1" id="xindex-gist-support-table">
 <title>GiST Support Functions</title>
-<tgroup cols="2">
+<tgroup cols="3">
 <thead>
 <row>
 <entry>Function</entry>
+<entry>Description</entry>
 <entry>Support Number</entry>
 </row>
 </thead>
 <tbody>
 <row>
-<entry>consistent - determine whether key satisfies the
+<entry><function>consistent</></entry>
+<entry>determine whether key satisfies the
 query qualifier</entry>
 <entry>1</entry>
 </row>
 <row>
-<entry>union - compute union of a set of keys</entry>
+<entry><function>union</></entry>
+<entry>compute union of a set of keys</entry>
 <entry>2</entry>
 </row>
 <row>
-<entry>compress - compute a compressed representation of a key or value
+<entry><function>compress</></entry>
+<entry>compute a compressed representation of a key or value
 to be indexed</entry>
 <entry>3</entry>
 </row>
 <row>
-<entry>decompress - compute a decompressed representation of a
+<entry><function>decompress</></entry>
+<entry>compute a decompressed representation of a
 compressed key</entry>
 <entry>4</entry>
 </row>
 <row>
-<entry>penalty - compute penalty for inserting new key into subtree
+<entry><function>penalty</></entry>
+<entry>compute penalty for inserting new key into subtree
 with given subtree's key</entry>
 <entry>5</entry>
 </row>
 <row>
-<entry>picksplit - determine which entries of a page are to be moved
+<entry><function>picksplit</></entry>
+<entry>determine which entries of a page are to be moved
 to the new page and compute the union keys for resulting pages</entry>
 <entry>6</entry>
 </row>
 <row>
-<entry>equal - compare two keys and return true if they are equal</entry>
+<entry><function>equal</></entry>
+<entry>compare two keys and return true if they are equal</entry>
 <entry>7</entry>
 </row>
+<row>
+<entry><function>distance</></entry>
+<entry>
+(optional method) determine distance from key to query value
+</entry>
+<entry>8</entry>
+</row>
 </tbody>
 </tgroup>
 </table>
 
 <para>
-GIN indexes require four support functions,
+GIN indexes require four support functions, with an optional fifth, as
 shown in <xref linkend="xindex-gin-support-table">.
 </para>
 
|
@@ -20,6 +20,7 @@ The current implementation of GiST supports:
 
 * Variable length keys
 * Composite keys (multi-key)
+* Ordered search (nearest-neighbor search)
 * provides NULL-safe interface to GiST core
 * Concurrency
 * Recovery support via WAL logging
@@ -32,8 +33,8 @@ Marcel Kornaker:
 
 The original algorithms were modified in several ways:
 
-* They should be adapted to PostgreSQL conventions. For example, the SEARCH
-algorithm was considerably changed, because in PostgreSQL function search
+* They had to be adapted to PostgreSQL conventions. For example, the SEARCH
+algorithm was considerably changed, because in PostgreSQL the search function
 should return one tuple (next), not all tuples at once. Also, it should
 release page locks between calls.
 * Since we added support for variable length keys, it's not possible to
@@ -41,12 +42,12 @@ The original algorithms were modified in several ways:
 defined function picksplit doesn't have information about size of tuples
 (each tuple may contain several keys as in multicolumn index while picksplit
 could work with only one key) and pages.
-* We modified original INSERT algorithm for performance reason. In particular,
+* We modified original INSERT algorithm for performance reasons. In particular,
 it is now a single-pass algorithm.
 * Since the papers were theoretical, some details were omitted and we
-have to find out ourself how to solve some specific problems.
+had to find out ourself how to solve some specific problems.
 
-Because of the above reasons, we have to revised interaction of GiST
+Because of the above reasons, we have revised the interaction of GiST
 core and PostgreSQL WAL system. Moreover, we encountered (and solved)
 a problem of uncompleted insertions when recovering after crash, which
 was not touched in the paper.
@@ -54,46 +55,49 @@ was not touched in the paper.
 Search Algorithm
 ----------------
 
-Function gettuple finds a tuple which satisfies the search
-predicate. It store their state and returns next tuple under
-subsequent calls. Stack contains page, its LSN and LSN of parent page
-and currentposition is saved between calls.
+The search code maintains a queue of unvisited items, where an "item" is
+either a heap tuple known to satisfy the search conditions, or an index
+page that is consistent with the search conditions according to inspection
+of its parent page's downlink item. Initially the root page is searched
+to find unvisited items in it. Then we pull items from the queue. A
+heap tuple pointer is just returned immediately; an index page entry
+causes that page to be searched, generating more queue entries.
 
-gettuple(search-pred)
-if ( firsttime )
-push(stack, [root, 0, 0]) // page, LSN, parentLSN
-currentposition=0
-end
-ptr = top of stack
-while(true)
-latch( ptr->page, S-mode )
-if ( ptr->page->lsn != ptr->lsn )
-ptr->lsn = ptr->page->lsn
-currentposition=0
-if ( ptr->parentlsn < ptr->page->nsn )
-add to stack rightlink
-else
-currentposition++
-end
-
-while(true)
-currentposition = find_first_match( currentposition )
-if ( currentposition is invalid )
-unlatch( ptr->page )
-pop stack
-ptr = top of stack
-if (ptr is NULL)
-return NULL
-break loop
-else if ( ptr->page is leaf )
-unlatch( ptr->page )
-return tuple
-else
-add to stack child page
-end
-currentposition++
-end
-end
+The queue is kept ordered with heap tuple items at the front, then
+index page entries, with any newly-added index page entry inserted
+before existing index page entries. This ensures depth-first traversal
+of the index, and in particular causes the first few heap tuples to be
+returned as soon as possible. That is helpful in case there is a LIMIT
+that requires only a few tuples to be produced.
+
+To implement nearest-neighbor search, the queue entries are augmented
+with distance data: heap tuple entries are labeled with exact distance
+from the search argument, while index-page entries must be labeled with
+the minimum distance that any of their children could have. Then,
+queue entries are retrieved in smallest-distance-first order, with
+entries having identical distances managed as stated in the previous
+paragraph.
+
+The search algorithm keeps an index page locked only long enough to scan
+its entries and queue those that satisfy the search conditions. Since
+insertions can occur concurrently with searches, it is possible for an
+index child page to be split between the time we make a queue entry for it
+(while visiting its parent page) and the time we actually reach and scan
+the child page. To avoid missing the entries that were moved to the right
+sibling, we detect whether a split has occurred by comparing the child
+page's NSN to the LSN that the parent had when visited. If it did, the
+sibling page is immediately added to the front of the queue, ensuring that
+its items will be scanned in the same order as if they were still on the
+original child page.
+
+As is usual in Postgres, the search algorithm only guarantees to find index
+entries that existed before the scan started; index entries added during
+the scan might or might not be visited. This is okay as long as all
+searches use MVCC snapshot rules to reject heap tuples newer than the time
+of scan start. In particular, this means that we need not worry about
+cases where a parent page's downlink key is "enlarged" after we look at it.
+Any such enlargement would be to add child items that we aren't interested
+in returning anyway.
 
 
 Insert Algorithm
|
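The queue discipline described in the new README text can be modeled in a few dozen lines of C. The sketch below is a toy in-memory model, not GiST code: it assumes a hypothetical two-level tree over 1-D values in which each page records an interval covering every value beneath it, so the distance from the query to that interval is a valid lower bound. Items are popped smallest-distance-first; ties go to heap tuples before pages, and among pages to the most recently queued one. Locking and split detection (comparing a child's NSN with the parent's LSN) are left out, and `Node`, `Item` and `knn_search` are invented names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory "index page" over 1-D values (not GiST's layout). */
typedef struct Node {
    int is_leaf;
    int n;                           /* number of entries on this page */
    const double *values;            /* leaf pages: the indexed values */
    const struct Node *const *kids;  /* internal pages: child pages */
    double lo, hi;                   /* interval covering all values below */
} Node;

/* Queue item: a "heap tuple" (a value) or a not-yet-visited page. */
typedef struct {
    double dist;                     /* exact for tuples, lower bound for pages */
    int is_page;
    long seq;                        /* insertion order, assigned by push() */
    const Node *page;
    double value;
} Item;

#define MAXQ 64
static Item queue[MAXQ];             /* binary min-heap ordered by before() */
static int queue_len;
static long next_seq;

static double dist_to_interval(double q, double lo, double hi)
{
    return q < lo ? lo - q : (q > hi ? q - hi : 0.0);
}

/* Should item a leave the queue before item b? */
static int before(const Item *a, const Item *b)
{
    if (a->dist != b->dist)
        return a->dist < b->dist;    /* smallest distance first */
    if (a->is_page != b->is_page)
        return !a->is_page;          /* then tuples before pages */
    return a->seq > b->seq;          /* then newest entry first (depth-first) */
}

static void push(Item it)
{
    assert(queue_len < MAXQ);
    it.seq = next_seq++;
    int i = queue_len++;
    while (i > 0 && before(&it, &queue[(i - 1) / 2])) {
        queue[i] = queue[(i - 1) / 2];   /* sift the hole up */
        i = (i - 1) / 2;
    }
    queue[i] = it;
}

static Item pop(void)
{
    Item top = queue[0], last = queue[--queue_len];
    int i = 0;
    for (;;) {                           /* sift the hole down */
        int c = 2 * i + 1;
        if (c >= queue_len)
            break;
        if (c + 1 < queue_len && before(&queue[c + 1], &queue[c]))
            c++;
        if (!before(&queue[c], &last))
            break;
        queue[i] = queue[c];
        i = c;
    }
    queue[i] = last;
    return top;
}

/* Write up to k values nearest to q into out, nearest first. */
static int knn_search(const Node *root, double q, int k, double *out)
{
    int found = 0;
    queue_len = 0;
    next_seq = 0;
    push((Item){ dist_to_interval(q, root->lo, root->hi), 1, 0, root, 0.0 });
    while (queue_len > 0 && found < k) {
        Item it = pop();
        if (!it.is_page) {               /* tuple: return it immediately */
            out[found++] = it.value;
            continue;
        }
        const Node *p = it.page;         /* page: scan it, queue its entries */
        for (int i = 0; i < p->n; i++) {
            if (p->is_leaf) {
                double d = p->values[i] - q;
                push((Item){ d < 0 ? -d : d, 0, 0, NULL, p->values[i] });
            } else {
                const Node *c = p->kids[i];
                push((Item){ dist_to_interval(q, c->lo, c->hi), 1, 0, c, 0.0 });
            }
        }
    }
    return found;
}
```

If every distance is 0, as in an ordinary non-ordered search, only the tie rules matter and the loop reduces to the depth-first traversal the README describes for plain searches.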