mirror of
https://github.com/postgres/postgres.git
synced 2025-05-11 05:41:32 +03:00
This is for use by the PG 13 release notes, but might be used for minor release notes in the future. Backpatch-through: 9.5
1024 lines
47 KiB
Plaintext
1024 lines
47 KiB
Plaintext
<!-- doc/src/sgml/spgist.sgml -->
|
|
|
|
<chapter id="spgist" xreflabel="SP-GiST">
|
|
<title>SP-GiST Indexes</title>
|
|
|
|
<indexterm>
|
|
<primary>index</primary>
|
|
<secondary>SP-GiST</secondary>
|
|
</indexterm>
|
|
|
|
<sect1 id="spgist-intro">
|
|
<title>Introduction</title>
|
|
|
|
<para>
|
|
<acronym>SP-GiST</acronym> is an abbreviation for space-partitioned
|
|
<acronym>GiST</acronym>. <acronym>SP-GiST</acronym> supports partitioned
|
|
search trees, which facilitate development of a wide range of different
|
|
non-balanced data structures, such as quad-trees, k-d trees, and radix
|
|
trees (tries). The common feature of these structures is that they
|
|
repeatedly divide the search space into partitions that need not be
|
|
of equal size. Searches that are well matched to the partitioning rule
|
|
can be very fast.
|
|
</para>
|
|
|
|
<para>
|
|
These popular data structures were originally developed for in-memory
|
|
usage. In main memory, they are usually designed as a set of dynamically
|
|
allocated nodes linked by pointers. This is not suitable for direct
|
|
storing on disk, since these chains of pointers can be rather long which
|
|
would require too many disk accesses. In contrast, disk-based data
|
|
structures should have a high fanout to minimize I/O. The challenge
|
|
addressed by <acronym>SP-GiST</acronym> is to map search tree nodes to
|
|
disk pages in such a way that a search need access only a few disk pages,
|
|
even if it traverses many nodes.
|
|
</para>
|
|
|
|
<para>
|
|
Like <acronym>GiST</acronym>, <acronym>SP-GiST</acronym> is meant to allow
|
|
the development of custom data types with the appropriate access methods,
|
|
by an expert in the domain of the data type, rather than a database expert.
|
|
</para>
|
|
|
|
<para>
|
|
Some of the information here is derived from Purdue University's
|
|
SP-GiST Indexing Project
|
|
<ulink url="https://www.cs.purdue.edu/spgist/">web site</ulink>.
|
|
The <acronym>SP-GiST</acronym> implementation in
|
|
<productname>PostgreSQL</productname> is primarily maintained by Teodor
|
|
Sigaev and Oleg Bartunov, and there is more information on their
|
|
<!-- URL will be changed -->
|
|
<ulink url="http://www.sai.msu.su/~megera/wiki/spgist_dev">web site</ulink>.
|
|
</para>
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="spgist-builtin-opclasses">
|
|
<title>Built-in Operator Classes</title>
|
|
|
|
<para>
|
|
The core <productname>PostgreSQL</productname> distribution
|
|
includes the <acronym>SP-GiST</acronym> operator classes shown in
|
|
<xref linkend="spgist-builtin-opclasses-table"/>.
|
|
</para>
|
|
|
|
<table id="spgist-builtin-opclasses-table">
|
|
<title>Built-in <acronym>SP-GiST</acronym> Operator Classes</title>
|
|
<tgroup cols="4">
|
|
<thead>
|
|
<row>
|
|
<entry>Name</entry>
|
|
<entry>Indexed Data Type</entry>
|
|
<entry>Indexable Operators</entry>
|
|
<entry>Ordering Operators</entry>
|
|
</row>
|
|
</thead>
|
|
<tbody>
|
|
<row>
|
|
<entry><literal>kd_point_ops</literal></entry>
|
|
<entry><type>point</type></entry>
|
|
<entry>
|
|
<literal><<</literal>
|
|
<literal><@</literal>
|
|
<literal><^</literal>
|
|
<literal>>></literal>
|
|
<literal>>^</literal>
|
|
<literal>~=</literal>
|
|
</entry>
|
|
<entry>
|
|
<literal><-></literal>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry><literal>quad_point_ops</literal></entry>
|
|
<entry><type>point</type></entry>
|
|
<entry>
|
|
<literal><<</literal>
|
|
<literal><@</literal>
|
|
<literal><^</literal>
|
|
<literal>>></literal>
|
|
<literal>>^</literal>
|
|
<literal>~=</literal>
|
|
</entry>
|
|
<entry>
|
|
<literal><-></literal>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry><literal>range_ops</literal></entry>
|
|
<entry>any range type</entry>
|
|
<entry>
|
|
<literal>&&</literal>
|
|
<literal>&<</literal>
|
|
<literal>&></literal>
|
|
<literal>-|-</literal>
|
|
<literal><<</literal>
|
|
<literal><@</literal>
|
|
<literal>=</literal>
|
|
<literal>>></literal>
|
|
<literal>@></literal>
|
|
</entry>
|
|
<entry>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry><literal>box_ops</literal></entry>
|
|
<entry><type>box</type></entry>
|
|
<entry>
|
|
<literal><<</literal>
|
|
<literal>&<</literal>
|
|
<literal>&&</literal>
|
|
<literal>&></literal>
|
|
<literal>>></literal>
|
|
<literal>~=</literal>
|
|
<literal>@></literal>
|
|
<literal><@</literal>
|
|
<literal>&<|</literal>
|
|
<literal><<|</literal>
|
|
<literal>|>></literal>
|
|
<literal>|&></literal>
|
|
</entry>
|
|
<entry>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry><literal>poly_ops</literal></entry>
|
|
<entry><type>polygon</type></entry>
|
|
<entry>
|
|
<literal><<</literal>
|
|
<literal>&<</literal>
|
|
<literal>&&</literal>
|
|
<literal>&></literal>
|
|
<literal>>></literal>
|
|
<literal>~=</literal>
|
|
<literal>@></literal>
|
|
<literal><@</literal>
|
|
<literal>&<|</literal>
|
|
<literal><<|</literal>
|
|
<literal>|>></literal>
|
|
<literal>|&></literal>
|
|
</entry>
|
|
<entry>
|
|
<literal><-></literal>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry><literal>text_ops</literal></entry>
|
|
<entry><type>text</type></entry>
|
|
<entry>
|
|
<literal><</literal>
|
|
<literal><=</literal>
|
|
<literal>=</literal>
|
|
<literal>></literal>
|
|
<literal>>=</literal>
|
|
<literal>~<=~</literal>
|
|
<literal>~<~</literal>
|
|
<literal>~>=~</literal>
|
|
<literal>~>~</literal>
|
|
<literal>^@</literal>
|
|
</entry>
|
|
<entry>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry><literal>inet_ops</literal></entry>
|
|
<entry><type>inet</type>, <type>cidr</type></entry>
|
|
<entry>
|
|
<literal>&&</literal>
|
|
<literal>>></literal>
|
|
<literal>>>=</literal>
|
|
<literal>></literal>
|
|
<literal>>=</literal>
|
|
<literal><></literal>
|
|
<literal><<</literal>
|
|
<literal><<=</literal>
|
|
<literal><</literal>
|
|
<literal><=</literal>
|
|
<literal>=</literal>
|
|
</entry>
|
|
<entry>
|
|
</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</table>
|
|
|
|
<para>
|
|
Of the two operator classes for type <type>point</type>,
|
|
<literal>quad_point_ops</literal> is the default. <literal>kd_point_ops</literal>
|
|
supports the same operators but uses a different index data structure that
|
|
may offer better performance in some applications.
|
|
</para>
|
|
<para>
|
|
The <literal>quad_point_ops</literal>, <literal>kd_point_ops</literal> and
|
|
<literal>poly_ops</literal> operator classes support the <literal><-></literal>
|
|
ordering operator, which enables the k-nearest neighbor (<literal>k-NN</literal>)
|
|
search over indexed point or polygon data sets.
|
|
</para>
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="spgist-extensibility">
|
|
<title>Extensibility</title>
|
|
|
|
<para>
|
|
<acronym>SP-GiST</acronym> offers an interface with a high level of
|
|
abstraction, requiring the access method developer to implement only
|
|
methods specific to a given data type. The <acronym>SP-GiST</acronym> core
|
|
is responsible for efficient disk mapping and searching the tree structure.
|
|
It also takes care of concurrency and logging considerations.
|
|
</para>
|
|
|
|
<para>
|
|
Leaf tuples of an <acronym>SP-GiST</acronym> tree contain values of the
|
|
same data type as the indexed column. Leaf tuples at the root level will
|
|
always contain the original indexed data value, but leaf tuples at lower
|
|
levels might contain only a compressed representation, such as a suffix.
|
|
In that case the operator class support functions must be able to
|
|
reconstruct the original value using information accumulated from the
|
|
inner tuples that are passed through to reach the leaf level.
|
|
</para>
|
|
|
|
<para>
|
|
Inner tuples are more complex, since they are branching points in the
|
|
search tree. Each inner tuple contains a set of one or more
|
|
<firstterm>nodes</firstterm>, which represent groups of similar leaf values.
|
|
A node contains a downlink that leads either to another, lower-level inner
|
|
tuple, or to a short list of leaf tuples that all lie on the same index page.
|
|
Each node normally has a <firstterm>label</firstterm> that describes it; for example,
|
|
in a radix tree the node label could be the next character of the string
|
|
value. (Alternatively, an operator class can omit the node labels, if it
|
|
works with a fixed set of nodes for all inner tuples;
|
|
see <xref linkend="spgist-null-labels"/>.)
|
|
Optionally, an inner tuple can have a <firstterm>prefix</firstterm> value
|
|
that describes all its members. In a radix tree this could be the common
|
|
prefix of the represented strings. The prefix value is not necessarily
|
|
really a prefix, but can be any data needed by the operator class;
|
|
for example, in a quad-tree it can store the central point that the four
|
|
quadrants are measured with respect to. A quad-tree inner tuple would
|
|
then also contain four nodes corresponding to the quadrants around this
|
|
central point.
|
|
</para>
|
|
|
|
<para>
|
|
Some tree algorithms require knowledge of level (or depth) of the current
|
|
tuple, so the <acronym>SP-GiST</acronym> core provides the possibility for
|
|
operator classes to manage level counting while descending the tree.
|
|
There is also support for incrementally reconstructing the represented
|
|
value when that is needed, and for passing down additional data (called
|
|
<firstterm>traverse values</firstterm>) during a tree descent.
|
|
</para>
|
|
|
|
<note>
|
|
<para>
|
|
The <acronym>SP-GiST</acronym> core code takes care of null entries.
|
|
Although <acronym>SP-GiST</acronym> indexes do store entries for nulls
|
|
in indexed columns, this is hidden from the index operator class code:
|
|
no null index entries or search conditions will ever be passed to the
|
|
operator class methods. (It is assumed that <acronym>SP-GiST</acronym>
|
|
operators are strict and so cannot succeed for null values.) Null values
|
|
are therefore not discussed further here.
|
|
</para>
|
|
</note>
|
|
|
|
<para>
|
|
There are five user-defined methods that an index operator class for
|
|
<acronym>SP-GiST</acronym> must provide, and one is optional. All five
|
|
mandatory methods follow the convention of accepting two <type>internal</type>
|
|
arguments, the first of which is a pointer to a C struct containing input
|
|
values for the support method, while the second argument is a pointer to a
|
|
C struct where output values must be placed. Four of the mandatory methods just
|
|
return <type>void</type>, since all their results appear in the output struct; but
|
|
<function>leaf_consistent</function> additionally returns a <type>boolean</type> result.
|
|
The methods must not modify any fields of their input structs. In all
|
|
cases, the output struct is initialized to zeroes before calling the
|
|
user-defined method. The optional sixth method <function>compress</function>
|
|
accepts datum to be indexed as the only argument and returns a value suitable
|
|
for physical storage in a leaf tuple.
|
|
</para>
|
|
|
|
<para>
|
|
The five mandatory user-defined methods are:
|
|
</para>
|
|
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term><function>config</function></term>
|
|
<listitem>
|
|
<para>
|
|
Returns static information about the index implementation, including
|
|
the data type OIDs of the prefix and node label data types.
|
|
</para>
|
|
<para>
|
|
The <acronym>SQL</acronym> declaration of the function must look like this:
|
|
<programlisting>
|
|
CREATE FUNCTION my_config(internal, internal) RETURNS void ...
|
|
</programlisting>
|
|
The first argument is a pointer to a <structname>spgConfigIn</structname>
|
|
C struct, containing input data for the function.
|
|
The second argument is a pointer to a <structname>spgConfigOut</structname>
|
|
C struct, which the function must fill with result data.
|
|
<programlisting>
|
|
typedef struct spgConfigIn
|
|
{
|
|
Oid attType; /* Data type to be indexed */
|
|
} spgConfigIn;
|
|
|
|
typedef struct spgConfigOut
|
|
{
|
|
Oid prefixType; /* Data type of inner-tuple prefixes */
|
|
Oid labelType; /* Data type of inner-tuple node labels */
|
|
Oid leafType; /* Data type of leaf-tuple values */
|
|
bool canReturnData; /* Opclass can reconstruct original data */
|
|
bool longValuesOK; /* Opclass can cope with values > 1 page */
|
|
} spgConfigOut;
|
|
</programlisting>
|
|
|
|
<structfield>attType</structfield> is passed in order to support polymorphic
|
|
index operator classes; for ordinary fixed-data-type operator classes, it
|
|
will always have the same value and so can be ignored.
|
|
</para>
|
|
|
|
<para>
|
|
For operator classes that do not use prefixes,
|
|
<structfield>prefixType</structfield> can be set to <literal>VOIDOID</literal>.
|
|
Likewise, for operator classes that do not use node labels,
|
|
<structfield>labelType</structfield> can be set to <literal>VOIDOID</literal>.
|
|
<structfield>canReturnData</structfield> should be set true if the operator class
|
|
is capable of reconstructing the originally-supplied index value.
|
|
<structfield>longValuesOK</structfield> should be set true only when the
|
|
<structfield>attType</structfield> is of variable length and the operator
|
|
class is capable of segmenting long values by repeated suffixing
|
|
(see <xref linkend="spgist-limits"/>).
|
|
</para>
|
|
|
|
<para>
|
|
<structfield>leafType</structfield> is typically the same as
|
|
<structfield>attType</structfield>. For the reasons of backward
|
|
compatibility, method <function>config</function> can
|
|
leave <structfield>leafType</structfield> uninitialized; that would
|
|
give the same effect as setting <structfield>leafType</structfield> equal
|
|
to <structfield>attType</structfield>. When <structfield>attType</structfield>
|
|
and <structfield>leafType</structfield> are different, then optional
|
|
method <function>compress</function> must be provided.
|
|
Method <function>compress</function> is responsible
|
|
for transformation of datums to be indexed from <structfield>attType</structfield>
|
|
to <structfield>leafType</structfield>.
|
|
Note: both consistent functions will get <structfield>scankeys</structfield>
|
|
unchanged, without transformation using <function>compress</function>.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><function>choose</function></term>
|
|
<listitem>
|
|
<para>
|
|
Chooses a method for inserting a new value into an inner tuple.
|
|
</para>
|
|
|
|
<para>
|
|
The <acronym>SQL</acronym> declaration of the function must look like this:
|
|
<programlisting>
|
|
CREATE FUNCTION my_choose(internal, internal) RETURNS void ...
|
|
</programlisting>
|
|
The first argument is a pointer to a <structname>spgChooseIn</structname>
|
|
C struct, containing input data for the function.
|
|
The second argument is a pointer to a <structname>spgChooseOut</structname>
|
|
C struct, which the function must fill with result data.
|
|
<programlisting>
|
|
typedef struct spgChooseIn
|
|
{
|
|
Datum datum; /* original datum to be indexed */
|
|
Datum leafDatum; /* current datum to be stored at leaf */
|
|
int level; /* current level (counting from zero) */
|
|
|
|
/* Data from current inner tuple */
|
|
bool allTheSame; /* tuple is marked all-the-same? */
|
|
bool hasPrefix; /* tuple has a prefix? */
|
|
Datum prefixDatum; /* if so, the prefix value */
|
|
int nNodes; /* number of nodes in the inner tuple */
|
|
Datum *nodeLabels; /* node label values (NULL if none) */
|
|
} spgChooseIn;
|
|
|
|
typedef enum spgChooseResultType
|
|
{
|
|
spgMatchNode = 1, /* descend into existing node */
|
|
spgAddNode, /* add a node to the inner tuple */
|
|
spgSplitTuple /* split inner tuple (change its prefix) */
|
|
} spgChooseResultType;
|
|
|
|
typedef struct spgChooseOut
|
|
{
|
|
spgChooseResultType resultType; /* action code, see above */
|
|
union
|
|
{
|
|
struct /* results for spgMatchNode */
|
|
{
|
|
int nodeN; /* descend to this node (index from 0) */
|
|
int levelAdd; /* increment level by this much */
|
|
Datum restDatum; /* new leaf datum */
|
|
} matchNode;
|
|
struct /* results for spgAddNode */
|
|
{
|
|
Datum nodeLabel; /* new node's label */
|
|
int nodeN; /* where to insert it (index from 0) */
|
|
} addNode;
|
|
struct /* results for spgSplitTuple */
|
|
{
|
|
/* Info to form new upper-level inner tuple with one child tuple */
|
|
bool prefixHasPrefix; /* tuple should have a prefix? */
|
|
Datum prefixPrefixDatum; /* if so, its value */
|
|
int prefixNNodes; /* number of nodes */
|
|
Datum *prefixNodeLabels; /* their labels (or NULL for
|
|
* no labels) */
|
|
int childNodeN; /* which node gets child tuple */
|
|
|
|
/* Info to form new lower-level inner tuple with all old nodes */
|
|
bool postfixHasPrefix; /* tuple should have a prefix? */
|
|
Datum postfixPrefixDatum; /* if so, its value */
|
|
} splitTuple;
|
|
} result;
|
|
} spgChooseOut;
|
|
</programlisting>
|
|
|
|
<structfield>datum</structfield> is the original datum of
|
|
<structname>spgConfigIn</structname>.<structfield>attType</structfield>
|
|
type that was to be inserted into the index.
|
|
<structfield>leafDatum</structfield> is a value of
|
|
<structname>spgConfigOut</structname>.<structfield>leafType</structfield>
|
|
type, which is initially a result of method
|
|
<function>compress</function> applied to <structfield>datum</structfield>
|
|
when method <function>compress</function> is provided, or the same value as
|
|
<structfield>datum</structfield> otherwise.
|
|
<structfield>leafDatum</structfield> can change at lower levels of the tree
|
|
if the <function>choose</function> or <function>picksplit</function>
|
|
methods change it. When the insertion search reaches a leaf page,
|
|
the current value of <structfield>leafDatum</structfield> is what will be stored
|
|
in the newly created leaf tuple.
|
|
<structfield>level</structfield> is the current inner tuple's level, starting at
|
|
zero for the root level.
|
|
<structfield>allTheSame</structfield> is true if the current inner tuple is
|
|
marked as containing multiple equivalent nodes
|
|
(see <xref linkend="spgist-all-the-same"/>).
|
|
<structfield>hasPrefix</structfield> is true if the current inner tuple contains
|
|
a prefix; if so,
|
|
<structfield>prefixDatum</structfield> is its value.
|
|
<structfield>nNodes</structfield> is the number of child nodes contained in the
|
|
inner tuple, and
|
|
<structfield>nodeLabels</structfield> is an array of their label values, or
|
|
NULL if there are no labels.
|
|
</para>
|
|
|
|
<para>
|
|
The <function>choose</function> function can determine either that
|
|
the new value matches one of the existing child nodes, or that a new
|
|
child node must be added, or that the new value is inconsistent with
|
|
the tuple prefix and so the inner tuple must be split to create a
|
|
less restrictive prefix.
|
|
</para>
|
|
|
|
<para>
|
|
If the new value matches one of the existing child nodes,
|
|
set <structfield>resultType</structfield> to <literal>spgMatchNode</literal>.
|
|
Set <structfield>nodeN</structfield> to the index (from zero) of that node in
|
|
the node array.
|
|
Set <structfield>levelAdd</structfield> to the increment in
|
|
<structfield>level</structfield> caused by descending through that node,
|
|
or leave it as zero if the operator class does not use levels.
|
|
Set <structfield>restDatum</structfield> to equal <structfield>leafDatum</structfield>
|
|
if the operator class does not modify datums from one level to the
|
|
next, or otherwise set it to the modified value to be used as
|
|
<structfield>leafDatum</structfield> at the next level.
|
|
</para>
|
|
|
|
<para>
|
|
If a new child node must be added,
|
|
set <structfield>resultType</structfield> to <literal>spgAddNode</literal>.
|
|
Set <structfield>nodeLabel</structfield> to the label to be used for the new
|
|
node, and set <structfield>nodeN</structfield> to the index (from zero) at which
|
|
to insert the node in the node array.
|
|
After the node has been added, the <function>choose</function>
|
|
function will be called again with the modified inner tuple;
|
|
that call should result in an <literal>spgMatchNode</literal> result.
|
|
</para>
|
|
|
|
<para>
|
|
If the new value is inconsistent with the tuple prefix,
|
|
set <structfield>resultType</structfield> to <literal>spgSplitTuple</literal>.
|
|
This action moves all the existing nodes into a new lower-level
|
|
inner tuple, and replaces the existing inner tuple with a tuple
|
|
having a single downlink pointing to the new lower-level inner tuple.
|
|
Set <structfield>prefixHasPrefix</structfield> to indicate whether the new
|
|
upper tuple should have a prefix, and if so set
|
|
<structfield>prefixPrefixDatum</structfield> to the prefix value. This new
|
|
prefix value must be sufficiently less restrictive than the original
|
|
to accept the new value to be indexed.
|
|
Set <structfield>prefixNNodes</structfield> to the number of nodes needed in the
|
|
new tuple, and set <structfield>prefixNodeLabels</structfield> to a palloc'd array
|
|
holding their labels, or to NULL if node labels are not required.
|
|
Note that the total size of the new upper tuple must be no more
|
|
than the total size of the tuple it is replacing; this constrains
|
|
the lengths of the new prefix and new labels.
|
|
Set <structfield>childNodeN</structfield> to the index (from zero) of the node
|
|
that will downlink to the new lower-level inner tuple.
|
|
Set <structfield>postfixHasPrefix</structfield> to indicate whether the new
|
|
lower-level inner tuple should have a prefix, and if so set
|
|
<structfield>postfixPrefixDatum</structfield> to the prefix value. The
|
|
combination of these two prefixes and the downlink node's label
|
|
(if any) must have the same meaning as the original prefix, because
|
|
there is no opportunity to alter the node labels that are moved to
|
|
the new lower-level tuple, nor to change any child index entries.
|
|
After the node has been split, the <function>choose</function>
|
|
function will be called again with the replacement inner tuple.
|
|
That call may return an <literal>spgAddNode</literal> result, if no suitable
|
|
node was created by the <literal>spgSplitTuple</literal> action. Eventually
|
|
<function>choose</function> must return <literal>spgMatchNode</literal> to
|
|
allow the insertion to descend to the next level.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><function>picksplit</function></term>
|
|
<listitem>
|
|
<para>
|
|
Decides how to create a new inner tuple over a set of leaf tuples.
|
|
</para>
|
|
|
|
<para>
|
|
The <acronym>SQL</acronym> declaration of the function must look like this:
|
|
<programlisting>
|
|
CREATE FUNCTION my_picksplit(internal, internal) RETURNS void ...
|
|
</programlisting>
|
|
The first argument is a pointer to a <structname>spgPickSplitIn</structname>
|
|
C struct, containing input data for the function.
|
|
The second argument is a pointer to a <structname>spgPickSplitOut</structname>
|
|
C struct, which the function must fill with result data.
|
|
<programlisting>
|
|
typedef struct spgPickSplitIn
|
|
{
|
|
int nTuples; /* number of leaf tuples */
|
|
Datum *datums; /* their datums (array of length nTuples) */
|
|
int level; /* current level (counting from zero) */
|
|
} spgPickSplitIn;
|
|
|
|
typedef struct spgPickSplitOut
|
|
{
|
|
bool hasPrefix; /* new inner tuple should have a prefix? */
|
|
Datum prefixDatum; /* if so, its value */
|
|
|
|
int nNodes; /* number of nodes for new inner tuple */
|
|
Datum *nodeLabels; /* their labels (or NULL for no labels) */
|
|
|
|
int *mapTuplesToNodes; /* node index for each leaf tuple */
|
|
Datum *leafTupleDatums; /* datum to store in each new leaf tuple */
|
|
} spgPickSplitOut;
|
|
</programlisting>
|
|
|
|
<structfield>nTuples</structfield> is the number of leaf tuples provided.
|
|
<structfield>datums</structfield> is an array of their datum values of
|
|
<structname>spgConfigOut</structname>.<structfield>leafType</structfield>
|
|
type.
|
|
<structfield>level</structfield> is the current level that all the leaf tuples
|
|
share, which will become the level of the new inner tuple.
|
|
</para>
|
|
|
|
<para>
|
|
Set <structfield>hasPrefix</structfield> to indicate whether the new inner
|
|
tuple should have a prefix, and if so set
|
|
<structfield>prefixDatum</structfield> to the prefix value.
|
|
Set <structfield>nNodes</structfield> to indicate the number of nodes that
|
|
the new inner tuple will contain, and
|
|
set <structfield>nodeLabels</structfield> to an array of their label values,
|
|
or to NULL if node labels are not required.
|
|
Set <structfield>mapTuplesToNodes</structfield> to an array that gives the index
|
|
(from zero) of the node that each leaf tuple should be assigned to.
|
|
Set <structfield>leafTupleDatums</structfield> to an array of the values to
|
|
be stored in the new leaf tuples (these will be the same as the
|
|
input <structfield>datums</structfield> if the operator class does not modify
|
|
datums from one level to the next).
|
|
Note that the <function>picksplit</function> function is
|
|
responsible for palloc'ing the
|
|
<structfield>nodeLabels</structfield>, <structfield>mapTuplesToNodes</structfield> and
|
|
<structfield>leafTupleDatums</structfield> arrays.
|
|
</para>
|
|
|
|
<para>
|
|
If more than one leaf tuple is supplied, it is expected that the
|
|
<function>picksplit</function> function will classify them into more than
|
|
one node; otherwise it is not possible to split the leaf tuples
|
|
across multiple pages, which is the ultimate purpose of this
|
|
operation. Therefore, if the <function>picksplit</function> function
|
|
ends up placing all the leaf tuples in the same node, the core
|
|
SP-GiST code will override that decision and generate an inner
|
|
tuple in which the leaf tuples are assigned at random to several
|
|
identically-labeled nodes. Such a tuple is marked
|
|
<literal>allTheSame</literal> to signify that this has happened. The
|
|
<function>choose</function> and <function>inner_consistent</function> functions
|
|
must take suitable care with such inner tuples.
|
|
See <xref linkend="spgist-all-the-same"/> for more information.
|
|
</para>
|
|
|
|
<para>
|
|
<function>picksplit</function> can be applied to a single leaf tuple only
|
|
in the case that the <function>config</function> function set
|
|
<structfield>longValuesOK</structfield> to true and a larger-than-a-page input
|
|
value has been supplied. In this case the point of the operation is
|
|
to strip off a prefix and produce a new, shorter leaf datum value.
|
|
The call will be repeated until a leaf datum short enough to fit on
|
|
a page has been produced. See <xref linkend="spgist-limits"/> for
|
|
more information.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><function>inner_consistent</function></term>
|
|
<listitem>
|
|
<para>
|
|
Returns set of nodes (branches) to follow during tree search.
|
|
</para>
|
|
|
|
<para>
|
|
The <acronym>SQL</acronym> declaration of the function must look like this:
|
|
<programlisting>
|
|
CREATE FUNCTION my_inner_consistent(internal, internal) RETURNS void ...
|
|
</programlisting>
|
|
The first argument is a pointer to a <structname>spgInnerConsistentIn</structname>
|
|
C struct, containing input data for the function.
|
|
The second argument is a pointer to a <structname>spgInnerConsistentOut</structname>
|
|
C struct, which the function must fill with result data.
|
|
|
|
<programlisting>
|
|
typedef struct spgInnerConsistentIn
|
|
{
|
|
ScanKey scankeys; /* array of operators and comparison values */
|
|
ScanKey orderbys; /* array of ordering operators and comparison
|
|
* values */
|
|
int nkeys; /* length of scankeys array */
|
|
int norderbys; /* length of orderbys array */
|
|
|
|
Datum reconstructedValue; /* value reconstructed at parent */
|
|
void *traversalValue; /* opclass-specific traverse value */
|
|
MemoryContext traversalMemoryContext; /* put new traverse values here */
|
|
int level; /* current level (counting from zero) */
|
|
bool returnData; /* original data must be returned? */
|
|
|
|
/* Data from current inner tuple */
|
|
bool allTheSame; /* tuple is marked all-the-same? */
|
|
bool hasPrefix; /* tuple has a prefix? */
|
|
Datum prefixDatum; /* if so, the prefix value */
|
|
int nNodes; /* number of nodes in the inner tuple */
|
|
Datum *nodeLabels; /* node label values (NULL if none) */
|
|
} spgInnerConsistentIn;
|
|
|
|
typedef struct spgInnerConsistentOut
|
|
{
|
|
int nNodes; /* number of child nodes to be visited */
|
|
int *nodeNumbers; /* their indexes in the node array */
|
|
int *levelAdds; /* increment level by this much for each */
|
|
Datum *reconstructedValues; /* associated reconstructed values */
|
|
void **traversalValues; /* opclass-specific traverse values */
|
|
double **distances; /* associated distances */
|
|
} spgInnerConsistentOut;
|
|
</programlisting>
|
|
|
|
The array <structfield>scankeys</structfield>, of length <structfield>nkeys</structfield>,
|
|
describes the index search condition(s). These conditions are
|
|
combined with AND — only index entries that satisfy all of
|
|
them are interesting. (Note that <structfield>nkeys</structfield> = 0 implies
|
|
that all index entries satisfy the query.) Usually the consistent
|
|
function only cares about the <structfield>sk_strategy</structfield> and
|
|
<structfield>sk_argument</structfield> fields of each array entry, which
|
|
respectively give the indexable operator and comparison value.
|
|
In particular it is not necessary to check <structfield>sk_flags</structfield> to
|
|
see if the comparison value is NULL, because the SP-GiST core code
|
|
will filter out such conditions.
|
|
The array <structfield>orderbys</structfield>, of length <structfield>norderbys</structfield>,
|
|
describes ordering operators (if any) in the same manner.
|
|
<structfield>reconstructedValue</structfield> is the value reconstructed for the
|
|
parent tuple; it is <literal>(Datum) 0</literal> at the root level or if the
|
|
<function>inner_consistent</function> function did not provide a value at the
|
|
parent level. <structfield>reconstructedValue</structfield> is always of
|
|
<structname>spgConfigOut</structname>.<structfield>leafType</structfield> type.
|
|
<structfield>traversalValue</structfield> is a pointer to any traverse data
|
|
passed down from the previous call of <function>inner_consistent</function>
|
|
on the parent index tuple, or NULL at the root level.
|
|
<structfield>traversalMemoryContext</structfield> is the memory context in which
|
|
to store output traverse values (see below).
|
|
<structfield>level</structfield> is the current inner tuple's level, starting at
|
|
zero for the root level.
|
|
<structfield>returnData</structfield> is <literal>true</literal> if reconstructed data is
|
|
required for this query; this will only be so if the
|
|
<function>config</function> function asserted <structfield>canReturnData</structfield>.
|
|
<structfield>allTheSame</structfield> is true if the current inner tuple is
|
|
marked <quote>all-the-same</quote>; in this case all the nodes have the
|
|
same label (if any) and so either all or none of them match the query
|
|
(see <xref linkend="spgist-all-the-same"/>).
|
|
<structfield>hasPrefix</structfield> is true if the current inner tuple contains
|
|
a prefix; if so,
|
|
<structfield>prefixDatum</structfield> is its value.
|
|
<structfield>nNodes</structfield> is the number of child nodes contained in the
|
|
inner tuple, and
|
|
<structfield>nodeLabels</structfield> is an array of their label values, or
|
|
NULL if the nodes do not have labels.
|
|
</para>
|
|
|
|
<para>
|
|
<structfield>nNodes</structfield> must be set to the number of child nodes that
|
|
need to be visited by the search, and
|
|
<structfield>nodeNumbers</structfield> must be set to an array of their indexes.
|
|
If the operator class keeps track of levels, set
|
|
<structfield>levelAdds</structfield> to an array of the level increments
|
|
required when descending to each node to be visited. (Often these
|
|
increments will be the same for all the nodes, but that's not
|
|
necessarily so, so an array is used.)
|
|
If value reconstruction is needed, set
|
|
<structfield>reconstructedValues</structfield> to an array of the values
|
|
of <structname>spgConfigOut</structname>.<structfield>leafType</structfield> type
|
|
reconstructed for each child node to be visited; otherwise, leave
|
|
<structfield>reconstructedValues</structfield> as NULL.
|
|
If ordered search is performed, set <structfield>distances</structfield>
|
|
to an array of distance values according to <structfield>orderbys</structfield>
|
|
array (nodes with lowest distances will be processed first). Leave it
|
|
NULL otherwise.
|
|
If it is desired to pass down additional out-of-band information
|
|
(<quote>traverse values</quote>) to lower levels of the tree search,
|
|
set <structfield>traversalValues</structfield> to an array of the appropriate
|
|
traverse values, one for each child node to be visited; otherwise,
|
|
leave <structfield>traversalValues</structfield> as NULL.
|
|
Note that the <function>inner_consistent</function> function is
|
|
responsible for palloc'ing the
|
|
<structfield>nodeNumbers</structfield>, <structfield>levelAdds</structfield>,
|
|
<structfield>distances</structfield>,
|
|
<structfield>reconstructedValues</structfield>, and
|
|
<structfield>traversalValues</structfield> arrays in the current memory context.
|
|
However, any output traverse values pointed to by
|
|
the <structfield>traversalValues</structfield> array should be allocated
|
|
in <structfield>traversalMemoryContext</structfield>.
|
|
Each traverse value must be a single palloc'd chunk.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><function>leaf_consistent</function></term>
|
|
<listitem>
|
|
<para>
|
|
Returns true if a leaf tuple satisfies a query.
|
|
</para>
|
|
|
|
<para>
|
|
The <acronym>SQL</acronym> declaration of the function must look like this:
|
|
<programlisting>
|
|
CREATE FUNCTION my_leaf_consistent(internal, internal) RETURNS bool ...
|
|
</programlisting>
|
|
The first argument is a pointer to a <structname>spgLeafConsistentIn</structname>
|
|
C struct, containing input data for the function.
|
|
The second argument is a pointer to a <structname>spgLeafConsistentOut</structname>
|
|
C struct, which the function must fill with result data.
|
|
<programlisting>
|
|
typedef struct spgLeafConsistentIn
|
|
{
|
|
ScanKey scankeys; /* array of operators and comparison values */
|
|
ScanKey orderbys; /* array of ordering operators and comparison
|
|
* values */
|
|
int nkeys; /* length of scankeys array */
|
|
int norderbys; /* length of orderbys array */
|
|
|
|
Datum reconstructedValue; /* value reconstructed at parent */
|
|
void *traversalValue; /* opclass-specific traverse value */
|
|
int level; /* current level (counting from zero) */
|
|
bool returnData; /* original data must be returned? */
|
|
|
|
Datum leafDatum; /* datum in leaf tuple */
|
|
} spgLeafConsistentIn;
|
|
|
|
typedef struct spgLeafConsistentOut
|
|
{
|
|
Datum leafValue; /* reconstructed original data, if any */
|
|
bool recheck; /* set true if operator must be rechecked */
|
|
bool recheckDistances; /* set true if distances must be rechecked */
|
|
double *distances; /* associated distances */
|
|
} spgLeafConsistentOut;
|
|
</programlisting>
|
|
|
|
The array <structfield>scankeys</structfield>, of length <structfield>nkeys</structfield>,
|
|
describes the index search condition(s). These conditions are
|
|
combined with AND — only index entries that satisfy all of
|
|
them satisfy the query. (Note that <structfield>nkeys</structfield> = 0 implies
|
|
that all index entries satisfy the query.) Usually the consistent
|
|
function only cares about the <structfield>sk_strategy</structfield> and
|
|
<structfield>sk_argument</structfield> fields of each array entry, which
|
|
respectively give the indexable operator and comparison value.
|
|
In particular it is not necessary to check <structfield>sk_flags</structfield> to
|
|
see if the comparison value is NULL, because the SP-GiST core code
|
|
will filter out such conditions.
|
|
The array <structfield>orderbys</structfield>, of length <structfield>norderbys</structfield>,
|
|
describes the ordering operators in the same manner.
|
|
<structfield>reconstructedValue</structfield> is the value reconstructed for the
|
|
parent tuple; it is <literal>(Datum) 0</literal> at the root level or if the
|
|
<function>inner_consistent</function> function did not provide a value at the
|
|
parent level. <structfield>reconstructedValue</structfield> is always of
|
|
<structname>spgConfigOut</structname>.<structfield>leafType</structfield> type.
|
|
<structfield>traversalValue</structfield> is a pointer to any traverse data
|
|
passed down from the previous call of <function>inner_consistent</function>
|
|
on the parent index tuple, or NULL at the root level.
|
|
<structfield>level</structfield> is the current leaf tuple's level, starting at
|
|
zero for the root level.
|
|
<structfield>returnData</structfield> is <literal>true</literal> if reconstructed data is
|
|
required for this query; this will only be so if the
|
|
<function>config</function> function asserted <structfield>canReturnData</structfield>.
|
|
<structfield>leafDatum</structfield> is the key value of
|
|
<structname>spgConfigOut</structname>.<structfield>leafType</structfield>
|
|
stored in the current leaf tuple.
|
|
</para>
|
|
|
|
<para>
|
|
The function must return <literal>true</literal> if the leaf tuple matches the
|
|
query, or <literal>false</literal> if not. In the <literal>true</literal> case,
|
|
if <structfield>returnData</structfield> is <literal>true</literal> then
|
|
<structfield>leafValue</structfield> must be set to the value of
|
|
<structname>spgConfigIn</structname>.<structfield>attType</structfield> type
|
|
originally supplied to be indexed for this leaf tuple. Also,
|
|
<structfield>recheck</structfield> may be set to <literal>true</literal> if the match
|
|
is uncertain and so the operator(s) must be re-applied to the actual
|
|
heap tuple to verify the match.
|
|
If ordered search is performed, set <structfield>distances</structfield>
|
|
to an array of distance values according to <structfield>orderbys</structfield>
|
|
array. Leave it NULL otherwise. If at least one of returned distances
|
|
is not exact, set <structfield>recheckDistances</structfield> to true.
|
|
In this case, the executor will calculate the exact distances after
|
|
fetching the tuple from the heap, and will reorder the tuples if needed.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
|
|
<para>
|
|
The optional user-defined method is:
|
|
</para>
|
|
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term><function>Datum compress(Datum in)</function></term>
|
|
<listitem>
|
|
<para>
|
|
Converts the data item into a format suitable for physical storage in
|
|
a leaf tuple of index page. It accepts
|
|
<structname>spgConfigIn</structname>.<structfield>attType</structfield>
|
|
value and returns
|
|
<structname>spgConfigOut</structname>.<structfield>leafType</structfield>
|
|
value. Output value should not be toasted.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
|
|
<para>
|
|
All the SP-GiST support methods are normally called in a short-lived
|
|
memory context; that is, <varname>CurrentMemoryContext</varname> will be reset
|
|
after processing of each tuple. It is therefore not very important to
|
|
worry about pfree'ing everything you palloc. (The <function>config</function>
|
|
method is an exception: it should try to avoid leaking memory. But
|
|
usually the <function>config</function> method need do nothing but assign
|
|
constants into the passed parameter struct.)
|
|
</para>
|
|
|
|
<para>
|
|
If the indexed column is of a collatable data type, the index collation
|
|
will be passed to all the support methods, using the standard
|
|
<function>PG_GET_COLLATION()</function> mechanism.
|
|
</para>
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="spgist-implementation">
|
|
<title>Implementation</title>
|
|
|
|
<para>
|
|
This section covers implementation details and other tricks that are
|
|
useful for implementers of <acronym>SP-GiST</acronym> operator classes to
|
|
know.
|
|
</para>
|
|
|
|
<sect2 id="spgist-limits">
|
|
<title>SP-GiST Limits</title>
|
|
|
|
<para>
|
|
Individual leaf tuples and inner tuples must fit on a single index page
|
|
(8kB by default). Therefore, when indexing values of variable-length
|
|
data types, long values can only be supported by methods such as radix
|
|
trees, in which each level of the tree includes a prefix that is short
|
|
enough to fit on a page, and the final leaf level includes a suffix also
|
|
short enough to fit on a page. The operator class should set
|
|
<structfield>longValuesOK</structfield> to true only if it is prepared to arrange for
|
|
this to happen. Otherwise, the <acronym>SP-GiST</acronym> core will
|
|
reject any request to index a value that is too large to fit
|
|
on an index page.
|
|
</para>
|
|
|
|
<para>
|
|
Likewise, it is the operator class's responsibility that inner tuples
|
|
do not grow too large to fit on an index page; this limits the number
|
|
of child nodes that can be used in one inner tuple, as well as the
|
|
maximum size of a prefix value.
|
|
</para>
|
|
|
|
<para>
|
|
Another limitation is that when an inner tuple's node points to a set
|
|
of leaf tuples, those tuples must all be in the same index page.
|
|
(This is a design decision to reduce seeking and save space in the
|
|
links that chain such tuples together.) If the set of leaf tuples
|
|
grows too large for a page, a split is performed and an intermediate
|
|
inner tuple is inserted. For this to fix the problem, the new inner
|
|
tuple <emphasis>must</emphasis> divide the set of leaf values into more than one
|
|
node group. If the operator class's <function>picksplit</function> function
|
|
fails to do that, the <acronym>SP-GiST</acronym> core resorts to
|
|
extraordinary measures described in <xref linkend="spgist-all-the-same"/>.
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2 id="spgist-null-labels">
|
|
<title>SP-GiST Without Node Labels</title>
|
|
|
|
<para>
|
|
Some tree algorithms use a fixed set of nodes for each inner tuple;
|
|
for example, in a quad-tree there are always exactly four nodes
|
|
corresponding to the four quadrants around the inner tuple's centroid
|
|
point. In such a case the code typically works with the nodes by
|
|
number, and there is no need for explicit node labels. To suppress
|
|
node labels (and thereby save some space), the <function>picksplit</function>
|
|
function can return NULL for the <structfield>nodeLabels</structfield> array,
|
|
and likewise the <function>choose</function> function can return NULL for
|
|
the <structfield>prefixNodeLabels</structfield> array during
|
|
a <literal>spgSplitTuple</literal> action.
|
|
This will in turn result in <structfield>nodeLabels</structfield> being NULL during
|
|
subsequent calls to <function>choose</function> and <function>inner_consistent</function>.
|
|
In principle, node labels could be used for some inner tuples and omitted
|
|
for others in the same index.
|
|
</para>
|
|
|
|
<para>
|
|
When working with an inner tuple having unlabeled nodes, it is an error
|
|
for <function>choose</function> to return <literal>spgAddNode</literal>, since the set
|
|
of nodes is supposed to be fixed in such cases.
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2 id="spgist-all-the-same">
|
|
<title><quote>All-the-Same</quote> Inner Tuples</title>
|
|
|
|
<para>
|
|
The <acronym>SP-GiST</acronym> core can override the results of the
|
|
operator class's <function>picksplit</function> function when
|
|
<function>picksplit</function> fails to divide the supplied leaf values into
|
|
at least two node categories. When this happens, the new inner tuple
|
|
is created with multiple nodes that each have the same label (if any)
|
|
that <function>picksplit</function> gave to the one node it did use, and the
|
|
leaf values are divided at random among these equivalent nodes.
|
|
The <literal>allTheSame</literal> flag is set on the inner tuple to warn the
|
|
<function>choose</function> and <function>inner_consistent</function> functions that the
|
|
tuple does not have the node set that they might otherwise expect.
|
|
</para>
|
|
|
|
<para>
|
|
When dealing with an <literal>allTheSame</literal> tuple, a <function>choose</function>
|
|
result of <literal>spgMatchNode</literal> is interpreted to mean that the new
|
|
value can be assigned to any of the equivalent nodes; the core code will
|
|
ignore the supplied <structfield>nodeN</structfield> value and descend into one
|
|
of the nodes at random (so as to keep the tree balanced). It is an
|
|
error for <function>choose</function> to return <literal>spgAddNode</literal>, since
|
|
that would make the nodes not all equivalent; the
|
|
<literal>spgSplitTuple</literal> action must be used if the value to be inserted
|
|
doesn't match the existing nodes.
|
|
</para>
|
|
|
|
<para>
|
|
When dealing with an <literal>allTheSame</literal> tuple, the
|
|
<function>inner_consistent</function> function should return either all or none
|
|
of the nodes as targets for continuing the index search, since they are
|
|
all equivalent. This may or may not require any special-case code,
|
|
depending on how much the <function>inner_consistent</function> function normally
|
|
assumes about the meaning of the nodes.
|
|
</para>
|
|
</sect2>
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="spgist-examples">
|
|
<title>Examples</title>
|
|
|
|
<para>
|
|
The <productname>PostgreSQL</productname> source distribution includes
|
|
several examples of index operator classes for <acronym>SP-GiST</acronym>,
|
|
as described in <xref linkend="spgist-builtin-opclasses-table"/>. Look
|
|
into <filename>src/backend/access/spgist/</filename>
|
|
and <filename>src/backend/utils/adt/</filename> to see the code.
|
|
</para>
|
|
|
|
</sect1>
|
|
|
|
</chapter>
|