Mirror of https://github.com/postgres/postgres.git (synced 2025-07-28 23:42:10 +03:00)
Push index operator lossiness determination down to GIST/GIN opclass "consistent" functions, and remove pg_amop.amopreqcheck, as per recent discussion.

The main immediate benefit of this is that we no longer need 8.3's ugly hack of requiring @@@ rather than @@ to test weight-using tsquery searches on GIN indexes. In future it should be possible to optimize some other queries better than is done now, by detecting at runtime whether the index match is exact or not.

Tom Lane, after an idea of Heikki's, and with some help from Teodor.
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/catalogs.sgml,v 2.164 2008/04/10 22:25:25 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/catalogs.sgml,v 2.165 2008/04/14 17:05:32 tgl Exp $ -->
 <!--
 Documentation of the system catalogs, directed toward PostgreSQL developers
 -->
@@ -606,13 +606,6 @@
 <entry>Operator strategy number</entry>
 </row>
 
-<row>
-<entry><structfield>amopreqcheck</structfield></entry>
-<entry><type>bool</type></entry>
-<entry></entry>
-<entry>Index hit must be rechecked</entry>
-</row>
-
 <row>
 <entry><structfield>amopopr</structfield></entry>
 <entry><type>oid</type></entry>
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/func.sgml,v 1.429 2008/04/10 13:34:33 alvherre Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/func.sgml,v 1.430 2008/04/14 17:05:32 tgl Exp $ -->
 
 <chapter id="functions">
 <title>Functions and Operators</title>
@@ -7738,7 +7738,7 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
 </row>
 <row>
 <entry> <literal>@@@</literal> </entry>
-<entry>same as <literal>@@</>, but see <xref linkend="textsearch-indexes"></entry>
+<entry>deprecated synonym for <literal>@@</></entry>
 <entry><literal>to_tsvector('fat cats ate rats') @@@ to_tsquery('cat & rat')</literal></entry>
 <entry><literal>t</literal></entry>
 </row>
--- a/doc/src/sgml/gin.sgml
+++ b/doc/src/sgml/gin.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/gin.sgml,v 2.13 2007/11/16 03:23:07 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/gin.sgml,v 2.14 2008/04/14 17:05:32 tgl Exp $ -->
 
 <chapter id="GIN">
 <title>GIN Indexes</title>
@@ -111,12 +111,12 @@
 </varlistentry>
 
 <varlistentry>
-<term>bool consistent(bool check[], StrategyNumber n, Datum query)</term>
+<term>bool consistent(bool check[], StrategyNumber n, Datum query, bool *recheck)</term>
 <listitem>
 <para>
 Returns TRUE if the indexed value satisfies the query operator with
-strategy number <literal>n</> (or would satisfy, if the operator is
-marked RECHECK in the operator class). The <literal>check</> array has
+strategy number <literal>n</> (or might satisfy, if the recheck
+indication is returned). The <literal>check</> array has
 the same length as the number of keys previously returned by
 <function>extractQuery</> for this query. Each element of the
 <literal>check</> array is TRUE if the indexed value contains the
@@ -124,6 +124,9 @@
 <function>extractQuery</> result array is present in the indexed value.
 The original <literal>query</> datum (not the extracted key array!) is
 passed in case the <function>consistent</> method needs to consult it.
+On success, <literal>*recheck</> should be set to TRUE if the heap
+tuple needs to be rechecked against the query operator, or FALSE if
+the index test is exact.
 </para>
 </listitem>
 </varlistentry>
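The new GIN contract above (return TRUE for a match or possible match, and report exactness through `*recheck`) can be sketched in standalone C. This is a toy model under stated assumptions, not PostgreSQL source: `ToyQuery`, `toy_gin_consistent`, and the `has_weights` field are hypothetical names, the strategy number is accepted but ignored, and the operator is assumed to require that every extracted key be present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the original query datum; not a PostgreSQL type. */
typedef struct ToyQuery {
    size_t nkeys;       /* number of keys an extractQuery step would have returned */
    bool   has_weights; /* true if the query restricts lexeme weights */
} ToyQuery;

/*
 * Toy "consistent" method for an operator that needs every key present.
 * check[i] is true when the i-th extracted key occurs in the indexed value.
 * On a true result, *recheck says whether the heap tuple must be re-tested.
 */
bool toy_gin_consistent(const bool check[], int strategy,
                        const ToyQuery *query, bool *recheck)
{
    (void) strategy;    /* assumption: only one operator is modelled */
    assert(check != NULL && query != NULL && recheck != NULL);

    for (size_t i = 0; i < query->nkeys; i++) {
        if (!check[i])
            return false;   /* a required key is missing: definite non-match */
    }

    /*
     * All keys are present. The index is assumed to store keys only, so a
     * query that also restricts weights is a "maybe" and needs a recheck.
     */
    *recheck = query->has_weights;
    return true;
}
```

The point of the sketch is that the same operator can be exact for one query and lossy for another, which a static per-operator RECHECK marking could not express.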
--- a/doc/src/sgml/gist.sgml
+++ b/doc/src/sgml/gist.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/gist.sgml,v 1.29 2007/11/13 23:36:26 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/gist.sgml,v 1.30 2008/04/14 17:05:32 tgl Exp $ -->
 
 <chapter id="GiST">
 <title>GiST Indexes</title>
@@ -103,7 +103,10 @@
 Given a predicate <literal>p</literal> on a tree page, and a user
 query, <literal>q</literal>, this method will return false if it is
 certain that both <literal>p</literal> and <literal>q</literal> cannot
-be true for a given data item.
+be true for a given data item. For a true result, a
+<literal>recheck</> flag must also be returned; this indicates whether
+the predicate implies the query (<literal>recheck</> = false) or
+not (<literal>recheck</> = true).
 </para>
 </listitem>
 </varlistentry>
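The GiST rule above (false when predicate and query cannot both hold; otherwise true, with `recheck` = false only when the predicate implies the query) can be illustrated with bounding boxes. This is a minimal sketch, not PostgreSQL code: `Box`, `toy_gist_consistent`, and the helper names are hypothetical, the index key is assumed to be the bounding box of a non-empty object, and the query is assumed to be "object overlaps this box".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical axis-aligned box with closed boundaries. */
typedef struct Box { double xlo, ylo, xhi, yhi; } Box;

static bool boxes_overlap(const Box *a, const Box *b)
{
    return a->xlo <= b->xhi && b->xlo <= a->xhi &&
           a->ylo <= b->yhi && b->ylo <= a->yhi;
}

static bool box_contains(const Box *outer, const Box *inner)
{
    return outer->xlo <= inner->xlo && inner->xhi <= outer->xhi &&
           outer->ylo <= inner->ylo && inner->yhi <= outer->yhi;
}

/*
 * Toy leaf-level "consistent" check. key is the stored bounding box of an
 * object; query is the box the object must overlap.
 */
bool toy_gist_consistent(const Box *key, const Box *query, bool *recheck)
{
    assert(key != NULL && query != NULL && recheck != NULL);

    if (!boxes_overlap(key, query))
        return false;   /* the object lies inside key, so it cannot overlap */

    /*
     * If the query box contains the whole bounding box, the object inside it
     * must overlap the query: the predicate implies the query, no recheck.
     * Otherwise only the boxes are known to overlap, so recheck is needed.
     */
    *recheck = !box_contains(query, key);
    return true;
}
```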
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/indexam.sgml,v 2.25 2008/04/13 19:18:13 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/indexam.sgml,v 2.26 2008/04/14 17:05:32 tgl Exp $ -->
 
 <chapter id="indexam">
 <title>Index Access Method Interface Definition</title>
@@ -183,7 +183,7 @@ aminsert (Relation indexRelation,
 parameter. See <xref linkend="index-unique-checks"> for details.
 The result is TRUE if an index entry was inserted, FALSE if not. (A FALSE
 result does not denote an error condition, but is used for cases such
-as an index AM refusing to index a NULL.)
+as an index method refusing to index a NULL.)
 </para>
 
 <para>
@@ -430,13 +430,13 @@ amrestrpos (IndexScanDesc scan);
 </para>
 
 <para>
-The operator family can indicate that the index is <firstterm>lossy</> for a
-particular operator; this implies that the index scan will return all the
-entries that pass the scan key, plus possibly additional entries that do
-not. The core system's index-scan machinery will then apply that operator
-again to the heap tuple to verify whether or not it really should be
-selected. For non-lossy operators, the index scan must return exactly the
-set of matching entries, as there is no recheck.
+The access method can report that the index is <firstterm>lossy</>, or
+requires rechecks, for a particular query. This implies that the index
+scan will return all the entries that pass the scan key, plus possibly
+additional entries that do not. The core system's index-scan machinery
+will then apply the index conditions again to the heap tuple to verify
+whether or not it really should be selected. If the recheck option is not
+specified, the index scan must return exactly the set of matching entries.
 </para>
 
 <para>
@@ -849,7 +849,7 @@ amcostestimate (PlannerInfo *root,
 <para>
 The indexSelectivity should be set to the estimated fraction of the parent
 table rows that will be retrieved during the index scan. In the case
-of a lossy index, this will typically be higher than the fraction of
+of a lossy query, this will typically be higher than the fraction of
 rows that actually pass the given qual conditions.
 </para>
 
--- a/doc/src/sgml/ref/alter_opfamily.sgml
+++ b/doc/src/sgml/ref/alter_opfamily.sgml
@@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/ref/alter_opfamily.sgml,v 1.3 2007/02/14 04:30:26 tgl Exp $
+$PostgreSQL: pgsql/doc/src/sgml/ref/alter_opfamily.sgml,v 1.4 2008/04/14 17:05:32 tgl Exp $
 PostgreSQL documentation
 -->
 
@@ -21,7 +21,7 @@ PostgreSQL documentation
 <refsynopsisdiv>
 <synopsis>
 ALTER OPERATOR FAMILY <replaceable>name</replaceable> USING <replaceable class="parameter">index_method</replaceable> ADD
-{ OPERATOR <replaceable class="parameter">strategy_number</replaceable> <replaceable class="parameter">operator_name</replaceable> ( <replaceable class="parameter">op_type</replaceable>, <replaceable class="parameter">op_type</replaceable> ) [ RECHECK ]
+{ OPERATOR <replaceable class="parameter">strategy_number</replaceable> <replaceable class="parameter">operator_name</replaceable> ( <replaceable class="parameter">op_type</replaceable>, <replaceable class="parameter">op_type</replaceable> )
 | FUNCTION <replaceable class="parameter">support_number</replaceable> [ ( <replaceable class="parameter">op_type</replaceable> [ , <replaceable class="parameter">op_type</replaceable> ] ) ] <replaceable class="parameter">funcname</replaceable> ( <replaceable class="parameter">argument_type</replaceable> [, ...] )
 } [, ... ]
 ALTER OPERATOR FAMILY <replaceable>name</replaceable> USING <replaceable class="parameter">index_method</replaceable> DROP
@@ -154,18 +154,6 @@ ALTER OPERATOR FAMILY <replaceable>name</replaceable> USING <replaceable class="
 </listitem>
 </varlistentry>
 
-<varlistentry>
-<term><literal>RECHECK</></term>
-<listitem>
-<para>
-If present, the index is <quote>lossy</> for this operator, and
-so the rows retrieved using the index must be rechecked to
-verify that they actually satisfy the qualification clause
-involving this operator.
-</para>
-</listitem>
-</varlistentry>
-
 <varlistentry>
 <term><replaceable class="parameter">support_number</replaceable></term>
 <listitem>
@@ -247,6 +235,14 @@ ALTER OPERATOR FAMILY <replaceable>name</replaceable> USING <replaceable class="
 is likely to be inlined into the calling query, which will prevent
 the optimizer from recognizing that the query matches an index.
 </para>
+
+<para>
+Before <productname>PostgreSQL</productname> 8.4, the <literal>OPERATOR</>
+clause could include a <literal>RECHECK</> option. This is no longer
+supported because whether an index operator is <quote>lossy</> is now
+determined on-the-fly at runtime. This allows efficient handling of
+cases where an operator might or might not be lossy.
+</para>
 </refsect1>
 
 <refsect1>
--- a/doc/src/sgml/ref/create_opclass.sgml
+++ b/doc/src/sgml/ref/create_opclass.sgml
@@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/ref/create_opclass.sgml,v 1.21 2007/12/03 23:49:51 tgl Exp $
+$PostgreSQL: pgsql/doc/src/sgml/ref/create_opclass.sgml,v 1.22 2008/04/14 17:05:32 tgl Exp $
 PostgreSQL documentation
 -->
 
@@ -22,7 +22,7 @@ PostgreSQL documentation
 <synopsis>
 CREATE OPERATOR CLASS <replaceable class="parameter">name</replaceable> [ DEFAULT ] FOR TYPE <replaceable class="parameter">data_type</replaceable>
 USING <replaceable class="parameter">index_method</replaceable> [ FAMILY <replaceable class="parameter">family_name</replaceable> ] AS
-{ OPERATOR <replaceable class="parameter">strategy_number</replaceable> <replaceable class="parameter">operator_name</replaceable> [ ( <replaceable class="parameter">op_type</replaceable>, <replaceable class="parameter">op_type</replaceable> ) ] [ RECHECK ]
+{ OPERATOR <replaceable class="parameter">strategy_number</replaceable> <replaceable class="parameter">operator_name</replaceable> [ ( <replaceable class="parameter">op_type</replaceable>, <replaceable class="parameter">op_type</replaceable> ) ]
 | FUNCTION <replaceable class="parameter">support_number</replaceable> [ ( <replaceable class="parameter">op_type</replaceable> [ , <replaceable class="parameter">op_type</replaceable> ] ) ] <replaceable class="parameter">funcname</replaceable> ( <replaceable class="parameter">argument_type</replaceable> [, ...] )
 | STORAGE <replaceable class="parameter">storage_type</replaceable>
 } [, ... ]
@@ -179,18 +179,6 @@ CREATE OPERATOR CLASS <replaceable class="parameter">name</replaceable> [ DEFAUL
 </listitem>
 </varlistentry>
 
-<varlistentry>
-<term><literal>RECHECK</></term>
-<listitem>
-<para>
-If present, the index is <quote>lossy</> for this operator, and
-so the rows retrieved using the index must be rechecked to
-verify that they actually satisfy the qualification clause
-involving this operator.
-</para>
-</listitem>
-</varlistentry>
-
 <varlistentry>
 <term><replaceable class="parameter">support_number</replaceable></term>
 <listitem>
@@ -256,6 +244,14 @@ CREATE OPERATOR CLASS <replaceable class="parameter">name</replaceable> [ DEFAUL
 is likely to be inlined into the calling query, which will prevent
 the optimizer from recognizing that the query matches an index.
 </para>
+
+<para>
+Before <productname>PostgreSQL</productname> 8.4, the <literal>OPERATOR</>
+clause could include a <literal>RECHECK</> option. This is no longer
+supported because whether an index operator is <quote>lossy</> is now
+determined on-the-fly at runtime. This allows efficient handling of
+cases where an operator might or might not be lossy.
+</para>
 </refsect1>
 
 <refsect1>
@@ -271,12 +267,12 @@ CREATE OPERATOR CLASS <replaceable class="parameter">name</replaceable> [ DEFAUL
 CREATE OPERATOR CLASS gist__int_ops
 DEFAULT FOR TYPE _int4 USING gist AS
 OPERATOR 3 &&,
-OPERATOR 6 = RECHECK,
+OPERATOR 6 = (anyarray, anyarray),
 OPERATOR 7 @>,
 OPERATOR 8 <@,
 OPERATOR 20 @@ (_int4, query_int),
-FUNCTION 1 g_int_consistent (internal, _int4, int4),
-FUNCTION 2 g_int_union (bytea, internal),
+FUNCTION 1 g_int_consistent (internal, _int4, int, oid, internal),
+FUNCTION 2 g_int_union (internal, internal),
 FUNCTION 3 g_int_compress (internal),
 FUNCTION 4 g_int_decompress (internal),
 FUNCTION 5 g_int_penalty (internal, internal, internal),
--- a/doc/src/sgml/textsearch.sgml
+++ b/doc/src/sgml/textsearch.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/textsearch.sgml,v 1.42 2008/03/10 03:01:28 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/textsearch.sgml,v 1.43 2008/04/14 17:05:32 tgl Exp $ -->
 
 <chapter id="textsearch">
 <title id="textsearch-title">Full Text Search</title>
@@ -3142,19 +3142,7 @@ SELECT plainto_tsquery('supernovae stars');
 A GiST index is <firstterm>lossy</firstterm>, meaning that the index
 may produce false matches, and it is necessary
 to check the actual table row to eliminate such false matches.
-<productname>PostgreSQL</productname> does this automatically; for
-example, in the query plan below, the <literal>Filter:</literal>
-line indicates the index output will be rechecked:
-
-<programlisting>
-EXPLAIN SELECT * FROM apod WHERE textsearch @@ to_tsquery('supernovae');
-QUERY PLAN
--------------------------------------------------------------------------
-Index Scan using textsearch_gidx on apod (cost=0.00..12.29 rows=2 width=1469)
-Index Cond: (textsearch @@ '''supernova'''::tsquery)
-Filter: (textsearch @@ '''supernova'''::tsquery)
-</programlisting>
-
+(<productname>PostgreSQL</productname> does this automatically when needed.)
 GiST indexes are lossy because each document is represented in the
 index by a fixed-length signature. The signature is generated by hashing
 each word into a random bit in an n-bit string, with all these bits OR-ed
@@ -3174,57 +3162,11 @@ EXPLAIN SELECT * FROM apod WHERE textsearch @@ to_tsquery('supernovae');
 </para>
 
 <para>
-GIN indexes are not lossy but their performance depends logarithmically on
-the number of unique words.
-</para>
-
-<para>
-Actually, GIN indexes store only the words (lexemes) of <type>tsvector</>
-values, and not their weight labels. Thus, while a GIN index can be
-considered non-lossy for a query that does not specify weights, it is
-lossy for one that does. Thus a table row recheck is needed when using
-a query that involves weights. Unfortunately, in the current design of
-<productname>PostgreSQL</>, whether a recheck is needed is a static
-property of a particular operator, and not something that can be enabled
-or disabled on-the-fly depending on the values given to the operator.
-To deal with this situation without imposing the overhead of rechecks
-on queries that do not need them, the following approach has been
-adopted:
-</para>
-
-<itemizedlist spacing="compact" mark="bullet">
-<listitem>
-<para>
-The standard text match operator <literal>@@</> is marked as non-lossy
-for GIN indexes.
-</para>
-</listitem>
-
-<listitem>
-<para>
-An additional match operator <literal>@@@</> is provided, and marked
-as lossy for GIN indexes. This operator behaves exactly like
-<literal>@@</> otherwise.
-</para>
-</listitem>
-
-<listitem>
-<para>
-When a GIN index search is initiated with the <literal>@@</> operator,
-the index support code will throw an error if the query specifies any
-weights. This protects against giving wrong answers due to failure
-to recheck the weights.
-</para>
-</listitem>
-</itemizedlist>
-
-<para>
-In short, you must use <literal>@@@</> rather than <literal>@@</> to
-perform GIN index searches on queries that involve weight restrictions.
-For queries that do not have weight restrictions, either operator will
-work, but <literal>@@</> will be faster.
-This awkwardness will probably be addressed in a future release of
-<productname>PostgreSQL</>.
+GIN indexes are not lossy for standard queries, but their performance
+depends logarithmically on the number of unique words.
+(However, GIN indexes store only the words (lexemes) of <type>tsvector</>
+values, and not their weight labels. Thus a table row recheck is needed
+when using a query that involves weights.)
 </para>
 
 <para>
--- a/doc/src/sgml/xindex.sgml
+++ b/doc/src/sgml/xindex.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/xindex.sgml,v 1.61 2007/12/02 04:36:40 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/xindex.sgml,v 1.62 2008/04/14 17:05:32 tgl Exp $ -->
 
 <sect1 id="xindex">
 <title>Interfacing Extensions To Indexes</title>
@@ -913,26 +913,31 @@ ALTER OPERATOR FAMILY integer_ops USING btree ADD
 
 <para>
 Normally, declaring an operator as a member of an operator class
-(or family) means
-that the index method can retrieve exactly the set of rows
+(or family) means that the index method can retrieve exactly the set of rows
 that satisfy a <literal>WHERE</> condition using the operator. For example:
 <programlisting>
 SELECT * FROM table WHERE integer_column < 4;
 </programlisting>
 can be satisfied exactly by a B-tree index on the integer column.
 But there are cases where an index is useful as an inexact guide to
-the matching rows. For example, if a GiST index stores only
-bounding boxes for objects, then it cannot exactly satisfy a <literal>WHERE</>
+the matching rows. For example, if a GiST index stores only bounding boxes
+for geometric objects, then it cannot exactly satisfy a <literal>WHERE</>
 condition that tests overlap between nonrectangular objects such as
 polygons. Yet we could use the index to find objects whose bounding
 box overlaps the bounding box of the target object, and then do the
 exact overlap test only on the objects found by the index. If this
 scenario applies, the index is said to be <quote>lossy</> for the
-operator, and we add <literal>RECHECK</> to the <literal>OPERATOR</> clause
-in the <command>CREATE OPERATOR CLASS</> command.
-<literal>RECHECK</> is valid if the index is guaranteed to return
-all the required rows, plus perhaps some additional rows, which
-can be eliminated by performing the original operator invocation.
+operator. Lossy index searches are implemented by having the index
+method return a <firstterm>recheck</> flag when a row might or might
+not really satisfy the query condition. The core system will then
+test the original query condition on the retrieved row to see whether
+it should be returned as a valid match. This approach works if
+the index is guaranteed to return all the required rows, plus perhaps
+some additional rows, which can be eliminated by performing the original
+operator invocation. The index methods that support lossy searches
+(currently, GiST and GIN) allow the support functions of individual
+operator classes to set the recheck flag, and so this is essentially an
+operator-class feature.
 </para>
 
 <para>
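The core-system side of the recheck protocol described in the xindex.sgml text (test the original condition only on rows the index flagged as uncertain) can be sketched as a filter loop. This is an illustrative model only: `Candidate`, `QualFn`, `toy_filter_candidates`, and `toy_is_even` are hypothetical names, and rows are reduced to plain integers rather than heap tuples.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical index result: a row value plus the per-row recheck flag. */
typedef struct Candidate {
    int  row_value;
    bool recheck;   /* true: the index match may be a false positive */
} Candidate;

/* Stand-in for the original query condition applied to a fetched row. */
typedef bool (*QualFn)(int row_value);

/*
 * Copy accepted rows into out[] and return how many were kept.
 * Rows with recheck == false are trusted as exact matches; rows with
 * recheck == true are kept only if the original condition holds.
 */
size_t toy_filter_candidates(const Candidate *cands, size_t n,
                             QualFn qual, int *out)
{
    assert(cands != NULL && qual != NULL && out != NULL);

    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (cands[i].recheck && !qual(cands[i].row_value))
            continue;   /* false positive eliminated by the recheck */
        out[kept++] = cands[i].row_value;
    }
    return kept;
}

/* Example condition used for illustration: the row value is even. */
bool toy_is_even(int row_value)
{
    return row_value % 2 == 0;
}
```

Because the flag travels with each returned row, exact matches skip the condition entirely, which is the efficiency gain over rechecking every row for an operator statically marked lossy.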