mirror of https://github.com/postgres/postgres.git synced 2025-06-16 06:01:02 +03:00

Allow opclasses to provide tri-valued GIN consistent functions.

With the GIN "fast scan" feature, GIN can skip items without fetching all
the keys for them, if it can prove that they don't match regardless of
those keys. So far, it has done the proving by calling the boolean
consistent function with all combinations of TRUE/FALSE for the unfetched
keys, but since that's O(2^n), it becomes infeasible with more than a few
keys. We can avoid calling consistent with all the combinations, if we can
tell the operator class implementation directly which keys are unknown.
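
To illustrate the cost being avoided, here is a minimal standalone sketch of
the brute-force approach: a ternary answer is derived by calling a boolean
consistent function once per TRUE/FALSE assignment of the unknown keys, so the
number of calls doubles with each additional unknown key. This is not the
PostgreSQL code. The TriValue type, the function names, the 16-key limit and
the "all keys must be present" matching rule are assumptions made for
illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for a ternary logic value; names and values are hypothetical. */
typedef enum { TV_FALSE = 0, TV_TRUE = 1, TV_MAYBE = 2 } TriValue;

typedef bool (*BoolConsistentFn)(const bool check[], int nkeys);

/* Hypothetical boolean consistent function: match only if every key is present. */
bool
all_keys_consistent(const bool check[], int nkeys)
{
    for (int i = 0; i < nkeys; i++)
        if (!check[i])
            return false;
    return true;
}

/*
 * Derive a ternary result by trying every TRUE/FALSE assignment of the
 * TV_MAYBE entries. With m unknown keys this makes up to 2^m calls;
 * *ncalls counts them. Stops early once both outcomes have been seen.
 */
TriValue
tri_by_enumeration(BoolConsistentFn fn, const TriValue check[], int nkeys,
                   int *ncalls)
{
    bool trial[16];
    int  maybe_idx[16];
    int  nmaybe = 0;
    bool seen_true = false;
    bool seen_false = false;

    assert(nkeys >= 0 && nkeys <= 16);   /* limit of this sketch only */

    for (int i = 0; i < nkeys; i++)
    {
        trial[i] = (check[i] == TV_TRUE);
        if (check[i] == TV_MAYBE)
            maybe_idx[nmaybe++] = i;
    }

    for (unsigned mask = 0; mask < (1u << nmaybe); mask++)
    {
        /* Bit j of mask decides the trial value of the j-th unknown key. */
        for (int j = 0; j < nmaybe; j++)
            trial[maybe_idx[j]] = ((mask >> j) & 1u) != 0;

        (*ncalls)++;
        if (fn(trial, nkeys))
            seen_true = true;
        else
            seen_false = true;

        if (seen_true && seen_false)
            return TV_MAYBE;             /* outcome depends on unknown keys */
    }
    return seen_true ? TV_TRUE : TV_FALSE;
}
```

An opclass that is told directly which keys are unknown can usually reach the
same answer in a single pass over the check array.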

This commit includes a triConsistent function for the built-in array and
tsvector opclasses.

Alexander Korotkov, with some changes by me.
Heikki Linnakangas
2014-03-12 17:13:22 +02:00
parent fecfc2b913
commit c5608ea26a
16 changed files with 467 additions and 101 deletions


@ -74,15 +74,15 @@
<para>
All it takes to get a <acronym>GIN</acronym> access method working is to
implement four (or five) user-defined methods, which define the behavior of
implement a few user-defined methods, which define the behavior of
keys in the tree and the relationships between keys, indexed items,
and indexable queries. In short, <acronym>GIN</acronym> combines
extensibility with generality, code reuse, and a clean interface.
</para>
<para>
The four methods that an operator class for
<acronym>GIN</acronym> must provide are:
There are three methods that an operator class for
<acronym>GIN</acronym> must provide:
<variablelist>
<varlistentry>
@ -190,7 +190,18 @@
</listitem>
</varlistentry>
</variablelist>
An operator class must also provide a function to check if an indexed item
matches the query. It comes in two flavors, a boolean <function>consistent</>
function, and a ternary <function>triConsistent</> function.
<function>triConsistent</> covers the functionality of both, so providing
<function>triConsistent</> alone is sufficient. However, if the boolean variant is
significantly cheaper to calculate, it can be advantageous to provide both.
If only the boolean variant is provided, some optimizations that depend on
refuting index items before fetching all the keys are disabled.
<variablelist>
<varlistentry>
<term><function>bool consistent(bool check[], StrategyNumber n, Datum query,
int32 nkeys, Pointer extra_data[], bool *recheck,
@ -241,10 +252,38 @@
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><function>GinLogicValue triConsistent(GinLogicValue check[], StrategyNumber n, Datum query,
int32 nkeys, Pointer extra_data[],
Datum queryKeys[], bool nullFlags[])</></term>
<listitem>
<para>
<function>triConsistent</> is similar to <function>consistent</>,
but instead of a boolean <literal>check[]</>, there are three possible
values for each key: <literal>GIN_TRUE</>, <literal>GIN_FALSE</> and
<literal>GIN_MAYBE</>. <literal>GIN_FALSE</> and <literal>GIN_TRUE</>
have the same meaning as regular boolean values.
<literal>GIN_MAYBE</> means that the presence of that key is not known.
When <literal>GIN_MAYBE</> values are present, the function must return
<literal>GIN_TRUE</> only if the item matches whether or not the index item
contains the corresponding query keys. Likewise, the function must
return <literal>GIN_FALSE</> only if the item does not match, whether or not it
contains the <literal>GIN_MAYBE</> keys. If the result depends on the
<literal>GIN_MAYBE</> entries, i.e., the match cannot be confirmed or refuted
based on the known query keys, the function must return <literal>GIN_MAYBE</>.
</para>
<para>
When there are no <literal>GIN_MAYBE</> values in the <literal>check</> vector,
a <literal>GIN_MAYBE</> return value is equivalent to setting the
<literal>recheck</> flag in the boolean <function>consistent</> function.
</para>
</listitem>
</varlistentry>
</variablelist>
Optionally, an operator class for
<acronym>GIN</acronym> can supply a fifth method:
Optionally, an operator class for <acronym>GIN</acronym> can supply the
following method:
<variablelist>
<varlistentry>
@ -282,8 +321,9 @@
above vary depending on the operator class. The item values passed to
<function>extractValue</> are always of the operator class's input type, and
all key values must be of the class's <literal>STORAGE</> type. The type of
the <literal>query</> argument passed to <function>extractQuery</> and
<function>consistent</> is whatever is specified as the right-hand input
the <literal>query</> argument passed to <function>extractQuery</>,
<function>consistent</> and <function>triConsistent</> is whatever is
specified as the right-hand input
type of the class member operator identified by the strategy number.
This need not be the same as the item type, so long as key values of the
correct type can be extracted from it.
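
The ternary rules described for triConsistent above can be sketched in
standalone C as follows. This is an illustration, not the built-in array or
tsvector implementation: the SketchLogicValue type, its numeric values, and
both function names are hypothetical stand-ins, and a real support function
would use PostgreSQL's function-call interface and the full argument list
rather than these simplified signatures. The two functions assume a query
that requires all keys and a query that requires any key, respectively.

```c
/* Hypothetical stand-in for the ternary value type used by triConsistent. */
typedef enum { SK_FALSE = 0, SK_TRUE = 1, SK_MAYBE = 2 } SketchLogicValue;

/*
 * "Contains all" semantics: every query key must be present in the item.
 * One definitely absent key refutes the match no matter what the unknown
 * keys turn out to be; otherwise any unknown key leaves the answer open.
 */
SketchLogicValue
sketch_tri_contains_all(const SketchLogicValue check[], int nkeys)
{
    SketchLogicValue result = SK_TRUE;

    for (int i = 0; i < nkeys; i++)
    {
        if (check[i] == SK_FALSE)
            return SK_FALSE;
        if (check[i] == SK_MAYBE)
            result = SK_MAYBE;
    }
    return result;
}

/*
 * "Overlaps" semantics: at least one query key must be present in the item.
 * One definitely present key confirms the match regardless of the unknown
 * keys; otherwise any unknown key leaves the answer open.
 */
SketchLogicValue
sketch_tri_overlaps(const SketchLogicValue check[], int nkeys)
{
    SketchLogicValue result = SK_FALSE;

    for (int i = 0; i < nkeys; i++)
    {
        if (check[i] == SK_TRUE)
            return SK_TRUE;
        if (check[i] == SK_MAYBE)
            result = SK_MAYBE;
    }
    return result;
}
```

Both functions make a single pass over the check array, so the cost is linear
in the number of keys regardless of how many of them are unknown.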


@ -567,7 +567,10 @@
</row>
<row>
<entry><function>consistent</></entry>
<entry>determine whether value matches query condition</entry>
<entry>
determine whether value matches query condition (boolean variant)
(optional if support function 6 is present)
</entry>
<entry>4</entry>
</row>
<row>
@ -580,6 +583,14 @@
</entry>
<entry>5</entry>
</row>
<row>
<entry><function>triConsistent</></entry>
<entry>
determine whether value matches query condition (ternary variant)
(optional if support function 4 is present)
</entry>
<entry>6</entry>
</row>
</tbody>
</tgroup>
</table>