Previously, the spgSplitTuple action could only create a new upper tuple
containing a single labeled node. This made it useless for opclasses
that prefer to work with fixed sets of nodes (labeled or otherwise),
which meant that restrictive prefixes could not be used with such
node definitions. Change the output field set for the choose() method
to allow it to specify any valid node set for the new upper tuple,
and to specify which of these nodes to place the modified lower tuple in.
In addition to its primary use for fixed node sets, this feature could
allow existing opclasses that use variable node sets to skip a separate
spgAddNode action when splitting a tuple, by setting up the node needed
for the incoming value as part of the spgSplitTuple action. However, care
would have to be taken to add the extra node only when it would not make
the tuple bigger than before. (spgAddNode can enlarge the tuple,
spgSplitTuple can't.)
This is a prerequisite for an upcoming SP-GiST inet opclass, but is
being committed separately to increase the visibility of the API change.
In passing, improve the documentation about the traverse-values feature
that was added by commit ccd6eb49a.
Emre Hasegeli, with cosmetic adjustments and documentation rework by me
Discussion: <CAE2gYzxtth9qatW_OAqdOjykS0bxq7AYHLuyAQLPgT7H9ZU0Cw@mail.gmail.com>
spg_text_inner_consistent is capable of reconstructing an empty string
to pass down to the next index level; this happens if we have an empty
string coming in, no prefix, and a dummy node label. (In practice, what
is needed to trigger that is insertion of a whole bunch of empty-string
values.) Then, we will arrive at the next level with in->level == 0
and a non-NULL (but zero length) in->reconstructedValue, which is valid
but the Assert tests weren't expecting it.
Per report from Andreas Seltenreich. This has no impact in non-Assert
builds, so should not be a problem in production, but back-patch to
all affected branches anyway.
In passing, remove a couple of useless variable initializations and
shorten the code by not duplicating DatumGetPointer() calls.
Previously, the code used a node label of zero both for strings that
contain no bytes beyond the inner tuple's prefix, and for cases where an
"allTheSame" inner tuple has to be split to allow a string with a different
next byte to be inserted into it. Failing to distinguish these cases meant
that if a string ending with the current prefix needed to be inserted into
an allTheSame tuple, we got into an infinite loop, because after splitting
the tuple we'd descend into the child allTheSame tuple and then find we
need to split again.
To fix, instead use -1 and -2 as the node labels for these two cases.
This requires widening the node label type from "char" to int2, but
fortunately SPGiST stores all pass-by-value node label types in their
Datum representation, which means that this change is transparently upward
compatible so far as the on-disk representation goes. We continue to
recognize zero as a dummy node label for reading purposes, but will not
attempt to push new index entries down into such a label, so that the loop
won't occur even when dealing with an existing index.
Per report from Teodor Sigaev. Back-patch to 9.2 where the faulty
code was introduced.
What we have implemented is a radix tree (or a radix trie or a patricia
trie), but the docs and code comments incorrectly called it a "suffix tree".
Alexander Korotkov
The original coding failed miserably for BLCKSZ of 4K or less, as reported
by Josh Kupershmidt. With the present design for text indexes, a given
inner tuple could have up to 256 labels (requiring either 3K or 4K bytes
depending on MAXALIGN), which means that we can't positively guarantee no
failures for smaller blocksizes. But we can at least make it behave sanely
so long as there are few enough labels to fit on a page. Considering that
btree is also more prone to "index tuple too large" failures when BLCKSZ is
small, it's not clear that we should expend more work than this on this
case.
The original API definition was incapable of supporting whole-index scans
because there was no way to invoke leaf-value reconstruction without
checking any qual conditions. Also, it was inefficient for
multiple-qual-condition scans because value reconstruction got done over
again for each qual condition, and because other internal work in the
consistent functions likewise had to be done for each qual. To fix these
issues, pass the whole scankey array to the opclass consistent functions,
instead of only letting them see one item at a time. (Essentially, the
loop over scankey entries is now inside the consistent functions not
outside them. This makes the consistent functions a bit more complicated,
but not unreasonably so.)
In itself this commit does nothing except save a few cycles in
multiple-qual-condition index scans, since we can't support whole-index
scans on SPGiST indexes until nulls are included in the index. However,
I consider this a must-fix for 9.2 because once we release it will get
very much harder to change the opclass API definition.
Operator classes can specify whether or not they support this; this
preserves the flexibility to use lossy representations within an index.
In passing, move constant data about a given index into the rd_amcache
cache area, instead of doing fresh lookups each time we start an index
operation. This is mainly to try to make sure that spgcanreturn() has
insignificant cost; I still don't have any proof that it matters for
actual index accesses. Also, get rid of useless copying of FmgrInfo
pointers; we can perfectly well use the relcache's versions in-place.
SP-GiST is comparable to GiST in flexibility, but supports non-balanced
partitioned search structures rather than balanced trees. As described at
PGCon 2011, this new indexing structure can beat GiST in both index build
time and query speed for search problems that it is well matched to.
There are a number of areas that could still use improvement, but at this
point the code seems committable.
Teodor Sigaev and Oleg Bartunov, with considerable revisions by Tom Lane