Similarly to B-tree, GiST index access method gets support of INCLUDE
attributes. These attributes aren't used for tree navigation and aren't
present in non-leaf pages. But they are present in leaf pages and can be
fetched during index-only scan.
The point of having INCLUDE attributes in GiST indexes is slightly different
from the point of having them in B-tree. The main point of INCLUDE attributes
in B-tree is to define UNIQUE constraint over part of attributes enabled for
index-only scan. In GiST the main point of INCLUDE attributes is to use
index-only scan for attributes, whose data types don't have GiST opclasses.
Discussion: https://postgr.es/m/73A1A452-AD5F-40D4-BD61-978622FF75C1%40yandex-team.ru
Author: Andrey Borodin, with small changes by me
Reviewed-by: Andreas Karlsson
The lower case spellings are C and C++ standard and are used in most
parts of the PostgreSQL sources. The upper case spellings are only used
in some files/modules. So standardize on the standard spellings.
The APIs for ICU, Perl, and Windows define their own TRUE and FALSE, so
those are left as is when using those APIs.
In code comments, we use the lower-case spelling for the C concepts and
keep the upper-case spelling for the SQL concepts.
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.
By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis. However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent. That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.
This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
After further reflection I was unconvinced that the existing coding is
guaranteed to return valid union datums in every code path for multi-column
indexes. Fix that by forcing a gistunionsubkey() call at the end of the
recursion. Having done that, we can remove some clearly-redundant calls
elsewhere. This should be a little faster for multi-column indexes (since
the previous coding would uselessly do such a call for each column while
unwinding the recursion), as well as much harder to break.
Also, simplify the handling of cases where one side or the other of a
primary split contains only don't-care tuples. The previous coding used a
very ugly hack in removeDontCares() that essentially forced one random
tuple to be treated as non-don't-care, providing a random initial choice of
seed datum for the secondary split. It seems unlikely that that method
will give better-than-random splits. Instead, treat such a split as
degenerate and just let the next column determine the split, the same way
that we handle fully degenerate cases where the two sides produce identical
union datums.
This LOG message was put in over five years ago with the evident
expectation that we'd make all GiST opclasses support secondary split
directly. However, no such thing ever happened, and indeed the number of
opclasses supporting it decreased to zero in 9.2. The reason is that
improving on the default implementation isn't that easy --- the
opclass-specific code that did exist, before 9.2, doesn't appear to have
been any improvement over the default.
Hence, remove the message altogether. There's certainly no point in
nagging users about this in released branches, but I doubt that we'll
ever implement complete opclass-specific support anyway.
Improve comments, rename some variables and functions, slightly simplify
a couple of APIs, in an attempt to make this code readable by people other
than its original author.
Even though this is essentially just cosmetic, back-patch to all active
branches, because otherwise it's going to make back-patching future fixes
in this file very painful.
When considering a non-last column in a multi-column GiST index,
gistsplit.c tries to improve on the split chosen by the opclass-specific
pickSplit function by considering penalties for the next column. However,
there were two bugs in this code: it failed to recompute the union keys for
the leftmost index columns, even though these might well change after
reassigning tuples; and it included the old union keys in the recomputation
for the columns it did recompute, so that those keys couldn't get smaller
even if they should. The first problem could result in an invalid index
in which searches wouldn't find index entries that are in fact present;
the second would make the index less efficient to search.
Both of these errors were caused by misuse of gistMakeUnionItVec, whose
API was designed in a way that just begged such errors to be made. There
is no situation in which it's safe or useful to compute the union keys for
a subset of the index columns, and there is no caller that wants any
previous union keys to be included in the computation; so the undocumented
choice to treat the union keys as in/out rather than pure output parameters
is a waste of code as well as being dangerous.
Hence, rather than just making a minimal patch, I've changed the API of
gistMakeUnionItVec to remove the "startkey" parameter (it now always
processes all index columns) and treat the attr/isnull arrays as purely
output parameters.
In passing, also get rid of a couple of unnecessary and dangerous uses
of static variables in gistutil.c. It's remarkable that the one in
gistMakeUnionKey hasn't given us portability troubles before now, because
in addition to posing a re-entrancy hazard, it was unsafely assuming that
a static char[] array would have at least Datum alignment.
Per investigation of a trouble report from Tomas Vondra. (There are also
some bugs in contrib/btree_gist to be fixed, but that seems like material
for a separate patch.) Back-patch to all supported branches.
Experimentation with contrib/btree_gist shows that the majority of the GIST
support functions potentially need collation information. Safest policy
seems to be to pass it to all of them, instead of making assumptions about
which ones could possibly need it.
cleanup stage to finish incomplete inserts or splits anymore. There was two
reasons for the cleanup step:
1. When a new tuple was inserted to a leaf page, the downlink in the parent
needed to be updated to contain (ie. to be consistent with) the new key.
Updating the parent in turn might require recursively updating the parent of
the parent. We now handle that by updating the parent while traversing down
the tree, so that when we insert the leaf tuple, all the parents are already
consistent with the new key, and the tree is consistent at every step.
2. When a page is split, we need to insert the downlink for the new right
page(s), and update the downlink for the original page to not include keys
that moved to the right page(s). We now handle that by setting a new flag,
F_FOLLOW_RIGHT, on the non-rightmost pages in the split. When that flag is
set, scans always follow the rightlink, regardless of the NSN mechanism used
to detect concurrent page splits. That way the tree is consistent right after
split, even though the downlink is still missing. This is very similar to the
way B-tree splits are handled. When the downlink is inserted in the parent,
the flag is cleared. To keep the insertion algorithm simple, when an
insertion sees an incomplete split, indicated by the F_FOLLOW_RIGHT flag, it
finishes the split before doing anything else.
These changes allow removing the whole "invalid tuple" mechanism, but I
retained the scan code to still follow invalid tuples correctly. While we
don't create any such tuples anymore, we want to handle them gracefully in
case you pg_upgrade a GiST index that has them. If we encounter any on an
insert, though, we just throw an error saying that you need to REINDEX.
The issue that got me into doing this is that if you did a checkpoint while
an insert or split was in progress, and the checkpoint finishes quickly so
that there is no WAL record related to the insert between RedoRecPtr and the
checkpoint record, recovery from that checkpoint would not know to finish
the incomplete insert. IOW, we have the same issue we solved with the
rm_safe_restartpoint mechanism during normal operation too. It's highly
unlikely to happen in practice, and this fix is far too large to backpatch,
so we're just going to live with in previous versions, but this refactoring
fixes it going forward.
With this patch, you don't get the annoying
'index "FOO" needs VACUUM or REINDEX to finish crash recovery' notices
anymore if you crash at an unfortunate moment.
corresponding struct definitions. This allows other headers to avoid including
certain highly-loaded headers such as rel.h and relscan.h, instead using just
relcache.h, heapam.h or genam.h, which are more lightweight and thus cause less
unnecessary dependencies.