entryGetItem()'s three code paths each contained bugs associated
with filtering the entries for gin_fuzzy_search_limit.
The posting-tree path failed to advance "advancePast" after having
decided to filter an item. If we ran out of items on the current
page and needed to advance to the next, what would actually happen
is that entryLoadMoreItems() would re-load the same page. Eventually,
the random dropItem() test would accept one of the same items it'd
previously rejected, and we'd move on --- but it could take awhile
with small gin_fuzzy_search_limit. To add insult to injury, this
case would inevitably cause entryLoadMoreItems() to decide it needed
to re-descend from the root, making things even slower.
The posting-list path failed to implement gin_fuzzy_search_limit
filtering at all, so that all entries in the posting list would
be returned.
The bitmap-result path used a "gotitem" variable that it failed to
update in the one place where it'd actually make a difference, ie
at the one "continue" statement. I think this was unreachable in
practice, because if we'd looped around then it shouldn't be the
case that the entries on the new page are before advancePast.
Still, the "gotitem" variable was contributing nothing to either
clarity or correctness, so get rid of it.
Refactor all three loops so that the termination conditions are
more alike and less unreadable.
The code coverage report showed that we had no coverage at all for
the re-descend-from-root code path in entryLoadMoreItems(), which
seems like a very bad thing, so add a test case that exercises it.
We also had exactly no coverage for gin_fuzzy_search_limit, so add a
simplistic test case that at least hits those code paths a little bit.
Back-patch to all supported branches.
Adé Heyward and Tom Lane
Discussion: https://postgr.es/m/CAEknJCdS-dE1Heddptm7ay2xTbSeADbkaQ8bU2AXRCVC2LdtKQ@mail.gmail.com
Commit 7f46eaf added the regression test which checks that
gin_clean_pending_list() cleans up the GIN pending list and returns >0.
This usually works fine. But if autovacuum comes along and cleans
the list before gin_clean_pending_list() starts, the function may
return 0, and then the regression test may fail.
To fix the problem, this commit disables autovacuum on the target
index of gin_clean_pending_list() by setting autovacuum_enabled
reloption to off when creating the table.
Also this commit sets gin_pending_list_limit reloption to 4MB on
the target index. Otherwise when running "make installcheck" with
small gin_pending_list_limit GUC, insertions of data may trigger
the cleanup of pending list before gin_clean_pending_list() starts
and the function may return 0. This could cause the regression test
to fail.
Per buildfarm member spoonbill.
Reported-By: Tom Lane
This function cleans up the pending list of the GIN index by
moving entries in it to the main GIN data structure in bulk.
It returns the number of pages cleaned up from the pending list.
This function is useful, for example, when the pending list
needs to be cleaned up *quickly* to improve the performance of
the search using GIN index. VACUUM can do the same thing, too,
but it may take days to run on a large table.
Jeff Janes,
reviewed by Julien Rouhaud, Jaime Casanova, Alvaro Herrera and me.
Discussion: CAMkU=1x8zFkpfnozXyt40zmR3Ub_kHu58LtRmwHUKRgQss7=iQ@mail.gmail.com
That includes VACUUM on GIN, GiST and SP-GiST indexes, and B-tree indexes
large enough to cause page deletions in B-tree. Plus some other special
cases.
After this patch, the regression tests generate all different WAL record
types. Not all branches within the redo functions are covered, but it's a
step forward.