1
0
mirror of https://github.com/postgres/postgres.git synced 2025-10-25 13:17:41 +03:00

Further optimize nbtree search scan key comparisons.

Postgres 17 commit e0b1ee17 added two complementary optimizations to
nbtree: the "prechecked" and "firstmatch" optimizations.  _bt_readpage
was made to avoid needlessly evaluating keys that are guaranteed to be
satisfied by applying page-level context.  "prechecked" did this for
keys required in the current scan direction, while "firstmatch" did it
for keys required in the opposite-to-scan direction only.

The "prechecked" design had a number of notable issues.  It didn't
account for the fact that an = array scan key's sk_argument field might
need to advance at the point of the page precheck (it didn't check the
precheck tuple against the key's array, only the key's sk_argument,
which needlessly made it ineffective in cases involving stepping to a
page having advanced the scan's arrays using a truncated high key).
"prechecked" was also completely ineffective when only one scan key
wasn't guaranteed to be satisfied by every tuple (it didn't recognize
that it was still safe to avoid evaluating other, earlier keys).

The "firstmatch" optimization had similar limitations.  It could only be
applied after _bt_readpage found its first matching tuple, regardless of
why any earlier tuples failed to satisfy the scan's index quals.  This
allowed unsatisfied non-required scan keys to impede the optimization.

Replace both optimizations with a new optimization, without any of these
limitations: the "startikey" optimization.  Affected _bt_readpage calls
generate a page-level key offset ("startikey"), that their _bt_checkkeys
calls can then start at.  This is an offset to the first key that isn't
known to be satisfied by every tuple on the page.

Although this is independently useful work, its main goal is to avoid
performance regressions with index scans that use skip arrays, but still
never manage to skip over irrelevant leaf pages.  We must avoid wasting
CPU cycles on overly granular skip array maintenance in these cases.
The new "startikey" optimization helps with this by selectively
disabling array maintenance for the duration of a _bt_readpage call.
This has no lasting consequences for the scan's array keys (they'll
still reliably track the scan's progress through the index's key space
whenever the scan is "between pages").

Skip scan adds skip arrays during preprocessing using simple, static
rules, and decides how best to navigate/apply the scan's skip arrays
dynamically, at runtime.  The "startikey" optimization enables this
approach.  As a result of all this, the planner doesn't need to generate
distinct, competing index paths (one path for skip scan, another for an
equivalent traditional full index scan).  The overall effect is to make
scan runtime close to optimal, even when the planner works off an
incorrect cardinality estimate.  Scans will also perform well given a
skipped column with data skew: individual groups of pages with many
distinct values (in respect of a skipped column) can be read about as
efficiently as before -- without the scan being forced to give up on
skipping over other groups of pages that are provably irrelevant.

Many scans that cannot possibly skip will still benefit from the use of
skip arrays, since they'll allow the "startikey" optimization to be as
effective as possible (by allowing preprocessing to mark all the scan's
keys as required).  A scan that uses a skip array on "a" for a qual
"WHERE a BETWEEN 0 AND 1_000_000 AND b = 42" is often much faster now,
even when every tuple read by the scan has its own distinct "a" value.
However, there are still some remaining regressions, affecting certain
trickier cases.

Scans whose index quals have several range skip arrays, each on some
high cardinality column, can still be slower than they were before the
introduction of skip scan -- even with the new "startikey" optimization.
There are also known regressions affecting very selective index scans
that use a skip array.  The underlying issue with such selective scans
is that they never get as far as reading a second leaf page, and so will
never get a chance to consider applying the "startikey" optimization.
In principle, all regressions could be avoided by teaching preprocessing
to not add skip arrays whenever they aren't expected to help, but it
seems best to err on the side of robust performance.

Follow-up to commit 92fe23d9, which added nbtree skip scan.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Reviewed-By: Masahiro Ikeda <ikedamsh@oss.nttdata.com>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wz=Y93jf5WjoOsN=xvqpMjRy-bxCE037bVFi-EasrpeUJA@mail.gmail.com
Discussion: https://postgr.es/m/CAH2-WznWDK45JfNPNvDxh6RQy-TaCwULaM5u5ALMXbjLBMcugQ@mail.gmail.com
This commit is contained in:
Peter Geoghegan
2025-04-04 12:27:52 -04:00
parent 92fe23d93a
commit 8a510275dd
5 changed files with 484 additions and 154 deletions

View File

@@ -1390,6 +1390,7 @@ _bt_preprocess_array_keys(IndexScanDesc scan, int *new_numberOfKeys)
arrayKeyData = (ScanKey) palloc(numArrayKeyData * sizeof(ScanKeyData)); arrayKeyData = (ScanKey) palloc(numArrayKeyData * sizeof(ScanKeyData));
/* Allocate space for per-array data in the workspace context */ /* Allocate space for per-array data in the workspace context */
so->skipScan = (numSkipArrayKeys > 0);
so->arrayKeys = (BTArrayKeyInfo *) palloc(numArrayKeys * sizeof(BTArrayKeyInfo)); so->arrayKeys = (BTArrayKeyInfo *) palloc(numArrayKeys * sizeof(BTArrayKeyInfo));
/* Allocate space for ORDER procs used to help _bt_checkkeys */ /* Allocate space for ORDER procs used to help _bt_checkkeys */

View File

@@ -349,6 +349,7 @@ btbeginscan(Relation rel, int nkeys, int norderbys)
else else
so->keyData = NULL; so->keyData = NULL;
so->skipScan = false;
so->needPrimScan = false; so->needPrimScan = false;
so->scanBehind = false; so->scanBehind = false;
so->oppositeDirCheck = false; so->oppositeDirCheck = false;

View File

@@ -1648,47 +1648,14 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum,
pstate.finaltup = NULL; pstate.finaltup = NULL;
pstate.page = page; pstate.page = page;
pstate.firstpage = firstpage; pstate.firstpage = firstpage;
pstate.forcenonrequired = false;
pstate.startikey = 0;
pstate.offnum = InvalidOffsetNumber; pstate.offnum = InvalidOffsetNumber;
pstate.skip = InvalidOffsetNumber; pstate.skip = InvalidOffsetNumber;
pstate.continuescan = true; /* default assumption */ pstate.continuescan = true; /* default assumption */
pstate.prechecked = false;
pstate.firstmatch = false;
pstate.rechecks = 0; pstate.rechecks = 0;
pstate.targetdistance = 0; pstate.targetdistance = 0;
/*
* Prechecking the value of the continuescan flag for the last item on the
* page (for backwards scan it will be the first item on a page). If we
* observe it to be true, then it should be true for all other items. This
* allows us to do significant optimizations in the _bt_checkkeys()
* function for all the items on the page.
*
* With the forward scan, we do this check for the last item on the page
* instead of the high key. It's relatively likely that the most
* significant column in the high key will be different from the
* corresponding value from the last item on the page. So checking with
* the last item on the page would give a more precise answer.
*
* We skip this for the first page read by each (primitive) scan, to avoid
* slowing down point queries. They typically don't stand to gain much
* when the optimization can be applied, and are more likely to notice the
* overhead of the precheck. Also avoid it during scans with array keys,
* which might be using skip scan (XXX fixed in next commit).
*/
if (!pstate.firstpage && !arrayKeys && minoff < maxoff)
{
ItemId iid;
IndexTuple itup;
iid = PageGetItemId(page, ScanDirectionIsForward(dir) ? maxoff : minoff);
itup = (IndexTuple) PageGetItem(page, iid);
/* Call with arrayKeys=false to avoid undesirable side-effects */
_bt_checkkeys(scan, &pstate, false, itup, indnatts);
pstate.prechecked = pstate.continuescan;
pstate.continuescan = true; /* reset */
}
if (ScanDirectionIsForward(dir)) if (ScanDirectionIsForward(dir))
{ {
/* SK_SEARCHARRAY forward scans must provide high key up front */ /* SK_SEARCHARRAY forward scans must provide high key up front */
@@ -1716,6 +1683,13 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum,
so->scanBehind = so->oppositeDirCheck = false; /* reset */ so->scanBehind = so->oppositeDirCheck = false; /* reset */
} }
/*
* Consider pstate.startikey optimization once the ongoing primitive
* index scan has already read at least one page
*/
if (!pstate.firstpage && minoff < maxoff)
_bt_set_startikey(scan, &pstate);
/* load items[] in ascending order */ /* load items[] in ascending order */
itemIndex = 0; itemIndex = 0;
@@ -1752,6 +1726,7 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum,
{ {
Assert(!passes_quals && pstate.continuescan); Assert(!passes_quals && pstate.continuescan);
Assert(offnum < pstate.skip); Assert(offnum < pstate.skip);
Assert(!pstate.forcenonrequired);
offnum = pstate.skip; offnum = pstate.skip;
pstate.skip = InvalidOffsetNumber; pstate.skip = InvalidOffsetNumber;
@@ -1761,7 +1736,6 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum,
if (passes_quals) if (passes_quals)
{ {
/* tuple passes all scan key conditions */ /* tuple passes all scan key conditions */
pstate.firstmatch = true;
if (!BTreeTupleIsPosting(itup)) if (!BTreeTupleIsPosting(itup))
{ {
/* Remember it */ /* Remember it */
@@ -1816,7 +1790,8 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum,
int truncatt; int truncatt;
truncatt = BTreeTupleGetNAtts(itup, rel); truncatt = BTreeTupleGetNAtts(itup, rel);
pstate.prechecked = false; /* precheck didn't cover HIKEY */ pstate.forcenonrequired = false;
pstate.startikey = 0; /* _bt_set_startikey ignores HIKEY */
_bt_checkkeys(scan, &pstate, arrayKeys, itup, truncatt); _bt_checkkeys(scan, &pstate, arrayKeys, itup, truncatt);
} }
@@ -1855,6 +1830,13 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum,
so->scanBehind = so->oppositeDirCheck = false; /* reset */ so->scanBehind = so->oppositeDirCheck = false; /* reset */
} }
/*
* Consider pstate.startikey optimization once the ongoing primitive
* index scan has already read at least one page
*/
if (!pstate.firstpage && minoff < maxoff)
_bt_set_startikey(scan, &pstate);
/* load items[] in descending order */ /* load items[] in descending order */
itemIndex = MaxTIDsPerBTreePage; itemIndex = MaxTIDsPerBTreePage;
@@ -1894,6 +1876,11 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum,
Assert(!BTreeTupleIsPivot(itup)); Assert(!BTreeTupleIsPivot(itup));
pstate.offnum = offnum; pstate.offnum = offnum;
if (arrayKeys && offnum == minoff && pstate.forcenonrequired)
{
pstate.forcenonrequired = false;
pstate.startikey = 0;
}
passes_quals = _bt_checkkeys(scan, &pstate, arrayKeys, passes_quals = _bt_checkkeys(scan, &pstate, arrayKeys,
itup, indnatts); itup, indnatts);
@@ -1905,6 +1892,7 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum,
{ {
Assert(!passes_quals && pstate.continuescan); Assert(!passes_quals && pstate.continuescan);
Assert(offnum > pstate.skip); Assert(offnum > pstate.skip);
Assert(!pstate.forcenonrequired);
offnum = pstate.skip; offnum = pstate.skip;
pstate.skip = InvalidOffsetNumber; pstate.skip = InvalidOffsetNumber;
@@ -1914,7 +1902,6 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum,
if (passes_quals && tuple_alive) if (passes_quals && tuple_alive)
{ {
/* tuple passes all scan key conditions */ /* tuple passes all scan key conditions */
pstate.firstmatch = true;
if (!BTreeTupleIsPosting(itup)) if (!BTreeTupleIsPosting(itup))
{ {
/* Remember it */ /* Remember it */
@@ -1970,6 +1957,20 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum,
so->currPos.itemIndex = MaxTIDsPerBTreePage - 1; so->currPos.itemIndex = MaxTIDsPerBTreePage - 1;
} }
/*
* If _bt_set_startikey told us to temporarily treat the scan's keys as
* nonrequired (possible only during scans with array keys), there must be
* no lasting consequences for the scan's array keys. The scan's arrays
* should now have exactly the same elements as they would have had if the
* nonrequired behavior had never been used. (In general, a scan's arrays
* are expected to track its progress through the index's key space.)
*
* We are required (by _bt_set_startikey) to call _bt_checkkeys against
* pstate.finaltup with pstate.forcenonrequired=false to allow the scan's
* arrays to recover. Assert that that step hasn't been missed.
*/
Assert(!pstate.forcenonrequired);
return (so->currPos.firstItem <= so->currPos.lastItem); return (so->currPos.firstItem <= so->currPos.lastItem);
} }

View File

@@ -57,11 +57,11 @@ static bool _bt_oppodir_checkkeys(IndexScanDesc scan, ScanDirection dir,
IndexTuple finaltup); IndexTuple finaltup);
static bool _bt_check_compare(IndexScanDesc scan, ScanDirection dir, static bool _bt_check_compare(IndexScanDesc scan, ScanDirection dir,
IndexTuple tuple, int tupnatts, TupleDesc tupdesc, IndexTuple tuple, int tupnatts, TupleDesc tupdesc,
bool advancenonrequired, bool prechecked, bool firstmatch, bool advancenonrequired, bool forcenonrequired,
bool *continuescan, int *ikey); bool *continuescan, int *ikey);
static bool _bt_check_rowcompare(ScanKey skey, static bool _bt_check_rowcompare(ScanKey skey,
IndexTuple tuple, int tupnatts, TupleDesc tupdesc, IndexTuple tuple, int tupnatts, TupleDesc tupdesc,
ScanDirection dir, bool *continuescan); ScanDirection dir, bool forcenonrequired, bool *continuescan);
static void _bt_checkkeys_look_ahead(IndexScanDesc scan, BTReadPageState *pstate, static void _bt_checkkeys_look_ahead(IndexScanDesc scan, BTReadPageState *pstate,
int tupnatts, TupleDesc tupdesc); int tupnatts, TupleDesc tupdesc);
static int _bt_keep_natts(Relation rel, IndexTuple lastleft, static int _bt_keep_natts(Relation rel, IndexTuple lastleft,
@@ -1422,9 +1422,10 @@ _bt_start_prim_scan(IndexScanDesc scan, ScanDirection dir)
* postcondition's <= operator with a >=. In other words, just swap the * postcondition's <= operator with a >=. In other words, just swap the
* precondition with the postcondition.) * precondition with the postcondition.)
* *
* We also deal with "advancing" non-required arrays here. Callers whose * We also deal with "advancing" non-required arrays here (or arrays that are
* sktrig scan key is non-required specify sktrig_required=false. These calls * treated as non-required for the duration of a _bt_readpage call). Callers
* are the only exception to the general rule about always advancing the * whose sktrig scan key is non-required specify sktrig_required=false. These
* calls are the only exception to the general rule about always advancing the
* required array keys (the scan may not even have a required array). These * required array keys (the scan may not even have a required array). These
* callers should just pass a NULL pstate (since there is never any question * callers should just pass a NULL pstate (since there is never any question
* of stopping the scan). No call to _bt_tuple_before_array_skeys is required * of stopping the scan). No call to _bt_tuple_before_array_skeys is required
@@ -1464,6 +1465,7 @@ _bt_advance_array_keys(IndexScanDesc scan, BTReadPageState *pstate,
all_satisfied = true; all_satisfied = true;
Assert(!so->needPrimScan && !so->scanBehind && !so->oppositeDirCheck); Assert(!so->needPrimScan && !so->scanBehind && !so->oppositeDirCheck);
Assert(_bt_verify_keys_with_arraykeys(scan));
if (sktrig_required) if (sktrig_required)
{ {
@@ -1473,17 +1475,6 @@ _bt_advance_array_keys(IndexScanDesc scan, BTReadPageState *pstate,
Assert(!_bt_tuple_before_array_skeys(scan, dir, tuple, tupdesc, Assert(!_bt_tuple_before_array_skeys(scan, dir, tuple, tupdesc,
tupnatts, false, 0, NULL)); tupnatts, false, 0, NULL));
/*
* Required scan key wasn't satisfied, so required arrays will have to
* advance. Invalidate page-level state that tracks whether the
* scan's required-in-opposite-direction-only keys are known to be
* satisfied by page's remaining tuples.
*/
pstate->firstmatch = false;
/* Shouldn't have to invalidate 'prechecked', though */
Assert(!pstate->prechecked);
/* /*
* Once we return we'll have a new set of required array keys, so * Once we return we'll have a new set of required array keys, so
* reset state used by "look ahead" optimization * reset state used by "look ahead" optimization
@@ -1491,8 +1482,26 @@ _bt_advance_array_keys(IndexScanDesc scan, BTReadPageState *pstate,
pstate->rechecks = 0; pstate->rechecks = 0;
pstate->targetdistance = 0; pstate->targetdistance = 0;
} }
else if (sktrig < so->numberOfKeys - 1 &&
!(so->keyData[so->numberOfKeys - 1].sk_flags & SK_SEARCHARRAY))
{
int least_sign_ikey = so->numberOfKeys - 1;
bool continuescan;
Assert(_bt_verify_keys_with_arraykeys(scan)); /*
* Optimization: perform a precheck of the least significant key
* during !sktrig_required calls when it isn't already our sktrig
* (provided the precheck key is not itself an array).
*
* When the precheck works out we'll avoid an expensive binary search
* of sktrig's array (plus any other arrays before least_sign_ikey).
*/
Assert(so->keyData[sktrig].sk_flags & SK_SEARCHARRAY);
if (!_bt_check_compare(scan, dir, tuple, tupnatts, tupdesc, false,
false, &continuescan,
&least_sign_ikey))
return false;
}
for (int ikey = 0; ikey < so->numberOfKeys; ikey++) for (int ikey = 0; ikey < so->numberOfKeys; ikey++)
{ {
@@ -1534,8 +1543,6 @@ _bt_advance_array_keys(IndexScanDesc scan, BTReadPageState *pstate,
if (cur->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD)) if (cur->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD))
{ {
Assert(sktrig_required);
required = true; required = true;
if (cur->sk_attno > tupnatts) if (cur->sk_attno > tupnatts)
@@ -1669,7 +1676,7 @@ _bt_advance_array_keys(IndexScanDesc scan, BTReadPageState *pstate,
} }
else else
{ {
Assert(sktrig_required && required); Assert(required);
/* /*
* This is a required non-array equality strategy scan key, which * This is a required non-array equality strategy scan key, which
@@ -1711,7 +1718,7 @@ _bt_advance_array_keys(IndexScanDesc scan, BTReadPageState *pstate,
* be eliminated by _bt_preprocess_keys. It won't matter if some of * be eliminated by _bt_preprocess_keys. It won't matter if some of
* our "true" array scan keys (or even all of them) are non-required. * our "true" array scan keys (or even all of them) are non-required.
*/ */
if (required && if (sktrig_required && required &&
((ScanDirectionIsForward(dir) && result > 0) || ((ScanDirectionIsForward(dir) && result > 0) ||
(ScanDirectionIsBackward(dir) && result < 0))) (ScanDirectionIsBackward(dir) && result < 0)))
beyond_end_advance = true; beyond_end_advance = true;
@@ -1726,7 +1733,7 @@ _bt_advance_array_keys(IndexScanDesc scan, BTReadPageState *pstate,
* array scan keys are considered interesting.) * array scan keys are considered interesting.)
*/ */
all_satisfied = false; all_satisfied = false;
if (required) if (sktrig_required && required)
all_required_satisfied = false; all_required_satisfied = false;
else else
{ {
@@ -1786,6 +1793,12 @@ _bt_advance_array_keys(IndexScanDesc scan, BTReadPageState *pstate,
* of any required scan key). All that matters is whether caller's tuple * of any required scan key). All that matters is whether caller's tuple
* satisfies the new qual, so it's safe to just skip the _bt_check_compare * satisfies the new qual, so it's safe to just skip the _bt_check_compare
* recheck when we've already determined that it can only return 'false'. * recheck when we've already determined that it can only return 'false'.
*
* Note: In practice most scan keys are marked required by preprocessing,
* if necessary by generating a preceding skip array. We nevertheless
* often handle array keys marked required as if they were nonrequired.
* This behavior is requested by our _bt_check_compare caller, though only
* when it is passed "forcenonrequired=true" by _bt_checkkeys.
*/ */
if ((sktrig_required && all_required_satisfied) || if ((sktrig_required && all_required_satisfied) ||
(!sktrig_required && all_satisfied)) (!sktrig_required && all_satisfied))
@@ -1796,9 +1809,9 @@ _bt_advance_array_keys(IndexScanDesc scan, BTReadPageState *pstate,
Assert(all_required_satisfied); Assert(all_required_satisfied);
/* Recheck _bt_check_compare on behalf of caller */ /* Recheck _bt_check_compare on behalf of caller */
if (_bt_check_compare(scan, dir, tuple, tupnatts, tupdesc, if (_bt_check_compare(scan, dir, tuple, tupnatts, tupdesc, false,
false, false, false, false, &continuescan,
&continuescan, &nsktrig) && &nsktrig) &&
!so->scanBehind) !so->scanBehind)
{ {
/* This tuple satisfies the new qual */ /* This tuple satisfies the new qual */
@@ -2042,8 +2055,9 @@ new_prim_scan:
* read at least one leaf page before the one we're reading now. This * read at least one leaf page before the one we're reading now. This
* makes primscan scheduling more efficient when scanning subsets of an * makes primscan scheduling more efficient when scanning subsets of an
* index with many distinct attribute values matching many array elements. * index with many distinct attribute values matching many array elements.
* It encourages fewer, larger primitive scans where that makes sense * It encourages fewer, larger primitive scans where that makes sense.
* (where index descent costs need to be kept under control). * This will in turn encourage _bt_readpage to apply the pstate.startikey
* optimization more often.
* *
* Note: This heuristic isn't as aggressive as you might think. We're * Note: This heuristic isn't as aggressive as you might think. We're
* conservative about allowing a primitive scan to step from the first * conservative about allowing a primitive scan to step from the first
@@ -2200,17 +2214,14 @@ _bt_verify_keys_with_arraykeys(IndexScanDesc scan)
* the page to the right. * the page to the right.
* *
* Advances the scan's array keys when necessary for arrayKeys=true callers. * Advances the scan's array keys when necessary for arrayKeys=true callers.
* Caller can avoid all array related side-effects when calling just to do a * Scans without any array keys must always pass arrayKeys=false.
* page continuescan precheck -- pass arrayKeys=false for that. Scans without
* any arrays keys must always pass arrayKeys=false.
* *
* Also stops and starts primitive index scans for arrayKeys=true callers. * Also stops and starts primitive index scans for arrayKeys=true callers.
* Scans with array keys are required to set up page state that helps us with * Scans with array keys are required to set up page state that helps us with
* this. The page's finaltup tuple (the page high key for a forward scan, or * this. The page's finaltup tuple (the page high key for a forward scan, or
* the page's first non-pivot tuple for a backward scan) must be set in * the page's first non-pivot tuple for a backward scan) must be set in
* pstate.finaltup ahead of the first call here for the page (or possibly the * pstate.finaltup ahead of the first call here for the page. Set this to
* first call after an initial continuescan-setting page precheck call). Set * NULL for rightmost page (or the leftmost page for backwards scans).
* this to NULL for rightmost page (or the leftmost page for backwards scans).
* *
* scan: index scan descriptor (containing a search-type scankey) * scan: index scan descriptor (containing a search-type scankey)
* pstate: page level input and output parameters * pstate: page level input and output parameters
@@ -2225,42 +2236,46 @@ _bt_checkkeys(IndexScanDesc scan, BTReadPageState *pstate, bool arrayKeys,
TupleDesc tupdesc = RelationGetDescr(scan->indexRelation); TupleDesc tupdesc = RelationGetDescr(scan->indexRelation);
BTScanOpaque so = (BTScanOpaque) scan->opaque; BTScanOpaque so = (BTScanOpaque) scan->opaque;
ScanDirection dir = so->currPos.dir; ScanDirection dir = so->currPos.dir;
int ikey = 0; int ikey = pstate->startikey;
bool res; bool res;
Assert(BTreeTupleGetNAtts(tuple, scan->indexRelation) == tupnatts); Assert(BTreeTupleGetNAtts(tuple, scan->indexRelation) == tupnatts);
Assert(!so->needPrimScan && !so->scanBehind && !so->oppositeDirCheck); Assert(!so->needPrimScan && !so->scanBehind && !so->oppositeDirCheck);
Assert(arrayKeys || so->numArrayKeys == 0);
res = _bt_check_compare(scan, dir, tuple, tupnatts, tupdesc, res = _bt_check_compare(scan, dir, tuple, tupnatts, tupdesc, arrayKeys,
arrayKeys, pstate->prechecked, pstate->firstmatch, pstate->forcenonrequired, &pstate->continuescan,
&pstate->continuescan, &ikey); &ikey);
/*
* If _bt_check_compare relied on the pstate.startikey optimization, call
* again (in assert-enabled builds) to verify it didn't affect our answer.
*
* Note: we can't do this when !pstate.forcenonrequired, since any arrays
* before pstate.startikey won't have advanced on this page at all.
*/
Assert(!pstate->forcenonrequired || arrayKeys);
#ifdef USE_ASSERT_CHECKING #ifdef USE_ASSERT_CHECKING
if (!arrayKeys && so->numArrayKeys) if (pstate->startikey > 0 && !pstate->forcenonrequired)
{ {
/* bool dres,
* This is a continuescan precheck call for a scan with array keys. dcontinuescan;
*
* Assert that the scan isn't in danger of becoming confused.
*/
Assert(!so->scanBehind && !so->oppositeDirCheck);
Assert(!pstate->prechecked && !pstate->firstmatch);
Assert(!_bt_tuple_before_array_skeys(scan, dir, tuple, tupdesc,
tupnatts, false, 0, NULL));
}
if (pstate->prechecked || pstate->firstmatch)
{
bool dcontinuescan;
int dikey = 0; int dikey = 0;
/* /* Pass arrayKeys=false to avoid array side-effects */
* Call relied on continuescan/firstmatch prechecks -- assert that we dres = _bt_check_compare(scan, dir, tuple, tupnatts, tupdesc, false,
* get the same answer without those optimizations pstate->forcenonrequired, &dcontinuescan,
*/ &dikey);
Assert(res == _bt_check_compare(scan, dir, tuple, tupnatts, tupdesc, Assert(res == dres);
false, false, false,
&dcontinuescan, &dikey));
Assert(pstate->continuescan == dcontinuescan); Assert(pstate->continuescan == dcontinuescan);
/*
* Should also get the same ikey result. We need a slightly weaker
* assertion during arrayKeys calls, since they might be using an
* array that couldn't be marked required during preprocessing.
*/
Assert(arrayKeys || ikey == dikey);
Assert(ikey <= dikey);
} }
#endif #endif
@@ -2281,6 +2296,7 @@ _bt_checkkeys(IndexScanDesc scan, BTReadPageState *pstate, bool arrayKeys,
* It's also possible that the scan is still _before_ the _start_ of * It's also possible that the scan is still _before_ the _start_ of
* tuples matching the current set of array keys. Check for that first. * tuples matching the current set of array keys. Check for that first.
*/ */
Assert(!pstate->forcenonrequired);
if (_bt_tuple_before_array_skeys(scan, dir, tuple, tupdesc, tupnatts, true, if (_bt_tuple_before_array_skeys(scan, dir, tuple, tupdesc, tupnatts, true,
ikey, NULL)) ikey, NULL))
{ {
@@ -2394,8 +2410,9 @@ _bt_oppodir_checkkeys(IndexScanDesc scan, ScanDirection dir,
Assert(so->numArrayKeys); Assert(so->numArrayKeys);
_bt_check_compare(scan, flipped, finaltup, nfinaltupatts, tupdesc, _bt_check_compare(scan, flipped, finaltup, nfinaltupatts, tupdesc, false,
false, false, false, &continuescan, &ikey); false, &continuescan,
&ikey);
if (!continuescan && so->keyData[ikey].sk_strategy != BTEqualStrategyNumber) if (!continuescan && so->keyData[ikey].sk_strategy != BTEqualStrategyNumber)
return false; return false;
@@ -2403,6 +2420,301 @@ _bt_oppodir_checkkeys(IndexScanDesc scan, ScanDirection dir,
return true; return true;
} }
/*
* Determines an offset to the first scan key (an so->keyData[]-wise offset)
* that is _not_ guaranteed to be satisfied by every tuple from pstate.page,
* which is set in pstate.startikey for _bt_checkkeys calls for the page.
* This allows caller to save cycles on comparisons of a prefix of keys while
* reading pstate.page.
*
* Also determines if later calls to _bt_checkkeys (for pstate.page) should be
* forced to treat all required scan keys >= pstate.startikey as nonrequired
* (that is, if they're to be treated as if any SK_BT_REQFWD/SK_BT_REQBKWD
* markings that were set by preprocessing were not set at all, for the
* duration of _bt_checkkeys calls prior to the call for pstate.finaltup).
* This is indicated to caller by setting pstate.forcenonrequired.
*
* Call here at the start of reading a leaf page beyond the first one for the
* primitive index scan. We consider all non-pivot tuples, so it doesn't make
* sense to call here when only a subset of those tuples can ever be read.
* This is also a good idea on performance grounds; not calling here when on
* the first page (first for the current primitive scan) avoids wasting cycles
* during selective point queries. They typically don't stand to gain as much
* when we can set pstate.startikey, and are likely to notice the overhead of
* calling here.
*
* Caller must reset startikey and forcenonrequired ahead of the _bt_checkkeys
* call for pstate.finaltup iff we set forcenonrequired=true. This will give
* _bt_checkkeys the opportunity to call _bt_advance_array_keys once more,
* with sktrig_required=true, to advance the arrays that were ignored during
* checks of all of the page's prior tuples. Caller doesn't need to do this
* on the rightmost/leftmost page in the index (where pstate.finaltup isn't
* set), since forcenonrequired won't be set here by us in the first place.
*/
void
_bt_set_startikey(IndexScanDesc scan, BTReadPageState *pstate)
{
BTScanOpaque so = (BTScanOpaque) scan->opaque;
Relation rel = scan->indexRelation;
TupleDesc tupdesc = RelationGetDescr(rel);
ItemId iid;
IndexTuple firsttup,
lasttup;
int startikey = 0,
arrayidx = 0,
firstchangingattnum;
bool start_past_saop_eq = false;
Assert(!so->scanBehind);
Assert(pstate->minoff < pstate->maxoff);
Assert(!pstate->firstpage);
Assert(pstate->startikey == 0);
Assert(!so->numArrayKeys || pstate->finaltup ||
P_RIGHTMOST(BTPageGetOpaque(pstate->page)) ||
P_LEFTMOST(BTPageGetOpaque(pstate->page)));
if (so->numberOfKeys == 0)
return;
/* minoff is an offset to the lowest non-pivot tuple on the page */
iid = PageGetItemId(pstate->page, pstate->minoff);
firsttup = (IndexTuple) PageGetItem(pstate->page, iid);
/* maxoff is an offset to the highest non-pivot tuple on the page */
iid = PageGetItemId(pstate->page, pstate->maxoff);
lasttup = (IndexTuple) PageGetItem(pstate->page, iid);
/* Determine the first attribute whose values change on caller's page */
firstchangingattnum = _bt_keep_natts_fast(rel, firsttup, lasttup);
for (; startikey < so->numberOfKeys; startikey++)
{
ScanKey key = so->keyData + startikey;
BTArrayKeyInfo *array;
Datum firstdatum,
lastdatum;
bool firstnull,
lastnull;
int32 result;
/*
* Determine if it's safe to set pstate.startikey to an offset to a
* key that comes after this key, by examining this key
*/
if (!(key->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD)))
{
/* Scan key isn't marked required (corner case) */
Assert(!(key->sk_flags & SK_ROW_HEADER));
break; /* unsafe */
}
if (key->sk_flags & SK_ROW_HEADER)
{
/*
* Can't let pstate.startikey get set to an ikey beyond a
* RowCompare inequality
*/
break; /* unsafe */
}
if (key->sk_strategy != BTEqualStrategyNumber)
{
/*
* Scalar inequality key.
*
* It's definitely safe for _bt_checkkeys to avoid assessing this
* inequality when the page's first and last non-pivot tuples both
* satisfy the inequality (since the same must also be true of all
* the tuples in between these two).
*
* Unlike the "=" case, it doesn't matter if this attribute has
* more than one distinct value (though it _is_ necessary for any
* and all _prior_ attributes to contain no more than one distinct
* value amongst all of the tuples from pstate.page).
*/
if (key->sk_attno > firstchangingattnum) /* >, not >= */
break; /* unsafe, preceding attr has multiple
* distinct values */
firstdatum = index_getattr(firsttup, key->sk_attno, tupdesc, &firstnull);
lastdatum = index_getattr(lasttup, key->sk_attno, tupdesc, &lastnull);
if (key->sk_flags & SK_ISNULL)
{
/* IS NOT NULL key */
Assert(key->sk_flags & SK_SEARCHNOTNULL);
if (firstnull || lastnull)
break; /* unsafe */
/* Safe, IS NOT NULL key satisfied by every tuple */
continue;
}
/* Test firsttup */
if (firstnull ||
!DatumGetBool(FunctionCall2Coll(&key->sk_func,
key->sk_collation, firstdatum,
key->sk_argument)))
break; /* unsafe */
/* Test lasttup */
if (lastnull ||
!DatumGetBool(FunctionCall2Coll(&key->sk_func,
key->sk_collation, lastdatum,
key->sk_argument)))
break; /* unsafe */
/* Safe, scalar inequality satisfied by every tuple */
continue;
}
/* Some = key (could be a a scalar = key, could be an array = key) */
Assert(key->sk_strategy == BTEqualStrategyNumber);
if (!(key->sk_flags & SK_SEARCHARRAY))
{
/*
* Scalar = key (possibly an IS NULL key).
*
* It is unsafe to set pstate.startikey to an ikey beyond this
* key, unless the = key is satisfied by every possible tuple on
* the page (possible only when attribute has just one distinct
* value among all tuples on the page).
*/
if (key->sk_attno >= firstchangingattnum)
break; /* unsafe, multiple distinct attr values */
firstdatum = index_getattr(firsttup, key->sk_attno, tupdesc,
&firstnull);
if (key->sk_flags & SK_ISNULL)
{
/* IS NULL key */
Assert(key->sk_flags & SK_SEARCHNULL);
if (!firstnull)
break; /* unsafe */
/* Safe, IS NULL key satisfied by every tuple */
continue;
}
if (firstnull ||
!DatumGetBool(FunctionCall2Coll(&key->sk_func,
key->sk_collation, firstdatum,
key->sk_argument)))
break; /* unsafe */
/* Safe, scalar = key satisfied by every tuple */
continue;
}
/* = array key (could be a SAOP array, could be a skip array) */
array = &so->arrayKeys[arrayidx++];
Assert(array->scan_key == startikey);
if (array->num_elems != -1)
{
/*
* SAOP array = key.
*
* Handle this like we handle scalar = keys (though binary search
* for a matching element, to avoid relying on key's sk_argument).
*/
if (key->sk_attno >= firstchangingattnum)
break; /* unsafe, multiple distinct attr values */
firstdatum = index_getattr(firsttup, key->sk_attno, tupdesc,
&firstnull);
_bt_binsrch_array_skey(&so->orderProcs[startikey],
false, NoMovementScanDirection,
firstdatum, firstnull, array, key,
&result);
if (result != 0)
break; /* unsafe */
/* Safe, SAOP = key satisfied by every tuple */
start_past_saop_eq = true;
continue;
}
/*
* Skip array = key
*/
Assert(key->sk_flags & SK_BT_SKIP);
if (array->null_elem)
{
/*
* Non-range skip array = key.
*
* Safe, non-range skip array "satisfied" by every tuple on page
* (safe even when "key->sk_attno > firstchangingattnum").
*/
continue;
}
/*
* Range skip array = key.
*
* Handle this like we handle scalar inequality keys (but avoid using
* key's sk_argument directly, as in the SAOP array case).
*/
if (key->sk_attno > firstchangingattnum) /* >, not >= */
break; /* unsafe, preceding attr has multiple
* distinct values */
firstdatum = index_getattr(firsttup, key->sk_attno, tupdesc, &firstnull);
lastdatum = index_getattr(lasttup, key->sk_attno, tupdesc, &lastnull);
/* Test firsttup */
_bt_binsrch_skiparray_skey(false, ForwardScanDirection,
firstdatum, firstnull, array, key,
&result);
if (result != 0)
break; /* unsafe */
/* Test lasttup */
_bt_binsrch_skiparray_skey(false, ForwardScanDirection,
lastdatum, lastnull, array, key,
&result);
if (result != 0)
break; /* unsafe */
/* Safe, range skip array satisfied by every tuple on page */
}
/*
* Use of forcenonrequired is typically undesirable, since it'll force
* _bt_readpage caller to read every tuple on the page -- even though, in
* general, it might well be possible to end the scan on an earlier tuple.
* However, caller must use forcenonrequired when start_past_saop_eq=true,
* since the usual required array behavior might fail to roll over to the
* SAOP array.
*
* We always prefer forcenonrequired=true during scans with skip arrays
* (except on the first page of each primitive index scan), though -- even
* when "startikey == 0". That way, _bt_advance_array_keys's low-order
* key precheck optimization can always be used (unless on the first page
* of the scan). It seems slightly preferable to check more tuples when
* that allows us to do significantly less skip array maintenance.
*/
pstate->forcenonrequired = (start_past_saop_eq || so->skipScan);
pstate->startikey = startikey;
/*
* _bt_readpage caller is required to call _bt_checkkeys against page's
* finaltup with forcenonrequired=false whenever we initially set
* forcenonrequired=true. That way the scan's arrays will reliably track
* its progress through the index's key space.
*
* We don't expect this when _bt_readpage caller has no finaltup due to
* its page being the rightmost (or the leftmost, during backwards scans).
* When we see that _bt_readpage has no finaltup, back out of everything.
*/
Assert(!pstate->forcenonrequired || so->numArrayKeys);
if (pstate->forcenonrequired && !pstate->finaltup)
{
pstate->forcenonrequired = false;
pstate->startikey = 0;
}
}
/* /*
* Test whether an indextuple satisfies current scan condition. * Test whether an indextuple satisfies current scan condition.
* *
@@ -2432,23 +2744,33 @@ _bt_oppodir_checkkeys(IndexScanDesc scan, ScanDirection dir,
* by the current array key, or if they're truly unsatisfied (that is, if * by the current array key, or if they're truly unsatisfied (that is, if
* they're unsatisfied by every possible array key). * they're unsatisfied by every possible array key).
* *
* Though we advance non-required array keys on our own, that shouldn't have
* any lasting consequences for the scan. By definition, non-required arrays
* have no fixed relationship with the scan's progress. (There are delicate
* considerations for non-required arrays when the arrays need to be advanced
* following our setting continuescan to false, but that doesn't concern us.)
*
* Pass advancenonrequired=false to avoid all array related side effects. * Pass advancenonrequired=false to avoid all array related side effects.
* This allows _bt_advance_array_keys caller to avoid infinite recursion. * This allows _bt_advance_array_keys caller to avoid infinite recursion.
*
* Pass forcenonrequired=true to instruct us to treat all keys as nonrequired.
* This is used to make it safe to temporarily stop properly maintaining the
* scan's required arrays. _bt_checkkeys caller (_bt_readpage, actually)
* determines a prefix of keys that must satisfy every possible corresponding
* index attribute value from its page, which is passed to us via *ikey arg
* (this is the first key that might be unsatisfied by tuples on the page).
* Obviously, we won't maintain any array keys from before *ikey, so it's
* quite possible for such arrays to "fall behind" the index's keyspace.
* Caller will need to "catch up" by passing forcenonrequired=true (alongside
* an *ikey=0) once the page's finaltup is reached.
*
* Note: it's safe to pass an *ikey > 0 with forcenonrequired=false, but only
* when caller determines that it won't affect array maintenance.
*/ */
static bool static bool
_bt_check_compare(IndexScanDesc scan, ScanDirection dir, _bt_check_compare(IndexScanDesc scan, ScanDirection dir,
IndexTuple tuple, int tupnatts, TupleDesc tupdesc, IndexTuple tuple, int tupnatts, TupleDesc tupdesc,
bool advancenonrequired, bool prechecked, bool firstmatch, bool advancenonrequired, bool forcenonrequired,
bool *continuescan, int *ikey) bool *continuescan, int *ikey)
{ {
BTScanOpaque so = (BTScanOpaque) scan->opaque; BTScanOpaque so = (BTScanOpaque) scan->opaque;
Assert(!forcenonrequired || advancenonrequired);
*continuescan = true; /* default assumption */ *continuescan = true; /* default assumption */
for (; *ikey < so->numberOfKeys; (*ikey)++) for (; *ikey < so->numberOfKeys; (*ikey)++)
@@ -2461,36 +2783,20 @@ _bt_check_compare(IndexScanDesc scan, ScanDirection dir,
/* /*
* Check if the key is required in the current scan direction, in the * Check if the key is required in the current scan direction, in the
* opposite scan direction _only_, or in neither direction * opposite scan direction _only_, or in neither direction (except
* when we're forced to treat all scan keys as nonrequired)
*/ */
if (((key->sk_flags & SK_BT_REQFWD) && ScanDirectionIsForward(dir)) || if (forcenonrequired)
((key->sk_flags & SK_BT_REQBKWD) && ScanDirectionIsBackward(dir))) {
/* treating scan's keys as non-required */
}
else if (((key->sk_flags & SK_BT_REQFWD) && ScanDirectionIsForward(dir)) ||
((key->sk_flags & SK_BT_REQBKWD) && ScanDirectionIsBackward(dir)))
requiredSameDir = true; requiredSameDir = true;
else if (((key->sk_flags & SK_BT_REQFWD) && ScanDirectionIsBackward(dir)) || else if (((key->sk_flags & SK_BT_REQFWD) && ScanDirectionIsBackward(dir)) ||
((key->sk_flags & SK_BT_REQBKWD) && ScanDirectionIsForward(dir))) ((key->sk_flags & SK_BT_REQBKWD) && ScanDirectionIsForward(dir)))
requiredOppositeDirOnly = true; requiredOppositeDirOnly = true;
/*
* If the caller told us the *continuescan flag is known to be true
* for the last item on the page, then we know the keys required for
* the current direction scan should be matched. Otherwise, the
* *continuescan flag would be set for the current item and
* subsequently the last item on the page accordingly.
*
* If the key is required for the opposite direction scan, we can skip
* the check if the caller tells us there was already at least one
* matching item on the page. Also, we require the *continuescan flag
* to be true for the last item on the page to know there are no
* NULLs.
*
* Both cases above work except for the row keys, where NULLs could be
* found in the middle of matching values.
*/
if (prechecked &&
(requiredSameDir || (requiredOppositeDirOnly && firstmatch)) &&
!(key->sk_flags & SK_ROW_HEADER))
continue;
if (key->sk_attno > tupnatts) if (key->sk_attno > tupnatts)
{ {
/* /*
@@ -2512,6 +2818,16 @@ _bt_check_compare(IndexScanDesc scan, ScanDirection dir,
{ {
Assert(key->sk_flags & SK_SEARCHARRAY); Assert(key->sk_flags & SK_SEARCHARRAY);
Assert(key->sk_flags & SK_BT_SKIP); Assert(key->sk_flags & SK_BT_SKIP);
Assert(requiredSameDir || forcenonrequired);
/*
* Cannot fall back on _bt_tuple_before_array_skeys when we're
* treating the scan's keys as nonrequired, though. Just handle
* this like any other non-required equality-type array key.
*/
if (forcenonrequired)
return _bt_advance_array_keys(scan, NULL, tuple, tupnatts,
tupdesc, *ikey, false);
*continuescan = false; *continuescan = false;
return false; return false;
@@ -2521,7 +2837,7 @@ _bt_check_compare(IndexScanDesc scan, ScanDirection dir,
if (key->sk_flags & SK_ROW_HEADER) if (key->sk_flags & SK_ROW_HEADER)
{ {
if (_bt_check_rowcompare(key, tuple, tupnatts, tupdesc, dir, if (_bt_check_rowcompare(key, tuple, tupnatts, tupdesc, dir,
continuescan)) forcenonrequired, continuescan))
continue; continue;
return false; return false;
} }
@@ -2554,9 +2870,20 @@ _bt_check_compare(IndexScanDesc scan, ScanDirection dir,
*/ */
if (requiredSameDir) if (requiredSameDir)
*continuescan = false; *continuescan = false;
else if (unlikely(key->sk_flags & SK_BT_SKIP))
{
/*
* If we're treating scan keys as nonrequired, and encounter a
* skip array scan key whose current element is NULL, then it
* must be a non-range skip array. It must be satisfied, so
* there's no need to call _bt_advance_array_keys to check.
*/
Assert(forcenonrequired && *ikey > 0);
continue;
}
/* /*
* In any case, this indextuple doesn't match the qual. * This indextuple doesn't match the qual.
*/ */
return false; return false;
} }
@@ -2577,7 +2904,7 @@ _bt_check_compare(IndexScanDesc scan, ScanDirection dir,
* (_bt_advance_array_keys also relies on this behavior during * (_bt_advance_array_keys also relies on this behavior during
* forward scans.) * forward scans.)
*/ */
if ((key->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD)) && if ((requiredSameDir || requiredOppositeDirOnly) &&
ScanDirectionIsBackward(dir)) ScanDirectionIsBackward(dir))
*continuescan = false; *continuescan = false;
} }
@@ -2595,7 +2922,7 @@ _bt_check_compare(IndexScanDesc scan, ScanDirection dir,
* (_bt_advance_array_keys also relies on this behavior during * (_bt_advance_array_keys also relies on this behavior during
* backward scans.) * backward scans.)
*/ */
if ((key->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD)) && if ((requiredSameDir || requiredOppositeDirOnly) &&
ScanDirectionIsForward(dir)) ScanDirectionIsForward(dir))
*continuescan = false; *continuescan = false;
} }
@@ -2606,15 +2933,7 @@ _bt_check_compare(IndexScanDesc scan, ScanDirection dir,
return false; return false;
} }
/* if (!DatumGetBool(FunctionCall2Coll(&key->sk_func, key->sk_collation,
* Apply the key-checking function, though only if we must.
*
* When a key is required in the opposite-of-scan direction _only_,
* then it must already be satisfied if firstmatch=true indicates that
* an earlier tuple from this same page satisfied it earlier on.
*/
if (!(requiredOppositeDirOnly && firstmatch) &&
!DatumGetBool(FunctionCall2Coll(&key->sk_func, key->sk_collation,
datum, key->sk_argument))) datum, key->sk_argument)))
{ {
/* /*
@@ -2664,7 +2983,8 @@ _bt_check_compare(IndexScanDesc scan, ScanDirection dir,
*/ */
static bool static bool
_bt_check_rowcompare(ScanKey skey, IndexTuple tuple, int tupnatts, _bt_check_rowcompare(ScanKey skey, IndexTuple tuple, int tupnatts,
TupleDesc tupdesc, ScanDirection dir, bool *continuescan) TupleDesc tupdesc, ScanDirection dir,
bool forcenonrequired, bool *continuescan)
{ {
ScanKey subkey = (ScanKey) DatumGetPointer(skey->sk_argument); ScanKey subkey = (ScanKey) DatumGetPointer(skey->sk_argument);
int32 cmpresult = 0; int32 cmpresult = 0;
@@ -2704,7 +3024,11 @@ _bt_check_rowcompare(ScanKey skey, IndexTuple tuple, int tupnatts,
if (isNull) if (isNull)
{ {
if (subkey->sk_flags & SK_BT_NULLS_FIRST) if (forcenonrequired)
{
/* treating scan's keys as non-required */
}
else if (subkey->sk_flags & SK_BT_NULLS_FIRST)
{ {
/* /*
* Since NULLs are sorted before non-NULLs, we know we have * Since NULLs are sorted before non-NULLs, we know we have
@@ -2758,8 +3082,12 @@ _bt_check_rowcompare(ScanKey skey, IndexTuple tuple, int tupnatts,
*/ */
Assert(subkey != (ScanKey) DatumGetPointer(skey->sk_argument)); Assert(subkey != (ScanKey) DatumGetPointer(skey->sk_argument));
subkey--; subkey--;
if ((subkey->sk_flags & SK_BT_REQFWD) && if (forcenonrequired)
ScanDirectionIsForward(dir)) {
/* treating scan's keys as non-required */
}
else if ((subkey->sk_flags & SK_BT_REQFWD) &&
ScanDirectionIsForward(dir))
*continuescan = false; *continuescan = false;
else if ((subkey->sk_flags & SK_BT_REQBKWD) && else if ((subkey->sk_flags & SK_BT_REQBKWD) &&
ScanDirectionIsBackward(dir)) ScanDirectionIsBackward(dir))
@@ -2811,7 +3139,7 @@ _bt_check_rowcompare(ScanKey skey, IndexTuple tuple, int tupnatts,
break; break;
} }
if (!result) if (!result && !forcenonrequired)
{ {
/* /*
* Tuple fails this qual. If it's a required qual for the current * Tuple fails this qual. If it's a required qual for the current
@@ -2855,6 +3183,8 @@ _bt_checkkeys_look_ahead(IndexScanDesc scan, BTReadPageState *pstate,
OffsetNumber aheadoffnum; OffsetNumber aheadoffnum;
IndexTuple ahead; IndexTuple ahead;
Assert(!pstate->forcenonrequired);
/* Avoid looking ahead when comparing the page high key */ /* Avoid looking ahead when comparing the page high key */
if (pstate->offnum < pstate->minoff) if (pstate->offnum < pstate->minoff)
return; return;

View File

@@ -1059,6 +1059,7 @@ typedef struct BTScanOpaqueData
/* workspace for SK_SEARCHARRAY support */ /* workspace for SK_SEARCHARRAY support */
int numArrayKeys; /* number of equality-type array keys */ int numArrayKeys; /* number of equality-type array keys */
bool skipScan; /* At least one skip array in arrayKeys[]? */
bool needPrimScan; /* New prim scan to continue in current dir? */ bool needPrimScan; /* New prim scan to continue in current dir? */
bool scanBehind; /* Check scan not still behind on next page? */ bool scanBehind; /* Check scan not still behind on next page? */
bool oppositeDirCheck; /* scanBehind opposite-scan-dir check? */ bool oppositeDirCheck; /* scanBehind opposite-scan-dir check? */
@@ -1105,6 +1106,8 @@ typedef struct BTReadPageState
IndexTuple finaltup; /* Needed by scans with array keys */ IndexTuple finaltup; /* Needed by scans with array keys */
Page page; /* Page being read */ Page page; /* Page being read */
bool firstpage; /* page is first for primitive scan? */ bool firstpage; /* page is first for primitive scan? */
bool forcenonrequired; /* treat all keys as nonrequired? */
int startikey; /* start comparisons from this scan key */
/* Per-tuple input parameters, set by _bt_readpage for _bt_checkkeys */ /* Per-tuple input parameters, set by _bt_readpage for _bt_checkkeys */
OffsetNumber offnum; /* current tuple's page offset number */ OffsetNumber offnum; /* current tuple's page offset number */
@@ -1113,13 +1116,6 @@ typedef struct BTReadPageState
OffsetNumber skip; /* Array keys "look ahead" skip offnum */ OffsetNumber skip; /* Array keys "look ahead" skip offnum */
bool continuescan; /* Terminate ongoing (primitive) index scan? */ bool continuescan; /* Terminate ongoing (primitive) index scan? */
/*
* Input and output parameters, set and unset by both _bt_readpage and
* _bt_checkkeys to manage precheck optimizations
*/
bool prechecked; /* precheck set continuescan to 'true'? */
bool firstmatch; /* at least one match so far? */
/* /*
* Private _bt_checkkeys state used to manage "look ahead" optimization * Private _bt_checkkeys state used to manage "look ahead" optimization
* (only used during scans with array keys) * (only used during scans with array keys)
@@ -1327,6 +1323,7 @@ extern bool _bt_checkkeys(IndexScanDesc scan, BTReadPageState *pstate, bool arra
IndexTuple tuple, int tupnatts); IndexTuple tuple, int tupnatts);
extern bool _bt_scanbehind_checkkeys(IndexScanDesc scan, ScanDirection dir, extern bool _bt_scanbehind_checkkeys(IndexScanDesc scan, ScanDirection dir,
IndexTuple finaltup); IndexTuple finaltup);
extern void _bt_set_startikey(IndexScanDesc scan, BTReadPageState *pstate);
extern void _bt_killitems(IndexScanDesc scan); extern void _bt_killitems(IndexScanDesc scan);
extern BTCycleId _bt_vacuum_cycleid(Relation rel); extern BTCycleId _bt_vacuum_cycleid(Relation rel);
extern BTCycleId _bt_start_vacuum(Relation rel); extern BTCycleId _bt_start_vacuum(Relation rel);