1
0
mirror of https://github.com/postgres/postgres.git synced 2025-10-25 13:17:41 +03:00

Improve nbtree array primitive scan scheduling.

Add a new scheduling heuristic: don't end the ongoing primitive index
scan immediately (at the point where _bt_advance_array_keys notices that
the next set of matching tuples must be on a later page) if the primscan
already managed to step right/left from its first leaf page.  Schedule a
recheck against the next sibling leaf page's finaltup instead.

The new heuristic tends to avoid scenarios where the top-level scan
repeatedly starts and ends primitive index scans that each read only one
leaf page from a group of neighboring leaf pages.  Affected top-level
scans will now tend to step forward (or backward) through the index
instead, without wasting cycles on descending the index anew.

The recheck mechanism isn't exactly new.  But up until now it has only
been used to deal with edge cases involving high key finaltups with one
or more truncated -inf attributes that _bt_advance_array_keys deemed
"provisionally satisfied" (satisfied for the purposes of allowing the
scan to step onto the next page, subject to recheck once on that page).
The mechanism was added by commit 5bf748b8, which invented the general
concept of primitive scan scheduling.  It was later enhanced by commit
79fa7b3b, which taught it about cases involving -inf attributes that
satisfy inequality scan keys required in the opposite-to-scan direction
only (arguably, they should have been covered by the earliest version).
Now the recheck mechanism can be applied based on scan-level heuristics,
which have nothing to do with truncated high keys.  Now rechecks might
be performed by _bt_readpage when scanning in _either_ scan direction.

The theory behind the new heuristic is that any primitive scan that
makes it past its first leaf page is one that is already likely to have
arrays whose key values match index tuples that are closely clustered
together in the index.  The rules that determine whether we ever get
past the first page are still conservative (that'll still only happen
when pstate.finaltup strongly suggests that it's the right thing to do).
Surviving past the first leaf page is a strong signal in itself.

Preparation for an upcoming patch that will add skip scan optimizations
to nbtree.  That'll work by adding skip arrays, which behave similarly
to SAOP arrays, but generate their elements procedurally and on-demand.

Note that this commit isn't specifically concerned with skip arrays; the
scheduling logic doesn't (and won't) condition anything on whether the
scan uses skip arrays, SAOP arrays, or some combination of the two
(which seems like a good general principle for _bt_advance_array_keys).
While the problems that this commit ameliorates are more likely with
skip arrays (at least in practice), SAOP arrays (or those with very
dense, contiguous array elements) are also affected.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wzkz0wPe6+02kr+hC+JJNKfGtjGTzpG3CFVTQmKwWNrXNw@mail.gmail.com
This commit is contained in:
Peter Geoghegan
2025-03-22 13:02:18 -04:00
parent e215166c9c
commit 9a2e2a285a
3 changed files with 162 additions and 138 deletions

View File

@@ -1043,8 +1043,8 @@ typedef struct BTScanOpaqueData
/* workspace for SK_SEARCHARRAY support */
int numArrayKeys; /* number of equality-type array keys */
bool needPrimScan; /* New prim scan to continue in current dir? */
bool scanBehind; /* Last array advancement matched -inf attr? */
bool oppositeDirCheck; /* explicit scanBehind recheck needed? */
bool scanBehind; /* Check scan not still behind on next page? */
bool oppositeDirCheck; /* scanBehind opposite-scan-dir check? */
BTArrayKeyInfo *arrayKeys; /* info about each equality-type array key */
FmgrInfo *orderProcs; /* ORDER procs for required equality keys */
MemoryContext arrayContext; /* scan-lifespan context for array data */
@@ -1087,11 +1087,12 @@ typedef struct BTReadPageState
OffsetNumber maxoff; /* Highest non-pivot tuple's offset */
IndexTuple finaltup; /* Needed by scans with array keys */
Page page; /* Page being read */
bool firstpage; /* page is first for primitive scan? */
/* Per-tuple input parameters, set by _bt_readpage for _bt_checkkeys */
OffsetNumber offnum; /* current tuple's page offset number */
/* Output parameter, set by _bt_checkkeys for _bt_readpage */
/* Output parameters, set by _bt_checkkeys for _bt_readpage */
OffsetNumber skip; /* Array keys "look ahead" skip offnum */
bool continuescan; /* Terminate ongoing (primitive) index scan? */
@@ -1298,8 +1299,8 @@ extern int _bt_binsrch_array_skey(FmgrInfo *orderproc,
extern void _bt_start_array_keys(IndexScanDesc scan, ScanDirection dir);
extern bool _bt_checkkeys(IndexScanDesc scan, BTReadPageState *pstate, bool arrayKeys,
IndexTuple tuple, int tupnatts);
extern bool _bt_oppodir_checkkeys(IndexScanDesc scan, ScanDirection dir,
IndexTuple finaltup);
extern bool _bt_scanbehind_checkkeys(IndexScanDesc scan, ScanDirection dir,
IndexTuple finaltup);
extern void _bt_killitems(IndexScanDesc scan);
extern BTCycleId _bt_vacuum_cycleid(Relation rel);
extern BTCycleId _bt_start_vacuum(Relation rel);