Consider secondary factors during nbtree splits.

Teach nbtree to give some consideration to how "distinguishing" candidate leaf page split points are. This should not noticeably affect the balance of free space within each half of the split, while still making suffix truncation truncate away significantly more attributes on average. The logic for choosing a leaf split point now uses a fallback mode in the case where the page is full of duplicates and it isn't possible to find even a minimally distinguishing split point. When the page is full of duplicates, the split should pack the left half very tightly, while leaving the right half mostly empty. Our assumption is that logical duplicates will almost always be inserted in ascending heap TID order with v4 indexes. This strategy leaves most of the free space on the half of the split that will likely be where future logical duplicates of the same value need to be placed. The number of cycles added is not very noticeable. This is important because deciding on a split point takes place while at least one exclusive buffer lock is held. We avoid using authoritative insertion scankey comparisons to save cycles, unlike suffix truncation proper. We use a faster binary comparison instead. Note that even pg_upgrade'd v3 indexes make use of these optimizations. Benchmarking has shown that even v3 indexes benefit, despite the fact that suffix truncation will only truncate non-key attributes in INCLUDE indexes. Grouping relatively similar tuples together is beneficial in and of itself, since it reduces the number of leaf pages that must be accessed by subsequent index scans. Author: Peter Geoghegan Reviewed-By: Heikki Linnakangas Discussion: https://postgr.es/m/CAH2-WzmmoLNQOj9mAD78iQHfWLJDszHEDrAzGTUMG3mVh5xWPw@mail.gmail.com
2025-06-30 21:42:05 +03:00 · 2019-03-20 10:12:19 -07:00
parent dd299df818
commit fab2502433
6 changed files with 961 additions and 291 deletions
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@ -160,11 +160,15 @@ typedef struct BTMetaPageData
 * For pages above the leaf level, we use a fixed 70% fillfactor.
 * The fillfactor is applied during index build and when splitting
 * a rightmost page; when splitting non-rightmost pages we try to
- * divide the data equally.
+ * divide the data equally.  When splitting a page that's entirely
+ * filled with a single value (duplicates), the effective leaf-page
+ * fillfactor is 96%, regardless of whether the page is a rightmost
+ * page.
 */
 #define BTREE_MIN_FILLFACTOR		10
 #define BTREE_DEFAULT_FILLFACTOR	90
 #define BTREE_NONLEAF_FILLFACTOR	70
+#define BTREE_SINGLEVAL_FILLFACTOR	96

 /*
 *	In general, the btree code tries to localize its knowledge about
@ -711,6 +715,13 @@ extern bool _bt_doinsert(Relation rel, IndexTuple itup,
 extern Buffer _bt_getstackbuf(Relation rel, BTStack stack);
 extern void _bt_finish_split(Relation rel, Buffer bbuf, BTStack stack);

+/*
+ * prototypes for functions in nbtsplitloc.c
+ */
+extern OffsetNumber _bt_findsplitloc(Relation rel, Page page,
+				 OffsetNumber newitemoff, Size newitemsz, IndexTuple newitem,
+				 bool *newitemonleft);
+
 /*
 * prototypes for functions in nbtpage.c
 */
@ -777,6 +788,8 @@ extern bool btproperty(Oid index_oid, int attno,
 		   bool *res, bool *isnull);
 extern IndexTuple _bt_truncate(Relation rel, IndexTuple lastleft,
 			 IndexTuple firstright, BTScanInsert itup_key);
+extern int _bt_keep_natts_fast(Relation rel, IndexTuple lastleft,
+					IndexTuple firstright);
 extern bool _bt_check_natts(Relation rel, bool heapkeyspace, Page page,
 				OffsetNumber offnum);
 extern void _bt_check_third_page(Relation rel, Relation heap,