mirror of
https://github.com/postgres/postgres.git
synced 2025-06-29 10:41:53 +03:00
Make heap TID a tiebreaker nbtree index column.
Make nbtree treat all index tuples as having a heap TID attribute. Index searches can distinguish duplicates by heap TID, since heap TID is always guaranteed to be unique. This general approach has numerous benefits for performance, and is prerequisite to teaching VACUUM to perform "retail index tuple deletion".

Naively adding a new attribute to every pivot tuple has unacceptable overhead (it bloats internal pages), so suffix truncation of pivot tuples is added. This will usually truncate away the "extra" heap TID attribute from pivot tuples during a leaf page split, and may also truncate away additional user attributes. This can increase fan-out, especially in a multi-column index. Truncation can only occur at the attribute granularity, which isn't particularly effective, but works well enough for now. A future patch may add support for truncating "within" text attributes by generating truncated key values using new opclass infrastructure.

Only new indexes (BTREE_VERSION 4 indexes) will have insertions that treat heap TID as a tiebreaker attribute, or will have pivot tuples undergo suffix truncation during a leaf page split (on-disk compatibility with versions 2 and 3 is preserved). Upgrades to version 4 cannot be performed on-the-fly, unlike upgrades from version 2 to version 3. contrib/amcheck continues to work with version 2 and 3 indexes, while also enforcing stricter invariants when verifying version 4 indexes. These stricter invariants are the same invariants described by "3.1.12 Sequencing" from the Lehman and Yao paper.

A later patch will enhance the logic used by nbtree to pick a split point. This patch is likely to negatively impact performance without smarter choices around the precise point to split leaf pages at. Making these two mostly-distinct sets of enhancements into distinct commits seems like it might clarify their design, even though neither commit is particularly useful on its own.
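As a rough illustration of the two ideas described so far (heap TID as a final tiebreaker, and suffix truncation at attribute granularity), here is a minimal sketch. It assumes a toy index with two integer key attributes; every type and function name (`SketchTid`, `SketchTuple`, `sketch_tuple_cmp`, `sketch_keep_natts`) is hypothetical and is not taken from the nbtree source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not PostgreSQL's real structures. */
#define SKETCH_NKEYS 2              /* number of user key attributes */

typedef struct SketchTid
{
    uint32_t block;                 /* heap block number */
    uint16_t offset;                /* item offset within that block */
} SketchTid;

typedef struct SketchTuple
{
    int32_t   keys[SKETCH_NKEYS];   /* user attributes, in index order */
    SketchTid tid;                  /* heap TID, treated as the last attribute */
} SketchTuple;

/* Three-way comparison of heap TIDs: block number first, then offset. */
static int
sketch_tid_cmp(SketchTid a, SketchTid b)
{
    if (a.block != b.block)
        return (a.block < b.block) ? -1 : 1;
    if (a.offset != b.offset)
        return (a.offset < b.offset) ? -1 : 1;
    return 0;
}

/* Compare user attributes left to right; heap TID breaks any remaining tie. */
static int
sketch_tuple_cmp(const SketchTuple *a, const SketchTuple *b)
{
    assert(a != NULL && b != NULL);
    for (int i = 0; i < SKETCH_NKEYS; i++)
    {
        if (a->keys[i] != b->keys[i])
            return (a->keys[i] < b->keys[i]) ? -1 : 1;
    }
    return sketch_tid_cmp(a->tid, b->tid);
}

/*
 * Number of leading attributes a new pivot must keep so that it still
 * separates the last tuple on the left page from the first tuple on the
 * right page.  A result of SKETCH_NKEYS + 1 means the user attributes are
 * all equal, so the heap TID has to be kept as well.
 */
static int
sketch_keep_natts(const SketchTuple *lastleft, const SketchTuple *firstright)
{
    assert(sketch_tuple_cmp(lastleft, firstright) < 0);
    for (int i = 0; i < SKETCH_NKEYS; i++)
    {
        if (lastleft->keys[i] != firstright->keys[i])
            return i + 1;
    }
    return SKETCH_NKEYS + 1;
}
```

In this sketch, a split between tuples whose first key already differs keeps one attribute, which is the case where truncation helps fan-out most. A split in the middle of a run of duplicates keeps everything, heap TID included.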
The maximum allowed size of new tuples is reduced by an amount equal to the space required to store an extra MAXALIGN()'d TID in a new high key during leaf page splits. The user-facing definition of the "1/3 of a page" restriction is already imprecise, and so does not need to be revised. However, there should be a compatibility note in the v12 release notes.

Author: Peter Geoghegan
Reviewed-By: Heikki Linnakangas, Alexander Korotkov
Discussion: https://postgr.es/m/CAH2-WzkVb0Kom=R+88fDFb=JSxZMFvbHVC6Mn9LJ2n=X=kS-Uw@mail.gmail.com
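The size reduction described in the message can be worked through as arithmetic. The sketch below is an illustration under stated assumptions, not the actual macro from the source tree: it assumes 8-byte MAXALIGN, a 24-byte page header, 4-byte line pointers, a 16-byte btree special area, a 6-byte heap TID, and a "1/3 of a page" rule that reserves room for three line pointers. All `SKETCH_` and `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes for a typical 64-bit build; all of these are assumptions. */
#define SKETCH_MAXALIGN      8      /* maximum alignment in bytes */
#define SKETCH_PAGE_HEADER   24     /* page header size */
#define SKETCH_LINE_POINTER  4      /* one line pointer */
#define SKETCH_BT_OPAQUE     16     /* btree special area at end of page */
#define SKETCH_TID_SIZE      6      /* heap TID before alignment */

/* Round up to the next multiple of SKETCH_MAXALIGN. */
static size_t
sketch_align_up(size_t len)
{
    return (len + SKETCH_MAXALIGN - 1) & ~(size_t) (SKETCH_MAXALIGN - 1);
}

/* Round down to a multiple of SKETCH_MAXALIGN. */
static size_t
sketch_align_down(size_t len)
{
    return len & ~(size_t) (SKETCH_MAXALIGN - 1);
}

/* "1/3 of a page" limit, before any space is reserved for a heap TID. */
static size_t
sketch_max_item_size(size_t page_size)
{
    size_t overhead = sketch_align_up(SKETCH_PAGE_HEADER + 3 * SKETCH_LINE_POINTER)
                    + sketch_align_up(SKETCH_BT_OPAQUE);

    assert(page_size > overhead);
    return sketch_align_down((page_size - overhead) / 3);
}

/* The same limit, less room for one aligned TID in a new high key. */
static size_t
sketch_max_item_size_v4(size_t page_size)
{
    return sketch_max_item_size(page_size) - sketch_align_up(SKETCH_TID_SIZE);
}
```

Under these assumptions an 8192-byte page gives a limit of 2712 bytes before the change and 2704 bytes after it. The difference is the 6-byte TID rounded up to 8 bytes.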
@@ -28,8 +28,7 @@
 #define XLOG_BTREE_INSERT_META 0x20 /* same, plus update metapage */
 #define XLOG_BTREE_SPLIT_L 0x30 /* add index tuple with split */
 #define XLOG_BTREE_SPLIT_R 0x40 /* as above, new item on right */
-#define XLOG_BTREE_SPLIT_L_HIGHKEY 0x50 /* as above, include truncated highkey */
-#define XLOG_BTREE_SPLIT_R_HIGHKEY 0x60 /* as above, include truncated highkey */
+/* 0x50 and 0x60 are unused */
 #define XLOG_BTREE_DELETE 0x70 /* delete leaf index tuples for a page */
 #define XLOG_BTREE_UNLINK_PAGE 0x80 /* delete a half-dead page */
 #define XLOG_BTREE_UNLINK_PAGE_META 0x90 /* same, and update metapage */
@@ -47,6 +46,7 @@
  */
 typedef struct xl_btree_metadata
 {
+	uint32 version;
 	BlockNumber root;
 	uint32 level;
 	BlockNumber fastroot;
@@ -80,27 +80,30 @@ typedef struct xl_btree_insert
  * whole page image. The left page, however, is handled in the normal
  * incremental-update fashion.
  *
- * Note: the four XLOG_BTREE_SPLIT xl_info codes all use this data record.
- * The _L and _R variants indicate whether the inserted tuple went into the
- * left or right split page (and thus, whether newitemoff and the new item
- * are stored or not). The _HIGHKEY variants indicate that we've logged
- * explicitly left page high key value, otherwise redo should use right page
- * leftmost key as a left page high key. _HIGHKEY is specified for internal
- * pages where right page leftmost key is suppressed, and for leaf pages
- * of covering indexes where high key have non-key attributes truncated.
+ * Note: XLOG_BTREE_SPLIT_L and XLOG_BTREE_SPLIT_R share this data record.
+ * There are two variants to indicate whether the inserted tuple went into the
+ * left or right split page (and thus, whether newitemoff and the new item are
+ * stored or not). We always log the left page high key because suffix
+ * truncation can generate a new leaf high key using user-defined code. This
+ * is also necessary on internal pages, since the first right item that the
+ * left page's high key was based on will have been truncated to zero
+ * attributes in the right page (the original is unavailable from the right
+ * page).
  *
  * Backup Blk 0: original page / new left page
  *
  * The left page's data portion contains the new item, if it's the _L variant.
- * (In the _R variants, the new item is one of the right page's tuples.)
- * If level > 0, an IndexTuple representing the HIKEY of the left page
- * follows. We don't need this on leaf pages, because it's the same as the
- * leftmost key in the new right page.
+ * An IndexTuple representing the high key of the left page must follow with
+ * either variant.
  *
  * Backup Blk 1: new right page
  *
- * The right page's data portion contains the right page's tuples in the
- * form used by _bt_restore_page.
+ * The right page's data portion contains the right page's tuples in the form
+ * used by _bt_restore_page. This includes the new item, if it's the _R
+ * variant. The right page's tuples also include the right page's high key
+ * with either variant (moved from the left/original page during the split),
+ * unless the split happened to be of the rightmost page on its level, where
+ * there is no high key for new right page.
  *
  * Backup Blk 2: next block (orig page's rightlink), if any
  * Backup Blk 3: child's left sibling, if non-leaf split
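
The revised comment above implies a simple decision table for the two remaining split codes. The following is a minimal sketch of that table, not the actual redo routine: the `XLOG_BTREE_SPLIT_L` and `XLOG_BTREE_SPLIT_R` values are copied from the first hunk, while `SKETCH_INFO_MASK`, `SketchSplitLayout`, and `sketch_split_layout` are hypothetical names, and the sketch assumes that the low four bits of the info byte carry unrelated flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the diff above. */
#define XLOG_BTREE_SPLIT_L  0x30    /* add index tuple with split */
#define XLOG_BTREE_SPLIT_R  0x40    /* as above, new item on right */

/* Assumption: only the high four bits identify the record type. */
#define SKETCH_INFO_MASK    0xF0

/* Hypothetical summary of what each backup block's data portion carries. */
typedef struct SketchSplitLayout
{
    bool new_item_in_left_data;     /* Blk 0 data starts with the new item */
    bool new_item_in_right_data;    /* Blk 1 tuples include the new item */
    bool left_high_key_logged;      /* Blk 0 data carries the left high key */
} SketchSplitLayout;

/* Map a split info byte to the payload layout described in the comment. */
static SketchSplitLayout
sketch_split_layout(uint8_t info)
{
    uint8_t code = info & SKETCH_INFO_MASK;
    SketchSplitLayout layout;

    assert(code == XLOG_BTREE_SPLIT_L || code == XLOG_BTREE_SPLIT_R);

    layout.new_item_in_left_data = (code == XLOG_BTREE_SPLIT_L);
    layout.new_item_in_right_data = (code == XLOG_BTREE_SPLIT_R);
    /* With this commit the left high key is logged for both variants. */
    layout.left_high_key_logged = true;
    return layout;
}
```

The point of the sketch is that the high key no longer depends on the record code or the page level, which is why the two `_HIGHKEY` codes could be retired.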