Mirror of https://github.com/postgres/postgres.git (synced 2025-05-03 22:24:49 +03:00)
Explain subtlety in nbtree locking protocol.
The Postgres approach to coupling locks during an ascent of the tree is slightly different to the approach taken by Lehman and Yao. Add a new paragraph to the "Differences to the Lehman & Yao algorithm" section of the nbtree README that explains the similarities and differences.
parent 989d23b04b
commit 867d25ccb4
@@ -136,6 +136,25 @@ since we saw the root. We can identify the correct tree level by means of
 the level numbers stored in each page. The situation is rare enough that
 we do not need a more efficient solution.)
 
+Lehman and Yao must couple/chain locks as part of moving right when
+relocating a child page's downlink during an ascent of the tree. This is
+the only point where Lehman and Yao have to simultaneously hold three
+locks (a lock on the child, the original parent, and the original parent's
+right sibling). We don't need to couple internal page locks for pages on
+the same level, though. We match a child's block number to a downlink
+from a pivot tuple one level up, whereas Lehman and Yao match on the
+separator key associated with the downlink that was followed during the
+initial descent. We can release the lock on the original parent page
+before acquiring a lock on its right sibling, since there is never any
+need to deal with the case where the separator key that we must relocate
+becomes the original parent's high key. Lanin and Shasha don't couple
+locks here either, though they also don't couple locks between levels
+during ascents. They are willing to "wait and try again" to avoid races.
+Their algorithm is optimistic, which means that "an insertion holds no
+more than one write lock at a time during its ascent". We more or less
+stick with Lehman and Yao's approach of conservatively coupling parent and
+child locks when ascending the tree, since it's far simpler.
+
 Lehman and Yao assume fixed-size keys, but we must deal with
 variable-size keys. Therefore there is not a fixed maximum number of
 keys per page; we just stuff in as many as will fit. When we split a
@@ -224,13 +243,7 @@ it, but it's still linked to its siblings.
 
 (Note: Lanin and Shasha prefer to make the key space move left, but their
 argument for doing so hinges on not having left-links, which we have
-anyway. So we simplify the algorithm by moving the key space right. Note
-also that Lanin and Shasha optimistically avoid holding multiple locks as
-the tree is ascended. They're willing to release all locks and retry in
-"rare" cases where the correct location for a new downlink cannot be found
-immediately. We prefer to stick with Lehman and Yao's approach of
-pessimistically coupling buffer locks when ascending the tree, since it's
-far simpler.)
+anyway. So we simplify the algorithm by moving the key space right.)
 
 To preserve consistency on the parent level, we cannot merge the key space
 of a page into its right sibling unless the right sibling is a child of
@@ -2019,6 +2019,9 @@ _bt_getstackbuf(Relation rel, BTStack stack, BlockNumber child)
 
 /*
  * The item we're looking for moved right at least one page.
+ *
+ * Lehman and Yao couple/chain locks when moving right here, which we
+ * can avoid. See nbtree/README.
  */
 if (P_RIGHTMOST(opaque))
 {
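The move-right step described in the README paragraph and in the `_bt_getstackbuf` comment can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not PostgreSQL code: `ToyPage`, `find_downlink`, `toy_lock`, `toy_unlock` and `demo_max_locks` are hypothetical names, pages live in a plain array indexed by block number, and "locks" are only counters, with no concurrency at all. It shows just the lock bookkeeping: the caller keeps the child locked throughout, and the search matches the child's block number while moving right along the parent level, either coupling locks (Lehman and Yao) or releasing the current page before locking its right sibling.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical types and names, no real buffers or concurrency. */
#define INVALID_BLOCK (-1)
#define MAX_DOWNLINKS 4

typedef struct ToyPage
{
    int blkno;
    int rightlink;                  /* INVALID_BLOCK if rightmost */
    int ndownlinks;
    int downlinks[MAX_DOWNLINKS];   /* child block numbers */
} ToyPage;

static int locks_held = 0;          /* "locks" currently held */
static int max_locks_held = 0;      /* high-water mark */

static void
toy_lock(void)
{
    locks_held++;
    if (locks_held > max_locks_held)
        max_locks_held = locks_held;
}

static void
toy_unlock(void)
{
    locks_held--;
}

/* Match on the child's block number, not on a separator key. */
static bool
has_downlink(const ToyPage *page, int child)
{
    for (int i = 0; i < page->ndownlinks; i++)
        if (page->downlinks[i] == child)
            return true;
    return false;
}

/*
 * Move right from 'start' until a page with a downlink to 'child' is found.
 * The caller already holds the child's lock. With 'couple', the right sibling
 * is locked before the current page is released (Lehman and Yao); otherwise
 * the current page is released first. Returns the block holding the downlink,
 * still locked, or INVALID_BLOCK with no parent-level lock held.
 */
static int
find_downlink(const ToyPage *pages, int start, int child, bool couple)
{
    int blkno = start;

    toy_lock();                     /* original parent */
    for (;;)
    {
        int next;

        if (has_downlink(&pages[blkno], child))
            return blkno;
        next = pages[blkno].rightlink;
        if (next == INVALID_BLOCK)
        {
            toy_unlock();
            return INVALID_BLOCK;
        }
        if (couple)
        {
            toy_lock();             /* lock right sibling first... */
            toy_unlock();           /* ...then release current page */
        }
        else
        {
            toy_unlock();           /* release current page first... */
            toy_lock();             /* ...then lock right sibling */
        }
        blkno = next;
    }
}

/* Hypothetical scenario: the downlink to child 42 moved from page 0 to page 2. */
int
demo_max_locks(bool couple)
{
    const ToyPage pages[3] = {
        {0, 1, 2, {10, 11}},
        {1, 2, 2, {12, 13}},
        {2, INVALID_BLOCK, 2, {41, 42}},
    };
    int found;

    locks_held = 0;
    max_locks_held = 0;
    toy_lock();                     /* child page, held throughout */
    found = find_downlink(pages, 0, 42, couple);
    assert(found == 2);
    (void) found;
    toy_unlock();                   /* parent-level page holding the downlink */
    toy_unlock();                   /* child page */
    assert(locks_held == 0);
    return max_locks_held;
}
```

In this scenario `demo_max_locks(true)` returns 3 (child, current parent-level page, right sibling), the three locks Lehman and Yao hold at once, while `demo_max_locks(false)` returns 2. The toy cannot show why releasing the parent first is safe; per the README, that follows from matching on block numbers, so there is never a separator key to relocate that could become the original parent's high key.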