Mirror of https://github.com/postgres/postgres.git, synced 2025-07-30 11:03:19 +03:00
nbtree README: Add note about latestRemovedXid.

Point out that index tuple deletion generally needs a latestRemovedXid
value for the deletion operation's WAL record. This is bound to be the
most expensive part of the whole deletion operation now that it takes
place up front, during original execution.

This was arguably an oversight in commit 558a9165e0, which moved the
work required to generate these values from index deletion REDO routines
to original execution of index deletion operations.
@@ -490,24 +490,33 @@ lock on the leaf page).
 Once an index tuple has been marked LP_DEAD it can actually be deleted
 from the index immediately; since index scans only stop "between" pages,
 no scan can lose its place from such a deletion. We separate the steps
-because we allow LP_DEAD to be set with only a share lock (it's exactly
-like a hint bit for a heap tuple), but physically removing tuples requires
-exclusive lock. Also, delaying the deletion often allows us to pick up
-extra index tuples that weren't initially safe for index scans to mark
-LP_DEAD. We do this with index tuples whose TIDs point to the same table
-blocks as an LP_DEAD-marked tuple. They're practically free to check in
-passing, and have a pretty good chance of being safe to delete due to
-various locality effects.
+because we allow LP_DEAD to be set with only a share lock (it's like a
+hint bit for a heap tuple), but physically deleting tuples requires an
+exclusive lock. We also need to generate a latestRemovedXid value for
+each deletion operation's WAL record, which requires additional
+coordinating with the tableam when the deletion actually takes place.
+(This latestRemovedXid value may be used to generate a recovery conflict
+during subsequent REDO of the record by a standby.)
 
-We only try to delete LP_DEAD tuples (and nearby tuples) when we are
-otherwise faced with having to split a page to do an insertion (and hence
-have exclusive lock on it already). Deduplication and bottom-up index
-deletion can also prevent a page split, but simple deletion is always our
-preferred approach. (Note that posting list tuples can only have their
-LP_DEAD bit set when every table TID within the posting list is known
-dead. This isn't much of a problem in practice because LP_DEAD bits are
-just a starting point for simple deletion -- we still manage to perform
-granular deletes of posting list TIDs quite often.)
+Delaying and batching index tuple deletion like this enables a further
+optimization: opportunistic checking of "extra" nearby index tuples
+(tuples that are not LP_DEAD-set) when they happen to be very cheap to
+check in passing (because we already know that the tableam will be
+visiting their table block to generate a latestRemovedXid value). Any
+index tuples that turn out to be safe to delete will also be deleted.
+Simple deletion will behave as if the extra tuples that actually turn
+out to be delete-safe had their LP_DEAD bits set right from the start.
+
+Deduplication can also prevent a page split, but index tuple deletion is
+our preferred approach. Note that posting list tuples can only have
+their LP_DEAD bit set when every table TID within the posting list is
+known dead. This isn't much of a problem in practice because LP_DEAD
+bits are just a starting point for deletion. What really matters is
+that _some_ deletion operation that targets related nearby-in-table TIDs
+takes place at some point before the page finally splits. That's all
+that's required for the deletion process to perform granular removal of
+groups of dead TIDs from posting list tuples (without the situation ever
+being allowed to get out of hand).
 
 It's sufficient to have an exclusive lock on the index page, not a
 super-exclusive lock, to do deletion of LP_DEAD items. It might seem
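
The added README text says each deletion operation's WAL record needs a latestRemovedXid value, which a standby may use to generate a recovery conflict during REDO. The following is a minimal Python sketch of that idea, not PostgreSQL's code. It makes three simplifying assumptions. First, each removed table tuple is represented only by the XID of the transaction that deleted it. Second, XIDs are treated as plain 32-bit counters compared modulo 2^32, with special XIDs ignored. Third, the function names are hypothetical. In PostgreSQL the value comes from the tableam, which has to visit the table blocks involved.

```python
from typing import Iterable, Optional

XID_MASK = 0xFFFFFFFF  # simplified model: XIDs are 32-bit counters that wrap around


def xid_precedes(a: int, b: int) -> bool:
    """True if XID a is older than XID b, comparing modulo 2**32 (special XIDs ignored)."""
    return ((a - b) & XID_MASK) >= 0x80000000  # same as (int32)(a - b) < 0


def latest_removed_xid(deleter_xids: Iterable[int]) -> Optional[int]:
    """Hypothetical helper: newest XID among the transactions that deleted the removed table tuples."""
    latest: Optional[int] = None
    for xid in deleter_xids:
        if latest is None or xid_precedes(latest, xid):
            latest = xid
    return latest


def conflicts(snapshot_xmin: int, latest: Optional[int]) -> bool:
    """Simplified standby rule (hypothetical helper): a snapshot whose xmin is at or
    before `latest` might still need to see the removed tuples, so it conflicts."""
    return latest is not None and not xid_precedes(latest, snapshot_xmin)
```

For deleter XIDs 1000, 1042 and 1017 the sketch picks 1042, so a standby snapshot with xmin 1040 conflicts while one with xmin 1043 does not. The circular comparison is why XID 5 counts as newer than 0xFFFFFFF0 once the counter has wrapped around.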
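
The new second paragraph describes opportunistic checking of "extra" index tuples. These tuples are not LP_DEAD-set, but their TIDs point into table blocks that the tableam will visit anyway. The following is a rough Python sketch of that candidate selection under two assumptions. An index page is modeled as a plain list of hypothetical IndexTuple records (table block, offset, LP_DEAD flag). A caller-supplied is_dead_in_table callback stands in for the tableam's check. None of these names exist in PostgreSQL.

```python
from typing import Callable, List, NamedTuple, Set, Tuple


class IndexTuple(NamedTuple):
    """Hypothetical, simplified index tuple: a table TID plus the LP_DEAD hint bit."""
    block: int     # table block number from the TID
    offset: int    # item offset within that table block
    lp_dead: bool  # set earlier by an index scan (only a share lock is needed)


def deletion_candidates(page: List[IndexTuple]) -> Tuple[List[IndexTuple], List[IndexTuple]]:
    """Split candidates into known-dead tuples and cheap-to-check "extra" tuples."""
    known_dead = [t for t in page if t.lp_dead]
    # Table blocks the deletion must visit anyway to compute latestRemovedXid.
    visited_blocks: Set[int] = {t.block for t in known_dead}
    extra = [t for t in page if not t.lp_dead and t.block in visited_blocks]
    return known_dead, extra


def simple_delete(page: List[IndexTuple],
                  is_dead_in_table: Callable[[IndexTuple], bool]) -> List[IndexTuple]:
    """Remove LP_DEAD tuples plus any extra tuples the table check finds delete-safe,
    as if those extra tuples had been LP_DEAD-marked from the start."""
    known_dead, extra = deletion_candidates(page)
    doomed = set(known_dead) | {t for t in extra if is_dead_in_table(t)}
    return [t for t in page if t not in doomed]
```

The example page has (7,1) marked LP_DEAD plus unmarked tuples at (7,2), (7,3) and (9,1), and the table reports (7,2) and (9,1) as dead. Only (7,1) and (7,2) are removed. (9,1) survives because nothing made block 9 worth visiting, which matches the README's point that extra tuples are only checked when that is very cheap.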
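
The final new paragraph says a posting list tuple can only have its LP_DEAD bit set once every table TID in it is known dead. Deletion can still remove groups of dead TIDs from it granularly. The following is a small Python sketch of that distinction. It assumes TIDs are plain (block, offset) pairs and that the set of dead TIDs is already known. Both helpers are hypothetical, and nbtree's real posting list handling is considerably more involved.

```python
from typing import List, Optional, Set, Tuple

TID = Tuple[int, int]  # simplified table TID: (block number, offset)


def can_set_lp_dead(posting: List[TID], dead: Set[TID]) -> bool:
    """Hypothetical rule: an index scan may set LP_DEAD on a posting list tuple
    only if every TID in it is dead."""
    return all(tid in dead for tid in posting)


def granular_delete(posting: List[TID], dead: Set[TID]) -> Optional[List[TID]]:
    """Hypothetical helper: drop dead TIDs from a posting list.

    Returns the surviving TIDs (still in TID order), or None when nothing
    survives and the whole index tuple can be deleted.
    """
    survivors = [tid for tid in posting if tid not in dead]
    return survivors or None
```

Take the posting list [(12, 1), (12, 2), (12, 3)] with (12, 1) and (12, 3) dead. can_set_lp_dead returns False, yet granular_delete still shrinks the tuple to [(12, 2)]. Once (12, 2) is dead too, granular_delete returns None and the whole tuple can go.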