Mirror of https://github.com/postgres/postgres.git (synced 2025-06-26 12:21:12 +03:00)
nbtree README: move VACUUM linear scan section.
Discuss VACUUM's linear scan after discussion of tuple deletion by VACUUM, but before discussion of page deletion by VACUUM. This progression is a lot more natural. Also tweak the wording a little. It seems unnecessary to talk about how it worked prior to PostgreSQL 8.2.
@@ -214,6 +214,34 @@ page). Since we hold a lock on the lower page (per L&Y) until we have
 re-found the parent item that links to it, we can be assured that the
 parent item does still exist and can't have been deleted.

+VACUUM's linear scan, concurrent page splits
+--------------------------------------------
+
+VACUUM accesses the index by doing a linear scan to search for deletable
+TIDs, while considering the possibility of deleting empty pages in
+passing. This is in physical/block order, not logical/keyspace order.
+The tricky part of this is avoiding missing any deletable tuples in the
+presence of concurrent page splits: a page split could easily move some
+tuples from a page not yet passed over by the sequential scan to a
+lower-numbered page already passed over.
+
+To implement this, we provide a "vacuum cycle ID" mechanism that makes it
+possible to determine whether a page has been split since the current
+btbulkdelete cycle started. If btbulkdelete finds a page that has been
+split since it started, and has a right-link pointing to a lower page
+number, then it temporarily suspends its sequential scan and visits that
+page instead. It must continue to follow right-links and vacuum dead
+tuples until reaching a page that either hasn't been split since
+btbulkdelete started, or is above the location of the outer sequential
+scan. Then it can resume the sequential scan. This ensures that all
+tuples are visited. It may be that some tuples are visited twice, but
+that has no worse effect than an inaccurate index tuple count (and we
+can't guarantee an accurate count anyway in the face of concurrent
+activity). Note that this still works if the has-been-recently-split test
+has a small probability of false positives, so long as it never gives a
+false negative. This makes it possible to implement the test with a small
+counter value stored on each index page.
+
 Deleting entire pages during VACUUM
 -----------------------------------

@@ -371,33 +399,6 @@ as part of the atomic update for the delete (either way, the metapage has
 to be the last page locked in the update to avoid deadlock risks). This
 avoids race conditions if two such operations are executing concurrently.

-VACUUM needs to do a linear scan of an index to search for deleted pages
-that can be reclaimed because they are older than all open transactions.
-For efficiency's sake, we'd like to use the same linear scan to search for
-deletable tuples. Before Postgres 8.2, btbulkdelete scanned the leaf pages
-in index order, but it is possible to visit them in physical order instead.
-The tricky part of this is to avoid missing any deletable tuples in the
-presence of concurrent page splits: a page split could easily move some
-tuples from a page not yet passed over by the sequential scan to a
-lower-numbered page already passed over. (This wasn't a concern for the
-index-order scan, because splits always split right.) To implement this,
-we provide a "vacuum cycle ID" mechanism that makes it possible to
-determine whether a page has been split since the current btbulkdelete
-cycle started. If btbulkdelete finds a page that has been split since
-it started, and has a right-link pointing to a lower page number, then
-it temporarily suspends its sequential scan and visits that page instead.
-It must continue to follow right-links and vacuum dead tuples until
-reaching a page that either hasn't been split since btbulkdelete started,
-or is above the location of the outer sequential scan. Then it can resume
-the sequential scan. This ensures that all tuples are visited. It may be
-that some tuples are visited twice, but that has no worse effect than an
-inaccurate index tuple count (and we can't guarantee an accurate count
-anyway in the face of concurrent activity). Note that this still works
-if the has-been-recently-split test has a small probability of false
-positives, so long as it never gives a false negative. This makes it
-possible to implement the test with a small counter value stored on each
-index page.
-
 Fastpath For Index Insertion
 ----------------------------

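The "vacuum cycle ID" rule in the relocated section can be illustrated with a small simulation. This is only a sketch under stated assumptions, not PostgreSQL code: `Page`, `split_page`, `bulk_delete`, and `demo` are hypothetical names, tuples are plain strings with deletable ones tagged `dead:`, and a concurrent split is faked with a callback that fires just before the outer scan reaches a given block.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Page:
    tuples: List[str] = field(default_factory=list)  # "dead:*" entries are deletable
    right_link: Optional[int] = None                 # block number of the right sibling
    cycle_id: int = 0                                # stamped by a split; 0 = no recent split


def split_page(pages: List[Page], left: int, new_right: int, cycle_id: int) -> None:
    """Move the upper half of pages[left] into pages[new_right]; stamp both halves."""
    src, dst = pages[left], pages[new_right]
    mid = len(src.tuples) // 2
    dst.tuples, src.tuples = src.tuples[mid:], src.tuples[:mid]
    dst.right_link, src.right_link = src.right_link, new_right
    src.cycle_id = dst.cycle_id = cycle_id


def bulk_delete(pages: List[Page], cycle_id: int,
                before_block: Optional[Callable[[int], None]] = None) -> List[int]:
    """Scan in physical block order, backtracking along right-links when needed."""
    visited: List[int] = []
    scan_blkno = 0
    while scan_blkno < len(pages):
        if before_block is not None:
            before_block(scan_blkno)  # hook used to simulate concurrent activity
        blkno = scan_blkno
        while True:
            page = pages[blkno]
            page.tuples = [t for t in page.tuples if not t.startswith("dead:")]
            visited.append(blkno)
            # Backtrack only if the page was split during this cycle and its
            # right sibling lies behind the outer sequential scan.
            if (page.cycle_id == cycle_id
                    and page.right_link is not None
                    and page.right_link < scan_blkno):
                blkno = page.right_link
            else:
                break
        scan_blkno += 1
    return visited


def demo(cycle_id: int = 7):
    pages = [
        Page(["live:a", "dead:b"], right_link=2),
        Page(),  # empty block that the simulated split will reuse
        Page(["live:c"], right_link=3),
        Page(["live:d", "dead:e", "live:f", "dead:g"]),
    ]

    def concurrent_split(scan_blkno: int) -> None:
        # Block 1 has already been passed when block 3 splits into it.
        if scan_blkno == 2:
            split_page(pages, left=3, new_right=1, cycle_id=cycle_id)

    return bulk_delete(pages, cycle_id, before_block=concurrent_split), pages
```

In this scenario `demo()` visits blocks 0, 1, 2, 3 and then block 1 a second time, because block 3 carries the current cycle ID and its right-link points behind the scan. The revisit is what removes `dead:g`, which the split moved onto an already-scanned block. A page wrongly flagged as recently split would only cause an extra visit, which matches the README's point that false positives are harmless while false negatives are not.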