1
0
mirror of https://github.com/postgres/postgres.git synced 2025-07-28 23:42:10 +03:00

Teach pageinspect about nbtree deduplication.

Add a new bt_metap() column to display the metapage's allequalimage
field.  Also add three new columns to contrib/pageinspect's
bt_page_items() function:

* Add a boolean column ("dead") that displays the LP_DEAD bit value for
each non-pivot tuple.

* Add a TID column ("htid") that displays a single heap TID value for
each tuple.  This is the TID that is returned by BTreeTupleGetHeapTID(),
so comparable values are shown for pivot tuples, plain non-pivot tuples,
and posting list tuples.

* Add a TID array column ("tids") that displays TIDs from each tuple's
posting list, if any.  This works just like the "tids" column from
pageinspect's gin_leafpage_items() function.

No version bump for the pageinspect extension, since there hasn't been a
stable Postgres release since the last version bump (the last bump was
part of commit 58b4cb30).

Author: Peter Geoghegan
Discussion: https://postgr.es/m/CAH2-WzmSMmU2eNvY9+a4MNP+z02h6sa-uxZvN3un6jY02ZVBSw@mail.gmail.com
This commit is contained in:
Peter Geoghegan
2020-02-29 12:10:17 -08:00
parent 58c47ccfff
commit 93ee38eade
4 changed files with 273 additions and 53 deletions

View File

@ -300,13 +300,14 @@ test=# SELECT t_ctid, raw_flags, combined_flags
test=# SELECT * FROM bt_metap('pg_cast_oid_index');
-[ RECORD 1 ]-----------+-------
magic | 340322
version | 3
version | 4
root | 1
level | 0
fastroot | 1
fastlevel | 0
oldest_xact | 582
last_cleanup_num_tuples | 1000
allequalimage | f
</screen>
</para>
</listitem>
@ -329,11 +330,11 @@ test=# SELECT * FROM bt_page_stats('pg_cast_oid_index', 1);
-[ RECORD 1 ]-+-----
blkno | 1
type | l
live_items | 256
live_items | 224
dead_items | 0
avg_item_size | 12
avg_item_size | 16
page_size | 8192
free_size | 4056
free_size | 3668
btpo_prev | 0
btpo_next | 0
btpo | 0
@ -356,33 +357,75 @@ btpo_flags | 3
<function>bt_page_items</function> returns detailed information about
all of the items on a B-tree index page. For example:
<screen>
test=# SELECT * FROM bt_page_items('pg_cast_oid_index', 1);
itemoffset | ctid | itemlen | nulls | vars | data
------------+---------+---------+-------+------+-------------
1 | (0,1) | 12 | f | f | 23 27 00 00
2 | (0,2) | 12 | f | f | 24 27 00 00
3 | (0,3) | 12 | f | f | 25 27 00 00
4 | (0,4) | 12 | f | f | 26 27 00 00
5 | (0,5) | 12 | f | f | 27 27 00 00
6 | (0,6) | 12 | f | f | 28 27 00 00
7 | (0,7) | 12 | f | f | 29 27 00 00
8 | (0,8) | 12 | f | f | 2a 27 00 00
test=# SELECT itemoffset, ctid, itemlen, nulls, vars, data, dead, htid, tids[0:2] AS some_tids
FROM bt_page_items(get_raw_page('tenk2_hundred', 5));
itemoffset | ctid | itemlen | nulls | vars | data | dead | htid | some_tids
------------+-----------+---------+-------+------+-------------------------+------+--------+---------------------
1 | (16,1) | 16 | f | f | 30 00 00 00 00 00 00 00 | | |
2 | (16,8292) | 616 | f | f | 24 00 00 00 00 00 00 00 | f | (1,6) | {"(1,6)","(10,22)"}
3 | (16,8292) | 616 | f | f | 25 00 00 00 00 00 00 00 | f | (1,18) | {"(1,18)","(4,22)"}
4 | (16,8292) | 616 | f | f | 26 00 00 00 00 00 00 00 | f | (4,18) | {"(4,18)","(6,17)"}
5 | (16,8292) | 616 | f | f | 27 00 00 00 00 00 00 00 | f | (1,2) | {"(1,2)","(1,19)"}
6 | (16,8292) | 616 | f | f | 28 00 00 00 00 00 00 00 | f | (2,24) | {"(2,24)","(4,11)"}
7 | (16,8292) | 616 | f | f | 29 00 00 00 00 00 00 00 | f | (2,17) | {"(2,17)","(11,2)"}
8 | (16,8292) | 616 | f | f | 2a 00 00 00 00 00 00 00 | f | (0,25) | {"(0,25)","(3,20)"}
9 | (16,8292) | 616 | f | f | 2b 00 00 00 00 00 00 00 | f | (0,10) | {"(0,10)","(0,14)"}
10 | (16,8292) | 616 | f | f | 2c 00 00 00 00 00 00 00 | f | (1,3) | {"(1,3)","(3,9)"}
11 | (16,8292) | 616 | f | f | 2d 00 00 00 00 00 00 00 | f | (6,28) | {"(6,28)","(11,1)"}
12 | (16,8292) | 616 | f | f | 2e 00 00 00 00 00 00 00 | f | (0,27) | {"(0,27)","(1,13)"}
13 | (16,8292) | 616 | f | f | 2f 00 00 00 00 00 00 00 | f | (4,17) | {"(4,17)","(4,21)"}
(13 rows)
</screen>
In a B-tree leaf page, <structfield>ctid</structfield> points to a heap tuple.
In an internal page, the block number part of <structfield>ctid</structfield>
points to another page in the index itself, while the offset part
(the second number) is ignored and is usually 1.
This is a B-tree leaf page. All tuples that point to the table
happen to be posting list tuples (all of which store a total of
100 6 byte TIDs). There is also a <quote>high key</quote> tuple
at <literal>itemoffset</literal> number 1.
<structfield>ctid</structfield> is used to store encoded
information about each tuple in this example, though leaf page
tuples often store a heap TID directly in the
<structfield>ctid</structfield> field instead.
<structfield>tids</structfield> is the list of TIDs stored as a
posting list.
</para>
<para>
In an internal page (not shown), the block number part of
<structfield>ctid</structfield> is a <quote>downlink</quote>,
which is a block number of another page in the index itself.
The offset part (the second number) of
<structfield>ctid</structfield> stores encoded information about
the tuple, such as the number of columns present (suffix
truncation may have removed unneeded suffix columns). Truncated
columns are treated as having the value <quote>minus
infinity</quote>.
</para>
<para>
<structfield>htid</structfield> shows a heap TID for the tuple,
regardless of the underlying tuple representation. This value
may match <structfield>ctid</structfield>, or may be decoded
from the alternative representations used by posting list tuples
and tuples from internal pages. Tuples in internal pages
usually have the implementation level heap TID column truncated
away, which is represented as a NULL
<structfield>htid</structfield> value.
</para>
<para>
Note that the first item on any non-rightmost page (any page with
a non-zero value in the <structfield>btpo_next</structfield> field) is the
page's <quote>high key</quote>, meaning its <structfield>data</structfield>
serves as an upper bound on all items appearing on the page, while
its <structfield>ctid</structfield> field is meaningless. Also, on non-leaf
pages, the first real data item (the first item that is not a high
key) is a <quote>minus infinity</quote> item, with no actual value
in its <structfield>data</structfield> field. Such an item does have a valid
downlink in its <structfield>ctid</structfield> field, however.
its <structfield>ctid</structfield> field does not point to
another block. Also, on internal pages, the first real data
item (the first item that is not a high key) reliably has every
column truncated away, leaving no actual value in its
<structfield>data</structfield> field. Such an item does have a
valid downlink in its <structfield>ctid</structfield> field,
however.
</para>
<para>
For more details about the structure of B-tree indexes, see
<xref linkend="btree-structure"/>. For more details about
deduplication and posting lists, see <xref
linkend="btree-deduplication"/>.
</para>
</listitem>
</varlistentry>
@ -402,17 +445,24 @@ test=# SELECT * FROM bt_page_items('pg_cast_oid_index', 1);
with <function>get_raw_page</function> should be passed as argument. So
the last example could also be rewritten like this:
<screen>
test=# SELECT * FROM bt_page_items(get_raw_page('pg_cast_oid_index', 1));
itemoffset | ctid | itemlen | nulls | vars | data
------------+---------+---------+-------+------+-------------
1 | (0,1) | 12 | f | f | 23 27 00 00
2 | (0,2) | 12 | f | f | 24 27 00 00
3 | (0,3) | 12 | f | f | 25 27 00 00
4 | (0,4) | 12 | f | f | 26 27 00 00
5 | (0,5) | 12 | f | f | 27 27 00 00
6 | (0,6) | 12 | f | f | 28 27 00 00
7 | (0,7) | 12 | f | f | 29 27 00 00
8 | (0,8) | 12 | f | f | 2a 27 00 00
test=# SELECT itemoffset, ctid, itemlen, nulls, vars, data, dead, htid, tids[0:2] AS some_tids
FROM bt_page_items(get_raw_page('tenk2_hundred', 5));
itemoffset | ctid | itemlen | nulls | vars | data | dead | htid | some_tids
------------+-----------+---------+-------+------+-------------------------+------+--------+---------------------
1 | (16,1) | 16 | f | f | 30 00 00 00 00 00 00 00 | | |
2 | (16,8292) | 616 | f | f | 24 00 00 00 00 00 00 00 | f | (1,6) | {"(1,6)","(10,22)"}
3 | (16,8292) | 616 | f | f | 25 00 00 00 00 00 00 00 | f | (1,18) | {"(1,18)","(4,22)"}
4 | (16,8292) | 616 | f | f | 26 00 00 00 00 00 00 00 | f | (4,18) | {"(4,18)","(6,17)"}
5 | (16,8292) | 616 | f | f | 27 00 00 00 00 00 00 00 | f | (1,2) | {"(1,2)","(1,19)"}
6 | (16,8292) | 616 | f | f | 28 00 00 00 00 00 00 00 | f | (2,24) | {"(2,24)","(4,11)"}
7 | (16,8292) | 616 | f | f | 29 00 00 00 00 00 00 00 | f | (2,17) | {"(2,17)","(11,2)"}
8 | (16,8292) | 616 | f | f | 2a 00 00 00 00 00 00 00 | f | (0,25) | {"(0,25)","(3,20)"}
9 | (16,8292) | 616 | f | f | 2b 00 00 00 00 00 00 00 | f | (0,10) | {"(0,10)","(0,14)"}
10 | (16,8292) | 616 | f | f | 2c 00 00 00 00 00 00 00 | f | (1,3) | {"(1,3)","(3,9)"}
11 | (16,8292) | 616 | f | f | 2d 00 00 00 00 00 00 00 | f | (6,28) | {"(6,28)","(11,1)"}
12 | (16,8292) | 616 | f | f | 2e 00 00 00 00 00 00 00 | f | (0,27) | {"(0,27)","(1,13)"}
13 | (16,8292) | 616 | f | f | 2f 00 00 00 00 00 00 00 | f | (4,17) | {"(4,17)","(4,21)"}
(13 rows)
</screen>
All the other details are the same as explained in the previous item.
</para>