mirror of https://github.com/postgres/postgres.git (synced 2025-09-03 15:22:11 +03:00)
Doc: Describe CREATE INDEX deduplication strategy.
The B-Tree index deduplication strategy used during CREATE INDEX and REINDEX differs from the lazy strategy used by retail inserts. Make that clear by adding a new paragraph to the B-Tree implementation section of the documentation. In passing, do some copy-editing of nearby deduplication documentation.
@@ -622,12 +622,13 @@ equalimage(<replaceable>opcintype</replaceable> <type>oid</type>) returns bool
 </para>
 <note>
 <para>
-While NULL is generally not considered to be equal to any other
-value, including NULL, NULL is nevertheless treated as just
-another value from the domain of indexed values by the B-Tree
-implementation (except when enforcing uniqueness in a unique
-index). B-Tree deduplication is therefore just as effective with
-<quote>duplicates</quote> that contain a NULL value.
+B-Tree deduplication is just as effective with
+<quote>duplicates</quote> that contain a NULL value, even though
+NULL values are never equal to each other according to the
+<literal>=</literal> member of any B-Tree operator class. As far
+as any part of the implementation that understands the on-disk
+B-Tree structure is concerned, NULL is just another value from the
+domain of indexed values.
 </para>
 </note>
 <para>
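Reviewer's illustration (not part of the commit): the reworded note separates two notions of equality. The toy Python sketch below shows the distinction only; it is not PostgreSQL code. `sql_equals`, `storage_key`, and `group_duplicates` are hypothetical names, `None` stands in for NULL, and placing NULL last in the sort order is an assumption made for the example.

```python
from itertools import groupby

NULL = None  # stand-in for SQL NULL in this sketch


def sql_equals(a, b):
    """Three-valued '=': any comparison with NULL is unknown (None), never True."""
    if a is NULL or b is NULL:
        return None
    return a == b


def storage_key(value):
    """Hypothetical ordering key: NULL is just another value, sorted last."""
    return (value is NULL, 0 if value is NULL else value)


def group_duplicates(entries):
    """Group (value, tid) pairs that the storage layer regards as duplicates."""
    ordered = sorted(entries, key=lambda e: (storage_key(e[0]), e[1]))
    # groupby only looks at the stored value, so all NULL entries form one run.
    return [(value, [tid for _, tid in run])
            for value, run in groupby(ordered, key=lambda e: e[0])]
```

At the operator level `sql_equals(NULL, NULL)` is unknown, yet `group_duplicates` still places every NULL entry in a single group that could be merged like any other run of duplicates.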
@@ -642,6 +643,20 @@ equalimage(<replaceable>opcintype</replaceable> <type>oid</type>) returns bool
 see a moderate performance benefit from using deduplication.
 Deduplication is enabled by default.
 </para>
+<para>
+<command>CREATE INDEX</command> and <command>REINDEX</command>
+apply deduplication to create posting list tuples, though the
+strategy they use is slightly different. Each group of duplicate
+ordinary tuples encountered in the sorted input taken from the
+table is merged into a posting list tuple
+<emphasis>before</emphasis> being added to the current pending leaf
+page. Individual posting list tuples are packed with as many
+<acronym>TID</acronym>s as possible. Leaf pages are written out in
+the usual way, without any separate deduplication pass. This
+strategy is well-suited to <command>CREATE INDEX</command> and
+<command>REINDEX</command> because they are once-off batch
+operations.
+</para>
 <para>
 Write-heavy workloads that don't benefit from deduplication due to
 having few or no duplicate values in indexes will incur a small,
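Reviewer's illustration (not part of the commit): the batch strategy described in the added paragraph can be sketched in a few lines of Python. This is a simplified model, not the PostgreSQL implementation. `build_leaf_tuples` and `MAX_TIDS_PER_POSTING` are hypothetical names, and the fixed cap of three TIDs is an assumption standing in for the real size limit on a posting list tuple.

```python
MAX_TIDS_PER_POSTING = 3  # hypothetical cap; a real limit would depend on tuple size


def build_leaf_tuples(sorted_entries, max_tids=MAX_TIDS_PER_POSTING):
    """Merge runs of equal keys from sorted (key, tid) input into posting lists.

    Each output item is (key, [tid, ...]). A run longer than max_tids is
    split across several posting list tuples, each packed as full as possible.
    Merging happens before a tuple is emitted, so no later pass is needed.
    """
    leaf_tuples = []
    pending_key, pending_tids = None, []
    for key, tid in sorted_entries:
        same_run = pending_tids and key == pending_key
        if same_run and len(pending_tids) < max_tids:
            pending_tids.append(tid)  # keep packing the current posting list
            continue
        if pending_tids:
            leaf_tuples.append((pending_key, pending_tids))  # emit finished tuple
        pending_key, pending_tids = key, [tid]
    if pending_tids:
        leaf_tuples.append((pending_key, pending_tids))
    return leaf_tuples
```

Because the input arrives already sorted, a single forward scan is enough. That is the property that makes this approach fit a once-off batch build and not incremental inserts.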
@@ -657,17 +672,22 @@ equalimage(<replaceable>opcintype</replaceable> <type>oid</type>) returns bool
 B-Tree indexes are not directly aware that under MVCC, there might
 be multiple extant versions of the same logical table row; to an
 index, each tuple is an independent object that needs its own index
-entry. Thus, an update of a row always creates all-new index
-entries for the row, even if the key values did not change. Some
-workloads suffer from index bloat caused by these
-implementation-level version duplicates (this is typically a
-problem for <command>UPDATE</command>-heavy workloads that cannot
-apply the <acronym>HOT</acronym> optimization due to modifying at
-least one indexed column). B-Tree deduplication does not
-distinguish between these implementation-level version duplicates
-and conventional duplicates. Deduplication can nevertheless help
-with controlling index bloat caused by implementation-level version
-churn.
+entry. <quote>Version duplicates</quote> may sometimes accumulate
+and adversely affect query latency and throughput. This typically
+occurs with <command>UPDATE</command>-heavy workloads where most
+individual updates cannot apply the <acronym>HOT</acronym>
+optimization (often because at least one indexed column gets
+modified, necessitating a new set of index tuple versions —
+one new tuple for <emphasis>each and every</emphasis> index). In
+effect, B-Tree deduplication ameliorates index bloat caused by
+version churn. Note that even the tuples from a unique index are
+not necessarily <emphasis>physically</emphasis> unique when stored
+on disk due to version churn. The deduplication optimization is
+selectively applied within unique indexes. It targets those pages
+that appear to have version duplicates. The high level goal is to
+give <command>VACUUM</command> more time to run before an
+<quote>unnecessary</quote> page split caused by version churn can
+take place.
 </para>
 <tip>
 <para>
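Reviewer's illustration (not part of the commit): the lazy strategy that the commit message contrasts with CREATE INDEX can be modelled as a deduplication pass that runs only when a leaf page has no room left, so that a page split is postponed where possible. The Python sketch below is a toy model under stated assumptions. Page capacity is counted in tuples, not bytes; `PAGE_CAPACITY`, `dedup_pass`, and `insert` are hypothetical names; and the selective targeting of pages in unique indexes is not modelled.

```python
PAGE_CAPACITY = 4  # hypothetical: maximum number of tuples on a leaf page


def dedup_pass(page):
    """Merge entries with equal keys into one (key, [tids]) tuple per key."""
    merged = {}
    for key, tids in page:
        merged.setdefault(key, []).extend(tids)
    return [(key, sorted(tids)) for key, tids in sorted(merged.items())]


def insert(page, key, tid):
    """Insert one entry and return (page, needs_split).

    Deduplication runs lazily, only when the page is already full.
    """
    if len(page) >= PAGE_CAPACITY:
        page = dedup_pass(page)
    if len(page) >= PAGE_CAPACITY:
        return page, True  # still full after merging: a split is unavoidable
    page = sorted(page + [(key, [tid])], key=lambda t: (t[0], t[1]))
    return page, False
```

In this model a page filled with versions of the same key absorbs further inserts without splitting, while a page of distinct keys gains nothing from the pass and must split.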