mirror of https://github.com/postgres/postgres.git synced 2025-06-14 18:42:34 +03:00

Support varlena fields with single-byte headers and unaligned storage.

This commit breaks any code that assumes that the mere act of forming a tuple
(without writing it to disk) does not "toast" any fields.  While all available
regression tests pass, I'm not totally sure that we've fixed every nook and
cranny, especially in contrib.

Greg Stark with some help from Tom Lane
Tom Lane
2007-04-06 04:21:44 +00:00
parent d44163953c
commit 3e23b68dac
38 changed files with 1802 additions and 805 deletions


@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/storage.sgml,v 1.16 2007/04/03 04:14:26 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/storage.sgml,v 1.17 2007/04/06 04:21:41 tgl Exp $ -->
<chapter id="storage">
@@ -210,18 +210,27 @@ value, but in some cases more efficient approaches are possible.)
</para>
<para>
-<acronym>TOAST</> usurps the high-order two bits of the varlena length word,
+<acronym>TOAST</> usurps two bits of the varlena length word (the high-order
+bits on big-endian machines, the low-order bits on little-endian machines),
thereby limiting the logical size of any value of a <acronym>TOAST</>-able
data type to 1 GB (2<superscript>30</> - 1 bytes). When both bits are zero,
-the value is an ordinary un-<acronym>TOAST</>ed value of the data type. One
-of these bits, if set, indicates that the value has been compressed and must
-be decompressed before use. The other bit, if set, indicates that the value
-has been stored out-of-line. In this case the remainder of the value is
-actually just a pointer, and the correct data has to be found elsewhere. When
-both bits are set, the out-of-line data has been compressed too. In each case
-the length in the low-order bits of the varlena word indicates the actual size
-of the datum, not the size of the logical value that would be extracted by
-decompression or fetching of the out-of-line data.
+the value is an ordinary un-<acronym>TOAST</>ed value of the data type, and
+the remaining bits of the length word give the total datum size (including
+length word) in bytes. When the highest-order or lowest-order bit is set,
+the value has only a single-byte header instead of the normal four-byte
+header, and the remaining bits give the total datum size (including length
+byte) in bytes. As a special case, if the remaining bits are all zero
+(which would be impossible for a self-inclusive length), the value is a
+pointer to out-of-line data stored in a separate TOAST table. (The size of
+a TOAST pointer is known a priori, so it doesn't need to be represented in
+the header.) Values with single-byte headers aren't aligned on any particular
+boundary, either. Lastly, when the highest-order or lowest-order bit is
+clear but the adjacent bit is set, the content of the datum has been
+compressed and must be decompressed before use. In this case the remaining
+bits of the length word give the total size of the compressed datum, not the
+original data. Note that compression is also possible for out-of-line data
+but the varlena header does not tell whether it has occurred &mdash;
+the content of the TOAST pointer tells that, instead.
</para>
<para>
@@ -254,8 +263,8 @@ retrieval of the values. A pointer datum representing an out-of-line
<acronym>TOAST</> table in which to look and the OID of the specific value
(its <structfield>chunk_id</>). For convenience, pointer datums also store the
logical datum size (original uncompressed data length) and actual stored size
-(different if compression was applied). Allowing for the varlena header word,
-the total size of a <acronym>TOAST</> pointer datum is therefore 20 bytes
+(different if compression was applied). Allowing for the varlena header byte,
+the total size of a <acronym>TOAST</> pointer datum is therefore 17 bytes
regardless of the actual size of the represented value.
</para>
@@ -280,7 +289,9 @@ The <acronym>TOAST</> code recognizes four different strategies for storing
<listitem>
<para>
<literal>PLAIN</literal> prevents either compression or
-out-of-line storage. This is the only possible strategy for
+out-of-line storage; furthermore it disables use of single-byte headers
+for varlena types.
+This is the only possible strategy for
columns of non-<acronym>TOAST</>-able data types.
</para>
</listitem>
@@ -562,7 +573,7 @@ data. Empty in ordinary tables.</entry>
<para>
All table rows are structured in the same way. There is a fixed-size
-header (occupying 27 bytes on most machines), followed by an optional null
+header (occupying 23 bytes on most machines), followed by an optional null
bitmap, an optional object ID field, and the user data. The header is
detailed
in <xref linkend="heaptupleheaderdata-table">. The actual user data
@@ -604,12 +615,6 @@ data. Empty in ordinary tables.</entry>
<entry>4 bytes</entry>
<entry>insert XID stamp</entry>
</row>
-<row>
-<entry>t_cmin</entry>
-<entry>CommandId</entry>
-<entry>4 bytes</entry>
-<entry>insert CID stamp</entry>
-</row>
<row>
<entry>t_xmax</entry>
<entry>TransactionId</entry>
@@ -617,10 +622,10 @@ data. Empty in ordinary tables.</entry>
<entry>delete XID stamp</entry>
</row>
<row>
-<entry>t_cmax</entry>
+<entry>t_cid</entry>
<entry>CommandId</entry>
<entry>4 bytes</entry>
-<entry>delete CID stamp (overlays with t_xvac)</entry>
+<entry>insert and/or delete CID stamp (overlays with t_xvac)</entry>
</row>
<row>
<entry>t_xvac</entry>
@@ -635,10 +640,10 @@ data. Empty in ordinary tables.</entry>
<entry>current TID of this or newer row version</entry>
</row>
<row>
-<entry>t_natts</entry>
+<entry>t_infomask2</entry>
<entry>int16</entry>
<entry>2 bytes</entry>
-<entry>number of attributes</entry>
+<entry>number of attributes, plus various flag bits</entry>
</row>
<row>
<entry>t_infomask</entry>
@@ -682,7 +687,7 @@ data. Empty in ordinary tables.</entry>
fixed width field, then all the bytes are simply placed. If it's a
variable length field (attlen = -1) then it's a bit more complicated.
All variable-length datatypes share the common header structure
-<type>varattrib</type>, which includes the total length of the stored
+<type>struct varlena</type>, which includes the total length of the stored
value and some flag bits. Depending on the flags, the data can be either
inline or in a <acronym>TOAST</> table;
it might be compressed, too (see <xref linkend="storage-toast">).