Mirror of https://github.com/postgres/postgres.git (synced 2025-09-03 15:22:11 +03:00)
Support "expanded" objects, particularly arrays, for better performance.
This patch introduces the ability for complex datatypes to have an in-memory representation that is different from their on-disk format. On-disk formats are typically optimized for minimal size, and in any case they can't contain pointers, so they are often not well-suited for computation. Now a datatype can invent an "expanded" in-memory format that is better suited for its operations, and then pass that around among the C functions that operate on the datatype. There are also provisions (rudimentary as yet) to allow an expanded object to be modified in-place under suitable conditions, so that operations like assignment to an element of an array need not involve copying the entire array.

The initial application for this feature is arrays, but it is not hard to foresee using it for other container types like JSON, XML and hstore. I have hopes that it will be useful to PostGIS as well.

In this initial implementation, a few heuristics have been hard-wired into plpgsql to improve performance for arrays that are stored in plpgsql variables. We would like to generalize those hacks so that other datatypes can obtain similar improvements, but figuring out some appropriate APIs is left as a task for future work. (The heuristics themselves are probably not optimal yet, either, as they sometimes force expansion of arrays that would be better left alone.)

Preliminary performance testing shows impressive speed gains for plpgsql functions that do element-by-element access or update of large arrays. There are other cases that get a little slower, as a result of added array format conversions; but we can hope to improve anything that's annoyingly bad. In any case most applications should see a net win.

Tom Lane, reviewed by Andres Freund
@@ -503,8 +503,9 @@ comparison table, in which all the HTML pages were cut down to 7 kB to fit.
 <acronym>TOAST</> pointers can point to data that is not on disk, but is
 elsewhere in the memory of the current server process. Such pointers
 obviously cannot be long-lived, but they are nonetheless useful. There
-is currently just one sub-case:
-pointers to <firstterm>indirect</> data.
+are currently two sub-cases:
+pointers to <firstterm>indirect</> data and
+pointers to <firstterm>expanded</> data.
 </para>
 
 <para>
@@ -518,6 +519,43 @@ that the referenced data survives for as long as the pointer could exist,
 and there is no infrastructure to help with this.
 </para>
 
+<para>
+Expanded <acronym>TOAST</> pointers are useful for complex data types
+whose on-disk representation is not especially suited for computational
+purposes. As an example, the standard varlena representation of a
+<productname>PostgreSQL</> array includes dimensionality information, a
+nulls bitmap if there are any null elements, then the values of all the
+elements in order. When the element type itself is variable-length, the
+only way to find the <replaceable>N</>'th element is to scan through all the
+preceding elements. This representation is appropriate for on-disk storage
+because of its compactness, but for computations with the array it's much
+nicer to have an <quote>expanded</> or <quote>deconstructed</>
+representation in which all the element starting locations have been
+identified. The <acronym>TOAST</> pointer mechanism supports this need by
+allowing a pass-by-reference Datum to point to either a standard varlena
+value (the on-disk representation) or a <acronym>TOAST</> pointer that
+points to an expanded representation somewhere in memory. The details of
+this expanded representation are up to the data type, though it must have
+a standard header and meet the other API requirements given
+in <filename>src/include/utils/expandeddatum.h</>. C-level functions
+working with the data type can choose to handle either representation.
+Functions that do not know about the expanded representation, but simply
+apply <function>PG_DETOAST_DATUM</> to their inputs, will automatically
+receive the traditional varlena representation; so support for an expanded
+representation can be introduced incrementally, one function at a time.
+</para>
+
+<para>
+<acronym>TOAST</> pointers to expanded values are further broken down
+into <firstterm>read-write</> and <firstterm>read-only</> pointers.
+The pointed-to representation is the same either way, but a function that
+receives a read-write pointer is allowed to modify the referenced value
+in-place, whereas one that receives a read-only pointer must not; it must
+first create a copy if it wants to make a modified version of the value.
+This distinction and some associated conventions make it possible to avoid
+unnecessary copying of expanded values during query execution.
+</para>
+
 <para>
 For all types of in-memory <acronym>TOAST</> pointer, the <acronym>TOAST</>
 management code ensures that no such pointer datum can accidentally get
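The array paragraph in the hunk above contrasts scanning a packed value to find the N'th element with using a representation in which every element's starting location is already known. The following is a minimal, standalone sketch of that contrast. It is a toy model, not PostgreSQL code: the one-byte length prefix layout and the names `flat_nth`, `ToyExpandedArray`, `toy_expand`, `toy_expanded_nth` and `toy_free` are all hypothetical and do not follow the rules in `expandeddatum.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Toy "flat" format (an assumption for illustration only):
 *   byte 0            number of elements
 *   then per element  one length byte followed by that many data bytes
 * It is a pointer-free blob, so it can be copied anywhere.
 */

/* Find element n in the flat blob by stepping over every earlier element. */
const unsigned char *flat_nth(const unsigned char *flat, size_t n, size_t *len)
{
    const unsigned char *p = flat + 1;

    assert(n < flat[0]);
    for (size_t i = 0; i < n; i++)
        p += 1 + p[0];          /* skip length byte plus payload */
    *len = p[0];
    return p + 1;
}

/* Toy "expanded" form: element starts and lengths identified up front. */
typedef struct ToyExpandedArray
{
    size_t nelems;
    const unsigned char **elem_start;   /* borrowed pointers into the flat blob */
    size_t *elem_len;
} ToyExpandedArray;

void toy_free(ToyExpandedArray *ea)
{
    if (ea == NULL)
        return;
    free(ea->elem_start);
    free(ea->elem_len);
    free(ea);
}

/* One linear pass over the flat blob records where each element begins. */
ToyExpandedArray *toy_expand(const unsigned char *flat)
{
    size_t nelems = flat[0];
    size_t slots = nelems > 0 ? nelems : 1;     /* avoid zero-size allocations */
    ToyExpandedArray *ea = malloc(sizeof *ea);

    if (ea == NULL)
        return NULL;
    ea->nelems = nelems;
    ea->elem_start = malloc(slots * sizeof *ea->elem_start);
    ea->elem_len = malloc(slots * sizeof *ea->elem_len);
    if (ea->elem_start == NULL || ea->elem_len == NULL)
    {
        toy_free(ea);
        return NULL;
    }

    const unsigned char *p = flat + 1;
    for (size_t i = 0; i < nelems; i++)
    {
        ea->elem_len[i] = p[0];
        ea->elem_start[i] = p + 1;
        p += 1 + p[0];
    }
    return ea;
}

/* Constant-time lookup once the value has been expanded. */
const unsigned char *toy_expanded_nth(const ToyExpandedArray *ea, size_t n, size_t *len)
{
    assert(n < ea->nelems);
    *len = ea->elem_len[n];
    return ea->elem_start[n];
}
```

In this sketch the expanded form borrows the flat blob's bytes, so the blob must outlive it. The cost of `toy_expand` is one full scan, which pays off only when several lookups follow; that is the same trade-off the documentation text describes for converting a flat input on behalf of a single function.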
@@ -300,6 +300,77 @@ CREATE TYPE complex (
 </para>
 </note>
 
+<para>
+Another feature that's enabled by <acronym>TOAST</> support is the
+possibility of having an <firstterm>expanded</> in-memory data
+representation that is more convenient to work with than the format that
+is stored on disk. The regular or <quote>flat</> varlena storage format
+is ultimately just a blob of bytes; it cannot for example contain
+pointers, since it may get copied to other locations in memory.
+For complex data types, the flat format may be quite expensive to work
+with, so <productname>PostgreSQL</> provides a way to <quote>expand</>
+the flat format into a representation that is more suited to computation,
+and then pass that format in-memory between functions of the data type.
+</para>
+
+<para>
+To use expanded storage, a data type must define an expanded format that
+follows the rules given in <filename>src/include/utils/expandeddatum.h</>,
+and provide functions to <quote>expand</> a flat varlena value into
+expanded format and <quote>flatten</> the expanded format back to the
+regular varlena representation. Then ensure that all C functions for
+the data type can accept either representation, possibly by converting
+one into the other immediately upon receipt. This does not require fixing
+all existing functions for the data type at once, because the standard
+<function>PG_DETOAST_DATUM</> macro is defined to convert expanded inputs
+into regular flat format. Therefore, existing functions that work with
+the flat varlena format will continue to work, though slightly
+inefficiently, with expanded inputs; they need not be converted until and
+unless better performance is important.
+</para>
+
+<para>
+C functions that know how to work with an expanded representation
+typically fall into two categories: those that can only handle expanded
+format, and those that can handle either expanded or flat varlena inputs.
+The former are easier to write but may be less efficient overall, because
+converting a flat input to expanded form for use by a single function may
+cost more than is saved by operating on the expanded format.
+When only expanded format need be handled, conversion of flat inputs to
+expanded form can be hidden inside an argument-fetching macro, so that
+the function appears no more complex than one working with traditional
+varlena input.
+To handle both types of input, write an argument-fetching function that
+will detoast external, short-header, and compressed varlena inputs, but
+not expanded inputs. Such a function can be defined as returning a
+pointer to a union of the flat varlena format and the expanded format.
+Callers can use the <function>VARATT_IS_EXPANDED_HEADER()</> macro to
+determine which format they received.
+</para>
+
+<para>
+The <acronym>TOAST</> infrastructure not only allows regular varlena
+values to be distinguished from expanded values, but also
+distinguishes <quote>read-write</> and <quote>read-only</> pointers to
+expanded values. C functions that only need to examine an expanded
+value, or will only change it in safe and non-semantically-visible ways,
+need not care which type of pointer they receive. C functions that
+produce a modified version of an input value are allowed to modify an
+expanded input value in-place if they receive a read-write pointer, but
+must not modify the input if they receive a read-only pointer; in that
+case they have to copy the value first, producing a new value to modify.
+A C function that has constructed a new expanded value should always
+return a read-write pointer to it. Also, a C function that is modifying
+a read-write expanded value in-place should take care to leave the value
+in a sane state if it fails partway through.
+</para>
+
+<para>
+For examples of working with expanded values, see the standard array
+infrastructure, particularly
+<filename>src/backend/utils/adt/array_expanded.c</>.
+</para>
+
 </sect2>
 
 </sect1>
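The read-write versus read-only convention described in the hunk above can be sketched in a few lines of standalone C. This is only an illustration under stated assumptions: real expanded values are reached through TOAST pointers and the API in `expandeddatum.h`, whereas the `ToyExpanded` struct, the `ToyExpandedRef` handle with its `read_write` flag, and the functions `toy_make`, `toy_copy`, `toy_free` and `toy_set_element` are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Toy expanded value: a heap-allocated int array (hypothetical layout). */
typedef struct ToyExpanded
{
    size_t nelems;
    int *elems;
} ToyExpanded;

/* Toy stand-in for a pointer to an expanded value plus its access mode. */
typedef struct ToyExpandedRef
{
    ToyExpanded *obj;
    bool read_write;    /* true: holder may modify obj in place */
} ToyExpandedRef;

void toy_free(ToyExpanded *v)
{
    if (v == NULL)
        return;
    free(v->elems);
    free(v);
}

ToyExpanded *toy_make(const int *vals, size_t n)
{
    ToyExpanded *v = malloc(sizeof *v);

    if (v == NULL)
        return NULL;
    v->nelems = n;
    v->elems = malloc((n > 0 ? n : 1) * sizeof *v->elems);
    if (v->elems == NULL)
    {
        free(v);
        return NULL;
    }
    if (n > 0)
        memcpy(v->elems, vals, n * sizeof *v->elems);
    return v;
}

ToyExpanded *toy_copy(const ToyExpanded *src)
{
    return toy_make(src->elems, src->nelems);
}

/*
 * Produce a value equal to the input with elems[idx] set to newval.
 * Read-write input: modify in place and hand the same object back.
 * Read-only input: copy first, leaving the caller's value untouched.
 * Either way the result is returned as a read-write reference.
 * The only step that can fail (the copy) happens before anything is
 * modified, so a failure leaves the input in its original state.
 */
ToyExpandedRef toy_set_element(ToyExpandedRef in, size_t idx, int newval)
{
    ToyExpanded *target = in.obj;

    assert(in.obj != NULL && idx < in.obj->nelems);
    if (!in.read_write)
    {
        target = toy_copy(in.obj);
        if (target == NULL)
            return (ToyExpandedRef){NULL, false};
    }
    target->elems[idx] = newval;
    return (ToyExpandedRef){target, true};
}
```

With a read-write reference the update costs one store regardless of array size, while a read-only reference pays for a full copy. That difference is what makes element-by-element updates of large arrays cheaper when the caller can hand over a read-write pointer.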