mirror of
https://github.com/postgres/postgres.git
synced 2025-05-28 05:21:27 +03:00
- Misc grammar and punctuation fixes. - Stylistic cleanup: use spaces between function arguments and JSON fields in examples. For example "foo(a,b)" -> "foo(a, b)". Add semicolon after last END in a few PL/pgSQL examples that were missing them. - Make sentence that talked about "..." and ".." operators more clear, by avoiding to end the sentence with "..". That makes it look the same as "..." - Fix syntax description for HAVING: HAVING conditions cannot be repeated Patch by Justin Pryzby, per Yaroslav Schekin's report. Backpatch to all supported versions, to the extent that the patch applies easily. Discussion: https://www.postgresql.org/message-id/20201005191922.GE17626%40telsasoft.com
424 lines
14 KiB
Plaintext
424 lines
14 KiB
Plaintext
<!-- doc/src/sgml/seg.sgml -->
|
|
|
|
<sect1 id="seg" xreflabel="seg">
|
|
<title>seg</title>
|
|
|
|
<indexterm zone="seg">
|
|
<primary>seg</primary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
This module implements a data type <type>seg</type> for
|
|
representing line segments, or floating point intervals.
|
|
<type>seg</type> can represent uncertainty in the interval endpoints,
|
|
making it especially useful for representing laboratory measurements.
|
|
</para>
|
|
|
|
<para>
|
|
This module is considered <quote>trusted</quote>, that is, it can be
|
|
installed by non-superusers who have <literal>CREATE</literal> privilege
|
|
on the current database.
|
|
</para>
|
|
|
|
<sect2>
|
|
<title>Rationale</title>
|
|
|
|
<para>
|
|
The geometry of measurements is usually more complex than that of a
|
|
point in a numeric continuum. A measurement is usually a segment of
|
|
that continuum with somewhat fuzzy limits. The measurements come out
|
|
as intervals because of uncertainty and randomness, as well as because
|
|
the value being measured may naturally be an interval indicating some
|
|
condition, such as the temperature range of stability of a protein.
|
|
</para>
|
|
|
|
<para>
|
|
Using just common sense, it appears more convenient to store such data
|
|
as intervals, rather than pairs of numbers. In practice, it even turns
|
|
out more efficient in most applications.
|
|
</para>
|
|
|
|
<para>
|
|
Further along the line of common sense, the fuzziness of the limits
|
|
suggests that the use of traditional numeric data types leads to a
|
|
certain loss of information. Consider this: your instrument reads
|
|
6.50, and you input this reading into the database. What do you get
|
|
when you fetch it? Watch:
|
|
|
|
<screen>
|
|
test=> select 6.50 :: float8 as "pH";
|
|
pH
|
|
---
|
|
6.5
|
|
(1 row)
|
|
</screen>
|
|
|
|
In the world of measurements, 6.50 is not the same as 6.5. It may
|
|
sometimes be critically different. The experimenters usually write
|
|
down (and publish) the digits they trust. 6.50 is actually a fuzzy
|
|
interval contained within a bigger and even fuzzier interval, 6.5,
|
|
with their center points being (probably) the only common feature they
|
|
share. We definitely do not want such different data items to appear the
|
|
same.
|
|
</para>
|
|
|
|
<para>
|
|
Conclusion? It is nice to have a special data type that can record the
|
|
limits of an interval with arbitrarily variable precision. Variable in
|
|
the sense that each data element records its own precision.
|
|
</para>
|
|
|
|
<para>
|
|
Check this out:
|
|
|
|
<screen>
|
|
test=> select '6.25 .. 6.50'::seg as "pH";
|
|
pH
|
|
------------
|
|
6.25 .. 6.50
|
|
(1 row)
|
|
</screen>
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Syntax</title>
|
|
|
|
<para>
|
|
The external representation of an interval is formed using one or two
|
|
floating-point numbers joined by the range operator (<literal>..</literal>
|
|
or <literal>...</literal>). Alternatively, it can be specified as a
|
|
center point plus or minus a deviation.
|
|
Optional certainty indicators (<literal><</literal>,
|
|
<literal>></literal> or <literal>~</literal>) can be stored as well.
|
|
(Certainty indicators are ignored by all the built-in operators, however.)
|
|
<xref linkend="seg-repr-table"/> gives an overview of allowed
|
|
representations; <xref linkend="seg-input-examples"/> shows some
|
|
examples.
|
|
</para>
|
|
|
|
<para>
|
|
In <xref linkend="seg-repr-table"/>, <replaceable>x</replaceable>, <replaceable>y</replaceable>, and
|
|
<replaceable>delta</replaceable> denote
|
|
floating-point numbers. <replaceable>x</replaceable> and <replaceable>y</replaceable>, but
|
|
not <replaceable>delta</replaceable>, can be preceded by a certainty indicator.
|
|
</para>
|
|
|
|
<table id="seg-repr-table">
|
|
<title><type>seg</type> External Representations</title>
|
|
<tgroup cols="2">
|
|
<tbody>
|
|
<row>
|
|
<entry><literal><replaceable>x</replaceable></literal></entry>
|
|
<entry>Single value (zero-length interval)
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry><literal><replaceable>x</replaceable> .. <replaceable>y</replaceable></literal></entry>
|
|
<entry>Interval from <replaceable>x</replaceable> to <replaceable>y</replaceable>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry><literal><replaceable>x</replaceable> (+-) <replaceable>delta</replaceable></literal></entry>
|
|
<entry>Interval from <replaceable>x</replaceable> - <replaceable>delta</replaceable> to
|
|
<replaceable>x</replaceable> + <replaceable>delta</replaceable>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry><literal><replaceable>x</replaceable> ..</literal></entry>
|
|
<entry>Open interval with lower bound <replaceable>x</replaceable>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry><literal>.. <replaceable>x</replaceable></literal></entry>
|
|
<entry>Open interval with upper bound <replaceable>x</replaceable>
|
|
</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</table>
|
|
|
|
<table id="seg-input-examples">
|
|
<title>Examples of Valid <type>seg</type> Input</title>
|
|
<tgroup cols="2">
|
|
<colspec colname="col1" colwidth="1*"/>
|
|
<colspec colname="col2" colwidth="2*"/>
|
|
<tbody>
|
|
<row>
|
|
<entry><literal>5.0</literal></entry>
|
|
<entry>
|
|
Creates a zero-length segment (a point, if you will)
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry><literal>~5.0</literal></entry>
|
|
<entry>
|
|
Creates a zero-length segment and records
|
|
<literal>~</literal> in the data. <literal>~</literal> is ignored
|
|
by <type>seg</type> operations, but
|
|
is preserved as a comment.
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry><literal><5.0</literal></entry>
|
|
<entry>
|
|
Creates a point at 5.0. <literal><</literal> is ignored but
|
|
is preserved as a comment.
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry><literal>>5.0</literal></entry>
|
|
<entry>
|
|
Creates a point at 5.0. <literal>></literal> is ignored but
|
|
is preserved as a comment.
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry><literal>5(+-)0.3</literal></entry>
|
|
<entry>
|
|
Creates an interval <literal>4.7 .. 5.3</literal>.
|
|
Note that the <literal>(+-)</literal> notation isn't preserved.
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry><literal>50 .. </literal></entry>
|
|
<entry>Everything that is greater than or equal to 50</entry>
|
|
</row>
|
|
<row>
|
|
<entry><literal>.. 0</literal></entry>
|
|
<entry>Everything that is less than or equal to 0</entry>
|
|
</row>
|
|
<row>
|
|
<entry><literal>1.5e-2 .. 2E-2 </literal></entry>
|
|
<entry>Creates an interval <literal>0.015 .. 0.02</literal></entry>
|
|
</row>
|
|
<row>
|
|
<entry><literal>1 ... 2</literal></entry>
|
|
<entry>
|
|
The same as <literal>1...2</literal>, or <literal>1 .. 2</literal>,
|
|
or <literal>1..2</literal>
|
|
(spaces around the range operator are ignored)
|
|
</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</table>
|
|
|
|
<para>
|
|
Because the <literal>...</literal> operator is widely used in data sources, it is allowed
|
|
as an alternative spelling of the <literal>..</literal> operator. Unfortunately, this
|
|
creates a parsing ambiguity: it is not clear whether the upper bound
|
|
in <literal>0...23</literal> is meant to be <literal>23</literal> or <literal>0.23</literal>.
|
|
This is resolved by requiring at least one digit before the decimal
|
|
point in all numbers in <type>seg</type> input.
|
|
</para>
|
|
|
|
<para>
|
|
As a sanity check, <type>seg</type> rejects intervals with the lower bound
|
|
greater than the upper, for example <literal>5 .. 2</literal>.
|
|
</para>
|
|
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Precision</title>
|
|
|
|
<para>
|
|
<type>seg</type> values are stored internally as pairs of 32-bit floating point
|
|
numbers. This means that numbers with more than 7 significant digits
|
|
will be truncated.
|
|
</para>
|
|
|
|
<para>
|
|
Numbers with 7 or fewer significant digits retain their
|
|
original precision. That is, if your query returns 0.00, you will be
|
|
sure that the trailing zeroes are not the artifacts of formatting: they
|
|
reflect the precision of the original data. The number of leading
|
|
zeroes does not affect precision: the value 0.0067 is considered to
|
|
have just 2 significant digits.
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Usage</title>
|
|
|
|
<para>
|
|
The <filename>seg</filename> module includes a GiST index operator class for
|
|
<type>seg</type> values.
|
|
The operators supported by the GiST operator class are shown in <xref linkend="seg-gist-operators"/>.
|
|
</para>
|
|
|
|
<table id="seg-gist-operators">
|
|
<title>Seg GiST Operators</title>
|
|
<tgroup cols="1">
|
|
<thead>
|
|
<row>
|
|
<entry role="func_table_entry"><para role="func_signature">
|
|
Operator
|
|
</para>
|
|
<para>
|
|
Description
|
|
</para></entry>
|
|
</row>
|
|
</thead>
|
|
|
|
<tbody>
|
|
<row>
|
|
<entry role="func_table_entry"><para role="func_signature">
|
|
<type>seg</type> <literal><<</literal> <type>seg</type>
|
|
<returnvalue>boolean</returnvalue>
|
|
</para>
|
|
<para>
|
|
Is the first <type>seg</type> entirely to the left of the second?
|
|
[a, b] << [c, d] is true if b < c.
|
|
</para></entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry role="func_table_entry"><para role="func_signature">
|
|
<type>seg</type> <literal>>></literal> <type>seg</type>
|
|
<returnvalue>boolean</returnvalue>
|
|
</para>
|
|
<para>
|
|
Is the first <type>seg</type> entirely to the right of the second?
|
|
[a, b] >> [c, d] is true if a > d.
|
|
</para></entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry role="func_table_entry"><para role="func_signature">
|
|
<type>seg</type> <literal>&<</literal> <type>seg</type>
|
|
<returnvalue>boolean</returnvalue>
|
|
</para>
|
|
<para>
|
|
Does the first <type>seg</type> not extend to the right of the
|
|
second?
|
|
[a, b] &< [c, d] is true if b <= d.
|
|
</para></entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry role="func_table_entry"><para role="func_signature">
|
|
<type>seg</type> <literal>&></literal> <type>seg</type>
|
|
<returnvalue>boolean</returnvalue>
|
|
</para>
|
|
<para>
|
|
Does the first <type>seg</type> not extend to the left of the
|
|
second?
|
|
[a, b] &> [c, d] is true if a >= c.
|
|
</para></entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry role="func_table_entry"><para role="func_signature">
|
|
<type>seg</type> <literal>=</literal> <type>seg</type>
|
|
<returnvalue>boolean</returnvalue>
|
|
</para>
|
|
<para>
|
|
Are the two <type>seg</type>s equal?
|
|
</para></entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry role="func_table_entry"><para role="func_signature">
|
|
<type>seg</type> <literal>&&</literal> <type>seg</type>
|
|
<returnvalue>boolean</returnvalue>
|
|
</para>
|
|
<para>
|
|
Do the two <type>seg</type>s overlap?
|
|
</para></entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry role="func_table_entry"><para role="func_signature">
|
|
<type>seg</type> <literal>@></literal> <type>seg</type>
|
|
<returnvalue>boolean</returnvalue>
|
|
</para>
|
|
<para>
|
|
Does the first <type>seg</type> contain the second?
|
|
</para></entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry role="func_table_entry"><para role="func_signature">
|
|
<type>seg</type> <literal><@</literal> <type>seg</type>
|
|
<returnvalue>boolean</returnvalue>
|
|
</para>
|
|
<para>
|
|
Is the first <type>seg</type> contained in the second?
|
|
</para></entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</table>
|
|
|
|
<para>
|
|
(Before PostgreSQL 8.2, the containment operators <literal>@></literal> and <literal><@</literal> were
|
|
respectively called <literal>@</literal> and <literal>~</literal>. These names are still available, but are
|
|
deprecated and will eventually be retired. Notice that the old names
|
|
are reversed from the convention formerly followed by the core geometric
|
|
data types!)
|
|
</para>
|
|
|
|
<para>
|
|
In addition to the above operators, the usual comparison
|
|
operators shown in <xref linkend="functions-comparison-op-table"/> are
|
|
available for type <type>seg</type>. These operators
|
|
first compare (a) to (c),
|
|
and if these are equal, compare (b) to (d). That results in
|
|
reasonably good sorting in most cases, which is useful if
|
|
you want to use ORDER BY with this type.
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Notes</title>
|
|
|
|
<para>
|
|
For examples of usage, see the regression test <filename>sql/seg.sql</filename>.
|
|
</para>
|
|
|
|
<para>
|
|
The mechanism that converts <literal>(+-)</literal> to regular ranges
|
|
isn't completely accurate in determining the number of significant digits
|
|
for the boundaries. For example, it adds an extra digit to the lower
|
|
boundary if the resulting interval includes a power of ten:
|
|
|
|
<screen>
|
|
postgres=> select '10(+-)1'::seg as seg;
|
|
seg
|
|
---------
|
|
9.0 .. 11 -- should be: 9 .. 11
|
|
</screen>
|
|
</para>
|
|
|
|
<para>
|
|
The performance of an R-tree index can largely depend on the initial
|
|
order of input values. It may be very helpful to sort the input table
|
|
on the <type>seg</type> column; see the script <filename>sort-segments.pl</filename>
|
|
for an example.
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Credits</title>
|
|
|
|
<para>
|
|
Original author: Gene Selkov, Jr. <email>selkovjr@mcs.anl.gov</email>,
|
|
Mathematics and Computer Science Division, Argonne National Laboratory.
|
|
</para>
|
|
|
|
<para>
|
|
My thanks are primarily to Prof. Joe Hellerstein
|
|
(<ulink url="https://dsf.berkeley.edu/jmh/"></ulink>) for elucidating the
|
|
gist of the GiST (<ulink url="http://gist.cs.berkeley.edu/"></ulink>). I am
|
|
also grateful to all Postgres developers, present and past, for enabling
|
|
myself to create my own world and live undisturbed in it. And I would like
|
|
to acknowledge my gratitude to Argonne Lab and to the U.S. Department of
|
|
Energy for the years of faithful support of my database research.
|
|
</para>
|
|
|
|
</sect2>
|
|
|
|
</sect1>
|