mirror of https://github.com/postgres/postgres.git synced 2025-04-22 23:02:54 +03:00

Make an editorial pass over the newly SGML-ified contrib documentation.

Fix lots of bad markup, bad English, bad explanations.

This commit covers only about half the contrib modules, but I grow weary...
This commit is contained in:
Tom Lane 2007-12-06 04:12:10 +00:00
parent a37a0a4180
commit 53e99f57fc
21 changed files with 3713 additions and 3093 deletions

View File

@ -1,3 +1,5 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/adminpack.sgml,v 1.3 2007/12/06 04:12:09 tgl Exp $ -->
<sect1 id="adminpack">
<title>adminpack</title>
@ -6,31 +8,33 @@
</indexterm>
<para>
adminpack is a PostgreSQL standard module that implements a number of
support functions which pgAdmin and other administration and management tools
can use to provide additional functionality if installed on a server.
<filename>adminpack</> provides a number of support functions which
<application>pgAdmin</> and other administration and management tools can
use to provide additional functionality, such as remote management
of server log files.
</para>
<sect2>
<title>Functions implemented</title>
<para>
Functions implemented by adminpack can only be run by a superuser. Here's a
list of these functions:
</para>
<para>
<programlisting>
int8 pg_catalog.pg_file_write(fname text, data text, append bool)
bool pg_catalog.pg_file_rename(oldname text, newname text, archivname text)
bool pg_catalog.pg_file_rename(oldname text, newname text)
bool pg_catalog.pg_file_unlink(fname text)
setof record pg_catalog.pg_logdir_ls()
/* Renaming of existing backend functions for pgAdmin compatibility */
int8 pg_catalog.pg_file_read(fname text, data text, append bool)
bigint pg_catalog.pg_file_length(text)
int4 pg_catalog.pg_logfile_rotate()
</programlisting>
<para>
The functions implemented by <filename>adminpack</> can only be run by a
superuser. Here's a list of these functions:
<programlisting>
int8 pg_catalog.pg_file_write(fname text, data text, append bool)
bool pg_catalog.pg_file_rename(oldname text, newname text, archivename text)
bool pg_catalog.pg_file_rename(oldname text, newname text)
bool pg_catalog.pg_file_unlink(fname text)
setof record pg_catalog.pg_logdir_ls()
/* Renaming of existing backend functions for pgAdmin compatibility */
int8 pg_catalog.pg_file_read(fname text, data text, append bool)
bigint pg_catalog.pg_file_length(text)
int4 pg_catalog.pg_logfile_rotate()
</programlisting>
</para>
</sect2>
</sect1>
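As a quick illustration of the functions listed above, the following sketch shows a superuser session writing, renaming, listing, and removing a server-side file (the file name <literal>server.tmp</> is hypothetical; <filename>adminpack</> must already be installed):

```sql
-- Write a file under the data directory, without appending
SELECT pg_catalog.pg_file_write('server.tmp', 'some text', false);
-- Rename it (two-argument form, no archive name)
SELECT pg_catalog.pg_file_rename('server.tmp', 'server.txt');
-- List the server log directory
SELECT * FROM pg_catalog.pg_logdir_ls();
-- Clean up
SELECT pg_catalog.pg_file_unlink('server.txt');
```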

View File

@ -1,3 +1,5 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/btree-gist.sgml,v 1.4 2007/12/06 04:12:09 tgl Exp $ -->
<sect1 id="btree-gist">
<title>btree_gist</title>
@ -6,32 +8,49 @@
</indexterm>
<para>
btree_gist is a B-Tree implementation using GiST that supports the int2, int4,
int8, float4, float8 timestamp with/without time zone, time
with/without time zone, date, interval, oid, money, macaddr, char,
varchar/text, bytea, numeric, bit, varbit and inet/cidr types.
<filename>btree_gist</> provides sample GiST operator classes that
implement B-Tree equivalent behavior for the data types
<type>int2</>, <type>int4</>, <type>int8</>, <type>float4</>,
<type>float8</>, <type>numeric</>, <type>timestamp with time zone</>,
<type>timestamp without time zone</>, <type>time with time zone</>,
<type>time without time zone</>, <type>date</>, <type>interval</>,
<type>oid</>, <type>money</>, <type>char</>,
<type>varchar</>, <type>text</>, <type>bytea</>, <type>bit</>,
<type>varbit</>, <type>macaddr</>, <type>inet</>, and <type>cidr</>.
</para>
<para>
In general, these operator classes will not outperform the equivalent
standard btree index methods, and they lack one major feature of the
standard btree code: the ability to enforce uniqueness. However,
they are useful for GiST testing and as a base for developing other
GiST operator classes.
</para>
<sect2>
<title>Example usage</title>
<programlisting>
CREATE TABLE test (a int4);
-- create index
CREATE INDEX testidx ON test USING gist (a);
-- query
SELECT * FROM test WHERE a &lt; 10;
</programlisting>
<programlisting>
CREATE TABLE test (a int4);
-- create index
CREATE INDEX testidx ON test USING gist (a);
-- query
SELECT * FROM test WHERE a &lt; 10;
</programlisting>
</sect2>
<sect2>
<title>Authors</title>
<para>
All work was done by Teodor Sigaev (<email>teodor@stack.net</email>) ,
Oleg Bartunov (<email>oleg@sai.msu.su</email>), Janko Richter
(<email>jankorichter@yahoo.de</email>). See
<ulink url="http://www.sai.msu.su/~megera/postgres/gist"></ulink> for additional
information.
Teodor Sigaev (<email>teodor@stack.net</email>),
Oleg Bartunov (<email>oleg@sai.msu.su</email>), and
Janko Richter (<email>jankorichter@yahoo.de</email>). See
<ulink url="http://www.sai.msu.su/~megera/postgres/gist"></ulink>
for additional information.
</para>
</sect2>
</sect1>

View File

@ -1,17 +1,45 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/chkpass.sgml,v 1.2 2007/12/06 04:12:09 tgl Exp $ -->
<sect1 id="chkpass">
<title>chkpass</title>
<!--
<indexterm zone="chkpass">
<primary>chkpass</primary>
</indexterm>
-->
<para>
chkpass is a password type that is automatically checked and converted upon
entry. It is stored encrypted. To compare, simply compare against a clear
This module implements a data type <type>chkpass</> that is
designed for storing encrypted passwords.
Each password is automatically converted to encrypted form upon entry,
and is always stored encrypted. To compare, simply compare against a clear
text password and the comparison function will encrypt it before comparing.
It also returns an error if the code determines that the password is easily
crackable. This is currently a stub that does nothing.
</para>
<para>
There are provisions in the code to report an error if the password is
determined to be easily crackable. However, this is currently just
a stub that does nothing.
</para>
<para>
If you precede an input string with a colon, it is assumed to be an
already-encrypted password, and is stored without further encryption.
This allows entry of previously-encrypted passwords.
</para>
<para>
On output, a colon is prepended. This makes it possible to dump and reload
passwords without re-encrypting them. If you want the encrypted password
without the colon then use the <function>raw()</> function.
This allows you to use the
type with things like Apache's Auth_PostgreSQL module.
</para>
<para>
The encryption uses the standard Unix function <function>crypt()</>,
and so it suffers
from all the usual limitations of that function; notably that only the
first eight characters of a password are considered.
</para>
<para>
@ -23,28 +51,10 @@
</para>
<para>
If you precede the string with a colon, the encryption and checking are
skipped so that you can enter existing passwords into the field.
Sample usage:
</para>
<para>
On output, a colon is prepended. This makes it possible to dump and reload
passwords without re-encrypting them. If you want the password (encrypted)
without the colon then use the raw() function. This allows you to use the
type with things like Apache's Auth_PostgreSQL module.
</para>
<para>
The encryption uses the standard Unix function crypt(), and so it suffers
from all the usual limitations of that function; notably that only the
first eight characters of a password are considered.
</para>
<para>
Here is some sample usage:
</para>
<programlisting>
<programlisting>
test=# create table test (p chkpass);
CREATE TABLE
test=# insert into test values ('hello');
@ -72,13 +82,14 @@ test=# select p = 'goodbye' from test;
----------
f
(1 row)
</programlisting>
</programlisting>
<sect2>
<title>Author</title>
<para>
D'Arcy J.M. Cain <email>darcy@druid.net</email>
D'Arcy J.M. Cain (<email>darcy@druid.net</email>)
</para>
</sect2>
</sect1>
</sect1>

View File

@ -1,4 +1,4 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/contrib-spi.sgml,v 1.1 2007/12/03 04:18:47 tgl Exp $ -->
<!-- $PostgreSQL: pgsql/doc/src/sgml/contrib-spi.sgml,v 1.2 2007/12/06 04:12:09 tgl Exp $ -->
<sect1 id="contrib-spi">
<title>spi</title>
@ -29,27 +29,28 @@
<para>
<function>check_primary_key()</> checks the referencing table.
To use, create a BEFORE INSERT OR UPDATE trigger using this
function on a table referencing another table. You are to specify
as trigger arguments: triggered table column names which correspond
to foreign key, referenced table name and column names in referenced
table which correspond to primary/unique key. To handle multiple
foreign keys, create a trigger for each reference.
To use, create a <literal>BEFORE INSERT OR UPDATE</> trigger using this
function on a table referencing another table. Specify as the trigger
arguments: the referencing table's column name(s) which form the foreign
key, the referenced table name, and the column names in the referenced table
which form the primary/unique key. To handle multiple foreign
keys, create a trigger for each reference.
</para>
<para>
<function>check_foreign_key()</> checks the referenced table.
To use, create a BEFORE DELETE OR UPDATE trigger using this
function on a table referenced by other table(s). You are to specify
as trigger arguments: number of references for which the function has to
perform checking, action if referencing key found ('cascade' &mdash; to delete
corresponding foreign key, 'restrict' &mdash; to abort transaction if foreign keys
exist, 'setnull' &mdash; to set foreign key referencing primary/unique key
being deleted to null), triggered table column names which correspond
to primary/unique key, then referencing table name and column names
corresponding to foreign key (repeated for as many referencing tables/keys
as were specified by first argument). Note that the primary/unique key
columns should be marked NOT NULL and should have a unique index.
To use, create a <literal>BEFORE DELETE OR UPDATE</> trigger using this
function on a table referenced by other table(s). Specify as the trigger
arguments: the number of referencing tables for which the function has to
perform checking, the action if a referencing key is found
(<literal>cascade</> &mdash; to delete the referencing row,
<literal>restrict</> &mdash; to abort transaction if referencing keys
exist, <literal>setnull</> &mdash; to set referencing key fields to null),
the triggered table's column names which form the primary/unique key, then
the referencing table name and column names (repeated for as many
referencing tables as were specified by first argument). Note that the
primary/unique key columns should be marked NOT NULL and should have a
unique index.
</para>
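To make the argument conventions concrete, here is a sketch using two hypothetical tables, <literal>A</> (referenced, primary key <literal>id</>) and <literal>B</> (referencing <literal>A</> through <literal>a_id</>):

```sql
-- On the referencing table: verify the foreign key exists in A
CREATE TRIGGER b_fkey_check
    BEFORE INSERT OR UPDATE ON B
    FOR EACH ROW
    EXECUTE PROCEDURE check_primary_key('a_id', 'A', 'id');

-- On the referenced table: one referencing table, cascade deletes
CREATE TRIGGER a_ref_check
    BEFORE DELETE OR UPDATE ON A
    FOR EACH ROW
    EXECUTE PROCEDURE check_foreign_key(1, 'cascade', 'id', 'B', 'a_id');
```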
<para>
@ -64,60 +65,65 @@
Long ago, <productname>PostgreSQL</> had a built-in time travel feature
that kept the insert and delete times for each tuple. This can be
emulated using these functions. To use these functions,
you are to add to a table two columns of <type>abstime</> type to store
you must add to a table two columns of <type>abstime</> type to store
the date when a tuple was inserted (start_date) and changed/deleted
(stop_date):
<programlisting>
CREATE TABLE mytab (
... ...
start_date abstime default now(),
stop_date abstime default 'infinity'
start_date abstime,
stop_date abstime
... ...
);
</programlisting>
So, tuples being inserted with unspecified start_date/stop_date will get
the current time in start_date and <literal>infinity</> in
stop_date.
The columns can be named whatever you like, but in this discussion
we'll call them start_date and stop_date.
</para>
<para>
When a new row is inserted, start_date should normally be set to
current time, and stop_date to <literal>infinity</>. The trigger
will automatically substitute these values if the inserted data
contains nulls in these columns. Generally, inserting explicit
non-null data in these columns should only be done when re-loading
dumped data.
</para>
<para>
Tuples with stop_date equal to <literal>infinity</> are <quote>valid
now</quote>: when trigger will be fired for UPDATE/DELETE of a tuple with
stop_date NOT equal to <literal>infinity</> then
this tuple will not be changed/deleted!
now</quote>, and can be modified. Tuples with a finite stop_date cannot
be modified anymore &mdash; the trigger will prevent it. (If you need
to do that, you can turn off time travel as shown below.)
</para>
<para>
If stop_date is equal to <literal>infinity</> then on
update only the stop_date in the tuple being updated will be changed (to
current time) and a new tuple with new data (coming from SET ... in UPDATE)
will be inserted. Start_date in this new tuple will be set to current time
and stop_date to <literal>infinity</>.
For a modifiable row, on update only the stop_date in the tuple being
updated will be changed (to current time) and a new tuple with the modified
data will be inserted. Start_date in this new tuple will be set to current
time and stop_date to <literal>infinity</>.
</para>
<para>
A delete does not actually remove the tuple but only set its stop_date
A delete does not actually remove the tuple but only sets its stop_date
to current time.
</para>
<para>
To query for tuples <quote>valid now</quote>, include
<literal>stop_date = 'infinity'</> in the query's WHERE condition.
(You might wish to incorporate that in a view.)
</para>
<para>
You can't change start/stop date columns with UPDATE!
Use set_timetravel (below) if you need this.
(You might wish to incorporate that in a view.) Similarly, you can
query for tuples valid at any past time with suitable conditions on
start_date and stop_date.
</para>
<para>
<function>timetravel()</> is the general trigger function that supports
this behavior. Create a BEFORE INSERT OR UPDATE OR DELETE trigger using this
function on each time-traveled table. You are to specify two trigger arguments:
name of start_date column and name of stop_date column in triggered table.
this behavior. Create a <literal>BEFORE INSERT OR UPDATE OR DELETE</>
trigger using this function on each time-traveled table. Specify two
trigger arguments: the actual
names of the start_date and stop_date columns.
Optionally, you can specify one to three more arguments, which must refer
to columns of type <type>text</>. The trigger will store the name of
the current user into the first of these columns during INSERT, the
@ -130,7 +136,9 @@ CREATE TABLE mytab (
<literal>set_timetravel('mytab', 1)</> will turn TT ON for table mytab.
<literal>set_timetravel('mytab', 0)</> will turn TT OFF for table mytab.
In both cases the old status is reported. While TT is off, you can modify
the start_date and stop_date columns freely.
the start_date and stop_date columns freely. Note that the on/off status
is local to the current database session &mdash; fresh sessions will
always start out with TT ON for all tables.
</para>
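Putting the pieces together, a time-travel trigger for the <literal>mytab</> table sketched above (with the column names used in this discussion) could be created as:

```sql
CREATE TRIGGER mytab_timetravel
    BEFORE INSERT OR UPDATE OR DELETE ON mytab
    FOR EACH ROW
    EXECUTE PROCEDURE timetravel('start_date', 'stop_date');
```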
<para>
@ -156,9 +164,9 @@ CREATE TABLE mytab (
</para>
<para>
To use, create a BEFORE INSERT (or optionally BEFORE INSERT OR UPDATE)
trigger using this function. You are to specify
as trigger arguments: the name of the integer column to be modified,
To use, create a <literal>BEFORE INSERT</> (or optionally <literal>BEFORE
INSERT OR UPDATE</>) trigger using this function. Specify two
trigger arguments: the name of the integer column to be modified,
and the name of the sequence object that will supply values.
(Actually, you can specify any number of pairs of such names, if
you'd like to update more than one autoincrementing column.)
@ -180,8 +188,8 @@ CREATE TABLE mytab (
</para>
<para>
To use, create a BEFORE INSERT and/or UPDATE
trigger using this function. You are to specify a single trigger
To use, create a <literal>BEFORE INSERT</> and/or <literal>UPDATE</>
trigger using this function. Specify a single trigger
argument: the name of the text column to be modified.
</para>
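For example, assuming a text column <literal>username</> (a hypothetical name) on <literal>mytab</>:

```sql
CREATE TRIGGER mytab_username
    BEFORE INSERT OR UPDATE ON mytab
    FOR EACH ROW
    EXECUTE PROCEDURE insert_username('username');
```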
@ -201,8 +209,8 @@ CREATE TABLE mytab (
</para>
<para>
To use, create a BEFORE UPDATE
trigger using this function. You are to specify a single trigger
To use, create a <literal>BEFORE UPDATE</>
trigger using this function. Specify a single trigger
argument: the name of the <type>timestamp</> column to be modified.
</para>
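For example, assuming a <type>timestamp</> column <literal>moddate</> (a hypothetical name) on <literal>mytab</>:

```sql
CREATE TRIGGER mytab_moddatetime
    BEFORE UPDATE ON mytab
    FOR EACH ROW
    EXECUTE PROCEDURE moddatetime('moddate');
```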

View File

@ -1,4 +1,4 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/contrib.sgml,v 1.7 2007/12/03 04:18:47 tgl Exp $ -->
<!-- $PostgreSQL: pgsql/doc/src/sgml/contrib.sgml,v 1.8 2007/12/06 04:12:09 tgl Exp $ -->
<appendix id="contrib">
<title>Additional Supplied Modules</title>
@ -54,6 +54,7 @@ psql -d dbname -f <replaceable>SHAREDIR</>/contrib/<replaceable>module</>.sql
Here, <replaceable>SHAREDIR</> means the installation's <quote>share</>
directory (<literal>pg_config --sharedir</> will tell you what this is).
In most cases the script must be run by a database superuser.
</para>
<para>

View File

@ -1,3 +1,4 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/cube.sgml,v 1.5 2007/12/06 04:12:09 tgl Exp $ -->
<sect1 id="cube">
<title>cube</title>
@ -7,15 +8,17 @@
</indexterm>
<para>
This module contains the user-defined type, CUBE, representing
multidimensional cubes.
This module implements a data type <type>cube</> for
representing multi-dimensional cubes.
</para>
<sect2>
<title>Syntax</title>
<para>
The following are valid external representations for the CUBE type:
The following are valid external representations for the <type>cube</>
type. <replaceable>x</>, <replaceable>y</>, etc denote floating-point
numbers:
</para>
<table>
@ -23,192 +26,281 @@
<tgroup cols="2">
<tbody>
<row>
<entry>'x'</entry>
<entry>A floating point value representing a one-dimensional point or
one-dimensional zero length cubement
</entry>
</row>
<row>
<entry>'(x)'</entry>
<entry>Same as above</entry>
</row>
<row>
<entry>'x1,x2,x3,...,xn'</entry>
<entry>A point in n-dimensional space, represented internally as a zero
volume box
</entry>
</row>
<row>
<entry>'(x1,x2,x3,...,xn)'</entry>
<entry>Same as above</entry>
</row>
<row>
<entry>'(x),(y)'</entry>
<entry>1-D cubement starting at x and ending at y or vice versa; the
order does not matter
</entry>
</row>
<row>
<entry>'(x1,...,xn),(y1,...,yn)'</entry>
<entry>n-dimensional box represented by a pair of its opposite corners, no
matter which. Functions take care of swapping to achieve "lower left --
upper right" representation before computing any values
</entry>
</row>
</tbody>
</tgroup>
</table>
</sect2>
<sect2>
<title>Grammar</title>
<table>
<title>Cube Grammar Rules</title>
<tgroup cols="2">
<tbody>
<row>
<entry>rule 1</entry>
<entry>box -> O_BRACKET paren_list COMMA paren_list C_BRACKET</entry>
</row>
<row>
<entry>rule 2</entry>
<entry>box -> paren_list COMMA paren_list</entry>
</row>
<row>
<entry>rule 3</entry>
<entry>box -> paren_list</entry>
</row>
<row>
<entry>rule 4</entry>
<entry>box -> list</entry>
</row>
<row>
<entry>rule 5</entry>
<entry>paren_list -> O_PAREN list C_PAREN</entry>
</row>
<row>
<entry>rule 6</entry>
<entry>list -> FLOAT</entry>
</row>
<row>
<entry>rule 7</entry>
<entry>list -> list COMMA FLOAT</entry>
</row>
</tbody>
</tgroup>
</table>
</sect2>
<sect2>
<title>Tokens</title>
<table>
<title>Cube Grammar Rules</title>
<tgroup cols="2">
<tbody>
<row>
<entry>n</entry>
<entry>[0-9]+</entry>
</row>
<row>
<entry>integer</entry>
<entry>[+-]?{n}</entry>
</row>
<row>
<entry>real</entry>
<entry>[+-]?({n}\.{n}?|\.{n})</entry>
</row>
<row>
<entry>FLOAT</entry>
<entry>({integer}|{real})([eE]{integer})?</entry>
</row>
<row>
<entry>O_BRACKET</entry>
<entry>\[</entry>
</row>
<row>
<entry>C_BRACKET</entry>
<entry>\]</entry>
</row>
<row>
<entry>O_PAREN</entry>
<entry>\(</entry>
</row>
<row>
<entry>C_PAREN</entry>
<entry>\)</entry>
</row>
<row>
<entry>COMMA</entry>
<entry>\,</entry>
</row>
</tbody>
</tgroup>
</table>
</sect2>
<sect2>
<title>Examples</title>
<table>
<title>Examples</title>
<tgroup cols="2">
<tbody>
<row>
<entry>'x'</entry>
<entry>A floating point value representing a one-dimensional point
<entry><literal><replaceable>x</></literal></entry>
<entry>A one-dimensional point
(or, zero-length one-dimensional interval)
</entry>
</row>
<row>
<entry>'(x)'</entry>
<entry><literal>(<replaceable>x</>)</literal></entry>
<entry>Same as above</entry>
</row>
<row>
<entry>'x1,x2,x3,...,xn'</entry>
<entry>A point in n-dimensional space,represented internally as a zero
volume cube
<entry><literal><replaceable>x1</>,<replaceable>x2</>,...,<replaceable>xn</></literal></entry>
<entry>A point in n-dimensional space, represented internally as a
zero-volume cube
</entry>
</row>
<row>
<entry>'(x1,x2,x3,...,xn)'</entry>
<entry><literal>(<replaceable>x1</>,<replaceable>x2</>,...,<replaceable>xn</>)</literal></entry>
<entry>Same as above</entry>
</row>
<row>
<entry>'(x),(y)'</entry>
<entry>A 1-D interval starting at x and ending at y or vice versa; the
<entry><literal>(<replaceable>x</>),(<replaceable>y</>)</literal></entry>
<entry>A one-dimensional interval starting at <replaceable>x</> and ending at <replaceable>y</> or vice versa; the
order does not matter
</entry>
</row>
<row>
<entry>'[(x),(y)]'</entry>
<entry><literal>[(<replaceable>x</>),(<replaceable>y</>)]</literal></entry>
<entry>Same as above</entry>
</row>
<row>
<entry>'(x1,...,xn),(y1,...,yn)'</entry>
<entry>An n-dimensional box represented by a pair of its diagonally
opposite corners, regardless of order. Swapping is provided
by all comarison routines to ensure the
"lower left -- upper right" representation
before actaul comparison takes place.
<entry><literal>(<replaceable>x1</>,...,<replaceable>xn</>),(<replaceable>y1</>,...,<replaceable>yn</>)</literal></entry>
<entry>An n-dimensional cube represented by a pair of its diagonally
opposite corners
</entry>
</row>
<row>
<entry>'[(x1,...,xn),(y1,...,yn)]'</entry>
<entry><literal>[(<replaceable>x1</>,...,<replaceable>xn</>),(<replaceable>y1</>,...,<replaceable>yn</>)]</literal></entry>
<entry>Same as above</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
White space is ignored, so '[(x),(y)]' can be: '[ ( x ), ( y ) ]'
It does not matter which order the opposite corners of a cube are
entered in. The <type>cube</> functions
automatically swap values if needed to create a uniform
<quote>lower left &mdash; upper right</> internal representation.
</para>
<para>
White space is ignored, so <literal>[(<replaceable>x</>),(<replaceable>y</>)]</literal> is the same as
<literal>[ ( <replaceable>x</> ), ( <replaceable>y</> ) ]</literal>.
</para>
</sect2>
<sect2>
<title>Precision</title>
<para>
Values are stored internally as 64-bit floating point numbers. This means
that numbers with more than about 16 significant digits will be truncated.
</para>
</sect2>
<sect2>
<title>Usage</title>
<para>
The <filename>cube</> module includes a GiST index operator class for
<type>cube</> values.
The operators supported by the GiST opclass include:
</para>
<itemizedlist>
<listitem>
<programlisting>
a = b Same as
</programlisting>
<para>
The cubes a and b are identical.
</para>
</listitem>
<listitem>
<programlisting>
a &amp;&amp; b Overlaps
</programlisting>
<para>
The cubes a and b overlap.
</para>
</listitem>
<listitem>
<programlisting>
a @&gt; b Contains
</programlisting>
<para>
The cube a contains the cube b.
</para>
</listitem>
<listitem>
<programlisting>
a &lt;@ b Contained in
</programlisting>
<para>
The cube a is contained in the cube b.
</para>
</listitem>
</itemizedlist>
<para>
(Before PostgreSQL 8.2, the containment operators @&gt; and &lt;@ were
respectively called @ and ~. These names are still available, but are
deprecated and will eventually be retired. Notice that the old names
are reversed from the convention formerly followed by the core geometric
datatypes!)
</para>
<para>
The standard B-tree operators are also provided, for example
<programlisting>
[a, b] &lt; [c, d] Less than
[a, b] &gt; [c, d] Greater than
</programlisting>
These operators do not make a lot of sense for any practical
purpose but sorting. These operators first compare (a) to (c),
and if these are equal, compare (b) to (d). That results in
reasonably good sorting in most cases, which is useful if
you want to use ORDER BY with this type.
</para>
<para>
The following functions are available:
</para>
<table>
<title>Cube functions</title>
<tgroup cols="2">
<tbody>
<row>
<entry><literal>cube(float8) returns cube</literal></entry>
<entry>Makes a one dimensional cube with both coordinates the same.
<literal>cube(1) == '(1)'</literal>
</entry>
</row>
<row>
<entry><literal>cube(float8, float8) returns cube</literal></entry>
<entry>Makes a one dimensional cube.
<literal>cube(1,2) == '(1),(2)'</literal>
</entry>
</row>
<row>
<entry><literal>cube(float8[]) returns cube</literal></entry>
<entry>Makes a zero-volume cube using the coordinates
defined by the array.
<literal>cube(ARRAY[1,2]) == '(1,2)'</literal>
</entry>
</row>
<row>
<entry><literal>cube(float8[], float8[]) returns cube</literal></entry>
<entry>Makes a cube with upper right and lower left
coordinates as defined by the two arrays, which must be of the
same length.
<literal>cube('{1,2}'::float[], '{3,4}'::float[]) == '(1,2),(3,4)'
</literal>
</entry>
</row>
<row>
<entry><literal>cube(cube, float8) returns cube</literal></entry>
<entry>Makes a new cube by adding a dimension on to an
existing cube with the same values for both parts of the new coordinate.
This is useful for building cubes piece by piece from calculated values.
<literal>cube('(1)',2) == '(1,2),(1,2)'</literal>
</entry>
</row>
<row>
<entry><literal>cube(cube, float8, float8) returns cube</literal></entry>
<entry>Makes a new cube by adding a dimension on to an
existing cube. This is useful for building cubes piece by piece from
calculated values. <literal>cube('(1,2)',3,4) == '(1,3),(2,4)'</literal>
</entry>
</row>
<row>
<entry><literal>cube_dim(cube) returns int</literal></entry>
<entry>Returns the number of dimensions of the cube
</entry>
</row>
<row>
<entry><literal>cube_ll_coord(cube, int) returns double </literal></entry>
<entry>Returns the n'th coordinate value for the lower left
corner of a cube
</entry>
</row>
<row>
<entry><literal>cube_ur_coord(cube, int) returns double
</literal></entry>
<entry>Returns the n'th coordinate value for the
upper right corner of a cube
</entry>
</row>
<row>
<entry><literal>cube_is_point(cube) returns bool</literal></entry>
<entry>Returns true if a cube is a point, that is,
the two defining corners are the same.</entry>
</row>
<row>
<entry><literal>cube_distance(cube, cube) returns double</literal></entry>
<entry>Returns the distance between two cubes. If both
cubes are points, this is the normal distance function.
</entry>
</row>
<row>
<entry><literal>cube_subset(cube, int[]) returns cube
</literal></entry>
<entry>Makes a new cube from an existing cube, using a list of
dimension indexes from an array. Can be used to find both the LL and UR
coordinates of a single dimension, e.g.
<literal>cube_subset(cube('(1,3,5),(6,7,8)'), ARRAY[2]) = '(3),(7)'</>.
Or can be used to drop dimensions, or reorder them as desired, e.g.
<literal>cube_subset(cube('(1,3,5),(6,7,8)'), ARRAY[3,2,1,1]) = '(5, 3,
1, 1),(8, 7, 6, 6)'</>.
</entry>
</row>
<row>
<entry><literal>cube_union(cube, cube) returns cube</literal></entry>
<entry>Produces the union of two cubes
</entry>
</row>
<row>
<entry><literal>cube_inter(cube, cube) returns cube</literal></entry>
<entry>Produces the intersection of two cubes
</entry>
</row>
<row>
<entry><literal>cube_enlarge(cube c, double r, int n) returns cube</literal></entry>
<entry>Increases the size of a cube by a specified radius in at least
n dimensions. If the radius is negative the cube is shrunk instead. This
is useful for creating bounding boxes around a point for searching for
nearby points. All defined dimensions are changed by the radius r.
LL coordinates are decreased by r and UR coordinates are increased by r.
If a LL coordinate is increased to larger than the corresponding UR
coordinate (this can only happen when r &lt; 0) than both coordinates
are set to their average. If n is greater than the number of defined
dimensions and the cube is being increased (r &gt;= 0) then 0 is used
as the base for the extra coordinates.
</entry>
</row>
</tbody>
</tgroup>
</table>
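The construction functions compose naturally; the expected results below are the ones given in the table above:

```sql
SELECT cube(1);                          -- '(1)'
SELECT cube(1, 2);                       -- '(1),(2)'
SELECT cube('(1,2)'::cube, 3, 4);        -- '(1,3),(2,4)'
SELECT cube_subset(cube('(1,3,5),(6,7,8)'), ARRAY[2]);   -- '(3),(7)'
```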
</sect2>
<sect2>
<title>Defaults</title>
<para>
I believe this union:
</para>
<programlisting>
select cube_union('(0,5,2),(2,3,1)','0');
select cube_union('(0,5,2),(2,3,1)', '0');
cube_union
-------------------
(0, 0, 0),(2, 5, 2)
@ -216,11 +308,11 @@ cube_union
</programlisting>
<para>
does not contradict to the common sense, neither does the intersection
does not contradict common sense, neither does the intersection
</para>
<programlisting>
select cube_inter('(0,-1),(1,1)','(-2),(2)');
select cube_inter('(0,-1),(1,1)', '(-2),(2)');
cube_inter
-------------
(0, 0),(1, 0)
@ -228,9 +320,10 @@ cube_inter
</programlisting>
<para>
In all binary operations on differently sized boxes, I assume the smaller
one to be a cartesian projection, i. e., having zeroes in place of coordinates
omitted in the string representation. The above examples are equivalent to:
In all binary operations on differently-dimensioned cubes, I assume the
lower-dimensional one to be a Cartesian projection, i.e., having zeroes
in place of coordinates omitted in the string representation. The above
examples are equivalent to:
</para>
<programlisting>
@ -241,7 +334,7 @@ cube_inter('(0,-1),(1,1)','(-2,0),(2,0)');
<para>
The following containment predicate uses the point syntax,
while in fact the second argument is internally represented by a box.
This syntax makes it unnecessary to define the special Point type
This syntax makes it unnecessary to define a separate point type
and functions for (box,point) predicates.
</para>
@ -253,268 +346,42 @@ t
(1 row)
</programlisting>
</sect2>
<sect2>
<title>Precision</title>
<para>
Values are stored internally as 64-bit floating point numbers. This means that
numbers with more than about 16 significant digits will be truncated.
</para>
</sect2>
<sect2>
<title>Usage</title>
<title>Notes</title>
<para>
The access method for CUBE is a GiST index (gist_cube_ops), which is a
generalization of R-tree. GiSTs allow the postgres implementation of
R-tree, originally encoded to support 2-D geometric types such as
boxes and polygons, to be used with any data type whose data domain
can be partitioned using the concepts of containment, intersection and
equality. In other words, everything that can intersect or contain
its own kind can be indexed with a GiST. That includes, among other
things, all geometric data types, regardless of their dimensionality
(see also contrib/seg).
For examples of usage, see the regression test <filename>sql/cube.sql</>.
</para>
<para>
The operators supported by the GiST access method include:
</para>
<programlisting>
a = b Same as
</programlisting>
<para>
The cubements a and b are identical.
</para>
<programlisting>
a &amp;&amp; b Overlaps
</programlisting>
<para>
The cubements a and b overlap.
</para>
<programlisting>
a @&gt; b Contains
</programlisting>
<para>
The cubement a contains the cubement b.
</para>
<programlisting>
a &lt;@ b Contained in
</programlisting>
<para>
The cubement a is contained in b.
</para>
<para>
(Before PostgreSQL 8.2, the containment operators @&gt; and &lt;@ were
respectively called @ and ~. These names are still available, but are
deprecated and will eventually be retired. Notice that the old names
are reversed from the convention formerly followed by the core geometric
datatypes!)
</para>
<para>
Although the mnemonics of the following operators is questionable, I
preserved them to maintain visual consistency with other geometric
data types defined in Postgres.
</para>
<para>
Other operators:
</para>
<programlisting>
[a, b] &lt; [c, d] Less than
[a, b] &gt; [c, d] Greater than
</programlisting>
<para>
These operators do not make a lot of sense for any practical
purpose but sorting. These operators first compare (a) to (c),
and if these are equal, compare (b) to (d). That accounts for
reasonably good sorting in most cases, which is useful if
you want to use ORDER BY with this type
</para>
<para>
The following functions are available:
</para>
<table>
<title>Functions available</title>
<tgroup cols="2">
<tbody>
<row>
<entry><literal>cube_distance(cube, cube) returns double</literal></entry>
<entry>cube_distance returns the distance between two cubes. If both
cubes are points, this is the normal distance function.
</entry>
</row>
<row>
<entry><literal>cube(text)</literal></entry>
<entry>Takes text input and returns a cube. This is useful for making
cubes from computed strings.
</entry>
</row>
<row>
<entry><literal>cube(float8) returns cube</literal></entry>
<entry>This makes a one dimensional cube with both coordinates the same.
If the type of the argument is a numeric type other than float8 an
explicit cast to float8 may be needed.
<literal>cube(1) == '(1)'</literal>
</entry>
</row>
<row>
<entry><literal>cube(float8, float8) returns cube</literal></entry>
<entry>
This makes a one dimensional cube.
<literal>cube(1,2) == '(1),(2)'</literal>
</entry>
</row>
<row>
<entry><literal>cube(float8[]) returns cube</literal></entry>
      <entry>This makes a zero-volume cube using the coordinates
      defined by the array. <literal>cube(ARRAY[1,2]) == '(1,2)'</literal>
</entry>
</row>
<row>
<entry><literal>cube(float8[], float8[]) returns cube</literal></entry>
<entry>This makes a cube, with upper right and lower left
coordinates as defined by the 2 float arrays. Arrays must be of the
same length.
<literal>cube('{1,2}'::float[], '{3,4}'::float[]) == '(1,2),(3,4)'
</literal>
</entry>
</row>
<row>
<entry><literal>cube(cube, float8) returns cube</literal></entry>
<entry>This builds a new cube by adding a dimension on to an
existing cube with the same values for both parts of the new coordinate.
This is useful for building cubes piece by piece from calculated values.
<literal>cube('(1)',2) == '(1,2),(1,2)'</literal>
</entry>
</row>
<row>
<entry><literal>cube(cube, float8, float8) returns cube</literal></entry>
<entry>This builds a new cube by adding a dimension on to an
existing cube. This is useful for building cubes piece by piece from
calculated values. <literal>cube('(1,2)',3,4) == '(1,3),(2,4)'</literal>
</entry>
</row>
<row>
<entry><literal>cube_dim(cube) returns int</literal></entry>
      <entry>cube_dim returns the number of dimensions stored in the
      data structure for a cube. This is useful for constraints on the
      dimensions of a cube.
</entry>
</row>
<row>
<entry><literal>cube_ll_coord(cube, int) returns double </literal></entry>
<entry>
cube_ll_coord returns the nth coordinate value for the lower left
corner of a cube. This is useful for doing coordinate transformations.
</entry>
</row>
<row>
<entry><literal>cube_ur_coord(cube, int) returns double
</literal></entry>
<entry>cube_ur_coord returns the nth coordinate value for the
upper right corner of a cube. This is useful for doing coordinate
transformations.
</entry>
</row>
<row>
<entry><literal>cube_subset(cube, int[]) returns cube
</literal></entry>
      <entry>Builds a new cube from an existing cube, using a list of
      dimension indexes from an array. Can be used to find both the ll and ur
      coordinates of a single dimension, e.g.:
      <literal>cube_subset(cube('(1,3,5),(6,7,8)'), ARRAY[2]) = '(3),(7)'</literal>.
      It can also be used to drop dimensions, or reorder them as desired, e.g.:
      <literal>cube_subset(cube('(1,3,5),(6,7,8)'), ARRAY[3,2,1,1]) =
      '(5, 3, 1, 1),(8, 7, 6, 6)'</literal>.
</entry>
</row>
<row>
<entry><literal>cube_is_point(cube) returns bool</literal></entry>
<entry>cube_is_point returns true if a cube is also a point.
This is true when the two defining corners are the same.</entry>
</row>
<row>
<entry><literal>cube_enlarge(cube, double, int) returns cube</literal></entry>
<entry>
cube_enlarge increases the size of a cube by a specified
radius in at least
n dimensions. If the radius is negative the box is shrunk instead. This
is useful for creating bounding boxes around a point for searching for
nearby points. All defined dimensions are changed by the radius. If n
is greater than the number of defined dimensions and the cube is being
increased (r &gt;= 0) then 0 is used as the base for the extra coordinates.
LL coordinates are decreased by r and UR coordinates are increased by r.
If a LL coordinate is increased to larger than the corresponding UR
      coordinate (this can only happen when r &lt; 0) then both coordinates are
      set to their average. To make it harder for people to break things, there
      is a limit of 100 on the number of dimensions of cubes. This is set
      in <filename>cubedata.h</> if you need something bigger.
</entry>
</row>
</tbody>
</tgroup>
</table>
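The cube_enlarge rules above (grow or shrink every coordinate pair by r, zero-base any extra dimensions when growing, and average any pair that crosses) can be modeled in a few lines. This is an illustrative Python sketch of the documented behavior, not the module's C implementation:

```python
def cube_enlarge(ll, ur, r, n):
    """Model of cube_enlarge(cube, r, n): pad LL/UR out to n dimensions
    with 0 when growing (r >= 0), then move each LL coordinate down by r
    and each UR coordinate up by r, averaging any pair that crosses
    (which can only happen when r < 0)."""
    ll, ur = list(ll), list(ur)
    if r >= 0:
        while len(ll) < n:          # extra dimensions start at 0
            ll.append(0.0)
            ur.append(0.0)
    for i in range(len(ll)):
        ll[i] -= r
        ur[i] += r
        if ll[i] > ur[i]:           # crossed: collapse to the average
            ll[i] = ur[i] = (ll[i] + ur[i]) / 2
    return ll, ur

print(cube_enlarge([0.0], [2.0], 1.0, 2))   # ([-1.0, -1.0], [3.0, 1.0])
print(cube_enlarge([0.0], [4.0], -3.0, 1))  # ([2.0], [2.0])
```

The second call shows the shrink case: [0,4] shrunk by 3 would cross (3 > 1), so both coordinates collapse to their average, 2.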
<para>
There are a few other potentially useful functions defined in cube.c
that vanished from the schema because I stopped using them. Some of
these were meant to support type casting. Let me know if I was wrong:
I will then add them back to the schema. I would also appreciate
other ideas that would enhance the type and make it more useful.
</para>
<para>
    For examples of usage, see the regression test <filename>sql/cube.sql</>.
To make it harder for people to break things, there
is a limit of 100 on the number of dimensions of cubes. This is set
in <filename>cubedata.h</> if you need something bigger.
</para>
</sect2>
<sect2>
<title>Credits</title>
<para>
This code is essentially based on the example written for
Illustra, <ulink url="http://garcia.me.berkeley.edu/~adong/rtree"></ulink>
Original author: Gene Selkov, Jr. <email>selkovjr@mcs.anl.gov</email>,
Mathematics and Computer Science Division, Argonne National Laboratory.
</para>
<para>
My thanks are primarily to Prof. Joe Hellerstein
(<ulink url="http://db.cs.berkeley.edu/~jmh/"></ulink>) for elucidating the
gist of the GiST (<ulink url="http://gist.cs.berkeley.edu/"></ulink>), and
to his former student, Andy Dong
(<ulink url="http://best.me.berkeley.edu/~adong/"></ulink>), for his exemplar.
   I am also grateful to all postgres developers, present and past, for enabling
   me to create my own world and live undisturbed in it. And I would like to
acknowledge my gratitude to Argonne Lab and to the U.S. Department of Energy
for the years of faithful support of my database research.
</para>
<para>
Gene Selkov, Jr.
Computational Scientist
Mathematics and Computer Science Division
Argonne National Laboratory
9700 S Cass Ave.
Building 221
Argonne, IL 60439-4844
<email>selkovjr@mcs.anl.gov</email>
to his former student, Andy Dong (<ulink
url="http://best.me.berkeley.edu/~adong/"></ulink>), for his example
written for Illustra,
<ulink url="http://garcia.me.berkeley.edu/~adong/rtree"></ulink>.
   I am also grateful to all Postgres developers, present and past, for
   enabling me to create my own world and live undisturbed in it. And I
would like to acknowledge my gratitude to Argonne Lab and to the
U.S. Department of Energy for the years of faithful support of my database
research.
</para>
<para>
@ -527,9 +394,9 @@ a &lt;@ b Contained in
<para>
Additional updates were made by Joshua Reich <email>josh@root.net</email> in
July 2006. These include <literal>cube(float8[], float8[])</literal> and
cleaning up the code to use the V1 call protocol instead of the deprecated V0
form.
cleaning up the code to use the V1 call protocol instead of the deprecated
V0 protocol.
</para>
</sect2>
</sect1>
</sect1>

File diff suppressed because it is too large Load Diff

View File

@ -1,3 +1,5 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/dict-int.sgml,v 1.2 2007/12/06 04:12:10 tgl Exp $ -->
<sect1 id="dict-int">
<title>dict_int</title>
@ -6,13 +8,16 @@
</indexterm>
<para>
The motivation for this example dictionary is to control the indexing of
integers (signed and unsigned), and, consequently, to minimize the number of
unique words which greatly affect the performance of searching.
<filename>dict_int</> is an example of an add-on dictionary template
for full-text search. The motivation for this example dictionary is to
control the indexing of integers (signed and unsigned), allowing such
numbers to be indexed while preventing excessive growth in the number of
unique words, which greatly affects the performance of searching.
</para>
<sect2>
<title>Configuration</title>
<para>
The dictionary accepts two options:
</para>
@ -20,17 +25,19 @@
<itemizedlist>
<listitem>
<para>
The MAXLEN parameter specifies the maximum length (number of digits)
allowed in an integer word. The default value is 6.
The <literal>maxlen</> parameter specifies the maximum number of
digits allowed in an integer word. The default value is 6.
</para>
</listitem>
<listitem>
<para>
The REJECTLONG parameter specifies if an overlength integer should be
truncated or ignored. If REJECTLONG=FALSE (default), the dictionary returns
the first MAXLEN digits of the integer. If REJECTLONG=TRUE, the
dictionary treats an overlength integer as a stop word, so that it will
not be indexed.
The <literal>rejectlong</> parameter specifies whether an overlength
integer should be truncated or ignored. If <literal>rejectlong</> is
<literal>false</> (the default), the dictionary returns the first
<literal>maxlen</> digits of the integer. If <literal>rejectlong</> is
<literal>true</>, the dictionary treats an overlength integer as a stop
word, so that it will not be indexed. Note that this also means that
such an integer cannot be searched for.
</para>
</listitem>
</itemizedlist>
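The interaction of the two options can be modeled concisely. The following Python sketch illustrates the documented behavior (truncate by default, reject as a stop word when rejectlong is true); it is not the dictionary's actual C code:

```python
def lexize_int(word, maxlen=6, rejectlong=False):
    """Model of dict_int: return the indexed form of an integer word,
    or None when it is treated as a stop word."""
    if len(word) <= maxlen:
        return word
    # Overlength: truncate to the first maxlen digits by default,
    # or treat the word as a stop word when rejectlong is true.
    return None if rejectlong else word[:maxlen]

print(lexize_int('123456'))                    # '123456'
print(lexize_int('1234567'))                   # '123456' (truncated)
print(lexize_int('1234567', rejectlong=True))  # None (stop word)
```

Note how the rejectlong=True case makes an overlength integer disappear from the index entirely, which is why such a number cannot be searched for.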

View File

@ -1,3 +1,5 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/dict-xsyn.sgml,v 1.2 2007/12/06 04:12:10 tgl Exp $ -->
<sect1 id="dict-xsyn">
<title>dict_xsyn</title>
@ -6,28 +8,34 @@
</indexterm>
<para>
The Extended Synonym Dictionary module replaces words with groups of their
synonyms, and so makes it possible to search for a word using any of its
synonyms.
<filename>dict_xsyn</> (Extended Synonym Dictionary) is an example of an
add-on dictionary template for full-text search. This dictionary type
replaces words with groups of their synonyms, and so makes it possible to
search for a word using any of its synonyms.
</para>
<sect2>
<title>Configuration</title>
<para>
A <literal>dict_xsyn</> dictionary accepts the following options:
</para>
<itemizedlist>
<listitem>
<para>
KEEPORIG controls whether the original word is included, or only its
synonyms. Default is 'true'.
<literal>keeporig</> controls whether the original word is included (if
<literal>true</>), or only its synonyms (if <literal>false</>). Default
is <literal>true</>.
</para>
</listitem>
<listitem>
<para>
RULES is the base name of the file containing the list of synonyms.
This file must be in $(prefix)/share/tsearch_data/, and its name must
end in ".rules" (which is not included in the RULES parameter).
<literal>rules</> is the base name of the file containing the list of
synonyms. This file must be stored in
<filename>$SHAREDIR/tsearch_data/</> (where <literal>$SHAREDIR</> means
the <productname>PostgreSQL</> installation's shared-data directory).
Its name must end in <literal>.rules</> (which is not to be included in
the <literal>rules</> parameter).
</para>
</listitem>
</itemizedlist>
@ -38,41 +46,63 @@
<listitem>
<para>
Each line represents a group of synonyms for a single word, which is
given first on the line. Synonyms are separated by whitespace:
</para>
given first on the line. Synonyms are separated by whitespace, thus:
<programlisting>
word syn1 syn2 syn3
</programlisting>
</para>
</listitem>
<listitem>
<para>
Sharp ('#') sign is a comment delimiter. It may appear at any position
inside the line. The rest of the line will be skipped.
The sharp (<literal>#</>) sign is a comment delimiter. It may appear at
any position in a line. The rest of the line will be skipped.
</para>
</listitem>
</itemizedlist>
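The rules-file format just described (one synonym group per line, whitespace-separated, with # starting a comment) is straightforward to parse. Here is a hypothetical Python sketch of how such a line could be interpreted; the actual parser is part of the module's C code:

```python
def parse_rules_line(line):
    """Parse one line of an xsyn rules file: strip any '#' comment,
    then split on whitespace into (word, [synonyms])."""
    line = line.split('#', 1)[0]   # the rest of the line after '#' is skipped
    parts = line.split()
    if not parts:                  # blank or comment-only line
        return None
    return parts[0], parts[1:]

print(parse_rules_line('word syn1 syn2 syn3   # trailing note'))
# ('word', ['syn1', 'syn2', 'syn3'])
print(parse_rules_line('# a comment-only line'))  # None
```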
<para>
Look at xsyn_sample.rules, which is installed in $(prefix)/share/tsearch_data/,
for an example.
Look at <filename>xsyn_sample.rules</>, which is installed in
<filename>$SHAREDIR/tsearch_data/</>, for an example.
</para>
</sect2>
<sect2>
<title>Usage</title>
<programlisting>
mydb=# SELECT ts_lexize('xsyn','word');
ts_lexize
----------------
{word,syn1,syn2,syn3)
</programlisting>
<para>
Change dictionary options:
</para>
<programlisting>
mydb# ALTER TEXT SEARCH DICTIONARY xsyn (KEEPORIG=false);
Running the installation script creates a text search template
<literal>xsyn_template</> and a dictionary <literal>xsyn</>
based on it, with default parameters. You can alter the
parameters, for example
<programlisting>
mydb# ALTER TEXT SEARCH DICTIONARY xsyn (RULES='my_rules', KEEPORIG=false);
ALTER TEXT SEARCH DICTIONARY
</programlisting>
</programlisting>
or create new dictionaries based on the template.
</para>
<para>
To test the dictionary, you can try
<programlisting>
mydb=# SELECT ts_lexize('xsyn', 'word');
ts_lexize
-----------------------
{word,syn1,syn2,syn3}
</programlisting>
but real-world usage will involve including it in a text search
configuration as described in <xref linkend="textsearch">.
That might look like this:
<programlisting>
ALTER TEXT SEARCH CONFIGURATION english
ALTER MAPPING FOR word, asciiword WITH xsyn, english_stem;
</programlisting>
</para>
</sect2>
</sect1>

View File

@ -1,3 +1,5 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/earthdistance.sgml,v 1.3 2007/12/06 04:12:10 tgl Exp $ -->
<sect1 id="earthdistance">
<title>earthdistance</title>
@ -6,128 +8,184 @@
</indexterm>
<para>
This module contains two different approaches to calculating
great circle distances on the surface of the Earth. The one described
first depends on the contrib/cube package (which MUST be installed before
earthdistance is installed). The second one is based on the point
datatype using latitude and longitude for the coordinates. The install
script makes the defined functions executable by anyone.
</para>
<para>
A spherical model of the Earth is used.
</para>
<para>
Data is stored in cubes that are points (both corners are the same) using 3
coordinates representing the distance from the center of the Earth.
</para>
<para>
The radius of the Earth is obtained from the earth() function. It is
given in meters. But by changing this one function you can change it
to use some other units or to use a different value of the radius
that you feel is more appropiate.
</para>
<para>
This package also has applications to astronomical databases as well.
Astronomers will probably want to change earth() to return a radius of
180/pi() so that distances are in degrees.
</para>
<para>
Functions are provided to allow for input in latitude and longitude (in
degrees), to allow for output of latitude and longitude, to calculate
the great circle distance between two points and to easily specify a
bounding box usable for index searches.
</para>
<para>
The functions are all 'sql' functions. If you want to make these functions
executable by other people you will also have to make the referenced
cube functions executable. cube(text), cube(float8), cube(cube,float8),
cube_distance(cube,cube), cube_ll_coord(cube,int) and
cube_enlarge(cube,float8,int) are used indirectly by the earth distance
functions. is_point(cube) and cube_dim(cube) are used in constraints for data
in domain earth. cube_ur_coord(cube,int) is used in the regression tests and
might be useful for looking at bounding box coordinates in user applications.
</para>
<para>
A domain of type cube named earth is defined.
There are constraints on it defined to make sure the cube is a point,
that it does not have more than 3 dimensions and that it is very near
the surface of a sphere centered about the origin with the radius of
the Earth.
</para>
<para>
The following functions are provided:
The <filename>earthdistance</> module provides two different approaches to
calculating great circle distances on the surface of the Earth. The one
described first depends on the <filename>cube</> package (which
<emphasis>must</> be installed before <filename>earthdistance</> can be
installed). The second one is based on the built-in <type>point</> datatype,
using longitude and latitude for the coordinates.
</para>
<table id="earthdistance-functions">
<title>EarthDistance functions</title>
<tgroup cols="2">
<tbody>
<row>
<entry><literal>earth()</literal></entry>
<entry>returns the radius of the Earth in meters.</entry>
</row>
<row>
<entry><literal>sec_to_gc(float8)</literal></entry>
<entry>converts the normal straight line
      (secant) distance between two points on the surface of the Earth
to the great circle distance between them.
</entry>
</row>
<row>
<entry><literal>gc_to_sec(float8)</literal></entry>
<entry>Converts the great circle distance
between two points on the surface of the Earth to the normal straight line
(secant) distance between them.
</entry>
</row>
<row>
<entry><literal>ll_to_earth(float8, float8)</literal></entry>
<entry>Returns the location of a point on the surface of the Earth given
its latitude (argument 1) and longitude (argument 2) in degrees.
</entry>
</row>
<row>
<entry><literal>latitude(earth)</literal></entry>
<entry>Returns the latitude in degrees of a point on the surface of the
Earth.
</entry>
</row>
<row>
<entry><literal>longitude(earth)</literal></entry>
<entry>Returns the longitude in degrees of a point on the surface of the
Earth.
</entry>
</row>
<row>
<entry><literal>earth_distance(earth, earth)</literal></entry>
<entry>Returns the great circle distance between two points on the
surface of the Earth.
</entry>
</row>
<row>
<entry><literal>earth_box(earth, float8)</literal></entry>
<entry>Returns a box suitable for an indexed search using the cube @>
operator for points within a given great circle distance of a location.
Some points in this box are further than the specified great circle
distance from the location so a second check using earth_distance
should be made at the same time.
</entry>
</row>
<row>
<entry><literal>&lt;@&gt;</literal> operator</entry>
<entry>gives the distance in statute miles between
two points on the Earth's surface. Coordinates are in degrees. Points are
taken as (longitude, latitude) and not vice versa as longitude is closer
to the intuitive idea of x-axis and latitude to y-axis.
</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
One advantage of using cube representation over a point using latitude and
longitude for coordinates, is that you don't have to worry about special
conditions at +/- 180 degrees of longitude or near the poles.
In this module, the Earth is assumed to be perfectly spherical.
(If that's too inaccurate for you, you might want to look at the
<application><ulink url="http://www.postgis.org/">PostGIS</ulink></>
project.)
</para>
<sect2>
<title>Cube-based earth distances</title>
<para>
Data is stored in cubes that are points (both corners are the same) using 3
coordinates representing the x, y, and z distance from the center of the
Earth. A domain <type>earth</> over <type>cube</> is provided, which
includes constraint checks that the value meets these restrictions and
is reasonably close to the actual surface of the Earth.
</para>
<para>
The radius of the Earth is obtained from the <function>earth()</>
function. It is given in meters. But by changing this one function you can
change the module to use some other units, or to use a different value of
   the radius that you feel is more appropriate.
</para>
<para>
This package has applications to astronomical databases as well.
Astronomers will probably want to change <function>earth()</> to return a
radius of <literal>180/pi()</> so that distances are in degrees.
</para>
<para>
Functions are provided to support input in latitude and longitude (in
degrees), to support output of latitude and longitude, to calculate
the great circle distance between two points and to easily specify a
bounding box usable for index searches.
</para>
<para>
The following functions are provided:
</para>
<table id="earthdistance-cube-functions">
<title>Cube-based earthdistance functions</title>
<tgroup cols="3">
<thead>
<row>
<entry>Function</entry>
<entry>Returns</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><function>earth()</function></entry>
<entry><type>float8</type></entry>
<entry>Returns the assumed radius of the Earth.</entry>
</row>
<row>
<entry><function>sec_to_gc(float8)</function></entry>
<entry><type>float8</type></entry>
<entry>Converts the normal straight line
       (secant) distance between two points on the surface of the Earth
to the great circle distance between them.
</entry>
</row>
<row>
<entry><function>gc_to_sec(float8)</function></entry>
<entry><type>float8</type></entry>
<entry>Converts the great circle distance between two points on the
surface of the Earth to the normal straight line (secant) distance
between them.
</entry>
</row>
<row>
<entry><function>ll_to_earth(float8, float8)</function></entry>
<entry><type>earth</type></entry>
<entry>Returns the location of a point on the surface of the Earth given
its latitude (argument 1) and longitude (argument 2) in degrees.
</entry>
</row>
<row>
<entry><function>latitude(earth)</function></entry>
<entry><type>float8</type></entry>
<entry>Returns the latitude in degrees of a point on the surface of the
Earth.
</entry>
</row>
<row>
<entry><function>longitude(earth)</function></entry>
<entry><type>float8</type></entry>
<entry>Returns the longitude in degrees of a point on the surface of the
Earth.
</entry>
</row>
<row>
<entry><function>earth_distance(earth, earth)</function></entry>
<entry><type>float8</type></entry>
<entry>Returns the great circle distance between two points on the
surface of the Earth.
</entry>
</row>
<row>
<entry><function>earth_box(earth, float8)</function></entry>
<entry><type>cube</type></entry>
<entry>Returns a box suitable for an indexed search using the cube
<literal>@&gt;</>
operator for points within a given great circle distance of a location.
Some points in this box are further than the specified great circle
distance from the location, so a second check using
<function>earth_distance</> should be included in the query.
</entry>
</row>
</tbody>
</tgroup>
</table>
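The cube-based scheme can be sketched in Python: ll_to_earth places a latitude/longitude point on a sphere, and earth_distance measures the straight-line (secant) distance between two such points and converts it to a great-circle arc. This is an illustrative model only; the radius value here is an assumption, since the module takes it from its earth() function:

```python
import math

EARTH_RADIUS = 6378168.0  # meters; illustrative value

def ll_to_earth(lat, lon):
    """Latitude/longitude in degrees -> (x, y, z) on the sphere's surface."""
    phi, lam = math.radians(lat), math.radians(lon)
    return (EARTH_RADIUS * math.cos(phi) * math.cos(lam),
            EARTH_RADIUS * math.cos(phi) * math.sin(lam),
            EARTH_RADIUS * math.sin(phi))

def sec_to_gc(sec):
    """Chord (secant) length -> great-circle arc length."""
    # Clamp for floating-point safety: the chord can never exceed 2R.
    return 2.0 * EARTH_RADIUS * math.asin(min(1.0, sec / (2.0 * EARTH_RADIUS)))

def earth_distance(p, q):
    """Great-circle distance between two surface points."""
    return sec_to_gc(math.dist(p, q))

# Two antipodal points on the equator are half a circumference apart.
d = earth_distance(ll_to_earth(0, 0), ll_to_earth(0, 180))
print(round(d / (math.pi * EARTH_RADIUS), 6))  # 1.0
```

The chord-to-arc conversion is where the spherical model enters: for a chord c on a sphere of radius R, the arc is 2R·asin(c/2R).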
</sect2>
<sect2>
<title>Point-based earth distances</title>
<para>
The second part of the module relies on representing Earth locations as
values of type <type>point</>, in which the first component is taken to
represent longitude in degrees, and the second component is taken to
represent latitude in degrees. Points are taken as (longitude, latitude)
and not vice versa because longitude is closer to the intuitive idea of
x-axis and latitude to y-axis.
</para>
<para>
A single operator is provided:
</para>
<table id="earthdistance-point-operators">
<title>Point-based earthdistance operators</title>
<tgroup cols="3">
<thead>
<row>
<entry>Operator</entry>
<entry>Returns</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><type>point</> <literal>&lt;@&gt;</literal> <type>point</></entry>
<entry><type>float8</type></entry>
<entry>Gives the distance in statute miles between
two points on the Earth's surface.
</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
Note that unlike the <type>cube</>-based part of the module, units
are hardwired here: changing the <function>earth()</> function will
not affect the results of this operator.
</para>
<para>
One disadvantage of the longitude/latitude representation is that
you need to be careful about the edge conditions near the poles
and near +/- 180 degrees of longitude. The <type>cube</>-based
representation avoids these discontinuities.
</para>
</sect2>
</sect1>

View File

@ -1,30 +1,51 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/fuzzystrmatch.sgml,v 1.3 2007/12/06 04:12:10 tgl Exp $ -->
<sect1 id="fuzzystrmatch">
<title>fuzzystrmatch</title>
<indexterm zone="fuzzystrmatch">
<primary>fuzzystrmatch</primary>
</indexterm>
<para>
This section describes the fuzzystrmatch module which provides different
The <filename>fuzzystrmatch</> module provides several
functions to determine similarities and distance between strings.
</para>
<sect2>
<title>Soundex</title>
<para>
The Soundex system is a method of matching similar sounding names
(or any words) to the same code. It was initially used by the
United States Census in 1880, 1900, and 1910, but it has little use
beyond English names (or the English pronunciation of names), and
it is not a linguistic tool.
The Soundex system is a method of matching similar-sounding names
by converting them to the same code. It was initially used by the
United States Census in 1880, 1900, and 1910. Note that Soundex
is not very useful for non-English names.
</para>
<para>
When comparing two soundex values to determine similarity, the
difference function reports how close the match is on a scale
from zero to four, with zero being no match and four being an
exact match.
The <filename>fuzzystrmatch</> module provides two functions
for working with Soundex codes:
</para>
<programlisting>
soundex(text) returns text
difference(text, text) returns int
</programlisting>
<para>
The following are some usage examples:
The <function>soundex</> function converts a string to its Soundex code.
The <function>difference</> function converts two strings to their Soundex
codes and then reports the number of matching code positions. Since
Soundex codes have four characters, the result ranges from zero to four,
with zero being no match and four being an exact match. (Thus, the
function is misnamed &mdash; <function>similarity</> would have been
a better name.)
</para>
<para>
Here are some usage examples:
</para>
<programlisting>
SELECT soundex('hello world!');
@ -41,81 +62,106 @@ INSERT INTO s VALUES ('jack');
SELECT * FROM s WHERE soundex(nm) = soundex('john');
SELECT a.nm, b.nm FROM s a, s b WHERE soundex(a.nm) = soundex(b.nm) AND a.oid &lt;&gt; b.oid;
CREATE FUNCTION text_sx_eq(text, text) RETURNS boolean AS
'select soundex($1) = soundex($2)'
LANGUAGE SQL;
CREATE FUNCTION text_sx_lt(text, text) RETURNS boolean AS
'select soundex($1) &lt; soundex($2)'
LANGUAGE SQL;
CREATE FUNCTION text_sx_gt(text, text) RETURNS boolean AS
'select soundex($1) &gt; soundex($2)'
LANGUAGE SQL;
CREATE FUNCTION text_sx_le(text, text) RETURNS boolean AS
'select soundex($1) &lt;= soundex($2)'
LANGUAGE SQL;
CREATE FUNCTION text_sx_ge(text, text) RETURNS boolean AS
'select soundex($1) &gt;= soundex($2)'
LANGUAGE SQL;
CREATE FUNCTION text_sx_ne(text, text) RETURNS boolean AS
'select soundex($1) &lt;&gt; soundex($2)'
LANGUAGE SQL;
DROP OPERATOR #= (text, text);
CREATE OPERATOR #= (leftarg=text, rightarg=text, procedure=text_sx_eq, commutator = #=);
SELECT * FROM s WHERE text_sx_eq(nm, 'john');
SELECT * FROM s WHERE s.nm #= 'john';
SELECT * FROM s WHERE difference(s.nm, 'john') &gt; 2;
</programlisting>
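To make the coding scheme concrete, here is a Python sketch of the classic Soundex algorithm and the position-matching difference function. It models the standard census rules (drop vowels, collapse adjacent duplicate codes, H and W do not separate duplicates); the module's C implementation may differ in edge cases:

```python
SOUNDEX_CODES = {}
for digit, letters in (('1', 'BFPV'), ('2', 'CGJKQSXZ'), ('3', 'DT'),
                       ('4', 'L'), ('5', 'MN'), ('6', 'R')):
    for c in letters:
        SOUNDEX_CODES[c] = digit

def soundex(s):
    """Classic four-character Soundex code: first letter, then up to
    three digits, zero-padded."""
    letters = [c for c in s.upper() if c.isalpha()]
    if not letters:
        return ''
    out = letters[0]
    prev = SOUNDEX_CODES.get(letters[0], '')
    for c in letters[1:]:
        d = SOUNDEX_CODES.get(c, '')
        if d and d != prev:
            out += d
        if c not in 'HW':   # H and W do not separate double codes
            prev = d
    return (out + '000')[:4]

def difference(a, b):
    """Number of matching positions in the two Soundex codes (0..4)."""
    return sum(x == y for x, y in zip(soundex(a), soundex(b)))

print(soundex('Robert'))          # 'R163'
print(difference('Anne', 'Ann'))  # 4
```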
</sect2>
<sect2>
<title>levenshtein</title>
<title>Levenshtein</title>
<para>
This function calculates the levenshtein distance between two strings:
This function calculates the Levenshtein distance between two strings:
</para>
<programlisting>
int levenshtein(text source, text target)
levenshtein(text source, text target) returns int
</programlisting>
<para>
Both <literal>source</literal> and <literal>target</literal> can be any
NOT NULL string with a maximum of 255 characters.
non-null string, with a maximum of 255 characters.
</para>
<para>
Example:
</para>
<programlisting>
SELECT levenshtein('GUMBO','GAMBOL');
test=# SELECT levenshtein('GUMBO', 'GAMBOL');
levenshtein
-------------
2
(1 row)
</programlisting>
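The Levenshtein distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other (for 'GUMBO' to 'GAMBOL': substitute U with A and insert L, hence 2). A compact dynamic-programming sketch, illustrative rather than the module's C implementation:

```python
def levenshtein(source, target):
    """Minimum edit distance via the classic two-row DP."""
    prev = list(range(len(target) + 1))
    for i, sc in enumerate(source, 1):
        cur = [i]
        for j, tc in enumerate(target, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (sc != tc)))   # substitution
        prev = cur
    return prev[-1]

print(levenshtein('GUMBO', 'GAMBOL'))  # 2
```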
</sect2>
<sect2>
<title>metaphone</title>
<title>Metaphone</title>
<para>
This function calculates and returns the metaphone code of an input string:
Metaphone, like Soundex, is based on the idea of constructing a
representative code for an input string. Two strings are then
deemed similar if they have the same codes.
</para>
<programlisting>
text metaphone(text source, int max_output_length)
</programlisting>
<para>
<literal>source</literal> has to be a NOT NULL string with a maximum of
255 characters. <literal>max_output_length</literal> fixes the maximum
This function calculates the metaphone code of an input string:
</para>
<programlisting>
metaphone(text source, int max_output_length) returns text
</programlisting>
<para>
<literal>source</literal> has to be a non-null string with a maximum of
255 characters. <literal>max_output_length</literal> sets the maximum
length of the output metaphone code; if longer, the output is truncated
to this length.
</para>
<para>Example</para>
<para>
Example:
</para>
<programlisting>
SELECT metaphone('GUMBO',4);
test=# SELECT metaphone('GUMBO', 4);
metaphone
-----------
KM
(1 row)
</programlisting>
</sect2>
<sect2>
<title>Double Metaphone</title>
<para>
The Double Metaphone system computes two <quote>sounds like</> strings
for a given input string &mdash; a <quote>primary</> and an
<quote>alternate</>. In most cases they are the same, but for non-English
names especially they can be a bit different, depending on pronunciation.
These functions compute the primary and alternate codes:
</para>
<programlisting>
dmetaphone(text source) returns text
dmetaphone_alt(text source) returns text
</programlisting>
<para>
There is no length limit on the input strings.
</para>
<para>
Example:
</para>
<programlisting>
test=# select dmetaphone('gumbo');
dmetaphone
------------
KMP
(1 row)
</programlisting>
</sect2>

View File

@ -1,3 +1,5 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/hstore.sgml,v 1.2 2007/12/06 04:12:10 tgl Exp $ -->
<sect1 id="hstore">
<title>hstore</title>
@ -6,224 +8,234 @@
</indexterm>
<para>
  The <literal>hstore</literal> module is useful for storing (key,value) pairs.
  This module can be useful in various scenarios: cases with many
  rarely-searched attributes, semi-structured data, or a lazy DBA.
This module implements a data type <type>hstore</> for storing sets of
(key,value) pairs within a single <productname>PostgreSQL</> data field.
This can be useful in various scenarios, such as rows with many attributes
that are rarely examined, or semi-structured data.
</para>
<sect2>
<title>Operations</title>
<itemizedlist>
<listitem>
<para>
    <literal>hstore -> text</literal> - get value, perl analogy $h{key}
</para>
<programlisting>
select 'a=>q, b=>g'->'a';
?
------
q
</programlisting>
<para>
    Note the use of parentheses in the select below, because the precedence of
    'IS' is higher than that of '->':
</para>
<programlisting>
SELECT id FROM entrants WHERE (info->'education_period') IS NOT NULL;
</programlisting>
</listitem>
<title><type>hstore</> External Representation</title>
<listitem>
<para>
<literal>hstore || hstore</literal> - concatenation, perl analogy %a=( %b, %c );
</para>
<programlisting>
regression=# select 'a=>b'::hstore || 'c=>d'::hstore;
?column?
--------------------
"a"=>"b", "c"=>"d"
(1 row)
</programlisting>
<para>
The text representation of an <type>hstore</> value includes zero
or more <replaceable>key</> <literal>=&gt;</> <replaceable>value</>
items, separated by commas. For example:
<para>
but, notice
</para>
<programlisting>
k => v
foo => bar, baz => whatever
"1-a" => "anything at all"
</programlisting>
<programlisting>
regression=# select 'a=>b'::hstore || 'a=>d'::hstore;
?column?
----------
"a"=>"d"
(1 row)
</programlisting>
</listitem>
The order of the items is not considered significant (and may not be
reproduced on output). Whitespace between items or around the
<literal>=&gt;</> sign is ignored. Use double quotes if a key or
value includes whitespace, comma, <literal>=</> or <literal>&gt;</>.
To include a double quote or a backslash in a key or value, precede
it with another backslash. (Keep in mind that depending on the
setting of <varname>standard_conforming_strings</>, you may need to
double backslashes in SQL literal strings.)
</para>
<listitem>
<para>
<literal>text => text</literal> - creates hstore type from two text strings
</para>
<programlisting>
select 'a'=>'b';
?column?
----------
"a"=>"b"
</programlisting>
</listitem>
<para>
A value (but not a key) can be a SQL NULL. This is represented as
<listitem>
<para>
    <literal>hstore @> hstore</literal> - containment operator: checks whether the left operand contains the right.
</para>
<programlisting>
regression=# select 'a=>b, b=>1, c=>NULL'::hstore @> 'a=>c';
?column?
----------
f
(1 row)
<programlisting>
key => NULL
</programlisting>
regression=# select 'a=>b, b=>1, c=>NULL'::hstore @> 'b=>1';
?column?
----------
t
(1 row)
</programlisting>
</listitem>
The <literal>NULL</> keyword is not case-sensitive. Again, use
double quotes if you want the string <literal>null</> to be treated
as an ordinary data value.
</para>
<para>
Currently, double quotes are always used to surround key and value
strings on output, even when this is not strictly necessary.
</para>
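  <para>
   For example, the following sketch stores a true SQL NULL for key
   <literal>a</>, but the ordinary four-character string
   <literal>null</> for key <literal>b</> (item order in the output
   may vary):
  </para>
  <programlisting>
regression=# select 'a=>NULL, b=>"null"'::hstore;
         hstore
------------------------
 "a"=>NULL, "b"=>"null"
(1 row)
  </programlisting>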
<listitem>
<para>
     <literal>hstore &lt;@ hstore</literal> - containment operator: checks whether the
      left operand is contained in the right
</para>
<para>
(Before PostgreSQL 8.2, the containment operators @&gt; and &lt;@ were
respectively called @ and ~. These names are still available, but are
deprecated and will eventually be retired. Notice that the old names
are reversed from the convention formerly followed by the core geometric
datatypes!)
</para>
</listitem>
</itemizedlist>
</sect2>
<sect2>
<title>Functions</title>
<title><type>hstore</> Operators and Functions</title>
<itemizedlist>
<listitem>
<para>
     <literal>akeys(hstore)</literal> - returns all keys from the hstore as an array
</para>
<programlisting>
regression=# select akeys('a=>1,b=>2');
akeys
-------
{a,b}
</programlisting>
</listitem>
<table id="hstore-op-table">
<title><type>hstore</> Operators</title>
<listitem>
<para>
     <literal>skeys(hstore)</literal> - returns all keys from the hstore as a set of strings
</para>
<programlisting>
regression=# select skeys('a=>1,b=>2');
skeys
-------
a
b
</programlisting>
</listitem>
<tgroup cols="4">
<thead>
<row>
<entry>Operator</entry>
<entry>Description</entry>
<entry>Example</entry>
<entry>Result</entry>
</row>
</thead>
<listitem>
<para>
     <literal>avals(hstore)</literal> - returns all values from the hstore as an array
</para>
<programlisting>
regression=# select avals('a=>1,b=>2');
avals
-------
{1,2}
</programlisting>
</listitem>
<tbody>
<row>
<entry><type>hstore</> <literal>-&gt;</> <type>text</></entry>
<entry>get value for key (null if not present)</entry>
<entry><literal>'a=&gt;x, b=&gt;y'::hstore -&gt; 'a'</literal></entry>
<entry><literal>x</literal></entry>
</row>
<listitem>
<para>
     <literal>svals(hstore)</literal> - returns all values from the hstore as a set of
      strings
</para>
<programlisting>
regression=# select svals('a=>1,b=>2');
svals
-------
1
2
</programlisting>
</listitem>
<row>
<entry><type>text</> <literal>=&gt;</> <type>text</></entry>
<entry>make single-item <type>hstore</></entry>
<entry><literal>'a' =&gt; 'b'</literal></entry>
<entry><literal>"a"=&gt;"b"</literal></entry>
</row>
<listitem>
<para>
     <literal>delete (hstore,text)</literal> - deletes the (key,value) pair from
      the hstore if the key matches the argument.
</para>
<programlisting>
regression=# select delete('a=>1,b=>2','b');
delete
----------
"a"=>"1"
</programlisting>
</listitem>
<row>
<entry><type>hstore</> <literal>||</> <type>hstore</></entry>
<entry>concatenation</entry>
<entry><literal>'a=&gt;b, c=&gt;d'::hstore || 'c=&gt;x, d=&gt;q'::hstore</literal></entry>
<entry><literal>"a"=&gt;"b", "c"=&gt;"x", "d"=&gt;"q"</literal></entry>
</row>
<listitem>
<para>
     <literal>each(hstore)</literal> - returns (key, value) pairs as a set
</para>
<programlisting>
regression=# select * from each('a=>1,b=>2');
<row>
<entry><type>hstore</> <literal>?</> <type>text</></entry>
<entry>does <type>hstore</> contain key?</entry>
<entry><literal>'a=&gt;1'::hstore ? 'a'</literal></entry>
<entry><literal>t</literal></entry>
</row>
<row>
<entry><type>hstore</> <literal>@&gt;</> <type>hstore</></entry>
<entry>does left operand contain right?</entry>
<entry><literal>'a=&gt;b, b=&gt;1, c=&gt;NULL'::hstore @&gt; 'b=&gt;1'</literal></entry>
<entry><literal>t</literal></entry>
</row>
<row>
<entry><type>hstore</> <literal>&lt;@</> <type>hstore</></entry>
<entry>is left operand contained in right?</entry>
<entry><literal>'a=&gt;c'::hstore &lt;@ 'a=&gt;b, b=&gt;1, c=&gt;NULL'</literal></entry>
<entry><literal>f</literal></entry>
</row>
</tbody>
</tgroup>
</table>
<para>
(Before PostgreSQL 8.2, the containment operators @&gt; and &lt;@ were
respectively called @ and ~. These names are still available, but are
deprecated and will eventually be retired. Notice that the old names
are reversed from the convention formerly followed by the core geometric
datatypes!)
</para>
<table id="hstore-func-table">
<title><type>hstore</> Functions</title>
<tgroup cols="5">
<thead>
<row>
<entry>Function</entry>
<entry>Return Type</entry>
<entry>Description</entry>
<entry>Example</entry>
<entry>Result</entry>
</row>
</thead>
<tbody>
<row>
<entry><function>akeys(hstore)</function></entry>
<entry><type>text[]</type></entry>
<entry>get <type>hstore</>'s keys as array</entry>
<entry><literal>akeys('a=&gt;1,b=&gt;2')</literal></entry>
<entry><literal>{a,b}</literal></entry>
</row>
<row>
<entry><function>skeys(hstore)</function></entry>
<entry><type>setof text</type></entry>
<entry>get <type>hstore</>'s keys as set</entry>
<entry><literal>skeys('a=&gt;1,b=&gt;2')</literal></entry>
<entry>
<programlisting>
a
b
</programlisting></entry>
</row>
<row>
<entry><function>avals(hstore)</function></entry>
<entry><type>text[]</type></entry>
<entry>get <type>hstore</>'s values as array</entry>
<entry><literal>avals('a=&gt;1,b=&gt;2')</literal></entry>
<entry><literal>{1,2}</literal></entry>
</row>
<row>
<entry><function>svals(hstore)</function></entry>
<entry><type>setof text</type></entry>
<entry>get <type>hstore</>'s values as set</entry>
<entry><literal>svals('a=&gt;1,b=&gt;2')</literal></entry>
<entry>
<programlisting>
1
2
</programlisting></entry>
</row>
<row>
<entry><function>each(hstore)</function></entry>
<entry><type>setof (key text, value text)</type></entry>
<entry>get <type>hstore</>'s keys and values as set</entry>
<entry><literal>select * from each('a=&gt;1,b=&gt;2')</literal></entry>
<entry>
<programlisting>
key | value
-----+-------
a | 1
b | 2
</programlisting>
</listitem>
</programlisting></entry>
</row>
<listitem>
<para>
<literal>exist (hstore,text)</literal>
</para>
<para>
     <literal>hstore ? text</literal> - returns true if the key exists in the
      hstore, and false otherwise.
</para>
<programlisting>
regression=# select exist('a=>1','a'), 'a=>1' ? 'a';
exist | ?column?
-------+----------
t | t
</programlisting>
</listitem>
<row>
<entry><function>exist(hstore,text)</function></entry>
<entry><type>boolean</type></entry>
<entry>does <type>hstore</> contain key?</entry>
<entry><literal>exist('a=&gt;1','a')</literal></entry>
<entry><literal>t</literal></entry>
</row>
<listitem>
<para>
     <literal>defined (hstore,text)</literal> - returns true if the key exists in
      the hstore and its value is not NULL.
</para>
<programlisting>
regression=# select defined('a=>NULL','a');
defined
---------
f
</programlisting>
</listitem>
</itemizedlist>
<row>
<entry><function>defined(hstore,text)</function></entry>
<entry><type>boolean</type></entry>
<entry>does <type>hstore</> contain non-null value for key?</entry>
<entry><literal>defined('a=&gt;NULL','a')</literal></entry>
<entry><literal>f</literal></entry>
</row>
<row>
<entry><function>delete(hstore,text)</function></entry>
<entry><type>hstore</type></entry>
<entry>delete any item matching key</entry>
<entry><literal>delete('a=&gt;1,b=&gt;2','b')</literal></entry>
       <entry><literal>"a"=&gt;"1"</literal></entry>
</row>
</tbody>
</tgroup>
</table>
</sect2>
<sect2>
<title>Indices</title>
<title>Indexes</title>
<para>
   The module provides index support for the '@>' and '?' operators.
<type>hstore</> has index support for <literal>@&gt;</> and <literal>?</>
operators. You can use either GiST or GIN index types. For example:
</para>
<programlisting>
CREATE INDEX hidx ON testhstore USING GIST(h);
CREATE INDEX hidx ON testhstore USING GIN(h);
</programlisting>
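  <para>
   Once such an index exists, queries using the indexable operators can
   take advantage of it automatically; for example (a sketch, where
   <literal>wait</> and <literal>public</> are just illustrative keys):
  </para>
  <programlisting>
SELECT * FROM testhstore WHERE h @> 'wait=>CA';
SELECT * FROM testhstore WHERE h ? 'public';
  </programlisting>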
</sect2>
@ -232,44 +244,52 @@ CREATE INDEX hidx ON testhstore USING GIN(h);
<title>Examples</title>
<para>
Add a key:
Add a key, or update an existing key with a new value:
</para>
<programlisting>
UPDATE tt SET h=h||'c=>3';
UPDATE tab SET h = h || ('c' => '3');
</programlisting>
<para>
Delete a key:
</para>
<programlisting>
UPDATE tt SET h=delete(h,'k1');
UPDATE tab SET h = delete(h, 'k1');
</programlisting>
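  <para>
   A key's value can also be fetched in an ordinary SQL expression, for
   instance to find rows whose <literal>c</> key has the value
   <literal>3</> (a sketch; <structname>tab</> is the same hypothetical
   table as above):
  </para>
  <programlisting>
SELECT * FROM tab WHERE h -> 'c' = '3';
  </programlisting>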
</sect2>
<sect2>
<title>Statistics</title>
<para>
hstore type, because of its intrinsic liberality, could contain a lot of
different keys. Checking for valid keys is the task of application.
Examples below demonstrate several techniques how to check keys statistics.
The <type>hstore</> type, because of its intrinsic liberality, could
contain a lot of different keys. Checking for valid keys is the task of the
application. Examples below demonstrate several techniques for checking
keys and obtaining statistics.
</para>
<para>
Simple example
Simple example:
</para>
<programlisting>
SELECT * FROM each('aaa=>bq, b=>NULL, ""=>1 ');
SELECT * FROM each('aaa=>bq, b=>NULL, ""=>1');
</programlisting>
<para>
Using table
Using a table:
</para>
<programlisting>
SELECT (each(h)).key, (each(h)).value INTO stat FROM testhstore ;
SELECT (each(h)).key, (each(h)).value INTO stat FROM testhstore;
</programlisting>
<para>Online stat</para>
<para>
Online statistics:
</para>
<programlisting>
SELECT key, count(*) FROM (SELECT (each(h)).key FROM testhstore) AS stat GROUP BY key ORDER BY count DESC, key;
SELECT key, count(*) FROM
(SELECT (each(h)).key FROM testhstore) AS stat
GROUP BY key
ORDER BY count DESC, key;
key | count
-----------+-------
line | 883
@ -287,12 +307,14 @@ SELECT key, count(*) FROM (SELECT (each(h)).key FROM testhstore) AS stat GROUP B
<sect2>
<title>Authors</title>
<para>
Oleg Bartunov <email>oleg@sai.msu.su</email>, Moscow, Moscow University, Russia
</para>
<para>
Teodor Sigaev <email>teodor@sigaev.ru</email>, Moscow, Delta-Soft Ltd.,Russia
Teodor Sigaev <email>teodor@sigaev.ru</email>, Moscow, Delta-Soft Ltd., Russia
</para>
</sect2>
</sect1>
</sect1>
View File
@ -1,3 +1,4 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/lo.sgml,v 1.3 2007/12/06 04:12:10 tgl Exp $ -->
<sect1 id="lo">
<title>lo</title>
@ -7,112 +8,119 @@
</indexterm>
<para>
PostgreSQL type extension for managing Large Objects
The <filename>lo</> module provides support for managing Large Objects
(also called LOs or BLOBs). This includes a data type <type>lo</>
and a trigger <function>lo_manage</>.
</para>
<sect2>
<title>Overview</title>
<title>Rationale</title>
<para>
One of the problems with the JDBC driver (and this affects the ODBC driver
also), is that the specification assumes that references to BLOBS (Binary
Large OBjectS) are stored within a table, and if that entry is changed, the
also), is that the specification assumes that references to BLOBs (Binary
Large OBjects) are stored within a table, and if that entry is changed, the
associated BLOB is deleted from the database.
</para>
<para>
As PostgreSQL stands, this doesn't occur. Large objects are treated as
objects in their own right; a table entry can reference a large object by
OID, but there can be multiple table entries referencing the same large
object OID, so the system doesn't delete the large object just because you
change or remove one such entry.
As <productname>PostgreSQL</> stands, this doesn't occur. Large objects
are treated as objects in their own right; a table entry can reference a
large object by OID, but there can be multiple table entries referencing
the same large object OID, so the system doesn't delete the large object
just because you change or remove one such entry.
</para>
<para>
Now this is fine for new PostgreSQL-specific applications, but existing ones
using JDBC or ODBC won't delete the objects, resulting in orphaning - objects
that are not referenced by anything, and simply occupy disk space.
Now this is fine for <productname>PostgreSQL</>-specific applications, but
standard code using JDBC or ODBC won't delete the objects, resulting in
orphan objects &mdash; objects that are not referenced by anything, and
simply occupy disk space.
</para>
<para>
The <filename>lo</> module allows fixing this by attaching a trigger
to tables that contain LO reference columns. The trigger essentially just
does a <function>lo_unlink</> whenever you delete or modify a value
referencing a large object. When you use this trigger, you are assuming
that there is only one database reference to any large object that is
referenced in a trigger-controlled column!
</para>
<para>
The module also provides a data type <type>lo</>, which is really just
a domain of the <type>oid</> type. This is useful for differentiating
database columns that hold large object references from those that are
OIDs of other things. You don't have to use the <type>lo</> type to
use the trigger, but it may be convenient to use it to keep track of which
columns in your database represent large objects that you are managing with
the trigger. It is also rumored that the ODBC driver gets confused if you
don't use <type>lo</> for BLOB columns.
</para>
</sect2>
<sect2>
<title>The Fix</title>
<para>
I've fixed this by creating a new data type 'lo', some support functions, and
a Trigger which handles the orphaning problem. The trigger essentially just
does a 'lo_unlink' whenever you delete or modify a value referencing a large
object. When you use this trigger, you are assuming that there is only one
database reference to any large object that is referenced in a
trigger-controlled column!
</para>
<para>
The 'lo' type was created because we needed to differentiate between plain
OIDs and Large Objects. Currently the JDBC driver handles this dilemma easily,
but (after talking to Byron), the ODBC driver needed a unique type. They had
created an 'lo' type, but not the solution to orphaning.
</para>
<para>
You don't actually have to use the 'lo' type to use the trigger, but it may be
convenient to use it to keep track of which columns in your database represent
large objects that you are managing with the trigger.
</para>
</sect2>
<title>How to Use It</title>
<sect2>
<title>How to Use</title>
<para>
The easiest way is by an example:
Here's a simple example of usage:
</para>
<programlisting>
CREATE TABLE image (title TEXT, raster lo);
CREATE TRIGGER t_raster BEFORE UPDATE OR DELETE ON image
FOR EACH ROW EXECUTE PROCEDURE lo_manage(raster);
</programlisting>
<para>
Create a trigger for each column that contains a lo type, and give the column
name as the trigger procedure argument. You can have more than one trigger on
a table if you need multiple lo columns in the same table, but don't forget to
give a different name to each trigger.
For each column that will contain unique references to large objects,
create a <literal>BEFORE UPDATE OR DELETE</> trigger, and give the column
name as the sole trigger argument. If you need multiple <type>lo</>
columns in the same table, create a separate trigger for each one,
remembering to give a different name to each trigger on the same table.
</para>
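  <para>
   Large object data can then be loaded with the server-side
   <function>lo_import</> function; for example (a sketch: the file
   path is hypothetical, and the server must be able to read it):
  </para>
  <programlisting>
INSERT INTO image (title, raster)
    VALUES ('beautiful image', lo_import('/etc/motd'));
  </programlisting>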
</sect2>
<sect2>
<title>Issues</title>
<title>Limitations</title>
<itemizedlist>
<listitem>
<para>
Dropping a table will still orphan any objects it contains, as the trigger
is not executed.
is not executed. You can avoid this by preceding the <command>DROP
TABLE</> with <command>DELETE FROM <replaceable>table</></command>.
</para>
<para>
Avoid this by preceding the 'drop table' with 'delete from {table}'.
<command>TRUNCATE</> has the same hazard.
</para>
<para>
If you already have, or suspect you have, orphaned large objects, see
the contrib/vacuumlo module to help you clean them up. It's a good idea
to run contrib/vacuumlo occasionally as a back-stop to the lo_manage
trigger.
If you already have, or suspect you have, orphaned large objects, see the
<filename>contrib/vacuumlo</> module (<xref linkend="vacuumlo">) to help
you clean them up. It's a good idea to run <application>vacuumlo</>
occasionally as a back-stop to the <function>lo_manage</> trigger.
</para>
</listitem>
<listitem>
<para>
Some frontends may create their own tables, and will not create the
associated trigger(s). Also, users may not remember (or know) to create
associated trigger(s). Also, users may not remember (or know) to create
the triggers.
</para>
</listitem>
</itemizedlist>
<para>
As the ODBC driver needs a permanent lo type (&amp; JDBC could be optimised to
    use it if its OID is fixed), and as the above issues can only be fixed by
some internal changes, I feel it should become a permanent built-in type.
</para>
</sect2>
<sect2>
<title>Author</title>
<para>
Peter Mount <email>peter@retep.org.uk</email> June 13 1998
Peter Mount <email>peter@retep.org.uk</email>
</para>
</sect2>
</sect1>
</sect1>
View File
@ -1,3 +1,4 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/seg.sgml,v 1.4 2007/12/06 04:12:10 tgl Exp $ -->
<sect1 id="seg">
<title>seg</title>
@ -7,13 +8,15 @@
</indexterm>
<para>
The <literal>seg</literal> module contains the code for the user-defined
type, <literal>SEG</literal>, representing laboratory measurements as
floating point intervals.
This module implements a data type <type>seg</> for
representing line segments, or floating point intervals.
<type>seg</> can represent uncertainty in the interval endpoints,
making it especially useful for representing laboratory measurements.
</para>
<sect2>
<title>Rationale</title>
<para>
The geometry of measurements is usually more complex than that of a
point in a numeric continuum. A measurement is usually a segment of
@ -22,26 +25,28 @@
the value being measured may naturally be an interval indicating some
condition, such as the temperature range of stability of a protein.
</para>
<para>
Using just common sense, it appears more convenient to store such data
as intervals, rather than pairs of numbers. In practice, it even turns
out more efficient in most applications.
</para>
<para>
Further along the line of common sense, the fuzziness of the limits
suggests that the use of traditional numeric data types leads to a
certain loss of information. Consider this: your instrument reads
6.50, and you input this reading into the database. What do you get
when you fetch it? Watch:
</para>
<programlisting>
test=> select 6.50 as "pH";
test=> select 6.50 :: float8 as "pH";
pH
---
6.5
(1 row)
</programlisting>
<para>
In the world of measurements, 6.50 is not the same as 6.5. It may
sometimes be critically different. The experimenters usually write
down (and publish) the digits they trust. 6.50 is actually a fuzzy
@ -50,234 +55,171 @@ test=> select 6.50 as "pH";
share. We definitely do not want such different data items to appear the
same.
</para>
<para>
Conclusion? It is nice to have a special data type that can record the
limits of an interval with arbitrarily variable precision. Variable in
a sense that each data element records its own precision.
the sense that each data element records its own precision.
</para>
<para>
Check this out:
</para>
<programlisting>
<programlisting>
test=> select '6.25 .. 6.50'::seg as "pH";
pH
------------
6.25 .. 6.50
(1 row)
</programlisting>
</programlisting>
</para>
</sect2>
<sect2>
<title>Syntax</title>
<para>
The external representation of an interval is formed using one or two
floating point numbers joined by the range operator ('..' or '...').
Optional certainty indicators (&lt;, &gt; and ~) are ignored by the internal
logics, but are retained in the data.
floating point numbers joined by the range operator (<literal>..</literal>
or <literal>...</literal>). Alternatively, it can be specified as a
center point plus or minus a deviation.
Optional certainty indicators (<literal>&lt;</literal>,
<literal>&gt;</literal> and <literal>~</literal>) can be stored as well.
(Certainty indicators are ignored by all the built-in operators, however.)
</para>
<table>
<title>Rules</title>
<tgroup cols="2">
<tbody>
<row>
<entry>rule 1</entry>
<entry>seg -&gt; boundary PLUMIN deviation</entry>
</row>
<row>
<entry>rule 2</entry>
<entry>seg -&gt; boundary RANGE boundary</entry>
</row>
<row>
<entry>rule 3</entry>
<entry>seg -&gt; boundary RANGE</entry>
</row>
<row>
<entry>rule 4</entry>
<entry>seg -&gt; RANGE boundary</entry>
</row>
<row>
<entry>rule 5</entry>
<entry>seg -&gt; boundary</entry>
</row>
<row>
<entry>rule 6</entry>
<entry>boundary -&gt; FLOAT</entry>
</row>
<row>
<entry>rule 7</entry>
<entry>boundary -&gt; EXTENSION FLOAT</entry>
</row>
<row>
<entry>rule 8</entry>
<entry>deviation -&gt; FLOAT</entry>
</row>
</tbody>
</tgroup>
</table>
<table>
<title>Tokens</title>
<tgroup cols="2">
<tbody>
<row>
<entry>RANGE</entry>
<entry>(\.\.)(\.)?</entry>
</row>
<row>
<entry>PLUMIN</entry>
<entry>\'\+\-\'</entry>
</row>
<row>
<entry>integer</entry>
<entry>[+-]?[0-9]+</entry>
</row>
<row>
<entry>real</entry>
<entry>[+-]?[0-9]+\.[0-9]+</entry>
</row>
<row>
<entry>FLOAT</entry>
<entry>({integer}|{real})([eE]{integer})?</entry>
</row>
<row>
<entry>EXTENSION</entry>
<entry>[&lt;&gt;~]</entry>
</row>
</tbody>
</tgroup>
</table>
<table>
<title>Examples of valid <literal>SEG</literal> representations</title>
<tgroup cols="2">
<tbody>
<row>
<entry>Any number</entry>
<entry>
(rules 5,6) -- creates a zero-length segment (a point,
if you will)
</entry>
</row>
<row>
<entry>~5.0</entry>
<entry>
(rules 5,7) -- creates a zero-length segment AND records
'~' in the data. This notation reads 'approximately 5.0',
but its meaning is not recognized by the code. It is ignored
        until you get the value back. View it as a short-hand comment.
</entry>
</row>
<row>
<entry>&lt;5.0</entry>
<entry>
(rules 5,7) -- creates a point at 5.0; '&lt;' is ignored but
is preserved as a comment
</entry>
</row>
<row>
<entry>&gt;5.0</entry>
<entry>
(rules 5,7) -- creates a point at 5.0; '&gt;' is ignored but
is preserved as a comment
</entry>
</row>
<row>
<entry><para>5(+-)0.3</para><para>5'+-'0.3</para></entry>
<entry>
<para>
(rules 1,8) -- creates an interval '4.7..5.3'. As of this
writing (02/09/2000), this mechanism isn't completely accurate
in determining the number of significant digits for the
boundaries. For example, it adds an extra digit to the lower
boundary if the resulting interval includes a power of ten:
</para>
<programlisting>
postgres=> select '10(+-)1'::seg as seg;
seg
---------
9.0 .. 11 -- should be: 9 .. 11
</programlisting>
<para>
Also, the (+-) notation is not preserved: 'a(+-)b' will
always be returned as '(a-b) .. (a+b)'. The purpose of this
notation is to allow input from certain data sources without
conversion.
</para>
</entry>
</row>
<row>
<entry>50 .. </entry>
<entry>(rule 3) -- everything that is greater than or equal to 50</entry>
</row>
<row>
<entry>.. 0</entry>
<entry>(rule 4) -- everything that is less than or equal to 0</entry>
</row>
<row>
<entry>1.5e-2 .. 2E-2 </entry>
<entry>(rule 2) -- creates an interval (0.015 .. 0.02)</entry>
</row>
<row>
<entry>1 ... 2</entry>
<entry>
The same as 1...2, or 1 .. 2, or 1..2 (space is ignored).
Because of the widespread use of '...' in the data sources,
        I decided to stick to it as a range operator. This, and
also the fact that the white space around the range operator
is ignored, creates a parsing conflict with numeric constants
starting with a decimal point.
</entry>
</row>
</tbody>
</tgroup>
</table>
<table>
<title>Examples</title>
<tgroup cols="2">
<tbody>
<row>
<entry>.1e7</entry>
<entry>should be: 0.1e7</entry>
</row>
<row>
<entry>.1 .. .2</entry>
<entry>should be: 0.1 .. 0.2</entry>
</row>
<row>
<entry>2.4 E4</entry>
<entry>should be: 2.4E4</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
The following, although it is not a syntax error, is disallowed to improve
the sanity of the data:
In the following table, <replaceable>x</>, <replaceable>y</>, and
<replaceable>delta</> denote
floating-point numbers. <replaceable>x</> and <replaceable>y</>, but
not <replaceable>delta</>, can be preceded by a certainty indicator:
</para>
<table>
<title></title>
<title><type>seg</> external representations</title>
<tgroup cols="2">
<tbody>
<row>
<entry>5 .. 2</entry>
<entry>should be: 2 .. 5</entry>
<entry><literal><replaceable>x</></literal></entry>
<entry>Single value (zero-length interval)
</entry>
</row>
<row>
<entry><literal><replaceable>x</> .. <replaceable>y</></literal></entry>
<entry>Interval from <replaceable>x</> to <replaceable>y</>
</entry>
</row>
<row>
<entry><literal><replaceable>x</> (+-) <replaceable>delta</></literal></entry>
<entry>Interval from <replaceable>x</> - <replaceable>delta</> to
<replaceable>x</> + <replaceable>delta</>
</entry>
</row>
<row>
<entry><literal><replaceable>x</> ..</literal></entry>
<entry>Open interval with lower bound <replaceable>x</>
</entry>
</row>
<row>
<entry><literal>.. <replaceable>x</></literal></entry>
<entry>Open interval with upper bound <replaceable>x</>
</entry>
</row>
</tbody>
</tgroup>
</table>
<table>
<title>Examples of valid <type>seg</> input</title>
<tgroup cols="2">
<tbody>
<row>
<entry><literal>5.0</literal></entry>
<entry>
Creates a zero-length segment (a point, if you will)
</entry>
</row>
<row>
<entry><literal>~5.0</literal></entry>
<entry>
Creates a zero-length segment and records
<literal>~</> in the data. <literal>~</literal> is ignored
by <type>seg</> operations, but
is preserved as a comment.
</entry>
</row>
<row>
<entry><literal>&lt;5.0</literal></entry>
<entry>
Creates a point at 5.0. <literal>&lt;</literal> is ignored but
is preserved as a comment.
</entry>
</row>
<row>
<entry><literal>&gt;5.0</literal></entry>
<entry>
Creates a point at 5.0. <literal>&gt;</literal> is ignored but
is preserved as a comment.
</entry>
</row>
<row>
<entry><literal>5(+-)0.3</literal></entry>
<entry>
Creates an interval <literal>4.7 .. 5.3</literal>.
Note that the <literal>(+-)</> notation isn't preserved.
</entry>
</row>
<row>
<entry><literal>50 .. </literal></entry>
<entry>Everything that is greater than or equal to 50</entry>
</row>
<row>
<entry><literal>.. 0</literal></entry>
<entry>Everything that is less than or equal to 0</entry>
</row>
<row>
<entry><literal>1.5e-2 .. 2E-2 </literal></entry>
<entry>Creates an interval <literal>0.015 .. 0.02</literal></entry>
</row>
<row>
<entry><literal>1 ... 2</literal></entry>
<entry>
The same as <literal>1...2</literal>, or <literal>1 .. 2</literal>,
or <literal>1..2</literal>
(spaces around the range operator are ignored)
</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
Because <literal>...</> is widely used in data sources, it is allowed
as an alternative spelling of <literal>..</>. Unfortunately, this
creates a parsing ambiguity: it is not clear whether the upper bound
in <literal>0...23</> is meant to be <literal>23</> or <literal>0.23</>.
This is resolved by requiring at least one digit before the decimal
point in all numbers in <type>seg</> input.
</para>
<para>
As a sanity check, <type>seg</> rejects intervals with the lower bound
greater than the upper, for example <literal>5 .. 2</>.
</para>
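  <para>
   A quick way to check how an input string will be interpreted is to
   cast it in a trivial query, for example:
  </para>
  <programlisting>
test=> select '5(+-)0.3'::seg as seg;
    seg
------------
 4.7 .. 5.3
(1 row)
  </programlisting>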
</sect2>
<sect2>
<title>Precision</title>
<para>
The segments are stored internally as pairs of 32-bit floating point
numbers. It means that the numbers with more than 7 significant digits
<type>seg</> values are stored internally as pairs of 32-bit floating point
numbers. This means that numbers with more than 7 significant digits
will be truncated.
</para>
<para>
The numbers with less than or exactly 7 significant digits retain their
Numbers with 7 or fewer significant digits retain their
original precision. That is, if your query returns 0.00, you will be
sure that the trailing zeroes are not the artifacts of formatting: they
reflect the precision of the original data. The number of leading
@ -288,28 +230,20 @@ postgres=> select '10(+-)1'::seg as seg;
<sect2>
<title>Usage</title>
<para>
The access method for SEG is a GiST index (gist_seg_ops), which is a
generalization of R-tree. GiSTs allow the postgres implementation of
R-tree, originally encoded to support 2-D geometric types such as
boxes and polygons, to be used with any data type whose data domain
can be partitioned using the concepts of containment, intersection and
equality. In other words, everything that can intersect or contain
its own kind can be indexed with a GiST. That includes, among other
things, all geometric data types, regardless of their dimensionality
(see also contrib/cube).
</para>
<para>
The operators supported by the GiST access method include:
The <filename>seg</> module includes a GiST index operator class for
<type>seg</> values.
The operators supported by the GiST opclass include:
</para>
<itemizedlist>
<listitem>
<programlisting>
[a, b] &lt;&lt; [c, d] Is left of
</programlisting>
<para>
The left operand, [a, b], occurs entirely to the left of the
right operand, [c, d], on the axis (-inf, inf). It means,
[a, b] is entirely to the left of [c, d]. That is,
[a, b] &lt;&lt; [c, d] is true if b &lt; c and false otherwise
</para>
</listitem>
@ -318,8 +252,8 @@ postgres=> select '10(+-)1'::seg as seg;
[a, b] &gt;&gt; [c, d] Is right of
</programlisting>
<para>
[a, b] is occurs entirely to the right of [c, d].
[a, b] &gt;&gt; [c, d] is true if a &gt; d and false otherwise
[a, b] is entirely to the right of [c, d]. That is,
[a, b] &gt;&gt; [c, d] is true if a &gt; d and false otherwise
</para>
</listitem>
<listitem>
@ -327,8 +261,8 @@ postgres=> select '10(+-)1'::seg as seg;
[a, b] &amp;&lt; [c, d] Overlaps or is left of
</programlisting>
<para>
This might be better read as "does not extend to right of".
It is true when b &lt;= d.
This might be better read as <quote>does not extend to right of</quote>.
It is true when b &lt;= d.
</para>
</listitem>
<listitem>
@ -336,17 +270,16 @@ postgres=> select '10(+-)1'::seg as seg;
[a, b] &amp;&gt; [c, d] Overlaps or is right of
</programlisting>
<para>
This might be better read as "does not extend to left of".
It is true when a &gt;= c.
This might be better read as <quote>does not extend to left of</quote>.
It is true when a &gt;= c.
</para>
</listitem>
<listitem>
<programlisting>
[a, b] = [c, d] Same as
[a, b] = [c, d] Same as
</programlisting>
<para>
The segments [a, b] and [c, d] are identical, that is, a == b
and c == d
The segments [a, b] and [c, d] are identical, that is, a = c and b = d
</para>
</listitem>
<listitem>
@ -354,28 +287,29 @@ postgres=> select '10(+-)1'::seg as seg;
[a, b] &amp;&amp; [c, d] Overlaps
</programlisting>
<para>
The segments [a, b] and [c, d] overlap.
The segments [a, b] and [c, d] overlap.
</para>
</listitem>
<listitem>
<programlisting>
[a, b] @&gt; [c, d] Contains
[a, b] @&gt; [c, d] Contains
</programlisting>
<para>
The segment [a, b] contains the segment [c, d], that is,
a &lt;= c and b &gt;= d
The segment [a, b] contains the segment [c, d], that is,
a &lt;= c and b &gt;= d
</para>
</listitem>
<listitem>
<programlisting>
[a, b] &lt;@ [c, d] Contained in
[a, b] &lt;@ [c, d] Contained in
</programlisting>
<para>
The segment [a, b] is contained in [c, d], that is,
a &gt;= c and b &lt;= d
The segment [a, b] is contained in [c, d], that is,
a &gt;= c and b &lt;= d
</para>
</listitem>
</itemizedlist>
<para>
(Before PostgreSQL 8.2, the containment operators @&gt; and &lt;@ were
respectively called @ and ~. These names are still available, but are
@ -383,68 +317,70 @@ postgres=> select '10(+-)1'::seg as seg;
are reversed from the convention formerly followed by the core geometric
datatypes!)
</para>
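  <para>
   To use the GiST opclass, simply create an index on a <type>seg</>
   column in the usual way; for example (the table and column names
   here are hypothetical):
  </para>
  <programlisting>
CREATE TABLE measurement (probe text, pH seg);
CREATE INDEX measurement_ph_idx ON measurement USING gist (pH);

SELECT * FROM measurement WHERE pH &amp;&amp; '6.25 .. 6.50';
  </programlisting>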
<para>
    Although the mnemonics of the following operators are questionable, I
preserved them to maintain visual consistency with other geometric
data types defined in Postgres.
</para>
<para>
Other operators:
</para>
The standard B-tree operators are also provided, for example
<programlisting>
[a, b] &lt; [c, d] Less than
[a, b] &gt; [c, d] Greater than
</programlisting>
<para>
These operators do not make a lot of sense for any practical
purpose but sorting. These operators first compare (a) to (c),
and if these are equal, compare (b) to (d). That accounts for
and if these are equal, compare (b) to (d). That results in
reasonably good sorting in most cases, which is useful if
you want to use ORDER BY with this type
you want to use ORDER BY with this type.
</para>
</sect2>
<sect2>
<title>Notes</title>
<para>
For examples of usage, see the regression test <filename>sql/seg.sql</>.
</para>
<para>
There are a few other potentially useful functions defined in seg.c
that vanished from the schema because I stopped using them. Some of
these were meant to support type casting. Let me know if I was wrong:
I will then add them back to the schema. I would also appreciate
other ideas that would enhance the type and make it more useful.
The mechanism that converts <literal>(+-)</> to regular ranges
isn't completely accurate in determining the number of significant digits
for the boundaries. For example, it adds an extra digit to the lower
boundary if the resulting interval includes a power of ten:
<programlisting>
postgres=> select '10(+-)1'::seg as seg;
seg
---------
9.0 .. 11 -- should be: 9 .. 11
</programlisting>
</para>
<para>
The performance of an R-tree index can largely depend on the initial
order of input values. It may be very helpful to sort the input table
on the <type>seg</> column; see the script <filename>sort-segments.pl</>
for an example.
</para>
</sect2>
<sect2>
<title>Credits</title>
<para>
Original author: Gene Selkov, Jr. <email>selkovjr@mcs.anl.gov</email>,
Mathematics and Computer Science Division, Argonne National Laboratory.
</para>
<para>
My thanks are primarily to Prof. Joe Hellerstein
(<ulink url="http://db.cs.berkeley.edu/~jmh/"></ulink>) for elucidating the
gist of the GiST (<ulink url="http://gist.cs.berkeley.edu/"></ulink>). I am
also grateful to all Postgres developers, present and past, for enabling
myself to create my own world and live undisturbed in it. And I would like
to acknowledge my gratitude to Argonne Lab and to the U.S. Department of
Energy for the years of faithful support of my database research.
</para>
<programlisting>
Gene Selkov, Jr.
Computational Scientist
Mathematics and Computer Science Division
Argonne National Laboratory
9700 S Cass Ave.
Building 221
Argonne, IL 60439-4844
</programlisting>
<para>
<email>selkovjr@mcs.anl.gov</email>
</para>
</sect2>
</sect1>

View File

@ -1,3 +1,4 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/sslinfo.sgml,v 1.3 2007/12/06 04:12:10 tgl Exp $ -->
<sect1 id="sslinfo">
<title>sslinfo</title>
@ -7,105 +8,119 @@
</indexterm>
<para>
The <filename>sslinfo</> module provides information about the SSL
certificate that the current client provided when connecting to
<productname>PostgreSQL</>. The module is useless (most functions
will return NULL) if the current connection does not use SSL.
</para>
<para>
This extension won't build at all unless the installation was
configured with <literal>--with-openssl</>.
</para>
<sect2>
<title>Functions Provided</title>
<variablelist>
<varlistentry>
<term><function>
ssl_is_used() returns boolean
</function></term>
<listitem>
<para>
Returns TRUE if current connection to server uses SSL, and FALSE
otherwise.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><function>
ssl_client_cert_present() returns boolean
</function></term>
<listitem>
<para>
Returns TRUE if current client has presented a valid SSL client
certificate to the server, and FALSE otherwise. (The server
might or might not be configured to require a client certificate.)
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><function>
ssl_client_serial() returns numeric
</function></term>
<listitem>
<para>
Returns serial number of current client certificate. The combination of
certificate serial number and certificate issuer is guaranteed to
uniquely identify a certificate (but not its owner &mdash; the owner
ought to regularly change his keys, and get new certificates from the
issuer).
</para>
<para>
So, if you run your own CA and allow only certificates from this CA to
be accepted by the server, the serial number is the most reliable (albeit
not very mnemonic) means to identify a user.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><function>
ssl_client_dn() returns text
</function></term>
<listitem>
<para>
Returns the full subject of the current client certificate, converting
character data into the current database encoding. It is assumed that
if you use non-ASCII characters in the certificate names, your
database is able to represent these characters, too. If your database
uses the SQL_ASCII encoding, non-ASCII characters in the name will be
represented as UTF-8 sequences.
</para>
<para>
The result looks like <literal>/CN=Somebody /C=Some country/O=Some organization</>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><function>
ssl_issuer_dn() returns text
</function></term>
<listitem>
<para>
Returns the full issuer name of the current client certificate, converting
character data into the current database encoding. Encoding conversions
are handled the same as for <function>ssl_client_dn</>.
</para>
<para>
The combination of the return value of this function with the
certificate serial number uniquely identifies the certificate.
</para>
<para>
This function is really useful only if you have more than one trusted CA
certificate in your server's <filename>root.crt</> file, or if this CA
has issued some intermediate certificate authority certificates.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><function>
ssl_client_dn_field(fieldname text) returns text
</function></term>
<listitem>
<para>
This function returns the value of the specified field in the
certificate subject, or NULL if the field is not present.
Field names are string constants that are
converted into ASN1 object identifiers using the OpenSSL object
database. The following values are acceptable:
</para>
<programlisting>
@ -127,38 +142,46 @@ generationQualifier
description
dnQualifier
x500UniqueIdentifier
pseudonym
role
emailAddress
</programlisting>
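<para>
For example, to fetch the common name from the connected client's
certificate (a usage sketch; the result depends entirely on your
certificates, and is NULL for non-SSL connections):
</para>
<programlisting>
SELECT ssl_client_dn_field('commonName');
</programlisting>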
<para>
All of these fields are optional, except <structfield>commonName</>.
It depends entirely on your CA's policy which of them would be
included and which wouldn't. The meaning of these fields, however,
is strictly defined by the X.500 and X.509 standards, so you cannot
just assign arbitrary meaning to them.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><function>
ssl_issuer_field(fieldname text) returns text
</function></term>
<listitem>
<para>
Same as <function>ssl_client_dn_field</>, but for the certificate issuer
rather than the certificate subject.
</para>
</listitem>
</varlistentry>
</variablelist>
</sect2>
<sect2>
<title>Author</title>
<para>
Victor Wagner <email>vitus@cryptocom.ru</email>, Cryptocom LTD
</para>
<para>
E-Mail of Cryptocom OpenSSL development group:
<email>openssl@cryptocom.ru</email>
</para>
</sect2>
</sect1>

File diff suppressed because it is too large

View File

@ -1,4 +1,4 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/test-parser.sgml,v 1.2 2007/12/06 04:12:10 tgl Exp $ -->
<sect1 id="test-parser">
<title>test_parser</title>
@ -8,11 +8,14 @@
</indexterm>
<para>
<filename>test_parser</> is an example of a custom parser for full-text
search. It doesn't do anything especially useful, but can serve as
a starting point for developing your own parser.
</para>
<para>
<filename>test_parser</> recognizes words separated by white space,
and returns just two token types:
<programlisting>
mydb=# SELECT * FROM ts_token_type('testparser');

View File

@ -1,3 +1,5 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/tsearch2.sgml,v 1.2 2007/12/06 04:12:10 tgl Exp $ -->
<sect1 id="tsearch2">
<title>tsearch2</title>

View File

@ -1,3 +1,4 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/uuid-ossp.sgml,v 1.2 2007/12/06 04:12:10 tgl Exp $ -->
<sect1 id="uuid-ossp">
<title>uuid-ossp</title>
@ -7,13 +8,19 @@
</indexterm>
<para>
The <filename>uuid-ossp</> module provides functions to generate universally
unique identifiers (UUIDs) using one of several standard algorithms. There
are also functions to produce certain special UUID constants.
</para>
<para>
This module depends on the OSSP UUID library, which can be found at
<ulink url="http://www.ossp.org/pkg/lib/uuid/"></ulink>.
</para>
<sect2>
<title><literal>uuid-ossp</literal> Functions</title>
<para>
The relevant standards ITU-T Rec. X.667, ISO/IEC 9834-8:2005, and RFC
4122 specify four algorithms for generating UUIDs, identified by the
@ -23,7 +30,7 @@
</para>
<table>
<title>Functions for UUID Generation</title>
<tgroup cols="2">
<thead>
<row>
@ -59,22 +66,9 @@
<para>
This function generates a version 3 UUID in the given namespace using
the specified input name. The namespace should be one of the special
constants produced by the <function>uuid_ns_*()</> functions shown
below. (It could be any UUID in theory.) The name is an identifier
in the selected namespace.
</para>
</entry>
</row>
@ -102,15 +96,28 @@
</tgroup>
</table>
<para>
For example:
<programlisting>
SELECT uuid_generate_v3(uuid_ns_url(), 'http://www.postgresql.org');
</programlisting>
The name parameter will be MD5-hashed, so the cleartext cannot be
derived from the generated UUID.
The generation of UUIDs by this method has no random or
environment-dependent element and is therefore reproducible.
</para>
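<para>
By contrast, a version 4 UUID is derived entirely from random numbers,
so the generator function takes no parameters at all:
</para>
<programlisting>
SELECT uuid_generate_v4();
</programlisting>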
<table>
<title>Functions Returning UUID Constants</title>
<tgroup cols="2">
<tbody>
<row>
<entry><literal>uuid_nil()</literal></entry>
<entry>
<para>
A <quote>nil</> UUID constant, which does not occur as a real UUID.
</para>
</entry>
</row>
@ -135,8 +142,8 @@
<entry>
<para>
Constant designating the ISO object identifier (OID) namespace for
UUIDs. (This pertains to ASN.1 OIDs, which are unrelated to the OIDs
used in <productname>PostgreSQL</>.)
</para>
</entry>
</row>
@ -153,11 +160,14 @@
</tgroup>
</table>
</sect2>
<sect2>
<title>Author</title>
<para>
Peter Eisentraut <email>peter_e@gmx.net</email>
</para>
</sect2>
</sect1>

View File

@ -1,3 +1,5 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/vacuumlo.sgml,v 1.2 2007/12/06 04:12:10 tgl Exp $ -->
<sect1 id="vacuumlo">
<title>vacuumlo</title>
@ -6,69 +8,103 @@
</indexterm>
<para>
<application>vacuumlo</> is a simple utility program that will remove any
<quote>orphaned</> large objects from a
<productname>PostgreSQL</> database. An orphaned large object (LO) is
considered to be any LO whose OID does not appear in any <type>oid</> or
<type>lo</> data column of the database.
</para>
<para>
If you use this, you may also be interested in the <function>lo_manage</>
trigger in <filename>contrib/lo</> (see <xref linkend="lo">).
<function>lo_manage</> is useful to try
to avoid creating orphaned LOs in the first place.
</para>
<sect2>
<title>Usage</title>
<synopsis>
vacuumlo [options] database [database2 ... databaseN]
</synopsis>
<para>
All databases named on the command line are processed. Available options
include:
</para>
<variablelist>
<varlistentry>
<term><option>-v</option></term>
<listitem>
<para>Write a lot of progress messages</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-n</option></term>
<listitem>
<para>Don't remove anything, just show what would be done</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-U</option> <replaceable>username</></term>
<listitem>
<para>Username to connect as</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-W</option></term>
<listitem>
<para>Force prompt for password (generally useless)</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-h</option> <replaceable>hostname</></term>
<listitem>
<para>Database server's host</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-p</option> <replaceable>port</></term>
<listitem>
<para>Database server's port</para>
</listitem>
</varlistentry>
</variablelist>
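<para>
For example, to report which large objects would be removed from a
database <literal>mydb</> without actually deleting anything:
</para>
<programlisting>
vacuumlo -n -v mydb
</programlisting>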
</sect2>
<sect2>
<title>Method</title>
<para>
First, it builds a temporary table which contains all of the OIDs of the
large objects in that database.
</para>
<para>
It then scans through all columns in the database that are of type
<type>oid</> or <type>lo</>, and removes matching entries from the
temporary table.
</para>
<para>
The remaining entries in the temp table identify orphaned LOs.
These are removed.
</para>
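<para>
In SQL terms, the method is roughly equivalent to the following sketch
(simplified and hypothetical; not the exact statements the program issues):
</para>
<programlisting>
SELECT DISTINCT loid AS lo INTO TEMP TABLE vacuum_l FROM pg_largeobject;
-- then, for each oid or lo column "mycol" of each table "mytab":
DELETE FROM vacuum_l WHERE lo IN (SELECT mycol FROM mytab);
-- any OID still left in vacuum_l is an orphan:
SELECT lo_unlink(lo) FROM vacuum_l;
</programlisting>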
</sect2>
<sect2>
<title>Author</title>
<para>
Peter Mount <email>peter@retep.org.uk</email>
</para>
<para>
<ulink url="http://www.retep.org.uk"></ulink>
</para>
</sect2>
</sect1>

View File

@ -1,31 +1,41 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/xml2.sgml,v 1.4 2007/12/06 04:12:10 tgl Exp $ -->
<sect1 id="xml2">
<title>xml2</title>
<indexterm zone="xml2">
<primary>xml2</primary>
</indexterm>
<para>
The <filename>xml2</> module provides XPath querying and
XSLT functionality.
</para>
<sect2>
<title>Deprecation notice</title>
<para>
From <productname>PostgreSQL</> 8.3 on, there is XML-related
functionality based on the SQL/XML standard in the core server.
That functionality covers XML syntax checking and XPath queries,
which is what this module does, and more, but the API is
not at all compatible. It is planned that this module will be
removed in PostgreSQL 8.4 in favor of the newer standard API, so
you are encouraged to try converting your applications. If you
find that some of the functionality of this module is not
available in an adequate form with the newer API, please explain
your issue to pgsql-hackers@postgresql.org so that the deficiency
can be addressed.
</para>
</sect2>
<sect2>
<title>Description of functions</title>
<para>
These functions provide straightforward XML parsing and XPath queries.
All arguments are of type <type>text</>, so for brevity that is not shown.
</para>
<table>
@ -34,27 +44,27 @@
<tbody>
<row>
<entry>
<synopsis>
xml_is_well_formed(document) returns bool
</synopsis>
</entry>
<entry>
<para>
This parses the document text in its parameter and returns true if the
document is well-formed XML. (Note: before PostgreSQL 8.2, this
function was called <function>xml_valid()</>. That is the wrong name
since validity and well-formedness have different meanings in XML.
The old name is still available, but is deprecated.)
</para>
</entry>
</row>
<row>
<entry>
<synopsis>
xpath_string(document,query) returns text
xpath_number(document,query) returns float4
xpath_bool(document,query) returns bool
</synopsis>
</entry>
<entry>
<para>
@ -65,9 +75,9 @@
</row>
<row>
<entry>
<synopsis>
xpath_nodeset(document,query,toptag,itemtag) returns text
</synopsis>
</entry>
<entry>
<para>
@ -75,10 +85,10 @@
the result is multivalued, the output will look like:
</para>
<literal>
&lt;toptag&gt;
&lt;itemtag&gt;Value 1 which could be an XML fragment&lt;/itemtag&gt;
&lt;itemtag&gt;Value 2....&lt;/itemtag&gt;
&lt;/toptag&gt;
</literal>
<para>
If either toptag or itemtag is an empty string, the relevant tag is omitted.
@ -87,49 +97,51 @@
</row>
<row>
<entry>
<synopsis>
xpath_nodeset(document,query) returns text
</synopsis>
</entry>
<entry>
<para>
Like xpath_nodeset(document,query,toptag,itemtag) but result omits both tags.
</para>
</entry>
</row>
<row>
<entry>
<synopsis>
xpath_nodeset(document,query,itemtag) returns text
</synopsis>
</entry>
<entry>
<para>
Like xpath_nodeset(document,query,toptag,itemtag) but result omits toptag.
</para>
</entry>
</row>
<row>
<entry>
<synopsis>
xpath_list(document,query,separator) returns text
</synopsis>
</entry>
<entry>
<para>
This function returns multiple values separated by the specified
separator, for example <literal>Value 1,Value 2,Value 3</> if
separator is <literal>,</>.
</para>
</entry>
</row>
<row>
<entry>
<synopsis>
xpath_list(document,query) returns text
</synopsis>
</entry>
<entry>
This is a wrapper for the above function that uses <literal>,</>
as the separator.
</entry>
</row>
</tbody>
@ -137,38 +149,37 @@
</table>
</sect2>
<sect2>
<title><literal>xpath_table</literal></title>
<synopsis>
xpath_table(text key, text document, text relation, text xpaths, text criteria) returns setof record
</synopsis>
<para>
<function>xpath_table</> is a table function that evaluates a set of XPath
queries on each of a set of documents and returns the results as a
table. The primary key field from the original document table is returned
as the first column of the result so that the result set
can readily be used in joins.
</para>
<table>
<title>Parameters</title>
<tgroup cols="2">
<tbody>
<row>
<entry><parameter>key</parameter></entry>
<entry>
<para>
the name of the <quote>key</> field &mdash; this is just a field to be used as
the first column of the output table, i.e. it identifies the record from
which each output row came (see note below about multiple values)
</para>
</entry>
</row>
<row>
<entry><parameter>document</parameter></entry>
<entry>
<para>
the name of the field containing the XML document
@ -176,7 +187,7 @@
</entry>
</row>
<row>
<entry><parameter>relation</parameter></entry>
<entry>
<para>
the name of the table or view containing the documents
@ -184,20 +195,20 @@
</entry>
</row>
<row>
<entry><parameter>xpaths</parameter></entry>
<entry>
<para>
one or more XPath expressions, separated by <literal>|</literal>
</para>
</entry>
</row>
<row>
<entry><parameter>criteria</parameter></entry>
<entry>
<para>
the contents of the WHERE clause. This cannot be omitted, so use
<literal>true</literal> or <literal>1=1</literal> if you want to
process all the rows in the relation
</para>
</entry>
</row>
@ -206,19 +217,19 @@
</table>
<para>
These parameters (except the XPath strings) are just substituted
into a plain SQL SELECT statement, so you have some flexibility &mdash; the
statement is
</para>
<para>
<literal>
SELECT &lt;key&gt;, &lt;document&gt; FROM &lt;relation&gt; WHERE &lt;criteria&gt;
</literal>
</para>
<para>
so those parameters can be <emphasis>anything</> valid in those particular
locations. The result from this SELECT needs to return exactly two
columns (which it will unless you try to list multiple fields for key
or document). Beware that this simplistic approach requires that you
@ -226,43 +237,43 @@
</para>
<para>
The function has to be used in a <literal>FROM</> expression, with an
<literal>AS</> clause to specify the output columns; for example
</para>
<programlisting>
SELECT * FROM
xpath_table('article_id',
'article_xml',
'articles',
'/article/author|/article/pages|/article/title',
'date_entered > ''2003-01-01'' ')
AS t(article_id integer, author text, page_count integer, title text);
</programlisting>
<para>
The <literal>AS</> clause defines the names and types of the columns in the
output table. The first is the <quote>key</> field and the rest correspond
to the XPath queries.
If there are more XPath queries than result columns,
the extra queries will be ignored. If there are more result columns
than XPath queries, the extra columns will be NULL.
</para>
<para>
Notice that this example defines the <structname>page_count</> result
column as an integer. The function deals internally with string
representations, so when you say you want an integer in the output, it will
take the string representation of the XPath result and use PostgreSQL input
functions to transform it into an integer (or whatever type the <type>AS</>
clause requests). An error will result if it can't do this &mdash; for
example if the result is empty &mdash; so you may wish to just stick to
<type>text</> as the column type if you think your data has any problems.
</para>
<para>
The calling <command>SELECT</> statement doesn't necessarily have to
be just <literal>SELECT *</> &mdash; it can reference the output
columns by name or join them to other tables. The function produces a
virtual table with which you can perform any operation you wish (e.g.
aggregation, joining, sorting etc). So we could also have:
@ -270,10 +281,10 @@ AS t(article_id integer, author text, page_count integer, title text);
<programlisting>
SELECT t.title, p.fullname, p.email
FROM xpath_table('article_id', 'article_xml', 'articles',
'/article/title|/article/author/@id',
'xpath_string(article_xml,''/article/@date'') > ''2003-03-20'' ')
AS t(article_id integer, title text, author_id integer),
tblPeopleInfo AS p
WHERE t.author_id = p.person_id;
</programlisting>
@ -282,91 +293,74 @@ WHERE t.author_id = p.person_id;
as a more complicated example. Of course, you could wrap all
of this in a view for convenience.
</para>
<sect3>
<title>Multivalued results</title>
<para>
The <function>xpath_table</> function assumes that the results of each XPath query
might be multi-valued, so the number of rows returned by the function
may not be the same as the number of input documents. The first row
returned contains the first result from each query, the second row the
second result from each query. If one of the queries has fewer values
than the others, NULLs will be returned instead.
</para>
<para>
In some cases, a user will know that a given XPath query will return
only a single result (perhaps a unique document identifier) &mdash; if used
alongside an XPath query returning multiple results, the single-valued
result will appear only on the first row of the result. The solution
to this is to use the key field as part of a join against a simpler
XPath query. As an example:
</para>
<programlisting>
CREATE TABLE test (
id int4 NOT NULL,
xml text,
CONSTRAINT pk PRIMARY KEY (id)
);
INSERT INTO test VALUES (1, '&lt;doc num="C1"&gt;
&lt;line num="L1"&gt;&lt;a&gt;1&lt;/a&gt;&lt;b&gt;2&lt;/b&gt;&lt;c&gt;3&lt;/c&gt;&lt;/line&gt;
&lt;line num="L2"&gt;&lt;a&gt;11&lt;/a&gt;&lt;b&gt;22&lt;/b&gt;&lt;c&gt;33&lt;/c&gt;&lt;/line&gt;
&lt;/doc&gt;');
INSERT INTO test VALUES (2, '&lt;doc num="C2"&gt;
&lt;line num="L1"&gt;&lt;a&gt;111&lt;/a&gt;&lt;b&gt;222&lt;/b&gt;&lt;c&gt;333&lt;/c&gt;&lt;/line&gt;
&lt;line num="L2"&gt;&lt;a&gt;111&lt;/a&gt;&lt;b&gt;222&lt;/b&gt;&lt;c&gt;333&lt;/c&gt;&lt;/line&gt;
&lt;/doc&gt;');
SELECT * FROM
xpath_table('id','xml','test',
'/doc/@num|/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c',
'true')
AS t(id int4, doc_num varchar(10), line_num varchar(10), val1 int4, val2 int4, val3 int4)
WHERE id = 1 ORDER BY doc_num, line_num;

id | doc_num | line_num | val1 | val2 | val3
----+---------+----------+------+------+------
1 | C1 | L1 | 1 | 2 | 3
1 | | L2 | 11 | 22 | 33
</programlisting>
<para>
To get <literal>doc_num</> on every line, the solution is to use two invocations
of <function>xpath_table</> and join the results:
</para>
<programlisting>
SELECT t.*,i.doc_num FROM
xpath_table('id','xml','test',
'/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c','1=1')
AS t(id int4, line_num varchar(10), val1 int4, val2 int4, val3 int4),
xpath_table('id','xml','test','/doc/@num','1=1')
AS i(id int4, doc_num varchar(10))
xpath_table('id', 'xml', 'test',
'/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c',
'true')
AS t(id int4, line_num varchar(10), val1 int4, val2 int4, val3 int4),
xpath_table('id', 'xml', 'test', '/doc/@num', 'true')
AS i(id int4, doc_num varchar(10))
WHERE i.id=t.id AND i.id=1
ORDER BY doc_num, line_num;
</programlisting>
<para>
which gives the desired result:
</para>
<programlisting>
id | line_num | val1 | val2 | val3 | doc_num
----+----------+------+------+------+---------
1 | L1 | 1 | 2 | 3 | C1
  1 | L2       |   11 |   22 |   33 | C1
</programlisting>
</sect3>
</sect2>
<sect2>
<title>XSLT functions</title>
<para>
The following functions are available if libxslt is installed (this is
not currently detected automatically, so you will have to amend the
Makefile):
</para>
<sect3>
<title><literal>xslt_process</literal></title>
<synopsis>
xslt_process(text document, text stylesheet, text paramlist) returns text
</synopsis>
<para>
This function applies the XSL stylesheet to the document and returns
the transformed result. The paramlist is a list of parameter
assignments to be used in the transformation, specified in the form
<literal>a=1,b=2</>. Note that the
parameter parsing is very simple-minded: parameter values cannot
contain commas!
</para>
<para>
Also note that if either the document or stylesheet values do not
begin with a &lt; then they will be treated as URLs and libxslt will
fetch them. It follows that you can use <function>xslt_process</> as a
means to fetch the contents of URLs &mdash; you should be aware of the
security implications of this.
</para>
<para>
There is also a two-parameter version of <function>xslt_process</> which
does not pass any parameters to the transformation.
</para>
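<para>
 As an illustration (the document, stylesheet, and parameter name below
 are invented purely for this example), a call passing one stylesheet
 parameter might look like:
</para>
<programlisting>
SELECT xslt_process('&lt;doc&gt;&lt;v&gt;1&lt;/v&gt;&lt;/doc&gt;',
'&lt;xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;
  &lt;xsl:param name="n"&gt;0&lt;/xsl:param&gt;
  &lt;xsl:template match="/"&gt;
    &lt;out&gt;&lt;xsl:value-of select="doc/v + $n"/&gt;&lt;/out&gt;
  &lt;/xsl:template&gt;
&lt;/xsl:stylesheet&gt;',
'n=2');
</programlisting>
<para>
 This overrides the stylesheet's default value of <literal>n</literal>
 with <literal>2</literal> and returns the transformed document; the
 two-parameter form would omit the <literal>'n=2'</literal> argument, so
 the parameter's default value would be used instead.
</para>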
</sect3>
</sect2>
<sect2>
<title>Author</title>
<para>
John Gray <email>jgray@azuli.co.uk</email>
</para>
<para>
Development of this module was sponsored by Torchbox Ltd. (www.torchbox.com).
It has the same BSD licence as PostgreSQL.
</para>
</sect2>
</sect1>