mirror of https://github.com/postgres/postgres.git synced 2025-07-27 12:41:57 +03:00

Move most /contrib README files into SGML. Some still need conversion
or will never be converted.
This commit is contained in:
Bruce Momjian
2007-11-10 23:30:46 +00:00
parent 6e414a171e
commit c3c69ab4fd
60 changed files with 9280 additions and 5635 deletions

@@ -0,0 +1,32 @@
<sect1>
<title>adminpack</title>
<para>
adminpack is a standard PostgreSQL module that implements a number of
support functions which pgAdmin and other administration and management tools
can use to provide additional functionality, if the module is installed on a server.
</para>
<sect2>
<title>Functions implemented</title>
<para>
Functions implemented by adminpack can only be run by a superuser. Here's a
list of these functions:
</para>
<para>
<programlisting>
int8 pg_catalog.pg_file_write(fname text, data text, append bool)
bool pg_catalog.pg_file_rename(oldname text, newname text, archivname text)
bool pg_catalog.pg_file_rename(oldname text, newname text)
bool pg_catalog.pg_file_unlink(fname text)
setof record pg_catalog.pg_logdir_ls()
/* Renaming of existing backend functions for pgAdmin compatibility */
text pg_catalog.pg_file_read(fname text, offset bigint, len bigint)
bigint pg_catalog.pg_file_length(text)
int4 pg_catalog.pg_logfile_rotate()
</programlisting>
</para>
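<para>
For example, a superuser could write a small file under the data directory and
then list the server log directory (a minimal sketch; the file name and contents
are only illustrative):
</para>
<programlisting>
SELECT pg_catalog.pg_file_write('admin_test.txt', 'hello', false);
SELECT * FROM pg_catalog.pg_logdir_ls();
</programlisting>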
</sect2>
</sect1>

@@ -0,0 +1,40 @@
<sect1>
<!--
<indexterm zone="btree-gist">
<primary>btree-gist</primary>
</indexterm>
-->
<title>btree-gist</title>
<para>
btree-gist is a B-Tree implementation using GiST that supports the int2, int4,
int8, float4, float8, timestamp with/without time zone, time
with/without time zone, date, interval, oid, money, macaddr, char,
varchar/text, bytea, numeric, bit, varbit and inet/cidr types.
</para>
<sect2>
<title>Example usage</title>
<programlisting>
CREATE TABLE test (a int4);
-- create index
CREATE INDEX testidx ON test USING gist (a);
-- query
SELECT * FROM test WHERE a &lt; 10;
</programlisting>
</sect2>
<sect2>
<title>Authors</title>
<para>
All work was done by Teodor Sigaev (<email>teodor@stack.net</email>),
Oleg Bartunov (<email>oleg@sai.msu.su</email>) and Janko Richter
(<email>jankorichter@yahoo.de</email>). See
<ulink url="http://www.sai.msu.su/~megera/postgres/gist"></ulink> for additional
information.
</para>
</sect2>
</sect1>

@@ -0,0 +1,121 @@
<sect1 id="buffercache">
<title>pg_buffercache</title>
<indexterm zone="buffercache">
<primary>pg_buffercache</primary>
</indexterm>
<para>
The <literal>pg_buffercache</literal> module provides a means of examining
what is happening in the buffer cache at any given time, without having to
restart or rebuild the server with debugging code added. The intent is to
do for the buffer cache what pg_locks does for locks.
</para>
<para>
This module consists of a C function <literal>pg_buffercache_pages()</literal>
that returns a set of records, plus a view <literal>pg_buffercache</literal>
that wraps the function.
</para>
<para>
By default, public access is REVOKED from both of these, just in case there
are security issues lurking.
</para>
<sect2>
<title>Notes</title>
<para>
The definition of the columns exposed in the view is:
</para>
<programlisting>
 Column         |      references      | Description
----------------+----------------------+------------------------------------
 bufferid       |                      | Id, 1..shared_buffers.
 relfilenode    | pg_class.relfilenode | Relfilenode of the relation.
 reltablespace  | pg_tablespace.oid    | Tablespace oid of the relation.
 reldatabase    | pg_database.oid      | Database for the relation.
 relblocknumber |                      | Offset of the page in the relation.
 isdirty        |                      | Is the page dirty?
 usagecount     |                      | Page LRU count.
</programlisting>
<para>
There is one row for each buffer in the shared cache. Unused buffers are
shown with all fields null except bufferid.
</para>
<para>
Because the cache is shared by all the databases, there are pages from
relations not belonging to the current database.
</para>
<para>
When the pg_buffercache view is accessed, internal buffer manager locks are
taken, and a copy of the buffer cache data is made for the view to display.
This ensures that the view produces a consistent set of results, while not
blocking normal buffer activity longer than necessary. Nonetheless, there
could be some impact on database performance if this view is read often.
</para>
</sect2>
<sect2>
<title>Sample output</title>
<programlisting>
regression=# \d pg_buffercache;
View "public.pg_buffercache"
Column | Type | Modifiers
----------------+----------+-----------
bufferid | integer |
relfilenode | oid |
reltablespace | oid |
reldatabase | oid |
relblocknumber | bigint |
isdirty | boolean |
usagecount | smallint |
View definition:
SELECT p.bufferid, p.relfilenode, p.reltablespace, p.reldatabase,
p.relblocknumber, p.isdirty, p.usagecount
FROM pg_buffercache_pages() p(bufferid integer, relfilenode oid,
reltablespace oid, reldatabase oid, relblocknumber bigint,
isdirty boolean, usagecount smallint);
regression=# SELECT c.relname, count(*) AS buffers
FROM pg_class c INNER JOIN pg_buffercache b
ON b.relfilenode = c.relfilenode INNER JOIN pg_database d
ON (b.reldatabase = d.oid AND d.datname = current_database())
GROUP BY c.relname
ORDER BY 2 DESC LIMIT 10;
             relname             | buffers
---------------------------------+---------
 tenk2                           |     345
 tenk1                           |     141
 pg_proc                         |      46
 pg_class                        |      45
 pg_attribute                    |      43
 pg_class_relname_nsp_index      |      30
 pg_proc_proname_args_nsp_index  |      28
 pg_attribute_relid_attnam_index |      26
 pg_depend                       |      22
 pg_depend_reference_index       |      20
(10 rows)
regression=#
</programlisting>
</sect2>
<sect2>
<title>Authors</title>
<itemizedlist>
<listitem>
<para>
Mark Kirkwood <email>markir@paradise.net.nz</email>
</para>
</listitem>
<listitem>
<para>Design suggestions: Neil Conway <email>neilc@samurai.com</email></para>
</listitem>
<listitem>
<para>Debugging advice: Tom Lane <email>tgl@sss.pgh.pa.us</email></para>
</listitem>
</itemizedlist>
</sect2>
</sect1>

doc/src/sgml/chkpass.sgml Normal file
@@ -0,0 +1,84 @@
<sect1 id="chkpass">
<title>chkpass</title>
<!--
<indexterm zone="chkpass">
<primary>chkpass</primary>
</indexterm>
-->
<para>
chkpass is a password type that is automatically checked and converted upon
entry. It is stored encrypted. To compare, simply compare against a clear
text password and the comparison function will encrypt it before comparing.
The code would also return an error if it determined that the password is
easily crackable; however, this check is currently a stub that does nothing.
</para>
<para>
Note that the chkpass data type is not indexable.
<!--
I haven't worried about making this type indexable. I doubt that anyone
would ever need to sort a file in order of encrypted password.
-->
</para>
<para>
If you precede the string with a colon, the encryption and checking are
skipped so that you can enter existing passwords into the field.
</para>
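<para>
For example (a sketch assuming a table <literal>test</literal> with a chkpass
column, as in the sample usage below), an already-encrypted value can be loaded
verbatim:
</para>
<programlisting>
INSERT INTO test VALUES (':dVGkpXdOrE3ko');
</programlisting>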
<para>
On output, a colon is prepended. This makes it possible to dump and reload
passwords without re-encrypting them. If you want the password (encrypted)
without the colon then use the raw() function. This allows you to use the
type with things like Apache's Auth_PostgreSQL module.
</para>
<para>
The encryption uses the standard Unix function crypt(), and so it suffers
from all the usual limitations of that function; notably, only the
first eight characters of a password are considered.
</para>
<para>
Here is some sample usage:
</para>
<programlisting>
test=# create table test (p chkpass);
CREATE TABLE
test=# insert into test values ('hello');
INSERT 0 1
test=# select * from test;
p
----------------
:dVGkpXdOrE3ko
(1 row)
test=# select raw(p) from test;
raw
---------------
dVGkpXdOrE3ko
(1 row)
test=# select p = 'hello' from test;
?column?
----------
t
(1 row)
test=# select p = 'goodbye' from test;
?column?
----------
f
(1 row)
</programlisting>
<sect2>
<title>Author</title>
<para>
D'Arcy J.M. Cain <email>darcy@druid.net</email>
</para>
</sect2>
</sect1>

doc/src/sgml/contrib.sgml Normal file
@@ -0,0 +1,56 @@
<chapter id="contrib">
<title>Standard Modules</title>
<para>
This chapter contains information regarding the standard modules that
can be found in the <literal>contrib</literal> directory of the
PostgreSQL distribution. These are porting tools, analysis utilities,
and plug-in features that are not part of the core PostgreSQL system,
mainly because they address a limited audience or are too experimental
to be part of the main source tree. This does not preclude their
usefulness.
</para>
<para>
Some modules supply new user-defined functions, operators, or types. In
these cases, you will need to run <literal>make</literal> and <literal>make
install</literal> in <literal>contrib/module</literal>. After you have
installed the files you need to register the new entities in the database
system by running the commands in the supplied .sql file. For example,
<programlisting>
$ psql -d dbname -f module.sql
</programlisting>
</para>
&adminpack;
&btree-gist;
&chkpass;
&cube;
&dblink;
&earthdistance;
&fuzzystrmatch;
&hstore;
&intagg;
&intarray;
&isn;
&lo;
&ltree;
&oid2name;
&pageinspect;
&pgbench;
&buffercache;
&pgcrypto;
&freespacemap;
&pgrowlocks;
&standby;
&pgstattuple;
&trgm;
&seg;
&sslinfo;
&tablefunc;
&uuid-ossp;
&vacuumlo;
&xml2;
</chapter>

doc/src/sgml/cube.sgml Normal file
@@ -0,0 +1,529 @@
<sect1 id="cube">
<title>cube</title>
<indexterm zone="cube">
<primary>cube</primary>
</indexterm>
<para>
This module contains the user-defined type, CUBE, representing
multidimensional cubes.
</para>
<sect2>
<title>Syntax</title>
<para>
The following are valid external representations for the CUBE type:
</para>
<table>
<title>Cube external representations</title>
<tgroup cols="2">
<tbody>
<row>
<entry>'x'</entry>
<entry>A floating point value representing a one-dimensional point or a
zero-length one-dimensional interval
</entry>
</row>
<row>
<entry>'(x)'</entry>
<entry>Same as above</entry>
</row>
<row>
<entry>'x1,x2,x3,...,xn'</entry>
<entry>A point in n-dimensional space, represented internally as a zero
volume box
</entry>
</row>
<row>
<entry>'(x1,x2,x3,...,xn)'</entry>
<entry>Same as above</entry>
</row>
<row>
<entry>'(x),(y)'</entry>
<entry>A 1-D interval starting at x and ending at y or vice versa; the
order does not matter
</entry>
</row>
<row>
<entry>'(x1,...,xn),(y1,...,yn)'</entry>
<entry>n-dimensional box represented by a pair of its opposite corners, no
matter which. Functions take care of swapping to achieve "lower left --
upper right" representation before computing any values
</entry>
</row>
</tbody>
</tgroup>
</table>
</sect2>
<sect2>
<title>Grammar</title>
<table>
<title>Cube Grammar Rules</title>
<tgroup cols="2">
<tbody>
<row>
<entry>rule 1</entry>
<entry>box -> O_BRACKET paren_list COMMA paren_list C_BRACKET</entry>
</row>
<row>
<entry>rule 2</entry>
<entry>box -> paren_list COMMA paren_list</entry>
</row>
<row>
<entry>rule 3</entry>
<entry>box -> paren_list</entry>
</row>
<row>
<entry>rule 4</entry>
<entry>box -> list</entry>
</row>
<row>
<entry>rule 5</entry>
<entry>paren_list -> O_PAREN list C_PAREN</entry>
</row>
<row>
<entry>rule 6</entry>
<entry>list -> FLOAT</entry>
</row>
<row>
<entry>rule 7</entry>
<entry>list -> list COMMA FLOAT</entry>
</row>
</tbody>
</tgroup>
</table>
</sect2>
<sect2>
<title>Tokens</title>
<table>
<title>Cube Tokens</title>
<tgroup cols="2">
<tbody>
<row>
<entry>n</entry>
<entry>[0-9]+</entry>
</row>
<row>
<entry>integer</entry>
<entry>[+-]?{n}</entry>
</row>
<row>
<entry>real</entry>
<entry>[+-]?({n}\.{n}?|\.{n})</entry>
</row>
<row>
<entry>FLOAT</entry>
<entry>({integer}|{real})([eE]{integer})?</entry>
</row>
<row>
<entry>O_BRACKET</entry>
<entry>\[</entry>
</row>
<row>
<entry>C_BRACKET</entry>
<entry>\]</entry>
</row>
<row>
<entry>O_PAREN</entry>
<entry>\(</entry>
</row>
<row>
<entry>C_PAREN</entry>
<entry>\)</entry>
</row>
<row>
<entry>COMMA</entry>
<entry>\,</entry>
</row>
</tbody>
</tgroup>
</table>
</sect2>
<sect2>
<title>Examples</title>
<table>
<title>Examples</title>
<tgroup cols="2">
<tbody>
<row>
<entry>'x'</entry>
<entry>A floating point value representing a one-dimensional point
(or, zero-length one-dimensional interval)
</entry>
</row>
<row>
<entry>'(x)'</entry>
<entry>Same as above</entry>
</row>
<row>
<entry>'x1,x2,x3,...,xn'</entry>
<entry>A point in n-dimensional space, represented internally as a zero
volume cube
</entry>
</row>
<row>
<entry>'(x1,x2,x3,...,xn)'</entry>
<entry>Same as above</entry>
</row>
<row>
<entry>'(x),(y)'</entry>
<entry>A 1-D interval starting at x and ending at y or vice versa; the
order does not matter
</entry>
</row>
<row>
<entry>'[(x),(y)]'</entry>
<entry>Same as above</entry>
</row>
<row>
<entry>'(x1,...,xn),(y1,...,yn)'</entry>
<entry>An n-dimensional box represented by a pair of its diagonally
opposite corners, regardless of order. All comparison routines
swap the corners as needed to obtain the
"lower left -- upper right" representation
before the actual comparison takes place.
</entry>
</row>
<row>
<entry>'[(x1,...,xn),(y1,...,yn)]'</entry>
<entry>Same as above</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
White space is ignored, so '[(x),(y)]' can be: '[ ( x ), ( y ) ]'
</para>
</sect2>
<sect2>
<title>Defaults</title>
<para>
I believe this union:
</para>
<programlisting>
select cube_union('(0,5,2),(2,3,1)','0');
cube_union
-------------------
(0, 0, 0),(2, 5, 2)
(1 row)
</programlisting>
<para>
does not contradict common sense, and neither does the intersection
</para>
<programlisting>
select cube_inter('(0,-1),(1,1)','(-2),(2)');
cube_inter
-------------
(0, 0),(1, 0)
(1 row)
</programlisting>
<para>
In all binary operations on differently sized boxes, I assume the smaller
one to be a Cartesian projection, i.e., having zeroes in place of coordinates
omitted in the string representation. The above examples are equivalent to:
</para>
<programlisting>
cube_union('(0,5,2),(2,3,1)','(0,0,0),(0,0,0)');
cube_inter('(0,-1),(1,1)','(-2,0),(2,0)');
</programlisting>
<para>
The following containment predicate uses the point syntax,
while in fact the second argument is internally represented by a box.
This syntax makes it unnecessary to define the special Point type
and functions for (box,point) predicates.
</para>
<programlisting>
select cube_contains('(0,0),(1,1)', '0.5,0.5');
cube_contains
--------------
t
(1 row)
</programlisting>
</sect2>
<sect2>
<title>Precision</title>
<para>
Values are stored internally as 64-bit floating point numbers. This means that
numbers with more than about 16 significant digits will be truncated.
</para>
</sect2>
<sect2>
<title>Usage</title>
<para>
The access method for CUBE is a GiST index (gist_cube_ops), which is a
generalization of the R-tree. GiSTs allow the PostgreSQL implementation of
R-tree, originally encoded to support 2-D geometric types such as
boxes and polygons, to be used with any data type whose data domain
can be partitioned using the concepts of containment, intersection and
equality. In other words, everything that can intersect or contain
its own kind can be indexed with a GiST. That includes, among other
things, all geometric data types, regardless of their dimensionality
(see also contrib/seg).
</para>
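<para>
An index on a cube column can be created and used as follows (a minimal
sketch; the table and column names are only illustrative):
</para>
<programlisting>
CREATE TABLE boxes (b cube);
-- create a GiST index using the cube operator class
CREATE INDEX boxes_idx ON boxes USING gist (b);
-- find cubes overlapping a given box
SELECT * FROM boxes WHERE b &amp;&amp; '(0,0),(2,2)'::cube;
</programlisting>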
<para>
The operators supported by the GiST access method include:
</para>
<programlisting>
a = b Same as
</programlisting>
<para>
The cubements a and b are identical.
</para>
<programlisting>
a &amp;&amp; b Overlaps
</programlisting>
<para>
The cubements a and b overlap.
</para>
<programlisting>
a @> b Contains
</programlisting>
<para>
The cubement a contains the cubement b.
</para>
<programlisting>
a &lt;@ b Contained in
</programlisting>
<para>
The cubement a is contained in b.
</para>
<para>
(Before PostgreSQL 8.2, the containment operators @> and &lt;@ were
respectively called @ and ~. These names are still available, but are
deprecated and will eventually be retired. Notice that the old names
are reversed from the convention formerly followed by the core geometric
datatypes!)
</para>
<para>
Although the mnemonics of the following operators are questionable, I
have preserved them to maintain visual consistency with other geometric
data types defined in PostgreSQL.
</para>
<para>
Other operators:
</para>
<programlisting>
[a, b] &lt; [c, d] Less than
[a, b] > [c, d] Greater than
</programlisting>
<para>
These operators are of little practical use beyond sorting. They first
compare (a) to (c), and if these are equal, compare (b) to (d). That
results in reasonably good sorting in most cases, which is useful if
you want to use ORDER BY with this type.
</para>
<para>
The following functions are available:
</para>
<table>
<title>Functions available</title>
<tgroup cols="2">
<tbody>
<row>
<entry><literal>cube_distance(cube, cube) returns double</literal></entry>
<entry>cube_distance returns the distance between two cubes. If both
cubes are points, this is the normal distance function.
</entry>
</row>
<row>
<entry><literal>cube(float8) returns cube</literal></entry>
<entry>This makes a one dimensional cube with both coordinates the same.
If the type of the argument is a numeric type other than float8 an
explicit cast to float8 may be needed.
<literal>cube(1) == '(1)'</literal>
</entry>
</row>
<row>
<entry><literal>cube(float8, float8) returns cube</literal></entry>
<entry>
This makes a one dimensional cube.
<literal>cube(1,2) == '(1),(2)'</literal>
</entry>
</row>
<row>
<entry><literal>cube(float8[]) returns cube</literal></entry>
<entry>This makes a zero-volume cube using the coordinates
defined by the array. <literal>cube(ARRAY[1,2]) == '(1,2)'</literal>
</entry>
</row>
<row>
<entry><literal>cube(float8[], float8[]) returns cube</literal></entry>
<entry>This makes a cube, with upper right and lower left
coordinates as defined by the 2 float arrays. Arrays must be of the
same length.
<literal>cube('{1,2}'::float[], '{3,4}'::float[]) == '(1,2),(3,4)'
</literal>
</entry>
</row>
<row>
<entry><literal>cube(cube, float8) returns cube</literal></entry>
<entry>This builds a new cube by adding a dimension on to an
existing cube with the same values for both parts of the new coordinate.
This is useful for building cubes piece by piece from calculated values.
<literal>cube('(1)',2) == '(1,2),(1,2)'</literal>
</entry>
</row>
<row>
<entry><literal>cube(cube, float8, float8) returns cube</literal></entry>
<entry>This builds a new cube by adding a dimension on to an
existing cube. This is useful for building cubes piece by piece from
calculated values. <literal>cube('(1,2)',3,4) == '(1,3),(2,4)'</literal>
</entry>
</row>
<row>
<entry><literal>cube_dim(cube) returns int</literal></entry>
<entry>cube_dim returns the number of dimensions stored in the
data structure for a cube. This is useful for constraints on the
dimensions of a cube.
</entry>
</row>
<row>
<entry><literal>cube_ll_coord(cube, int) returns double </literal></entry>
<entry>
cube_ll_coord returns the nth coordinate value for the lower left
corner of a cube. This is useful for doing coordinate transformations.
</entry>
</row>
<row>
<entry><literal>cube_ur_coord(cube, int) returns double
</literal></entry>
<entry>cube_ur_coord returns the nth coordinate value for the
upper right corner of a cube. This is useful for doing coordinate
transformations.
</entry>
</row>
<row>
<entry><literal>cube_subset(cube, int[]) returns cube
</literal></entry>
<entry>Builds a new cube from an existing cube, using a list of
dimension indexes
from an array. It can be used to find both the ll and ur coordinates of a
single dimension, e.g.: cube_subset(cube('(1,3,5),(6,7,8)'), ARRAY[2]) = '(3),(7)'.
It can also be used to drop dimensions, or reorder them as desired, e.g.:
cube_subset(cube('(1,3,5),(6,7,8)'), ARRAY[3,2,1,1]) =
'(5, 3, 1, 1),(8, 7, 6, 6)'
</entry>
</row>
<row>
<entry><literal>cube_is_point(cube) returns bool</literal></entry>
<entry>cube_is_point returns true if a cube is also a point.
This is true when the two defining corners are the same.</entry>
</row>
<row>
<entry><literal>cube_enlarge(cube, double, int) returns cube</literal></entry>
<entry>
cube_enlarge increases the size of a cube by a specified
radius in at least
n dimensions. If the radius is negative the box is shrunk instead. This
is useful for creating bounding boxes around a point for searching for
nearby points. All defined dimensions are changed by the radius. If n
is greater than the number of defined dimensions and the cube is being
increased (r >= 0) then 0 is used as the base for the extra coordinates.
LL coordinates are decreased by r and UR coordinates are increased by r.
If a LL coordinate is increased to larger than the corresponding UR
coordinate (this can only happen when r &lt; 0) then both coordinates are
set to their average. To make it harder for people to break things, there
is an effective maximum of 100 on the dimension of cubes. This is set
in cubedata.h if you need something bigger.
</entry>
</row>
</tbody>
</tgroup>
</table>
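<para>
A few of these functions in action (a brief sketch; the results follow
directly from the definitions above):
</para>
<programlisting>
SELECT cube_dim('(1,2),(3,4)'::cube);          -- 2
SELECT cube_ll_coord('(1,2),(3,4)'::cube, 2);  -- 2
-- grow the box by radius 1 in 2 dimensions
SELECT cube_enlarge('(1,2),(3,4)'::cube, 1, 2);
</programlisting>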
<para>
There are a few other potentially useful functions defined in cube.c
that vanished from the schema because I stopped using them. Some of
these were meant to support type casting. Let me know if I was wrong:
I will then add them back to the schema. I would also appreciate
other ideas that would enhance the type and make it more useful.
</para>
<para>
For examples of usage, see sql/cube.sql
</para>
</sect2>
<sect2>
<title>Credits</title>
<para>
This code is essentially based on the example written for
Illustra, <ulink url="http://garcia.me.berkeley.edu/~adong/rtree"></ulink>
</para>
<para>
My thanks are primarily to Prof. Joe Hellerstein
(<ulink url="http://db.cs.berkeley.edu/~jmh/"></ulink>) for elucidating the
gist of the GiST (<ulink url="http://gist.cs.berkeley.edu/"></ulink>), and
to his former student, Andy Dong
(<ulink url="http://best.me.berkeley.edu/~adong/"></ulink>), for his exemplar.
I am also grateful to all postgres developers, present and past, for enabling
myself to create my own world and live undisturbed in it. And I would like to
acknowledge my gratitude to Argonne Lab and to the U.S. Department of Energy
for the years of faithful support of my database research.
</para>
<para>
Gene Selkov, Jr.
Computational Scientist
Mathematics and Computer Science Division
Argonne National Laboratory
9700 S Cass Ave.
Building 221
Argonne, IL 60439-4844
<email>selkovjr@mcs.anl.gov</email>
</para>
<para>
Minor updates to this package were made by Bruno Wolff III
<email>bruno@wolff.to</email> in August/September of 2002. These include
changing the precision from single precision to double precision and adding
some new functions.
</para>
<para>
Additional updates were made by Joshua Reich <email>josh@root.net</email> in
July 2006. These include <literal>cube(float8[], float8[])</literal> and
cleaning up the code to use the V1 call protocol instead of the deprecated V0
form.
</para>
</sect2>
</sect1>

doc/src/sgml/dblink.sgml Normal file

File diff suppressed because it is too large.

@@ -0,0 +1,133 @@
<sect1 id="earthdistance">
<title>earthdistance</title>
<indexterm zone="earthdistance">
<primary>earthdistance</primary>
</indexterm>
<para>
This module contains two different approaches to calculating
great circle distances on the surface of the Earth. The one described
first depends on the contrib/cube package (which MUST be installed before
earthdistance is installed). The second one is based on the point
datatype using latitude and longitude for the coordinates. The install
script makes the defined functions executable by anyone.
</para>
<para>
A spherical model of the Earth is used.
</para>
<para>
Data is stored in cubes that are points (both corners are the same) using 3
coordinates representing the distance from the center of the Earth.
</para>
<para>
The radius of the Earth is obtained from the earth() function. It is
given in meters. But by changing this one function you can change the
module to use some other units, or to use a different value of the
radius that you feel is more appropriate.
</para>
<para>
This package also has applications to astronomical databases.
Astronomers will probably want to change earth() to return a radius of
180/pi() so that distances are in degrees.
</para>
<para>
Functions are provided to allow for input in latitude and longitude (in
degrees), to allow for output of latitude and longitude, to calculate
the great circle distance between two points and to easily specify a
bounding box usable for index searches.
</para>
<para>
The functions are all 'sql' functions. If you want to make these functions
executable by other people you will also have to make the referenced
cube functions executable. cube(text), cube(float8), cube(cube,float8),
cube_distance(cube,cube), cube_ll_coord(cube,int) and
cube_enlarge(cube,float8,int) are used indirectly by the earth distance
functions. is_point(cube) and cube_dim(cube) are used in constraints for data
in domain earth. cube_ur_coord(cube,int) is used in the regression tests and
might be useful for looking at bounding box coordinates in user applications.
</para>
<para>
A domain of type cube named earth is defined.
There are constraints on it defined to make sure the cube is a point,
that it does not have more than 3 dimensions and that it is very near
the surface of a sphere centered about the origin with the radius of
the Earth.
</para>
<para>
The following functions are provided:
</para>
<table id="earthdistance-functions">
<title>EarthDistance functions</title>
<tgroup cols="2">
<tbody>
<row>
<entry><literal>earth()</literal></entry>
<entry>returns the radius of the Earth in meters.</entry>
</row>
<row>
<entry><literal>sec_to_gc(float8)</literal></entry>
<entry>Converts the normal straight line
(secant) distance between two points on the surface of the Earth
to the great circle distance between them.
</entry>
</row>
<row>
<entry><literal>gc_to_sec(float8)</literal></entry>
<entry>Converts the great circle distance
between two points on the surface of the Earth to the normal straight line
(secant) distance between them.
</entry>
</row>
<row>
<entry><literal>ll_to_earth(float8, float8)</literal></entry>
<entry>Returns the location of a point on the surface of the Earth given
its latitude (argument 1) and longitude (argument 2) in degrees.
</entry>
</row>
<row>
<entry><literal>latitude(earth)</literal></entry>
<entry>Returns the latitude in degrees of a point on the surface of the
Earth.
</entry>
</row>
<row>
<entry><literal>longitude(earth)</literal></entry>
<entry>Returns the longitude in degrees of a point on the surface of the
Earth.
</entry>
</row>
<row>
<entry><literal>earth_distance(earth, earth)</literal></entry>
<entry>Returns the great circle distance between two points on the
surface of the Earth.
</entry>
</row>
<row>
<entry><literal>earth_box(earth, float8)</literal></entry>
<entry>Returns a box suitable for an indexed search using the cube @>
operator for points within a given great circle distance of a location.
Some points in this box are further than the specified great circle
distance from the location so a second check using earth_distance
should be made at the same time.
</entry>
</row>
<row>
<entry><literal>&lt;@&gt;</literal> operator</entry>
<entry>Gives the distance in statute miles between
two points on the Earth's surface. Coordinates are in degrees. Points are
taken as (longitude, latitude) and not vice versa, as longitude is closer
to the intuitive idea of the x-axis, and latitude to the y-axis.
</entry>
</row>
</tbody>
</tgroup>
</table>
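<para>
For instance, the cube-based functions can be combined to compute a great
circle distance from latitude/longitude input (a brief sketch; the
coordinates are only illustrative):
</para>
<programlisting>
-- great circle distance in meters between two lat/lon points
SELECT earth_distance(ll_to_earth(41.88, -87.63),
                      ll_to_earth(40.71, -74.01));
</programlisting>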
<para>
One advantage of using the cube representation over a point using latitude and
longitude for coordinates is that you don't have to worry about special
conditions at +/- 180 degrees of longitude or near the poles.
</para>
</sect1>

@@ -1,4 +1,4 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/filelist.sgml,v 1.51 2007/11/01 17:00:18 momjian Exp $ -->
<!-- $PostgreSQL: pgsql/doc/src/sgml/filelist.sgml,v 1.52 2007/11/10 23:30:46 momjian Exp $ -->
<!entity history SYSTEM "history.sgml">
<!entity info SYSTEM "info.sgml">
@@ -89,6 +89,38 @@
<!entity sources SYSTEM "sources.sgml">
<!entity storage SYSTEM "storage.sgml">
<!-- contrib information -->
<!entity contrib SYSTEM "contrib.sgml">
<!entity adminpack SYSTEM "adminpack.sgml">
<!entity btree-gist SYSTEM "btree-gist.sgml">
<!entity chkpass SYSTEM "chkpass.sgml">
<!entity cube SYSTEM "cube.sgml">
<!entity dblink SYSTEM "dblink.sgml">
<!entity earthdistance SYSTEM "earthdistance.sgml">
<!entity fuzzystrmatch SYSTEM "fuzzystrmatch.sgml">
<!entity hstore SYSTEM "hstore.sgml">
<!entity intagg SYSTEM "intagg.sgml">
<!entity intarray SYSTEM "intarray.sgml">
<!entity isn SYSTEM "isn.sgml">
<!entity lo SYSTEM "lo.sgml">
<!entity ltree SYSTEM "ltree.sgml">
<!entity oid2name SYSTEM "oid2name.sgml">
<!entity pageinspect SYSTEM "pageinspect.sgml">
<!entity pgbench SYSTEM "pgbench.sgml">
<!entity buffercache SYSTEM "buffercache.sgml">
<!entity pgcrypto SYSTEM "pgcrypto.sgml">
<!entity freespacemap SYSTEM "freespacemap.sgml">
<!entity pgrowlocks SYSTEM "pgrowlocks.sgml">
<!entity standby SYSTEM "standby.sgml">
<!entity pgstattuple SYSTEM "pgstattuple.sgml">
<!entity trgm SYSTEM "trgm.sgml">
<!entity seg SYSTEM "seg.sgml">
<!entity sslinfo SYSTEM "sslinfo.sgml">
<!entity tablefunc SYSTEM "tablefunc.sgml">
<!entity uuid-ossp SYSTEM "uuid-ossp.sgml">
<!entity vacuumlo SYSTEM "vacuumlo.sgml">
<!entity xml2 SYSTEM "xml2.sgml">
<!-- appendixes -->
<!entity contacts SYSTEM "contacts.sgml">
<!entity cvs SYSTEM "cvs.sgml">

@@ -0,0 +1,243 @@
<sect1 id="pgfreespacemap">
<title>pg_freespacemap</title>
<indexterm zone="pgfreespacemap">
<primary>pg_freespacemap</primary>
</indexterm>
<para>
This module provides the means for examining the free space map (FSM). It
consists of two C functions, <literal>pg_freespacemap_relations()</literal>
and <literal>pg_freespacemap_pages()</literal>, that return sets
of records, plus two views, <literal>pg_freespacemap_relations</literal> and
<literal>pg_freespacemap_pages</literal>, for more user-friendly access to
the functions.
</para>
<para>
The module provides the ability to examine the contents of the free space
map, without having to restart or rebuild the server with additional
debugging code.
</para>
<para>
By default, public access is REVOKED from the functions and views, just in
case there are security issues present in the code.
</para>
<sect2>
<title>Notes</title>
<para>
The definitions for the columns exposed in the views are:
</para>
<table>
<title>pg_freespacemap_relations</title>
<tgroup cols="3">
<thead>
<row>
<entry>Column</entry>
<entry>references</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry>reltablespace</entry>
<entry>pg_tablespace.oid</entry>
<entry>Tablespace oid of the relation.</entry>
</row>
<row>
<entry>reldatabase</entry>
<entry>pg_database.oid</entry>
<entry>Database oid of the relation.</entry>
</row>
<row>
<entry>relfilenode</entry>
<entry>pg_class.relfilenode</entry>
<entry>Relfilenode of the relation.</entry>
</row>
<row>
<entry>avgrequest</entry>
<entry></entry>
<entry>Moving average of free space requests (NULL for indexes)</entry>
</row>
<row>
<entry>interestingpages</entry>
<entry></entry>
<entry>Count of pages last reported as containing useful free space.</entry>
</row>
<row>
<entry>storedpages</entry>
<entry></entry>
<entry>Count of pages actually stored in free space map.</entry>
</row>
<row>
<entry>nextpage</entry>
<entry></entry>
<entry>Page index (from 0) to start next search at.</entry>
</row>
</tbody>
</tgroup>
</table>
<table>
<title>pg_freespacemap_pages</title>
<tgroup cols="3">
<thead>
<row>
<entry>Column</entry>
<entry> references</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry>reltablespace</entry>
<entry>pg_tablespace.oid</entry>
<entry>Tablespace oid of the relation.</entry>
</row>
<row>
<entry>reldatabase</entry>
<entry>pg_database.oid</entry>
<entry>Database oid of the relation.</entry>
</row>
<row>
<entry>relfilenode</entry>
<entry>pg_class.relfilenode</entry>
<entry>Relfilenode of the relation.</entry>
</row>
<row>
<entry>relblocknumber</entry>
<entry></entry>
<entry>Page number in the relation.</entry>
</row>
<row>
<entry>bytes</entry>
<entry></entry>
<entry>Free bytes in the page, or NULL for an index page (see below).</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
For <literal>pg_freespacemap_relations</literal>, there is one row for each
relation in the free space map. <literal>storedpages</literal> is the
number of pages actually stored in the map, while
<literal>interestingpages</literal> is the number of pages the last VACUUM
thought had useful amounts of free space.
</para>
 <para>
  If <literal>storedpages</literal> is consistently less than
  <literal>interestingpages</literal>, it would be a good idea to increase
  <literal>max_fsm_pages</literal>. Also, if the number of rows in
  <literal>pg_freespacemap_relations</literal> is close to
  <literal>max_fsm_relations</literal>, you should consider increasing
  <literal>max_fsm_relations</literal>.
 </para>
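 <para>
  As a sketch of such a check (assuming superuser access and the views
  installed as above), the map contents can be compared against the
  configuration settings directly:
 </para>
 <programlisting>
-- count relations whose free-space info is being truncated
SELECT count(*) FROM pg_freespacemap_relations
 WHERE storedpages &lt; interestingpages;

-- compare the number of tracked relations against max_fsm_relations
SELECT count(*) FROM pg_freespacemap_relations;
SHOW max_fsm_relations;
 </programlisting>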
<para>
For <literal>pg_freespacemap_pages</literal>, there is one row for each page
in the free space map. The number of rows for a relation will match the
<literal>storedpages</literal> column in
<literal>pg_freespacemap_relations</literal>.
</para>
<para>
For indexes, what is tracked is entirely-unused pages, rather than free
space within pages. Therefore, the average request size and free bytes
within a page are not meaningful, and are shown as NULL.
</para>
<para>
Because the map is shared by all the databases, it will include relations
not belonging to the current database.
</para>
<para>
  When either of the views is accessed, internal free space map locks are
  taken, and a copy of the map data is made for the view to display.
This ensures that the views produce a consistent set of results, while not
blocking normal activity longer than necessary. Nonetheless there
could be some impact on database performance if they are read often.
</para>
</sect2>
<sect2>
<title>Sample output - pg_freespacemap_relations</title>
<programlisting>
regression=# \d pg_freespacemap_relations
View "public.pg_freespacemap_relations"
Column | Type | Modifiers
------------------+---------+-----------
reltablespace | oid |
reldatabase | oid |
relfilenode | oid |
avgrequest | integer |
interestingpages | integer |
storedpages | integer |
nextpage | integer |
View definition:
SELECT p.reltablespace, p.reldatabase, p.relfilenode, p.avgrequest, p.interestingpages, p.storedpages, p.nextpage
FROM pg_freespacemap_relations() p(reltablespace oid, reldatabase oid, relfilenode oid, avgrequest integer, interestingpages integer, storedpages integer, nextpage integer);
regression=# SELECT c.relname, r.avgrequest, r.interestingpages, r.storedpages
FROM pg_freespacemap_relations r INNER JOIN pg_class c
ON c.relfilenode = r.relfilenode INNER JOIN pg_database d
ON r.reldatabase = d.oid AND (d.datname = current_database())
ORDER BY r.storedpages DESC LIMIT 10;
relname | avgrequest | interestingpages | storedpages
---------------------------------+------------+------------------+-------------
onek | 256 | 109 | 109
pg_attribute | 167 | 93 | 93
pg_class | 191 | 49 | 49
pg_attribute_relid_attnam_index | | 48 | 48
onek2 | 256 | 37 | 37
pg_depend | 95 | 26 | 26
pg_type | 199 | 16 | 16
pg_rewrite | 1011 | 13 | 13
pg_class_relname_nsp_index | | 10 | 10
pg_proc | 302 | 8 | 8
(10 rows)
</programlisting>
</sect2>
<sect2>
<title>Sample output - pg_freespacemap_pages</title>
<programlisting>
regression=# \d pg_freespacemap_pages
View "public.pg_freespacemap_pages"
Column | Type | Modifiers
----------------+---------+-----------
reltablespace | oid |
reldatabase | oid |
relfilenode | oid |
relblocknumber | bigint |
bytes | integer |
View definition:
SELECT p.reltablespace, p.reldatabase, p.relfilenode, p.relblocknumber, p.bytes
FROM pg_freespacemap_pages() p(reltablespace oid, reldatabase oid, relfilenode oid, relblocknumber bigint, bytes integer);
regression=# SELECT c.relname, p.relblocknumber, p.bytes
FROM pg_freespacemap_pages p INNER JOIN pg_class c
ON c.relfilenode = p.relfilenode INNER JOIN pg_database d
ON (p.reldatabase = d.oid AND d.datname = current_database())
ORDER BY c.relname LIMIT 10;
relname | relblocknumber | bytes
--------------+----------------+-------
a_star | 0 | 8040
abstime_tbl | 0 | 7908
aggtest | 0 | 8008
altinhoid | 0 | 8128
altstartwith | 0 | 8128
arrtest | 0 | 7172
b_star | 0 | 7976
box_tbl | 0 | 7912
bt_f8_heap | 54 | 7728
bt_i4_heap | 49 | 8008
(10 rows)
</programlisting>
</sect2>
<sect2>
<title>Author</title>
<para>
Mark Kirkwood <email>markir@paradise.net.nz</email>
</para>
</sect2>
</sect1>

<sect1 id="fuzzystrmatch">
<title>fuzzystrmatch</title>
 <para>
  This section describes the fuzzystrmatch module, which provides several
  functions to determine similarities and distances between strings.
 </para>
<sect2>
<title>Soundex</title>
<para>
The Soundex system is a method of matching similar sounding names
(or any words) to the same code. It was initially used by the
United States Census in 1880, 1900, and 1910, but it has little use
beyond English names (or the English pronunciation of names), and
it is not a linguistic tool.
</para>
<para>
When comparing two soundex values to determine similarity, the
difference function reports how close the match is on a scale
from zero to four, with zero being no match and four being an
exact match.
</para>
<para>
The following are some usage examples:
</para>
<programlisting>
SELECT soundex('hello world!');
SELECT soundex('Anne'), soundex('Ann'), difference('Anne', 'Ann');
SELECT soundex('Anne'), soundex('Andrew'), difference('Anne', 'Andrew');
SELECT soundex('Anne'), soundex('Margaret'), difference('Anne', 'Margaret');
CREATE TABLE s (nm text);
INSERT INTO s VALUES ('john');
INSERT INTO s VALUES ('joan');
INSERT INTO s VALUES ('wobbly');
INSERT INTO s VALUES ('jack');
SELECT * FROM s WHERE soundex(nm) = soundex('john');
SELECT a.nm, b.nm FROM s a, s b WHERE soundex(a.nm) = soundex(b.nm) AND a.oid <> b.oid;
CREATE FUNCTION text_sx_eq(text, text) RETURNS boolean AS
'select soundex($1) = soundex($2)'
LANGUAGE SQL;
CREATE FUNCTION text_sx_lt(text, text) RETURNS boolean AS
'select soundex($1) < soundex($2)'
LANGUAGE SQL;
CREATE FUNCTION text_sx_gt(text, text) RETURNS boolean AS
'select soundex($1) > soundex($2)'
LANGUAGE SQL;
CREATE FUNCTION text_sx_le(text, text) RETURNS boolean AS
'select soundex($1) <= soundex($2)'
LANGUAGE SQL;
CREATE FUNCTION text_sx_ge(text, text) RETURNS boolean AS
'select soundex($1) >= soundex($2)'
LANGUAGE SQL;
CREATE FUNCTION text_sx_ne(text, text) RETURNS boolean AS
'select soundex($1) <> soundex($2)'
LANGUAGE SQL;
DROP OPERATOR #= (text, text);
CREATE OPERATOR #= (leftarg=text, rightarg=text, procedure=text_sx_eq, commutator = #=);
SELECT * FROM s WHERE text_sx_eq(nm, 'john');
SELECT * FROM s WHERE s.nm #= 'john';
SELECT * FROM s WHERE difference(s.nm, 'john') > 2;
</programlisting>
</sect2>
<sect2>
<title>levenshtein</title>
<para>
This function calculates the levenshtein distance between two strings:
</para>
<programlisting>
int levenshtein(text source, text target)
</programlisting>
<para>
Both <literal>source</literal> and <literal>target</literal> can be any
NOT NULL string with a maximum of 255 characters.
</para>
<para>
Example:
</para>
<programlisting>
SELECT levenshtein('GUMBO','GAMBOL');
</programlisting>
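 <para>
  The result for the example above is 2 (substitute A for U and append L).
  As a sketch of approximate matching, reusing the hypothetical table
  <literal>s</literal> from the Soundex section:
 </para>
 <programlisting>
SELECT levenshtein('GUMBO','GAMBOL');  -- returns 2
SELECT nm FROM s ORDER BY levenshtein(nm, 'john') LIMIT 2;
 </programlisting>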
</sect2>
<sect2>
<title>metaphone</title>
<para>
This function calculates and returns the metaphone code of an input string:
</para>
<programlisting>
text metaphone(text source, int max_output_length)
</programlisting>
<para>
<literal>source</literal> has to be a NOT NULL string with a maximum of
255 characters. <literal>max_output_length</literal> fixes the maximum
length of the output metaphone code; if longer, the output is truncated
to this length.
</para>
<para>Example</para>
<programlisting>
SELECT metaphone('GUMBO',4);
</programlisting>
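 <para>
  Like soundex codes, metaphone codes can be compared for fuzzy matching.
  A sketch, reusing the hypothetical table <literal>s</literal> from the
  Soundex section:
 </para>
 <programlisting>
SELECT * FROM s WHERE metaphone(nm, 4) = metaphone('jon', 4);
 </programlisting>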
</sect2>
</sect1>

<sect1 id="hstore">
<title>hstore</title>
<indexterm zone="hstore">
<primary>hstore</primary>
</indexterm>
 <para>
  The <literal>hstore</literal> module is useful for storing (key,value)
  pairs. It can help in various scenarios: rows with many rarely-searched
  attributes, semi-structured data, or a lazy DBA.
 </para>
<sect2>
<title>Operations</title>
<itemizedlist>
<listitem>
<para>
    <literal>hstore -> text</literal> - get the value for a key (Perl analogy: <literal>$h{key}</literal>)
</para>
<programlisting>
select 'a=>q, b=>g'->'a';
?
------
q
</programlisting>
    <para>
     Note the use of parentheses in the select below, because the precedence
     of <literal>IS</literal> is higher than that of <literal>-></literal>:
    </para>
<programlisting>
SELECT id FROM entrants WHERE (info->'education_period') IS NOT NULL;
</programlisting>
</listitem>
<listitem>
<para>
    <literal>hstore || hstore</literal> - concatenation (Perl analogy: <literal>%a = (%b, %c)</literal>)
</para>
<programlisting>
regression=# select 'a=>b'::hstore || 'c=>d'::hstore;
?column?
--------------------
"a"=>"b", "c"=>"d"
(1 row)
</programlisting>
    <para>
     but notice that when the same key appears on both sides, the right-hand
     value wins:
    </para>
<programlisting>
regression=# select 'a=>b'::hstore || 'a=>d'::hstore;
?column?
----------
"a"=>"d"
(1 row)
</programlisting>
</listitem>
<listitem>
<para>
<literal>text => text</literal> - creates hstore type from two text strings
</para>
<programlisting>
select 'a'=>'b';
?column?
----------
"a"=>"b"
</programlisting>
</listitem>
<listitem>
<para>
     <literal>hstore @> hstore</literal> - containment operator: checks whether the left operand contains the right
</para>
<programlisting>
regression=# select 'a=>b, b=>1, c=>NULL'::hstore @> 'a=>c';
?column?
----------
f
(1 row)
regression=# select 'a=>b, b=>1, c=>NULL'::hstore @> 'b=>1';
?column?
----------
t
(1 row)
</programlisting>
</listitem>
<listitem>
<para>
     <literal>hstore &lt;@ hstore</literal> - containment operator: checks
     whether the left operand is contained in the right
</para>
<para>
(Before PostgreSQL 8.2, the containment operators @&gt; and &lt;@ were
respectively called @ and ~. These names are still available, but are
deprecated and will eventually be retired. Notice that the old names
are reversed from the convention formerly followed by the core geometric
datatypes!)
</para>
</listitem>
</itemizedlist>
</sect2>
<sect2>
<title>Functions</title>
<itemizedlist>
<listitem>
<para>
<literal>akeys(hstore)</literal> - returns all keys from hstore as array
</para>
<programlisting>
regression=# select akeys('a=>1,b=>2');
akeys
-------
{a,b}
</programlisting>
</listitem>
<listitem>
<para>
<literal>skeys(hstore)</literal> - returns all keys from hstore as strings
</para>
<programlisting>
regression=# select skeys('a=>1,b=>2');
skeys
-------
a
b
</programlisting>
</listitem>
<listitem>
<para>
<literal>avals(hstore)</literal> - returns all values from hstore as array
</para>
<programlisting>
regression=# select avals('a=>1,b=>2');
avals
-------
{1,2}
</programlisting>
</listitem>
<listitem>
<para>
<literal>svals(hstore)</literal> - returns all values from hstore as
strings
</para>
<programlisting>
regression=# select svals('a=>1,b=>2');
svals
-------
1
2
</programlisting>
</listitem>
<listitem>
<para>
     <literal>delete(hstore,text)</literal> - deletes any (key,value) pair
     from the hstore whose key matches the argument.
</para>
<programlisting>
regression=# select delete('a=>1,b=>2','b');
delete
----------
"a"=>"1"
</programlisting>
</listitem>
<listitem>
<para>
<literal>each(hstore)</literal> - return (key, value) pairs
</para>
<programlisting>
regression=# select * from each('a=>1,b=>2');
key | value
-----+-------
a | 1
b | 2
</programlisting>
</listitem>
<listitem>
<para>
<literal>exist (hstore,text)</literal>
</para>
<para>
     <literal>hstore ? text</literal> - both return true if the key exists in
     the hstore, and false otherwise.
</para>
<programlisting>
regression=# select exist('a=>1','a'), 'a=>1' ? 'a';
exist | ?column?
-------+----------
t | t
</programlisting>
</listitem>
<listitem>
<para>
     <literal>defined(hstore,text)</literal> - returns true if the key exists
     in the hstore and its value is not NULL.
</para>
<programlisting>
regression=# select defined('a=>NULL','a');
defined
---------
f
</programlisting>
</listitem>
</itemizedlist>
</sect2>
<sect2>
<title>Indices</title>
  <para>
   The module provides GiST and GIN index support for the
   <literal>@></literal> and <literal>?</literal> operators.
  </para>
<programlisting>
CREATE INDEX hidx ON testhstore USING GIST(h);
CREATE INDEX hidx ON testhstore USING GIN(h);
</programlisting>
</sect2>
<sect2>
<title>Examples</title>
<para>
Add a key:
</para>
<programlisting>
UPDATE tt SET h=h||'c=>3';
</programlisting>
<para>
Delete a key:
</para>
<programlisting>
UPDATE tt SET h=delete(h,'k1');
</programlisting>
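  <para>
   Search by containment (a sketch, reusing the hypothetical table
   <literal>tt</literal> from above; the GiST or GIN indexes shown earlier can
   speed up such queries):
  </para>
  <programlisting>
SELECT * FROM tt WHERE h @> 'c=>3';
  </programlisting>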
</sect2>
<sect2>
<title>Statistics</title>
  <para>
   The hstore type, because of its intrinsic liberality, can contain many
   different keys; checking for valid keys is the application's task. The
   examples below demonstrate several techniques for examining key statistics.
  </para>
<para>
   A simple example:
</para>
<programlisting>
SELECT * FROM each('aaa=>bq, b=>NULL, ""=>1 ');
</programlisting>
<para>
   Using a table:
</para>
<programlisting>
SELECT (each(h)).key, (each(h)).value INTO stat FROM testhstore ;
</programlisting>
  <para>Online statistics:</para>
<programlisting>
SELECT key, count(*) FROM (SELECT (each(h)).key FROM testhstore) AS stat GROUP BY key ORDER BY count DESC, key;
key | count
-----------+-------
line | 883
query | 207
pos | 203
node | 202
space | 197
status | 195
public | 194
title | 190
org | 189
...................
</programlisting>
</sect2>
<sect2>
<title>Authors</title>
<para>
Oleg Bartunov <email>oleg@sai.msu.su</email>, Moscow, Moscow University, Russia
</para>
<para>
Teodor Sigaev <email>teodor@sigaev.ru</email>, Moscow, Delta-Soft Ltd.,Russia
</para>
</sect2>
</sect1>

<sect1 id="intagg">
<title>intagg</title>
<indexterm zone="intagg">
<primary>intagg</primary>
</indexterm>
<para>
This section describes the <literal>intagg</literal> module which provides an integer aggregator and an enumerator.
</para>
<para>
  Many database systems have the notion of a one-to-many table. Such a table usually sits between two indexed tables, for example:
</para>
<programlisting>
CREATE TABLE one_to_many(left INT, right INT) ;
</programlisting>
<para>
And it is used like this:
</para>
<programlisting>
SELECT right.* from right JOIN one_to_many ON (right.id = one_to_many.right)
WHERE one_to_many.left = item;
</programlisting>
<para>
This will return all the items in the right hand table for an entry
in the left hand table. This is a very common construct in SQL.
</para>
<para>
Now, this methodology can be cumbersome with a very large number of
entries in the one_to_many table. Depending on the order in which
data was entered, a join like this could result in an index scan
and a fetch for each right hand entry in the table for a particular
left hand entry. If you have a very dynamic system, there is not much you
can do. However, if you have some data which is fairly static, you can
create a summary table with the aggregator.
</para>
<programlisting>
CREATE TABLE summary as SELECT left, int_array_aggregate(right)
AS right FROM one_to_many GROUP BY left;
</programlisting>
 <para>
  This will create a table with one row per left item, and an array of right
  items. The array is not of much use without some way of accessing its
  elements; that is why there is an array enumerator:
 </para>
<programlisting>
SELECT left, int_array_enum(right) FROM summary WHERE left = item;
</programlisting>
<para>
  The above query, using int_array_enum, produces the same result as:
</para>
<programlisting>
SELECT left, right FROM one_to_many WHERE left = item;
</programlisting>
 <para>
  The difference is that the query against the summary table has to fetch
  only one row from the table, whereas the query against "one_to_many"
  must index scan and fetch a row for each entry.
 </para>
 <para>
  On one system, EXPLAIN showed a query with a cost of 8488 reduced to a
  cost of 329 when rewritten to use the summary table:
 </para>
<programlisting>
SELECT right, count(right) FROM
(
SELECT left, int_array_enum(right) AS right FROM summary JOIN
(SELECT left FROM left_table WHERE left = item) AS lefts
ON (summary.left = lefts.left )
) AS list GROUP BY right ORDER BY count DESC ;
</programlisting>
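 <para>
  A minimal end-to-end sketch, with hypothetical sample data, showing that
  aggregating and then enumerating recovers the original pairs:
 </para>
 <programlisting>
CREATE TABLE one_to_many(left INT, right INT);
INSERT INTO one_to_many VALUES (1, 10), (1, 11), (2, 20);

CREATE TABLE summary AS
  SELECT left, int_array_aggregate(right) AS right
  FROM one_to_many GROUP BY left;

-- produces the same three (left, right) pairs as one_to_many
SELECT left, int_array_enum(right) FROM summary;
 </programlisting>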
</sect1>

<sect1 id="intarray">
<title>intarray</title>
<indexterm zone="intarray">
<primary>intarray</primary>
</indexterm>
 <para>
  This module is an implementation of the RD-tree data structure using the
  GiST interface of PostgreSQL. It has built-in lossy compression.
 </para>
 <para>
  The current implementation provides index support for one-dimensional
  arrays of int4: gist__int_ops, suitable for small and medium-size arrays
  (the default), and gist__intbig_ops for indexing large arrays (a
  superimposed signature 4096 bits long is used to represent sets).
 </para>
<sect2>
<title>Functions</title>
<itemizedlist>
<listitem>
<para>
<literal>int icount(int[])</literal> - the number of elements in intarray
</para>
<programlisting>
test=# select icount('{1,2,3}'::int[]);
icount
--------
3
(1 row)
</programlisting>
</listitem>
<listitem>
<para>
<literal>int[] sort(int[], 'asc' | 'desc')</literal> - sort intarray
</para>
<programlisting>
test=# select sort('{1,2,3}'::int[],'desc');
sort
---------
{3,2,1}
(1 row)
</programlisting>
</listitem>
<listitem>
<para>
<literal>int[] sort(int[])</literal> - sort in ascending order
</para>
</listitem>
<listitem>
<para>
<literal>int[] sort_asc(int[]),sort_desc(int[])</literal> - shortcuts for sort
</para>
</listitem>
<listitem>
<para>
<literal>int[] uniq(int[])</literal> - returns unique elements
</para>
<programlisting>
test=# select uniq(sort('{1,2,3,2,1}'::int[]));
uniq
---------
{1,2,3}
(1 row)
</programlisting>
</listitem>
<listitem>
    <para>
     <literal>int idx(int[], int item)</literal> - returns the index of the
     first array element equal to item, or 0 if there is no match.
    </para>
<programlisting>
test=# select idx('{1,2,3,2,1}'::int[],2);
idx
-----
2
(1 row)
</programlisting>
</listitem>
<listitem>
    <para>
     <literal>int[] subarray(int[], int START [, int LEN])</literal> - returns
     the portion of the array starting at element number START (counting from
     1), with length LEN.
    </para>
<programlisting>
test=# select subarray('{1,2,3,2,1}'::int[],2,3);
subarray
----------
{2,3,2}
(1 row)
</programlisting>
</listitem>
<listitem>
<para>
<literal>int[] intset(int4)</literal> - casting int4 to int[]
</para>
<programlisting>
test=# select intset(1);
intset
--------
{1}
(1 row)
</programlisting>
</listitem>
</itemizedlist>
</sect2>
<sect2>
<title>Operations</title>
<table>
<title>Operations</title>
<tgroup cols="2">
<thead>
<row>
<entry>Operator</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
      <entry><literal>int[] &amp;&amp; int[]</literal></entry>
<entry>overlap - returns TRUE if arrays have at least one common element</entry>
</row>
<row>
<entry><literal>int[] @> int[]</literal></entry>
<entry>contains - returns TRUE if left array contains right array</entry>
</row>
<row>
      <entry><literal>int[] &lt;@ int[]</literal></entry>
<entry>contained - returns TRUE if left array is contained in right array</entry>
</row>
<row>
<entry><literal># int[]</literal></entry>
<entry>returns the number of elements in array</entry>
</row>
<row>
<entry><literal>int[] + int</literal></entry>
<entry>push element to array ( add to end of array)</entry>
</row>
<row>
<entry><literal>int[] + int[] </literal></entry>
<entry>merge of arrays (right array added to the end of left one)</entry>
</row>
<row>
<entry><literal>int[] - int</literal></entry>
<entry>remove entries matched by right argument from array</entry>
</row>
<row>
<entry><literal>int[] - int[]</literal></entry>
<entry>remove right array from left</entry>
</row>
<row>
<entry><literal>int[] | int</literal></entry>
<entry>returns intarray - union of arguments</entry>
</row>
<row>
<entry><literal>int[] | int[]</literal></entry>
<entry>returns intarray as a union of two arrays</entry>
</row>
<row>
      <entry><literal>int[] &amp; int[]</literal></entry>
<entry>returns intersection of arrays</entry>
</row>
<row>
<entry><literal>int[] @@ query_int</literal></entry>
<entry>
returns TRUE if array satisfies query (like
<literal>'1&amp;(2|3)'</literal>)
</entry>
</row>
<row>
<entry><literal>query_int ~~ int[]</literal></entry>
<entry>returns TRUE if array satisfies query (commutator of @@)</entry>
</row>
</tbody>
</tgroup>
</table>
  <para>
   (Before PostgreSQL 8.2, the containment operators @&gt; and &lt;@ were
   respectively called @ and ~. These names are still available, but are
   deprecated and will eventually be retired. Notice that the old names
   are reversed from the convention formerly followed by the core geometric
   datatypes!)
  </para>
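  <para>
   A query_int combines array elements with &amp; (AND), | (OR) and
   parentheses; for example:
  </para>
  <programlisting>
test=# select '{1,3}'::int[] @@ '1&amp;(2|3)'::query_int;
 ?column?
----------
 t
(1 row)
  </programlisting>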
</sect2>
<sect2>
<title>Example</title>
<programlisting>
CREATE TABLE message (mid INT NOT NULL,sections INT[]);
CREATE TABLE message_section_map (mid INT NOT NULL,sid INT NOT NULL);
-- create indices
CREATE unique index message_key ON message ( mid );
CREATE unique index message_section_map_key2 ON message_section_map (sid, mid );
CREATE INDEX message_rdtree_idx ON message USING GIST ( sections gist__int_ops);
-- select some messages with section in 1 OR 2 - OVERLAP operator
SELECT message.mid FROM message WHERE message.sections && '{1,2}';
-- select messages contains in sections 1 AND 2 - CONTAINS operator
SELECT message.mid FROM message WHERE message.sections @> '{1,2}';
-- the same, CONTAINED operator
SELECT message.mid FROM message WHERE '{1,2}' <@ message.sections;
</programlisting>
</sect2>
<sect2>
<title>Benchmark</title>
  <para>
   The <literal>bench</literal> subdirectory contains a benchmark suite.
  </para>
<programlisting>
cd ./bench
1. createdb TEST
2. psql TEST < ../_int.sql
3. ./create_test.pl | psql TEST
4. ./bench.pl - perl script to benchmark queries, supports OR, AND queries
with/without RD-Tree. Run script without arguments to
                      see available options.
a)test without RD-Tree (OR)
./bench.pl -d TEST -c -s 1,2 -v
b)test with RD-Tree
./bench.pl -d TEST -c -s 1,2 -v -r
BENCHMARKS:
Size of table &lt;message>: 200000
Size of table &lt;message_section_map>: 269133
Distribution of messages by sections:
section 0: 74377 messages
section 1: 16284 messages
section 50: 1229 messages
section 99: 683 messages
old - without RD-Tree support,
new - with RD-Tree
+----------+---------------+----------------+
|Search set|OR, time in sec|AND, time in sec|
| +-------+-------+--------+-------+
| | old | new | old | new |
+----------+-------+-------+--------+-------+
| 1| 0.625| 0.101| -| -|
+----------+-------+-------+--------+-------+
| 99| 0.018| 0.017| -| -|
+----------+-------+-------+--------+-------+
| 1,2| 0.766| 0.133| 0.628| 0.045|
+----------+-------+-------+--------+-------+
| 1,2,50,65| 0.794| 0.141| 0.030| 0.006|
+----------+-------+-------+--------+-------+
</programlisting>
</sect2>
<sect2>
<title>Authors</title>
<para>
All work was done by Teodor Sigaev (<email>teodor@stack.net</email>) and Oleg
Bartunov (<email>oleg@sai.msu.su</email>). See
<ulink url="http://www.sai.msu.su/~megera/postgres/gist"></ulink> for
additional information. Andrey Oktyabrski did a great work on adding new
functions and operations.
</para>
</sect2>
</sect1>

<sect1 id="isn">
<title>isn</title>
<indexterm zone="isn">
<primary>isn</primary>
</indexterm>
<para>
The <literal>isn</literal> module adds data types for the following
international-standard namespaces: EAN13, UPC, ISBN (books), ISMN (music),
and ISSN (serials). This module is inspired by Garrett A. Wollman's
isbn_issn code.
</para>
 <para>
  This module validates the numbers and automatically adds the correct
  hyphenation to them. It also supports the new ISBN-13 numbers in use
  since January 2007.
 </para>
<para>
Premises:
</para>
<orderedlist>
<listitem>
<para>ISBN13, ISMN13, ISSN13 numbers are all EAN13 numbers</para>
</listitem>
<listitem>
<para>EAN13 numbers aren't always ISBN13, ISMN13 or ISSN13 (some are)</para>
</listitem>
<listitem>
<para>some ISBN13 numbers can be displayed as ISBN</para>
</listitem>
<listitem>
<para>some ISMN13 numbers can be displayed as ISMN</para>
</listitem>
<listitem>
<para>some ISSN13 numbers can be displayed as ISSN</para>
</listitem>
<listitem>
<para>all UPC, ISBN, ISMN and ISSN can be represented as EAN13 numbers</para>
</listitem>
</orderedlist>
<note>
<para>
All types are internally represented as 64 bit integers,
and internally all are consistently interchangeable.
</para>
</note>
<note>
<para>
We have two operator classes (for btree and for hash) so each data type
can be indexed for faster access.
</para>
</note>
<sect2>
<title>Data types</title>
<para>
We have the following data types:
</para>
<table>
<title>Data types</title>
<tgroup cols="2">
<thead>
<row>
<entry><para>Data type</para></entry>
<entry><para>Description</para></entry>
</row>
</thead>
<tbody>
<row>
<entry><para><literal>EAN13</literal></para></entry>
<entry>
<para>
        European Article Numbers. This type will always show the EAN13-display
        format. The output function for this is <literal>ean13_out()</literal>.
</para>
</entry>
</row>
<row>
<entry><para><literal>ISBN13</literal></para></entry>
<entry>
<para>
For International Standard Book Numbers to be displayed in
the new EAN13-display format.
</para>
</entry>
</row>
<row>
<entry><para><literal>ISMN13</literal></para></entry>
<entry>
<para>
For International Standard Music Numbers to be displayed in
the new EAN13-display format.
</para>
</entry>
</row>
<row>
<entry><para><literal>ISSN13</literal></para></entry>
<entry>
<para>
For International Standard Serial Numbers to be displayed in the new
EAN13-display format.
</para>
</entry>
</row>
<row>
<entry><para><literal>ISBN</literal></para></entry>
<entry>
<para>
For International Standard Book Numbers to be displayed in the current
short-display format.
</para>
</entry>
</row>
<row>
<entry><para><literal>ISMN</literal></para></entry>
<entry>
<para>
For International Standard Music Numbers to be displayed in the
current short-display format.
</para>
</entry>
</row>
<row>
<entry><para><literal>ISSN</literal></para></entry>
<entry>
<para>
For International Standard Serial Numbers to be displayed in the
current short-display format. These types will display the short
version of the ISxN (ISxN 10) whenever it's possible, and it will
show ISxN 13 when it's impossible to show the short version. The
output function to do this is <literal>isn_out()</literal>
</para>
</entry>
</row>
<row>
<entry><para><literal>UPC</literal></para></entry>
<entry>
<para>
For Universal Product Codes. UPC numbers are a subset of the EAN13
numbers (they are basically EAN13 without the first '0' digit.)
The output function to do this is also <literal>isn_out()</literal>
</para>
</entry>
</row>
</tbody>
</tgroup>
</table>
<note>
<para>
<literal>EAN13</literal>, <literal>ISBN13</literal>,
<literal>ISMN13</literal> and <literal>ISSN13</literal> types will always
display the long version of the ISxN (EAN13). The output function to do
this is <literal>ean13_out()</literal>.
</para>
<para>
The need for these types is just for displaying in different ways the same
data: <literal>ISBN13</literal> is actually the same as
<literal>ISBN</literal>, <literal>ISMN13=ISMN</literal> and
<literal>ISSN13=ISSN</literal>.
</para>
</note>
</sect2>
<sect2>
<title>Input functions</title>
<para>
We have the following input functions:
</para>
<table>
<title>Input functions</title>
<tgroup cols="2">
<thead>
<row>
<entry>Function</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><para><literal>ean13_in()</literal></para></entry>
<entry>
<para>
To take a string and return an EAN13.
</para>
</entry>
</row>
<row>
<entry><para><literal>isbn_in()</literal></para></entry>
<entry>
<para>
To take a string and return valid ISBN or ISBN13 numbers.
</para>
</entry>
</row>
<row>
<entry><para><literal>ismn_in()</literal></para></entry>
<entry>
<para>
To take a string and return valid ISMN or ISMN13 numbers.
</para>
</entry>
</row>
<row>
<entry><para><literal>issn_in()</literal></para></entry>
<entry>
<para>
To take a string and return valid ISSN or ISSN13 numbers.
</para>
</entry>
</row>
<row>
<entry><para><literal>upc_in()</literal></para></entry>
<entry>
<para>
        To take a string and return a UPC code.
</para>
</entry>
</row>
</tbody>
</tgroup>
</table>
</sect2>
<sect2>
<title>Casts</title>
<para>
We are able to cast from:
</para>
<itemizedlist>
<listitem>
<para>
ISBN13 -> EAN13
</para>
</listitem>
<listitem>
<para>
ISMN13 -> EAN13
</para>
</listitem>
<listitem>
<para>
ISSN13 -> EAN13
</para>
</listitem>
<listitem>
<para>
ISBN -> EAN13
</para>
</listitem>
<listitem>
<para>
ISMN -> EAN13
</para>
</listitem>
<listitem>
<para>
ISSN -> EAN13
</para>
</listitem>
<listitem>
<para>
UPC -> EAN13
</para>
</listitem>
<listitem>
<para>
ISBN <-> ISBN13
</para>
</listitem>
<listitem>
<para>
ISMN <-> ISMN13
</para>
</listitem>
<listitem>
<para>
ISSN <-> ISSN13
</para>
</listitem>
</itemizedlist>
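  <para>
   For example (a sketch; the ISBN-10 shown here is assumed to be the short
   form of the ISBN-13 used in the Examples section below):
  </para>
  <programlisting>
SELECT '0-393-04002-X'::isbn::ean13;     -- short form to EAN13
SELECT '978-0-393-04002-9'::ean13::isbn; -- and back again
  </programlisting>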
</sect2>
<sect2>
<title>C API</title>
<para>
The C API is implemented as:
</para>
<programlisting>
extern Datum isn_out(PG_FUNCTION_ARGS);
extern Datum ean13_out(PG_FUNCTION_ARGS);
extern Datum ean13_in(PG_FUNCTION_ARGS);
extern Datum isbn_in(PG_FUNCTION_ARGS);
extern Datum ismn_in(PG_FUNCTION_ARGS);
extern Datum issn_in(PG_FUNCTION_ARGS);
extern Datum upc_in(PG_FUNCTION_ARGS);
</programlisting>
<para>
On success:
</para>
<itemizedlist>
<listitem>
<para>
     <literal>isn_out()</literal> takes any of our types and returns a string
     containing the shortest possible representation of the number.
</para>
</listitem>
<listitem>
<para>
<literal>ean13_out()</literal> takes any of our types and returns the
EAN13 (long) representation of the number.
</para>
</listitem>
<listitem>
<para>
     <literal>ean13_in()</literal> takes a string and returns an EAN13 which,
     as stated in premise 2, might or might not be one of our other types, but
     certainly is an EAN13 number. It succeeds only if the string is a valid
     EAN13 number; otherwise it fails.
</para>
</listitem>
<listitem>
<para>
     <literal>isbn_in()</literal> takes a string and returns an ISBN/ISBN13,
     but only if the string really is an ISBN/ISBN13; otherwise it fails.
</para>
</listitem>
<listitem>
<para>
     <literal>ismn_in()</literal> takes a string and returns an ISMN/ISMN13,
     but only if the string really is an ISMN/ISMN13; otherwise it fails.
</para>
</listitem>
<listitem>
<para>
     <literal>issn_in()</literal> takes a string and returns an ISSN/ISSN13,
     but only if the string really is an ISSN/ISSN13; otherwise it fails.
</para>
</listitem>
<listitem>
<para>
     <literal>upc_in()</literal> takes a string and returns a UPC, but only
     if the string really is a UPC; otherwise it fails.
</para>
</listitem>
</itemizedlist>
  <para>
   (On failure, the functions report the error via
   <literal>ereport</literal>.)
  </para>
</sect2>
<sect2>
<title>Testing functions</title>
<table>
<title>Testing functions</title>
<tgroup cols="2">
<thead>
<row>
<entry><para>Function</para></entry>
<entry><para>Description</para></entry>
</row>
</thead>
<tbody>
<row>
<entry><para><literal>isn_weak(boolean)</literal></para></entry>
<entry><para>Sets the weak input mode.</para></entry>
</row>
<row>
<entry><para><literal>isn_weak()</literal></para></entry>
<entry><para>Gets the current status of the weak mode.</para></entry>
</row>
<row>
<entry><para><literal>make_valid()</literal></para></entry>
<entry><para>Validates an invalid number (deleting the invalid flag).</para></entry>
</row>
<row>
<entry><para><literal>is_valid()</literal></para></entry>
      <entry><para>Checks for the presence of the invalid flag.</para></entry>
</row>
</tbody>
</tgroup>
</table>
<para>
   <literal>Weak</literal> mode is used to allow insertion of invalid data
   into a table. "Invalid" here means the check digit is wrong, not that
   digits are missing.
<para>
   Why would you want to use the weak mode? Well, it could be that
   you have a huge collection of ISBN numbers, and that there are so many of
   them that for weird reasons some have the wrong check digit (perhaps the
   numbers were scanned from a printed list and the OCR got the numbers wrong,
   perhaps the numbers were manually captured... who knows). Anyway, the point
   is that you might want to clean the mess up, but still keep all the numbers
   in your database, and perhaps use an external tool to locate the invalid
   numbers so you can verify the information and validate it more easily, for
   example by selecting all the invalid numbers in the table.
<para>
When you insert invalid numbers into a table using weak mode, the number
will be inserted with the corrected check digit, but it will be flagged
with an exclamation mark ('!') at the end (e.g. <literal>0-11-000322-5!</literal>).
</para>
<para>
You can also force the insertion of invalid numbers even when not in weak
mode, by appending the '!' character to the end of the number.
</para>
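<para>
For example, a sketch of forcing an invalid number outside of weak mode
(assuming the isn module is installed; output not shown):
</para>
<programlisting>
-- accept a number with a wrong check digit by appending '!'
SELECT isbn('0-11-000322-5!');
</programlisting>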
</sect2>
<sect2>
<title>Examples</title>
<programlisting>
--Using the types directly:
SELECT isbn('978-0-393-04002-9');
SELECT isbn13('0901690546');
SELECT issn('1436-4522');
--Casting types:
-- note that you can only cast from ean13 to other type when the casted
-- number would be valid in the realm of the casted type;
-- thus, the following will NOT work: select isbn(ean13('0220356483481'));
-- but these will:
SELECT upc(ean13('0220356483481'));
SELECT ean13(upc('220356483481'));
--Create a table with a single column to hold ISBN numbers:
CREATE TABLE test ( id isbn );
INSERT INTO test VALUES('9780393040029');
--Automatically calculating check digits (observe the '?'):
INSERT INTO test VALUES('220500896?');
INSERT INTO test VALUES('978055215372?');
SELECT issn('3251231?');
SELECT ismn('979047213542?');
--Using the weak mode:
SELECT isn_weak(true);
INSERT INTO test VALUES('978-0-11-000533-4');
INSERT INTO test VALUES('9780141219307');
INSERT INTO test VALUES('2-205-00876-X');
SELECT isn_weak(false);
SELECT id FROM test WHERE NOT is_valid(id);
UPDATE test SET id=make_valid(id) WHERE id = '2-205-00876-X!';
SELECT * FROM test;
SELECT isbn13(id) FROM test;
</programlisting>
</sect2>
<sect2>
<title>Bibliography</title>
<para>
The information to implement this module was collected through
several sites, including:
</para>
<programlisting>
http://www.isbn-international.org/
http://www.issn.org/
http://www.ismn-international.org/
http://www.wikipedia.org/
</programlisting>
<para>
The prefixes used for hyphenation were also compiled from:
</para>
<programlisting>
http://www.gs1.org/productssolutions/idkeys/support/prefix_list.html
http://www.isbn-international.org/en/identifiers.html
http://www.ismn-international.org/ranges.html
</programlisting>
<para>
Care was taken during the creation of the algorithms and they
were meticulously verified against the suggested algorithms
in the official ISBN, ISMN, ISSN User Manuals.
</para>
</sect2>
<sect2>
<title>Author</title>
<para>
Germán Méndez Bravo (Kronuz), 2004 - 2006
</para>
</sect2>
</sect1>

118
doc/src/sgml/lo.sgml Normal file

@ -0,0 +1,118 @@
<sect1 id="lo">
<title>lo</title>
<indexterm zone="lo">
<primary>lo</primary>
</indexterm>
<para>
PostgreSQL type extension for managing Large Objects
</para>
<sect2>
<title>Overview</title>
<para>
One of the problems with the JDBC driver (and this affects the ODBC driver
as well) is that the specification assumes that references to BLOBs (Binary
Large OBjects) are stored within a table, and that if such an entry is
changed, the associated BLOB is deleted from the database.
</para>
<para>
As PostgreSQL stands, this doesn't occur. Large objects are treated as
objects in their own right; a table entry can reference a large object by
OID, but there can be multiple table entries referencing the same large
object OID, so the system doesn't delete the large object just because you
change or remove one such entry.
</para>
<para>
Now this is fine for new PostgreSQL-specific applications, but existing ones
using JDBC or ODBC won't delete the objects, resulting in orphaning - objects
that are not referenced by anything, and simply occupy disk space.
</para>
</sect2>
<sect2>
<title>The Fix</title>
<para>
I've fixed this by creating a new data type 'lo', some support functions, and
a Trigger which handles the orphaning problem. The trigger essentially just
does a 'lo_unlink' whenever you delete or modify a value referencing a large
object. When you use this trigger, you are assuming that there is only one
database reference to any large object that is referenced in a
trigger-controlled column!
</para>
<para>
The 'lo' type was created because we needed to differentiate between plain
OIDs and Large Objects. Currently the JDBC driver handles this dilemma easily,
but (after talking to Byron), the ODBC driver needed a unique type. They had
created an 'lo' type, but not the solution to orphaning.
</para>
<para>
You don't actually have to use the 'lo' type to use the trigger, but it may be
convenient to use it to keep track of which columns in your database represent
large objects that you are managing with the trigger.
</para>
</sect2>
<sect2>
<title>How to Use</title>
<para>
The easiest way is by an example:
</para>
<programlisting>
CREATE TABLE image (title TEXT, raster lo);
CREATE TRIGGER t_raster BEFORE UPDATE OR DELETE ON image
FOR EACH ROW EXECUTE PROCEDURE lo_manage(raster);
</programlisting>
<para>
Create a trigger for each column that contains a lo type, and give the column
name as the trigger procedure argument. You can have more than one trigger on
a table if you need multiple lo columns in the same table, but don't forget to
give a different name to each trigger.
</para>
</sect2>
<sect2>
<title>Issues</title>
<itemizedlist>
<listitem>
<para>
Dropping a table will still orphan any objects it contains, as the trigger
is not executed.
</para>
<para>
Avoid this by preceding the 'drop table' with 'delete from {table}'.
</para>
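<para>
For the example table above, that would be (a sketch):
</para>
<programlisting>
DELETE FROM image;  -- fires the lo_manage trigger for each row
DROP TABLE image;
</programlisting>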
<para>
If you already have, or suspect you have, orphaned large objects, see
the contrib/vacuumlo module to help you clean them up. It's a good idea
to run contrib/vacuumlo occasionally as a back-stop to the lo_manage
trigger.
</para>
</listitem>
<listitem>
<para>
Some frontends may create their own tables, and will not create the
associated trigger(s). Also, users may not remember (or know) to create
the triggers.
</para>
</listitem>
</itemizedlist>
<para>
As the ODBC driver needs a permanent lo type (and JDBC could be optimised to
use it if its OID were fixed), and as the above issues can only be fixed by
some internal changes, I feel it should become a permanent built-in type.
</para>
</sect2>
<sect2>
<title>Author</title>
<para>
Peter Mount <email>peter@retep.org.uk</email> June 13 1998
</para>
</sect2>
</sect1>

771
doc/src/sgml/ltree.sgml Normal file

@ -0,0 +1,771 @@
<sect1 id="ltree">
<title>ltree</title>
<indexterm zone="ltree">
<primary>ltree</primary>
</indexterm>
<para>
<literal>ltree</literal> is a PostgreSQL module that contains an
implementation of data types, indexed access methods and queries for data
organized as tree-like structures.
</para>
<sect2>
<title>Definitions</title>
<para>
A <emphasis>label</emphasis> of a node is a sequence of one or more words
separated by the underscore character '_' and containing letters and digits (for
example, [a-zA-Z0-9] for the C locale). The length of a label is limited to 256
bytes.
</para>
<para>
Example: 'Countries', 'Personal_Services'
</para>
<para>
A <emphasis>label path</emphasis> of a node is a sequence of one or more
dot-separated labels l1.l2...ln, and represents the path from the root to the
node. The length of a label path is limited to 65KB, but keeping it under 2KB
is preferable. This is hardly a strict limitation; the maximal size of a
label path in the DMOZ catalogue (<ulink url="http://www.dmoz.org"></ulink>)
is about 240 bytes.
</para>
<para>
Example: <literal>'Top.Countries.Europe.Russia'</literal>
</para>
<para>
We introduce several datatypes:
</para>
<itemizedlist>
<listitem>
<para>
<literal>ltree</literal> - is a datatype for label path.
</para>
</listitem>
<listitem>
<para>
<literal>ltree[]</literal> - is a datatype for arrays of ltree.
</para>
</listitem>
<listitem>
<para>
<literal>lquery</literal>
- is a path expression with regular-expression-like patterns over the labels
of a path, used for ltree matching. The star symbol (*) specifies any number
of labels (levels) and can be used at the beginning and the end of an lquery,
for example, '*.Europe.*'.
</para>
<para>
The following quantifiers are recognized for '*' (like in Perl):
</para>
<itemizedlist>
<listitem>
<para>{n} Match exactly n levels</para>
</listitem>
<listitem>
<para>{n,} Match at least n levels</para>
</listitem>
<listitem>
<para>{n,m} Match at least n but not more than m levels</para>
</listitem>
<listitem>
<para>{,m} Match at maximum m levels (eq. to {0,m})</para>
</listitem>
</itemizedlist>
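<para>
As a brief sketch of a quantifier in use (assuming the ltree module is
installed):
</para>
<programlisting>
-- match any path consisting of exactly three labels
SELECT 'Top.Science.Astronomy'::ltree ~ '*{3}'::lquery;
</programlisting>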
<para>
It is possible to use several modifiers at the end of a label:
</para>
<itemizedlist>
<listitem>
<para>@ Do case-insensitive label matching</para>
</listitem>
<listitem>
<para>* Do prefix matching for a label</para>
</listitem>
<listitem>
<para>% Don't take the word separator '_' into account in label matching;
that is, 'Russian%' would match 'Russian_nations', but not 'Russian'
</para>
</listitem>
</itemizedlist>
<para>
<literal>lquery</literal> can contain logical '!' (NOT) at the beginning
of the label and '|' (OR) to specify possible alternatives for label
matching.
</para>
<para>
Example of <literal>lquery</literal>:
</para>
<programlisting>
Top.*{0,2}.sport*@.!football|tennis.Russ*|Spain
a) b) c) d) e)
</programlisting>
<para>
A label path should
</para>
<orderedlist numeration='loweralpha'>
<listitem>
<para>
begin at a node with label 'Top',
</para>
</listitem>
<listitem>
<para>
be followed by zero to two labels, until
</para>
</listitem>
<listitem>
<para>
a node whose label begins with the case-insensitive prefix 'sport',
</para>
</listitem>
<listitem>
<para>
followed by a node whose label does not match 'football' or 'tennis', and
</para>
</listitem>
<listitem>
<para>
end at a node whose label begins with 'Russ' or exactly matches 'Spain'.
</para>
</listitem>
</orderedlist>
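<para>
A sketch of a path that this lquery would match (the path itself is made up
for illustration):
</para>
<programlisting>
SELECT 'Top.Sports.hockey.Russia'::ltree
       ~ 'Top.*{0,2}.sport*@.!football|tennis.Russ*|Spain'::lquery;
</programlisting>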
</listitem>
<listitem>
<para><literal>ltxtquery</literal>
- is a datatype for label searching (like type 'query' for full text
searching, see contrib/tsearch). It's possible to use modifiers @,%,* at
the end of word. The meaning of modifiers are the same as for lquery.
</para>
<para>
Example: <literal>'Europe & Russia*@ & !Transportation'</literal>
</para>
<para>
Search for paths that contain the words 'Europe' and 'Russia*'
(case-insensitive) but not 'Transportation'. Note that the order of the
words as they appear in the label path is not important!
</para>
</listitem>
</itemizedlist>
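<para>
As an illustrative sketch of the ltxtquery example above:
</para>
<programlisting>
SELECT 'Top.Countries.Europe.Russia'::ltree
       @ 'Europe & Russia*@ & !Transportation'::ltxtquery;
</programlisting>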
</sect2>
<sect2>
<title>Operations</title>
<para>
The following operations are defined for type ltree:
</para>
<itemizedlist>
<listitem>
<para>
<literal><,>,<=,>=,=, <></literal>
- Have their usual meanings. Comparison is done in the order of a direct
tree traversal; children of a node are sorted lexicographically.
</para>
</listitem>
<listitem>
<para>
<literal>ltree @> ltree</literal>
- returns TRUE if left argument is an ancestor of right argument (or
equal).
</para>
</listitem>
<listitem>
<para>
<literal>ltree <@ ltree </literal>
- returns TRUE if left argument is a descendant of right argument (or
equal).
</para>
</listitem>
<listitem>
<para>
<literal>ltree ~ lquery, lquery ~ ltree</literal>
- return TRUE if node represented by ltree satisfies lquery.
</para>
</listitem>
<listitem>
<para>
<literal>ltree ? lquery[], lquery ? ltree[]</literal>
- return TRUE if node represented by ltree satisfies at least one lquery
from array.
</para>
</listitem>
<listitem>
<para>
<literal>ltree @ ltxtquery, ltxtquery @ ltree</literal>
- return TRUE if node represented by ltree satisfies ltxtquery.
</para>
</listitem>
<listitem>
<para>
<literal>ltree || ltree, ltree || text, text || ltree</literal>
- return concatenated ltree.
</para>
</listitem>
</itemizedlist>
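<para>
A couple of illustrative sketches of these operators:
</para>
<programlisting>
SELECT 'Top.Science'::ltree @> 'Top.Science.Astronomy'::ltree;  -- ancestor test
SELECT 'Top.Science'::ltree || 'Astronomy';                     -- concatenation
</programlisting>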
<para>
Operations for arrays of ltree (<literal>ltree[]</literal>):
</para>
<itemizedlist>
<listitem>
<para>
<literal>ltree[] @> ltree, ltree <@ ltree[]</literal>
- returns TRUE if array ltree[] contains an ancestor of ltree.
</para>
</listitem>
<listitem>
<para>
<literal>ltree @> ltree[], ltree[] <@ ltree</literal>
- returns TRUE if array ltree[] contains a descendant of ltree.
</para>
</listitem>
<listitem>
<para>
<literal>ltree[] ~ lquery, lquery ~ ltree[]</literal>
- returns TRUE if array ltree[] contains label paths matched lquery.
</para>
</listitem>
<listitem>
<para>
<literal>ltree[] ? lquery[], lquery[] ? ltree[]</literal>
- returns TRUE if array ltree[] contains label paths matching at least one
lquery from the array.
</para>
</listitem>
<listitem>
<para>
<literal>ltree[] @ ltxtquery, ltxtquery @ ltree[]</literal>
- returns TRUE if array ltree[] contains label paths matched ltxtquery
(full text search).
</para>
</listitem>
<listitem>
<para>
<literal>ltree[] ?@> ltree, ltree ?<@ ltree[], ltree[] ?~ lquery, ltree[] ?@ ltxtquery</literal>
- return the first element of the array ltree[] that satisfies the
corresponding condition, or NULL if none does.
</para>
</listitem>
</itemizedlist>
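<para>
For example (a sketch using made-up paths):
</para>
<programlisting>
-- does the array contain an ancestor of the given path?
SELECT '{Top.Science,Top.Hobbies}'::ltree[] @> 'Top.Science.Astronomy'::ltree;
-- return the first matching element, or NULL
SELECT '{Top.Science,Top.Hobbies}'::ltree[] ?@> 'Top.Science.Astronomy'::ltree;
</programlisting>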
</sect2>
<sect2>
<title>Remark</title>
<para>
Operations <literal>&lt;@</literal>, <literal>@&gt;</literal>, <literal>@</literal> and
<literal>~</literal> have analogues - <literal>^&lt;@, ^@&gt;, ^@, ^~,</literal> which don't use
indices!
</para>
</sect2>
<sect2>
<title>Indices</title>
<para>
Various indices can be created to speed up the execution of operations:
</para>
<itemizedlist>
<listitem>
<para>
B-tree index over ltree: <literal>&lt;, &lt;=, =, &gt;=, &gt;</literal>
</para>
</listitem>
<listitem>
<para>
GiST index over ltree: <literal>&lt;, &lt;=, =, &gt;=, &gt;, @&gt;, &lt;@, @, ~, ?</literal>
</para>
<para>
Example:
</para>
<programlisting>
CREATE INDEX path_gist_idx ON test USING GIST (path);
</programlisting>
</listitem>
<listitem>
<para>GiST index over ltree[]:
<literal>ltree[]<@ ltree, ltree @> ltree[], @, ~, ?.</literal>
</para>
<para>
Example:
</para>
<programlisting>
CREATE INDEX path_gist_idx ON test USING GIST (array_path);
</programlisting>
<para>
Note: this index is lossy.
</para>
</listitem>
</itemizedlist>
</sect2>
<sect2>
<title>Functions</title>
<itemizedlist>
<listitem>
<para>
<literal>ltree subltree(ltree, start, end)</literal>
returns the subpath of ltree from position start (inclusive) to position
end (exclusive).
</para>
<programlisting>
# select subltree('Top.Child1.Child2',1,2);
subltree
--------
Child1
</programlisting>
</listitem>
<listitem>
<para>
<literal>ltree subpath(ltree, OFFSET, LEN)</literal> and
<literal>ltree subpath(ltree, OFFSET)</literal>
return the subpath of ltree starting at OFFSET (inclusive) with length LEN.
If OFFSET is negative, the subpath starts that far from the end of the
path. If LEN is omitted, everything to the end of the path is returned.
If LEN is negative, that many labels are left off the end of the path.
</para>
<programlisting>
# select subpath('Top.Child1.Child2',1,2);
subpath
-------
Child1.Child2
# select subpath('Top.Child1.Child2',-2,1);
subpath
---------
Child1
</programlisting>
</listitem>
<listitem>
<para>
<literal>int4 nlevel(ltree)</literal> - returns level of the node.
</para>
<programlisting>
# select nlevel('Top.Child1.Child2');
nlevel
--------
3
</programlisting>
<para>
Note that the arguments start, end, OFFSET, and LEN are all measured in
node levels (label positions)!
</para>
</listitem>
<listitem>
<para>
<literal>int4 index(ltree,ltree)</literal> and
<literal>int4 index(ltree,ltree,OFFSET)</literal>
return the level number of the first occurrence of the second argument in
the first one, beginning at OFFSET. If OFFSET is negative, the search begins
|OFFSET| levels from the end of the path.
</para>
<programlisting>
SELECT index('0.1.2.3.5.4.5.6.8.5.6.8','5.6',3);
index
-------
6
SELECT index('0.1.2.3.5.4.5.6.8.5.6.8','5.6',-4);
index
-------
9
</programlisting>
</listitem>
<listitem>
<para>
<literal>ltree text2ltree(text)</literal> and
<literal>text ltree2text(ltree)</literal> cast functions for ltree and text.
</para>
</listitem>
<listitem>
<para>
<literal>ltree lca(ltree,ltree,...) (up to 8 arguments)</literal> and
<literal>ltree lca(ltree[])</literal> Returns Lowest Common Ancestor (lca).
</para>
<programlisting>
# select lca('1.2.2.3','1.2.3.4.5.6');
lca
-----
1.2
# select lca('{la.2.3,1.2.3.4.5.6}') is null;
?column?
----------
f
</programlisting>
</listitem>
</itemizedlist>
</sect2>
<sect2>
<title>Installation</title>
<programlisting>
cd contrib/ltree
make
make install
make installcheck
</programlisting>
</sect2>
<sect2>
<title>Example</title>
<programlisting>
createdb ltreetest
psql ltreetest < /usr/local/pgsql/share/contrib/ltree.sql
psql ltreetest < ltreetest.sql
</programlisting>
<para>
Now we have a database ltreetest populated with data describing the
hierarchy shown below:
</para>
<programlisting>
TOP
/ | \
Science Hobbies Collections
/ | \
Astronomy Amateurs_Astronomy Pictures
/ \ |
Astrophysics Cosmology Astronomy
/ | \
Galaxies Stars Astronauts
</programlisting>
<para>
Inheritance:
</para>
<programlisting>
ltreetest=# select path from test where path <@ 'Top.Science';
path
------------------------------------
Top.Science
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
(4 rows)
</programlisting>
<para>
Matching:
</para>
<programlisting>
ltreetest=# select path from test where path ~ '*.Astronomy.*';
path
-----------------------------------------------
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
Top.Collections.Pictures.Astronomy
Top.Collections.Pictures.Astronomy.Stars
Top.Collections.Pictures.Astronomy.Galaxies
Top.Collections.Pictures.Astronomy.Astronauts
(7 rows)
ltreetest=# select path from test where path ~ '*.!pictures@.*.Astronomy.*';
path
------------------------------------
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
(3 rows)
</programlisting>
<para>
Full text search:
</para>
<programlisting>
ltreetest=# select path from test where path @ 'Astro*% & !pictures@';
path
------------------------------------
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
Top.Hobbies.Amateurs_Astronomy
(4 rows)
ltreetest=# select path from test where path @ 'Astro* & !pictures@';
path
------------------------------------
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
(3 rows)
</programlisting>
<para>
Using Functions:
</para>
<programlisting>
ltreetest=# select subpath(path,0,2)||'Space'||subpath(path,2) from test where path <@ 'Top.Science.Astronomy';
?column?
------------------------------------------
Top.Science.Space.Astronomy
Top.Science.Space.Astronomy.Astrophysics
Top.Science.Space.Astronomy.Cosmology
(3 rows)
We could create SQL-function:
CREATE FUNCTION ins_label(ltree, int4, text) RETURNS ltree
AS 'select subpath($1,0,$2) || $3 || subpath($1,$2);'
LANGUAGE SQL IMMUTABLE;
</programlisting>
<para>
and previous select could be rewritten as:
</para>
<programlisting>
ltreetest=# select ins_label(path,2,'Space') from test where path <@ 'Top.Science.Astronomy';
ins_label
------------------------------------------
Top.Science.Space.Astronomy
Top.Science.Space.Astronomy.Astrophysics
Top.Science.Space.Astronomy.Cosmology
(3 rows)
</programlisting>
<para>
Or with another arguments:
</para>
<programlisting>
CREATE FUNCTION ins_label(ltree, ltree, text) RETURNS ltree
AS 'select subpath($1,0,nlevel($2)) || $3 || subpath($1,nlevel($2));'
LANGUAGE SQL IMMUTABLE;
ltreetest=# select ins_label(path,'Top.Science'::ltree,'Space') from test where path <@ 'Top.Science.Astronomy';
ins_label
------------------------------------------
Top.Science.Space.Astronomy
Top.Science.Space.Astronomy.Astrophysics
Top.Science.Space.Astronomy.Cosmology
(3 rows)
</programlisting>
</sect2>
<sect2>
<title>Additional data</title>
<para>
To get a better feel for the ltree module you can download
dmozltree-eng.sql.gz (about a 3MB gzipped archive containing 300,274 nodes),
available from
<ulink url="http://www.sai.msu.su/~megera/postgres/gist/ltree/"></ulink>,
which is the DMOZ catalogue prepared for use with ltree. Set up your test
database (dmoz), load the ltree module, and issue the command:
</para>
<programlisting>
zcat dmozltree-eng.sql.gz| psql dmoz
</programlisting>
<para>
Data will be loaded into database dmoz and all indices will be created.
</para>
</sect2>
<sect2>
<title>Benchmarks</title>
<para>
All runs were performed on my IBM ThinkPad T21 (256 MB RAM, 750 MHz) using
the DMOZ data, containing 300,274 nodes (see above for the download link).
We used some basic queries typical for walking through a catalogue.
</para>
<sect3>
<title>Queries</title>
<itemizedlist>
<listitem>
<para>
Q0: Count all rows (sort of base time for comparison)
</para>
<programlisting>
select count(*) from dmoz;
count
--------
300274
(1 row)
</programlisting>
</listitem>
<listitem>
<para>
Q1: Get direct children (without inheritance)
</para>
<programlisting>
select path from dmoz where path ~ 'Top.Adult.Arts.Animation.*{1}';
path
-----------------------------------
Top.Adult.Arts.Animation.Cartoons
Top.Adult.Arts.Animation.Anime
(2 rows)
</programlisting>
</listitem>
<listitem>
<para>
Q2: The same as Q1 but with counting of successors
</para>
<programlisting>
select path as parentpath , (select count(*)-1 from dmoz where path <@
p.path) as count from dmoz p where path ~ 'Top.Adult.Arts.Animation.*{1}';
parentpath | count
-----------------------------------+-------
Top.Adult.Arts.Animation.Cartoons | 2
Top.Adult.Arts.Animation.Anime | 61
(2 rows)
</programlisting>
</listitem>
<listitem>
<para>
Q3: Get all parents
</para>
<programlisting>
select path from dmoz where path @> 'Top.Adult.Arts.Animation' order by
path asc;
path
--------------------------
Top
Top.Adult
Top.Adult.Arts
Top.Adult.Arts.Animation
(4 rows)
</programlisting>
</listitem>
<listitem>
<para>
Q4: Get all parents with counting of children
</para>
<programlisting>
select path, (select count(*)-1 from dmoz where path <@ p.path) as count
from dmoz p where path @> 'Top.Adult.Arts.Animation' order by path asc;
path | count
--------------------------+--------
Top | 300273
Top.Adult | 4913
Top.Adult.Arts | 339
Top.Adult.Arts.Animation | 65
(4 rows)
</programlisting>
</listitem>
<listitem>
<para>
Q5: Get all children with levels
</para>
<programlisting>
select path, nlevel(path) - nlevel('Top.Adult.Arts.Animation') as level
from dmoz where path ~ 'Top.Adult.Arts.Animation.*{1,2}' order by path asc;
path | level
------------------------------------------------+-------
Top.Adult.Arts.Animation.Anime | 1
Top.Adult.Arts.Animation.Anime.Fan_Works | 2
Top.Adult.Arts.Animation.Anime.Games | 2
Top.Adult.Arts.Animation.Anime.Genres | 2
Top.Adult.Arts.Animation.Anime.Image_Galleries | 2
Top.Adult.Arts.Animation.Anime.Multimedia | 2
Top.Adult.Arts.Animation.Anime.Resources | 2
Top.Adult.Arts.Animation.Anime.Titles | 2
Top.Adult.Arts.Animation.Cartoons | 1
Top.Adult.Arts.Animation.Cartoons.AVS | 2
Top.Adult.Arts.Animation.Cartoons.Members | 2
(11 rows)
</programlisting>
</listitem>
</itemizedlist>
</sect3>
<sect3>
<title>Timings</title>
<programlisting>
+---------------------------------------------+
|Query|Rows|Time (ms) index|Time (ms) no index|
|-----+----+---------------+------------------|
| Q0| 1| NA| 1453.44|
|-----+----+---------------+------------------|
| Q1| 2| 0.49| 1001.54|
|-----+----+---------------+------------------|
| Q2| 2| 1.48| 3009.39|
|-----+----+---------------+------------------|
| Q3| 4| 0.55| 906.98|
|-----+----+---------------+------------------|
| Q4| 4| 24385.07| 4951.91|
|-----+----+---------------+------------------|
| Q5| 11| 0.85| 1003.23|
+---------------------------------------------+
</programlisting>
<para>
Timings without indices were obtained using the operations which don't use
indices (see above).
</para>
</sect3>
<sect3>
<title>Remarks</title>
<para>
We didn't run full-scale tests; also, we haven't (yet) presented data for
operations with arrays of ltree (ltree[]) or for full text searching. We'd
appreciate your input. So far, some (rather obvious) results:
</para>
<itemizedlist>
<listitem>
<para>
Indices do help the execution of queries
</para>
</listitem>
<listitem>
<para>
Q4 performs badly because one needs to read almost all data from the HDD
</para>
</listitem>
</itemizedlist>
</sect3>
</sect2>
<sect2>
<title>Some Backgrounds</title>
<para>
The approach we use for ltree is much like one we used in our other GiST based
contrib modules (intarray, tsearch, tree, btree_gist, rtree_gist). Theoretical
background is available in papers referenced from our GiST development page
(<ulink url="http://www.sai.msu.su/~megera/postgres/gist"></ulink>).
</para>
<para>
A hierarchical data structure (tree) is a set of nodes. Each node has a
signature (LPS) of a fixed size, which is a hashed label path of that node.
While traversing a tree, we can <emphasis>certainly</emphasis> prune branches if
</para>
<programlisting>
LQS (bitwise AND) LPS != LQS
</programlisting>
<para>
where LQS is a signature of lquery or ltxtquery, obtained in the same way as
LPS.
</para>
<programlisting>
ltree[]:
</programlisting>
<para>
For an array of ltree, LPS is the bitwise OR of the signatures of
<emphasis>all</emphasis> children reachable from that node. Signatures are
stored in an RD-tree, implemented using GiST, which provides indexed access.
</para>
<programlisting>
ltree:
</programlisting>
<para>
For ltree we store LPS in a B-tree, implemented using GiST. Each node entry is
represented by (left_bound, signature, right_bound), so that we can speed up
operations <literal><, <=, =, >=, ></literal> using left_bound and right_bound,
and prune branches of the tree using the signature.
</para>
</sect2>
<sect2>
<title>Authors</title>
<para>
All work was done by Teodor Sigaev (<email>teodor@stack.net</email>) and
Oleg Bartunov (<email>oleg@sai.msu.su</email>). See
<ulink url="http://www.sai.msu.su/~megera/postgres/gist"></ulink> for
additional information. Authors would like to thank Eugeny Rodichev for
helpful discussions. Comments and bug reports are welcome.
</para>
</sect2>
</sect1>

205
doc/src/sgml/oid2name.sgml Normal file

@ -0,0 +1,205 @@
<sect1 id="oid2name">
<title>oid2name</title>
<indexterm zone="oid2name">
<primary>oid2name</primary>
</indexterm>
<para>
This utility allows administrators to examine the file structure used by
PostgreSQL. To make use of it, you need to be familiar with the file
structure, which is described in <xref linkend="storage">.
</para>
<sect2>
<title>Overview</title>
<para>
<literal>oid2name</literal> connects to the database and extracts OID,
filenode, and table name information. You can also have it show database
OIDs and tablespace OIDs.
</para>
<para>
When displaying specific tables, you can select which tables to show by
using -o, -f and -t. The first switch takes an OID, the second takes
a filenode, and the third takes a tablename (actually, it's a LIKE
pattern, so you can use things like "foo%"). Note that you can use as many
of these switches as you like, and the listing will include all objects
matched by any of the switches. Also note that these switches can only
show objects in the database given in -d.
</para>
<para>
If you don't give any of -o, -f or -t it will dump all the tables in the
database given in -d. If you don't give -d, it will show a database
listing. Alternatively you can give -s to get a tablespace listing.
</para>
<table>
<title>Additional switches</title>
<tgroup cols="2">
<tbody>
<row>
<entry><literal>-i</literal></entry>
<entry>include indexes and sequences in the database listing.</entry>
</row>
<row>
<entry><literal>-x</literal></entry>
<entry>display more information about each object shown: tablespace name,
schema name, OID.
</entry>
</row>
<row>
<entry><literal>-S</literal></entry>
<entry>also show system objects (those in information_schema, pg_toast
and pg_catalog schemas)
</entry>
</row>
<row>
<entry><literal>-q</literal></entry>
<entry>don't display headers (useful for scripting)</entry>
</row>
</tbody>
</tgroup>
</table>
</sect2>
<sect2>
<title>Examples</title>
<programlisting>
$ oid2name
All databases:
Oid Database Name Tablespace
----------------------------------
17228 alvherre pg_default
17255 regression pg_default
17227 template0 pg_default
1 template1 pg_default
$ oid2name -s
All tablespaces:
Oid Tablespace Name
-------------------------
1663 pg_default
1664 pg_global
155151 fastdisk
155152 bigdisk
$ cd $PGDATA/17228
$ # get top 10 db objects in the default tablespace, ordered by size
$ ls -lS * | head -10
-rw------- 1 alvherre alvherre 136536064 sep 14 09:51 155173
-rw------- 1 alvherre alvherre 17965056 sep 14 09:51 1155291
-rw------- 1 alvherre alvherre 1204224 sep 14 09:51 16717
-rw------- 1 alvherre alvherre 581632 sep 6 17:51 1255
-rw------- 1 alvherre alvherre 237568 sep 14 09:50 16674
-rw------- 1 alvherre alvherre 212992 sep 14 09:51 1249
-rw------- 1 alvherre alvherre 204800 sep 14 09:51 16684
-rw------- 1 alvherre alvherre 196608 sep 14 09:50 16700
-rw------- 1 alvherre alvherre 163840 sep 14 09:50 16699
-rw------- 1 alvherre alvherre 122880 sep 6 17:51 16751
$ oid2name -d alvherre -f 155173
From database "alvherre":
Filenode Table Name
----------------------
155173 accounts
$ # you can ask for more than one object
$ oid2name -d alvherre -f 155173 -f 1155291
From database "alvherre":
Filenode Table Name
-------------------------
155173 accounts
1155291 accounts_pkey
$ # you can also mix the options, and have more details
$ oid2name -d alvherre -t accounts -f 1155291 -x
From database "alvherre":
Filenode Table Name Oid Schema Tablespace
------------------------------------------------------
155173 accounts 155173 public pg_default
1155291 accounts_pkey 1155291 public pg_default
$ # show disk space for every db object
$ du [0-9]* |
> while read SIZE FILENODE
> do
> echo "$SIZE `oid2name -q -d alvherre -i -f $FILENODE`"
> done
16 1155287 branches_pkey
16 1155289 tellers_pkey
17561 1155291 accounts_pkey
...
$ # same, but sort by size
$ du [0-9]* | sort -rn | while read SIZE FN
> do
> echo "$SIZE `oid2name -q -d alvherre -f $FN`"
> done
133466 155173 accounts
17561 1155291 accounts_pkey
1177 16717 pg_proc_proname_args_nsp_index
...
$ # If you want to see what's in tablespaces, use the pg_tblspc directory
$ cd $PGDATA/pg_tblspc
$ oid2name -s
All tablespaces:
Oid Tablespace Name
-------------------------
1663 pg_default
1664 pg_global
155151 fastdisk
155152 bigdisk
$ # what databases have objects in tablespace "fastdisk"?
$ ls -d 155151/*
155151/17228/ 155151/PG_VERSION
$ # Oh, what was database 17228 again?
$ oid2name
All databases:
Oid Database Name Tablespace
----------------------------------
17228 alvherre pg_default
17255 regression pg_default
17227 template0 pg_default
1 template1 pg_default
$ # Let's see what objects this database has in the tablespace.
$ cd 155151/17228
$ ls -l
total 0
-rw------- 1 postgres postgres 0 sep 13 23:20 155156
$ # OK, this is a pretty small table ... but which one is it?
$ oid2name -d alvherre -f 155156
From database "alvherre":
Filenode Table Name
----------------------
155156 foo
$ # end of sample session.
</programlisting>
<para>
You can also get approximate size data for each object using psql. For
example,
</para>
<programlisting>
SELECT relpages, relfilenode, relname FROM pg_class ORDER BY relpages DESC;
</programlisting>
<para>
Each page is typically 8KB. <literal>relpages</literal> is updated by VACUUM.
</para>
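<para>
For instance, assuming the default 8KB block size, an approximate size
report could be written as:
</para>
<programlisting>
SELECT relname, relpages * 8192 AS approx_bytes
FROM pg_class ORDER BY relpages DESC LIMIT 10;
</programlisting>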
</sect2>
<sect2>
<title>Author</title>
<para>
b. palmer, <email>bpalmer@crimelabs.net</email>
</para>
</sect2>
</sect1>


@ -0,0 +1,125 @@
<sect1 id="pageinspect">
<title>pageinspect</title>
<indexterm zone="pageinspect">
<primary>pageinspect</primary>
</indexterm>
<para>
The functions in this module allow you to inspect the contents of data pages
at a low level, for debugging purposes.
</para>
<sect2>
<title>Functions included</title>
<itemizedlist>
<listitem>
<para>
<literal>get_raw_page</literal> reads one block of the named table and returns a copy as a
bytea field. This allows a single time-consistent copy of the block to be
made. Use of this function is restricted to superusers.
</para>
</listitem>
<listitem>
<para>
<literal>page_header</literal> shows fields which are common to all PostgreSQL heap and index
pages. Use of this function is restricted to superusers.
</para>
<para>
A page image obtained with <literal>get_raw_page</literal> should be passed as argument:
</para>
<programlisting>
test=# SELECT * FROM page_header(get_raw_page('pg_class',0));
lsn | tli | flags | lower | upper | special | pagesize | version
----------+-----+-------+-------+-------+---------+----------+---------
0/3C5614 | 1 | 1 | 216 | 256 | 8192 | 8192 | 4
(1 row)
</programlisting>
<para>
The returned columns correspond to the fields in the PageHeaderData struct;
see src/include/storage/bufpage.h for more details.
</para>
</listitem>
<listitem>
<para>
<literal>heap_page_items</literal> shows all line pointers on a heap page. For those line
pointers that are in use, tuple headers are also shown. All tuples are
shown, whether or not the tuples were visible to an MVCC snapshot at the
time the raw page was copied. Use of this function is restricted to
superusers.
</para>
<para>
A heap page image obtained with <literal>get_raw_page</literal> should be passed as argument:
</para>
<programlisting>
test=# SELECT * FROM heap_page_items(get_raw_page('pg_class',0));
</programlisting>
<para>
See src/include/storage/itemid.h and src/include/access/htup.h for
explanations of the fields returned.
</para>
</listitem>
<listitem>
<para>
<literal>bt_metap()</literal> returns information about the btree index metapage:
</para>
<programlisting>
test=> SELECT * FROM bt_metap('pg_cast_oid_index');
-[ RECORD 1 ]-----
magic | 340322
version | 2
root | 1
level | 0
fastroot | 1
fastlevel | 0
</programlisting>
</listitem>
<listitem>
<para>
<literal>bt_page_stats()</literal> shows information about single btree pages:
</para>
<programlisting>
test=> SELECT * FROM bt_page_stats('pg_cast_oid_index', 1);
-[ RECORD 1 ]-+-----
blkno | 1
type | l
live_items | 256
dead_items | 0
avg_item_size | 12
page_size | 8192
free_size | 4056
btpo_prev | 0
btpo_next | 0
btpo | 0
btpo_flags | 3
</programlisting>
</listitem>
<listitem>
<para>
<literal>bt_page_items()</literal> returns information about specific items on btree pages:
</para>
<programlisting>
test=> SELECT * FROM bt_page_items('pg_cast_oid_index', 1);
itemoffset | ctid | itemlen | nulls | vars | data
------------+---------+---------+-------+------+-------------
1 | (0,1) | 12 | f | f | 23 27 00 00
2 | (0,2) | 12 | f | f | 24 27 00 00
3 | (0,3) | 12 | f | f | 25 27 00 00
4 | (0,4) | 12 | f | f | 26 27 00 00
5 | (0,5) | 12 | f | f | 27 27 00 00
6 | (0,6) | 12 | f | f | 28 27 00 00
7 | (0,7) | 12 | f | f | 29 27 00 00
8 | (0,8) | 12 | f | f | 2a 27 00 00
</programlisting>
</listitem>
</itemizedlist>
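The data column above is simply the raw key bytes in hex. For an index on an oid column on a little-endian machine, the values can be decoded by reversing the byte order; a small illustrative decoder (not part of pageinspect):

```python
# Decode the hex byte strings shown in bt_page_items' "data" column
# back into integers, assuming 4-byte keys on a little-endian machine
# (as in the pg_cast_oid_index sample above).
def decode_oid(data: str) -> int:
    """Turn a string like '23 27 00 00' into the OID it encodes."""
    return int.from_bytes(bytes.fromhex(data.replace(" ", "")), "little")

print(decode_oid("23 27 00 00"))  # first key on the sample page
```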
</sect2>
</sect1>

doc/src/sgml/pgbench.sgml Normal file

@ -0,0 +1,422 @@
<sect1 id="pgbench">
<title>pgbench</title>
<indexterm zone="pgbench">
<primary>pgbench</primary>
</indexterm>
<para>
<literal>pgbench</literal> is a simple program for running benchmark tests.
It is a PostgreSQL client application and runs
against PostgreSQL only. It performs many small, simple transactions
consisting of SELECT/UPDATE/INSERT operations, then calculates the number
of transactions successfully completed per second (transactions
per second, tps). The test data includes a table with at least 100k
tuples.
</para>
<para>
Example outputs from pgbench look like:
</para>
<programlisting>
number of clients: 4
number of transactions per client: 100
number of processed transactions: 400/400
tps = 19.875015(including connections establishing)
tps = 20.098827(excluding connections establishing)
</programlisting>
<para> A similar program called "JDBCBench" already exists, but it requires
Java, which may not be available on every platform. Moreover, some
people were concerned that the overhead of Java might lead to
inaccurate results. So I decided to write it in pure C, and named
it "pgbench."
</para>
<para>
Features of pgbench:
</para>
<itemizedlist>
<listitem>
<para>
pgbench is written in C using libpq only. So it is very portable
and easy to install.
</para>
</listitem>
<listitem>
<para>
pgbench can simulate concurrent connections using asynchronous
capability of libpq. No threading is required.
</para>
</listitem>
</itemizedlist>
<sect2>
<title>Overview</title>
<orderedlist>
<listitem>
<para>(Optional) Initialize the database with:</para>
<programlisting>
pgbench -i &lt;dbname&gt;
</programlisting>
<para>
where &lt;dbname&gt; is the name of the database. pgbench uses four tables:
accounts, branches, history and tellers. These tables will be
destroyed; be very careful if you have tables with the same
names. The default test data contains:
</para>
<programlisting>
table # of tuples
-------------------------
branches 1
tellers 10
accounts 100000
history 0
</programlisting>
<para>
You can increase the number of tuples with the -s option. The branches,
tellers and accounts tables are created with a fillfactor, which is
set with the -F option. See below.
</para>
</listitem>
<listitem>
<para>Run the benchmark test</para>
<programlisting>
pgbench &lt;dbname&gt;
</programlisting>
<para>
The default configuration is:
</para>
<programlisting>
number of clients: 1
number of transactions per client: 10
</programlisting>
</listitem>
</orderedlist>
<table>
<title><literal>pgbench</literal> options</title>
<tgroup cols="2">
<thead>
<row>
<entry>Parameter</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><literal>-h hostname</literal></entry>
<entry>
<para>
Hostname where the backend is running. If this option
is omitted, pgbench connects to localhost via a
Unix-domain socket.
</para>
</entry>
</row>
<row>
<entry><literal>-p port</literal></entry>
<entry>
<para>
The port number the backend is listening on. The default is
libpq's default, usually 5432.
</para>
</entry>
</row>
<row>
<entry><literal>-c number_of_clients</literal></entry>
<entry>
<para>
Number of clients simulated. The default is 1.
</para>
</entry>
</row>
<row>
<entry><literal>-t number_of_transactions</literal></entry>
<entry>
<para>
Number of transactions each client runs. The default is 10.
</para>
</entry>
</row>
<row>
<entry><literal>-s scaling_factor</literal></entry>
<entry>
<para>
This should be used with the -i (initialize) option.
The number of tuples generated will be a multiple of the
scaling factor. For example, -s 100 implies 10M
(10,000,000) tuples in the accounts table.
The default is 1.
</para>
<para>
NOTE: the scaling factor should be at least
as large as the largest number of clients you intend
to test; otherwise you'll mostly be measuring update contention.
Regular (not initializing) runs using one of the
built-in tests will detect scale based on the number of
branches in the database. For custom (-f) runs it can
be manually specified with this parameter.
</para>
</entry>
</row>
<row>
<entry><literal>-D varname=value</literal></entry>
<entry>
<para>
Define a variable. It can be referred to by a script
provided with the -f option. Multiple -D options are allowed.
</para>
</entry>
</row>
<row>
<entry><literal>-U login</literal></entry>
<entry>
<para>
Specify the database user's login name if it is different from
the Unix login name.
</para>
</entry>
</row>
<row>
<entry><literal>-P password</literal></entry>
<entry>
<para>
Specify the database password. CAUTION: using this option
can be a security hole, since the ps command will
show the password. Use it for TESTING PURPOSES ONLY.
</para>
</entry>
</row>
<row>
<entry><literal>-n</literal></entry>
<entry>
<para>
Do not vacuum or clean the history table before
running the test.
</para>
</entry>
</row>
<row>
<entry><literal>-v</literal></entry>
<entry>
<para>
Vacuum before testing. This will take some time.
With neither -n nor -v, pgbench vacuums only the
tellers and branches tables.
</para>
</entry>
</row>
<row>
<entry><literal>-S</literal></entry>
<entry>
<para>
Perform SELECT-only transactions instead of the TPC-B-like test.
</para>
</entry>
</row>
<row>
<entry><literal>-N</literal></entry>
<entry>
<para>
Do not update "branches" and "tellers". This avoids
heavy update contention on those tables, but the
transactions are then no longer TPC-B-like.
</para>
</entry>
</row>
<row>
<entry><literal>-f filename</literal></entry>
<entry>
<para>
Read the transaction script from a file. A detailed
explanation appears below.
</para>
</entry>
</row>
<row>
<entry><literal>-C</literal></entry>
<entry>
<para>
Establish a new connection for each transaction, rather than
once at the start of the run as in the normal
mode. This is useful for measuring connection overhead.
</para>
</entry>
</row>
<row>
<entry><literal>-l</literal></entry>
<entry>
<para>
Write the time taken by each transaction to a logfile,
with the name "pgbench_log.xxx", where xxx is the PID
of the pgbench process. The format of the log is:
</para>
<programlisting>
client_id transaction_no time file_no time-epoch time-us
</programlisting>
<para>
where time is measured in microseconds, file_no indicates
which test file was used (useful when multiple files were
specified with -f), and time-epoch/time-us are a
UNIX-epoch timestamp followed by an offset
in microseconds (suitable for creating an ISO 8601
timestamp with fractional seconds) giving when
the transaction completed.
</para>
<para>
Here are example outputs:
</para>
<programlisting>
0 199 2241 0 1175850568 995598
0 200 2465 0 1175850568 998079
0 201 2513 0 1175850569 608
0 202 2038 0 1175850569 2663
</programlisting>
</entry>
</row>
<row>
<entry><literal>-F fillfactor</literal></entry>
<entry>
<para>
Create the tables (accounts, tellers and branches) with the given
fillfactor. The default is 100. This should be used with the -i
(initialize) option.
</para>
</entry>
</row>
<row>
<entry><literal>-d</literal></entry>
<entry>
<para>
Debug option.
</para>
</entry>
</row>
</tbody>
</tgroup>
</table>
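The per-transaction logfile written with -l is plain whitespace-separated text, so it is easy to post-process. As an illustration (using the sample log lines shown above; this script is not part of pgbench), here is how the average transaction latency could be computed:

```python
# Compute the average transaction latency from a pgbench -l logfile.
# Each line: client_id transaction_no time file_no time-epoch time-us,
# where "time" is the per-transaction elapsed time in microseconds.
sample_log = """\
0 199 2241 0 1175850568 995598
0 200 2465 0 1175850568 998079
0 201 2513 0 1175850569 608
0 202 2038 0 1175850569 2663
"""

# Third column is the elapsed time in microseconds.
times_us = [int(line.split()[2]) for line in sample_log.splitlines()]
avg_ms = sum(times_us) / len(times_us) / 1000.0
print(f"{len(times_us)} transactions, average latency {avg_ms:.2f} ms")
```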
</sect2>
<sect2>
<title>What is the "transaction" actually performed in pgbench?</title>
<orderedlist>
<listitem><para><literal>begin;</literal></para></listitem>
<listitem><para><literal>update accounts set abalance = abalance + :delta where aid = :aid;</literal></para></listitem>
<listitem><para><literal>select abalance from accounts where aid = :aid;</literal></para></listitem>
<listitem><para><literal>update tellers set tbalance = tbalance + :delta where tid = :tid;</literal></para></listitem>
<listitem><para><literal>update branches set bbalance = bbalance + :delta where bid = :bid;</literal></para></listitem>
<listitem><para><literal>insert into history(tid,bid,aid,delta) values(:tid,:bid,:aid,:delta);</literal></para></listitem>
<listitem><para><literal>end;</literal></para></listitem>
</orderedlist>
<para>
If you specify -N, steps (4) and (5) are not included in the transaction.
</para>
</sect2>
<sect2>
<title>Script file</title>
<para>
<literal>pgbench</literal> supports reading a transaction script
from a specified file (<literal>-f</literal> option). The file should
contain one SQL command per line; SQL commands spanning multiple lines
are not supported. Empty lines and lines beginning with "--" are ignored.
</para>
<para>
Multiple <literal>-f</literal> options are allowed. In this case each
transaction is assigned a randomly chosen script.
</para>
<para>
The script can also include "meta commands", which begin with "\" (backslash).
A meta command takes arguments separated by white
space. The following meta commands are currently supported:
</para>
<itemizedlist>
<listitem>
<para>
<literal>\set name operand1 [ operator operand2 ]</literal>
- Sets variable "name" to the value calculated from "operand1"
"operator" "operand2". If "operator" and "operand2" are omitted,
variable "name" is set to the value of "operand1".
</para>
<para>
Example:
</para>
<programlisting>
\set ntellers 10 * :scale
</programlisting>
</listitem>
<listitem>
<para>
<literal>\setrandom name min max</literal>
- Assigns a random integer between min and max to variable "name"
</para>
<para>
Example:
</para>
<programlisting>
\setrandom aid 1 100000
</programlisting>
</listitem>
<listitem>
<para>
Variables can be referred to in SQL commands by adding ":" in front
of the variable name.
</para>
<para>
Example:
</para>
<programlisting>
SELECT abalance FROM accounts WHERE aid = :aid
</programlisting>
<para>
Variables can also be defined with the -D option.
</para>
</listitem>
</itemizedlist>
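Conceptually, \set performs integer arithmetic, \setrandom draws a uniform random integer, and :name references are substituted into the SQL text. A simplified model of that behavior (an illustration only, not pgbench's actual parser):

```python
import random
import re

# Simplified model of pgbench script variables: \set does integer
# arithmetic, \setrandom draws a uniform random integer, and :name
# references are substituted into the SQL text before execution.
variables = {"scale": 1}                          # as if passed with -D scale=1

variables["ntellers"] = 10 * variables["scale"]   # \set ntellers 10 * :scale
variables["aid"] = random.randint(1, 100000)      # \setrandom aid 1 100000

def substitute(sql: str, vars_: dict) -> str:
    """Replace :name references with the variable's current value."""
    return re.sub(r":(\w+)", lambda m: str(vars_[m.group(1)]), sql)

print(substitute("SELECT abalance FROM accounts WHERE aid = :aid", variables))
```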
</sect2>
<sect2>
<title>Examples</title>
<para>
As an example, a TPC-B-like benchmark can be defined as follows (scaling
factor = 1):
</para>
<programlisting>
\set nbranches :scale
\set ntellers 10 * :scale
\set naccounts 100000 * :scale
\setrandom aid 1 :naccounts
\setrandom bid 1 :nbranches
\setrandom tid 1 :ntellers
\setrandom delta 1 10000
BEGIN
UPDATE accounts SET abalance = abalance + :delta WHERE aid = :aid
SELECT abalance FROM accounts WHERE aid = :aid
UPDATE tellers SET tbalance = tbalance + :delta WHERE tid = :tid
UPDATE branches SET bbalance = bbalance + :delta WHERE bid = :bid
INSERT INTO history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, 'now')
END
</programlisting>
<para>
If you want to automatically set the scaling factor from the number of
tuples in the branches table, use the -s option and a shell command like this:
</para>
<programlisting>
pgbench -s $(psql -At -c "SELECT count(*) FROM branches") -f tpc_b.sql
</programlisting>
<para>
Notice that the -f option does not vacuum or clear the history
table before starting the benchmark.
</para>
</sect2>
</sect1>

doc/src/sgml/pgcrypto.sgml Normal file

File diff suppressed because it is too large

@ -0,0 +1,123 @@
<sect1 id="pgrowlocks">
<title>pgrowlocks</title>
<indexterm zone="pgrowlocks">
<primary>pgrowlocks</primary>
</indexterm>
<para>
The <literal>pgrowlocks</literal> module provides a function to show row
locking information for a specified table.
</para>
<sect2>
<title>Overview</title>
<programlisting>
pgrowlocks(text) RETURNS pgrowlocks_type
</programlisting>
<para>
The parameter is the name of a table. <literal>pgrowlocks_type</literal> is
defined as:
</para>
<programlisting>
CREATE TYPE pgrowlocks_type AS (
locked_row TID, -- row TID
lock_type TEXT, -- lock type
locker XID, -- locking XID
multi bool, -- multi XID?
xids xid[], -- multi XIDs
pids INTEGER[] -- locker's process id
);
</programlisting>
<table>
<title>pgrowlocks_type</title>
<tgroup cols="2">
<tbody>
<row>
<entry>locked_row</entry>
<entry>tuple ID (TID) of each locked row</entry>
</row>
<row>
<entry>lock_type</entry>
<entry>"Shared" for shared lock, "Exclusive" for exclusive lock</entry>
</row>
<row>
<entry>locker</entry>
<entry>transaction ID of locker (Note 1)</entry>
</row>
<row>
<entry>multi</entry>
<entry>"t" if locker is a multi transaction, otherwise "f"</entry>
</row>
<row>
<entry>xids</entry>
<entry>XIDs of lockers (Note 2)</entry>
</row>
<row>
<entry>pids</entry>
<entry>process ids of locking backends</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
Note 1: If the locker is a multitransaction, this column shows the multitransaction ID.
</para>
<para>
Note 2: If the locker is a multitransaction, multiple XIDs are shown.
</para>
<para>
<literal>pgrowlocks</literal> grabs an AccessShareLock on the target table and
reads each row one by one to collect the row locking information. You should
be aware that:
</para>
<orderedlist>
<listitem>
<para>
If the table is exclusively locked by someone else,
<literal>pgrowlocks</literal> will be blocked.
</para>
</listitem>
<listitem>
<para>
<literal>pgrowlocks</literal> may show incorrect information if a new
lock is taken or a lock is released during its execution.
</para>
</listitem>
</orderedlist>
<para>
<literal>pgrowlocks</literal> does not show the contents of locked rows. If
you want to take a look at the row contents at the same time, you could do
something like this:
</para>
<programlisting>
SELECT * FROM accounts AS a, pgrowlocks('accounts') AS p WHERE p.locked_row = a.ctid;
</programlisting>
</sect2>
<sect2>
<title>Example</title>
<para>
Here is a sample execution of <literal>pgrowlocks</literal>:
</para>
<programlisting>
test=# SELECT * FROM pgrowlocks('t1');
locked_row | lock_type | locker | multi | xids | pids
------------+-----------+--------+-------+-----------+---------------
(0,1) | Shared | 19 | t | {804,805} | {29066,29068}
(0,2) | Shared | 19 | t | {804,805} | {29066,29068}
(0,3) | Exclusive | 804 | f | {804} | {29066}
(0,4) | Exclusive | 804 | f | {804} | {29066}
(4 rows)
</programlisting>
</sect2>
</sect1>


@ -0,0 +1,158 @@
<sect1 id="pgstattuple">
<title>pgstattuple</title>
<indexterm zone="pgstattuple">
<primary>pgstattuple</primary>
</indexterm>
<para>
The <literal>pgstattuple</literal> module provides various functions to obtain
tuple-level statistics.
</para>
<sect2>
<title>Functions</title>
<itemizedlist>
<listitem>
<para>
<literal>pgstattuple()</literal> returns the relation length, the percentage
of "dead" tuples, and other information. This may help users
determine whether vacuum is necessary. Here is an example session:
</para>
<programlisting>
test=> \x
Expanded display is on.
test=> SELECT * FROM pgstattuple('pg_catalog.pg_proc');
-[ RECORD 1 ]------+-------
table_len | 458752
tuple_count | 1470
tuple_len | 438896
tuple_percent | 95.67
dead_tuple_count | 11
dead_tuple_len | 3157
dead_tuple_percent | 0.69
free_space | 8932
free_percent | 1.95
</programlisting>
<para>
Here are explanations for each column:
</para>
<table>
<title><literal>pgstattuple()</literal> column descriptions</title>
<tgroup cols="2">
<thead>
<row>
<entry>Column</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry>table_len</entry>
<entry>physical relation length in bytes</entry>
</row>
<row>
<entry>tuple_count</entry>
<entry>number of live tuples</entry>
</row>
<row>
<entry>tuple_len</entry>
<entry>total length of live tuples in bytes</entry>
</row>
<row>
<entry>tuple_percent</entry>
<entry>live tuples in %</entry>
</row>
<row>
<entry>dead_tuple_count</entry>
<entry>number of dead tuples</entry>
</row>
<row>
<entry>dead_tuple_len</entry>
<entry>total length of dead tuples in bytes</entry>
</row>
<row>
<entry>dead_tuple_percent</entry>
<entry>dead tuples in %</entry>
</row>
<row>
<entry>free_space</entry>
<entry>free space in bytes</entry>
</row>
<row>
<entry>free_percent</entry>
<entry>free space in %</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
<note>
<para>
<literal>pgstattuple</literal> acquires only a read lock on the relation. So
concurrent updates may affect the result.
</para>
</note>
<note>
<para>
<literal>pgstattuple</literal> considers a tuple "dead" if HeapTupleSatisfiesNow()
returns false.
</para>
</note>
</para>
</listitem>
<listitem>
<para>
<literal>pg_relpages()</literal> returns the number of pages in the relation.
</para>
</listitem>
<listitem>
<para>
<literal>pgstatindex()</literal> returns a record showing information about a B-tree index:
</para>
<programlisting>
test=> \x
Expanded display is on.
test=> SELECT * FROM pgstatindex('pg_cast_oid_index');
-[ RECORD 1 ]------+------
version | 2
tree_level | 0
index_size | 8192
root_block_no | 1
internal_pages | 0
leaf_pages | 1
empty_pages | 0
deleted_pages | 0
avg_leaf_density | 50.27
leaf_fragmentation | 0
</programlisting>
</listitem>
</itemizedlist>
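The percentage columns returned by pgstattuple() are plain ratios against table_len. Using the sample numbers from the example session above, they can be reproduced by hand:

```python
# Cross-check the percentage columns from the pgstattuple() sample
# output above: each is the corresponding byte count divided by
# table_len, expressed as a percentage.
table_len = 458752
tuple_len = 438896
dead_tuple_len = 3157
free_space = 8932

tuple_percent = round(100.0 * tuple_len / table_len, 2)
dead_tuple_percent = round(100.0 * dead_tuple_len / table_len, 2)
free_percent = round(100.0 * free_space / table_len, 2)

print(tuple_percent, dead_tuple_percent, free_percent)
```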
</sect2>
<sect2>
<title>Usage</title>
<para>
<literal>pgstattuple</literal> may be called as a relation function and is
defined as follows:
</para>
<programlisting>
CREATE OR REPLACE FUNCTION pgstattuple(text) RETURNS pgstattuple_type
AS 'MODULE_PATHNAME', 'pgstattuple'
LANGUAGE C STRICT;
CREATE OR REPLACE FUNCTION pgstattuple(oid) RETURNS pgstattuple_type
AS 'MODULE_PATHNAME', 'pgstattuplebyid'
LANGUAGE C STRICT;
</programlisting>
<para>
The argument is the relation name (optionally schema-qualified)
or the OID of the relation. Note that pgstattuple returns
only one row.
</para>
</sect2>
</sect1>


@ -1,4 +1,4 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/postgres.sgml,v 1.83 2007/11/01 17:00:18 momjian Exp $ -->
<!-- $PostgreSQL: pgsql/doc/src/sgml/postgres.sgml,v 1.84 2007/11/10 23:30:46 momjian Exp $ -->
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.2//EN" [
@ -102,6 +102,7 @@
&typeconv;
&indices;
&textsearch;
&contrib;
&mvcc;
&perform;

doc/src/sgml/seg.sgml Normal file

@ -0,0 +1,450 @@
<sect1 id="seg">
<title>seg</title>
<indexterm zone="seg">
<primary>seg</primary>
</indexterm>
<para>
The <literal>seg</literal> module contains the code for the user-defined
type, <literal>SEG</literal>, representing laboratory measurements as
floating point intervals.
</para>
<sect2>
<title>Rationale</title>
<para>
The geometry of measurements is usually more complex than that of a
point in a numeric continuum. A measurement is usually a segment of
that continuum with somewhat fuzzy limits. The measurements come out
as intervals because of uncertainty and randomness, as well as because
the value being measured may naturally be an interval indicating some
condition, such as the temperature range of stability of a protein.
</para>
<para>
Using just common sense, it appears more convenient to store such data
as intervals, rather than pairs of numbers. In practice, it even turns
out more efficient in most applications.
</para>
<para>
Further along the line of common sense, the fuzziness of the limits
suggests that the use of traditional numeric data types leads to a
certain loss of information. Consider this: your instrument reads
6.50, and you input this reading into the database. What do you get
when you fetch it? Watch:
</para>
<programlisting>
test=> select 6.50 as "pH";
pH
---
6.5
(1 row)
</programlisting>
<para>
In the world of measurements, 6.50 is not the same as 6.5. It may
sometimes be critically different. The experimenters usually write
down (and publish) the digits they trust. 6.50 is actually a fuzzy
interval contained within a bigger and even fuzzier interval, 6.5,
with their center points being (probably) the only common feature they
share. We definitely do not want such different data items to appear the
same.
</para>
<para>
Conclusion? It is nice to have a special data type that can record the
limits of an interval with arbitrarily variable precision. Variable in
a sense that each data element records its own precision.
</para>
<para>
Check this out:
</para>
<programlisting>
test=> select '6.25 .. 6.50'::seg as "pH";
pH
------------
6.25 .. 6.50
(1 row)
</programlisting>
</sect2>
<sect2>
<title>Syntax</title>
<para>
The external representation of an interval is formed using one or two
floating point numbers joined by the range operator ('..' or '...').
Optional certainty indicators (<, > and ~) are ignored by the internal
logic, but are retained in the data.
</para>
<table>
<title>Rules</title>
<tgroup cols="2">
<tbody>
<row>
<entry>rule 1</entry>
<entry>seg -> boundary PLUMIN deviation</entry>
</row>
<row>
<entry>rule 2</entry>
<entry>seg -> boundary RANGE boundary</entry>
</row>
<row>
<entry>rule 3</entry>
<entry>seg -> boundary RANGE</entry>
</row>
<row>
<entry>rule 4</entry>
<entry>seg -> RANGE boundary</entry>
</row>
<row>
<entry>rule 5</entry>
<entry>seg -> boundary</entry>
</row>
<row>
<entry>rule 6</entry>
<entry>boundary -> FLOAT</entry>
</row>
<row>
<entry>rule 7</entry>
<entry>boundary -> EXTENSION FLOAT</entry>
</row>
<row>
<entry>rule 8</entry>
<entry>deviation -> FLOAT</entry>
</row>
</tbody>
</tgroup>
</table>
<table>
<title>Tokens</title>
<tgroup cols="2">
<tbody>
<row>
<entry>RANGE</entry>
<entry>(\.\.)(\.)?</entry>
</row>
<row>
<entry>PLUMIN</entry>
<entry>\'\+\-\'</entry>
</row>
<row>
<entry>integer</entry>
<entry>[+-]?[0-9]+</entry>
</row>
<row>
<entry>real</entry>
<entry>[+-]?[0-9]+\.[0-9]+</entry>
</row>
<row>
<entry>FLOAT</entry>
<entry>({integer}|{real})([eE]{integer})?</entry>
</row>
<row>
<entry>EXTENSION</entry>
<entry>[<>~]</entry>
</row>
</tbody>
</tgroup>
</table>
<table>
<title>Examples of valid <literal>SEG</literal> representations</title>
<tgroup cols="2">
<tbody>
<row>
<entry>Any number</entry>
<entry>
(rules 5,6) -- creates a zero-length segment (a point,
if you will)
</entry>
</row>
<row>
<entry>~5.0</entry>
<entry>
(rules 5,7) -- creates a zero-length segment AND records
'~' in the data. This notation reads 'approximately 5.0',
but its meaning is not recognized by the code. It is ignored
until you get the value back. View it as a short-hand comment.
</entry>
</row>
<row>
<entry><5.0</entry>
<entry>
(rules 5,7) -- creates a point at 5.0; '<' is ignored but
is preserved as a comment
</entry>
</row>
<row>
<entry>>5.0</entry>
<entry>
(rules 5,7) -- creates a point at 5.0; '>' is ignored but
is preserved as a comment
</entry>
</row>
<row>
<entry><para>5(+-)0.3</para><para>5'+-'0.3</para></entry>
<entry>
<para>
(rules 1,8) -- creates an interval '4.7..5.3'. As of this
writing (02/09/2000), this mechanism isn't completely accurate
in determining the number of significant digits for the
boundaries. For example, it adds an extra digit to the lower
boundary if the resulting interval includes a power of ten:
</para>
<programlisting>
postgres=> select '10(+-)1'::seg as seg;
seg
---------
9.0 .. 11 -- should be: 9 .. 11
</programlisting>
<para>
Also, the (+-) notation is not preserved: 'a(+-)b' will
always be returned as '(a-b) .. (a+b)'. The purpose of this
notation is to allow input from certain data sources without
conversion.
</para>
</entry>
</row>
<row>
<entry>50 .. </entry>
<entry>(rule 3) -- everything that is greater than or equal to 50</entry>
</row>
<row>
<entry>.. 0</entry>
<entry>(rule 4) -- everything that is less than or equal to 0</entry>
</row>
<row>
<entry>1.5e-2 .. 2E-2 </entry>
<entry>(rule 2) -- creates an interval (0.015 .. 0.02)</entry>
</row>
<row>
<entry>1 ... 2</entry>
<entry>
The same as 1...2, or 1 .. 2, or 1..2 (space is ignored).
Because of the widespread use of '...' in the data sources,
I decided to stick to it as a range operator. This, and
also the fact that the white space around the range operator
is ignored, creates a parsing conflict with numeric constants
starting with a decimal point.
</entry>
</row>
</tbody>
</tgroup>
</table>
<table>
<title>Examples of invalid input</title>
<tgroup cols="2">
<tbody>
<row>
<entry>.1e7</entry>
<entry>should be: 0.1e7</entry>
</row>
<row>
<entry>.1 .. .2</entry>
<entry>should be: 0.1 .. 0.2</entry>
</row>
<row>
<entry>2.4 E4</entry>
<entry>should be: 2.4E4</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
The following, although it is not a syntax error, is disallowed to improve
the sanity of the data:
</para>
<table>
<title></title>
<tgroup cols="2">
<tbody>
<row>
<entry>5 .. 2</entry>
<entry>should be: 2 .. 5</entry>
</row>
</tbody>
</tgroup>
</table>
</sect2>
<sect2>
<title>Precision</title>
<para>
The segments are stored internally as pairs of 32-bit floating point
numbers. This means that numbers with more than 7 significant digits
will be truncated.
</para>
<para>
Numbers with 7 or fewer significant digits retain their
original precision. That is, if your query returns 0.00, you will be
sure that the trailing zeroes are not the artifacts of formatting: they
reflect the precision of the original data. The number of leading
zeroes does not affect precision: the value 0.0067 is considered to
have just 2 significant digits.
</para>
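This 7-digit limit is simply the precision of IEEE 754 single-precision floats. The round trip through 32-bit storage can be simulated (an illustration, not seg's actual code):

```python
import struct

# Simulate storing a seg boundary as a 32-bit float: values survive
# the round trip only up to roughly 7 significant decimal digits.
def to_float32(x: float) -> float:
    """Round-trip a Python float through IEEE 754 single precision."""
    return struct.unpack("f", struct.pack("f", x))[0]

print(to_float32(6.5))          # exactly representable, unchanged
print(to_float32(6.123456789))  # more than 7 digits: precision is lost
```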
</sect2>
<sect2>
<title>Usage</title>
<para>
The access method for SEG is a GiST index (gist_seg_ops), which is a
generalization of R-tree. GiSTs allow the postgres implementation of
R-tree, originally encoded to support 2-D geometric types such as
boxes and polygons, to be used with any data type whose data domain
can be partitioned using the concepts of containment, intersection and
equality. In other words, everything that can intersect or contain
its own kind can be indexed with a GiST. That includes, among other
things, all geometric data types, regardless of their dimensionality
(see also contrib/cube).
</para>
<para>
The operators supported by the GiST access method include:
</para>
<itemizedlist>
<listitem>
<programlisting>
[a, b] << [c, d] Is left of
</programlisting>
<para>
The left operand, [a, b], occurs entirely to the left of the
right operand, [c, d], on the axis (-inf, inf). It means,
[a, b] << [c, d] is true if b < c and false otherwise
</para>
</listitem>
<listitem>
<programlisting>
[a, b] >> [c, d] Is right of
</programlisting>
<para>
[a, b] occurs entirely to the right of [c, d].
[a, b] >> [c, d] is true if a > d and false otherwise
</para>
</listitem>
<listitem>
<programlisting>
[a, b] &< [c, d] Overlaps or is left of
</programlisting>
<para>
This might be better read as "does not extend to right of".
It is true when b <= d.
</para>
</listitem>
<listitem>
<programlisting>
[a, b] &> [c, d] Overlaps or is right of
</programlisting>
<para>
This might be better read as "does not extend to left of".
It is true when a >= c.
</para>
</listitem>
<listitem>
<programlisting>
[a, b] = [c, d] Same as
</programlisting>
<para>
The segments [a, b] and [c, d] are identical, that is, a == b
and c == d
</para>
</listitem>
<listitem>
<programlisting>
[a, b] && [c, d] Overlaps
</programlisting>
<para>
The segments [a, b] and [c, d] overlap.
</para>
</listitem>
<listitem>
<programlisting>
[a, b] @> [c, d] Contains
</programlisting>
<para>
The segment [a, b] contains the segment [c, d], that is,
a <= c and b >= d
</para>
</listitem>
<listitem>
<programlisting>
[a, b] <@ [c, d] Contained in
</programlisting>
<para>
The segment [a, b] is contained in [c, d], that is,
a >= c and b <= d
</para>
</listitem>
</itemizedlist>
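The boundary comparisons above are straightforward; a minimal model of the operator semantics (illustrative only, with hypothetical method names standing in for the operators):

```python
from dataclasses import dataclass

# Minimal model of the seg operators defined above, using the boundary
# comparisons given in the text ([a, b] vs. [c, d]).
@dataclass
class Seg:
    lo: float
    hi: float

    def left_of(self, other):    # [a,b] << [c,d]: true if b < c
        return self.hi < other.lo

    def right_of(self, other):   # [a,b] >> [c,d]: true if a > d
        return self.lo > other.hi

    def overlaps(self, other):   # [a,b] && [c,d]
        return self.hi >= other.lo and self.lo <= other.hi

    def contains(self, other):   # [a,b] @> [c,d]: a <= c and b >= d
        return self.lo <= other.lo and self.hi >= other.hi

print(Seg(6.25, 6.50).overlaps(Seg(6.4, 7.0)))
```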
<para>
(Before PostgreSQL 8.2, the containment operators @> and <@ were
respectively called @ and ~. These names are still available, but are
deprecated and will eventually be retired. Notice that the old names
are reversed from the convention formerly followed by the core geometric
datatypes!)
</para>
<para>
Although the mnemonics of the following operators are questionable, I
preserved them to maintain visual consistency with other geometric
data types defined in Postgres.
</para>
<para>
Other operators:
</para>
<programlisting>
[a, b] < [c, d] Less than
[a, b] > [c, d] Greater than
</programlisting>
<para>
These operators do not make much sense for any practical
purpose but sorting. They first compare (a) to (c),
and if these are equal, compare (b) to (d). That results in
reasonably good sorting in most cases, which is useful if
you want to use ORDER BY with this type.
</para>
<para>
There are a few other potentially useful functions defined in seg.c
that vanished from the schema because I stopped using them. Some of
these were meant to support type casting. Let me know if I was wrong:
I will then add them back to the schema. I would also appreciate
other ideas that would enhance the type and make it more useful.
</para>
<para>
For examples of usage, see sql/seg.sql
</para>
<para>
NOTE: The performance of an R-tree index can largely depend on the
order of input values. It may be very helpful to sort the input table
on the SEG column (see the script sort-segments.pl for an example)
</para>
</sect2>
<sect2>
<title>Credits</title>
<para>
My thanks are primarily to Prof. Joe Hellerstein
(<ulink url="http://db.cs.berkeley.edu/~jmh/"></ulink>) for elucidating the
gist of the GiST (<ulink url="http://gist.cs.berkeley.edu/"></ulink>). I am
also grateful to all postgres developers, present and past, for enabling
myself to create my own world and live undisturbed in it. And I would like
to acknowledge my gratitude to Argonne Lab and to the U.S. Department of
Energy for the years of faithful support of my database research.
</para>
<programlisting>
Gene Selkov, Jr.
Computational Scientist
Mathematics and Computer Science Division
Argonne National Laboratory
9700 S Cass Ave.
Building 221
Argonne, IL 60439-4844
</programlisting>
<para>
<email>selkovjr@mcs.anl.gov</email>
</para>
</sect2>
</sect1>
doc/src/sgml/sslinfo.sgml Normal file
@ -0,0 +1,164 @@
<sect1 id="sslinfo">
<title>sslinfo</title>
<indexterm zone="sslinfo">
<primary>sslinfo</primary>
</indexterm>
<para>
This module provides information about the SSL certificate of the current client connection to PostgreSQL.
</para>
<sect2>
<title>Notes</title>
<para>
This extension won't build unless your PostgreSQL server is configured
with --with-openssl. The information provided by these functions is
of no use unless SSL is used to connect to the database.
</para>
</sect2>
<sect2>
<title>Functions Description</title>
<itemizedlist>
<listitem>
<programlisting>
ssl_is_used() RETURNS boolean;
</programlisting>
<para>
Returns TRUE if the current connection to the server uses SSL, and
FALSE otherwise.
</para>
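<para>
For example (a hypothetical session over an SSL connection; the
result depends on how the connection was established):
</para>
<programlisting>
SELECT ssl_is_used();
 ssl_is_used
-------------
 t
(1 row)
</programlisting>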
</listitem>
<listitem>
<programlisting>
ssl_client_cert_present() RETURNS boolean
</programlisting>
<para>
Returns TRUE if the current client has presented a valid SSL client
certificate to the server, and FALSE otherwise (e.g., no SSL, or the
certificate was not requested by the server).
</para>
</listitem>
<listitem>
<programlisting>
ssl_client_serial() RETURNS numeric
</programlisting>
<para>
Returns the serial number of the current client certificate. The
combination of certificate serial number and certificate issuer is
guaranteed to uniquely identify a certificate (but not its owner -- the
owner ought to change his keys regularly, and get new certificates from
the issuer).
</para>
<para>
So, if you run your own CA and allow only certificates from this CA to
be accepted by the server, the serial number is the most reliable
(albeit not very mnemonic) means to identify a user.
</para>
</listitem>
<listitem>
<programlisting>
ssl_client_dn() RETURNS text
</programlisting>
<para>
Returns the full subject of current client certificate, converting
character data into the current database encoding. It is assumed that
if you use non-Latin characters in the certificate names, your
database is able to represent these characters, too. If your database
uses the SQL_ASCII encoding, non-Latin characters in the name will be
represented as UTF-8 sequences.
</para>
<para>
The result looks like '/CN=Somebody /C=Some country/O=Some organization'.
</para>
</listitem>
<listitem>
<programlisting>
ssl_issuer_dn()
</programlisting>
<para>
Returns the full issuer name of the client certificate, converting
character data into current database encoding.
</para>
<para>
The combination of the return value of this function with the
certificate serial number uniquely identifies the certificate.
</para>
<para>
The result of this function is really useful only if you have more
than one trusted CA certificate in your server's root.crt file, or if
this CA has issued some intermediate certificate authority
certificates.
</para>
</listitem>
<listitem>
<programlisting>
ssl_client_dn_field(fieldName text) RETURNS text
</programlisting>
<para>
This function returns the value of the specified field in the
certificate subject. Field names are string constants that are
converted into ASN1 object identifiers using the OpenSSL object
database. The following values are acceptable:
</para>
<programlisting>
commonName (alias CN)
surname (alias SN)
name
givenName (alias GN)
countryName (alias C)
localityName (alias L)
stateOrProvinceName (alias ST)
organizationName (alias O)
organizationUnitName (alias OU)
title
description
initials
postalCode
streetAddress
generationQualifier
dnQualifier
x500UniqueIdentifier
pseudonym
role
emailAddress
</programlisting>
<para>
All of these fields are optional, except commonName. Which of them
are included and which are not depends entirely on your CA's policy.
The meaning of these fields, however, is strictly defined by the
X.500 and X.509 standards, so you cannot just assign arbitrary
meaning to them.
</para>
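<para>
For example, a hypothetical query pulling two subject fields out of
the client certificate:
</para>
<programlisting>
SELECT ssl_client_dn_field('commonName')       AS cn,
       ssl_client_dn_field('organizationName') AS org;
</programlisting>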
</listitem>
<listitem>
<programlisting>
ssl_issuer_field(fieldName text) RETURNS text;
</programlisting>
<para>
Does the same as ssl_client_dn_field, but for the certificate issuer
rather than the certificate subject.
</para>
</listitem>
</itemizedlist>
</sect2>
<sect2>
<title>Author</title>
<para>
Victor Wagner <email>vitus@cryptocom.ru</email>, Cryptocom LTD
E-Mail of Cryptocom OpenSSL development group:
<email>openssl@cryptocom.ru</email>
</para>
</sect2>
</sect1>
doc/src/sgml/standby.sgml Normal file
@ -0,0 +1,249 @@
<sect1 id="pgstandby">
<title>pg_standby</title>
<indexterm zone="pgstandby">
<primary>pgstandby</primary>
</indexterm>
<para>
<literal>pg_standby</literal> is a production-ready program that can be used
to create a Warm Standby server. Other configuration is required as well,
all of which is described in the main server manual.
</para>
<para>
The program is designed to be a wait-for <literal>restore_command</literal>,
required to turn a normal archive recovery into a Warm Standby. Within the
<literal>restore_command</literal> of the <literal>recovery.conf</literal>
you could configure <literal>pg_standby</literal> in the following way:
</para>
<programlisting>
restore_command = 'pg_standby archiveDir %f %p'
</programlisting>
<para>
which is sufficient to specify that WAL files will be restored from
archiveDir.
</para>
<para>
<literal>pg_standby</literal> features include:
</para>
<itemizedlist>
<listitem>
<para>
It is written in C, so it is very portable and easy to install.
</para>
</listitem>
<listitem>
<para>
Supports copy or link from a directory (only)
</para>
</listitem>
<listitem>
<para>
The source is easy to modify, with specifically designated
sections to modify for your own needs, allowing
interfaces to be written for additional Backup Archive Restore
(BAR) systems
</para>
</listitem>
<listitem>
<para>
Already tested on Linux and Windows
</para>
</listitem>
</itemizedlist>
<sect2>
<title>Usage</title>
<para>
<literal>pg_standby</literal> should be used within the
<literal>restore_command</literal> of the <literal>recovery.conf</literal>
file.
</para>
<para>
The basic usage should be like this:
</para>
<programlisting>
restore_command = 'pg_standby archiveDir %f %p'
</programlisting>
<para>
with the pg_standby command usage as
</para>
<programlisting>
pg_standby [OPTION]... [ARCHIVELOCATION] [NEXTWALFILE] [XLOGFILEPATH]
</programlisting>
<para>
When used within the <literal>restore_command</literal> the %f and %p macros
will provide the actual file and path required for the restore/recovery.
</para>
<table>
<title>Options</title>
<tgroup cols="2">
<tbody>
<row>
<entry>-c</entry>
<entry> use copy/cp command to restore WAL files from archive</entry>
</row>
<row>
<entry>-d</entry>
<entry>debug/logging option.</entry>
</row>
<row>
<entry>-k numfiles</entry>
<entry>
<para>
Clean up files in the archive so that we maintain no more
than this many files in the archive.
</para>
<para>
You should be wary of setting this number too low,
since this may mean you cannot restart the standby. This
is because the last restartpoint marked in the WAL files
may be many files in the past and can vary considerably.
This should be set to a value exceeding the number of WAL
files that can be recovered in 2*checkpoint_timeout seconds,
according to the value in the warm standby postgresql.conf.
It is wholly unrelated to the setting of checkpoint_segments
on either primary or standby.
</para>
<para>
If in doubt, use a large value or do not set a value at all.
</para>
</entry>
</row>
<row>
<entry>-l</entry>
<entry>
<para>
use the ln command to restore WAL files from the archive;
the WAL files will remain in the archive.
</para>
<para>
Link is more efficient, but the default is copy to
allow you to maintain the WAL archive for recovery
purposes as well as high-availability.
</para>
<para>
This option uses the Windows Vista command mklink
to provide a file-to-file symbolic link. -l will
not work on versions of Windows prior to Vista.
Use the -c option instead.
See <ulink url="http://en.wikipedia.org/wiki/NTFS_symbolic_link"></ulink>.
</para>
</entry>
</row>
<row>
<entry>-r maxretries</entry>
<entry>
<para>
the maximum number of times to retry the restore command if it
fails. After each failure, we wait for sleeptime * num_retries
seconds, so the wait time increases progressively: by default
we will wait 5 secs, 10 secs, then 15 secs before reporting
the failure back to the database server. This will be
interpreted as an end of recovery and the standby will come
up fully as a result. <literal>Default=3</literal>
</para>
</entry>
</row>
<row>
<entry>-s sleeptime</entry>
<entry>
the number of seconds to sleep between tests to see
whether the file to be restored is available in the archive yet.
The default setting is not necessarily recommended,
consult the main database server manual for discussion.
<literal>Default=5</literal>
</entry>
</row>
<row>
<entry>-t triggerfile</entry>
<entry>
the presence of the triggerfile will cause recovery to end,
whether or not the next file is available.
It is recommended that you use a structured filename to
avoid confusion as to which server is being triggered
when multiple servers exist on the same system,
e.g. /tmp/pgsql.trigger.5432
</entry>
</row>
<row>
<entry>-w maxwaittime</entry>
<entry>
the maximum number of seconds to wait for the next file,
after which recovery will end and the Standby will come up.
The default setting is not necessarily recommended,
consult the main database server manual for discussion.
<literal>Default=0</literal>
</entry>
</row>
</tbody>
</tgroup>
</table>
<note>
<para>
<literal>--help</literal> is not supported since
<literal>pg_standby</literal> is not intended for interactive use, except
during development and testing.
</para>
</note>
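<para>
For example, given the trigger file name suggested in the -t option
description above, failover can be initiated on the standby simply by
creating that file:
</para>
<programlisting>
$ touch /tmp/pgsql.trigger.5432
</programlisting>
<para>
pg_standby then stops waiting for new WAL files, recovery ends, and
the standby comes up as a normal server.
</para>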
</sect2>
<sect2>
<title>Examples</title>
<itemizedlist>
<listitem>
<para>Example on Linux</para>
<programlisting>
archive_command = 'cp %p ../archive/%f'
restore_command = 'pg_standby -l -d -k 255 -r 2 -s 2 -w 0 -t /tmp/pgsql.trigger.5442 $PWD/../archive %f %p 2>> standby.log'
</programlisting>
<para>
which will
</para>
<itemizedlist>
<listitem><para>use the ln command to restore WAL files from the archive</para></listitem>
<listitem><para>produce logfile output in standby.log</para></listitem>
<listitem><para>keep the last 255 full WAL files, plus the current one</para></listitem>
<listitem><para>sleep for 2 seconds between checks whether the next WAL file is available</para></listitem>
<listitem><para>never time out if the file is not found</para></listitem>
<listitem><para>stop waiting when a trigger file called /tmp/pgsql.trigger.5442 appears</para></listitem>
</itemizedlist>
</listitem>
<listitem>
<para>
Example on Windows
</para>
<programlisting>
archive_command = 'copy %p ..\\archive\\%f'
</programlisting>
<para>
Note that backslashes need to be doubled in the archive_command, but
*not* in the restore_command, in 8.2, 8.1, 8.0 on Windows.
</para>
<programlisting>
restore_command = 'pg_standby -c -d -s 5 -w 0 -t C:\pgsql.trigger.5442
..\archive %f %p 2>> standby.log'
</programlisting>
<para>
which will
</para>
<itemizedlist>
<listitem><para>use the copy command to restore WAL files from the archive</para></listitem>
<listitem><para>produce logfile output in standby.log</para></listitem>
<listitem><para>sleep for 5 seconds between checks whether the next WAL file is available</para></listitem>
<listitem><para>never time out if the file is not found</para></listitem>
<listitem><para>stop waiting when a trigger file called C:\pgsql.trigger.5442 appears</para></listitem>
</itemizedlist>
</listitem>
</itemizedlist>
</sect2>
</sect1>
doc/src/sgml/tablefunc.sgml Normal file
@ -0,0 +1,765 @@
<sect1 id="tablefunc">
<title>tablefunc</title>
<indexterm zone="tablefunc">
<primary>tablefunc</primary>
</indexterm>
<para>
<literal>tablefunc</literal> provides various functions that return tables (that is, multiple rows), including functions to pivot query rows into columns (crosstab) and to display hierarchies (connectby).
</para>
<sect2>
<title>Functions</title>
<table>
<title></title>
<tgroup cols="3">
<thead>
<row>
<entry>Function</entry>
<entry>Returns</entry>
<entry>Comments</entry>
</row>
</thead>
<tbody>
<row>
<entry>
<literal>
normal_rand(int numvals, float8 mean, float8 stddev)
</literal>
</entry>
<entry>
returns a set of normally distributed float8 values
</entry>
<entry></entry>
</row>
<row>
<entry><literal>crosstabN(text sql)</literal></entry>
<entry>returns a set of row_name plus N category value columns</entry>
<entry>
crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per the instructions
in the documentation below.
</entry>
</row>
<row>
<entry><literal>crosstab(text sql)</literal></entry>
<entry>returns a set of row_name plus N category value columns</entry>
<entry>
requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
</entry>
</row>
<row>
<entry><literal>crosstab(text sql, N int)</literal></entry>
<entry></entry>
<entry>
<para>obsolete version of crosstab()</para>
<para>
the argument N is now ignored, since the number of value columns
is always determined by the calling query
</para>
</entry>
</row>
<row>
<entry>
<literal>
connectby(text relname, text keyid_fld, text parent_keyid_fld
[, text orderby_fld], text start_with, int max_depth
[, text branch_delim])
</literal>
</entry>
<entry>
returns keyid, parent_keyid, level, and an optional branch string
and an optional serial column for ordering siblings
</entry>
<entry>
requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
</entry>
</row>
</tbody>
</tgroup>
</table>
<sect3>
<title><literal>normal_rand</literal></title>
<programlisting>
normal_rand(int numvals, float8 mean, float8 stddev) RETURNS SETOF float8
</programlisting>
<para>
Where <literal>numvals</literal> is the number of values to be returned
from the function. <literal>mean</literal> is the mean of the normal
distribution of values and <literal>stddev</literal> is the standard
deviation of the normal distribution of values.
</para>
<para>
Returns a float8 set of random values normally distributed (Gaussian
distribution).
</para>
<para>
Example:
</para>
<programlisting>
test=# SELECT * FROM
test=# normal_rand(1000, 5, 3);
normal_rand
----------------------
1.56556322244898
9.10040991424657
5.36957140345079
-0.369151492880995
0.283600703686639
.
.
.
4.82992125404908
9.71308014517282
2.49639286969028
(1000 rows)
</programlisting>
<para>
Returns 1000 values with a mean of 5 and a standard deviation of 3.
</para>
</sect3>
<sect3>
<title><literal>crosstabN(text sql)</literal></title>
<programlisting>
crosstabN(text sql)
</programlisting>
<para>
The <literal>sql</literal> parameter is a SQL statement which produces the
source set of data. The SQL statement must return one row_name column, one
category column, and one value column. <literal>row_name</literal> and
value must be of type text. The function returns a set of
<literal>row_name</literal> plus N category value columns.
</para>
<para>
The provided <literal>sql</literal> must produce a set something like:
</para>
<programlisting>
row_name cat value
---------+-------+-------
row1 cat1 val1
row1 cat2 val2
row1 cat3 val3
row1 cat4 val4
row2 cat1 val5
row2 cat2 val6
row2 cat3 val7
row2 cat4 val8
</programlisting>
<para>
The returned value is a <literal>SETOF tablefunc_crosstab_N</literal>,
which is defined by:
</para>
<programlisting>
CREATE TYPE tablefunc_crosstab_N AS (
row_name TEXT,
category_1 TEXT,
category_2 TEXT,
.
.
.
category_N TEXT
);
</programlisting>
<para>
for the default installed functions, where N is 2, 3, or 4.
</para>
<para>
e.g. the provided crosstab2 function produces a set something like:
</para>
<programlisting>
<== values columns ==>
row_name category_1 category_2
---------+------------+------------
row1 val1 val2
row2 val5 val6
</programlisting>
<note>
<orderedlist>
<listitem><para>The sql result must be ordered by 1,2.</para></listitem>
<listitem>
<para>
The number of values columns depends on the tuple description
of the function's declared return type.
</para>
</listitem>
<listitem>
<para>
Missing values (i.e. not enough adjacent rows of same row_name to
fill the number of result values columns) are filled in with nulls.
</para>
</listitem>
<listitem>
<para>
Extra values (i.e. too many adjacent rows of same row_name to fill
the number of result values columns) are skipped.
</para>
</listitem>
<listitem>
<para>
Rows with all nulls in the values columns are skipped.
</para>
</listitem>
<listitem>
<para>
The installed defaults are for illustration purposes. You
can create your own return types and functions based on the
crosstab() function of the installed library. See below for
details.
</para>
</listitem>
</orderedlist>
</note>
<para>
Example:
</para>
<programlisting>
create table ct(id serial, rowclass text, rowid text, attribute text, value text);
insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att1','val1');
insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att2','val2');
insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att3','val3');
insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att4','val4');
insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att1','val5');
insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att2','val6');
insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att3','val7');
insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att4','val8');
select * from crosstab3(
'select rowid, attribute, value
from ct
where rowclass = ''group1''
and (attribute = ''att2'' or attribute = ''att3'') order by 1,2;');
row_name | category_1 | category_2 | category_3
----------+------------+------------+------------
test1 | val2 | val3 |
test2 | val6 | val7 |
(2 rows)
</programlisting>
</sect3>
<sect3>
<title><literal>crosstab(text)</literal></title>
<programlisting>
crosstab(text sql)
crosstab(text sql, int N)
</programlisting>
<para>
The <literal>sql</literal> parameter is a SQL statement which produces the
source set of data. The SQL statement must return one
<literal>row_name</literal> column, one <literal>category</literal> column,
and one <literal>value</literal> column. <literal>N</literal> is an
obsolete argument; ignored if supplied (formerly this had to match the
number of category columns determined by the calling query).
</para>
<para>
e.g. provided sql must produce a set something like:
</para>
<programlisting>
row_name cat value
----------+-------+-------
row1 cat1 val1
row1 cat2 val2
row1 cat3 val3
row1 cat4 val4
row2 cat1 val5
row2 cat2 val6
row2 cat3 val7
row2 cat4 val8
</programlisting>
<para>
Returns a <literal>SETOF RECORD</literal>, which must be defined with a
column definition in the FROM clause of the SELECT statement, e.g.:
</para>
<programlisting>
SELECT *
FROM crosstab(sql) AS ct(row_name text, category_1 text, category_2 text);
</programlisting>
<para>
the example crosstab function produces a set something like:
</para>
<programlisting>
<== values columns ==>
row_name category_1 category_2
---------+------------+------------
row1 val1 val2
row2 val5 val6
</programlisting>
<para>
Note that it follows these rules:
</para>
<orderedlist>
<listitem><para>The sql result must be ordered by 1,2.</para></listitem>
<listitem>
<para>
The number of values columns is determined by the column definition
provided in the FROM clause. The FROM clause must define one
row_name column (of the same datatype as the first result column
of the sql query) followed by N category columns (of the same
datatype as the third result column of the sql query). You can
set up as many category columns as you wish.
</para>
</listitem>
<listitem>
<para>
Missing values (i.e. not enough adjacent rows of same row_name to
fill the number of result values columns) are filled in with nulls.
</para>
</listitem>
<listitem>
<para>
Extra values (i.e. too many adjacent rows of same row_name to fill
the number of result values columns) are skipped.
</para>
</listitem>
<listitem>
<para>
Rows with all nulls in the values columns are skipped.
</para>
</listitem>
<listitem>
<para>
You can avoid always having to write out a FROM clause that defines the
output columns by setting up a custom crosstab function that has
the desired output row type wired into its definition.
</para>
</listitem>
</orderedlist>
<para>
There are two ways you can set up a custom crosstab function:
</para>
<itemizedlist>
<listitem>
<para>
Create a composite type to define your return type, similar to the
examples in the installation script. Then define a unique function
name accepting one text parameter and returning setof your_type_name.
For example, if your source data produces row_names that are TEXT,
and values that are FLOAT8, and you want 5 category columns:
</para>
<programlisting>
CREATE TYPE my_crosstab_float8_5_cols AS (
row_name TEXT,
category_1 FLOAT8,
category_2 FLOAT8,
category_3 FLOAT8,
category_4 FLOAT8,
category_5 FLOAT8
);
CREATE OR REPLACE FUNCTION crosstab_float8_5_cols(text)
RETURNS setof my_crosstab_float8_5_cols
AS '$libdir/tablefunc','crosstab' LANGUAGE C STABLE STRICT;
</programlisting>
</listitem>
<listitem>
<para>
Use OUT parameters to define the return type implicitly.
The same example could also be done this way:
</para>
<programlisting>
CREATE OR REPLACE FUNCTION crosstab_float8_5_cols(IN text,
OUT row_name TEXT,
OUT category_1 FLOAT8,
OUT category_2 FLOAT8,
OUT category_3 FLOAT8,
OUT category_4 FLOAT8,
OUT category_5 FLOAT8)
RETURNS setof record
AS '$libdir/tablefunc','crosstab' LANGUAGE C STABLE STRICT;
</programlisting>
</listitem>
</itemizedlist>
<para>
Example:
</para>
<programlisting>
CREATE TABLE ct(id SERIAL, rowclass TEXT, rowid TEXT, attribute TEXT, value TEXT);
INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test1','att1','val1');
INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test1','att2','val2');
INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test1','att3','val3');
INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test1','att4','val4');
INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test2','att1','val5');
INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test2','att2','val6');
INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test2','att3','val7');
INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test2','att4','val8');
SELECT *
FROM crosstab(
'select rowid, attribute, value
from ct
where rowclass = ''group1''
and (attribute = ''att2'' or attribute = ''att3'') order by 1,2;', 3)
AS ct(row_name text, category_1 text, category_2 text, category_3 text);
row_name | category_1 | category_2 | category_3
----------+------------+------------+------------
test1 | val2 | val3 |
test2 | val6 | val7 |
(2 rows)
</programlisting>
</sect3>
<sect3>
<title><literal>crosstab(text, text)</literal></title>
<programlisting>
crosstab(text source_sql, text category_sql)
</programlisting>
<para>
Where <literal>source_sql</literal> is a SQL statement which produces the
source set of data. The SQL statement must return one
<literal>row_name</literal> column, one <literal>category</literal> column,
and one <literal>value</literal> column. It may also have one or more
<emphasis>extra</emphasis> columns.
</para>
<para>
The <literal>row_name</literal> column must be first. The
<literal>category</literal> and <literal>value</literal> columns must be
the last two columns, in that order. <emphasis>extra</emphasis> columns must
be columns 2 through (N - 2), where N is the total number of columns.
</para>
<para>
The <emphasis>extra</emphasis> columns are assumed to be the same for all
rows with the same <literal>row_name</literal>. The values returned are
copied from the first row with a given <literal>row_name</literal> and
subsequent values of these columns are ignored until
<literal>row_name</literal> changes.
</para>
<para>
e.g. <literal>source_sql</literal> must produce a set something like:
</para>
<programlisting>
SELECT row_name, extra_col, cat, value FROM foo;
row_name extra_col cat value
----------+------------+-----+---------
row1 extra1 cat1 val1
row1 extra1 cat2 val2
row1 extra1 cat4 val4
row2 extra2 cat1 val5
row2 extra2 cat2 val6
row2 extra2 cat3 val7
row2 extra2 cat4 val8
</programlisting>
<para>
<literal>category_sql</literal> has to be a SQL statement which produces
the distinct set of categories. The SQL statement must return one category
column only. <literal>category_sql</literal> must produce at least one
result row or an error will be generated. <literal>category_sql</literal>
must not produce duplicate categories or an error will be generated. e.g.:
</para>
<programlisting>
SELECT DISTINCT cat FROM foo;
cat
-------
cat1
cat2
cat3
cat4
</programlisting>
<para>
The function returns <literal>SETOF RECORD</literal>, which must be defined
with a column definition in the FROM clause of the SELECT statement, e.g.:
</para>
<programlisting>
SELECT * FROM crosstab(source_sql, cat_sql)
AS ct(row_name text, extra text, cat1 text, cat2 text, cat3 text, cat4 text);
</programlisting>
<para>
the example crosstab function produces a set something like:
</para>
<programlisting>
<== values columns ==>
row_name extra cat1 cat2 cat3 cat4
---------+-------+------+------+------+------
row1 extra1 val1 val2 val4
row2 extra2 val5 val6 val7 val8
</programlisting>
<para>
Note that it follows these rules:
</para>
<orderedlist>
<listitem><para>source_sql must be ordered by row_name (column 1).</para></listitem>
<listitem>
<para>
The number of values columns is determined at run-time. The
column definition provided in the FROM clause must provide for
the correct number of columns of the proper data types.
</para>
</listitem>
<listitem>
<para>
Missing values (i.e. not enough adjacent rows of same row_name to
fill the number of result values columns) are filled in with nulls.
</para>
</listitem>
<listitem>
<para>
Extra values (i.e. source rows with category not found in category_sql
result) are skipped.
</para>
</listitem>
<listitem>
<para>
Rows with a null row_name column are skipped.
</para>
</listitem>
<listitem>
<para>
You can create predefined functions to avoid having to write out
the result column names/types in each query. See the examples
for crosstab(text).
</para>
</listitem>
</orderedlist>
<programlisting>
CREATE TABLE cth(id serial, rowid text, rowdt timestamp, attribute text, val text);
INSERT INTO cth VALUES(DEFAULT,'test1','01 March 2003','temperature','42');
INSERT INTO cth VALUES(DEFAULT,'test1','01 March 2003','test_result','PASS');
INSERT INTO cth VALUES(DEFAULT,'test1','01 March 2003','volts','2.6987');
INSERT INTO cth VALUES(DEFAULT,'test2','02 March 2003','temperature','53');
INSERT INTO cth VALUES(DEFAULT,'test2','02 March 2003','test_result','FAIL');
INSERT INTO cth VALUES(DEFAULT,'test2','02 March 2003','test_startdate','01 March 2003');
INSERT INTO cth VALUES(DEFAULT,'test2','02 March 2003','volts','3.1234');
SELECT * FROM crosstab
(
'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1'
)
AS
(
rowid text,
rowdt timestamp,
temperature int4,
test_result text,
test_startdate timestamp,
volts float8
);
rowid | rowdt | temperature | test_result | test_startdate | volts
-------+--------------------------+-------------+-------------+--------------------------+--------
test1 | Sat Mar 01 00:00:00 2003 | 42 | PASS | | 2.6987
test2 | Sun Mar 02 00:00:00 2003 | 53 | FAIL | Sat Mar 01 00:00:00 2003 | 3.1234
(2 rows)
</programlisting>
</sect3>
<sect3>
<title>
<literal>connectby(text, text, text[, text], text, text, int[, text])</literal>
</title>
<programlisting>
connectby(text relname, text keyid_fld, text parent_keyid_fld
[, text orderby_fld], text start_with, int max_depth
[, text branch_delim])
</programlisting>
<table>
<title><literal>connectby</literal> parameters</title>
<tgroup cols="2">
<thead>
<row>
<entry>Parameter</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><literal>relname</literal></entry>
<entry>Name of the source relation</entry>
</row>
<row>
<entry><literal>keyid_fld</literal></entry>
<entry>Name of the key field</entry>
</row>
<row>
<entry><literal>parent_keyid_fld</literal></entry>
<entry>Name of the key_parent field</entry>
</row>
<row>
<entry><literal>orderby_fld</literal></entry>
<entry>
If optional ordering of siblings is desired: Name of the field to
order siblings
</entry>
</row>
<row>
<entry><literal>start_with</literal></entry>
<entry>
Root value of the tree, input as a text value regardless of the
type of <literal>keyid_fld</literal>
</entry>
</row>
<row>
<entry><literal>max_depth</literal></entry>
<entry>
Zero (0) for unlimited depth, otherwise restrict level to this depth
</entry>
</row>
<row>
<entry><literal>branch_delim</literal></entry>
<entry>
If an optional branch value is desired, this string is used as the delimiter.
When not provided, a default value of '~' is used for internal
recursion detection only, and no "branch" field is returned.
</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
The function returns <literal>SETOF RECORD</literal>, which must be
defined with a column definition in the FROM clause of the SELECT
statement, e.g.:
</para>
<programlisting>
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0, '~')
AS t(keyid text, parent_keyid text, level int, branch text);
</programlisting>
<para>
or
</para>
<programlisting>
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0)
AS t(keyid text, parent_keyid text, level int);
</programlisting>
<para>
or
</para>
<programlisting>
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0, '~')
AS t(keyid text, parent_keyid text, level int, branch text, pos int);
</programlisting>
<para>
or
</para>
<programlisting>
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0)
AS t(keyid text, parent_keyid text, level int, pos int);
</programlisting>
<para>
Note that it follows these rules:
</para>
<orderedlist>
<listitem><para>keyid and parent_keyid must be the same data type</para></listitem>
<listitem>
<para>
The column definition *must* include a third column of type INT4 for
the level value output
</para>
</listitem>
<listitem>
<para>
If the branch field is not desired, omit both the branch_delim input
parameter *and* the branch field in the query column definition. Note
that when branch_delim is not provided, a default value of '~' is used
for branch_delim for internal recursion detection, even though the branch
field is not returned.
</para>
</listitem>
<listitem>
<para>
If the branch field is desired, it must be the fourth column in the query
column definition, and it must be type TEXT.
</para>
</listitem>
<listitem>
<para>
The parameters representing table and field names must include double
quotes if the names are mixed-case or contain special characters.
</para>
</listitem>
<listitem>
<para>
If sorting of siblings is desired, the orderby_fld input parameter *and*
a name for the resulting serial field (type INT4) in the query column
definition must be given.
</para>
</listitem>
</orderedlist>
<para>
Example:
</para>
<programlisting>
CREATE TABLE connectby_tree(keyid text, parent_keyid text, pos int);
INSERT INTO connectby_tree VALUES('row1',NULL, 0);
INSERT INTO connectby_tree VALUES('row2','row1', 0);
INSERT INTO connectby_tree VALUES('row3','row1', 0);
INSERT INTO connectby_tree VALUES('row4','row2', 1);
INSERT INTO connectby_tree VALUES('row5','row2', 0);
INSERT INTO connectby_tree VALUES('row6','row4', 0);
INSERT INTO connectby_tree VALUES('row7','row3', 0);
INSERT INTO connectby_tree VALUES('row8','row6', 0);
INSERT INTO connectby_tree VALUES('row9','row5', 0);
-- with branch, without orderby_fld
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0, '~')
AS t(keyid text, parent_keyid text, level int, branch text);
keyid | parent_keyid | level | branch
-------+--------------+-------+---------------------
row2 | | 0 | row2
row4 | row2 | 1 | row2~row4
row6 | row4 | 2 | row2~row4~row6
row8 | row6 | 3 | row2~row4~row6~row8
row5 | row2 | 1 | row2~row5
row9 | row5 | 2 | row2~row5~row9
(6 rows)
-- without branch, without orderby_fld
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0)
AS t(keyid text, parent_keyid text, level int);
keyid | parent_keyid | level
-------+--------------+-------
row2 | | 0
row4 | row2 | 1
row6 | row4 | 2
row8 | row6 | 3
row5 | row2 | 1
row9 | row5 | 2
(6 rows)
-- with branch, with orderby_fld (notice that row5 comes before row4)
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0, '~')
AS t(keyid text, parent_keyid text, level int, branch text, pos int) ORDER BY t.pos;
keyid | parent_keyid | level | branch | pos
-------+--------------+-------+---------------------+-----
row2 | | 0 | row2 | 1
row5 | row2 | 1 | row2~row5 | 2
row9 | row5 | 2 | row2~row5~row9 | 3
row4 | row2 | 1 | row2~row4 | 4
row6 | row4 | 2 | row2~row4~row6 | 5
row8 | row6 | 3 | row2~row4~row6~row8 | 6
(6 rows)
-- without branch, with orderby_fld (notice that row5 comes before row4)
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0)
AS t(keyid text, parent_keyid text, level int, pos int) ORDER BY t.pos;
keyid | parent_keyid | level | pos
-------+--------------+-------+-----
row2 | | 0 | 1
row5 | row2 | 1 | 2
row9 | row5 | 2 | 3
row4 | row2 | 1 | 4
row6 | row4 | 2 | 5
row8 | row6 | 3 | 6
(6 rows)
</programlisting>
</sect3>
</sect2>
<sect2>
<title>Author</title>
<para>
Joe Conway
</para>
</sect2>
</sect1>

doc/src/sgml/trgm.sgml Normal file
@ -0,0 +1,214 @@
<sect1 id="pgtrgm">
<title>pg_trgm</title>
<indexterm zone="pgtrgm">
<primary>pgtrgm</primary>
</indexterm>
<para>
The <literal>pg_trgm</literal> module provides functions and index classes
for determining the similarity of text based on trigram matching.
</para>
<sect2>
<title>Trigram (or Trigraph)</title>
<para>
A trigram is a set of three consecutive characters taken
from a string. A string is considered to have two spaces
prefixed and one space suffixed when determining the set
of trigrams that comprise the string.
</para>
<para>
For example, the set of trigrams in the word "cat" is
"  c", " ca", "at ", and "cat".
</para>
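<para>
The trigram set of a string can be inspected directly with the
show_trgm function (described below); for example:
</para>
<programlisting>
SELECT show_trgm('cat');
        show_trgm
-------------------------
 {"  c"," ca","at ",cat}
</programlisting>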
</sect2>
<sect2>
<title>Public Functions</title>
<table>
<title><literal>pg_trgm</literal> functions</title>
<tgroup cols="2">
<thead>
<row>
<entry>Function</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><literal>real similarity(text, text)</literal></entry>
<entry>
<para>
Returns a number that indicates how similar the two arguments are.
A result of zero indicates that the two strings are completely
dissimilar, and a result of one indicates that they are identical.
</para>
</entry>
</row>
<row>
<entry><literal>real show_limit()</literal></entry>
<entry>
<para>
Returns the current similarity threshold used by the '%'
operator. This threshold is the minimum similarity two words
must share in order to be considered similar enough to be,
for example, misspellings of each other.
</para>
</entry>
</row>
<row>
<entry><literal>real set_limit(real)</literal></entry>
<entry>
<para>
Sets the current similarity threshold that is used by the '%'
operator, and is returned by the show_limit() function.
</para>
</entry>
</row>
<row>
<entry><literal>text[] show_trgm(text)</literal></entry>
<entry>
<para>
Returns an array of all the trigrams of the supplied text
parameter.
</para>
</entry>
</row>
<row>
<entry>Operator: <literal>text % text (returns boolean)</literal></entry>
<entry>
<para>
The '%' operator returns TRUE if its two arguments have a similarity
that is greater than the similarity threshold set by set_limit(). It
will return FALSE if the similarity is less than the current
threshold.
</para>
</entry>
</row>
</tbody>
</tgroup>
</table>
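<para>
For example (the similarity value shown is approximate):
</para>
<programlisting>
SELECT similarity('word', 'two words');   -- 0.36
SELECT set_limit(0.4);
SELECT 'word' % 'two words';   -- false: similarity is below the 0.4 threshold
SELECT set_limit(0.3);
SELECT 'word' % 'two words';   -- true: similarity is above the 0.3 threshold
</programlisting>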
</sect2>
<sect2>
<title>Public Index Operator Class</title>
<para>
The <literal>pg_trgm</literal> module comes with the
<literal>gist_trgm_ops</literal> index operator class that allows a
developer to create an index over a text column for the purpose
of very fast similarity searches.
</para>
<para>
To use this index, the '%' operator must be used and an appropriate
similarity threshold for the application must be set. Example:
</para>
<programlisting>
CREATE TABLE test_trgm (t text);
CREATE INDEX trgm_idx ON test_trgm USING gist (t gist_trgm_ops);
</programlisting>
<para>
At this point, you will have an index on the t text column that you
can use for similarity searching. Example:
</para>
<programlisting>
SELECT
t,
similarity(t, 'word') AS sml
FROM
test_trgm
WHERE
t % 'word'
ORDER BY
sml DESC, t;
</programlisting>
<para>
This will return all values in the text column that are sufficiently
similar to 'word', sorted from best match to worst. The index will
be used to make this a fast operation over very large data sets.
</para>
</sect2>
<sect2>
<title>Tsearch2 Integration</title>
<para>
Trigram matching is a very useful tool when used in conjunction
with a text index created by the Tsearch2 contrib module. (See
contrib/tsearch2)
</para>
<para>
The first step is to generate an auxiliary table containing all
the unique words in the Tsearch2 index:
</para>
<programlisting>
CREATE TABLE words AS SELECT word FROM
stat('SELECT to_tsvector(''simple'', bodytext) FROM documents');
</programlisting>
<para>
Where 'documents' is a table that has a text field 'bodytext'
that TSearch2 is used to search. The use of the 'simple' dictionary
with the to_tsvector function, instead of just using the already
existing vector is to avoid creating a list of already stemmed
words. This way, only the original, unstemmed words are added
to the word list.
</para>
<para>
Next, create a trigram index on the word column:
</para>
<programlisting>
CREATE INDEX words_idx ON words USING gist(word gist_trgm_ops);
</programlisting>
<para>
or
</para>
<programlisting>
CREATE INDEX words_idx ON words USING gin(word gin_trgm_ops);
</programlisting>
<para>
Now, a <literal>SELECT</literal> query similar to the example above can be
used to suggest spellings for misspelled words in user search terms. A
useful extra clause is to ensure that the similar words are also
of similar length to the misspelled word.
</para>
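<para>
For example, given the hypothetical misspelled search term
'postgersql', such a query might look like:
</para>
<programlisting>
SELECT word, similarity(word, 'postgersql') AS sml
FROM words
WHERE word % 'postgersql'
  AND length(word) BETWEEN length('postgersql') - 2
                       AND length('postgersql') + 2
ORDER BY sml DESC, word;
</programlisting>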
<para>
<note>
<para>
Since the 'words' table has been generated as a separate,
static table, it will need to be periodically regenerated so that
it remains up to date with the word list in the Tsearch2 index.
</para>
</note>
</para>
</sect2>
<sect2>
<title>References</title>
<para>
Tsearch2 Development Site
<ulink url="http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/"></ulink>
</para>
<para>
GiST Development Site
<ulink url="http://www.sai.msu.su/~megera/postgres/gist/"></ulink>
</para>
</sect2>
<sect2>
<title>Authors</title>
<para>
Oleg Bartunov <email>oleg@sai.msu.su</email>, Moscow, Moscow University, Russia
</para>
<para>
Teodor Sigaev <email>teodor@sigaev.ru</email>, Moscow, Delta-Soft Ltd.,Russia
</para>
<para>
Documentation: Christopher Kings-Lynne
</para>
<para>
This module is sponsored by Delta-Soft Ltd., Moscow, Russia.
</para>
</sect2>
</sect1>

doc/src/sgml/uuid-ossp.sgml Normal file
@ -0,0 +1,163 @@
<sect1 id="uuid-ossp">
<title>uuid-ossp</title>
<indexterm zone="uuid-ossp">
<primary>uuid-ossp</primary>
</indexterm>
<para>
This module provides functions to generate universally unique
identifiers (UUIDs) using one of the several standard algorithms, as
well as functions to produce certain special UUID constants.
</para>
<sect2>
<title>UUID Generation</title>
<para>
The relevant standards ITU-T Rec. X.667, ISO/IEC 9834-8:2005, and RFC
4122 specify four algorithms for generating UUIDs, identified by the
version numbers 1, 3, 4, and 5. (There is no version 2 algorithm.)
Each of these algorithms could be suitable for a different set of
applications.
</para>
<table>
<title><literal>uuid-ossp</literal> functions</title>
<tgroup cols="2">
<thead>
<row>
<entry>Function</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><literal>uuid_generate_v1()</literal></entry>
<entry>
<para>
This function generates a version 1 UUID. This involves the MAC
address of the computer and a time stamp. Note that UUIDs of this
kind reveal the identity of the computer that created the identifier
and the time at which it did so, which might make it unsuitable for
certain security-sensitive applications.
</para>
</entry>
</row>
<row>
<entry><literal>uuid_generate_v1mc()</literal></entry>
<entry>
<para>
This function generates a version 1 UUID but uses a random multicast
MAC address instead of the real MAC address of the computer.
</para>
</entry>
</row>
<row>
<entry><literal>uuid_generate_v3(namespace uuid, name text)</literal></entry>
<entry>
<para>
This function generates a version 3 UUID in the given namespace using
the specified input name. The namespace should be one of the special
constants produced by the uuid_ns_*() functions shown below. (It
could be any UUID in theory.) The name is an identifier in the
selected namespace. For example:
</para>
</entry>
</row>
<row>
<entry><literal>uuid_generate_v3(uuid_ns_url(), 'http://www.postgresql.org')</literal></entry>
<entry>
<para>
The name parameter will be MD5-hashed, so the cleartext cannot be
derived from the generated UUID.
</para>
<para>
The generation of UUIDs by this method has no random or
environment-dependent element and is therefore reproducible.
</para>
</entry>
</row>
<row>
<entry><literal>uuid_generate_v4()</literal></entry>
<entry>
<para>
This function generates a version 4 UUID, which is derived entirely
from random numbers.
</para>
</entry>
</row>
<row>
<entry><literal>uuid_generate_v5(namespace uuid, name text)</literal></entry>
<entry>
<para>
This function generates a version 5 UUID, which works like a version 3
UUID except that SHA-1 is used as a hashing method. Version 5 should
be preferred over version 3 because SHA-1 is thought to be more secure
than MD5.
</para>
</entry>
</row>
</tbody>
</tgroup>
</table>
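<para>
For example:
</para>
<programlisting>
SELECT uuid_generate_v1();   -- a new time/MAC-based UUID on each call
SELECT uuid_generate_v4();   -- a new random UUID on each call
-- deterministic: the same namespace and name always yield the same UUID
SELECT uuid_generate_v3(uuid_ns_url(), 'http://www.postgresql.org');
</programlisting>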
<table>
<title>UUID Constants</title>
<tgroup cols="2">
<tbody>
<row>
<entry><literal>uuid_nil()</literal></entry>
<entry>
<para>
A "nil" UUID constant, which does not occur as a real UUID.
</para>
</entry>
</row>
<row>
<entry><literal>uuid_ns_dns()</literal></entry>
<entry>
<para>
Constant designating the DNS namespace for UUIDs.
</para>
</entry>
</row>
<row>
<entry><literal>uuid_ns_url()</literal></entry>
<entry>
<para>
Constant designating the URL namespace for UUIDs.
</para>
</entry>
</row>
<row>
<entry><literal>uuid_ns_oid()</literal></entry>
<entry>
<para>
Constant designating the ISO object identifier (OID) namespace for
UUIDs. (This pertains to ASN.1 OIDs, unrelated to the OIDs used in
PostgreSQL.)
</para>
</entry>
</row>
<row>
<entry><literal>uuid_ns_x500()</literal></entry>
<entry>
<para>
Constant designating the X.500 distinguished name (DN) namespace for
UUIDs.
</para>
</entry>
</row>
</tbody>
</tgroup>
</table>
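<para>
For example, the nil constant is the all-zero UUID:
</para>
<programlisting>
SELECT uuid_nil();
               uuid_nil
--------------------------------------
 00000000-0000-0000-0000-000000000000
</programlisting>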
</sect2>
<sect2>
<title>Author</title>
<para>
Peter Eisentraut <email>peter_e@gmx.net</email>
</para>
</sect2>
</sect1>

@ -0,0 +1,74 @@
<sect1 id="vacuumlo">
<title>vacuumlo</title>
<indexterm zone="vacuumlo">
<primary>vacuumlo</primary>
</indexterm>
<para>
This is a simple utility that will remove any orphaned large objects out of a
PostgreSQL database. An orphaned LO is considered to be any LO whose OID
does not appear in any OID data column of the database.
</para>
<para>
If you use this, you may also be interested in the lo_manage trigger in
contrib/lo. lo_manage is useful to try to avoid creating orphaned LOs
in the first place.
</para>
<para>
<note>
<para>
This utility was placed in contrib because it needs further testing;
hopefully, it (or a variant of it) will make it into the backend as a
"vacuum lo" command in a later release.
</para>
</note>
</para>
<sect2>
<title>Usage</title>
<programlisting>
vacuumlo [options] database [database2 ... databasen]
</programlisting>
<para>
All databases named on the command line are processed. Available options
include:
</para>
<programlisting>
-v Write a lot of progress messages
-n Don't remove large objects, just show what would be done
-U username Username to connect as
-W Prompt for password
-h hostname Database server host
-p port Database server port
</programlisting>
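<para>
For example, to report (without actually removing) the orphaned large
objects in a hypothetical database "mydb":
</para>
<programlisting>
vacuumlo -v -n -U postgres mydb
</programlisting>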
</sect2>
<sect2>
<title>Method</title>
<para>
First, it builds a temporary table which contains all of the OIDs of the
large objects in that database.
</para>
<para>
It then scans through all columns in the database that are of type "oid"
or "lo", and removes matching entries from the temporary table.
</para>
<para>
The remaining entries in the temp table identify orphaned LOs. These are
removed.
</para>
</sect2>
<sect2>
<title>Author</title>
<para>
Peter Mount <email>peter@retep.org.uk</email>
</para>
<para>
<ulink url="http://www.retep.org.uk"></ulink>
</para>
</sect2>
</sect1>

doc/src/sgml/xml2.sgml Normal file
@ -0,0 +1,436 @@
<sect1 id="xml2">
<title>xml2: XML-handling functions</title>
<indexterm zone="xml2">
<primary>xml2</primary>
</indexterm>
<sect2>
<title>Deprecation notice</title>
<para>
From PostgreSQL 8.3 on, there is XML-related
functionality based on the SQL/XML standard in the core server.
That functionality covers XML syntax checking and XPath queries,
which is what this module does as well, and more, but the API is
not at all compatible. It is planned that this module will be
removed in PostgreSQL 8.4 in favor of the newer standard API, so
you are encouraged to try converting your applications. If you
find that some of the functionality of this module is not
available in an adequate form with the newer API, please explain
your issue to pgsql-hackers@postgresql.org so that the deficiency
can be addressed.
</para>
</sect2>
<sect2>
<title>Description of functions</title>
<para>
The first set of functions are straightforward XML parsing and XPath queries:
</para>
<table>
<title>Functions</title>
<tgroup cols="2">
<tbody>
<row>
<entry>
<programlisting>
xml_is_well_formed(document) RETURNS bool
</programlisting>
</entry>
<entry>
<para>
This parses the document text in its parameter and returns true if the
document is well-formed XML. (Note: before PostgreSQL 8.2, this function
was called xml_valid(). That is the wrong name since validity and
well-formedness have different meanings in XML. The old name is still
available, but is deprecated and will be removed in 8.3.)
</para>
</entry>
</row>
<row>
<entry>
<programlisting>
xpath_string(document,query) RETURNS text
xpath_number(document,query) RETURNS float4
xpath_bool(document,query) RETURNS bool
</programlisting>
</entry>
<entry>
<para>
These functions evaluate the XPath query on the supplied document, and
cast the result to the specified type.
</para>
</entry>
</row>
<row>
<entry>
<programlisting>
xpath_nodeset(document,query,toptag,itemtag) RETURNS text
</programlisting>
</entry>
<entry>
<para>
This evaluates query on document and wraps the result in XML tags. If
the result is multivalued, the output will look like:
</para>
<literal>
&lt;toptag>
&lt;itemtag>Value 1 which could be an XML fragment&lt;/itemtag>
&lt;itemtag>Value 2....&lt;/itemtag>
&lt;/toptag>
</literal>
<para>
If either toptag or itemtag is an empty string, the relevant tag is omitted.
</para>
</entry>
</row>
<row>
<entry>
<programlisting>
xpath_nodeset(document,query) RETURNS text
</programlisting>
</entry>
<entry>
<para>
Like xpath_nodeset(document,query,toptag,itemtag) but omits both tags.
</para>
</entry>
</row>
<row>
<entry>
<programlisting>
xpath_nodeset(document,query,itemtag) RETURNS text
</programlisting>
</entry>
<entry>
<para>
Like xpath_nodeset(document,query,toptag,itemtag) but omits toptag.
</para>
</entry>
</row>
<row>
<entry>
<programlisting>
xpath_list(document,query,separator) RETURNS text
</programlisting>
</entry>
<entry>
<para>
This function returns multiple values separated by the specified
separator, e.g. Value 1,Value 2,Value 3 if separator=','.
</para>
</entry>
</row>
<row>
<entry>
<programlisting>
xpath_list(document,query) RETURNS text
</programlisting>
</entry>
<entry>
<para>
This is a wrapper for the above function that uses ',' as the separator.
</para>
</entry>
</row>
</tbody>
</tgroup>
</table>
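<para>
A short sketch of these functions applied to an inline document:
</para>
<programlisting>
SELECT xml_is_well_formed('&lt;doc>&lt;name>cheese&lt;/name>&lt;/doc>');  -- true
SELECT xpath_string('&lt;doc>&lt;name>cheese&lt;/name>&lt;/doc>',
                    '/doc/name/text()');                        -- cheese
SELECT xpath_list('&lt;doc>&lt;n>1&lt;/n>&lt;n>2&lt;/n>&lt;/doc>',
                  '/doc/n/text()');                             -- 1,2
</programlisting>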
</sect2>
<sect2>
<title><literal>xpath_table</literal></title>
<para>
This is a table function which evaluates a set of XPath queries on
each of a set of documents and returns the results as a table. The
primary key field from the original document table is returned as the
first column of the result so that the resultset from xpath_table can
be readily used in joins.
</para>
<para>
The function itself takes 5 arguments, all text.
</para>
<programlisting>
xpath_table(key,document,relation,xpaths,criteria)
</programlisting>
<table>
<title>Parameters</title>
<tgroup cols="2">
<tbody>
<row>
<entry><literal>key</literal></entry>
<entry>
<para>
the name of the "key" field - this is just a field to be used as
the first column of the output table i.e. it identifies the record from
which each output row came (see note below about multiple values).
</para>
</entry>
</row>
<row>
<entry><literal>document</literal></entry>
<entry>
<para>
the name of the field containing the XML document
</para>
</entry>
</row>
<row>
<entry><literal>relation</literal></entry>
<entry>
<para>
the name of the table or view containing the documents
</para>
</entry>
</row>
<row>
<entry><literal>xpaths</literal></entry>
<entry>
<para>
multiple xpath expressions separated by <literal>|</literal>
</para>
</entry>
</row>
<row>
<entry><literal>criteria</literal></entry>
<entry>
<para>
The contents of the where clause. This needs to be specified,
so use "true" or "1=1" here if you want to process all the rows in the
relation.
</para>
</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
NB These parameters (except the XPath strings) are just substituted
into a plain SQL SELECT statement, so you have some flexibility - the
statement is
</para>
<para>
<literal>
SELECT &lt;key>,&lt;document> FROM &lt;relation> WHERE &lt;criteria>
</literal>
</para>
<para>
so those parameters can be *anything* valid in those particular
locations. The result from this SELECT needs to return exactly two
columns (which it will unless you try to list multiple fields for key
or document). Beware that this simplistic approach requires that you
validate any user-supplied values to avoid SQL injection attacks.
</para>
<para>
Using the function
</para>
<para>
The function has to be used in a FROM expression. This gives the following
form:
</para>
<programlisting>
SELECT * FROM
xpath_table('article_id',
'article_xml',
'articles',
'/article/author|/article/pages|/article/title',
'date_entered > ''2003-01-01'' ')
AS t(article_id integer, author text, page_count integer, title text);
</programlisting>
<para>
The AS clause defines the names and types of the columns in the
virtual table. If there are more XPath queries than result columns,
the extra queries will be ignored. If there are more result columns
than XPath queries, the extra columns will be NULL.
</para>
<para>
Note that I've said in this example that pages is an integer. The
function deals internally with string representations, so when you say
you want an integer in the output, it will take the string
representation of the XPath result and use PostgreSQL input functions
to transform it into an integer (or whatever type the AS clause
requests). An error will result if it can't do this - for example if
the result is empty - so you may wish to just stick to 'text' as the
column type if you think your data has any problems.
</para>
<para>
The select statement doesn't need to use * alone - it can reference the
columns by name or join them to other tables. The function produces a
virtual table with which you can perform any operation you wish (e.g.
aggregation, joining, sorting etc). So we could also have:
</para>
<programlisting>
SELECT t.title, p.fullname, p.email
FROM xpath_table('article_id','article_xml','articles',
'/article/title|/article/author/@id',
'xpath_string(article_xml,''/article/@date'') > ''2003-03-20'' ')
AS t(article_id integer, title text, author_id integer),
tblPeopleInfo AS p
WHERE t.author_id = p.person_id;
</programlisting>
<para>
as a more complicated example. Of course, you could wrap all
of this in a view for convenience.
</para>
<sect3>
<title>Multivalued results</title>
<para>
The xpath_table function assumes that the results of each XPath query
might be multi-valued, so the number of rows returned by the function
may not be the same as the number of input documents. The first row
returned contains the first result from each query, the second row the
second result from each query. If one of the queries has fewer values
than the others, NULLs will be returned instead.
</para>
<para>
In some cases, a user will know that a given XPath query will return
only a single result (perhaps a unique document identifier) - if used
alongside an XPath query returning multiple results, the single-valued
result will appear only on the first row of the result. The solution
to this is to use the key field as part of a join against a simpler
XPath query. As an example:
</para>
<para>
<literal>
CREATE TABLE test
(
id int4 NOT NULL,
xml text,
CONSTRAINT pk PRIMARY KEY (id)
)
WITHOUT OIDS;
INSERT INTO test VALUES (1, '&lt;doc num="C1">
&lt;line num="L1">&lt;a>1&lt;/a>&lt;b>2&lt;/b>&lt;c>3&lt;/c>&lt;/line>
&lt;line num="L2">&lt;a>11&lt;/a>&lt;b>22&lt;/b>&lt;c>33&lt;/c>&lt;/line>
&lt;/doc>');
INSERT INTO test VALUES (2, '&lt;doc num="C2">
&lt;line num="L1">&lt;a>111&lt;/a>&lt;b>222&lt;/b>&lt;c>333&lt;/c>&lt;/line>
&lt;line num="L2">&lt;a>111&lt;/a>&lt;b>222&lt;/b>&lt;c>333&lt;/c>&lt;/line>
&lt;/doc>');
</literal>
</para>
</sect3>
<sect3>
<title>The query</title>
<programlisting>
SELECT * FROM xpath_table('id','xml','test',
'/doc/@num|/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c','1=1')
AS t(id int4, doc_num varchar(10), line_num varchar(10), val1 int4,
val2 int4, val3 int4)
WHERE id = 1 ORDER BY doc_num, line_num
</programlisting>
<para>
Gives the result:
</para>
<programlisting>
id | doc_num | line_num | val1 | val2 | val3
----+---------+----------+------+------+------
1 | C1 | L1 | 1 | 2 | 3
1 | | L2 | 11 | 22 | 33
</programlisting>
<para>
To get doc_num on every line, the solution is to use two invocations
of xpath_table and join the results:
</para>
<programlisting>
SELECT t.*,i.doc_num FROM
xpath_table('id','xml','test',
'/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c','1=1')
AS t(id int4, line_num varchar(10), val1 int4, val2 int4, val3 int4),
xpath_table('id','xml','test','/doc/@num','1=1')
AS i(id int4, doc_num varchar(10))
WHERE i.id=t.id AND i.id=1
ORDER BY doc_num, line_num;
</programlisting>
<para>
which gives the desired result:
</para>
<programlisting>
id | line_num | val1 | val2 | val3 | doc_num
----+----------+------+------+------+---------
1 | L1 | 1 | 2 | 3 | C1
1 | L2 | 11 | 22 | 33 | C1
(2 rows)
</programlisting>
</sect3>
</sect2>
<sect2>
<title>XSLT functions</title>
<para>
The following functions are available if libxslt is installed (this is
not currently detected automatically, so you will have to amend the
Makefile)
</para>
<sect3>
<title><literal>xslt_process</literal></title>
<programlisting>
xslt_process(document,stylesheet,paramlist) RETURNS text
</programlisting>
<para>
This function applies the XSL stylesheet to the document and returns
the transformed result. The paramlist is a list of parameter
assignments to be used in the transformation, specified in the form
'a=1,b=2'. Note that this is also proof-of-concept code and the
parameter parsing is very simple-minded (e.g. parameter values cannot
contain commas!)
</para>
<para>
Also note that if either the document or stylesheet values do not
begin with a &lt; then they will be treated as URLs and libxslt will
fetch them. It thus follows that you can use xslt_process as a means
to fetch the contents of URLs - you should be aware of the security
implications of this.
</para>
<para>
There is also a two-parameter version of xslt_process which does not
pass any parameters to the transformation.
</para>
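<para>
A minimal sketch (the stylesheet and parameter below are purely
illustrative):
</para>
<programlisting>
SELECT xslt_process(
  '&lt;doc>&lt;x>1&lt;/x>&lt;/doc>',
  '&lt;xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      &lt;xsl:template match="/">
        &lt;out>&lt;xsl:value-of select="/doc/x"/>&lt;/out>
      &lt;/xsl:template>
    &lt;/xsl:stylesheet>',
  'n=1');
</programlisting>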
</sect3>
</sect2>
<sect2>
<title>Credits</title>
<para>
Development of this module was sponsored by Torchbox Ltd. (www.torchbox.com)
It has the same BSD licence as PostgreSQL.
</para>
<para>
This version of the XML functions provides both XPath querying and
XSLT functionality. There is also a new table function which allows
the straightforward return of multiple XML results. Note that the current code
doesn't take any particular care over character sets - this is
something that should be fixed at some point!
</para>
<para>
If you have any comments or suggestions, please do contact me at
<email>jgray@azuli.co.uk</email>. Unfortunately, this isn't my main job, so
I can't guarantee a rapid response to your query!
</para>
</sect2>
</sect1>