mirror of
https://github.com/postgres/postgres.git
synced 2025-07-27 12:41:57 +03:00
Move most /contrib README files into SGML. Some still need conversion
or will never be converted.
This commit is contained in:
@ -1,512 +0,0 @@
|
||||
contrib/ltree module
|
||||
|
||||
ltree - is a PostgreSQL contrib module which contains implementation of data
|
||||
types, indexed access methods and queries for data organized as a tree-like
|
||||
structures.
|
||||
This module will works for PostgreSQL version 7.3.
|
||||
(version for 7.2 version is available from http://www.sai.msu.su/~megera/postgres/gist/ltree/ltree-7.2.tar.gz)
|
||||
-------------------------------------------------------------------------------
|
||||
All work was done by Teodor Sigaev (teodor@stack.net) and Oleg Bartunov
|
||||
(oleg@sai.msu.su). See http://www.sai.msu.su/~megera/postgres/gist for
|
||||
additional information. Authors would like to thank Eugeny Rodichev for helpful
|
||||
discussions. Comments and bug reports are welcome.
|
||||
-------------------------------------------------------------------------------
|
||||
|
||||
LEGAL NOTICES: This module is released under BSD license (as PostgreSQL
|
||||
itself). This work was done in framework of Russian Scientific Network and
|
||||
partially supported by Russian Foundation for Basic Research and Stack Group.
|
||||
-------------------------------------------------------------------------------
|
||||
|
||||
MOTIVATION
|
||||
|
||||
This is a placeholder for introduction to the problem. Hope, people reading
|
||||
this document doesn't need it too much :-)
|
||||
|
||||
DEFINITIONS
|
||||
|
||||
A label of a node is a sequence of one or more words separated by blank
|
||||
character '_' and containing letters and digits ( for example, [a-zA-Z0-9] for
|
||||
C locale). The length of a label is limited by 256 bytes.
|
||||
|
||||
Example: 'Countries', 'Personal_Services'
|
||||
|
||||
A label path of a node is a sequence of one or more dot-separated labels
|
||||
l1.l2...ln, represents path from root to the node. The length of a label path
|
||||
is limited by 65Kb, but size <= 2Kb is preferrable. We consider it's not a
|
||||
strict limitation ( maximal size of label path for DMOZ catalogue - http://
|
||||
www.dmoz.org, is about 240 bytes !)
|
||||
|
||||
Example: 'Top.Countries.Europe.Russia'
|
||||
|
||||
We introduce several datatypes:
|
||||
|
||||
ltree
|
||||
- is a datatype for label path.
|
||||
|
||||
ltree[]
|
||||
- is a datatype for arrays of ltree.
|
||||
|
||||
lquery
|
||||
- is a path expression that has regular expression in the label path and
|
||||
used for ltree matching. Star symbol (*) is used to specify any number of
|
||||
labels (levels) and could be used at the beginning and the end of lquery,
|
||||
for example, '*.Europe.*'.
|
||||
|
||||
The following quantifiers are recognized for '*' (like in Perl):
|
||||
|
||||
{n} Match exactly n levels
|
||||
{n,} Match at least n levels
|
||||
{n,m} Match at least n but not more than m levels
|
||||
{,m} Match at maximum m levels (eq. to {0,m})
|
||||
|
||||
It is possible to use several modifiers at the end of a label:
|
||||
|
||||
|
||||
@ Do case-insensitive label matching
|
||||
* Do prefix matching for a label
|
||||
% Don't account word separator '_' in label matching, that is
|
||||
'Russian%' would match 'Russian_nations', but not 'Russian'
|
||||
|
||||
lquery could contains logical '!' (NOT) at the beginning of the label and '
|
||||
|' (OR) to specify possible alternatives for label matching.
|
||||
|
||||
Example of lquery:
|
||||
|
||||
|
||||
Top.*{0,2}.sport*@.!football|tennis.Russ*|Spain
|
||||
a) b) c) d) e)
|
||||
|
||||
A label path should
|
||||
+ a) begins from a node with label 'Top'
|
||||
+ b) and following zero or 2 labels until
|
||||
+ c) a node with label beginning from case-insensitive prefix 'sport'
|
||||
+ d) following node with label not matched 'football' or 'tennis' and
|
||||
+ e) ends on node with label beginning from 'Russ' or strictly matched
|
||||
'Spain'.
|
||||
|
||||
ltxtquery
|
||||
- is a datatype for label searching (like type 'query' for full text
|
||||
searching, see contrib/tsearch). It's possible to use modifiers @,%,* at
|
||||
the end of word. The meaning of modifiers are the same as for lquery.
|
||||
|
||||
Example: 'Europe & Russia*@ & !Transportation'
|
||||
|
||||
Search paths contain words 'Europe' and 'Russia*' (case-insensitive) and
|
||||
not 'Transportation'. Notice, the order of words as they appear in label
|
||||
path is not important !
|
||||
|
||||
OPERATIONS
|
||||
|
||||
The following operations are defined for type ltree:
|
||||
|
||||
<,>,<=,>=,=, <>
|
||||
- have their usual meanings. Comparison is doing in the order of direct
|
||||
tree traversing, children of a node are sorted lexicographic.
|
||||
ltree @> ltree
|
||||
- returns TRUE if left argument is an ancestor of right argument (or
|
||||
equal).
|
||||
ltree <@ ltree
|
||||
- returns TRUE if left argument is a descendant of right argument (or
|
||||
equal).
|
||||
ltree ~ lquery, lquery ~ ltree
|
||||
- return TRUE if node represented by ltree satisfies lquery.
|
||||
ltree ? lquery[], lquery ? ltree[]
|
||||
- return TRUE if node represented by ltree satisfies at least one lquery
|
||||
from array.
|
||||
ltree @ ltxtquery, ltxtquery @ ltree
|
||||
- return TRUE if node represented by ltree satisfies ltxtquery.
|
||||
ltree || ltree, ltree || text, text || ltree
|
||||
- return concatenated ltree.
|
||||
|
||||
Operations for arrays of ltree (ltree[]):
|
||||
|
||||
ltree[] @> ltree, ltree <@ ltree[]
|
||||
- returns TRUE if array ltree[] contains an ancestor of ltree.
|
||||
ltree @> ltree[], ltree[] <@ ltree
|
||||
- returns TRUE if array ltree[] contains a descendant of ltree.
|
||||
ltree[] ~ lquery, lquery ~ ltree[]
|
||||
- returns TRUE if array ltree[] contains label paths matched lquery.
|
||||
ltree[] ? lquery[], lquery[] ? ltree[]
|
||||
- returns TRUE if array ltree[] contains label paths matched atleaset one
|
||||
lquery from array.
|
||||
ltree[] @ ltxtquery, ltxtquery @ ltree[]
|
||||
- returns TRUE if array ltree[] contains label paths matched ltxtquery
|
||||
(full text search).
|
||||
ltree[] ?@> ltree, ltree ?<@ ltree[], ltree[] ?~ lquery, ltree[] ?@ ltxtquery
|
||||
- returns first element of array ltree[] satisfies corresponding condition
|
||||
and NULL in vice versa.
|
||||
|
||||
REMARK
|
||||
|
||||
Operations <@, @>, @ and ~ have analogues - ^<@, ^@>, ^@, ^~, which doesn't use
|
||||
indices !
|
||||
|
||||
INDICES
|
||||
|
||||
Various indices could be created to speed up execution of operations:
|
||||
|
||||
* B-tree index over ltree:
|
||||
<, <=, =, >=, >
|
||||
* GiST index over ltree:
|
||||
<, <=, =, >=, >, @>, <@, @, ~, ?
|
||||
Example:
|
||||
create index path_gist_idx on test using gist (path);
|
||||
* GiST index over ltree[]:
|
||||
ltree[]<@ ltree, ltree @> ltree[], @, ~, ?.
|
||||
Example:
|
||||
create index path_gist_idx on test using gist (array_path);
|
||||
Notices: This index is lossy.
|
||||
|
||||
FUNCTIONS
|
||||
|
||||
ltree subltree
|
||||
ltree subltree(ltree, start, end)
|
||||
returns subpath of ltree from start (inclusive) until the end.
|
||||
# select subltree('Top.Child1.Child2',1,2);
|
||||
subltree
|
||||
--------
|
||||
Child1
|
||||
ltree subpath
|
||||
ltree subpath(ltree, OFFSET,LEN)
|
||||
ltree subpath(ltree, OFFSET)
|
||||
returns subpath of ltree from OFFSET (inclusive) with length LEN.
|
||||
If OFFSET is negative returns subpath starts that far from the end
|
||||
of the path. If LENGTH is omitted, returns everything to the end
|
||||
of the path. If LENGTH is negative, leaves that many labels off
|
||||
the end of the path.
|
||||
# select subpath('Top.Child1.Child2',1,2);
|
||||
subpath
|
||||
-------
|
||||
Child1.Child2
|
||||
|
||||
# select subpath('Top.Child1.Child2',-2,1);
|
||||
subpath
|
||||
---------
|
||||
Child1
|
||||
int4 nlevel
|
||||
|
||||
int4 nlevel(ltree) - returns level of the node.
|
||||
# select nlevel('Top.Child1.Child2');
|
||||
nlevel
|
||||
--------
|
||||
3
|
||||
Note, that arguments start, end, OFFSET, LEN have meaning of level of the
|
||||
node !
|
||||
|
||||
int4 index(ltree,ltree), int4 index(ltree,ltree,OFFSET)
|
||||
returns number of level of the first occurence of second argument in first
|
||||
one beginning from OFFSET. if OFFSET is negative, than search begins from |
|
||||
OFFSET| levels from the end of the path.
|
||||
SELECT index('0.1.2.3.5.4.5.6.8.5.6.8','5.6',3);
|
||||
index
|
||||
-------
|
||||
6
|
||||
SELECT index('0.1.2.3.5.4.5.6.8.5.6.8','5.6',-4);
|
||||
index
|
||||
-------
|
||||
9
|
||||
|
||||
ltree text2ltree(text), text ltree2text(text)
|
||||
cast functions for ltree and text.
|
||||
|
||||
|
||||
ltree lca(ltree,ltree,...) (up to 8 arguments)
|
||||
ltree lca(ltree[])
|
||||
Returns Lowest Common Ancestor (lca)
|
||||
# select lca('1.2.2.3','1.2.3.4.5.6');
|
||||
lca
|
||||
-----
|
||||
1.2
|
||||
# select lca('{la.2.3,1.2.3.4.5.6}') is null;
|
||||
?column?
|
||||
----------
|
||||
f
|
||||
|
||||
|
||||
INSTALLATION
|
||||
|
||||
cd contrib/ltree
|
||||
make
|
||||
make install
|
||||
make installcheck
|
||||
|
||||
EXAMPLE OF USAGE
|
||||
|
||||
createdb ltreetest
|
||||
psql ltreetest < /usr/local/pgsql/share/contrib/ltree.sql
|
||||
psql ltreetest < ltreetest.sql
|
||||
|
||||
Now, we have a database ltreetest populated with a data describing hierarchy
|
||||
shown below:
|
||||
|
||||
|
||||
TOP
|
||||
/ | \
|
||||
Science Hobbies Collections
|
||||
/ | \
|
||||
Astronomy Amateurs_Astronomy Pictures
|
||||
/ \ |
|
||||
Astrophysics Cosmology Astronomy
|
||||
/ | \
|
||||
Galaxies Stars Astronauts
|
||||
|
||||
Inheritance:
|
||||
|
||||
ltreetest=# select path from test where path <@ 'Top.Science';
|
||||
path
|
||||
------------------------------------
|
||||
Top.Science
|
||||
Top.Science.Astronomy
|
||||
Top.Science.Astronomy.Astrophysics
|
||||
Top.Science.Astronomy.Cosmology
|
||||
(4 rows)
|
||||
|
||||
Matching:
|
||||
|
||||
ltreetest=# select path from test where path ~ '*.Astronomy.*';
|
||||
path
|
||||
-----------------------------------------------
|
||||
Top.Science.Astronomy
|
||||
Top.Science.Astronomy.Astrophysics
|
||||
Top.Science.Astronomy.Cosmology
|
||||
Top.Collections.Pictures.Astronomy
|
||||
Top.Collections.Pictures.Astronomy.Stars
|
||||
Top.Collections.Pictures.Astronomy.Galaxies
|
||||
Top.Collections.Pictures.Astronomy.Astronauts
|
||||
(7 rows)
|
||||
ltreetest=# select path from test where path ~ '*.!pictures@.*.Astronomy.*';
|
||||
path
|
||||
------------------------------------
|
||||
Top.Science.Astronomy
|
||||
Top.Science.Astronomy.Astrophysics
|
||||
Top.Science.Astronomy.Cosmology
|
||||
(3 rows)
|
||||
|
||||
Full text search:
|
||||
|
||||
ltreetest=# select path from test where path @ 'Astro*% & !pictures@';
|
||||
path
|
||||
------------------------------------
|
||||
Top.Science.Astronomy
|
||||
Top.Science.Astronomy.Astrophysics
|
||||
Top.Science.Astronomy.Cosmology
|
||||
Top.Hobbies.Amateurs_Astronomy
|
||||
(4 rows)
|
||||
|
||||
ltreetest=# select path from test where path @ 'Astro* & !pictures@';
|
||||
path
|
||||
------------------------------------
|
||||
Top.Science.Astronomy
|
||||
Top.Science.Astronomy.Astrophysics
|
||||
Top.Science.Astronomy.Cosmology
|
||||
(3 rows)
|
||||
|
||||
Using Functions:
|
||||
|
||||
ltreetest=# select subpath(path,0,2)||'Space'||subpath(path,2) from test where path <@ 'Top.Science.Astronomy';
|
||||
?column?
|
||||
------------------------------------------
|
||||
Top.Science.Space.Astronomy
|
||||
Top.Science.Space.Astronomy.Astrophysics
|
||||
Top.Science.Space.Astronomy.Cosmology
|
||||
(3 rows)
|
||||
We could create SQL-function:
|
||||
CREATE FUNCTION ins_label(ltree, int4, text) RETURNS ltree
|
||||
AS 'select subpath($1,0,$2) || $3 || subpath($1,$2);'
|
||||
LANGUAGE SQL IMMUTABLE;
|
||||
|
||||
and previous select could be rewritten as:
|
||||
|
||||
ltreetest=# select ins_label(path,2,'Space') from test where path <@ 'Top.Science.Astronomy';
|
||||
ins_label
|
||||
------------------------------------------
|
||||
Top.Science.Space.Astronomy
|
||||
Top.Science.Space.Astronomy.Astrophysics
|
||||
Top.Science.Space.Astronomy.Cosmology
|
||||
(3 rows)
|
||||
|
||||
Or with another arguments:
|
||||
|
||||
CREATE FUNCTION ins_label(ltree, ltree, text) RETURNS ltree
|
||||
AS 'select subpath($1,0,nlevel($2)) || $3 || subpath($1,nlevel($2));'
|
||||
LANGUAGE SQL IMMUTABLE;
|
||||
|
||||
ltreetest=# select ins_label(path,'Top.Science'::ltree,'Space') from test where path <@ 'Top.Science.Astronomy';
|
||||
ins_label
|
||||
------------------------------------------
|
||||
Top.Science.Space.Astronomy
|
||||
Top.Science.Space.Astronomy.Astrophysics
|
||||
Top.Science.Space.Astronomy.Cosmology
|
||||
(3 rows)
|
||||
|
||||
ADDITIONAL DATA
|
||||
|
||||
To get more feeling from our ltree module you could download
|
||||
dmozltree-eng.sql.gz (about 3Mb tar.gz archive containing 300,274 nodes),
|
||||
available from http://www.sai.msu.su/~megera/postgres/gist/ltree/
|
||||
dmozltree-eng.sql.gz, which is DMOZ catalogue, prepared for use with ltree.
|
||||
Setup your test database (dmoz), load ltree module and issue command:
|
||||
|
||||
zcat dmozltree-eng.sql.gz| psql dmoz
|
||||
|
||||
Data will be loaded into database dmoz and all indices will be created.
|
||||
|
||||
BENCHMARKS
|
||||
|
||||
All runs were performed on my IBM ThinkPad T21 (256 MB RAM, 750Mhz) using DMOZ
|
||||
data, containing 300,274 nodes (see above for download link). We used some
|
||||
basic queries typical for walking through catalog.
|
||||
|
||||
QUERIES
|
||||
|
||||
* Q0: Count all rows (sort of base time for comparison)
|
||||
select count(*) from dmoz;
|
||||
count
|
||||
--------
|
||||
300274
|
||||
(1 row)
|
||||
* Q1: Get direct children (without inheritance)
|
||||
select path from dmoz where path ~ 'Top.Adult.Arts.Animation.*{1}';
|
||||
path
|
||||
-----------------------------------
|
||||
Top.Adult.Arts.Animation.Cartoons
|
||||
Top.Adult.Arts.Animation.Anime
|
||||
(2 rows)
|
||||
* Q2: The same as Q1 but with counting of successors
|
||||
select path as parentpath , (select count(*)-1 from dmoz where path <@
|
||||
p.path) as count from dmoz p where path ~ 'Top.Adult.Arts.Animation.*{1}';
|
||||
parentpath | count
|
||||
-----------------------------------+-------
|
||||
Top.Adult.Arts.Animation.Cartoons | 2
|
||||
Top.Adult.Arts.Animation.Anime | 61
|
||||
(2 rows)
|
||||
* Q3: Get all parents
|
||||
select path from dmoz where path @> 'Top.Adult.Arts.Animation' order by
|
||||
path asc;
|
||||
path
|
||||
--------------------------
|
||||
Top
|
||||
Top.Adult
|
||||
Top.Adult.Arts
|
||||
Top.Adult.Arts.Animation
|
||||
(4 rows)
|
||||
* Q4: Get all parents with counting of children
|
||||
select path, (select count(*)-1 from dmoz where path <@ p.path) as count
|
||||
from dmoz p where path @> 'Top.Adult.Arts.Animation' order by path asc;
|
||||
path | count
|
||||
--------------------------+--------
|
||||
Top | 300273
|
||||
Top.Adult | 4913
|
||||
Top.Adult.Arts | 339
|
||||
Top.Adult.Arts.Animation | 65
|
||||
(4 rows)
|
||||
* Q5: Get all children with levels
|
||||
select path, nlevel(path) - nlevel('Top.Adult.Arts.Animation') as level
|
||||
from dmoz where path ~ 'Top.Adult.Arts.Animation.*{1,2}' order by path asc;
|
||||
path | level
|
||||
------------------------------------------------+-------
|
||||
Top.Adult.Arts.Animation.Anime | 1
|
||||
Top.Adult.Arts.Animation.Anime.Fan_Works | 2
|
||||
Top.Adult.Arts.Animation.Anime.Games | 2
|
||||
Top.Adult.Arts.Animation.Anime.Genres | 2
|
||||
Top.Adult.Arts.Animation.Anime.Image_Galleries | 2
|
||||
Top.Adult.Arts.Animation.Anime.Multimedia | 2
|
||||
Top.Adult.Arts.Animation.Anime.Resources | 2
|
||||
Top.Adult.Arts.Animation.Anime.Titles | 2
|
||||
Top.Adult.Arts.Animation.Cartoons | 1
|
||||
Top.Adult.Arts.Animation.Cartoons.AVS | 2
|
||||
Top.Adult.Arts.Animation.Cartoons.Members | 2
|
||||
(11 rows)
|
||||
|
||||
Timings
|
||||
|
||||
+---------------------------------------------+
|
||||
|Query|Rows|Time (ms) index|Time (ms) no index|
|
||||
|-----+----+---------------+------------------|
|
||||
| Q0| 1| NA| 1453.44|
|
||||
|-----+----+---------------+------------------|
|
||||
| Q1| 2| 0.49| 1001.54|
|
||||
|-----+----+---------------+------------------|
|
||||
| Q2| 2| 1.48| 3009.39|
|
||||
|-----+----+---------------+------------------|
|
||||
| Q3| 4| 0.55| 906.98|
|
||||
|-----+----+---------------+------------------|
|
||||
| Q4| 4| 24385.07| 4951.91|
|
||||
|-----+----+---------------+------------------|
|
||||
| Q5| 11| 0.85| 1003.23|
|
||||
+---------------------------------------------+
|
||||
Timings without indices were obtained using operations which doesn't use
|
||||
indices (see above)
|
||||
|
||||
Remarks
|
||||
|
||||
We didn't run full-scale tests, also we didn't present (yet) data for
|
||||
operations with arrays of ltree (ltree[]) and full text searching. We'll
|
||||
appreciate your input. So far, below some (rather obvious) results:
|
||||
|
||||
* Indices does help execution of queries
|
||||
* Q4 performs bad because one needs to read almost all data from the HDD
|
||||
|
||||
CHANGES
|
||||
|
||||
Mar 28, 2003
|
||||
Added functions index(ltree,ltree,offset), text2ltree(text),
|
||||
ltree2text(text)
|
||||
Feb 7, 2003
|
||||
Add ? operation
|
||||
Fix ~ operation bug: eg '1.1.1' ~ '*.1'
|
||||
Optimize index storage
|
||||
Aug 9, 2002
|
||||
Fixed very stupid but important bug :-)
|
||||
July 31, 2002
|
||||
Now works on 64-bit platforms.
|
||||
Added function lca - lowest common ancestor
|
||||
Version for 7.2 is distributed as separate package -
|
||||
http://www.sai.msu.su/~megera/postgres/gist/ltree/ltree-7.2.tar.gz
|
||||
July 13, 2002
|
||||
Initial release.
|
||||
|
||||
TODO
|
||||
|
||||
* Testing on 64-bit platforms. There are several known problems with byte
|
||||
alignment; -- RESOLVED
|
||||
* Better documentation;
|
||||
* We plan (probably) to improve regular expressions processing using
|
||||
non-deterministic automata;
|
||||
* Some sort of XML support;
|
||||
* Better full text searching;
|
||||
|
||||
SOME BACKGROUNDS
|
||||
|
||||
The approach we use for ltree is much like one we used in our other GiST based
|
||||
contrib modules (intarray, tsearch, tree, btree_gist, rtree_gist). Theoretical
|
||||
background is available in papers referenced from our GiST development page
|
||||
(http://www.sai.msu.su/~megera/postgres/gist).
|
||||
|
||||
A hierarchical data structure (tree) is a set of nodes. Each node has a
|
||||
signature (LPS) of a fixed size, which is a hashed label path of that node.
|
||||
Traversing a tree we could *certainly* prune branches if
|
||||
|
||||
LQS (bitwise AND) LPS != LQS
|
||||
|
||||
where LQS is a signature of lquery or ltxtquery, obtained in the same way as
|
||||
LPS.
|
||||
|
||||
ltree[]:
|
||||
For array of ltree LPS is a bitwise OR-ed signatures of *ALL* children
|
||||
reachable from that node. Signatures are stored in RD-tree, implemented using
|
||||
GiST, which provides indexed access.
|
||||
|
||||
ltree:
|
||||
For ltree we store LPS in a B-tree, implemented using GiST. Each node entry is
|
||||
represented by (left_bound, signature, right_bound), so that we could speedup
|
||||
operations <, <=, =, >=, > using left_bound, right_bound and prune branches of
|
||||
a tree using signature.
|
||||
-------------------------------------------------------------------------------
|
||||
We ask people who find the module useful to send us a postcards to:
|
||||
Moscow, 119899, Universitetski pr.13, Moscow State University, Sternberg
|
||||
Astronomical Institute, Russia
|
||||
For: Bartunov O.S.
|
||||
and
|
||||
Moscow, Bratislavskaya str.23, appt. 18, Russia
|
||||
For: Sigaev F.G.
|
Reference in New Issue
Block a user