Mirror of https://github.com/postgres/postgres.git (synced 2025-11-10 17:42:29 +03:00)
Remove useless whitespace at end of lines
@@ -9,27 +9,27 @@ Gin stands for Generalized Inverted Index and should be considered as a genie,
not a drink.

Generalized means that the index does not know which operation it accelerates.
It instead works with custom strategies, defined for specific data types (read
"Index Method Strategies" in the PostgreSQL documentation). In that sense, Gin
is similar to GiST and differs from btree indices, which have predefined,
comparison-based operations.

An inverted index is an index structure storing a set of (key, posting list)
pairs, where 'posting list' is a set of documents in which the key occurs.
(A text document would usually contain many keys.) The primary goal of
Gin indices is support for highly scalable, full-text search in PostgreSQL.
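
The (key, posting list) idea described above can be illustrated with a small
in-memory sketch. This is not Gin code: the function names and the
whitespace tokenizer are assumptions made for the example, and a plain Python
dict stands in for the on-disk structure.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each key (here: a lowercased word) to a sorted posting list of doc ids."""
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for key in text.lower().split():
            postings[key].add(doc_id)
    return {key: sorted(ids) for key, ids in postings.items()}


def search_all(index, keys):
    """Return ids of documents that contain every key (posting list intersection)."""
    lists = [set(index.get(key, ())) for key in keys]
    return sorted(set.intersection(*lists)) if lists else []


docs = {1: "gin is a genie", 2: "gin is not a drink", 3: "gist is a tree"}
index = build_inverted_index(docs)
```

A query such as search_all(index, ["gin", "drink"]) only has to intersect two
short posting lists instead of scanning every document, which is why the
structure suits full-text search.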

Gin consists of a B-tree index constructed over entries (ET, entries tree),
where each entry is an element of the indexed value (element of array, lexeme
for tsvector) and where each tuple in a leaf page is either a pointer to a
B-tree over item pointers (PT, posting tree), or a list of item pointers
(PL, posting list) if the tuple is small enough.
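
A rough sketch of the PL/PT distinction follows. It only models the decision
"keep item pointers inline while the tuple is small, otherwise move them to a
separate tree". The class name, the POSTING_LIST_LIMIT threshold, and the use
of a sorted Python list as a stand-in for a B-tree are all assumptions for
illustration, not details taken from the Gin sources.

```python
import bisect

# Hypothetical threshold; in the real index the limit depends on tuple size.
POSTING_LIST_LIMIT = 4


class Entry:
    """One leaf tuple of the entry tree: a key plus its item pointers."""

    def __init__(self, key):
        self.key = key
        self.items = []    # item pointers as (block, offset), kept sorted
        self.kind = "PL"   # "PL" = inline posting list, "PT" = posting tree

    def add(self, item_pointer):
        # Insert in sorted position and skip duplicates.
        pos = bisect.bisect_left(self.items, item_pointer)
        if pos == len(self.items) or self.items[pos] != item_pointer:
            self.items.insert(pos, item_pointer)
        # Once the inline list grows too large, switch to the tree form.
        # A sorted list stands in for the separate B-tree here.
        if self.kind == "PL" and len(self.items) > POSTING_LIST_LIMIT:
            self.kind = "PT"


entry = Entry("postgres")
for pointer in [(1, 2), (1, 1), (1, 2)]:
    entry.add(pointer)
```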

Note: There is no delete operation for ET. The reason for this is that in
our experience, the set of distinct words in a large corpus changes very
rarely. This greatly simplifies the code and concurrency algorithms.

Gin comes with built-in support for one-dimensional arrays (e.g. integer[],
text[]), but no support for NULL elements. The following operations are
available:

@@ -59,25 +59,25 @@ Gin Fuzzy Limit

There are often situations when a full-text search returns a very large set of
results. Since reading tuples from the disk and sorting them could take a
lot of time, this is unacceptable for production. (Note that the search
itself is very fast.)

Such queries usually contain very frequent lexemes, so the results are not
very helpful. To facilitate execution of such queries Gin has a configurable
soft upper limit on the size of the returned set, determined by the
'gin_fuzzy_search_limit' GUC variable. This is set to 0 by default (no
limit).

If a non-zero search limit is set, then the returned set is a subset of the
whole result set, chosen at random.

"Soft" means that the actual number of returned results could slightly differ
from the specified limit, depending on the query and the quality of the
system's random number generator.

From experience, a value of 'gin_fuzzy_search_limit' in the thousands
(e.g. 5000-20000) works well. This means that 'gin_fuzzy_search_limit' will
have no effect for queries returning a result set with fewer tuples than this
number.
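
One way to picture a "soft" limit is to keep each matching tuple with
probability limit / total, so the result size is only approximately the
limit. The sketch below is an illustration under that assumption, not the
Gin implementation: fuzzy_limited is a hypothetical name, and the sketch
assumes the total number of matches is known in advance.

```python
import random


def fuzzy_limited(matches, limit, rng=None):
    """Return a random subset of matches whose size is roughly `limit`."""
    # A limit of 0 means "no limit"; small result sets are returned whole.
    if limit <= 0 or len(matches) <= limit:
        return list(matches)
    rng = rng or random.Random()
    keep_probability = limit / len(matches)
    # Each match is kept independently, so the final count varies slightly.
    return [m for m in matches if rng.random() < keep_probability]


all_matches = list(range(100_000))
sample = fuzzy_limited(all_matches, 5000, random.Random(0))
```

Because every tuple is kept or dropped independently, no sorting or counting
pass over the full result is needed, at the cost of an inexact result size.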

Limitations
@@ -115,5 +115,5 @@ Distant future:
Authors
-------

All work was done by Teodor Sigaev (teodor@sigaev.ru) and Oleg Bartunov
(oleg@sai.msu.su).

@@ -24,21 +24,21 @@ The current implementation of GiST supports:
* Concurrency
* Recovery support via WAL logging

The support for concurrency implemented in PostgreSQL was developed based on
the paper "Access Methods for Next-Generation Database Systems" by
Marcel Kornacker:

http://www.sai.msu.su/~megera/postgres/gist/papers/concurrency/access-methods-for-next-generation.pdf.gz

The original algorithms were modified in several ways:

* They had to be adapted to PostgreSQL conventions. For example, the SEARCH
  algorithm was considerably changed, because in PostgreSQL the search function
  should return one tuple (next), not all tuples at once. Also, it should
  release page locks between calls.
* Since we added support for variable length keys, it's not possible to
  guarantee enough free space for all keys on pages after splitting. The user
  defined function picksplit doesn't have information about the size of tuples
  (each tuple may contain several keys, as in a multicolumn index, while
  picksplit can work with only one key) or of pages.
* We modified the original INSERT algorithm for performance reasons. In particular,
@@ -67,7 +67,7 @@ gettuple(search-pred)
    ptr = top of stack
    while(true)
        latch( ptr->page, S-mode )
        if ( ptr->page->lsn != ptr->lsn )
            ptr->lsn = ptr->page->lsn
            currentposition=0
            if ( ptr->parentlsn < ptr->page->nsn )
@@ -88,7 +88,7 @@ gettuple(search-pred)
            else if ( ptr->page is leaf )
                unlatch( ptr->page )
                return tuple
            else
                add to stack child page
            end
            currentposition++
@@ -99,20 +99,20 @@ gettuple(search-pred)
Insert Algorithm
----------------

INSERT guarantees that the GiST tree remains balanced. User defined key method
Penalty is used for choosing a subtree to insert; method PickSplit is used for
the node splitting algorithm; method Union is used for propagating changes
upward to maintain the tree properties.

NOTICE: We modified the original INSERT algorithm for performance reasons. In
particular, it is now a single-pass algorithm.

Function findLeaf is used to identify the subtree for insertion. The page in
which the insertion takes place is locked, as is its parent page. Functions
findParent and findPath are used to find parent pages, which could have changed
because of concurrent access. Function pageSplit is recursive and could split
a page into more than 2 pages, which could be necessary if keys have different
lengths or more than one key is inserted (in such a situation, the user defined
function pickSplit cannot guarantee free space on the page).

findLeaf(new-key)
@@ -143,7 +143,7 @@ findLeaf(new-key)
    end

findPath( stack item )
    push stack, [root, 0, 0] // page, LSN, parent
    while( stack )
        ptr = top of stack
        latch( ptr->page, S-mode )
@@ -152,7 +152,7 @@ findPath( stack item )
        end
        for( each tuple on page )
            if ( tuple->pagepointer == item->page )
                return stack
            else
                add to stack at the end [tuple->pagepointer,0, ptr]
            end
@@ -160,12 +160,12 @@ findPath( stack item )
        unlatch( ptr->page )
        pop stack
    end
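
The core of findPath, a search from the root that remembers each page's
parent so the path to a given page can be rebuilt, can be sketched as below.
This is a simplified illustration only: it leaves out latching, the LSN/NSN
checks, and rightlink following from the pseudocode, and the Page class and
find_path name are assumptions made for the example.

```python
from collections import deque


class Page:
    """Hypothetical in-memory page: a name plus downlinks to child pages."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)  # empty for a leaf page


def find_path(root, target):
    """Return the pages from root down to target, or None if it is unreachable."""
    if target is root:
        return [root]
    # Each queue item is (page, parent item), mirroring [page, LSN, parent]
    # from the pseudocode with the LSN left out.
    queue = deque([(root, None)])
    while queue:
        item = queue.popleft()
        page = item[0]
        for child in page.children:
            if child is target:
                # Walk the parent links back up to the root.
                path = [child]
                while item is not None:
                    path.append(item[0])
                    item = item[1]
                return path[::-1]
            queue.append((child, item))
    return None


leaf_a, leaf_b, leaf_c = Page("a"), Page("b"), Page("c")
inner1 = Page("i1", [leaf_a, leaf_b])
inner2 = Page("i2", [leaf_c])
root = Page("root", [inner1, inner2])
```

New pages are appended at the end of the queue, matching the "add to stack at
the end" step, so the tree is visited level by level.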

findParent( stack item )
    parent = item->parent
    latch( parent->page, X-mode )
    if ( parent->page->lsn != parent->lsn )
        while(true)
            search parent tuple on parent->page, if found then return
            rightlink = parent->page->rightlink
            unlatch( parent->page )
@@ -214,7 +214,7 @@ placetopage(page, keysarray)
            keysarray = [ union(keysarray) ]
        end
    end

insert(new-key)
    stack = findLeaf(new-key)
    keysarray = [new-key]
@@ -236,4 +236,4 @@ insert(new-key)

Authors:
    Teodor Sigaev <teodor@sigaev.ru>
    Oleg Bartunov <oleg@sai.msu.su>

@@ -154,7 +154,7 @@ even pages that don't contain any deletable tuples. This guarantees that
the btbulkdelete call cannot return while any indexscan is still holding
a copy of a deleted index tuple. Note that this requirement does not say
that btbulkdelete must visit the pages in any particular order. (See also
on-the-fly deletion, below.)

There is no such interlocking for deletion of items in internal pages,
since backends keep no lock nor pin on a page they have descended past.

@@ -5608,7 +5608,7 @@ GetLatestXTime(void)
 * Returns timestamp of latest processed commit/abort record.
 *
 * When the server has been started normally without recovery the function
 * returns NULL.
 */
Datum
pg_last_xact_replay_timestamp(PG_FUNCTION_ARGS)
