Patch introduces new text search operator (<-> or <DISTANCE>) into tsquery.
On-disk and binary in/out format of tsquery are backward compatible.
It has two side effect:
- change order for tsquery, so, users, who has a btree index over tsquery,
should reindex it
- less number of parenthesis in tsquery output, and tsquery becomes more
readable
Authors: Teodor Sigaev, Oleg Bartunov, Dmitry Ivanov
Reviewers: Alexander Korotkov, Artur Zakirov
The tsquery, ltxtquery and query_int data types have a common ancestor.
Having acquired check_stack_depth() calls independently, each was
missing at least one call. Back-patch to 9.0 (all supported versions).
This alters various incidental uses of C++ key words to use other similar
identifiers, so that a C++ compiler won't choke outright. You still
(probably) need extern "C" { }; around the inclusion of backend headers.
based on a patch by Kurt Harriman <harriman@acm.org>
Also add a script cpluspluscheck to check for C++ compatibility in the
future. As of right now, this passes without error for me.
- change the alignment requirement of lexemes in TSVector slightly.
Lexeme strings were always padded to 2-byte aligned length to make sure
that if there's position array (uint16[]) it has the right alignment.
The patch changes that so that the padding is not done when there's no
positions. That makes the storage of tsvectors without positions
slightly more compact.
- added some #include "miscadmin.h" lines I missed in the earlier when I
added calls to check_stack_depth().
- Reimplement the send/recv functions, and added a comment
above them describing the on-wire format. The CRC is now recalculated in
tsquery as well per previous discussion.
- add code to check that the query tree is well-formed. It was indeed
possible to send malformed queries in binary mode, which produced all
kinds of strange results.
- make the left-field a uint32. There's no reason to
arbitrarily limit it to 16-bits, and it won't increase the disk/memory
footprint either now that QueryOperator and QueryOperand are separate
structs.
- add check_stack_depth() call to all recursive functions I found.
Some of them might have a natural limit so that you can't force
arbitrarily deep recursions, but check_stack_depth() is cheap enough
that seems best to just stick it into anything that might be a problem.
small editorization by me
- Brake the QueryItem struct into QueryOperator and QueryOperand.
Type was really the only common field between them. QueryItem still
exists, and is used in the TSQuery struct as before, but it's now a
union of the two. Many other changes fell from that, like separation
of pushval_asis function into pushValue, pushOperator and pushStop.
- Moved some structs that were for internal use only from header files
to the right .c-files.
- Moved tsvector parser to a new tsvector_parser.c file. Parser code was
about half of the size of tsvector.c, it's also used from tsquery.c, and
it has some data structures of its own, so it seems better to separate
it. Cleaned up the API so that TSVectorParserState is not accessed from
outside tsvector_parser.c.
- Separated enumerations (#defines, really) used for QueryItem.type
field and as return codes from gettoken_query. It was just accidental
code sharing.
- Removed ParseQueryNode struct used internally by makepol and friends.
push*-functions now construct QueryItems directly.
- Changed int4 variables to just ints for variables like "i" or "array
size", where the storage-size was not significant.
Oleg Bartunov and Teodor Sigaev, but I did a lot of editorializing,
so anything that's broken is probably my fault.
Documentation is nonexistent as yet, but let's land the patch so we can
get some portability testing done.