1
0
mirror of https://github.com/postgres/postgres.git synced 2025-08-28 18:48:04 +03:00
Commit Graph

26 Commits

Author SHA1 Message Date
Tom Lane
3ccae48f44 Support indexing of regular-expression searches in contrib/pg_trgm.
This works by extracting trigrams from the given regular expression,
in generally the same spirit as the previously-existing support for
LIKE searches, though of course the details are far more complicated.

Currently, only GIN indexes are supported.  We might be able to make
it work with GiST indexes later.

The implementation includes adding API functions to backend/regex/
to provide a view of the search NFA created from a regular expression.
These functions are meant to be generic enough to be supportable in
a standalone version of the regex library, should that ever happen.

Alexander Korotkov, reviewed by Heikki Linnakangas and Tom Lane
2013-04-09 01:06:54 -04:00
Tom Lane
628cbb50ba Re-implement extraction of fixed prefixes from regular expressions.
To generate btree-indexable conditions from regex WHERE conditions (such as
WHERE indexed_col ~ '^foo'), we need to be able to identify any fixed
prefix that a regex might have; that is, find any string that must be a
prefix of all strings satisfying the regex.  We used to do that with
entirely ad-hoc code that looked at the source text of the regex.  It
didn't know very much about regex syntax, which mostly meant that it would
fail to identify some optimizable cases; but Viktor Rosenfeld reported that
it would produce actively wrong answers for quantified parenthesized
subexpressions, such as '^(foo)?bar'.  Rather than trying to extend the
ad-hoc code to cover this, let's get rid of it altogether in favor of
identifying prefixes by examining the compiled form of a regex.

To do this, I've added a new entry point "pg_regprefix" to the regex library;
hopefully it is defined in a sufficiently general fashion that it can remain
in the library when/if that code gets split out as a standalone project.

Since this bug has been there for a very long time, this fix needs to get
back-patched.  However it depends on some other recent commits (particularly
the addition of wchar-to-database-encoding conversion), so I'll commit this
separately and then go to work on back-porting the necessary fixes.
2012-07-10 14:54:37 -04:00
Tom Lane
1e16a8107d Teach regular expression operators to honor collations.
This involves getting the character classification and case-folding
functions in the regex library to use the collations infrastructure.
Most of this work had been done already in connection with the upper/lower
and LIKE logic, so it was a simple matter of transposition.

While at it, split out these functions into a separate source file
regc_pg_locale.c, so that they can be correctly labeled with the Postgres
project's license rather than the Scriptics license.  These functions are
100% Postgres-written code whereas what remains in regc_locale.c is still
mostly not ours, so lumping them both under the same copyright notice was
getting more and more misleading.
2011-04-10 18:03:09 -04:00
Magnus Hagander
9f2e211386 Remove cvs keywords from all files. 2010-09-20 22:08:53 +02:00
Peter Eisentraut
0474dcb608 Refactor backend makefiles to remove lots of duplicate code 2008-02-19 10:30:09 +00:00
PostgreSQL Daemon
969685ad44 $Header: -> $PostgreSQL Changes ... 2003-11-29 19:52:15 +00:00
Tom Lane
7bcc6d98fb Replace regular expression package with Henry Spencer's latest version
(extracted from Tcl 8.4.1 release, as Henry still hasn't got round to
making it a separate library).  This solves a performance problem for
multibyte, as well as upgrading our regexp support to match recent Tcl
and nearly match recent Perl.
2003-02-05 17:41:33 +00:00
Bruce Momjian
a2ba9a76b8 Remove retest Makefile entry because it does not compile. 2002-09-16 16:02:43 +00:00
Peter Eisentraut
77f7763b55 Remove all traces of multibyte and locale options. Clean up comments
referring to "multibyte" where it really means character encoding.
2002-09-03 21:45:44 +00:00
Tatsuo Ishii
f03b7ba0de Add dependency for regexec.c 2001-10-04 04:16:16 +00:00
Peter Eisentraut
e5ba2fc5b5 Make all commands that link a program look like
$(CC) $(CFLAGS) $(LDFLAGS) <object files> <extra-libraries> $(LIBS) -o $@

This form seemed to be the most portable, readable, and logical, but in any
case it's better than having a dozen different ones in the tree.
2000-11-30 20:36:13 +00:00
Peter Eisentraut
805e431a38 Add support for VPATH builds, that is, building somewhere else than in the
source directory.  This involves mostly makefiles using $(srcdir) when they
might have used ".".  (Regression tests don't work with this, yet.)

Sort out usage of CPPFLAGS, CFLAGS (and CXXFLAGS).  Add "override" keyword
in most places, to preserve necessary flags even when the user overrode the
flags.
2000-10-20 21:04:27 +00:00
Peter Eisentraut
424f0edcb8 Fix relative path references so that make knowns which dependencies refer
to one another. Sort out builddir vs srcdir variable namings. Remove some
now obsoleted make variables.
2000-08-31 16:12:35 +00:00
Tom Lane
091126fa28 Generated header files parse.h and fmgroids.h are now copied into
the src/include tree, so that -I backend is no longer necessary anywhere.
Also, clean up some bit rot in contrib tree.
2000-05-29 05:45:56 +00:00
Peter Eisentraut
533d516629 Removed MBFLAGS from makefiles since it's now done in include/config.h. 2000-01-19 02:59:03 +00:00
Bruce Momjian
a82f9ffde6 New LDOUT makefile variable for QNX os. 1999-12-13 22:35:27 +00:00
Bruce Momjian
3ffd3d82db Make LD -r as macros that can be changed for QNX. 1999-12-09 19:15:45 +00:00
Tatsuo Ishii
08bcc77a5c add retest, a regex testing program 1999-05-21 06:27:54 +00:00
Marc G. Fournier
5979d73841 From: t-ishii@sra.co.jp
As Bruce mentioned, this is due to the conflict among changes we made.
Included patches should fix the problem(I changed all MB to
MULTIBYTE). Please let me know if you have further problem.

P.S. I did not include pathces to configure and gram.c to save the
file size(configure.in and gram.y modified).
1998-07-26 04:31:41 +00:00
Marc G. Fournier
bf00bbb0c4 I really hope that I haven't missed anything in this one...
From: t-ishii@sra.co.jp

Attached are patches to enhance the multi-byte support.  (patches are
against 7/18 snapshot)

* determine encoding at initdb/createdb rather than compile time

Now initdb/createdb has an option to specify the encoding. Also, I
modified the syntax of CREATE DATABASE to accept encoding option. See
README.mb for more details.

For this purpose I have added new column "encoding" to pg_database.
Also pg_attribute and pg_class are changed to catch up the
modification to pg_database.  Actually I haved added pg_database_mb.h,
pg_attribute_mb.h and pg_class_mb.h. These are used only when MB is
enabled. The reason having separate files is I couldn't find a way to
use ifdef or whatever in those files. I have to admit it looks
ugly. No way.

* support for PGCLIENTENCODING when issuing COPY command

commands/copy.c modified.

* support for SQL92 syntax "SET NAMES"

See gram.y.

* support for LATIN2-5
* add UNICODE regression test case
* new test suite for MB

New directory test/mb added.

* clean up source files

Basic idea is to have MB's own subdirectory for easier maintenance.
These are include/mb and backend/utils/mb.
1998-07-24 03:32:46 +00:00
Bruce Momjian
1e801a8f16 Hi,
Attached you'll find a (big) patch that fixes make dep and make
depend in all Makefiles where I found it to be appropriate.

It also removes the dependency in Makefile.global for NAMEDATALEN
and OIDNAMELEN by making backend/catalog/genbki.sh and bin/initdb/initdb.sh
a little smarter.

This no longer requires initdb.sh that is turned into initdb with
a sed script when installing Postgres, hence initdb.sh should be
renamed to initdb (after the patch has been applied :-) )

This patch is against the 6.3 sources, as it took a while to
complete.

Please review and apply,

Cheers,

Jeroen van Vianen
1998-04-06 00:32:26 +00:00
Marc G. Fournier
661ecf3c48 From: t-ishii@sra.co.jp
Included are patches intended for allowing PostgreSQL to handle
multi-byte charachter sets such as EUC(Extende Unix Code), Unicode and
Mule internal code. With the MB patch you can use multi-byte character
sets in regexp and LIKE. The encoding system chosen is determined at
the compile time.

To enable the MB extension, you need to define a variable "MB" in
Makefile.global or in Makefile.custom. For further information please
take a look at README.mb under doc directory.

(Note that unlike "jp patch" I do not use modified GNU regexp any
more. I changed Henry Spencer's regexp coming with PostgreSQL.)
1998-03-15 07:39:04 +00:00
Marc G. Fournier
6e337eef45 Major cleanout of PORTNAME variables from Makefiles...bound to screw up
some of the ports...
1997-12-20 00:29:35 +00:00
Marc G. Fournier
5379b84eff More cleanups. I can now compile without PORTNAME being defined n
Makefile.global.

End result, if all goes well, should allow for much easier porting, since
there will no longer be a concept of a "port".  Most, if not everything,
*should* be determined by configure, or by the compiler itself.  Still
work to be done though :)
1997-12-19 02:09:10 +00:00
Bruce Momjian
a0990e1884 Makefile cleanup after reorganization 1996-11-09 06:24:51 +00:00
Bryan Henderson
b0d6f0aa63 Simplify make files, add full dependencies. 1996-10-27 09:55:05 +00:00