1
0
mirror of https://github.com/postgres/postgres.git synced 2025-10-24 01:29:19 +03:00
Commit Graph

49 Commits

Author SHA1 Message Date
Tom Lane
47acf3add3 Remove new coupling between NAMEDATALEN and MAX_LEVENSHTEIN_STRLEN.
Commit e529cd4ffa introduced an Assert requiring NAMEDATALEN to be
less than MAX_LEVENSHTEIN_STRLEN, which has been 255 for a long time.
Since up to that instant we had always allowed NAMEDATALEN to be
substantially more than that, this was ill-advised.

It's debatable whether we need MAX_LEVENSHTEIN_STRLEN at all (versus
putting a CHECK_FOR_INTERRUPTS into the loop), or whether it has to be
so tight; but this patch takes the narrower approach of just not applying
the MAX_LEVENSHTEIN_STRLEN limit to calls from the parser.

Trusting the parser for this seems reasonable, first because the strings
are limited to NAMEDATALEN which is unlikely to be hugely more than 256,
and second because the maximum distance is tightly constrained by
MAX_FUZZY_DISTANCE (though we'd forgotten to make use of that limit in one
place).  That means the cost is not really O(mn) but more like O(max(m,n)).

Relaxing the limit for user-supplied calls is left for future research;
given the lack of complaints to date, it doesn't seem very high priority.

In passing, fix confusion between lengths-in-bytes and lengths-in-chars
in comments and error messages.

Per gripe from Kevin Day; solution suggested by Robert Haas.  Back-patch
to 9.5 where the unwanted restriction was introduced.
2016-01-22 11:53:06 -05:00
Heikki Linnakangas
4eaafa0453 Remove dead code.
Commit 13629df changed metaphone() function to return an empty string on
empty input, but it left the old error message in place. It's now dead code.

Michael Paquier, per Coverity warning.
2015-02-03 09:43:44 +02:00
Bruce Momjian
4baaf863ec Update copyright for 2015
Backpatch certain files through 9.0
2015-01-06 11:43:47 -05:00
Robert Haas
c0828b78e9 Move the guts of our Levenshtein implementation into core.
The hope is that we can use this to produce better diagnostics in
some cases.

Peter Geoghegan, reviewed by Michael Paquier, with some further
changes by me.
2014-11-13 12:33:26 -05:00
Peter Eisentraut
e7128e8dbb Create function prototype as part of PG_FUNCTION_INFO_V1 macro
Because of gcc -Wmissing-prototypes, all functions in dynamically
loadable modules must have a separate prototype declaration.  This is
meant to detect global functions that are not declared in header files,
but in cases where the function is called via dfmgr, this is redundant.
Besides filling up space with boilerplate, this is a frequent source of
compiler warnings in extension modules.

We can fix that by creating the function prototype as part of the
PG_FUNCTION_INFO_V1 macro, which such modules have to use anyway.  That
makes the code of modules cleaner, because there is one less place where
the entry points have to be listed, and creates an additional check that
functions have the right prototype.

Remove now redundant prototypes from contrib and other modules.
2014-04-18 00:03:19 -04:00
Bruce Momjian
7e04792a1c Update copyright for 2014
Update all files in head, and files COPYRIGHT and legal.sgml in all back
branches.
2014-01-07 16:05:30 -05:00
Bruce Momjian
bd61a623ac Update copyrights for 2013
Fully update git head, and update back branches in ./COPYRIGHT and
legal.sgml files.
2013-01-01 17:15:01 -05:00
Robert Haas
5d4b60f2f2 Lots of doc corrections.
Josh Kupershmidt
2012-04-23 22:43:09 -04:00
Bruce Momjian
e126958c2e Update copyright notices for year 2012. 2012-01-01 18:01:58 -05:00
Bruce Momjian
6416a82a62 Remove unnecessary #include references, per pgrminclude script. 2011-09-01 10:04:27 -04:00
Peter Eisentraut
bf6be7af25 Put inline declaration before return type
gcc -Wextra complains that the other way around is obsolescent, and
this was the only place where it was written in this order.
2011-07-19 07:57:38 +03:00
Bruce Momjian
5d950e3b0c Stamp copyrights for year 2011. 2011-01-01 13:18:15 -05:00
Robert Haas
604ab08145 Add levenshtein_less_equal, optimized version for small distances.
Alexander Korotkov, heavily revised by me.
2010-10-19 09:51:06 -04:00
Robert Haas
12679b8bc9 In levenshtein_internal(), describe algorithm a bit more clearly. 2010-09-24 14:36:39 -04:00
Magnus Hagander
9f2e211386 Remove cvs keywords from all files. 2010-09-20 22:08:53 +02:00
Robert Haas
57d9aefcaa Teach levenshtein() about multi-byte characters.
Based on a patch by, and further ideas from, Alexander Korotkov.
2010-08-02 23:20:23 +00:00
Robert Haas
980341b3c2 Avoid using text_to_cstring() in levenshtein functions.
Operating directly on the underlying varlena saves palloc and memcpy
overhead, which testing shows to be significant.

Extracted from a larger patch by Alexander Korotkov.
2010-07-29 20:11:48 +00:00
Bruce Momjian
0239800893 Update copyright for the year 2010. 2010-01-02 16:58:17 +00:00
Robert Haas
da07641481 Fix levenshtein with costs. The previous code multiplied by the cost in only
3 of the 7 relevant locations.

Marcin Mank, slightly adjusted by me.
2009-12-10 01:54:17 +00:00
Bruce Momjian
d747140279 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list
provided by Andrew.
2009-06-11 14:49:15 +00:00
Tom Lane
bb6bbc3277 Defend against non-ASCII letters in fuzzystrmatch code. The functions
still don't behave very sanely for multibyte encodings, but at least
they won't be indexing off the ends of static arrays.
2009-04-07 15:53:54 +00:00
Bruce Momjian
511db38ace Update copyright for 2009. 2009-01-01 17:24:05 +00:00
Tom Lane
55f6e5f689 Add a variant of the Levenshtein string-distance function that lets the user
specify the cost values to use, instead of always using 1's.
Volkan Yazici

In passing, remove fuzzystrmatch.h, which contained a bunch of stuff that had
no business being in a .h file; fold it into its only user, fuzzystrmatch.c.
2008-04-03 21:13:07 +00:00
Tom Lane
220db7ccd8 Simplify and standardize conversions between TEXT datums and ordinary C
strings.  This patch introduces four support functions cstring_to_text,
cstring_to_text_with_len, text_to_cstring, and text_to_cstring_buffer, and
two macros CStringGetTextDatum and TextDatumGetCString.  A number of
existing macros that provided variants on these themes were removed.

Most of the places that need to make such conversions now require just one
function or macro call, in place of the multiple notational layers that used
to be needed.  There are no longer any direct calls of textout or textin,
and we got most of the places that were using handmade conversions via
memcpy (there may be a few still lurking, though).

This commit doesn't make any serious effort to eliminate transient memory
leaks caused by detoasting toasted text objects before they reach
text_to_cstring.  We changed PG_GETARG_TEXT_P to PG_GETARG_TEXT_PP in a few
places where it was easy, but much more could be done.

Brendan Jurd and Tom Lane
2008-03-25 22:42:46 +00:00
Bruce Momjian
9098ab9e32 Update copyrights in source tree to 2008. 2008-01-01 19:46:01 +00:00
Bruce Momjian
fb2b088cf4 Update /contrib/fuzzystrmatch error message to mention bytes, not just
'length', which can be characters.
2007-02-13 18:00:35 +00:00
Bruce Momjian
29dccf5fe0 Update CVS HEAD for 2007 copyright. Back branches are typically not
back-stamped for this.
2007-01-05 22:20:05 +00:00
Bruce Momjian
b538215d5d Remove a few baby-C macros in fuzzystrmatch. Add a few missing includes. 2006-07-10 18:40:16 +00:00
Tom Lane
a0ffab351e Magic blocks don't do us any good unless we use 'em ... so install one
in every shared library.
2006-05-30 22:12:16 +00:00
Neil Conway
a323ede280 Fix a few places that were checking for the return value of palloc() to be
non-NULL: palloc() ereports on OOM, so we can safely assume it returns a
valid pointer.
2006-03-19 22:22:56 +00:00
Bruce Momjian
f3d99d160d Add CVS tag lines to files that were lacking them. 2006-03-11 04:38:42 +00:00
Bruce Momjian
f2f5b05655 Update copyright for 2006. Update scripts. 2006-03-05 15:59:11 +00:00
Bruce Momjian
1dc3498251 Standard pgindent run for 8.1. 2005-10-15 02:49:52 +00:00
Bruce Momjian
c40cd3660f One of the web pages mentioned in dmetaphone.c has moved. Also fix
a few typos in comments.

The dictionaries I checked list "altho" as a variant of "although,"
but I didn't find any other instances of the former in the source
tree so I changed it.

Michael Fuhr
2005-09-30 22:38:44 +00:00
Neil Conway
1ac9f0e9f7 The attached patch implements the soundex difference function which
compares two strings' soundex values for similarity, from Kris Jurka.
Also mark the text_soundex() function as STRICT, to avoid crashing
on NULL input.
2005-01-26 08:04:04 +00:00
Bruce Momjian
2daed8c5b3 Update copyrights that were missed. 2005-01-01 05:43:09 +00:00
Bruce Momjian
da9a8649d8 Update copyright to 2004. 2004-08-29 04:13:13 +00:00
Joe Conway
13629df5a1 Add double metaphone code from Andrew Dunstan. Also change metaphone so that
an empty input string causes an empty output string to be returned, instead of
throwing an ERROR -- per complaint from Aaron Hillegass, and consistent with
double metaphone. Fix examples in README.soundex pointed out by James Robinson.
2004-07-01 03:25:48 +00:00
Tom Lane
2f9c859ea1 Fix some copyright notices that weren't updated. Improve copyright tool
so it won't miss 'em again.
2003-08-04 23:59:41 +00:00
Bruce Momjian
089003fb46 pgindent run. 2003-08-04 00:43:34 +00:00
Tom Lane
8fd5b3ed67 Error message editing in contrib (mostly by Joe Conway --- thanks Joe!) 2003-07-24 17:52:50 +00:00
Bruce Momjian
7b1f6ffaab Jim C. Nasby wrote:
> Second argument to metaphone is suposed to set the limit on the
> number of characters to return, but it breaks on some phrases:
>
> usps=# select metaphone(a,3),metaphone(a,4),metaphone(a,20) from
> (select 'Hello world'::varchar AS a) a;
> HLW       | HLWR      | HLWRLT
>
> usps=# select metaphone(a,3),metaphone(a,4),metaphone(a,20) from
> (select 'A A COMEAUX MEMORIAL'::varchar AS a) a;
  > AKM       | AKMKS     | AKMKSMMRL
>
> In every case I've found that does this, the 4th and 5th letters are
> always 'KS'.

Nice catch.

There was a bug in the original metaphone algorithm from CPAN. Patch
attached (while I was at it I updated my email address, changed the
copyright to PGDG, and removed an unnecessary palloc). Here's how it
looks now:

regression=# select metaphone(a,4) from (select 'A A COMEAUX
MEMORIAL'::varchar AS a) a;
   metaphone
-----------
   AKMK
(1 row)

regression=# select metaphone(a,5) from (select 'A A COMEAUX
MEMORIAL'::varchar AS a) a;
   metaphone
-----------
   AKMKS
(1 row)

Joe Conway
2003-06-24 22:59:46 +00:00
Tom Lane
e4704001ea This patch fixes a bunch of spelling mistakes in comments throughout the
PostgreSQL source code.

Neil Conway
2003-03-10 22:28:22 +00:00
Tom Lane
ee051baeac Make sure that all <ctype.h> routines are called with unsigned char
values; it's not portable to call them with signed chars.  I recall doing
this for the last release, but a few more uncasted calls have snuck in.
2001-12-30 23:09:42 +00:00
Bruce Momjian
cd01c32f55 Add trailing semicolon for Joe Conway 2001-10-29 19:41:54 +00:00
Bruce Momjian
b81844b173 pgindent run on all C files. Java run to follow. initdb/regression
tests pass.
2001-10-25 05:50:21 +00:00
Bruce Momjian
fde8edaf53 Add do { ... } while (0) to more bad macros. 2001-10-25 01:29:37 +00:00
Bruce Momjian
cdd02cdf00 Sorry - I should have gotten to this sooner. Here's a patch which you should
be able to apply against what you just committed. It rolls soundex into
fuzzystrmatch.

Remove soundex/metaphone and merge into fuzzystrmatch.

Joe Conway
2001-08-07 18:16:01 +00:00
Bruce Momjian
d8783c512e Per this discussion, here's a patch to implement both levenshtein() and
metaphone() in a contrib. There seem to be a fair number of different
approaches to both of these algorithms. I used the simplest case for
levenshtein which has a cost  of 1 for any character insertion, deletion, or
substitution. For metaphone, I adapted the same code from CPAN that the PHP
folks did.

A couple of questions:
1. Does it make sense to fold the soundex contrib together with this one?

2. I was debating trying to add multibyte support to levenshtein (it would
make no sense at all for metaphone), but a quick search through the contrib
directory found no hits on the word MULTIBYTE. Should worry about adding
multibyte support to levenshtein()?

Joe Conway
2001-08-07 16:47:43 +00:00