mirror of https://github.com/postgres/postgres.git synced 2025-06-14 18:42:34 +03:00

Allow empty replacement strings in contrib/unaccent.

This is useful in languages where diacritic signs are represented as
separate characters; it's also one step towards letting unaccent be used
for arbitrary substring substitutions.

In passing, improve the user documentation for unaccent, which was sadly
vague about some important details.

Mohammad Alhashash, reviewed by Abhijit Menon-Sen
Tom Lane
2014-06-30 20:51:26 -04:00
parent 55863274d9
commit 97c40ce614
2 changed files with 54 additions and 11 deletions


@@ -104,11 +104,21 @@ initTrie(char *filename)
 while ((line = tsearch_readline(&trst)) != NULL)
 {
-/*
- * The format of each line must be "src trg" where src and trg
- * are sequences of one or more non-whitespace characters,
- * separated by whitespace. Whitespace at start or end of
- * line is ignored.
+/*----------
+ * The format of each line must be "src" or "src trg", where
+ * src and trg are sequences of one or more non-whitespace
+ * characters, separated by whitespace. Whitespace at start
+ * or end of line is ignored. If trg is omitted, an empty
+ * string is used as the replacement.
+ *
+ * We use a simple state machine, with states
+ *   0  initial (before src)
+ *   1  in src
+ *   2  in whitespace after src
+ *   3  in trg
+ *   4  in whitespace after trg
+ *  -1  syntax error detected (line will be ignored)
+ *----------
  */
 int state;
 char *ptr;
@@ -160,7 +170,14 @@ initTrie(char *filename)
 }
 }
-if (state >= 3)
+if (state == 1 || state == 2)
+{
+/* trg was omitted, so use "" */
+trg = "";
+trglen = 0;
+}
+if (state > 0)
 rootTrie = placeChar(rootTrie,
 (unsigned char *) src, srclen,
 trg, trglen);
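
The state machine described in the new comment can be sketched as a standalone function. This is only an illustration, not the code in unaccent.c: the Rule struct and parse_rule_line() are hypothetical names, and the sketch walks the line byte by byte with isspace(), on the assumption that this is close enough for a demonstration (the real parser has to step over multibyte characters).

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical container for one parsed rule; not part of unaccent.c. */
typedef struct
{
    const char *src;    /* start of source string within the line */
    size_t      srclen; /* length of source string in bytes */
    const char *trg;    /* start of replacement, or "" if omitted */
    size_t      trglen; /* length of replacement in bytes */
} Rule;

/*
 * Parse one rules-file line of the form "src" or "src trg".
 * Returns 1 if the line yields a usable rule, 0 if it is blank or malformed.
 * Assumption: bytes are examined one at a time, so multibyte whitespace
 * is not recognized.
 */
static int
parse_rule_line(const char *line, Rule *rule)
{
    int         state = 0;  /* 0 initial, 1 in src, 2 after src,
                             * 3 in trg, 4 after trg, -1 error */
    const char *ptr;

    rule->src = rule->trg = NULL;
    rule->srclen = rule->trglen = 0;

    for (ptr = line; *ptr; ptr++)
    {
        /* whitespace is skipped, but it terminates src or trg */
        if (isspace((unsigned char) *ptr))
        {
            if (state == 1)
                state = 2;
            else if (state == 3)
                state = 4;
            continue;
        }
        switch (state)
        {
            case 0:             /* first byte of src */
                rule->src = ptr;
                rule->srclen = 1;
                state = 1;
                break;
            case 1:             /* continuing src */
                rule->srclen++;
                break;
            case 2:             /* first byte of trg */
                rule->trg = ptr;
                rule->trglen = 1;
                state = 3;
                break;
            case 3:             /* continuing trg */
                rule->trglen++;
                break;
            default:            /* a third field, or already in error */
                state = -1;
                break;
        }
    }

    if (state == 1 || state == 2)
    {
        /* trg was omitted, so use "" */
        rule->trg = "";
        rule->trglen = 0;
    }

    /* state 0 means a blank line, -1 a syntax error; both are skipped */
    return state > 0;
}
```

Under these assumptions, a line holding only a combining mark (for example the two bytes of U+0301 in UTF-8) ends in state 1 and produces a rule with a two-byte src and an empty trg, which is the case the old `state >= 3` test rejected.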