1
0
mirror of https://github.com/postgres/postgres.git synced 2025-06-16 06:01:02 +03:00

Disallow making an empty lexeme via array_to_tsvector().

The tsvector data type has always forbidden lexemes to be empty.
However, array_to_tsvector() didn't get that memo, and would
allow an empty-string array element to become an empty lexeme.
This could result in dump/restore failures later, not to mention
whatever semantic issues might be behind the original prohibition.

However, other functions that take a plain text input directly as
a lexeme value do not need a similar restriction, because they only
match the string against existing tsvector entries.  In particular
it'd be a bad idea to make ts_delete() reject empty strings, since
that is the most convenient way to clean up any bad data that might
have gotten into a tsvector column via this bug.

Reflecting on that, let's also remove the prohibition against NULL
array elements in tsvector_delete_arr and tsvector_setweight_by_filter.
It seems more consistent to ignore them, as an empty-string element
would be ignored.

There's a case for back-patching this, since it's clearly a bug fix.
On balance though, it doesn't seem like something to change in a
minor release.

Jean-Christophe Arnu

Discussion: https://postgr.es/m/CAHZmTm1YVndPgUVRoag2WL0w900XcoiivDDj-gTTYBsG25c65A@mail.gmail.com
This commit is contained in:
Tom Lane
2021-11-06 13:28:53 -04:00
parent 1241fcbd7e
commit cbe25dcff7
4 changed files with 44 additions and 17 deletions

View File

@ -322,10 +322,9 @@ tsvector_setweight_by_filter(PG_FUNCTION_ARGS)
int lex_len,
lex_pos;
/* Ignore null array elements, they surely don't match */
if (nulls[i])
ereport(ERROR,
(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
errmsg("lexeme array may not contain nulls")));
continue;
lex = VARDATA(dlexemes[i]);
lex_len = VARSIZE(dlexemes[i]) - VARHDRSZ;
@ -602,10 +601,9 @@ tsvector_delete_arr(PG_FUNCTION_ARGS)
int lex_len,
lex_pos;
/* Ignore null array elements, they surely don't match */
if (nulls[i])
ereport(ERROR,
(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
errmsg("lexeme array may not contain nulls")));
continue;
lex = VARDATA(dlexemes[i]);
lex_len = VARSIZE(dlexemes[i]) - VARHDRSZ;
@ -761,13 +759,21 @@ array_to_tsvector(PG_FUNCTION_ARGS)
deconstruct_array(v, TEXTOID, -1, false, TYPALIGN_INT, &dlexemes, &nulls, &nitems);
/* Reject nulls (maybe we should just ignore them, instead?) */
/*
* Reject nulls and zero length strings (maybe we should just ignore them,
* instead?)
*/
for (i = 0; i < nitems; i++)
{
if (nulls[i])
ereport(ERROR,
(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
errmsg("lexeme array may not contain nulls")));
if (VARSIZE(dlexemes[i]) - VARHDRSZ == 0)
ereport(ERROR,
(errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING),
errmsg("lexeme array may not contain empty strings")));
}
/* Sort and de-dup, because this is required for a valid tsvector. */