mirror of https://github.com/postgres/postgres.git synced 2025-07-28 23:42:10 +03:00

Disallow making an empty lexeme via array_to_tsvector().

The tsvector data type has always forbidden lexemes to be empty.
However, array_to_tsvector() didn't get that memo, and would
allow an empty-string array element to become an empty lexeme.
This could result in dump/restore failures later, not to mention
whatever semantic issues might be behind the original prohibition.
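The rule now enforced by array_to_tsvector() can be sketched as a small Python model. It assumes a tsvector can be represented as a sorted list of distinct lexeme strings, with positions and weights omitted. `array_to_tsvector_model` is a hypothetical name, not PostgreSQL's C implementation, and its error messages are illustrative rather than the server's actual wording.

```python
from typing import List, Optional, Sequence


def array_to_tsvector_model(elements: Sequence[Optional[str]]) -> List[str]:
    """Hypothetical model of array_to_tsvector() after this commit.

    Each element is used as a lexeme as-is; None (SQL NULL) and the empty
    string are rejected. The result is sorted and de-duplicated, mimicking
    the lexeme list a tsvector stores.
    """
    for elem in elements:
        if elem is None:
            # Illustrative message, not PostgreSQL's exact error text.
            raise ValueError("lexeme array may not contain NULL elements")
        if elem == "":
            # The new check: an empty string would otherwise become an
            # empty lexeme, which tsvector has always forbidden.
            raise ValueError("lexeme array may not contain empty strings")
    return sorted(set(elements))


print(array_to_tsvector_model(["fat", "cat", "rat"]))  # ['cat', 'fat', 'rat']

try:
    array_to_tsvector_model(["fat", ""])
except ValueError as err:
    print(err)  # lexeme array may not contain empty strings
```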

However, other functions that take a plain text input directly as
a lexeme value do not need a similar restriction, because they only
match the string against existing tsvector entries.  In particular
it'd be a bad idea to make ts_delete() reject empty strings, since
that is the most convenient way to clean up any bad data that might
have gotten into a tsvector column via this bug.
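A hypothetical Python model shows why ts_delete() should keep accepting an empty string. It treats a tsvector as a dict from lexeme to position list, with weights omitted. The string is only compared against lexemes already present, so '' removes an empty lexeme left behind by the old bug and does nothing otherwise. In SQL, the corresponding repair would be an UPDATE that applies ts_delete(column, '') to the affected column. The model below illustrates only the matching rule.

```python
from typing import Dict, List

# Hypothetical model: lexeme -> list of positions (weights omitted).
TsVector = Dict[str, List[int]]


def ts_delete_model(vector: TsVector, lexeme: str) -> TsVector:
    """Model of ts_delete(tsvector, text): drop one lexeme if present."""
    # The argument is matched only against existing lexemes, so any string,
    # including '', is safe: a non-matching string leaves the vector as-is.
    return {lex: pos for lex, pos in vector.items() if lex != lexeme}


# A vector containing an empty lexeme, as the old array_to_tsvector()
# behaviour could produce.
damaged: TsVector = {"": [], "cat": [3], "fat": [2, 4]}
print(ts_delete_model(damaged, ""))     # {'cat': [3], 'fat': [2, 4]}
print(ts_delete_model(damaged, "dog"))  # {'': [], 'cat': [3], 'fat': [2, 4]}
```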

Reflecting on that, let's also remove the prohibition against NULL
array elements in tsvector_delete_arr and tsvector_setweight_by_filter.
It seems more consistent to ignore them, as an empty-string element
would be ignored.
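The NULL-handling change for the array form of ts_delete() (tsvector_delete_arr on the C side) can be sketched as a hypothetical Python model. Here None stands for a NULL array element. Before this change it raised an error; now it is skipped, just as an empty or otherwise unmatched string is. The usage line mirrors the documented ts_delete(..., ARRAY['fat','rat']) example, with weights dropped.

```python
from typing import Dict, List, Optional, Sequence

TsVector = Dict[str, List[int]]  # hypothetical model: lexeme -> positions


def ts_delete_array_model(vector: TsVector,
                          lexemes: Sequence[Optional[str]]) -> TsVector:
    """Model of ts_delete(tsvector, text[]) after this commit."""
    # None (SQL NULL) elements are ignored rather than rejected; '' and other
    # strings that match no lexeme simply remove nothing.
    targets = {lex for lex in lexemes if lex is not None}
    return {lex: pos for lex, pos in vector.items() if lex not in targets}


vec: TsVector = {"fat": [2, 4], "cat": [3], "rat": [5]}
print(ts_delete_array_model(vec, ["fat", "rat", None, ""]))  # {'cat': [3]}
```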

There's a case for back-patching this, since it's clearly a bug fix.
On balance though, it doesn't seem like something to change in a
minor release.

Jean-Christophe Arnu

Discussion: https://postgr.es/m/CAHZmTm1YVndPgUVRoag2WL0w900XcoiivDDj-gTTYBsG25c65A@mail.gmail.com
Tom Lane
2021-11-06 13:28:53 -04:00
parent 1241fcbd7e
commit cbe25dcff7
4 changed files with 44 additions and 17 deletions

@@ -12920,8 +12920,10 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
 <returnvalue>tsvector</returnvalue>
 </para>
 <para>
-Converts an array of lexemes to a <type>tsvector</type>.
-The given strings are used as-is without further processing.
+Converts an array of text strings to a <type>tsvector</type>.
+The given strings are used as lexemes as-is, without further
+processing. Array elements must not be empty strings
+or <literal>NULL</literal>.
 </para>
 <para>
 <literal>array_to_tsvector('{fat,cat,rat}'::text[])</literal>
@@ -13104,6 +13106,9 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
 Assigns the specified <parameter>weight</parameter> to elements
 of the <parameter>vector</parameter> that are listed
 in <parameter>lexemes</parameter>.
+The strings in <parameter>lexemes</parameter> are taken as lexemes
+as-is, without further processing. Strings that do not match any
+lexeme in <parameter>vector</parameter> are ignored.
 </para>
 <para>
 <literal>setweight('fat:2,4 cat:3 rat:5,6B'::tsvector, 'A', '{cat,rat}')</literal>
@@ -13265,6 +13270,8 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
 <para>
 Removes any occurrence of the given <parameter>lexeme</parameter>
 from the <parameter>vector</parameter>.
+The <parameter>lexeme</parameter> string is treated as a lexeme as-is,
+without further processing.
 </para>
 <para>
 <literal>ts_delete('fat:2,4 cat:3 rat:5A'::tsvector, 'fat')</literal>
@@ -13281,6 +13288,9 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
 Removes any occurrences of the lexemes
 in <parameter>lexemes</parameter>
 from the <parameter>vector</parameter>.
+The strings in <parameter>lexemes</parameter> are taken as lexemes
+as-is, without further processing. Strings that do not match any
+lexeme in <parameter>vector</parameter> are ignored.
 </para>
 <para>
 <literal>ts_delete('fat:2,4 cat:3 rat:5A'::tsvector, ARRAY['fat','rat'])</literal>
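The filtered form of setweight(), implemented by tsvector_setweight_by_filter, can be sketched as a hypothetical Python model. Each lexeme maps to a list of (position, weight) pairs, with 'D' as the default weight. The names and this representation are assumptions for illustration, and the weight argument is assumed to already be a valid weight letter. Strings that match no lexeme are ignored, and after this commit so are None (NULL) elements. The usage line reproduces the documented setweight('fat:2,4 cat:3 rat:5,6B'::tsvector, 'A', '{cat,rat}') example, whose result is 'cat':3A 'fat':2,4 'rat':5A,6A.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical model: lexeme -> list of (position, weight) pairs.
WeightedTsVector = Dict[str, List[Tuple[int, str]]]


def setweight_filter_model(vector: WeightedTsVector, weight: str,
                           lexemes: Sequence[Optional[str]]) -> WeightedTsVector:
    """Model of setweight(tsvector, "char", text[]) after this commit."""
    # None (SQL NULL) elements are skipped; unmatched strings change nothing.
    targets = {lex for lex in lexemes if lex is not None}
    return {
        # Every position of a listed lexeme gets the new weight.
        lex: [(p, weight) for p, _ in positions] if lex in targets else positions
        for lex, positions in vector.items()
    }


vec: WeightedTsVector = {
    "fat": [(2, "D"), (4, "D")],
    "cat": [(3, "D")],
    "rat": [(5, "D"), (6, "B")],
}
print(setweight_filter_model(vec, "A", ["cat", "rat", None]))
# {'fat': [(2, 'D'), (4, 'D')], 'cat': [(3, 'A')], 'rat': [(5, 'A'), (6, 'A')]}
```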