1
0
mirror of https://github.com/postgres/postgres.git synced 2025-06-04 12:42:24 +03:00

5 Commits

Author SHA1 Message Date
Tom Lane
fad06b2873 Further fix pg_trgm's extraction of trigrams from regular expressions.
Commit 9e43e8714 turns out to have been insufficient: not only is it
necessary to track tentative parent links while considering a set of
arc removals, but it's necessary to track tentative flag additions
as well.  This is because we always merge arc target states into
arc source states; therefore, when considering a merge of the final
state with some other, it is the other state that will acquire a new
TSTATE_FIN bit.  If there's another arc for the same color trigram
that would cause merging of that state with the initial state, we
failed to recognize the problem.  The test cases for the prior commit
evidently only exercised situations where a tentative merge with the
initial state occurs before one with the final state.  If it goes the
other way around, we'll happily merge the initial and final states,
either producing a broken final graph that would never match anything,
or triggering the Assert added by the prior commit.

It's tempting to consider switching the merge direction when the merge
involves the final state, but I lack the time to analyze that idea in
detail.  Instead just keep track of the flag changes that would result
from proposed merges, in the same way that the prior commit tracked
proposed parent links.

Along the way, add some more debugging support, because I'm not entirely
confident that this is the last bug here.  And tweak matters so that
the transformed.dot file uses small integers rather than pointer values
to identify states; that makes it more readable if you're just eyeballing
it rather than fooling with Graphviz.  And rename a couple of identically
named struct fields to reduce confusion.

Per report from Corey Csuhta.  Add a test case based on his example.
(Note: this case does not trigger the bug under 9.3, apparently because
its different measurement of costs causes it to stop merging states before
it hits the failure.  I spent some time trying to find a variant that would
fail in 9.3, without success; but I'm sure such cases exist.)

Like the previous patch, back-patch to 9.3 where this code was added.

Report: https://postgr.es/m/E2B01A4B-4530-406B-8D17-2F67CF9A16BA@csuhta.com
2017-04-14 14:52:03 -04:00
Tom Lane
53b5a8c131 Fix contrib/pg_trgm's extraction of trigrams from regular expressions.
The logic for removing excess trigrams from the result was faulty.
It intends to avoid merging the initial and final states of the NFA,
which is necessary, but in testing whether removal of a specific trigram
would cause that, it failed to consider the combined effects of all the
state merges that that trigram's removal would cause.  This could result
in a broken final graph that would never match anything, leading to GIN
or GiST indexscans not finding anything.

To fix, add a "tentParent" field that is used only within this loop,
and set it to show state merges that we are tentatively going to do.
While examining a particular arc, we must chase up through tentParent
links as well as regular parent links (the former can only appear atop
the latter), and we must account for state init/fin flag merges that
haven't actually been done yet.

To simplify the latter, combine the separate init and fin bool fields
into a bitmap flags field.  I also chose to get rid of the "children"
state list, which seems entirely inessential.

Per bug #14563 from Alexey Isayko, which the added test cases are based on.
Back-patch to 9.3 where this code was added.

Report: https://postgr.es/m/20170222111446.1256.67547@wrigleys.postgresql.org
Discussion: https://postgr.es/m/8816.1487787594@sss.pgh.pa.us
2017-02-22 15:04:07 -05:00
Bruce Momjian
04e15c69d2 Remove tabs after spaces in C comments
This was not changed in HEAD, but will be done later as part of a
pgindent run.  Future pgindent runs will also do this.

Report by Tom Lane

Backpatch through all supported branches, but not HEAD
2014-05-06 11:26:28 -04:00
Tom Lane
6f5b8beb64 Make contrib/pg_trgm also support regex searches with GiST indexes.
This wasn't addressed in the original patch, but it doesn't take very
much additional code to cover the case, so let's get it done.

Since pg_trgm 1.1 hasn't been released yet, I just changed the definition
of what's in it, rather than inventing a 1.2.
2013-04-10 13:31:02 -04:00
Tom Lane
3ccae48f44 Support indexing of regular-expression searches in contrib/pg_trgm.
This works by extracting trigrams from the given regular expression,
in generally the same spirit as the previously-existing support for
LIKE searches, though of course the details are far more complicated.

Currently, only GIN indexes are supported.  We might be able to make
it work with GiST indexes later.

The implementation includes adding API functions to backend/regex/
to provide a view of the search NFA created from a regular expression.
These functions are meant to be generic enough to be supportable in
a standalone version of the regex library, should that ever happen.

Alexander Korotkov, reviewed by Heikki Linnakangas and Tom Lane
2013-04-09 01:06:54 -04:00