mirror of https://github.com/postgres/postgres.git synced 2025-06-16 06:01:02 +03:00

Recognize "match-all" NFAs within the regex engine.

This builds on the previous "rainbow" patch to detect NFAs that will
match any string, though possibly with constraints on the string length.
This definition is chosen to match constructs such as ".*", ".+", and
".{1,100}".  Recognizing such an NFA after the optimization pass is
fairly cheap, since we basically just have to verify that all arcs
are RAINBOW arcs and count the number of steps to the end state.
(Well, there's a bit of complication with pseudo-color arcs for string
boundary conditions, but not much.)
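
The recognition step described above might be sketched roughly as follows. This is a toy model, not the code from this commit: the `ToyArc` and `ToyNfa` types, the `toy_check_matchall` function, and the `TOY_CAP` cutoff are all hypothetical names invented for illustration, and the sketch ignores the pseudo-color arcs for string boundary conditions. It checks that every arc is a rainbow arc, then records which path lengths can reach the final state and requires them to form one contiguous range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAXSTATES 16
#define TOY_CAP 32				/* assumption: lengths reaching the cap count as unbounded */
#define TOY_INF (-1)			/* marker for "no upper bound" */

typedef struct
{
	int			from;
	int			to;
	bool		rainbow;		/* true if the arc matches any character */
} ToyArc;

typedef struct
{
	int			nstates;
	int			start;
	int			final;
	const ToyArc *arcs;
	size_t		narcs;
} ToyNfa;

/*
 * Return true if the toy NFA matches every string whose length lies in one
 * contiguous range, storing that range in *minlen and *maxlen.
 */
static bool
toy_check_matchall(const ToyNfa *nfa, int *minlen, int *maxlen)
{
	bool		cur[TOY_MAXSTATES];
	bool		next[TOY_MAXSTATES];
	bool		haslen[TOY_CAP + 1];
	int			lo = -1;
	int			hi = -1;
	int			len;
	size_t		i;

	assert(nfa->nstates <= TOY_MAXSTATES);
	assert(nfa->start < nfa->nstates && nfa->final < nfa->nstates);

	/* Every arc must be a rainbow arc, else some character could fail. */
	for (i = 0; i < nfa->narcs; i++)
	{
		assert(nfa->arcs[i].from < nfa->nstates);
		assert(nfa->arcs[i].to < nfa->nstates);
		if (!nfa->arcs[i].rainbow)
			return false;
	}

	/* Step the set of reachable states forward one character at a time. */
	memset(cur, 0, sizeof(cur));
	cur[nfa->start] = true;
	for (len = 0; len <= TOY_CAP; len++)
	{
		haslen[len] = cur[nfa->final];
		memset(next, 0, sizeof(next));
		for (i = 0; i < nfa->narcs; i++)
		{
			if (cur[nfa->arcs[i].from])
				next[nfa->arcs[i].to] = true;
		}
		memcpy(cur, next, sizeof(cur));
	}

	/* The matching lengths must form a single range with no gaps. */
	for (len = 0; len <= TOY_CAP; len++)
	{
		if (!haslen[len])
			continue;
		if (lo < 0)
			lo = len;
		else if (hi != len - 1)
			return false;
		hi = len;
	}
	if (lo < 0)
		return false;			/* final state is unreachable */

	*minlen = lo;
	*maxlen = (hi == TOY_CAP) ? TOY_INF : hi;
	return true;
}
```

In this model, a two-state NFA with rainbow arcs 0->1 and 1->1 (the shape of ".+") yields a minimum length of 1 and no upper bound, while any NFA containing a non-rainbow arc is rejected immediately.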

Once we have these markings, the regex executor functions longest(),
shortest(), and matchuntil() don't have to expend per-character work
to determine whether a given substring satisfies such an NFA; they
just need to check its length against the bounds.  Since some matching
problems require O(N) invocations of these functions, we've reduced
the runtime for an N-character string from O(N^2) to O(N).  Of course,
this is no help for non-matchall sub-patterns, but those usually have
constraints that allow us to avoid needing O(N) substring checks in the
first place.  It's precisely the unconstrained "match-all" cases that
cause the most headaches.
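
To illustrate why the per-substring cost drops to constant time, here is a minimal sketch of length-only checks in the spirit of longest() and shortest(). It is an assumption-laden toy, not the executor code from this commit: `ToyMatchAll`, `toy_longest`, `toy_shortest`, and `TOY_INF` are hypothetical names, and the real functions take more arguments and handle more cases.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_INF ((size_t) -1)	/* marker for "no upper bound" */

/* Length bounds recorded for an NFA already known to be match-all. */
typedef struct
{
	size_t		minlen;
	size_t		maxlen;			/* TOY_INF if unbounded */
} ToyMatchAll;

/*
 * Longest match beginning at start and ending no later than stop, or NULL
 * if the available text is shorter than the minimum length.  No character
 * of the text is inspected; only the distance between the pointers matters.
 */
static const char *
toy_longest(const ToyMatchAll *m, const char *start, const char *stop)
{
	size_t		nchr;

	assert(start <= stop);
	nchr = (size_t) (stop - start);
	if (nchr < m->minlen)
		return NULL;
	if (m->maxlen != TOY_INF && nchr > m->maxlen)
		return start + m->maxlen;
	return stop;
}

/* Shortest match beginning at start, or NULL if the text is too short. */
static const char *
toy_shortest(const ToyMatchAll *m, const char *start, const char *stop)
{
	assert(start <= stop);
	if ((size_t) (stop - start) < m->minlen)
		return NULL;
	return start + m->minlen;
}
```

Each call does a fixed amount of pointer arithmetic regardless of how long the substring is, so O(N) calls cost O(N) in total rather than O(N^2).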

This is part of a patch series that in total reduces the regex engine's
runtime by about a factor of four on a large corpus of real-world regexes.

Patch by me, reviewed by Joel Jacobson

Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
Tom Lane
2021-02-20 18:31:19 -05:00
parent 08c0d6ad65
commit 824bf71902
5 changed files with 376 additions and 1 deletions


@@ -77,6 +77,10 @@ pg_regprefix(regex_t *re,
 	assert(g->tree != NULL);
 	cnfa = &g->tree->cnfa;
 
+	/* matchall NFAs never have a fixed prefix */
+	if (cnfa->flags & MATCHALL)
+		return REG_NOMATCH;
+
 	/*
 	 * Since a correct NFA should never contain any exit-free loops, it should
 	 * not be possible for our traversal to return to a previously visited NFA