mirror of https://github.com/postgres/postgres.git synced 2025-12-09 02:08:45 +03:00

Fix the general case of quantified regex back-references.

Cases where a back-reference is part of a larger subexpression that
is quantified have never worked in Spencer's regex engine, because
he used a compile-time transformation that neglected the need to
check the back-reference match in iterations before the last one.
(That was okay for capturing parens, and we still do it if the
regex has *only* capturing parens ... but it's not okay for backrefs.)

To make this work properly, we have to add an "iteration" node type
to the regex engine's vocabulary of sub-regex nodes.  Since this is a
moderately large change with a fair risk of introducing new bugs of its
own, apply to HEAD only, even though it's a fix for a longstanding bug.
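
To make the failure mode concrete, here is a minimal sketch in C. It is a toy model, not Spencer's code and not the transformation the commit replaces. It assumes the pattern `^(?:([ab])\1)+$`, where each iteration captures one character and must then see the same character again. The function names are hypothetical. The first function checks the back-reference in every iteration; the second deliberately models the bug class described above by checking it only in the last iteration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model of ^(?:([ab])\1)+$ : one or more iterations, each being a
 * captured 'a' or 'b' followed by a back-reference to that capture.
 * Every iteration is exactly two characters long, so no backtracking
 * over split points is needed in this simplified case. */

static bool is_ab(char c)
{
    return c == 'a' || c == 'b';
}

/* Intended behavior: the back-reference is verified in every iteration. */
bool match_every_iteration(const char *s)
{
    size_t len = strlen(s);

    if (len == 0 || len % 2 != 0)
        return false;
    for (size_t i = 0; i < len; i += 2)
    {
        if (!is_ab(s[i]) || s[i + 1] != s[i])
            return false;
    }
    return true;
}

/* Hypothetical flawed variant: in iterations before the last one, the
 * back-reference position accepts anything the capture could have
 * matched; only the final iteration compares against the capture. */
bool match_last_iteration_only(const char *s)
{
    size_t len = strlen(s);

    if (len == 0 || len % 2 != 0)
        return false;
    for (size_t i = 0; i + 2 < len; i += 2)
    {
        if (!is_ab(s[i]) || !is_ab(s[i + 1]))
            return false;
    }
    return is_ab(s[len - 2]) && s[len - 1] == s[len - 2];
}
```

Under these assumptions the two functions agree on "aabb" but disagree on "abaa": the first iteration "ab" violates the back-reference, which only the per-iteration check notices.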
Tom Lane
2012-02-24 01:40:18 -05:00
parent 0c9e5d5e0d
commit 173e29aa5d
6 changed files with 884 additions and 55 deletions

@@ -102,15 +102,15 @@ consists of a tree of sub-expressions ("subre"s). Leaf tree nodes are
 either plain regular expressions (which are executed as DFAs in the manner
 described above) or back-references (which try to match the input to some
 previous substring). Non-leaf nodes are capture nodes (which save the
-location of the substring currently matching their child node) or
-concatenation or alternation nodes. At execution time, the executor
-recursively scans the tree. At concatenation or alternation nodes,
-it considers each possible alternative way of matching the input string,
-ie each place where the string could be split for a concatenation, or each
-child node for an alternation. It tries the next alternative if the match
-fails according to the child nodes. This is exactly the sort of
-backtracking search done by a traditional NFA regex engine. If there are
-many tree levels it can get very slow.
+location of the substring currently matching their child node),
+concatenation, alternation, or iteration nodes. At execution time, the
+executor recursively scans the tree. At concatenation, alternation, or
+iteration nodes, it considers each possible alternative way of matching the
+input string, that is each place where the string could be split for a
+concatenation or iteration, or each child node for an alternation. It
+tries the next alternative if the match fails according to the child nodes.
+This is exactly the sort of backtracking search done by a traditional NFA
+regex engine. If there are many tree levels it can get very slow.
 But all is not lost: we can still be smarter than the average pure NFA
 engine. To do this, each subre node has an associated DFA, which